Data Processing Systems for Solid State Drive
Yonsei University, Mincheol Shin, 2015.11.23

  • Published on
    19-Jan-2016


Overview
  • Main target: data processing systems with SSD
  • Purpose: improving I/O performance
  • Data processing systems
      • Relational database management system, e.g. Oracle, MySQL, PostgreSQL, SQLite
      • Distributed data processing system, e.g. Hadoop Distributed File System, MapReduce, Hive, HBase, Tajo, Spark
      • Key-value store, e.g. Redis

Outline
  • Solid State Drive (SSD)
  • RDBMS on Solid State Drive
  • Big Data Processing for Solid State Drive

Solid State Drive: Flash Memory [VLDB2011Tut2]
  • Great performance
      • High I/O performance: 41 MB/s read, 7.5 MB/s program [Micron 2014]
      • Fast random access: under 0.1 ms (HDD: 2.9 to 12 ms)
      • Low energy consumption

  • Four constraints of NAND flash memory
      • C1: Program granularity is a page (2 KB ~ 16 KB)
      • C2: A block (256 KB ~ 1 MB) must be erased before any of its pages is updated
      • C3: Pages must be programmed sequentially within a block
      • C4: Limited lifetime (10^4 ~ 10^5 erase cycles)

[Figure: an erase block (1 MB) made up of 4k pages]

[VLDB2011Tut2] P. Bonnet, L. Bouganim, I. Koltsidas, S. D. Viglas, "System Co-Design and Data Management for Flash Devices", VLDB 2011 Tutorial

Solid State Drive
  • Solid State Drive (SSD)
      • Definition: persistent data storage with neither disks nor a drive motor
      • Supports traditional block I/O

  • Characteristics of SSD
      • Fast random access (inherited from flash memory)
      • Read/write imbalance (inherited from flash memory)
      • Exploiting internal parallelism (SSD internal structure)
      • In-storage processing

[Figure: SSD architecture. The host issues Read(addr) and Write(addr, data) over the interface (SATA, SAS, PCIe); the internal algorithm (FTL) handles mapping, wear leveling, and garbage collection; the physical storage (flash chips) supports read, program, and erase.]

Solid State Drive: Flash Translation Layer (FTL)
  • Flash Translation Layer
      • Converts block I/O operations into internal flash operations

  • Three major components
      • Mapping: maps a logical block address (LBA) to a physical page
      • Garbage collection
      • Wear leveling: extends the lifetime of the SSD
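The three FTL components can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the algorithm of any real SSD: the class name TinyFTL, the greedy victim choice (fewest valid pages), and the "least-erased free block" rule standing in for wear leveling are all hypothetical choices made for illustration. It writes out of place, keeps an LBA-to-page map, and erases a block only after its valid pages have been copied elsewhere.

```python
class TinyFTL:
    """Toy page-mapping FTL (illustrative only, not a real controller)."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.ppb = pages_per_block
        # flash[b][p] holds (lba, data) once programmed, or None when erased.
        self.flash = [[None] * self.ppb for _ in range(num_blocks)]
        self.valid = [[False] * self.ppb for _ in range(num_blocks)]
        self.next_page = [0] * num_blocks    # next page to program in each block
        self.erase_count = [0] * num_blocks  # wear per block
        self.mapping = {}                    # LBA -> (block, page)
        self.active = 0                      # block currently being filled

    def read(self, lba):
        block, page = self.mapping[lba]
        return self.flash[block][page][1]

    def write(self, lba, data):
        if self.next_page[self.active] == self.ppb:
            self._open_new_block()
        self._program(lba, data)

    def _program(self, lba, data):
        # Pages are programmed sequentially within the active block.
        block, page = self.active, self.next_page[self.active]
        self.flash[block][page] = (lba, data)
        self.valid[block][page] = True
        self.next_page[block] += 1
        old = self.mapping.get(lba)
        if old is not None:
            self.valid[old[0]][old[1]] = False  # out-of-place update
        self.mapping[lba] = (block, page)

    def _open_new_block(self):
        free = [b for b in range(len(self.flash)) if self.next_page[b] == 0]
        # Assumed wear-leveling rule: fill the least-erased free block next.
        self.active = min(free, key=lambda b: self.erase_count[b])
        if len(free) == 1:
            # That was the last free block, so reclaim another one now.
            self._garbage_collect()

    def _garbage_collect(self):
        others = [b for b in range(len(self.flash)) if b != self.active]
        victim = min(others, key=lambda b: sum(self.valid[b]))
        if sum(self.valid[victim]) == self.ppb:
            raise RuntimeError("device full: no invalid pages to reclaim")
        for page in range(self.ppb):
            if self.valid[victim][page]:
                lba, data = self.flash[victim][page]
                self._program(lba, data)  # copy still-valid data forward
        # Erase the whole block; only now can its pages be reused.
        self.flash[victim] = [None] * self.ppb
        self.valid[victim] = [False] * self.ppb
        self.next_page[victim] = 0
        self.erase_count[victim] += 1
```

Overwriting the same few LBAs repeatedly never touches a page in place: each write lands on a fresh page, and erases happen only inside the garbage collector. The model needs spare capacity to work, so the number of distinct LBAs must stay below the physical page count.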

[Figure: logical-to-physical mapping across Blocks 1 to 4 during an update and an erase, with pages marked v and I]

Solid State Drive: Internal Parallelism
  • An SSD can read and write data in parallel
      • Channel-level parallelism (N parallel channels)
      • Package-level parallelism (interleaving)

[Figure: SSD with a host interface (SATA, SAS, PCIe) and eight flash packages. Timeline: Packages 1 and 2 share Channel 1, Packages 3 and 4 share Channel 2; reads 1 to 8 and their transfers are interleaved across the packages.]

Solid State Drive: Internal Parallelism
  • Using internal parallelism, an SSD achieves
      • High performance for sequential I/O
          • Similar to striping (RAID 0)
          • Sequential bandwidth of a SATA SSD: 450 MB/s write, 500 MB/s read
      • High performance for concurrent I/O
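The effect of channel-level and package-level parallelism can be approximated with a small timing model. This is a sketch under stated assumptions: the round-robin placement in locate, the function names, and the latencies (50 time units per flash read, 25 per channel transfer) are illustrative and not taken from the slides or from any device datasheet. Packages on the same channel can read concurrently but must take turns on the shared channel bus.

```python
def locate(page_no, channels=2, packages_per_channel=2):
    """Assumed round-robin striping: spread pages over channels first."""
    channel = page_no % channels
    package = (page_no // channels) % packages_per_channel
    return channel, package


def read_makespan(num_pages, channels=2, packages_per_channel=2,
                  t_read=50, t_transfer=25):
    """Time to read num_pages consecutive pages under the toy model."""
    package_free = {}  # (channel, package) -> time the package is idle again
    channel_free = {}  # channel -> time the channel bus is idle again
    finish = 0
    for page_no in range(num_pages):
        channel, package = locate(page_no, channels, packages_per_channel)
        # The flash read starts as soon as the package is idle.
        read_done = package_free.get((channel, package), 0) + t_read
        # The transfer must also wait for the shared channel bus.
        transfer_start = max(read_done, channel_free.get(channel, 0))
        transfer_done = transfer_start + t_transfer
        package_free[(channel, package)] = transfer_done
        channel_free[channel] = transfer_done
        finish = max(finish, transfer_done)
    return finish
```

Comparing read_makespan(8, channels=1, packages_per_channel=1) with read_makespan(8, channels=2, packages_per_channel=2) shows how striping plus interleaving shortens a sequential read in this model.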

[VLDB2012Roh] H. Roh, S. Park, S. Kim, M. Shin, S.-W. Lee, "B+-tree index optimization by exploiting internal parallelism of flash-based Solid State Drives"

Solid State Drive: In-Storage Processing
  • An SSD has its own CPU and memory for the FTL

  • The host interface is the bottleneck
      • The host interface has lower bandwidth than the internal bandwidth of the SSD
  • Two approaches
      • Light-weight filter in the SSD
          • Transfers less data through the host interface
          • Filters tuples using predicates
      • Sub-modules in the SSD
          • e.g. transaction management with COW
  • A special SSD is needed to implement ISP
      • OpenSSD, SmartSSD, and so on
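The benefit of a light-weight filter can be made concrete by counting how many rows cross the host interface. The sketch below is only a model: the function names are hypothetical, an ordinary Python list stands in for data stored on the SSD, and no real SmartSSD or OpenSSD API is used.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]


def host_side_filter(stored: List[Row],
                     predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Conventional path: every row crosses the host interface first."""
    transferred = list(stored)
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)


def in_storage_filter(stored: List[Row],
                      predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """ISP path: the predicate runs inside the SSD; only matches are sent."""
    result = [row for row in stored if predicate(row)]
    return result, len(result)
```

Both functions return the same result set; the second value, the number of rows transferred, is what differs, and the gap widens as the predicate becomes more selective.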

DBMS on Solid State Drive
  • Main research areas
      • Buffer management
      • Index management
      • Query processing
      • Transaction management
  • Most research on SSDs has focused on storage I/O

DBMS on Solid State Drive: Index Management
  • FD-tree
      • Exploits the sequential bandwidth of SSDs
      • B-tree + sorted runs
  • PIO B-tree
      • Exploits the internal parallelism of SSDs
      • Accesses multiple B-tree nodes along multiple paths

DBMS on Solid State Drive: Query Processing
  • FlashJoin: PAX-based query processing
      • NSM layout
          • The most typical page layout
          • Tuples are stored in a contiguous region
      • PAX layout
          • The values of each column are stored in a contiguous region (minipage)
          • PAX was originally designed to reduce CPU cache misses
      • FlashScan reads only the needed minipages
      • FlashJoin joins the minipages read by FlashScan
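The difference between the two layouts, and why a scan over PAX pages can skip unneeded columns, can be sketched in a few lines. This is a simplified illustration and not the FlashJoin implementation: the function names are hypothetical, a dict of lists stands in for the minipages of one page, and the join is a plain hash join that returns row-id pairs.

```python
def to_nsm(rows):
    """NSM: whole tuples stored one after another in the page."""
    return list(rows)


def to_pax(rows, columns):
    """PAX: one minipage (here, a list) per column within the page."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def flash_scan(pax_page, needed):
    """Read only the minipages of the columns the query needs."""
    return {name: pax_page[name] for name in needed}


def join_row_ids(left_keys, right_keys):
    """Hash join over two key minipages, returning (left_rid, right_rid)."""
    index = {}
    for rid, key in enumerate(left_keys):
        index.setdefault(key, []).append(rid)
    return [(lrid, rrid)
            for rrid, key in enumerate(right_keys)
            for lrid in index.get(key, [])]
```

With NSM, reading one column means reading every full tuple; with the PAX page, flash_scan(page, ["dept"]) touches a single minipage, and the join works on key minipages alone before any other column is fetched.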

DBMS on Solid State Drive: Query Processing
  • FMSort
      • Exploits the internal parallelism of the SSD
      • During merge phase,

DBMS on Solid State Drive: Transaction Mgmt.
  • X-FTL: shadow paging in the SSD
      • A write on an SSD is similar to copy-on-write
          • When a page is updated, the modified page is written to an empty page
          • The old page is then invalidated
      • X-FTL keeps the old pages until the transaction is committed
      • The original pages are never copied
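The idea of keeping old pages alive until commit can be modeled with two mapping tables. The sketch below is a toy under stated assumptions, not the X-FTL design itself: the class and method names are hypothetical, an append-only list stands in for flash pages, and recovery, concurrency control, and garbage collection are left out.

```python
class ToyXFTL:
    """Toy transactional mapping table (illustrative only)."""

    def __init__(self):
        self.pages = []       # append-only stand-in for physical flash pages
        self.committed = {}   # LBA -> physical page visible to everyone
        self.pending = {}     # txn id -> {LBA: physical page written by txn}
        self.invalid = set()  # physical pages that may be garbage collected

    def write(self, txn, lba, data):
        # Out-of-place write: the old committed page is left untouched.
        self.pages.append(data)
        new_page = len(self.pages) - 1
        shadow = self.pending.setdefault(txn, {})
        if lba in shadow:
            self.invalid.add(shadow[lba])  # superseded within the same txn
        shadow[lba] = new_page

    def read(self, lba, txn=None):
        shadow = self.pending.get(txn, {})
        if lba in shadow:
            return self.pages[shadow[lba]]  # a txn sees its own writes
        return self.pages[self.committed[lba]]

    def commit(self, txn):
        # Switch the mapping; only now do the old pages become invalid.
        for lba, new_page in self.pending.pop(txn, {}).items():
            old_page = self.committed.get(lba)
            if old_page is not None:
                self.invalid.add(old_page)
            self.committed[lba] = new_page

    def abort(self, txn):
        # Roll back by dropping the new pages; nothing has to be copied back.
        self.invalid.update(self.pending.pop(txn, {}).values())
```

Because the mapping switch is the only step that makes new data visible, both commit and abort are mapping-table updates, and the original page content is never copied.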

Big Data on Solid State Drive
  • Three approaches to improving performance with SSDs
      • Complete replacement
          • Higher cost per capacity
      • Selective replacement
          • e.g. intermediate results on SSDs, HDFS data on HDDs
      • SSD as a cache
          • Commercial and noncommercial cache software exists
          • Open source: bcache, flashcache, EnhanceIO, dm-cache
          • Project with SK Telecom
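The "SSD as a cache" approach can be sketched as a small fast store in front of a large slow one. The example below is a generic write-through LRU cache and makes no claim about how bcache, flashcache, EnhanceIO, or dm-cache work internally; the class name and the dict standing in for the HDD are assumptions for illustration.

```python
from collections import OrderedDict


class SSDCache:
    """Write-through LRU block cache in front of a slower backing store."""

    def __init__(self, backing, capacity):
        self.backing = backing      # dict standing in for the HDD
        self.capacity = capacity    # number of blocks the "SSD" can hold
        self.cache = OrderedDict()  # least recently used block comes first
        self.hits = 0
        self.misses = 0

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the LRU block

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]  # slow path: go to the HDD
        self._insert(block, data)
        return data

    def write(self, block, data):
        self.backing[block] = data  # write-through keeps the HDD current
        self._insert(block, data)
```

Write-through is chosen here for simplicity, since the backing store is always up to date; a write-back policy would absorb more writes on the SSD at the cost of tracking dirty blocks.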

Archival Storage of HDFS
  • Stores replicas across four tiers of storage
      • ARCHIVE: the slowest tier with the largest capacity (petabytes of storage)
      • DISK, SSD, RAM_DISK
  • https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Storage_Types:_ARCHIVE_DISK_SSD_and_RAM_DISK

Issues
  • Industry leads the Big Data processing platform area
  • There is no standard model
  • Because CPU overhead is too high
