RDMA

  • Published on
    02-Jun-2015

DESCRIPTION

2010/11/17 InfiniBand Day 02 RDMA VIOPSHP

Transcript

1. InfiniBand & Manycore Day RDMA
2. 00; 01 InfiniBand & Manycore Day; 02 InfiniBand; 03 RDMA; 04 RDMA 1: SDP; 05 InfiniBand 1: vSMP; 06; 07. Copyright 2010 NTT DATA CORPORATION
3. 00 NTT IT NTT BizXaaS (http://bizxaas.net/); OSS http://www.nttdata.co.jp/release/2010/040801.html; OpenStack; Open Cloud Campus; OpenStack (JOSUG); JEUG; VIOPS; InterCloud SIG; GICTF
4. 00 Disclaimer: InfiniBand. Then, why me? InfiniBand; InfiniBand HCA; Linux SDP; DAPL (Direct Access Provider Library); RDMA; ICSC Socket Extension
5. 01 InfiniBand & Manycore Day: InfiniBand 1 Manycore; RDMA; InfiniBand RDMA; RDMA InfiniBand; SDP, vSMP
6. 02 InfiniBand: Specification 1.2.1, Errata: http://members.infinibandta.org/kwspub/spec/; Volume 1: 1727 pp; Volume 2: 834 pp; Volume 1+2: 2500 pages (L1-L4)
7. 02 InfiniBand: Volume 1, Chapter 3 Architectural Overview, 55 pages. Addressing: LID (Local Identifier), 16 bit; GID (Global Identifier), 128 bit; GID, IPv6; InfiniBand
8. 02 InfiniBand: RC, RD, UC, UD; RDMA: Remote DMA; Atomic Operation: CompareAndSwap; Congestion Control; Slow Drain
9. 03 RDMA: InfiniBand HCA (IB spec. 1.2.1 Vol1. p96)
10. 03 RDMA: HCA, NIC; WR: Work Request; QP: Queue Pair; CQ: Completion Queue; QP, WR; WR, CQE; PD: Protection Domain; MR: Memory Region; HCA; MW: Memory Window; MR
11. 02 InfiniBand: InfiniBand (1) HCA, PD, CQ (2) QP (3) RC (4) HCA2/34 (5) WR, QP, Send/Recv, RDMA; RDMA, Send/Recv (6) CQ
12. 03 RDMA: I/O, I/O; DMA, I/O; RDMA; SMP
13. 03 RDMA: RDMA (user/kernel space); CPU; Remote Direct Memory Access; RDMA
14. 03 RDMA: (1) socket (2) RDMA (ULT)
15. 03 RDMA: Socket Send/Recv. [Figure: Main Memory, HCA, I/O, I/O Bus, Bridge, Kernel Buffer, User Buffer, System Bus, CPU cache, CPU #1]
16. 03 RDMA: RDMA (ULT), RDMA Write. [Figure: Main Memory, HCA, I/O, I/O Bus, Bridge, Kernel Buffer, User Buffer, System Bus, CPU cache, CPU #1]
17. 03 RDMA: (1) socket -> I/O -> I/O -> -> (kernel buffer) -> (user buffer) (2) RDMA (ULT) -> I/O -> I/O -> -> (user buffer); RDMA; SMP
18. 03 RDMA: I/O DMA READ/WRITE; RDMA Read/Write; RDMA Write: HCA DMA, DMA READ; RDMA Write: HCA DMA, DMA WRITE; READ, WRITE
19. 04 RDMA 1: SDP. RDMA; RDMA; UCB FastSockets [1]. [1] Steven H. Rodrigues, Thomas E. Anderson, and David E. Culler. High-performance local area communication with fast sockets. USENIX, 1997.
20. 04 RDMA 1: SDP. SDP; TCP; Sockets Direct Protocol; byte-stream; Recv Posting; Send/Recv; RDMA; socket; LD_PRELOAD libsdp.so
21. 04 RDMA 1: SDP. 9; SDR 7.6Gbps; SDP, TCP, RC; 0-copy; RC
22. 05 InfiniBand 1: vSMP. ScaleMP; IB, SMP, vSMP; 8core 16GB; 4; 32core 64GB; SSI. [Figure: Virtual SMP; Node#1, Node#2, Node#3, Node#4, each 2QC/16GB with an HCA; 2port InfiniBand HCA]
23. 05 InfiniBand 1: vSMP. vSMP; N; IB, N; Subnet manager; Subnet manager; SSI; Subnet Manager; SSI; vSMP; OS; SSI
24. 05 InfiniBand 1: vSMP. spinlock; spinlock, CAS; InfiniBand Atomic Operation; I/O, I/O etc.; ScaleMP
25. 06 InfiniBand & Manycore Day: InfiniBand, RDMA; NFS/RDMA; Oracle RAC; RDS: Reliable Datagram Sockets; DAPL; RDMA; RDMA, Socket API
26. 07 RDMA; RDMA
28. TM
30. References. InfiniBand: http://www.infinibandta.org/, http://members.infinibandta.org/kwspub/spec/; OpenFabrics: http://www.openfabrics.org/; InterConnect Software Consortium: http://www.opengroup.org/icsc/
31. 0x
32. 0x
33. Interconnect: Ethernet (1973-); Token Ring; FDDI; Fiber Channel; SONET/SDH; HIPPI, SP-Switch (IBM), AP-net (Fujitsu), Memory-Channel (DEC), etc.; Myrinet, SCI, Giganet, ... (academic NW); Virtual Interface Architecture (1997); NGIO + Future I/O = InfiniBand (2000-); 3GIO => PCI-Express (2002); Quadrics, PathScale
34. Interconnect
35. InfiniBand, Interconnect; InfiniBand; RDMA; RDMA, NFS/RDMA (RFC5532); RDMA, SRP; RD: Reliable Datagram; End-to-end latency 1us; NW, APM; SDP; vSMP, Virtual Iron (OLS2005); socket
36. E2E latency 1us: CPU; busy poll; poll; blocking wait
37. APM: RC Failover; Standby HCA; HCA; ModifyQP; SW
38. 01
39. 05 IaaS: Eucalyptus. IaaS; PC; AWS S3 API; AWS EC2 API; NAT, F/W. [Figure: CLC: Cloud Controller; Walrus; CC: Cluster Controller; SC: Storage Controller; NC: Node Controller (two nodes); VM; EBS; VM Image]
40. 05 Eucalyptus. [Figure: Eucalyptus; Walrus, CLC, S3; clusters #1 and #2, each with SC, CC, EBS, NC, and VMs]
41. 05 Eucalyptus. Eucalyptus; Cloud Controller (CLC): eucalyptus-cloud; Walrus: Walrus, S3; Storage Controller (SC): EBS; Cluster Controller (CC): NAT, VLAN; CC, DHCP; Node Controller (NC): VM; NC, N
42. TIPS: OSS; NASA Nebula, OpenNebula; NASA Nebula: NASA; OpenNebula: EU; JOSUG (Japan OpenStack Users Group); OpenStack 2nd: Bexar; Eucalyptus 2.0; Eucalyptus Enterprise Edition 2.0
43. References: IaaS. OpenStack: http://www.openstack.org/; Eucalyptus: http://www.eucalyptus.com/; OpenNebula: http://www.opennebula.org/; Nimbus: http://www.nimbusproject.org/; Wakame-vdc: http://wakame.axsh.jp/vdc.html; Karesansui: http://karesansui.sourceforge.jp/; CloudStack: http://cloud.com/community; Morph: http://www.mor.ph/ja/; Enomaly: http://www.enomaly.com/; Nimbula: http://www.nimbula.com/
44. References: IaaS. NASA Nebula: http://nebula.nasa.gov/; NII edubase: http://grace-center.jp/prj_educloud.html; NII, NASA Nebula, NII edubase: http://www.nii.ac.jp/index.php?action=pages_view_main&page_id=1106; WIDE: http://www.wide.ad.jp/project/wg/wide-cloud-j.html
45. References. SheepDog: http://www.osrg.net/sheepdog/; Ceph/RADOS: http://ceph.newdream.net/; Vastsky: http://sourceforge.net/projects/vastsky/; etc. NW: Vyatta: http://www.vyatta.com/, http://www.vyatta-users.jp/; Open vSwitch: http://openvswitch.org/; CloudSwitch: http://www.cloudswitch.com/; etc.
46. References: PaaS. Heroku: http://heroku.com/; Ruby on Rails PaaS; Heroku by @nabehiro_; FluxFlex: http://www.fluxflex.com/; JAWS-UG LT; AppScale: http://appscale.cs.ucsb.edu/; OSS GAE; etc. H21: http://www.idg.co.jp/expo/cns/
47. References. OpenStack: http://openstack.org/; Eucalyptus: http://eucalyptus-users.jp/; JAWSUG: http://jaws-ug.jp/; JAZUG: http://jazug.jp/
48. 0x
49. OpenStack. OpenStack; OpenStack: Yet Another Eucalyptus; NASA IaaS; Nova (EC2); RackSpace; Nova (EC2); CloudFiles; Swift (S3) (S3); Python; Open; NO Enterprise Edition; URL: http://www.openstack.org/, http://launchpad.net/openstack/
50. 02 (1) Eucalyptus: Eucalyptus 1.6.2; Eucalyptus (2) OpenStack: Eucalyptus 1.6.2 (3) CloudStack (4)
51. Place holder
52. Sheepdog; Hadoop HDFS; qemu block device; Corosync; 2010/7 qemu (^o^)v
53. 0x. 2006/8 Cloud Computing; 2006/8 Amazon EC2 Beta; 2006/x RightScale; 2008/4 Google App Engine; 2008/10 Windows Azure; 2009/4 Google App Engine; 2008/5 Eucalyptus 1.0; 2009/3 AppScale 1.0; 2010/2 Eucalyptus 1.6.2; 2010/1 Windows Azure; 2010/8 Eucalyptus 2.0; 2010/7/19 OpenStack; 2010/10/21 OpenStack 1st (Austin Release); VMware vCloud (vExpress) (VMware, EMC, Cisco); Oracle; Nifty Cloud API; 10/XCP; 2008/4 GAE; 2009/2 GAE; 2009/4 Java
54. 04 RDMA 1: NFS/RDMA. NFS; VA Linux Zerocopy NFS; POSIX copy-in/copy-out; Register; read() -> NFS RDMA Write; write() -> NFS RDMA Read
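
The last slide maps read() to an RDMA Write and write() to an RDMA Read on the NFS side. The toy model below illustrates only that direction of data movement. It assumes the RDMA operation is issued by the NFS server against a buffer the client has registered, and it involves no real RDMA or NFS calls. The names RegisteredBuffer, ToyNfsServer, handle_read and handle_write are hypothetical and do not come from the slides.

```python
class RegisteredBuffer:
    """Hypothetical stand-in for a client buffer registered for RDMA."""

    def __init__(self, size):
        self.mem = bytearray(size)


class ToyNfsServer:
    """Hypothetical in-memory server; a dict plays the role of the file store."""

    def __init__(self):
        self.files = {}  # file name -> bytes

    def handle_read(self, name, client_buf, length):
        # Client read(): the server pushes file data straight into the
        # client's buffer. This models the server-issued RDMA Write.
        limit = min(length, len(client_buf.mem))
        data = self.files.get(name, b"")[:limit]
        client_buf.mem[:len(data)] = data
        return len(data)

    def handle_write(self, name, client_buf, length):
        # Client write(): the server pulls the data out of the client's
        # buffer. This models the server-issued RDMA Read.
        data = bytes(client_buf.mem[:length])
        self.files[name] = data
        return len(data)


server = ToyNfsServer()

out_buf = RegisteredBuffer(16)
out_buf.mem[:5] = b"hello"
server.handle_write("demo.txt", out_buf, 5)   # write() -> RDMA Read

in_buf = RegisteredBuffer(16)
n = server.handle_read("demo.txt", in_buf, 16)  # read() -> RDMA Write
print(n, bytes(in_buf.mem[:n]))  # prints: 5 b'hello'
```

In both directions the payload moves directly between the file store and the client's buffer, with no intermediate copy on the client side, which is the point of the copy-in/copy-out comparison on the slide.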