Supercomputers Today and Tomorrow: Architecture, Problems, Perspectives (Andrey Slepukhin)

Transcript

1. Supercomputers today and tomorrow: architecture, problems, perspectives. Andrey Slepukhin, andrey.slepuhin@t-platforms.ru

2. AT A GLANCE

3. …

4. Linpack and the Top500. Linpack solves a dense linear system Ax = b; the operation count is (2/3)·n³ + 2·n² flops. The Top500 list is published twice a year (…, 2003 …). Milestones: 1997: 1 Tflops (ASCI Red); 2008: 1 Pflops (IBM Roadrunner); 2018: 1 Eflops??? (A worked flop-count example appears right after slide 42 below.)

5. The current leaders. #1: LLNL Sequoia (IBM): 20.1/16.3 Pflops (peak/Linpack), 96 racks, 98,304 nodes, 7.9 MW. #2: K Computer (Fujitsu): 11.3/10.5 Pflops, 918 racks, 88,128 CPUs, 12.7 MW.

6. The Top500 over 20 years: … (~70 Gflops) … (~1 Gflops) …

7. What is a supercomputer? …

8. Supercomputers vs. HighLoad: HighLoad systems …, i.e. …; they typically interconnect over Ethernet with TCP/IP, while supercomputers ….

9. The architecture zoo: vector machines (Cray Y-MP, Cray C90/T90, NEC SX-*); large shared-memory NUMA systems (SGI Altix, SGI UltraViolet); x86 clusters; MPP systems (Intel Paragon, ASCI Red, IBM BlueGene, Cray XE6); special-purpose machines (QPACE, MD-GRAPE, ANTON).

10. CPUs compared:

   Vendor        CPU             Cores/Threads  Flops/cycle  GHz      Gflops       W       Mem BW, GB/s  Process
   Intel         Xeon E5-26xx    8/16           8            2.0-2.9  128-185.6    95-135  51.2          22nm
   AMD           Opteron 63xx    16/16          4            2.1-2.4  134.4-153.6  115     51.2          32nm
   IBM           A2 (BG/Q)       16+2/64        8            1.6      204.8        55      42.4          45nm
   IBM           POWER7+         8/32           8            ~5       ~320         ??      100           32nm
   Fujitsu       SPARC64 IXfx    16/32          8            1.85     236.6        115     102           32nm
   Fujitsu       SPARC64 X       16/32          8            ~3       382          ??      102           28nm
   Jiangnan Lab  ShenWei SW1600  16/16          4            1.8      140.8        ??      68            65nm

11. Accelerators: GPGPU, MIC, custom ASICs, FPGAs. Current parts (GPGPU/MIC) deliver >1 Tflops (DP) in 225-300 W. Programming: CUDA (Nvidia), OpenCL, OpenACC, compiler offload extensions (Intel). Pros: …/…. Cons: ….

12. Interconnect state of the art: max bandwidth: 14 Gbps/lane (FDR InfiniBand); min latency: 300 ns (BG/Q), 700 ns (InfiniBand); max message rate: >10⁸ msg/s (BlueGene/Q, Cray Aries); RDMA; lossless transmission.

13. Hardware-supported operations: Broadcast, Reduce, Read-Modify-Write, …; InfiniBand: …/…; deadlock-free ….

14. Topology: Fat Tree (Clos network), the usual choice for InfiniBand. Pros: deadlock-free, …. Cons: …. Examples: ….

15. Topology: N-dimensional torus. Pros: … (…). Cons: …. Examples: K Computer (2011), 6D torus; BlueGene/Q, 5D torus, scaling past 100,000 nodes.

16. Topology: N-dimensional hypercube, i.e. an N-dimensional torus with 2 nodes per dimension. Pros: …. Cons: ….

17. Topology: Dragonfly, a prime candidate for exascale systems. Examples: IBM PERCS, Cray Aries (2013). Pros: close to all-to-all …. Cons: ….

18. Interconnects compared:

   Interconnect       Link speed   Topology
   InfiniBand         56 Gbps      Fat tree (also 3D torus, hypercube)
   Cray Gemini        64 Gbps      3D torus
   IBM BlueGene/Q     20 Gbps      5D torus
   Cray Aries (2013)  30-40 Gbps   Dragonfly
   IBM PERCS          60-100 Gbps  Dragonfly
   Extoll (2013)      120 Gbps     3D torus
   Tofu (K Computer)  40 Gbps      6D torus
   SGI NUMAlink 6     >50 Gbps     -

19. Power and cooling: 1 MW of power costs about $1M a year. PUE (Power Usage Effectiveness): a typical datacenter runs at PUE ~1.5, while the best HPC installations reach PUE 1.05-1.08 (…). PUE is not everything! …: … (…).

20. Operating systems: about 90% of Top500 systems run Linux. OS jitter is a serious problem at scale, hence specialized Linux variants for HPC: CNL (Compute Node Linux), Cray's low-jitter Linux; ZeptoOS, a Linux for BlueGene; Kitten, a lightweight kernel with a Linux-compatible interface; CNK on BlueGene/Q, which is not Linux ….

21. File systems: Lustre: open source, the most popular in the Top500, now developed at Intel …. IBM GPFS. Panasas PanFS: … software RAID. HPC I/O libraries (HDF5, …) ….

22. Storage directions: a successor to Lustre, Colibri (Xyratex; Peter Braam, creator of Lustre); ideas from the BigData world (distributed key-value storage, etc.); PCIe flash … (random access, …); Exascale I/O Workgroup (EIOW): …, … exascale …, http://www.eiow.org

23. Programming: MPI (Message Passing Interface), used by ~90% of …. MPI 3.0 (published September 21, 2012): …, Fortran 2008 bindings. PGAS (Partitioned Global Address Space): GASnet; UPC, CAF. PGAS …, … 2-3 …. (A minimal hybrid MPI + OpenMP sketch appears after slide 44, at the end of the transcript.)

24. Node-level programming: OpenMP …; for GPUs: CUDA, OpenCL, OpenACC, OpenHMPP.

25. EXASCALE CHALLENGE:

26. The exascale challenge in numbers:

   Metric                 2012 (BG/Q)  2018-2020 (Exascale)  Factor
   System peak            20 Pflops    1 Eflops              O(10²)
   Power                  8.6 MW       ~20 MW
   System memory          1.6 PB       32-64 PB              O(10)
   Node performance       205 Gflops   1.2 or 15 Tflops      O(10)-O(10²)
   Node memory BW         42.6 GB/s    2-4 TB/s              O(10²)
   Node concurrency       64 threads   O(10³) or O(10⁴)      O(10²)-O(10³)
   Node interconnect BW   20 GB/s      200-400 GB/s          O(10)
   System size (nodes)    98,304       O(10⁵) or O(10⁶)      O(10)-O(10²)
   Total concurrency      5.97 M       O(10⁹)                O(10³)
   MTTI                   4 days       O(1 day)

27.-38. … (surviving fragment on the memory hierarchy: main memory is O(GB) at ~100 cycles of latency; then the latency gap; NAND flash; disk at O(TB). Roadmap: ~2015-2016: PCM (Phase-Change Memory), ~10,000 cycles; ~2018-2020: STT-MRAM as a DRAM replacement; beyond 2020: … (…, Redox Memory).)

39. Signaling: 10 Gbps => 14 Gbps => 25 Gbps => 40 Gbps => … At 10 Gbps, standard FR4 boards support only ~30 … of trace; above 10 Gbps, premium laminates are needed (e.g., Megtron 6). Copper cable reach: 10 Gbps => 7 m, 14 Gbps => 4 m, 25 Gbps => ~2 m; beyond ~5 m …, … 10-100 ….

40. … (…) … (…): 2015-2016 …; …: 2017-2018 …; …: 2018 ….

41. Big data is coming! Graph500: a data-oriented benchmark (…, data mining). Coupled multiphysics problems (FD + mechanics + chemistry + …) => new runtimes and execution models (SMPSs, ETI SWARM, etc.).

42. Since the 1990s, …, … (BigData, Cloud Computing, Web, etc.).
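Slide 4's operation count is easy to sanity-check. Below is a small C sketch, not from the talk: it plugs an illustrative matrix size (n = 10⁶ is an assumption, not the actual HPL run parameter) and Sequoia's Linpack rate from slide 5 (16.3 Pflops) into the (2/3)·n³ + 2·n² formula to estimate the time for one solve. Production HPL runs use a much larger n and take hours.

```c
/* Linpack flop-count estimate (slide 4). The inputs below are
 * illustrative assumptions, not actual HPL run parameters. */
#include <stdio.h>

int main(void)
{
    double n    = 1.0e6;    /* matrix dimension, example value          */
    double rmax = 16.3e15;  /* Sequoia Linpack rate in flops/s, slide 5 */

    double flops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
    printf("flops = %.3e\n", flops);          /* ~6.67e17 */
    printf("time  = %.1f s\n", flops / rmax); /* ~40.9 s  */
    return 0;
}
```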
43. What might an exascale machine look like?… Hypothetical configurations (…, 256 nodes per rack(?), 100-200 kW per rack):
   A: 512K nodes / …; 32 cores @ 2 GHz, 16 flops/cycle; 5D/6D torus, ~150-200 Gbps/link
   B: 64K nodes / 256 racks; 4 Tflops/CPU; Dragonfly, 48-64-port high-radix router, ~10 TB/s switching capacity
   C: 32K nodes / 128 racks; GPU-class 8 Tflops/CPU; Dragonfly, 48-64-port high-radix router, ~10 TB/s switching capacity

44. Questions?
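To tie slides 23 and 24 together: the standard recipe on today's clusters is MPI between nodes plus OpenMP threads within a node. The sketch below is a minimal hedged example, not from the talk; the vector length N and all variable names are arbitrary. Each rank sums its slice with an OpenMP reduction, then MPI_Allreduce combines the partial sums across ranks, exactly the kind of collective that slide 13 notes can be offloaded to interconnect hardware.

```c
/* Minimal hybrid MPI + OpenMP sketch (slides 23-24).
 * Build and run with a typical MPI distribution:
 *   mpicc -fopenmp hybrid.c -o hybrid && mpirun -np 4 ./hybrid */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000L  /* elements per rank; arbitrary example size */

int main(int argc, char **argv)
{
    int rank, size, provided;

    /* FUNNELED: only the main thread makes MPI calls (usual hybrid setup) */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Node-level parallelism (slide 24): OpenMP threads within one rank */
    double partial = 0.0;
    #pragma omp parallel for reduction(+:partial)
    for (long i = 0; i < N; i++)
        partial += 1.0 / (double)N;  /* each rank's slice sums to ~1.0 */

    /* System-level parallelism (slide 23): combine partial sums across ranks */
    double total = 0.0;
    MPI_Allreduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d, threads/rank=%d, total=%.3f (expected ~%d)\n",
               size, omp_get_max_threads(), total, size);

    MPI_Finalize();
    return 0;
}
```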
