Introduction to Parallel Computing. Fundamentals of Programming in C Using the MPI Interface


  • MPI

    2009

    UDC 681.3.06

    MPI. 2009. 123 p.

    ISBN 978-5-91450-031-0

    1.
    1.1.
    1.2.
    1.3.
    1.4.
    1.5. (SMP NUMA)
    1.6. (MPP)
    1.7.
    2.
    2.1.
    2.2.
    2.3.
    2.4.
    2.5.
    3.
    3.1. GNU/Linux
    3.2.
    4. MPI
    4.1. MPI
    4.2.
    4.3.
    4.4.
    4.5.
    4.6.
    4.7.
    4.8.
    4.9.


    1.

    1.1. -

    () .

    ,

    -

    .

    , -

    ,

    , .. , . -

    , , -

    , , -

    .

    , (. supercomputer) (George Michael) (Sidney Fernbach), -

    . . . (. Lawrence Livermore National Laboratory, LLNL) .

    -

    (. The University of California, UC) - - (. Los Alamos National Laboratory, LANL) - - -

    , .. ,

    .

    ,


    ,

    , ( ).

    , IBM Blue Gene/L 2004-2008 .

    -

    (Seymour Cray), Cray-1, Cray-2 . -

    , ,

    ,

    .

    ,

    ,

    , (Steve Chen), - -

    Cray X-MP, - .

    Cray Inc., - .

    -

    , -

    . -

    , IBM Hewlett-Packard.

    - ,

    , -

    -. , 1989 -

    VAX (Gordon Bell) - ,

    .

    - ,


    , .. -

    , .

    -

    , - , -

    .

    .

    -

    . , -

    ( , ..), (, - ..), . - ,

    ; - .

    -

    . , -

    , -

    ( ) . -

    , . -

    ABC, 1942 (. Iowa State University of Science and Technology, ISU) 30 , 2008 IBM Roadrunner - 1 - ($10^{15}$) .

    , 65 - 30 . ,


    .

    , -

    . ,

    -

    .

    .

    . ,

    , -

    , .

    -

    :

    ; , -

    ;

    .

    1.2.

    - .

    , , , .

    -

    . , - , . -

    60- .


    .

    :

    SISD (. Single Instruction Single Data) . -

    .

    .

    SIMD (Single Instruction Multiple Data) . - ,

    , ( ) . - -

    , - .

    MISD (Multiple Instruction Single Data) . .

    -

    , . - -

    - ,

    -.

    MIMD (Multiple Instruction Multiple Data) - .

    -

    .

    ,

    - . ,

    -

    .

    , . ,


    . , , -

    .

    ,

    -

    . -

    . , -

    , -

    , .

    1.3.

    .

    , .

    -

    () .

    1976 - (. University of Illinois at Urbana-Champaign, UIUC) ILLIAC IV, 64 ( 256) - (), , . -

    .

    -

    .

    -

    -

    , MPP- (. Massively Parallel Processing, MPP). , , MPP- , ( ).

    , -

    . , ICL DAP (. Distributed Array Processor), 8192 , , -

    ( , ). -

    -

    .

    . -

    . ,

    2 16 , - () . -

    (. Symmetric Multiprocessing, SMP).

    -

    -

    -. , -

    SMP - - .

    -

    ,

    .

    -

    ,

    -

    ,

    (. Message Passing). -


    -

    MPP-. ( ) .

    SMP MPP NUMA- (. Non-Uniform Memory Access), , -.

    . NUMA- Cray T3D, 1993 , ,

    .

    -

    :

    - ; SMP- NUMA-; MPP-; .

    1.4. - - Cray-1

    1976 . , . ,

    :

    ; ,

    -

    .

    - , , 128 256 . ,

    .

    , -

    . -


    , -

    . ,

    -

    . - -

    ,

    . , - ,

    .

    ,

    .

    , - ( ) (SMP), , -

    , NUMA- MPP-. -

    Cray J90 (1994 ), Cray T90 (1995 ), NEC SX-4 (1995 ), Cray SV1 (1998 ), NEC SX-5 (1999 ).

    ,

    . , 90- ,

    ,

    .

    - . -

    2002 . NEC Earth Simulator (ES) 5120 - , 5 , MPP- ASCI White (2000 ), 8192 . - -

    . NEC ES - 2004 IBM Blue Gene/L MPP-.


    1.5. (SMP NUMA)

    , - . SMP-

    ,

    . -

    , -

    .

    -

    .

    ,

    . -

    32 .

    SMP , -, - ,

    ,

    .

    . 1. SMP-

    CPU

    CPU

    CPU


    -

    , -

    . - , .

    , , . ,

    ,

    .

    ,

    (-), , -

    .

    -

    - -

    1999 Pentium III. - :

    L1 ( ) 32 , 9976 /;

    L2 ( ) 256 , 4446 /.

    -

    255 /. , - -

    40 . , 10 / ( CPU) Intel Core i7 2009 .

    ,

    .

    ,

    . , -

    -,

    . , - - ,


    . , -

    , -

    - ,

    , - .

    ccNUMA (. Cache coherent Non-Uniform Memory Access). , -

    .

    ,

    .

    . 2. NUMA- -

    , , ,

    . SMP- :

    - ; ; .

    CPU

    CPU

    CPU

    CPU


    ,

    .

    ccNUMA .

    , , 256 200 . . -

    -

    2 128 -

    . -

    -

    AMD Opteron Intel Itanium 2, , HP Integrity Superdome (2002 ).

    2008 ccNUMA - Nehalem, Core 2, - Intel Core i7.

    -

    , ,

    . , -

    .

    1.6. (MPP) , -

    ,

    .

    - , -

    .

    -

    , ,

    , /, ..


    .

    , , -

    , .

    . 3. MPP-

    . -

    .

    - ,

    -.

    , . -

    MPP- -,

    .

    MPP- . -

    .

    , , . MPP-

    CPU

    CPU

    CPU


    . -

    .

    , N , - N 1 , -. MPP- - . Intel Paragon (1992 ) .

    . Cray T3D (1993 ) Cray T3E (1995 ) . , -

    .

    nCUBE - n- .

    -

    . ,

    , , - . -

    , - -

    ,

    . -

    ,

    IBM SP2 (1994 ). -

    , .

    ,

    - .

    , SMP-, . MPP- :

    - ; ; .


    -

    , .

    , -

    . . ,

    ,

    .

    , SMP-.

    1.7.

    , MPP. - MPP- ,

    : .

    , ,

    , MPI, -

    , -

    .

    ,

    .

    , -

    - , -

    -.

    -

    , -

    . -

    1998 - (. Pennsylvania State University) COCOA (. The Cost Effective


    Computing Array), 25 - 100 . . , -

    48- Cray T3D . . .

    , -

    , -

    .

    : ( ) ( ). Cray T3D 1 480 / . COCOA, -

    Fast Ethernet, - 100 10 / . ,

    .

    -

    . , InfiniBand 4X SDR, ,

    200 8 / ( MPI 1 800 / ).
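
    The pairs of figures quoted above for each interconnect are its latency and its bandwidth. A common first-order model, assumed here for illustration and not taken from the source, estimates the delivery time of one message as the latency plus the message size divided by the bandwidth. The function names below are hypothetical, and the units (seconds and bytes per second) are a choice made for this sketch, since the units of the quoted figures did not survive in this transcript.

```c
/* First-order cost model for one message (an illustrative assumption):
 *   time = latency + message size / bandwidth
 * latency_s          - startup latency, seconds
 * bandwidth_bytes_s  - sustained bandwidth, bytes per second
 * message_bytes      - message length, bytes
 */
double transfer_time(double latency_s, double bandwidth_bytes_s,
                     double message_bytes)
{
    return latency_s + message_bytes / bandwidth_bytes_s;
}

/* Message size at which the latency term equals the transmission term.
 * For shorter messages the latency dominates the total cost. */
double break_even_message_bytes(double latency_s, double bandwidth_bytes_s)
{
    return latency_s * bandwidth_bytes_s;
}
```

    If the Fast Ethernet figures are read as 100 microseconds and 10 megabytes per second (an assumption), `transfer_time(100e-6, 10e6, 1000.0)` models a 1000-byte message: half of the estimated time is latency, which is why many small messages are costly on such a network.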

    ,

    , .

    . -,

    - ,

    , - .

    , -

    . -, -

    .


    ,

    SMP-.

    . -

    . - UNIX- , (Solaris, Tru64 Unix), (GNU/Linux, FreeBSD). , , -

    .

    : 1. -

    ,

    , , .

    - .

    2. , -

    . -

    -

    .

    . - -

    InfiniBand. - ,

    2,5 / 96 /, .

    ScaLAPACK, - -

    ,

    ,

    : , /,


    1/10 , . ,

    2006 Intel Core 2 Duo - 15 , InfiniBand ,

    Gigabit Ethernet . , InfiniBand -, -

    , -

    .

    2.

    2.1. SMP-

    , NUMA- - - . -

    : , ,

    -. -

    . UNIX- , - , -

    .

    . -

    . ,

    . ,

    , -

    .

    (. threads) , ,


    . -

    , Fortran - ,

    ( ).

    -

    PThreads (POSIX Threads) - -

    . ,

    SMP- (Sun, HP, SGI) - -

    . , -, , -, . -

    -

    .

    , -

    Sun Solaris GNU/Linux, - OpenMP (. Open Multi-Processing), .

    OpenMP - . OpenMP - ,

    . ,

    , - , -

    OpenMP-. - OpenMP -

    .

    -

    .

    , ,

    -


    . , OpenMP- -

    , .

    OpenMP . , -

    , -

    . - SMP- , MPI PVM. .

    2.2.

    -

    ,

    . , .

    .

    -

    . -

    , .

    -

    , -

    . 1995 . -

    MPI (. Message Passing Interface). 1992 1994 . Message Passing Interface Forum,

    . , -

    MPI , MPI, - , - -


    .

    . - , ,

    ,

    . MPI MPP-, SMP- ( ).

    MPI , - -

    . MPI - , C++ Fortran. MPI -,

    .

    MPI , - , -

    ,

    , . -

    . MPI -

    .

    -

    , - .

    2.3. -

    , MPP-, , ,

    ,

    .


    -

    .

    :

    -

    ; , , -

    -; -

    , .. -

    .

    , -

    , -

    .

    : , -

    . , -

    , . -

    -

    . .

    ScaLAPACK - -

    P,

    (1) $P = m \times n / 10^{6}$, where m and n are the dimensions of the matrix. ,

    1000 × 1000. (1) ( ), , ScaLAPACK.

    $n^3$, -

    $n^2$.

    . -

    - .

    MPP- , -

    . :

    SPMD (. Single Program Multiple Data) - , -

    ; MPMD (. Multiple Program Multiple Data)

    , -

    .

    -

    . , ,

    , -

    .

    , .

    , , -

    , .

    -

    .

    -

    P , .

    . MPMD- .


    SPMD- - .

    , , -

    , :

    if (proc_id == 1) task1 ();
    if (proc_id == 2) task2 ();

    result = reduce (result1, result2, );

    proc_id , reduce -

    task1, task2 . . P , .

    ,

    , P .

    -

    .

    , . :

    for (i=1, i


    , -

    . -

    , ,

    . 2. ,

    .. 1 / P .
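
    Splitting the iterations of a loop so that each of P processes handles roughly 1 / P of them can be sketched as follows. This is a minimal illustration, not the book's listing, which is truncated in this transcript. The helper names `block_range` and `partial_sum` are hypothetical, `proc_id` is assumed to run from 0 to P - 1, and the summation is only placeholder work.

```c
#include <stddef.h>

/* Compute the half-open range [*begin, *end) of loop iterations assigned
 * to process proc_id (0 .. nprocs-1) out of n iterations in total.
 * When n is not divisible by nprocs, the first (n % nprocs) processes
 * receive one extra iteration, so the loads differ by at most one. */
void block_range(size_t n, size_t nprocs, size_t proc_id,
                 size_t *begin, size_t *end)
{
    size_t base = n / nprocs;
    size_t extra = n % nprocs;
    size_t first = proc_id * base + (proc_id < extra ? proc_id : extra);
    size_t count = base + (proc_id < extra ? 1 : 0);

    *begin = first;
    *end = first + count;
}

/* Placeholder work: each process sums only its own block of x[0..n-1].
 * In a message-passing program the partial sums would afterwards be
 * combined by a reduction step. */
double partial_sum(const double *x, size_t n, size_t nprocs, size_t proc_id)
{
    size_t begin, end, i;
    double s = 0.0;

    block_range(n, nprocs, proc_id, &begin, &end);
    for (i = begin; i < end; i++)
        s += x[i];
    return s;
}
```

    Every process runs the same code and differs only in the value of `proc_id`, which is what makes this an SPMD-style decomposition.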

    , ,

    . , -

    -

    , -

    .

    . -

    -

    ,

    -

    .

    ,

    ,

    . -

    , (Ian Foster) [4]. :

    1. Partitioning.

    2. Communication.

    3. Agglomeration.

    4. Mapping.

    , -

    - -

    MPP-. 1- 2- , -

    3- 4- .

    , -

    .

    , -

    .

    , ,

    ,

    .

    ,

    .

    MPP-,

    . :

    -

    (); -

    ; -

    : .

    . -

    .

    , .


    .

    2.4.

    P - P , , / , P . ,

    : (2) $S \le \dfrac{1}{f + (1 - f) / P}$, where S is the speedup on P processors and f is the fraction of sequential computation

    . (2) ,

    ,

    . , SMP-, -

    -

    . , MPP-, - ,

    .

    1 -

    .

    Table 1. Upper bound on speedup from (2) for P processors and a given fraction of sequential computation, %

    P        50      25      10       5       2
    2      1.33    1.60    1.82    1.90    1.96
    4      1.60    2.28    3.07    3.48    3.77
    8      1.78    2.91    4.71    5.93    7.02
    16     1.88    3.36    6.40    9.14   12.31
    32     1.94    3.66    7.80   12.55   19.75
    512    1.99    3.97    9.83   19.28   45.63
    2048   2.00    3.99    9.96   19.82   48.83
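
    Formula (2) is easy to evaluate directly. The sketch below assumes that f is the sequential fraction expressed as a number between 0 and 1; the function names `amdahl_speedup` and `print_amdahl_table` are hypothetical. It prints a table with the same rows and columns as Table 1, although the last printed digit may differ from the table because of rounding.

```c
#include <stdio.h>

/* Upper bound on speedup according to (2).
 * f - fraction of sequential computation, 0 <= f <= 1
 * p - number of processors, p >= 1 */
double amdahl_speedup(double f, int p)
{
    return 1.0 / (f + (1.0 - f) / (double) p);
}

/* Print the bound for several processor counts (rows) and
 * sequential fractions given in percent (columns). */
void print_amdahl_table(void)
{
    static const int procs[] = {2, 4, 8, 16, 32, 512, 2048};
    static const double seq_percent[] = {50.0, 25.0, 10.0, 5.0, 2.0};
    const size_t nrows = sizeof procs / sizeof procs[0];
    const size_t ncols = sizeof seq_percent / sizeof seq_percent[0];
    size_t i, j;

    printf("%6s", "P");
    for (j = 0; j < ncols; j++)
        printf("%8.0f", seq_percent[j]);
    printf("\n");

    for (i = 0; i < nrows; i++) {
        printf("%6d", procs[i]);
        for (j = 0; j < ncols; j++)
            printf("%8.2f", amdahl_speedup(seq_percent[j] / 100.0, procs[i]));
        printf("\n");
    }
}
```

    As P grows, the bound approaches 1 / f, which is why the 50 % column never exceeds 2 however many processors are added.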


    f .

    . (2) , P-

    , , .

    ,

    . ,

    , -

    -

    . ,

    (2) ,

    . ,

    . ,

    ,

    . -

    - -

    . -

    , -

    .

    2.5. , -

    -

    - .

    , -

    ,

    .

    , ,


    . -

    n n (2 n 1) n n . ,

    - .

    , -

    , ( - ), - . , -

    . , -

    -

    .

    ,

    ,

    ,

    .

    1000 1000 - ,

    -

    .

    2003 .. , .. ..

    ; - . ,

    :

    DEC Alpha 667 ; Intel Pentium III 500 ; SUN UltraSPARC II 300 ; Intel Quad Core Xeon 2,33 .

    , -


    .

    :

    Alpha 1 ; Pentium 500 ; Ultra 800 ; Xeon 37 (9 ). -

    Fortran, . , -

    -

    .

    matrix.c, :

    #include <stdio.h>
    #include <stdlib.h>

    double dseconds (void);

    int main (int argc, char *argv[])
    {
        int i, j, k;
        double s;
        const int n = 1000;   // matrix dimension
        double *a, *b, *c;    // matrices, stored row by row
        double time;          // elapsed time

        // allocate memory for the matrices
        a = (double *) malloc (n * n * sizeof (double));
        b = (double *) malloc (n * n * sizeof (double));
        c = (double *) malloc (n * n * sizeof (double));

        // initialize the matrices
        for (i=0; i<n; i++)
        {
            for (j=0; j<n; j++)
            {
                a[i*n + j] = (double) i + 1;
                b[i*n + j] = (double) 1 / (j + 1);
            }
        }

        // start the timer and multiply the matrices
        time = dseconds ();
        for (i=0; i
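
    A complete, minimal version of such a benchmark might look like the sketch below. It is not the original matrix.c: the function names are hypothetical, the standard `clock()` function (CPU time) stands in for the `dseconds()` timer, and the operation count n · n · (2n - 1) is an assumption that happens to agree with the performance figures quoted for n = 1000.

```c
#include <stdlib.h>
#include <time.h>

/* Fill a and b (n x n, row-major) with the values used in the listing:
 * a[i][j] = i + 1 and b[i][j] = 1 / (j + 1). */
void init_matrices(int n, double *a, double *b)
{
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            a[i*n + j] = (double) i + 1;
            b[i*n + j] = (double) 1 / (j + 1);
        }
    }
}

/* Straightforward triple loop: c = a * b. */
void matmul_naive(int n, const double *a, const double *b, double *c)
{
    int i, j, k;
    double s;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            s = 0.0;
            for (k = 0; k < n; k++)
                s += a[i*n + k] * b[k*n + j];
            c[i*n + j] = s;
        }
    }
}

/* Assumed operation count: each of the n*n result elements needs
 * n multiplications and n - 1 additions. */
double flop_count(int n)
{
    return (double) n * n * (2.0 * n - 1.0);
}

/* Multiply two n x n matrices and return the CPU time in seconds,
 * or -1.0 if memory cannot be allocated. If c00 is not NULL, the
 * element c[0][0] is stored there so the result is actually used. */
double time_matmul(int n, double *c00)
{
    size_t bytes = (size_t) n * n * sizeof (double);
    double *a = (double *) malloc (bytes);
    double *b = (double *) malloc (bytes);
    double *c = (double *) malloc (bytes);
    double elapsed = -1.0;
    clock_t t0;

    if (a != NULL && b != NULL && c != NULL) {
        init_matrices (n, a, b);
        t0 = clock ();
        matmul_naive (n, a, b, c);
        elapsed = (double) (clock () - t0) / CLOCKS_PER_SEC;
        if (c00 != NULL)
            *c00 = c[0];
    }
    free (a);
    free (b);
    free (c);
    return elapsed;
}
```

    Dividing `flop_count(n)` by the measured time gives the achieved performance in floating-point operations per second; for very small n the time is too short to measure reliably.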


    -

    :

    20,75 ; 48,12 . ,

    .

    ,

    ,

    . -

    . ,

    -

    > gcc -O3 matrix.c

    -

    , 3, , - .

    -

    . , Intel -

    .

    Table 3.

                            Alpha   Pentium    Ultra   Xeon (1 core)
    Time, s                 58.84    134.76    90.84        7.30
    Performance, Mflops     33.97     14.83    22.00      136.69

    .

    , -

    b (k, j) ( ). -


    -. -

    *d double, - b, n sizeof (double) :

    for (i=0; i
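
    The idea of copying a column of b into a contiguous buffer d of n doubles, so that the inner loop walks through memory with unit stride, can be sketched as follows. The original listing is truncated in this transcript, so the loop order and the function name `matmul_colbuf` are assumptions made for this illustration.

```c
#include <stdlib.h>

/* c = a * b for n x n row-major matrices. Each column j of b is first
 * copied into the contiguous buffer d, so that the innermost loop reads
 * both a[i*n + k] and d[k] sequentially instead of striding through b.
 * Returns 0 on success, -1 if the buffer cannot be allocated. */
int matmul_colbuf(int n, const double *a, const double *b, double *c)
{
    int i, j, k;
    double s;
    double *d = (double *) malloc ((size_t) n * sizeof (double));

    if (d == NULL)
        return -1;

    for (j = 0; j < n; j++) {
        /* gather column j of b: elements b[k][j] are n doubles apart */
        for (k = 0; k < n; k++)
            d[k] = b[k*n + j];

        /* every row of a is multiplied by the same buffered column */
        for (i = 0; i < n; i++) {
            s = 0.0;
            for (k = 0; k < n; k++)
                s += a[i*n + k] * d[k];
            c[i*n + j] = s;
        }
    }

    free (d);
    return 0;
}
```

    The extra copy costs n operations per column, which is small compared with the n · n multiply-add steps that then reuse the buffer.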


    -

    BLAS (. Basic Linear Algebra Subprograms),

    . - -

    BLAS, -

    .

    BLAS - ATLAS (. Automatically Tuned Linear Algebra Software), - ,

    cblas_dgemm. /opt/atlas/include/cblas.h, -

    #include "/opt/atlas/include/cblas.h"

    -

    cblas_dgemm (CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1, a, n, b, n, 0, c, n);
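
    In the CBLAS interface, `cblas_dgemm` computes C = alpha · A · B + beta · C. In the call above the three `n` arguments after the transposition flags are the dimensions M, N and K, `1` is alpha, `0` is beta, and the `n` following each matrix pointer is its leading dimension. The plain C function below is only a reference for the arithmetic that this particular call performs on square row-major matrices; the name `ref_dgemm_square` is hypothetical, and the code says nothing about how ATLAS implements the routine.

```c
/* Reference for C = alpha * A * B + beta * C with square n x n
 * row-major matrices and leading dimension n. When beta is zero the
 * previous contents of c are not read, as in BLAS. No blocking or
 * tuning is attempted; this shows the result, not the performance. */
void ref_dgemm_square(int n, double alpha, const double *a, const double *b,
                      double beta, double *c)
{
    int i, j, k;
    double s;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            s = 0.0;
            for (k = 0; k < n; k++)
                s += a[i*n + k] * b[k*n + j];
            if (beta == 0.0)
                c[i*n + j] = alpha * s;
            else
                c[i*n + j] = alpha * s + beta * c[i*n + j];
        }
    }
}
```

    With alpha = 1 and beta = 0, as in the call above, this reduces to the ordinary product c = a · b, so c does not need to be initialized beforehand.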

    -

    ATLAS > gcc -O3 matrix.c -L"/opt/atlas/lib" -lcblas -latlas

    -

    BLAS ATLAS, 5, ATLAS, ,


    (Sun Performance Library, Common Extended Math Library).

    Table 5.

                            Alpha   Pentium    Ultra   Xeon (1 core)
    Time, s                  5.36      2.72     2.24        0.29
    Performance, Mflops     372.9     734.8    894.0     3372.17

    , . -

    Intel Quad Core Xeon, . -

    : 1. ( -

    ,

    ). 2. (

    ). 3. BLAS

    ATLAS ( ATLAS ).

    , .. ,

    37 . -

    SMP- , .. ,

    > gcc -O3 matrix.c -L"/opt/atlas/lib" -lm -lpthread -lptcblas -latlas

    17 , ..

    , . , -

    ,


    -

    ( 6 ). -

    , - .

    -

    10000 10000. -

    , 20 , .. ,

    .

    , -

    .

    ,

    ATLAS .

    Intel MKL (. Math Kernel Library), BLAS.

    cblas_dgemm Intel MKL,

    #include "/opt/intel/mkl/10.1.0.015/include/mkl_cblas.h"

    -

    ( -lmkl_sequential -lmkl_intel_lp64 -lmkl_core -lm)

    > gcc -O3 mmatrix.c -L"/opt/intel/mkl/10.1.0.015/lib/em64t" -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm

    32,6 , .. . -

    - .


    3. , -

    -

    - . -
