Алгоритмы в биоинформатике, осень 2015: Множественные выравнивания и профили

  • View
    511

  • Download
    4

Embed Size (px)

Transcript

  • 2

    1. .1.

    2.

    3.

    1.

    2.

    3.

    4.

    1.

    2.

    3.

    .

    . .

    3-level Manhattan. .

    HMM.

    . .

    . Forward-Backward.

    HMM .

    .

    WPGMA UPGMA.

    Neighbour-Joining.

    6. .1.

    2.

    1.

    .

    .

    .

    Zuker folding algorithm.

    2. .

    7. .3. .

    5. .

    1. . .

    4. - (). .

    4. .

    4. .1.

    2.

    -. -.

    . --.

    3. ???

    2. . ClustalW.

    3. .

    ???

  • 2-

    3- 3- .

    A T _ G C G _A _ C G T _ AA T C A C _ A

    : ,

  • 3-

  • 3-D

    (i-1,j-1,k-1) (i-1,j,k-1)

    (i-1,j-1,k) (i-1,j,k)

    (i,j,k-1)

    (i,j-1,k-1)

    (i,j-1,k) (i,j,k)

  • si-1,j-1,k-1 + (vi, wj, uk)

    si-1,j-1,k + (vi, wj, _ )

    si-1,j,k-1 + (vi, _, uk)

    si,j-1,k-1 + (_, wj, uk)

    si-1,j,k + (vi, _ , _)

    si,j-1,k + (_, wj, _)

    si,j,k-1 + (_, _, uk)

    si,j,k = max

    (x, y, z)

  • 3- n, 7n3; O(n3)

    k - (2k-1)(nk);O(2knk)

  • x: AC-GCGG-Cy: AC-GC-GAGz: GCCGC-GAG

    :

    x: ACGCGG-C;

    y: ACGC-GAC;

    x: AC-GCGG-C;

    z: GCCGC-GAG;

    y: AC-GCGAG

    z: GCCGCGAG

  • 3 :

    x: ACGCGG-C;

    y: ACGC-GAC;

    x: AC-GCGG-C;

    z: GCCGC-GAG;

    y: AC-GCGAG

    z: GCCGCGAG

  • x GGGCACTGCAT

    y GGTTACGTC--

    z GGGAACTGCAG

    w GGACGTACC--

    v GGACCT-----

    Alignment 1

    Alignment 2

  • GTCTGAGTCAGC

    GTC[TA]G[AC] - G[5X][6X]

    x

    y

    z

    w

    v

    GGGCACTGCAT

    GGTTACGTC--

    GGGAACTGCAG

    GGACGTACC--

    GGACCT-----

    GGACACAGCAT -

  • 1.

    bk (i) = Ek (i)/ j Ek(j)

    2.

    bk (i) = (Ek(i) +1) / (j Ek(j)+ N)

    3.

    bk (i) = (Ek(i) +Aqi) / (j Ek(j)+ A)

    4.

    Ek(i) = Aj fkj P(ij)

    5.

    bk (i) = j Pk(ij) P(predk=j | alignment)

  • u1= ACGTACGTACGT

    u2 = TTAATTAATTAA

    u3 = ACTACTACTACT

    u1= AC[GT]TAC[GT]TAC[GT]T

    u2 = TTAATTAATTAA

    k-1

    k

    uk = CCGGCCGGCCGG

    uk = CCGGCCGGCCGG

    k n O(n2k2)

  • ClustalW

    .

    1.)

    2.) -

    3.) -

  • 1:

    identity

    v1 v2 v3 v4

    -

    .17 -

    .87 .28 -

    .59 .33 .62 -

    v1

    v2

    v3

    v4

    (.17 17 % )

  • 2: -

    v1

    v2

    v3

    v4

    v1 v2 v3 v4

    -

    .17 -

    .87 .28 -

    .59 .33 .62 -

    v1

    v3

    v4

    v2

    :v1,3 = (v1, v3)v1,3,4 = ((v1,3),v4)v1,2,3,4 = ((v1,3,4),v2)

  • 3:

    2

    .

    - ,

    FOS_RAT

    FOS_MOUSE

    FOS_CHICK

    FOSB_MOUSE

    FOSB_HUMAN

    PEEMSVTS-LDLTGGLPEATTPESEEAFTLPLLNDPEPK-PSLEPVKNISNMELKAEPFD

    PEEMSVAS-LDLTGGLPEASTPESEEAFTLPLLNDPEPK-PSLEPVKSISNVELKAEPFD

    SEELAAATALDLG----APSPAAAEEAFALPLMTEAPPAVPPKEPSG--SGLELKAEPFD

    PGPGPLAEVRDLPG-----STSAKEDGFGWLLPPPPPPP-----------------LPFQ

    PGPGPLAEVRDLPG-----SAPAKEDGFSWLLPPPPPPP-----------------LPFQ

    . . : ** . :.. *:.* * . * **:

    .

  • :

    (SP-Score)

  • AAAAAAAATATC

  • AAAAAAAATATC

  • (SP-Score)

    s(ai, aj)

    : s(a1,,ak) = i,j s (ai, aj)

  • : :

    -( pA logpA + pC logpC + pG logpG + pT logpT)

    A A A

    A C C

    1 = -[1*log(1) + 0*log0 + 0*log0 +0*log0]=0

    2 = -[(1/4)*log(1/4) + (3/4)*log(3/4) + 0*log0 + 0*log0]= -[ (1/4)*(-2) + (3/4)*(-.415) ] = +0.811

    A C G

    A C T

    3 = -[(1/4)*log(1/4)+(1/4)*log(1/4)+(1/4)*log(1/4)+(1/4)*log(1/4)] = 4* -[(1/4)*(-2)] = +2.0

    = 0 + 0.811 + 2.0 = +2.811