Bioinformatics: Gene Prediction, Phylogenetic Analysis

Published on 12-Jan-2016

DESCRIPTION

Bioinformatics: Gene Prediction, Phylogenetic Analysis. http://bio.img.cas.cz/PrfUK2002. Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry, vondrasek@uochb.cas.cz. Jan Pačes, Institute of Molecular Genetics, hpaces@img.cas.cz. Gene prediction. - PowerPoint PPT Presentation

Transcript

<ul><li><p>Jan Pačes, Institute of Molecular Genetics, hpaces@img.cas.cz</p><p>Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry, vondrasek@uochb.cas.cz</p><p>Bioinformatics: Gene Prediction, Phylogenetic Analysis</p><p>http://bio.img.cas.cz/PrfUK2002</p></li>
<li><p>Gene prediction</p><p>Gene prediction exploits the different information content of coding and non-coding regions (coding potential). Prokaryotic and eukaryotic genes need different methods. Existing data (known genes) are required.</p><p>Principles of the methods: homology (exofish, ...); statistical methods (codonuse, genscan, ...); neural networks (genemark, ...)</p></li>
<li><p>Codon choice: leucine in Rhodobacter capsulatus</p><table><tr><th>Anticodon</th><th>Count</th><th>%</th></tr><tr><td>CUA</td><td>3</td><td></td></tr></table></li>
<li><p>Gene prediction: statistical calculation</p><p>Probability of a symbol (string) at position $i$, where $f_i$ is its frequency: $P_i = f_i / \sum f$</p><p>Probability of a given segment (window) of length $w$: $P_W = P_1 \cdot P_2 \cdots P_w$, or in log form $\log P_W = \sum_{i=1}^{w} \log P_i$</p><p>Each segment yields six values, one per reading frame. These are normalized, e.g.: $\overline{CP}_f = CP_f / \sum_{i=1}^{6} CP_i$</p></li>
<li><p>codonuse: a graphical interface to the statistical calculation; uses dicodon preferences; uses a variable window during the search</p></li>
<li><p>CRITICA: prokaryotic genes; searches for the RBS (ribosome binding site, Shine-Dalgarno sequence)</p><p>Principle: TBLASTP against a protein database picks out sequences that are "surely" coding (mostly incomplete genes). A statistical model is built from them. Genes are predicted. 
A further statistical model is built and genes are predicted again.</p></li>
<li><p>Genscan: eukaryotic genes; models initial, internal and terminal exons, promoters, terminators and polyA signals; uses different statistical parameters for different GC content</p><p>www: http://genes.mit.edu/GENSCAN.html</p><table><tr><th>Exon probability</th><th>Exons</th><th>Exact</th><th>Partial</th><th>Overlap</th><th>Wrong</th></tr><tr><td>0.00 - 0.50</td><td>248</td><td>29.8%</td><td>27.8%</td><td>4.0%</td><td>38.3%</td></tr><tr><td>0.50 - 0.75</td><td>362</td><td>54.1%</td><td>26.2%</td><td>2.2%</td><td>17.4%</td></tr><tr><td>0.75 - 0.90</td><td>337</td><td>74.8%</td><td>16.0%</td><td>1.2%</td><td>8.0%</td></tr><tr><td>0.90 - 0.95</td><td>263</td><td>87.8%</td><td>6.1%</td><td>0.4%</td><td>5.7%</td></tr><tr><td>0.95 - 0.99</td><td>551</td><td>92.4%</td><td>3.4%</td><td>0.2%</td><td>4.0%</td></tr><tr><td>0.99 - 1.00</td><td>917</td><td>97.7%</td><td>0.9%</td><td>0.0%</td><td>1.4%</td></tr></table></li>
<li><p>Genscan: example</p><pre>GENSCAN 1.0   Date run: 31-Oct-100   Time: 15:54:20

Sequence HERV17_004640 : 40714 bp : 37.79% C+G : Isochore 1 ( 0.00 - 43.00 C+G%)

Parameter matrix: HumanIso.smat

Predicted genes/exons:

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

 1.01 Init +   1825   1853   29  0  2   86   71    45 0.579   1.72
 1.02 Term +   3886   4075  190  1  1   85   44   198 0.941  11.04
 1.03 PlyA +   4961   4966    6                               1.05

 2.00 Prom +   6668   6707   40                              -4.65
 2.01 Init +  17251  17375  125  0  2   45   72    80 0.590   1.81
 2.02 Term +  20137  20329  193  1  1   85   43   196 0.990  10.71
 2.03 PlyA +  20809  20814    6                               1.05

 3.08 PlyA -  21608  21603    6                              -3.24
 3.07 Term -  22315  21651  665  2  2  -17   55   522 0.952  31.44
 3.06 Intr -  24268  22592 1677  2  0   81   94  2124 0.885 198.67
 3.05 Intr -  24877  24728  150  2  0   34   91   101 0.783   4.21
 3.04 Intr -  29976  29878   99  1  0   48  111    82 0.473   5.66
 3.03 Intr -  31296  31170  127  0  1   89   82   101 0.997   8.93
 3.02 Intr -  32563  32418  146  2  2   46   70   132 0.303   6.28
 3.01 Init -  33114  33006  109  0  1   79   12    93 0.406   1.25
 3.00 Prom -  35592  35553   40                              -5.85

 4.00 Prom +  36433  36472   40                              -4.25
 4.01 Init +  37863  37909   47  2  2   71   58    16 0.307  -2.89
 4.02 Intr +  38032  38102   71  1  2   33   67    79 0.531  -1.79
 4.03 Term +  38614  39059  446  2  2   66   49   276 0.577  15.91
 4.04 PlyA +  39744  39749    6                               1.05

Suboptimal exons with probability &gt; 0.100

Exnum Type S .Begin ...End .Len Fr Ph B/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

S.001 Init +   2937   3136  200  2  2   67  -22   154 0.301   0.72
S.002 Intr +   3239   3325   87  2  0   43   23   121 0.358  -0.73
S.003 Intr +  17250  17375  126  0  0   66   72    94 0.141   4.47
S.004 Init +  17311  17375   65  0  2   55   72    45 0.204   0.27
S.005 Intr -  24927  24728  200  2  2   12   91   115 0.146   2.27
S.006 Intr -  25129  25003  127  2  1   51   92    37 0.117  -0.78
S.007 Intr -  29973  29878   96  1  0   44  111    87 0.473   5.66
S.008 Intr -  32589  32418  172  2  1   19   70   151 0.336   5.42
S.009 Intr -  32563  32427  137  2  2   46   70   116 0.122   4.97
S.010 Intr -  32589  32427  163  2  1   19   70   135 0.114   3.86
S.011 Intr -  32857  32804   54  0  0  104  103     2 0.262   0.48
S.012 Init -  33114  33008  107  0  2   79   17    87 0.296   0.46
S.013 Init +  37062  37067    6  2  0   53   68     1 0.115  -4.38
S.014 Intr +  38237  38315   79  1  1   35   38    94 0.175  -2.69
S.015 Intr +  38270  38315   46  1  1   81   38    59 0.170  -2.74
S.016 Term +  38623  39059  437  2  2   55   49   266 0.139  13.86
S.017 Term +  38872  39059  188  2  2   62   49   243 0.212  14.47

Predicted peptide sequence(s):</pre></li>
<li><p>Programs and web servers</p><p>Directory: http://www.hgc.ims.utokyo.ac.jp//~katsu/genefinding/programs.html</p><p>General and multi-program servers:</p><ul><li>http://dot.imgen.bcm.tmc.edu:9331/seq-search/gene-search.html</li><li>http://bioweb.pasteur.fr/seqanal</li></ul><p>Individual programs:</p><ul><li>http://genes.mit.edu/GENSCAN.html</li><li>http://www.tigr.org/tdb/glimmerm/glmr_form.html</li><li>http://www.tigr.org/~salzberg/veil.html</li><li>http://www.tigr.org/~salzberg/morgan.html</li><li>http://kicy.genoscope.cns.fr/cgi-bin/exofish_kicy.cgi</li><li>http://www.fruitfly.org/~martinr/doc/genie.html</li><li>http://www.resp-sci.arizona.edu/genlab/genehunter.htm</li></ul></li>
<li><p>Phylogenetic analysis</p><p>Estimates the evolutionary relationships among the data.</p><p>Starting assumptions: individual changes accumulate; changes are random; the evolutionary rate is approximately constant (molecular clock).</p></li>
<li><p>Multiple alignment</p></li>
<li><p>Evolutionary trees: terminology</p><ul><li>nodes: internal, external</li><li>branches</li><li>tree topology</li><li>bifurcating tree</li><li>additive tree</li><li>ultrametric tree</li><li>root of the tree</li><li>true (correct) vs. inferred tree</li></ul></li>
<li><p>Evolutionary trees: example</p><pre>((((polyA_26:0.042779, HERV17_27:0.049179):0.008643, polyA_410:0.045034):0.001912, ((polyA_20:0.039953, HERV17_15:0.034230):0.003074, HERV17_76:0.041414):0.002812):0.001440, polyA_30:0.042838, (polyA_99:0.052972, HERV17_19:0.041888):0.003257);</pre></li>
<li><p>Evolutionary trees: example</p><p>Evolutionary tree of the pTR5 family of human endogenous retroviruses</p></li>
<li><p>Evolutionary trees: construction</p><p>Algorithmic methods are fast and give a unique result. That result is not always the best one, because it may be only a local optimum.</p><p>Optimization methods are slower, but they search for the global optimum.</p><p>Requirements for the input sequence data: align only homologous regions; exclude gaps.</p><p>(Trees can also be based on binary data, such as restriction analysis or unique insertions and deletions.)</p></li>
<li><p>Algorithmic (distance) methods</p><p>Method: cluster analysis. Input: a distance matrix (computed under a substitution model).</p><ul><li>UPGMA (Unweighted pair group method with arithmetic averages)</li><li>WPGMA</li><li>Neighbour-joining</li></ul></li>
<li><p>Neighbour-joining: star decomposition method</p></li>
<li><p>Substitution models</p><p>For DNA:</p><ul><li>One-parameter: Jukes-Cantor</li><li>Two-parameter: Kimura. Transitions: purine ↔ purine (or pyrimidine ↔ pyrimidine). Transversions: purine ↔ pyrimidine.</li></ul><p>For proteins: substitution matrices (BLOSUM, etc.)</p></li>
<li><p>Distance matrix</p><pre>9
polyA_26
polyA_30   0.1102
polyA_20   0.1144 0.1027
polyA_99   0.1326 0.1100 0.1237
polyA_410  0.1089 0.1009 0.1067 0.1150
HERV17_27  0.1070 0.1263 0.1285 0.1504 0.1198
HERV17_76  0.0960 0.1024 0.0953 0.1221 0.1036 0.1188
HERV17_19  0.1045 0.0994 0.1019 0.1097 0.1059 0.1304 0.0975
HERV17_15  0.0980 0.0975 0.0841 0.1170 0.0977 0.1127 0.0860 0.0927</pre></li>
<li><p>Optimization methods</p><p>Method: search for the optimal tree. Input: a multiple alignment.</p><ul><li>parsimony</li><li>maximum likelihood (ML)</li><li>pairwise distance methods</li></ul></li>
<li><p>Parsimony</p><pre>A: TATGTTC
B: TATTTTC
C: TACGTAC
D: GACTTAA</pre></li>
<li><p>Parsimony 1</p><p>Running lengths of the three possible unrooted trees: AB|CD: 1; AC|BD: 1; AD|BC: 1</p></li>
<li><p>Parsimony 2</p><p>AB|CD: 1 + 1; AC|BD: 1 + 2; AD|BC: 1 + 2</p></li>
<li><p>Parsimony 3</p><p>AB|CD: 2 + 2; AC|BD: 3 + 1; AD|BC: 3 + 2</p></li>
<li><p>Parsimony 4</p><p>AB|CD: 4 + 1; AC|BD: 4 + 2; AD|BC: 5 + 2</p></li>
<li><p>Parsimony 5</p><p>AB|CD: 6; AC|BD: 7; AD|BC: 8. The most parsimonious tree is AB|CD.</p></li>
<li><p>Optimization methods</p><p>Parsimony ignores branch lengths and the probabilities of individual substitutions.</p><p>Maximum likelihood favours trees in which improbable events fall on longer branches.</p></li>
<li><p>Testing the topology</p><p>Bootstrap: resampling with replacement.</p><p>Jackknife: resampling without replacement, drawing a smaller sample.</p></li>
<li><p>Root of the tree</p></li>
<li><p>Programs</p><ul><li>http://geta.life.uiuc.edu/~nikos/LINKS/biocomputing_servers.html</li><li>http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html</li><li>http://evolution.genetics.washington.edu/phylip/software.html</li></ul></li>
<li><p>Appendix</p><p>Pseudogenes: ratio of synonymous to non-synonymous mutations</p></li></ul>
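The statistical-calculation slide gives three steps: $P_i = f_i / \sum f$, a window score that is a product of probabilities (or a sum of logs), and six normalized values per window. Those steps can be sketched in Python as shown here. This is a minimal illustration under stated assumptions, not the codonuse, CRITICA or Genscan code. The symbols are taken to be codons, and the codon table `TOY_CODON_COUNTS` is hypothetical. Codons missing from the table get a small floor probability. The six window probabilities, rather than their logarithms, are normalized so that they sum to 1.

```python
import math

# Hypothetical codon counts, for illustration only (not real codon-usage data).
TOY_CODON_COUNTS = {"ATG": 50, "GCC": 40, "AAA": 10}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def codon_probabilities(counts: dict) -> dict:
    """P_i = f_i / sum(f): relative frequency of each codon."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def frame_log_score(seq: str, offset: int, n_codons: int, probs: dict,
                    floor: float = 1e-6) -> float:
    """log P_W = sum of log P_i over n_codons codons read from `offset`.

    Codons absent from the table get a floor probability (an assumption)."""
    return sum(math.log(probs.get(seq[offset + 3 * j:offset + 3 * j + 3], floor))
               for j in range(n_codons))


def six_frame_coding_potential(window: str, probs: dict) -> list:
    """Normalized window probabilities for frames +1, +2, +3, -1, -2, -3."""
    window = window.upper()
    n_codons = (len(window) - 2) // 3        # same codon count in every frame
    rc = reverse_complement(window)
    logs = [frame_log_score(strand, k, n_codons, probs)
            for strand in (window, rc) for k in range(3)]
    top = max(logs)                          # shift before exp() to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]      # CP_f / sum of all six CP_i


probs = codon_probabilities(TOY_CODON_COUNTS)
scores = six_frame_coding_potential("ATGGCCATGGCC", probs)
print([round(s, 3) for s in scores])         # [0.556, 0.0, 0.0, 0.0, 0.444, 0.0]
```

Normalizing the probabilities turns the six frame scores into relative weights, so the frame with the largest share is the most likely coding frame for that window. A real tool would slide the window along the sequence and would estimate the table from known genes.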
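A codon-usage table like the leucine slide for Rhodobacter capsulatus comes from counting codons in known coding sequences. The sketch below makes several assumptions. The input is an in-frame DNA coding sequence, and the standard genetic code applies. The toy sequence is made up, and the helper name `leucine_codon_usage` is hypothetical.

```python
# The six leucine codons of the standard genetic code.
LEU_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def leucine_codon_usage(cds: str) -> dict:
    """Return {codon: (count, percent of all Leu codons)} for an in-frame CDS."""
    cds = cds.upper()
    counts = {codon: 0 for codon in LEU_CODONS}
    for i in range(0, len(cds) - 2, 3):          # complete codons only
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    return {codon: (n, 100.0 * n / total if total else 0.0)
            for codon, n in counts.items()}


usage = leucine_codon_usage("ATGCTGCTGCTATAA")   # made-up toy CDS
for codon, (n, pct) in usage.items():
    print(f"{codon} {n:3d} {pct:6.1f} %")
```

Summed over many genes of one organism, such percentages are the codon preferences that the statistical gene-prediction methods rely on.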
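The example tree in the transcript is written in Newick format: nested parentheses, `name:branch_length`, and three children at the top level. A small recursive-descent parser is enough to read it back. It assumes unquoted labels and no comments. `parse_newick` and `leaves` are illustrative helper names, not a library API.

```python
EXAMPLE_TREE = (
    "((((polyA_26:0.042779,HERV17_27:0.049179):0.008643,polyA_410:0.045034):0.001912,"
    "((polyA_20:0.039953,HERV17_15:0.034230):0.003074,HERV17_76:0.041414):0.002812):0.001440,"
    "polyA_30:0.042838,(polyA_99:0.052972,HERV17_19:0.041888):0.003257);"
)


def parse_newick(text: str) -> dict:
    """Parse unquoted Newick into nested dicts {"name", "length", "children"}."""
    s = "".join(text.split())          # unquoted labels cannot contain whitespace
    if s.endswith(";"):
        s = s[:-1]
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> dict:
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1                   # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(s):
                    raise ValueError("unbalanced parentheses")
                if s[pos] == ",":
                    pos += 1
                elif s[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
        name = read_until(":,()")      # empty for unnamed internal nodes
        length = None
        if pos < len(s) and s[pos] == ":":
            pos += 1
            length = float(read_until(",()"))
        return {"name": name, "length": length, "children": children}

    tree = parse_node()
    if pos != len(s):
        raise ValueError("trailing characters after the tree")
    return tree


def leaves(node: dict) -> list:
    """(name, branch length) for every tip, left to right."""
    if not node["children"]:
        return [(node["name"], node["length"])]
    return [tip for child in node["children"] for tip in leaves(child)]


tree = parse_newick(EXAMPLE_TREE)
print(len(leaves(tree)), len(tree["children"]))   # 9 3
```

The three children at the root show that the tree is unrooted and is simply drawn from an arbitrary internal node.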
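The two DNA substitution models named in the transcript correct the observed proportion of differences for multiple hits. The sketch uses the standard formulas:

- Jukes-Cantor: $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$
- Kimura two-parameter: $d = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$

Here $p$ is the proportion of differing sites, and $P$ and $Q$ are the proportions of transitions and transversions. In line with the "exclude gaps" requirement, it skips any column that is not A/C/G/T in both sequences. The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_proportions(seq1: str, seq2: str):
    """Return (P, Q): proportions of transitions and transversions.

    Only positions where both sequences have A/C/G/T are compared,
    so gap columns are skipped."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    return transitions / compared, transversions / compared


def jukes_cantor(seq1: str, seq2: str) -> float:
    """One-parameter model: d = -3/4 ln(1 - 4p/3), p = proportion of differences."""
    P, Q = transition_transversion_proportions(seq1, seq2)
    p = P + Q
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq1: str, seq2: str) -> float:
    """Two-parameter model: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    P, Q = transition_transversion_proportions(seq1, seq2)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


print(round(jukes_cantor("AAAAAAAAAA", "AAAAAAAAAG"), 4))   # 0.1073
print(round(kimura_2p("AAAAAAAAAA", "AAAAAAAAAG"), 4))      # 0.1116
```

For very divergent sequences the argument of the logarithm drops to zero or below, and `math.log` raises `ValueError`. Such saturated pairs have no finite distance under these models.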
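UPGMA is the first distance method listed in the transcript. It can be run directly on the nine-sequence distance matrix from the distance-matrix slide. This is a sketch of textbook UPGMA: size-weighted average linkage, with each node height set to half the merge distance. The parser assumes the layout shown on that slide, a taxon count followed by a lower triangle. The output is not claimed to match the example tree on the slides, which is unrooted and may come from a different method. `parse_lower_triangle` and `upgma` are hypothetical names.

```python
MATRIX = """
9
polyA_26
polyA_30   0.1102
polyA_20   0.1144 0.1027
polyA_99   0.1326 0.1100 0.1237
polyA_410  0.1089 0.1009 0.1067 0.1150
HERV17_27  0.1070 0.1263 0.1285 0.1504 0.1198
HERV17_76  0.0960 0.1024 0.0953 0.1221 0.1036 0.1188
HERV17_19  0.1045 0.0994 0.1019 0.1097 0.1059 0.1304 0.0975
HERV17_15  0.0980 0.0975 0.0841 0.1170 0.0977 0.1127 0.0860 0.0927
"""


def parse_lower_triangle(text: str):
    """Taxon count on the first line, then one row per taxon with its distances
    to all earlier taxa. Returns (names, {frozenset({x, y}): distance})."""
    rows = [line.split() for line in text.strip().splitlines()]
    n = int(rows[0][0])
    names, dist = [], {}
    for row in rows[1:n + 1]:
        name, values = row[0], [float(v) for v in row[1:]]
        for other, value in zip(names, values):
            dist[frozenset((name, other))] = value
        names.append(name)
    return names, dist


def upgma(names, dist):
    """Repeatedly merge the closest pair; the new cluster's distance to every
    other cluster is the size-weighted arithmetic mean of the old distances."""
    size = {name: 1 for name in names}
    height = {name: 0.0 for name in names}
    d = dict(dist)
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        h = d[pair] / 2.0                      # ultrametric: node height = d/2
        new = f"({a}:{h - height[a]:.5f},{b}:{h - height[b]:.5f})"
        for k in size:
            if k not in (a, b):
                d[frozenset((new, k))] = (size[a] * d[frozenset((a, k))] +
                                          size[b] * d[frozenset((b, k))]) / (size[a] + size[b])
        for key in [key for key in d if a in key or b in key]:
            del d[key]                         # drop rows of the merged clusters
        size[new] = size.pop(a) + size.pop(b)
        height[new] = h
        del height[a], height[b]
        merges.append((a, b, h))
    (root,) = size
    return root + ";", merges


names, dist = parse_lower_triangle(MATRIX)
tree, merges = upgma(names, dist)
print(merges[0])   # the closest pair is merged first
print(tree)
```

UPGMA assumes a molecular clock, which is the equal-rates assumption listed for phylogenetic analysis. When rates differ between lineages, neighbour-joining is usually the safer distance method.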
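Neighbour-joining starts, in effect, from a star tree and repeatedly joins the pair that minimises $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$. The sketch below follows the standard Saitou-Nei update formulas. It stops with three nodes joined at a central node, so the output is an unrooted Newick tree with three children at the root, like the example tree. The five-taxon matrix in the usage is a small additive textbook-style example, not data from the slides. `from_pairs` and `neighbor_joining` are hypothetical names.

```python
def from_pairs(pairs: dict) -> dict:
    """Build a symmetric dict-of-dicts distance matrix from {(x, y): distance}."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d


def neighbor_joining(dist: dict):
    """Return (unrooted Newick string, list of joins (i, j, len_i, len_j))."""
    d = {i: dict(row) for i, row in dist.items()}   # work on a copy
    nodes = list(d)
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))   # branch i -> new node
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, li, lj))
    x, y, z = nodes                                  # final three-way join
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    return f"({x}:{lx:g},{y}:{d[x][y] - lx:g},{z}:{d[x][z] - lx:g});", joins


D = from_pairs({("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
                ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
                ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3})
tree, joins = neighbor_joining(D)
print(tree)   # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Unlike UPGMA, neighbour-joining does not assume a molecular clock, which is why it returns an unrooted tree. A root has to be added separately, for example with an outgroup.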
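The running totals on the parsimony slides can be checked with Fitch's small-parsimony algorithm. At each internal node it takes the intersection of the children's state sets if that is non-empty. Otherwise it takes their union and counts one change. The sketch writes the three four-taxon topologies as nested pairs, since the root position does not change the Fitch length. The names `TOPOLOGIES`, `fitch_site` and `tree_length` are illustrative.

```python
SEQS = {"A": "TATGTTC", "B": "TATTTTC", "C": "TACGTAC", "D": "GACTTAA"}

# The three unrooted topologies for four taxa, written as rooted nested pairs.
TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch_site(tree, states):
    """Fitch's algorithm for one site: (candidate state set, number of changes)."""
    if isinstance(tree, str):                 # a leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, states)
    rs, rc = fitch_site(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1               # union costs one extra change


def tree_length(tree, seqs):
    """Sum of Fitch changes over all alignment columns."""
    n = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(n))


print({name: tree_length(t, SEQS) for name, t in TOPOLOGIES.items()})
# {'AB|CD': 6, 'AC|BD': 7, 'AD|BC': 8}
```

Constant columns (positions 2 and 5) and columns where only one sequence differs (positions 1 and 7) add the same cost to every tree. Only positions 3, 4 and 6 discriminate between the topologies.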
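The bootstrap and jackknife from the topology-testing slide can be sketched on the same four sequences. Each replicate resamples alignment columns: with replacement for the bootstrap, and without replacement (keeping fewer columns) for the jackknife. For each replicate the sketch records which topology is most parsimonious. Several choices are assumptions of this sketch:

- resampling columns rather than sequences;
- the 50 % jackknife fraction;
- the fixed seed;
- the tie rule.

With only seven columns the support values are illustrative, not meaningful.

```python
import random
from collections import Counter

SEQS = {"A": "TATGTTC", "B": "TATTTTC", "C": "TACGTAC", "D": "GACTTAA"}
TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch_site(tree, states):
    """Fitch's algorithm for one site: (candidate state set, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, states)
    rs, rc = fitch_site(right, states)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)


def tree_length(tree, seqs):
    n = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(n))


def bootstrap_replicate(seqs, rng):
    """Columns drawn WITH replacement; the replicate keeps the original length."""
    n = len(next(iter(seqs.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(s[c] for c in cols) for name, s in seqs.items()}


def jackknife_replicate(seqs, rng, keep=0.5):
    """Columns drawn WITHOUT replacement; only a fraction `keep` is retained."""
    n = len(next(iter(seqs.values())))
    cols = sorted(rng.sample(range(n), max(1, round(n * keep))))
    return {name: "".join(s[c] for c in cols) for name, s in seqs.items()}


def support(seqs, resample, replicates=1000, seed=1):
    """Share of replicates in which each topology is the most parsimonious.
    Ties go to the first topology listed, which is a simplification."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(replicates):
        rep = resample(seqs, rng)
        wins[min(TOPOLOGIES, key=lambda t: tree_length(TOPOLOGIES[t], rep))] += 1
    return {t: wins[t] / replicates for t in TOPOLOGIES}


print(support(SEQS, bootstrap_replicate))
print(support(SEQS, jackknife_replicate))
```

The support for a topology is simply the fraction of pseudo-replicates that recover it. Real analyses use hundreds of columns and rebuild the whole tree for every replicate.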
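The pseudogene note in the appendix relies on a simple contrast. In a functional coding gene, selection removes most amino-acid-changing (non-synonymous) mutations. In a pseudogene such mutations accumulate without that constraint, so comparing the two kinds of change is informative. The sketch below only counts codons that differ at a single position and classifies them with the standard genetic code. It is a raw count, not a per-site dN/dS estimate; methods such as Nei-Gojobori also normalize by the numbers of synonymous and non-synonymous sites. The function name and test sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def count_syn_nonsyn(seq1: str, seq2: str):
    """Count aligned codon pairs that differ at exactly one position, split into
    synonymous (same amino acid) and non-synonymous changes.
    Codons differing at 2-3 positions, or containing gaps/N, are skipped."""
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue
        if CODE[c1] == CODE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


print(count_syn_nonsyn("CTGATG", "CTAATA"))   # (1, 1): CTG->CTA is Leu->Leu, ATG->ATA is Met->Ile
```

A ratio of non-synonymous to synonymous changes that stays well below the neutral expectation suggests a constrained, functional gene. A ratio approaching it is consistent with a pseudogene.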