BGI Webinar June 6, 2014 "Genomic Big Data Analysis and Customised Analysis with RNA-Seq"

  • Published on 25-Jun-2015

Transcript

  • 1. RNA-Seq 1, BGI, 2014/6/13 14:00~15:00

2. : RNA-Seq UNIX SlideShare

3. 3

4.

5. 6 1,000,000,000 1,000,000,000,000 1,000,000,000,000,000 1,000,000,000,000,000,000 1,000,000,000,000,000,000,000

6. 4 HPC
  1.
  2. HPC / Big iron: HP, 4TB, CPU: Xeon E7 (80, 160); SCLS
  3. Hadoop on AWS: Amazon Elastic MapReduce (Amazon EMR)
  4. Hadoop usp-BOA USP

7.

8. BIG IRON (HPC)

9. SCLS HPCI

10. 80, 4TB

11. Hadoop on AWS

12. Big Data

13. Hadoop = HDFS + MapReduce

14. Hadoop Distributed File System (HDFS) 15

15. MapReduce

16. 17 MapReduce

17. AWS (Amazon Web Services)

18.

19.

20. 22 Unicage Cluster (BOA) (BOA: Bigdata Oriented Architecture) 4. Hadoop

21. 23 Unicage Cluster: Basic Cluster Configuration
Master Server (boam): shell scripts are written only on this server.
Slave Servers (boas001, boas002 through boas00n): perform the actual processing.
Each server has a Bubun file system; the servers are connected by an InfiniBand high-speed network.
Master server: Unix/Linux OS, Unicage commands, custom shell (ush), Bubun File System
Slave server: Unix/Linux OS, Unicage commands, custom shell (ush), Bubun File System
USP2013MIT

22. Human DNA Sequence Quality Check
Data: FASTQ sequences of a human genome; one person's DNA sequence is about 35GB.
1: @ABCD:77:64U9X:6:3:11306:1000 2:N:
2: TTCCAGTACTTCCGCCAGGCACG
3: +B-=DFFBBDCF=@=?
4: @AE3A?:BB2B
.fa
2. TopHat: tophat -p 14 -G -o
3. cufflinks: cufflinks -p 14 -G -o (cuffdiff, cuffmerge) cufflinks R tU
http://cat.hackingisbelieving.org/lecture/biwako/NGS-R-Bioconductor-2nd.html
http://crusade1096.web.fc2.com/katei.html#3-2
http://cell-innovation.nig.ac.jp/wiki/tiki-index.php?page=TopHat

45. TopHat
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
GTF/GFF GTF http://support.illumina.com/sequencing/sequencing_software/igenome.ilmn
GTF Genbank Bioperl (bp_genbank2gff.pl)

46. Bowtie2
1. bowtie2-build: bowtie2-build .fa
2. bowtie2 (single-end): bowtie2 [options]* -x -U -S
   bowtie2 (paired-end): bowtie2 [options]* -x -1 -2 -S

47. bowtie tophat 90 50 GFF

48. OpenMP MPI 25

49. HPCI

50.
: bowtie
bowtie2 -p -x index1.fa -U test1.fastq -S out1.sam &
bowtie2 -p -x index2.fa -U test2.fastq -S out2.sam &
bowtie2 -p -x index3.fa -U test3.fastq -S out3.sam &
bowtie2 -p -x index4.fa -U test4.fastq -S out4.sam &
bowtie2 -p -x index5.fa -U test5.fastq -S out5.sam
: samtools tophat-cufflinks
samtools view -bS aln.sam > aln.bam && samtools sort aln.bam aln && samtools index aln.bam && samtools faidx ref.fa && samtools mpileup -uf ref.fa aln.bam | bcftools view -cg -
tophat -p 14 -G -o && cufflinks -p 14 -G -o
http://jehupc.exblog.jp/15729095/

51. Trinity (de novo RNA-Seq)
1. wget http://sourceforge.net/projects/trinityrnaseq/files/trinityrnaseq_r20140413p1.tar.gz
tar zxvf trinityrnaseq_r20140413p1.tar.gz
cd trinityrnaseq_r20140413p1
make
Java make Java alternatives
2. Trinity --seqType fq --JM 100G --left reads_1.fq --right reads_2.fq --CPU 6

52. De novo RNA-Seq: Trinity, Velvet-Oases Fasta Fasta R N50 samtools

53. Velvet-Oases (de novo RNA-Seq)
Velvet:
wget http://www.ebi.ac.uk/~zerbino/velvet/velvet_1.2.08.tgz
tar zxvf velvet_1.2.08.tgz
cd velvet_1.2.08
make 'MAXKMERLENGTH=101'
./velveth
./velvetg
Oases:
wget http://www.ebi.ac.uk/~zerbino/oases/oases_0.2.08.tgz
tar zxvf oases_0.2.08.tgz
cd oases_0.2.08
gmake 'VELVET_DIR=/path/to/velvet/velvet_1.2.08' 'MAXKMERLENGTH=101'
./oases

54. Velvet-Oases (Manual: For Impatient People)
velveth directory 21,23 data/reads.fa
velvetg directory_21 -read_trkg yes
oases directory_21
ls directory_21
velvetg directory_23 -read_trkg yes
oases directory_23
velveth mergedAssembly 23 -long directory*/transcripts.fa
velvetg mergedAssembly -read_trkg yes -conserveLong yes
oases mergedAssembly -merge
Or use the Python script oases_pipeline.py:
python oases_pipeline.py -m 21 -M 23 data/reads.fa

55.
IGV IGV IGV
wget http://www.broadinstitute.org/igv/projects/downloads/IGV_2.2.5.zip
unzip IGV_2.2.5.zip
cd IGV_2.2.5
./igv.sh
igv.sh igv.jar
samtools view -bS aln.sam > aln.bam && samtools sort aln.bam aln && samtools index aln.bam && samtools faidx ref.fa
https://www.youtube.com/watch?v=5kkPnCV06dE

56. IGV

57. BLAST, BLAST+
BLAST:
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/blast-2.2.26-x64-linux.tar.gz
tar zxvf blast-2.2.26-x64-linux.tar.gz
BLAST+:
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.28+-x64-linux.tar.gz
tar zxvf ncbi-blast-2.2.28+-x64-linux.tar.gz
blastall -p blastx -d Osativa_193_peptide.fa -i Consensus.fa -m 9 > Consensus_Osativa_193_peptide_BLASTXOUT.txt
R http://www.geocities.jp/ancientfishtree/LocalBlast_JI.html

58. GO
GUI: DAVID (GO) http://david.abcc.ncifcrf.gov/
GUI: http://togotv.dbcls.jp/20090925.html
BLAST2GO (GO): http://blast2go.com/b2ghome
GSEA (GO): http://array.cell-innovator.com/?p=2030
GO

59. R (RedHat Linux)
wget http://cran.ism.ac.jp/src/base/R-3/R-*.tar.gz
tar zxvf R-*.tar.gz
mkdir R_*
cd R-*
sudo yum install readline readline-devel
sudo yum install libXt libXt-devel
sudo yum install libX11 libX11-devel
sudo yum install cairo cairo-devel
sudo yum install libjpeg libjpeg-devel
sudo yum install libpng libpng-devel
sudo yum install libtiff libtiff-devel
sudo yum install texlive*
sudo yum install texinfo
wget http://genome.lab.tuat.ac.jp/~kazuoishii/files/inconsolata.sty
./configure --prefix=/home/genome/Packages/R/R_*
sudo make
sudo make install

60. R: R Haskell Lisp, Lisp Scheme, S, Ruby, Ruby Lisp R, Lisp
http://cse.naro.affrc.go.jp/takezawa/r-tips/r.html
http://www.iu.a.u-tokyo.ac.jp/~kadota/r.html
http://www.iu.a.u-tokyo.ac.jp/~kadota/r_seq.html
http://www.ospn.jp/press/20130124no32-1-useit-oss.html

61. R...
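Slide 22 frames a FASTQ quality check as a big-data workload, and slides 13 to 16 introduce Hadoop as HDFS plus MapReduce. The following is a minimal single-machine sketch of that check written in map/reduce form. It is not the presenter's Unicage or Hadoop code. Each read is mapped to partial statistics, and partials from separate chunks are summed, which is the property that lets the work be split across nodes. It assumes Phred+33 quality encoding and a hypothetical Q20 cut-off for counting a base as low quality; all function names are illustrative.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Iterator, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ (Phred+33) quality encoding
LOW_Q = 20         # hypothetical threshold for counting a base as low quality

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header, seq, qual


@dataclass(frozen=True)
class QCStats:
    reads: int = 0
    bases: int = 0
    quality_sum: int = 0
    low_q_bases: int = 0

    def __add__(self, other: "QCStats") -> "QCStats":
        # Field-wise integer addition is associative, so chunks can be combined in any grouping.
        return QCStats(
            self.reads + other.reads,
            self.bases + other.bases,
            self.quality_sum + other.quality_sum,
            self.low_q_bases + other.low_q_bases,
        )


def map_record(record: Record) -> QCStats:
    """Map step: turn one read into partial statistics."""
    _, seq, qual = record
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    return QCStats(1, len(seq), sum(scores), sum(1 for q in scores if q < LOW_Q))


def reduce_stats(parts: Iterable[QCStats]) -> QCStats:
    """Reduce step: sum partial statistics."""
    return reduce(lambda a, b: a + b, parts, QCStats())


if __name__ == "__main__":
    demo = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGTAC", "+", "II##II"]
    # Treat each record as a separate chunk, as two mapper nodes might.
    partials = [reduce_stats(map(map_record, read_fastq(chunk))) for chunk in (demo[:4], demo[4:])]
    total = reduce_stats(partials)
    print(total, "mean Q =", total.quality_sum / total.bases)
```

Because the reduce step only adds counters, the per-chunk results give the same total as processing the whole file at once; a real deployment would read chunks from HDFS or from each slave server's local file system instead of an in-memory list.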
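Slide 52 notes that Trinity and Velvet-Oases write their assemblies as FASTA and that N50 is derived from that FASTA. The transcript does not preserve which tool the slide used for this, so the sketch below uses Python. It sorts contig lengths in descending order and returns the length at which the running total first reaches half of the total assembly length. The function names are hypothetical.

```python
from typing import Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA text (multi-line records allowed)."""
    lengths: List[int] = []
    current: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the shortest contig in the smallest set of longest contigs covering >= 50% of the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no sequence data")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding total / 2
            return length
    raise AssertionError("unreachable: running total always reaches the full total")


if __name__ == "__main__":
    demo = [">contig1", "ACGTACGTAC", ">contig2", "ACGT", "ACGT", ">contig3", "ACG"]
    lengths = fasta_lengths(demo)
    # lengths are [10, 8, 3] (total 21); 10 alone is < 21/2, 10 + 8 is not, so N50 = 8
    print(lengths, n50(lengths))
```

For a real run, `fasta_lengths(open("transcripts.fa"))` would read the Oases or Trinity output directly; the file name here is only an example.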
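Slide 57 runs legacy `blastall` with `-p blastx` and `-m 9`, which writes tabular output interleaved with `#` comment lines, and slide 58 lists GO annotation tools such as BLAST2GO that start from BLAST results. A common intermediate step is to keep only the best hit per query. The sketch below assumes the standard 12-column tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); choosing the highest bit score as "best" and the sample subject IDs are assumptions for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-bit-score hit per query from 12-column tabular BLAST output."""
    best: Dict[str, Hit] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blastall -m 9 interleaves '#' comment lines with the hits
        cols = line.split("\t")
        if len(cols) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(cols)}: {line!r}")
        hit = Hit(cols[0], cols[1], float(cols[2]), float(cols[10]), float(cols[11]))
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    # Made-up rows; subject IDs are hypothetical placeholders, not real O. sativa peptides.
    demo = [
        "# BLASTX 2.2.26",
        "# Query: contig1",
        "contig1\tprotA\t85.00\t200\t30\t0\t1\t600\t1\t200\t1e-90\t330",
        "contig1\tprotB\t60.00\t150\t60\t2\t1\t450\t5\t155\t1e-30\t120",
        "contig2\tprotC\t70.00\t100\t30\t1\t10\t310\t1\t100\t2e-20\t95.5",
    ]
    for query, hit in sorted(best_hits(demo).items()):
        print(query, hit.subject, hit.evalue, hit.bitscore)
```

On real data the input would be the lines of `Consensus_Osativa_193_peptide_BLASTXOUT.txt`; ties in bit score keep the first hit encountered.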