Genome sequences as media files

by tparidae

Category: Science

Description

Compression algorithms can be evaluated along three axes: efficiency (speed), effectiveness (compression ratio) and functionality. Almost all genome-specific and general-purpose compression algorithms perform well on efficiency and effectiveness, but they lack the functionality found in media file compression and container formats, such as random access, stream processing and streaming. In this presentation, we present our first results and discuss how media file technologies can be used to create a genomic data compression algorithm that is efficient, effective and functional.

Based on our paper "Genome sequences as media files"
Presented at Biostec 2014 Doctoral Symposium
Paper available at http://www.scribd.com/doc/232691456/Genome-Sequences-as-Media-Files-Towards-Effective-Efficient-And-Functional-Compression-of-Genomic-Data

Transcript

  • 1/59 ELIS – Multimedia Lab. Genome Sequences as Media Files: Towards effective, efficient, and functional compression of genomic data. Tom Paridaens, Wesley De Neve, Peter Lambert and Rik Van de Walle, 05-03-2014
  • 2/59 [Diagram labels] Video Coding, Adaptation, Analysis, Experience, Satellite, DVB-S2, VSAT Ground Station, CT Scans, Compression, Transmission
  • 4/59 Compression Triangle: Efficiency, Functionality, Effectiveness
  • 6/59 Effectiveness - SOTA • Bit Encoding: 2 bits per nucleotide • Dictionary-based Encoding: matching groups of nucleotides • Statistical Encoding: prediction using a probabilistic model • Reference-based Encoding: matching with another genome
  • 7/59 Block-based compression: Split sequence in blocks → Select prediction tool → Encode tool parameters → Encode residue
  • 9/59 Prediction & Residue: Genomic data, Prediction, Residue
  • 10/59 Selecting Prediction Tools. Prediction: • INTRA: “AAA…AA”, “TTT…TT”, “ATAT…AT”, “CGCG…CG” • Huffman: Encode in 2-base Huffman • INTER: Search similar block, Search inverse complement
  • 11/59 Encoding prediction tool & residue: Split sequence in blocks → Select prediction tool → Encode tool parameters → Encode residue. Context Adaptive Binary Arithmetic Coding
  • 12/59 Compression Triangle: Efficiency, Functionality, Effectiveness
  • 13/59 Efficiency - SOTA • Hashing • Quick match detection tools (e.g. Pattern Hunter) • Single-threaded processing • Decoding & transmitting complete files
  • 14/59 Block-based compression, Parallel processing: Split sequence in blocks → Select prediction tool → Encode tool parameters → Encode residue
  • 15/59 Efficiency - Research • Partial decoding (@ block level) • Live encoding/streaming • Smart and adaptive use of compression tools – Load balancing – Compression speed vs ratio
  • 16/59 Compression Triangle: Efficiency, Functionality, Effectiveness
  • 17/59 Functionality - SOTA • Random access • Metadata • Encryption @ full-file level
  • 18/59 Functionality - Research • Random access • Metadata – File adaptation
  • 19/59 Functionality - Research • Compressed-domain analysis and adaptation – Selecting parts of the genome for transmission – Using INTER information to track repeats
  • 20/59 Functionality - Research • DRM/Encryption – @ (sub)block level • Random access • Adaptation • Some compressed-domain analysis – @ accuracy level • Lossless for trusted researchers • Near-lossless for everybody else
  • 21/59 Can we apply media file technology to genomic data? Benchmarking sets? Please…
  • 22/59 Compression Triangle: Efficiency, Functionality, Effectiveness
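
Slide 6 lists bit encoding at 2 bits per nucleotide as the simplest state-of-the-art technique. A minimal sketch of what that could look like follows. The function names `pack` and `unpack`, the A/C/G/T code assignment, and the choice to store the sequence length separately are assumptions made for illustration; none of them come from the slides.

```python
# Illustrative 2-bit nucleotide packing. The mapping below is an assumption;
# any fixed assignment of the four bases to 2-bit codes would work.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a string over A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so decoding can read fixed positions.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack; `length` is the original number of bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

Because the last byte may contain padding, the decoder needs the original length, which a container format would typically carry as metadata.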
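
Slides 7 to 10 describe splitting the sequence into blocks, selecting a prediction tool per block (INTRA repeat patterns, INTER search for a similar or inverse-complement block) and encoding the residue. The sketch below shows one way such a selection step might work. It is not the authors' implementation: the mismatch-count cost, the residue representation as (position, base) pairs, the tool labels, and all function names are assumptions, "inverse complement" is read here as the reverse complement, and the Huffman tool and the entropy coding stage are left out.

```python
# Hypothetical per-block prediction tool selection with a mismatch residue.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
INTRA_UNITS = ("A", "T", "AT", "CG")  # repeat units named on slide 10


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def predict(params, block_len: int, history: str) -> str:
    """Rebuild the prediction for a block from the tool parameters."""
    kind, arg = params
    if kind == "INTRA":
        reps = -(-block_len // len(arg))  # ceiling division
        return (arg * reps)[:block_len]
    ref = history[arg:arg + block_len]
    return reverse_complement(ref) if kind == "INTER_RC" else ref


def residue(block: str, prediction: str):
    """Positions where the prediction is wrong, with the correct base."""
    return [(i, b) for i, (b, p) in enumerate(zip(block, prediction)) if b != p]


def select_tool(block: str, history: str):
    """Try every candidate tool and keep the one with the smallest residue."""
    candidates = [("INTRA", unit) for unit in INTRA_UNITS]
    for start in range(len(history) - len(block) + 1):
        candidates.append(("INTER", start))
        candidates.append(("INTER_RC", start))
    best = None
    for params in candidates:
        res = residue(block, predict(params, len(block), history))
        if best is None or len(res) < len(best[1]):
            best = (params, res)
    return best


def reconstruct(params, res, block_len: int, history: str) -> str:
    """Decoder side: apply the residue on top of the prediction."""
    out = list(predict(params, block_len, history))
    for i, base in res:
        out[i] = base
    return "".join(out)
```

The exhaustive INTER search is quadratic and only meant to make the idea concrete; slide 13 mentions hashing and quick match detection tools as the usual way to speed up this kind of search.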
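
Slides 14 to 18 argue that block-based compression enables parallel processing, partial decoding at block level and random access. The sketch below illustrates that idea under clear assumptions: `zlib` stands in for the per-block codec (the slides do not use it), blocks are compressed independently of each other, and the function names and the fixed block size are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_blocks(seq: bytes, block_size: int = 1024, workers: int = 4) -> list:
    """Split the sequence into fixed-size blocks and compress them in parallel."""
    blocks = [seq[i:i + block_size] for i in range(0, len(seq), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves block order, so index i is always block i.
        return list(pool.map(zlib.compress, blocks))


def read_range(compressed: list, block_size: int, start: int, end: int) -> bytes:
    """Return seq[start:end] while decoding only the blocks that overlap it."""
    if start < 0 or end < start:
        raise ValueError("invalid range")
    if start == end:
        return b""
    first = start // block_size
    last = (end - 1) // block_size
    data = b"".join(zlib.decompress(b) for b in compressed[first:last + 1])
    offset = start - first * block_size
    return data[offset:offset + (end - start)]
```

Independent blocks are what make both functions possible. A codec that uses INTER prediction, as on slide 10, would additionally have to decode the blocks a requested block refers to, which is the trade-off between compression ratio and random access.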