Genomic Data Compression

MPEG issues Call for Evidence (CfE) for Genome Compression and Storage

At its 113th meeting, MPEG took its first formal step toward applying its compression expertise to an entirely new kind of essential information: the single recipe that describes each one of us as an individual, namely the human genome. A sequenced human genome consists of DNA sequences totaling roughly 3 billion base pairs, which make up the genetic information within each human cell. It is, in essence, the complete set of our hereditary information.

To aid in the representation and storage of this unique information, MPEG has issued a Call for Evidence (CfE) on Genome Compression and Storage. Its goal is to assess how new technologies for compressing genomic information perform compared with currently used file formats. This matters because the genomic and related data produced by sequencing a single genome can reach several terabytes (TB).

Additional purposes of the call are to:

  • identify which additional functionalities (e.g. non-sequential access, lossy compression efficiency, etc.) these new technologies provide
  • collect information that may be used in drafting a future Call for Proposals

Responses to the CfE will be evaluated during the 114th MPEG meeting in February 2016.

Detailed information, including how to respond to the CfE, will soon be available as documents N15740 and N15739 at the 113th meeting website.