MPEG Genome Compression

At its 114th meeting, MPEG advanced its exploration of genome compression toward formal standardization. The meeting included a seminar to collect additional perspectives on genome data standardization and a review of the technologies submitted in response to a Call for Evidence (CfE). The purpose of that CfE, issued at the 113th meeting, was to assess whether new technologies could achieve better compression efficiency than currently used formats.

In all, 22 tools were evaluated. The results demonstrate that, by integrating several of these tools, it is possible to improve compression by up to 27% relative to the best state-of-the-art tool. On the basis of this evidence, MPEG has issued a Draft Call for Proposals (CfP) on Genomic Information Representation and Compression. The Draft CfP targets technologies for compressing raw and aligned genomic data and metadata for efficient storage and analysis.

As demonstrated by the results of the Call for Evidence, lossless compression of genomic data can be improved beyond the current state-of-the-art tools by combining and further developing them. The call also addresses lossy compression of the metadata, which makes up the dominant volume of the resulting compressed data. The Draft CfP seeks lossy compression technologies that provide higher compression performance without affecting the accuracy of the results produced by analysis applications.

Responses to the Genomic Information Representation and Compression CfP will be evaluated prior to the 116th MPEG meeting in October 2016 (in Chengdu, China). An ad hoc group, co-chaired by Martin Golobiewski, convenor of Working Group 5 of ISO TC 276 (i.e. the ISO committee for Biotechnology), and Dr. Marco Mattavelli (of MPEG), will coordinate the receipt and pre-analysis of submissions received in response to the call. Detailed results of the CfE and the presentations shown during the seminar will soon be available as MPEG documents N16137 and N16147 at: http://mpeg.chiariglione.org/meetings/114.