Emerging Standard For Genomic Data Compression

At its 115th meeting, MPEG issued a Call for Proposals (CfP) for Genomic Information Compression and Storage in conjunction with the working group for standardisation of data processing and integration of the ISO Technical Committee for biotechnology standards (ISO/TC 276/WG5). The call sought submissions of technologies that can provide efficient compression of genomic data and metadata for storage and processing applications. During the 116th MPEG meeting, the responses to this CfP, comprising twelve distinct technologies, were collected and evaluated by a joint ad-hoc group of both working groups. An initial assessment of the performance of the best eleven solutions for the different categories reported compression factors ranging from 8 to 58 for the different classes of data.

The twelve submitted technologies show consistent improvements over the results assessed in response to the Call for Evidence in February 2016. Further improvements of the technologies under consideration are expected from the first phase of core experiments, which was defined at the 116th MPEG meeting. The open core experiment process planned for the next 12 months will consist of multiple, independent, directly comparable, rigorous experiments performed by independent entities to determine the specific merit of each technology and how the technologies can be integrated into a single solution for standardisation. The core experiment process will consider submitted technologies as well as new solutions within the scope of each specific core experiment. The final inclusion of submitted technologies in the standard will be based on the experimental comparison of performance, as well as on the validation of requirements and the inclusion of essential metadata describing the context of the sequence data, and will be reached by consensus within and across both committees.