Transport and Storage Of Genomic Information

The availability of high-throughput DNA sequencing technologies opens up new perspectives in the treatment of several diseases, making possible the introduction of new global approaches in public health known as “precision medicine”. While routine DNA sequencing in the doctor’s office is still not current practice, medical centres have begun to use sequencing to identify cancer and other diseases and to find effective treatments. As DNA sequencing technologies produce extremely large amounts of DNA sequence data and related information, the ICT costs of storage, transmission, and processing are also very high. The MPEG-G standard addresses and solves the problem of efficient and economical handling of genomic data by providing new compression and transport technologies.

The MPEG-G standards result from the synthesis of technologies collected in response to a Call for Proposals issued at MPEG’s 115th meeting in collaboration with the working group for standardization of data processing and integration of the ISO Technical Committee for biotechnology standards (ISO/TC 276/WG 5).

At its 120th meeting, MPEG promoted its first set of specifications of the family of MPEG-G standards to Committee Draft (CD) level. These standards provide a new compression technology (ISO/IEC 23092-2) for genomic sequencing data and a set of technologies (ISO/IEC 23092-1) supporting rich functionality for the transport of genomic data on networks and the storage of the data in files. The further standardization plan for MPEG-G includes the Committee Drafts for metadata and APIs (ISO/IEC 23092-3) and reference software (ISO/IEC 23092-4), which are to be issued at the next MPEG meeting with the objective of producing Draft International Standards (DIS) at the end of 2018.

Emerging Standard For Genomic Data Compression

At its 115th meeting, MPEG issued a Call for Proposals (CfP) for Genomic Information Compression and Storage in conjunction with the working group for standardisation of data processing and integration of the ISO Technical Committee for biotechnology standards (ISO/TC 276/WG 5). The call sought technologies that provide efficient compression of genomic data and metadata for storage and processing applications. During the 116th MPEG meeting, the responses to this CfP, comprising twelve distinct technologies, were collected and evaluated by a joint ad-hoc group of both working groups. An initial assessment of the performance of the best eleven solutions for the different categories reported compression factors ranging from 8 to 58 for the different classes of data.
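To give a sense of what compression factors of 8 and 58 mean in practice, the short sketch below converts a factor into a compressed size and a percentage saving. It assumes the usual definition of compression factor as uncompressed size divided by compressed size, which the text does not state explicitly. The 100 GB dataset size and the function name are hypothetical and chosen only for illustration.

```python
def compressed_size_gb(original_gb: float, factor: float) -> float:
    """Return the size after compression.

    Assumes compression factor = uncompressed size / compressed size.
    """
    if factor <= 0:
        raise ValueError("compression factor must be positive")
    return original_gb / factor


# Hypothetical dataset size, not taken from the source.
ORIGINAL_GB = 100.0

# The two ends of the range reported in the initial assessment.
for factor in (8, 58):
    size = compressed_size_gb(ORIGINAL_GB, factor)
    saved_percent = 100.0 * (1.0 - 1.0 / factor)
    print(f"factor {factor}: {size:.2f} GB ({saved_percent:.1f}% saved)")
```

Under these assumptions, a factor of 8 already removes 87.5% of the storage and transmission volume, while moving from 8 to 58 shrinks the remaining data by roughly a further factor of seven.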

The twelve submitted technologies show consistent improvements over the results assessed in response to the Call for Evidence in February 2016. Further improvements are expected from the first phase of core experiments, which was defined at the 116th MPEG meeting. The open core experiment process planned for the next 12 months will consist of multiple, independent, directly comparable and rigorous experiments performed by independent entities to determine the specific merit of each technology and how the technologies can be integrated into a single solution for standardisation. The core experiment process will consider the submitted technologies as well as new solutions within the scope of each specific core experiment. The final inclusion of submitted technologies in the standard will be based on the experimental comparison of performance, on the validation of requirements, and on the inclusion of essential metadata describing the context of the sequence data, and will be reached by consensus within and across both committees.