The extensive use of high-throughput deoxyribonucleic acid (DNA) sequencing technologies opens up new perspectives in the treatment of several diseases and enables “precision medicine”. Because DNA sequencing technologies produce extremely large amounts of raw data, the ICT costs of storing, transmitting, and processing DNA sequence data and related information are very high. The lack of universal standards contributes to these costs and hinders the timely application of effective treatments.
The MPEG-G standard, jointly developed by MPEG and the ISO Technical Committee for biotechnology standards (ISO TC 276/WG 5), is the first international standard to address the problem of efficient and cost-effective handling of genomic data. It provides not only new compression and transport technologies, but also a family of standard specifications that associate relevant information in the form of metadata, together with a rich set of Application Programming Interfaces (APIs) for building a full ecosystem of interoperable applications and services capable of efficiently processing sequencing data.
At its 122nd meeting, MPEG promoted its core set of MPEG-G specifications, i.e., transport and compression technologies, to Draft International Standard (DIS) stage. These parts of the standard provide new transport technologies (ISO/IEC 23092-1) and compression technologies (ISO/IEC 23092-2) that support rich functionality for accessing and transporting genomic data, including streaming, by interoperable applications. This will enable the industry to rely on a final specification in October 2018. Reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5) will reach this stage in the next 12 months.
Besides these standardization achievements, a workshop on the “applications of genomic information processing” was held in conjunction with the 122nd MPEG meeting. It discussed requirements, open problems of genome information processing, and the solutions provided by the MPEG-G standards. Use cases were also demonstrated, covering selective remote access with streaming and the execution of the Genome Analysis Toolkit (GATK) and equivalent processing pipelines on sequencing data in MPEG-G compressed form.