MPEG-G Standard

The extensive use of high-throughput deoxyribonucleic acid (DNA) sequencing technologies opens up new perspectives in the treatment of several diseases and enables “precision medicine”. Because DNA sequencing technologies produce extremely large amounts of raw data, the ICT costs of storing, transmitting, and processing DNA sequence data and related information are very high. These costs stem from the lack of universal standards and prevent the timely application of effective treatments.

The MPEG-G standard, jointly developed by MPEG and the ISO Technical Committee for biotechnology standards (ISO TC 276/WG 5), is the first international standard to address the problem of efficient and cost-effective handling of genomic data. It provides not only new compression and transport technologies, but also a family of standard specifications that associate relevant information in the form of metadata, together with a rich set of Application Programming Interfaces (APIs) for building a full ecosystem of interoperable applications and services capable of efficiently processing sequencing data.

At its 122nd meeting, MPEG promoted its core set of MPEG-G specifications, i.e., transport and compression technologies, to the Draft International Standard (DIS) stage. These parts of the standard provide new transport technologies (ISO/IEC 23092-1) and compression technologies (ISO/IEC 23092-2) that support rich functionality for the access and transport, including streaming, of genomic data by interoperable applications. This will enable the industry to rely on a final specification in October 2018. Reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5) will reach this stage in the next 12 months.

Besides these standardization achievements, a workshop on the “applications of genomic information processing” was held in conjunction with the 122nd MPEG meeting to discuss requirements, open problems of genome information processing, and solutions provided by the MPEG-G standards. Demonstrations also covered representative use cases: selective remote access with streaming, and the execution of the Genome Analysis Toolkit (GATK) and equivalent processing pipelines on sequencing data in MPEG-G compressed form.

Genomic Information Representation Metadata

A workshop on applications of genomic information processing was held on 18 April 2018, co-located with the 122nd MPEG meeting in San Diego.

The workshop explored the opportunities for improved genome sequencing data processing services enabled by the availability, in late 2018, of MPEG-G, an ISO standard for the compression of genomic information, and its impact on the relevant industry.

Specifically, the workshop addressed:

  • The perspectives and potential of genomic information usage in medicine and public health
  • The vision of interdisciplinary approaches to the analysis of genome sequencing data
  • The challenges for the generation and management of very large volumes of genome sequencing data
  • The status and progress of sequencing technology and the associated data generation features
  • The reasons for supporting seamless availability and exchange of genome sequencing data, so that analysis of larger data volumes can advance scientific progress
  • A status report on the development of the ISO genomic compression standard and an overview of its new features and performance

Venue:
San Diego Marriott La Jolla, 4240 La Jolla Village Drive
San Diego, CA 92037, United States
(see also the 122nd MPEG meeting for more details)

Organizing Committee:
Joern Ostermann (TNT-LUH), Claudio Alberti (GenomSys), Rongshan Yu (Aginome Scientific), Tom Paridaens (imec and UGent)

Program

Start | End | What | Who
12:30 | 13:00 | Registration |
13:00 | 13:15 | Welcome & workshop goals |
13:15 | 13:40 | “Genome and medical information portability, retrieval and analysis” | Amalio Telenti (Scripps Research Institute, USA)
13:40 | 14:05 | “From womb to tomb sequencing: on the advantages on bringing multidisciplinary R&D to develop standards and analytics” | Ioannis Xenarios (SIB, Switzerland)
14:05 | 14:30 | “Future of Genomics and Big Data” | Dawn Barry (Luna DNA, USA)
14:30 | 14:55 | “Generation and Management of Large Sequence Files: Perspectives from the DNA Sequencing Core” | Alvaro G. Hernandez (UIUC DNA Services, USA)
14:55 | 15:10 | Presentation of demonstrations | GenomSys, Aginome Scientific
15:10 | 15:40 | Demo session and Coffee Break |
15:40 | 16:05 | “The role of compression in the genomics data life cycle” | Come Raczy (Illumina Inc., USA)
16:05 | 16:30 | “Genomics at Rady’s Children Hospital San Diego” | Ray Veeraraghavan (Rady’s Children Hospital San Diego, USA)
16:30 | 16:55 | “An overview of the MPEG-G standard for the compression and processing of genomic sequencing data” | Marco Mattavelli (EPFL, Switzerland)

Demonstrations of genome sequencing data processing prototypes and products

Co-located with the workshop, GenomSys and Aginome Scientific showed demos, prototypes and products related to genome sequencing data processing analytics, compression and storage.