MPEG-G specifies how to compress and transport sequencing data, and supports complex use cases including:
- Selective access to compressed data
- Data streaming
- Compressed file concatenation
- Genomic studies aggregation
- Enforcement of privacy rules
- Selective encryption of sequencing data and metadata
- Annotation and linkage of genomic segments
- Interoperability with main existing technologies and legacy formats
- Incremental update of sequencing data and metadata
Do you want to know more? To better understand how MPEG standardization and MPEG-G work, have a look at our Frequently Asked Questions.
The MPEG-G standard is composed of six parts:
Part 1: Transport and Storage
This part of the standard specifies the data formats for both transport and storage of genomic information, together with a reference conversion process and informative annexes. The main topics covered by this part are genomic data streaming and the file format.
Part 2: Compression
This part provides specifications for the normative representation of genomic sequence read identifiers, genomic sequence reads (both unaligned and aligned), reference sequences and quality values. This is the part where compression is specified in terms of normative bitstream syntax and decoding behavior.
Part 3: Metadata and APIs
This part of the standard specifies information metadata, SAM interoperability, protection metadata and programming interfaces to access genomic information. The main goals are to enable (controlled) access to MPEG-G data from external applications and to add metadata to compressed genomic information.
Part 4: Reference Software
To support and guide potential implementers of MPEG-G, the standard includes normative Reference Software. It is normative in the sense that any conforming implementation of the decoder, given the same conformant compressed bitstreams and using the same normative output data structures, will output the same data.
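As an illustration of what "will output the same data" means in practice, the following minimal sketch compares the decoded output of two decoders by digest. It is not part of MPEG-G or its Reference Software: the function names `digest` and `outputs_match` and the sample records are hypothetical, and decoded output is assumed here to be a simple ordered list of byte strings.

```python
import hashlib


def digest(decoded_records):
    """Return a SHA-256 digest over an ordered sequence of decoded records.

    Assumption: each record is a bytes object. This is a stand-in for
    whatever output data structures a real decoder would produce.
    """
    h = hashlib.sha256()
    for record in decoded_records:
        # Length-prefix each record so that record boundaries change the digest.
        h.update(len(record).to_bytes(8, "big"))
        h.update(record)
    return h.hexdigest()


def outputs_match(reference_output, candidate_output):
    """True if both decoders produced identical records in identical order."""
    return digest(reference_output) == digest(candidate_output)


# Hypothetical decoded output from a reference decoder and two candidates.
reference = [b"read1\tACGT", b"read2\tTTGA"]
candidate_ok = [b"read1\tACGT", b"read2\tTTGA"]
candidate_bad = [b"read1\tACGT", b"read2\tTTGC"]

print(outputs_match(reference, candidate_ok))
print(outputs_match(reference, candidate_bad))
```

Comparing digests rather than the full output keeps the check cheap for large datasets; a real conformance procedure is defined by Part 5 of the standard, not by this sketch.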
Part 5: Conformance
Conformance testing is fundamental to validating that MPEG-G technology is implemented correctly in different devices and applications and that all systems interoperate. This part of the standard specifies a normative procedure to assess conformity to the standard on an exhaustive dataset of compressed data.
Part 6: Genomic Annotation
The output of most biological studies based on sequencing protocols is usually represented as different types of annotations (meta-information), all associated with one or more intervals on the reference genome. This part of the ISO/IEC 23092 standard series augments the MPEG-G hierarchy with the concept of meta-information related to intervals to support some additional use cases relevant to secondary data analysis.
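The idea of annotations attached to intervals on a reference genome can be sketched as follows. This is only an illustration of the concept, not the MPEG-G data model or API: the `Annotation` class, its fields, the half-open coordinate convention and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    sequence: str  # reference sequence name, e.g. "chr1" (hypothetical)
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    label: str     # the meta-information attached to the interval


def overlapping(annotations, sequence, start, end):
    """Return annotations on `sequence` that overlap the half-open range [start, end)."""
    return [
        a for a in annotations
        if a.sequence == sequence and a.start < end and start < a.end
    ]


# Hypothetical annotations on two reference sequences.
annotations = [
    Annotation("chr1", 100, 200, "geneA"),
    Annotation("chr1", 150, 300, "peakB"),
    Annotation("chr2", 100, 200, "geneC"),
]

for hit in overlapping(annotations, "chr1", 190, 210):
    print(hit.label)
```

A linear scan is enough to show the concept; querying many annotations efficiently would normally rely on an index over the intervals.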
A test set of reference genomic data has been defined to perform tests during the process of standardization, definition of conformance test procedures and other experiments. More information on the MPEG-G database is available here.
How to participate?
If you are interested in MPEG-G and the related activities, you are welcome to join the open mailing list and contribute to the discussions.