MPEG-G Part 4

Reference Software

The extensive usage of high-throughput deoxyribonucleic acid (DNA) sequencing technologies opens up new perspectives in the treatment of several diseases and enables the implementation of a new approach to healthcare known as “precision medicine”. 

DNA sequencing technologies produce extremely large amounts of raw data, which are stored in different repositories worldwide. The processing, analysis, and comparison of such distributed data are fundamental to the effective use of sequencing data for clinical and scientific purposes. Standard Application Programming Interfaces (APIs) and metadata are the basis for interoperable and automated data access and processing systems that can operate efficiently on the sequencing data sets available worldwide.

To support and guide potential implementers of MPEG-G, the standard includes a normative Reference Software. The Reference Software is normative in the sense that any conforming implementation of the decoder, taking the same conformant compressed bitstreams and using the same normative output data structures, will output the same data.
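As a rough illustration of this notion of conformance, the following minimal sketch compares the decoded output of a reference decoder with that of a candidate decoder for the same bitstream. It assumes that both decoders have already written their output to files in the same deterministic serialization, so that "the same data" can be checked byte for byte. The function names and file names are hypothetical and are not part of the MPEG-G reference software.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(reference_output, candidate_output):
    """Return True if the two decoded outputs have identical digests.

    Assumption: both files use the same deterministic serialization.
    """
    return file_digest(reference_output) == file_digest(candidate_output)


if __name__ == "__main__":
    # Hypothetical stand-ins for the files two decoders would produce.
    with tempfile.TemporaryDirectory() as tmp:
        reference = Path(tmp) / "reference.out"
        candidate = Path(tmp) / "candidate.out"
        reference.write_bytes(b"ACGT\n")
        candidate.write_bytes(b"ACGT\n")
        print(outputs_match(reference, candidate))
```

Comparing digests rather than loading whole files keeps memory use flat, which matters for outputs as large as genomic data sets. A real conformance test would follow the procedures defined by the standard rather than this simplified check.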

An open source implementation of a normative decoder and an informative encoder is available, together with bitstreams. Other open source implementations, developed by independent groups, also exist, such as the GENIE project.

GENIE is the first open source implementation of an encoder-decoder pair that is compliant with the MPEG-G specifications and delivers all its benefits. GENIE is currently focused on compression, but it also supports the development of efficient data transfer and of APIs for operating directly on the compressed data. It supports lossless and lossy compression of genomic data in the form of FASTA, FASTQ, and SAM files and is based on the FAIR (Findable, Accessible, Interoperable, and Reusable) principles.

In October 2020, the fourth part of MPEG-G, Coding of Genomic Information (ISO/IEC 23092-4), was published. This edition of the standard, ISO/IEC 23092-4:2020 (Part 4: Reference software), provides specifications for the genomic information representation reference software, referred to as the genomic model (GM). This decoding software is provided to assess conformance to the requirements of ISO/IEC 23092-1 and ISO/IEC 23092-2.

The document is now available here.

The parts of the MPEG-G standard:

Part 1: Transport and Storage

Part 2: Compression

Part 3: Metadata and APIs

Part 4: Reference Software

Part 5: Conformance

Part 6: Genomic Annotation