132nd MPEG Meeting

Due to the ongoing COVID-19 situation, the 132nd MPEG meeting was again held online, from October 12th to 16th, 2020. Nonetheless, many important topics were discussed, and progress was made on ISO/IEC 23092 (MPEG-G).

MPEG evaluates extensions and improvements to MPEG-G and announces a Call for Evidence on new advanced genomics features and technologies

The extensive use of high-throughput DNA sequencing technologies enables a new approach to healthcare known as “precision medicine”. DNA sequencing technologies produce extremely large amounts of raw data that are stored in various repositories around the world. One challenge is to efficiently handle the increasing flood of sequencing data. A second challenge is the ability to process such a flood of data in order to 1) expand scientific knowledge of genome sequence information and 2) search genome databases for diagnostic and therapeutic purposes. High-performance compression of genomic data is required to reduce the storage size and increase transmission speed of large data sets.

The current MPEG-G standard series (ISO/IEC 23092) deals with the representation, compression, and transport of genome sequencing data, with support for annotation data under development. These specifications provide a file and transport format, compression technology, metadata specifications, protection support, and standard APIs for accessing genomic data in its native compressed format.
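As a rough illustration of why a dedicated representation pays off, the sketch below packs the four nucleotides A, C, G and T into 2 bits each, a quarter of the 8 bits per base used by plain-text formats such as FASTQ. This is a deliberately naive, hypothetical baseline (the `pack` and `unpack` names are made up for this example): it does not reproduce the MPEG-G coding tools, it ignores ambiguous bases such as N, and it leaves out quality values and read metadata entirely.

```python
# Hypothetical illustration: pack nucleotides A, C, G, T into 2 bits each.
# This is a naive baseline, NOT the MPEG-G coding scheme; it rejects
# ambiguous bases such as N (KeyError) and ignores quality values.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            # Base j of the chunk occupies bits 2*j and 2*j + 1.
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (2 * j)) & 0b11])
    # The last byte may carry padding, so trim to the original length.
    return "".join(bases[:length])


if __name__ == "__main__":
    read = "ACGTTGCAAC"
    packed = pack(read)
    print(len(read), "bases ->", len(packed), "bytes")
    assert unpack(packed, len(read)) == read
```

Running the script prints `10 bases -> 3 bytes`. Genomic codecs typically go much further by exploiting statistical redundancy in the data, so 2 bits per base is a starting point rather than a target.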

In response to a Call for Proposals issued at the 131st meeting, MPEG received submissions addressing low-complexity coding modes, which directly improve encoding and decoding speed and thus enable lower-latency access to data, as well as advanced indexing and search of sequencing data and metadata that can be applied to both aligned and unaligned data directly in the compressed domain. In addition, technologies for compressing and indexing aligned and unaligned read data were proposed. MPEG is currently evaluating the integration of these new technologies into the MPEG-G standard series.
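One common way to support indexing and selective access in the compressed domain is to split aligned data into independently decodable blocks, each tagged with the genomic range it covers, so that a region query only has to locate and decode the overlapping blocks. The sketch below shows that idea with a hypothetical `RegionIndex` over `Block` entries holding a start position, an end position and a byte offset; these names, fields and offsets are assumptions for illustration, not the MPEG-G index structures.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical index entry for one independently decodable block."""
    start: int   # first genomic position covered (inclusive)
    end: int     # last genomic position covered (inclusive)
    offset: int  # byte offset of the compressed block in the file


class RegionIndex:
    """Hypothetical region index mapping genomic ranges to compressed blocks."""

    def __init__(self, blocks):
        # Keep blocks sorted by start position so bisect can prune them.
        self.blocks = sorted(blocks, key=lambda b: b.start)
        self.starts = [b.start for b in self.blocks]

    def query(self, start, end):
        """Return the blocks overlapping [start, end] without decoding any of them."""
        # Blocks starting after `end` cannot overlap the query range.
        hi = bisect_right(self.starts, end)
        # Among the remaining candidates, keep those that reach `start`.
        return [b for b in self.blocks[:hi] if b.end >= start]


if __name__ == "__main__":
    index = RegionIndex([
        Block(1, 10_000, 0),
        Block(10_001, 20_000, 4_096),
        Block(20_001, 30_000, 9_210),
    ])
    print([b.offset for b in index.query(9_500, 12_000)])  # [0, 4096]
```

Only the byte offsets of the first two blocks come back, so a reader would seek to those offsets and decode just that data. A production index would also avoid the linear filter over candidate blocks, for example with an interval tree.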

In line with MPEG’s traditional practice of continuously improving the quality and performance of its standards, MPEG issued a public Call for Evidence (CfE) at its 132nd meeting, the 1st SC29/WG8 meeting. This CfE aims to evaluate the performance of new technologies that 1) can demonstrate that the current compression, transport, and indexing technology of the ISO/IEC 23092 series can be improved upon by new compression technologies, especially for very long reads, and 2) can yield higher compression ratios, support new features or improve performance on other important metrics.

A summary of the main achievements from the MPEG meeting can be found here.

131st MPEG Meeting

Due to the COVID-19 situation, the 131st MPEG meeting was held online from July 6th to 10th, 2020. Nonetheless, many important topics were discussed, and progress was made on ISO/IEC 23092 (MPEG-G).

WG11 (MPEG) issues a Call for Proposals on extensions and improvements to the ISO/IEC 23092 standard series

The current MPEG-G standard series (ISO/IEC 23092) is the first generation of MPEG standards addressing the representation, compression, and transport of genome sequencing data, supporting with a single unified approach all data from the output of sequencing machines up to secondary and tertiary analysis. New technology for compressing and indexing a wide variety of annotation data is currently at an advanced stage of standardization.

In line with the traditional MPEG practice of investigating improvements to the performance and functionality of its standards and applying them whenever possible, at its 131st meeting MPEG issued a Call for Proposals (CfP) addressing two specific objectives: (i) to increase the speed of massively parallel codec implementations and (ii) to enable advanced query and search capabilities on the compressed data.

Responses to the CfP are expected to be evaluated prior to the 132nd MPEG meeting. The best-performing technologies are expected to be introduced in a new high-performance profile of the current ISO/IEC 23092 standard series.

During the meeting, MPEG published the following standard documents for MPEG-G:

  • MPEG-G Genomic Information Database
  • Final Joint Call for Proposals for extensions and improvements of ISO/IEC 23092 series
  • Final Requirements for ISO/IEC 23092 series extensions
  • Evaluation procedure for the Call for Proposals for extensions and improvements of ISO/IEC 23092 series

A summary of the main achievements from the MPEG meeting can be found here.

130th MPEG Meeting

Although MPEG meetings traditionally bring all of the world’s MPEG experts together in a single location, the 130th MPEG meeting was held online from April 20th to 24th, 2020, due to the COVID-19 situation. Nonetheless, many important topics were discussed, and considerable progress was made on ISO/IEC 23092 (MPEG-G).

In line with the traditional MPEG practice of continuous improvement of the quality and performance of its standards, at its 130th meeting MPEG promoted new editions of Parts 1 and 2, as well as Part 4 “Reference Software” and Part 5 “Conformance”, to Final Draft International Standard (FDIS) status. These components of the MPEG-G standard series provide important support to those who wish to implement the standard or to verify the correctness and interoperability of their own implementations.

Compared to the first editions, the second editions of ISO/IEC 23092-1 and ISO/IEC 23092-2 have been improved by taking into account comments received from users.

The ISO/IEC 23092-4 (MPEG-G Reference Software) standard provides a normative implementation of the standard. In conjunction with the ISO/IEC 23092-5 (MPEG-G Conformance) standard, it provides a comprehensive specification and validation support for the development of conforming decoder implementations. Interoperability of applications relying on normative decoding processes is facilitated by a reference normative decoding process and a rich set of tests and corresponding golden references.
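The way reference software and conformance material work together can be sketched as a small test harness: each conformance bitstream is decoded by the implementation under test, and its output is compared with the golden reference. The sketch below assumes, purely for illustration, that golden references are stored as SHA-256 digests of the expected decoded output and that the decoder is exposed as a Python callable; the `run_conformance` name and this data layout are hypothetical and do not describe the actual ISO/IEC 23092-5 test material.

```python
import hashlib
from typing import Callable, Dict


def run_conformance(
    decode: Callable[[bytes], bytes],
    bitstreams: Dict[str, bytes],
    golden: Dict[str, str],
) -> Dict[str, bool]:
    """Decode each test bitstream and compare the output with its golden digest.

    Hypothetical harness: `golden` maps a test name to the SHA-256 hex digest
    of the expected decoded output.
    """
    results = {}
    for name, expected in golden.items():
        stream = bitstreams.get(name)
        if stream is None:
            # A missing test bitstream counts as a failure.
            results[name] = False
            continue
        try:
            output = decode(stream)
        except Exception:
            # A decoder that crashes on a conformance stream does not conform.
            results[name] = False
            continue
        results[name] = hashlib.sha256(output).hexdigest() == expected
    return results


if __name__ == "__main__":
    # Toy stand-in for a real decoder: it simply reverses the input bytes.
    golden = {"test_01": hashlib.sha256(b"ACGT").hexdigest()}
    print(run_conformance(lambda data: data[::-1], {"test_01": b"TGCA"}, golden))
    # {'test_01': True}
```

Comparing digests rather than full outputs keeps the golden material small; a real harness would also report where an output first diverges to help debug a failing implementation.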

A workshop on applications of genomic information processing was hosted by Tencent in Shenzhen on 13th October 2018, the day after the closing of the 124th MPEG meeting in Macau.

The workshop provided an overview of MPEG-G, the new ISO standard for the compression of and optimized access to genomic information, and covered its impact on the relevant industry, related standardization initiatives, use cases, the evolution of sequencing technology, and perspectives for standardization in other –omics fields.

Specifically, the workshop addressed:

  • An overview of the ISO genomic compression standard and its new features and performance
  • The challenges for the generation and management of very large volumes of genome sequencing data
  • The status and future perspectives of sequencing technology and genomic data generation
  • The vision of genomic information storage and processing on the cloud
  • The vision of further standardization objectives in the –omics fields

Final Program

Start | End | What | Who
12:30 | 13:00 | Registration |
13:00 | 13:10 | Welcome & workshop goals | Leonardo Chiariglione (MPEG Convener)
13:10 | 13:40 | “An overview of the MPEG-G standard for the compression and processing of genomic sequencing data” | Marco Mattavelli (EPFL, Switzerland)
13:40 | 14:10 | “An overview of standardization progress in genomics data” | Yong Zhang (ISO/TC 276/WG 2 & WG 5 Convenor)
14:10 | 14:40 | “GSA: Genome Sequence Archive, in China” | Yanqing Wang (BIG Data Center, BIG, CAS)
14:40 | 14:45 | Short presentation of demos | Alvaro G. Hernandez (UIUC DNA Services, USA)
14:50 | 15:20 | Demo session and Coffee Break |
15:20 | 15:50 | “State-of-the-art and future of NGS, a standard perspective” | Ming NI (BGI-Shenzhen and MGI)
15:50 | 16:20 | “Constructing an open ecosystem for bioinformatics and genomic big data” | Chen Shifu (Haplox)
16:20 | 16:50 | “Practice and Challenges of 20,000 human WGS data analysis on BGI Online” | Kang FANG (BGI-Online, BGI)
16:50 | 17:20 | Panel discussion, Q&A and concluding remarks | All speakers

Demonstrations of genome sequencing data processing prototypes and products

Co-located with the workshop, GenomSys showed demos, prototypes and products related to genome sequencing data processing, analytics, compression and storage.

Whole Genome Sequencing Data Analysis
MPEG-G Genomic Information Representation

Workshop on Genomic Sequencing Data Compression

GA4GH – MPEG, Basel 3rd October 2018

Call for Contributions

The amount of genome sequencing data generated every day is comparable to, or larger than, that of other big data problems. Falling storage costs alone do not provide affordable solutions to the ambition of making genomic medicine common practice. Moreover, storage costs are not the only factor to consider: once generated, genomic data have to be made available to the scientific community for frequent and repeated access.

Current sequencing technologies compensate for the errors introduced by intrinsically noisy processes by generating redundant data and associated metadata (i.e. quality values). Compression is therefore an effective way to reduce the costs and mitigate the technological limitations related to handling extremely large volumes of data.
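One way to exploit this redundancy is reference-based coding: because reads sampled many times over the same region largely agree with a reference sequence, an aligned read can be stored as its mapping position, its length and the few bases where it differs. The sketch below illustrates the idea for substitution-only reads; the function names and the (position, length, mismatches) layout are hypothetical, indels and quality values are not handled, and it is not the descriptor format defined by MPEG-G.

```python
from typing import List, Tuple

Mismatch = Tuple[int, str]  # (offset within the read, substituted base)


def encode_read(reference: str, read: str, pos: int) -> Tuple[int, int, List[Mismatch]]:
    """Encode a substitution-only aligned read as (position, length, mismatches)."""
    segment = reference[pos:pos + len(read)]
    if len(segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Record only the offsets where the read differs from the reference.
    mismatches = [
        (i, base)
        for i, (ref_base, base) in enumerate(zip(segment, read))
        if ref_base != base
    ]
    return pos, len(read), mismatches


def decode_read(reference: str, pos: int, length: int, mismatches: List[Mismatch]) -> str:
    """Rebuild the read by copying the reference and re-applying substitutions."""
    bases = list(reference[pos:pos + length])
    for i, base in mismatches:
        bases[i] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    read = "GTACCTTA"
    encoded = encode_read(reference, read, 2)
    print(encoded)  # (2, 8, [(4, 'C')])
    assert decode_read(reference, *encoded) == read
```

Here the 8-base read is reduced to a position, a length and a single substitution. The saving grows when reads closely match the reference, which is the situation the redundancy of high-coverage sequencing tends to produce.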

The heterogeneity of genome sequencing data and the diversity of the available compression solutions pose several challenges to the quest for an ideal technology able to deliver, at the same time, high compression ratios, high coding and decoding speed, efficient selective access to data and guaranteed interoperability among applications while respecting a variety of data protection and privacy requirements.

The goal of this workshop is to collect technical contributions on emerging and new compression technologies with particular attention to:

  • DNA sequencing data compression
  • Selective access and processing in the compressed domain
  • Emerging standard frameworks for the specification, representation and compression of genomic sequencing data
  • Interoperability of genomic sequencing data formats, applications, standard frameworks and APIs
  • Use cases and processing applications requiring genomic data/metadata compression and protection

Interested authors are invited to submit an abstract of no more than 600 words (excluding pictures and graphics, which are welcome) describing their technical work by 31st August 2018.

The submission should indicate the preferred form of the contribution:

  • Oral presentation
  • Poster
  • Demonstration

Submissions of abstracts must be sent by email to:

GA4GH assembly program

Registration

Genomic Information Representation Metadata

A workshop on applications of genomic information processing was held on 18th April 2018, co-located with the 122nd MPEG meeting in San Diego.

The workshop explored the opportunities for improved genome sequencing data processing services enabled by MPEG-G, an ISO standard on the compression of genomic information due to become available in late 2018, as well as its impact on the relevant industry.

Specifically, the workshop addressed:

  • The perspectives and potential of genomic information usage in medicine and public health
  • The vision of interdisciplinary approaches to the analysis of genome sequencing data
  • The challenges for the generation and management of very large volumes of genome sequencing data
  • The status and progress of sequencing technology and the associated data generation features
  • The reasons for supporting the seamless availability and exchange of genome sequencing data, so that scientific progress can benefit from the analysis of larger data volumes
  • A status report on the development of the ISO genomic compression standard and an overview of its new features and performance

Venue:
San Diego Marriott La Jolla, 4240 La Jolla Village Drive
San Diego, CA 92037, United States
(see also the 122nd MPEG meeting for more details)

Organizing Committee:
Joern Ostermann (TNT-LUH), Claudio Alberti (GenomSys), Rongshan Yu (Aginome Scientific), Tom Paridaens (imec and UGent)

Program

Start | End | What | Who
12:30 | 13:00 | Registration |
13:00 | 13:15 | Welcome & workshop goals |
13:15 | 13:40 | “Genome and medical information portability, retrieval and analysis” | Amalio Telenti (Scripps Research Institute, USA)
13:40 | 14:05 | “From womb to tomb sequencing: on the advantages on bringing multidisciplinary R&D to develop standards and analytics” | Ioannis Xenarios (SIB Switzerland)
14:05 | 14:30 | “Future of Genomics and Big Data” | Dawn Barry (Luna DNA, USA)
14:30 | 14:55 | “Generation and Management of Large Sequence Files: Perspectives from the DNA Sequencing Core” | Alvaro G. Hernandez (UIUC DNA Services, USA)
14:55 | 15:10 | Presentation of demonstrations | GenomSys, Aginome Scientific
15:10 | 15:40 | Demo session and Coffee Break |
15:40 | 16:05 | “The role of compression in the genomics data life cycle” | Come Raczy (Illumina Inc., USA)
16:05 | 16:30 | “Genomics at Rady Children’s Hospital San Diego” | Ray Veeraraghavan (Rady Children’s Hospital San Diego, USA)
16:30 | 16:55 | “An overview of the MPEG-G standard for the compression and processing of genomic sequencing data” | Marco Mattavelli (EPFL, Switzerland)

Demonstrations of genome sequencing data processing prototypes and products

Co-located with the workshop, GenomSys and Aginome Scientific showed demos, prototypes and products related to genome sequencing data processing, analytics, compression and storage.