The extensive use of high-throughput deoxyribonucleic acid (DNA) sequencing technologies opens up new perspectives in the treatment of several diseases and enables the implementation of a new approach to healthcare known as “precision medicine”. DNA sequencing technologies produce extremely large amounts of raw data, which are stored in different repositories worldwide. The processing, analysis, and comparison of such distributed data are fundamental to the effective use of sequencing data for clinical and scientific purposes. Standard Application Program Interfaces (APIs) and metadata are the basis for interoperable and automated data access and processing systems that can operate efficiently on the sequencing data sets available worldwide.

The MPEG-G standard, jointly developed by WG 11 (MPEG) and the ISO Technical Committee for biotechnology standards (ISO TC 276/WG 5), is the first international standard to address and solve the problem of efficient and cost-effective handling of genomic data. It provides not only new compression and transport technologies (ISO/IEC 23092-1/2), but also a standard specification associating relevant information in the form of metadata and a rich set of APIs for data access and mining, with the aim of building a full ecosystem of interoperable applications capable of efficiently processing sequencing data.

At the meeting, the third part of the MPEG-G specifications, Application Program Interfaces and Metadata (ISO/IEC 23092-3), was promoted to the Final Draft International Standard (FDIS) stage. This part of the standard will enable the industry to rely on a final specification in October 2019.

A workshop on applications of genomic information processing was hosted by Tencent in Shenzhen on 13th October 2018, the day after the closing of the 124th MPEG meeting in Macau.

The workshop provided an overview of MPEG-G, the new ISO standard on the compression of and optimized access to genomic information, and of its impact on the relevant industry, the various related standardization initiatives, use cases, the evolution of sequencing technology, and perspectives for standardization in other –omics fields.

Specifically, the workshop addressed:

  • An overview of the ISO genomic compression standard and its new features and performance
  • The challenges for the generation and management of very large volumes of genome sequencing data
  • The status and future perspectives of sequencing technology and genomic data generation
  • The vision of genomic information storage and processing on the cloud
  • The vision of further standardization objectives in the –omics fields

Final Program

Start | End | What | Who
12:30 | 13:00 | Registration |
13:00 | 13:10 | Welcome & workshop goals | Leonardo Chiariglione (MPEG Convener)
13:10 | 13:40 | “An overview of the MPEG-G standard for the compression and processing of genomic sequencing data” | Marco Mattavelli (EPFL, Switzerland)
13:40 | 14:10 | “An overview of standardization progress in genomics data” | Yong Zhang (ISO/TC 276/ WG2 & WG 5 Convenor)
14:10 | 14:40 | “GSA: Genome Sequence Archive, in China” | Yanqing Wang (BIG Data Center, BIG, CAS)
14:40 | 14:45 | Short presentation of demos | Alvaro G. Hernandez (UIUC DNA Services, USA)
14:50 | 15:20 | Demo session and Coffee Break |
15:20 | 15:50 | “State-of-the-art and future of NGS, a standard perspective” | Ming NI (BGI-Shenzhen and MGI)
15:50 | 16:20 | “Constructing an open ecosystem for bioinformatics and genomic big data” | Chen Shifu (Haplox)
16:20 | 16:50 | “Practice and Challenges of 20,000 human WGS data analysis on BGI Online” | Kang FANG (BGI-Online, BGI)
16:50 | 17:20 | Panel discussion, Q&A and concluding remarks | All speakers

Demonstrations of genome sequencing data processing prototypes and products

Co-located with the workshop, GenomSys showed demos, prototypes and products related to genome sequencing data processing analytics, compression and storage.


13:00 – 18:00, 13th October 2018

Shenzhen (CN)

A workshop on applications of genomic information processing will be held on 13th October 2018, one day after the 124th MPEG meeting.

The workshop intends to provide an overview of MPEG-G, the new ISO standard on the compression of and optimized access to genomic information, and of its impact on the relevant industry, the various related standardization initiatives, use cases, the evolution of sequencing technology, and perspectives for standardization in other –omics fields.

Specifically, the workshop addresses:

  • An overview of the ISO genomic compression standard and its new features and performance
  • The challenges for the generation and management of very large volumes of genome sequencing data
  • The status and future perspectives of sequencing technology and genomic data generation
  • The vision of genomic information storage and processing on the cloud
  • The vision of further standardization objectives in the –omics fields

The workshop is open to the public and to interested parties who want to learn more about the perspectives of genomic data processing applications and about new technologies for the processing of genome sequencing data.

Registration is free of charge. To register (for logistical purposes only), please send an email to Massimo Ravasi.

Date: 13th October (Saturday), 2018

Program: 13:00 – 18:00

Venue: 2F Function Room, Tencent Building, No. 10000 Shennan Avenue, Nanshan District, Shenzhen, Guangdong province, China

广东省深圳市南山区深南大道10000号腾讯大厦二楼多功能厅

Organizing Committee:

Joern Ostermann (LUH), Rongshan Yu (AGINOME Scientific), Claudio Alberti (GenomSys), Tom Paridaens (imec and UGent)

Preliminary Program

Start | End | What | Who
12:30 | 13:00 | Registration |
13:00 | 13:10 | Welcome & workshop goals | Leonardo Chiariglione
13:10 | 13:40 | “An overview of the MPEG-G standard for the compression and processing of genomic sequencing data” | Marco Mattavelli (EPFL, Switzerland)
13:40 | 14:10 | “An overview of standardization initiatives on genomic data” | Yong Zhang (ISO/TC 276/ WG2 & WG 5 Convenor)
14:10 | 14:40 | “GSA: Genome Sequence Archive, in China” | Yanqing Wang (BIG Data Center, BIG, CAS)
14:40 | 14:50 | Short presentation of demos | Demonstrators companies
14:50 | 15:20 | Demo session and Coffee Break |
15:20 | 15:50 | “State-of-the-art and future of NGS, a standard perspective” | Ming NI (BGI-Shenzhen and MGI)
15:50 | 16:20 | “Constructing an open ecosystem for bioinformatics and genomic big data” | Chen Shifu (Haplox)
16:20 | 16:50 | “Practice and Challenges of 20,000 human WGS data analysis on BGI Online” | Kang FANG (BGI-Online, BGI)
16:50 | 17:20 | Panel discussion, Q&A and concluding remarks | All speakers
17:20 | 18:00 | Demo session continues |

Demonstrations of genome sequencing data processing prototypes and products

Co-located with the workshop, it will be possible to show demos and to present prototypes and products related to genome sequencing data processing analytics, compression and storage to workshop participants.


Workshop on Genomic Sequencing Data Compression

GA4GH – MPEG, Basel 3rd October 2018

Call for Contributions

The amount of genome sequencing data generated day by day is comparable to or larger than that of other big data problems. Shrinking data storage costs alone do not make the ambition of turning genomic medicine into common practice affordable. Moreover, storage costs are not the only factor to consider, because genomic data, once generated, have to be made available to the scientific community for frequent and repeated access.

Current sequencing technologies compensate for the errors introduced by intrinsically noisy processes by generating redundant data and associated metadata (i.e. quality values). Compression is therefore an effective way to reduce the costs and mitigate the technological limitations related to the handling of extremely large volumes of data.

The heterogeneity of genome sequencing data and the diversity of the available compression solutions pose several challenges to the quest for an ideal technology able to deliver, at the same time, high compression ratios, high coding and decoding speed, efficient selective access to data and guaranteed interoperability among applications while respecting a variety of data protection and privacy requirements.

The goal of this workshop is to collect technical contributions on emerging and new compression technologies with particular attention to:

  • DNA sequencing data compression
  • Selective access and processing in the compressed domain
  • Emerging standard frameworks for the specification, representation and compression of genomic sequencing data
  • Interoperability of genomic sequencing data formats, applications, standard frameworks and APIs
  • Use cases and processing applications requiring genomic data/metadata compression and protection

Interested authors are invited to submit an abstract of no more than 600 words (excluding pictures and graphics, which are welcome) describing their technical work by 31st August 2018.

The submission should indicate the preferred form of the contribution:

  • Oral presentation
  • Poster
  • Demonstration

Submissions of abstracts must be sent by email to:

GA4GH assembly program

Registration

MPEG-G Standard

The extensive use of high-throughput deoxyribonucleic acid (DNA) sequencing technologies opens up new perspectives in the treatment of several diseases and enables “precision medicine”. Because DNA sequencing technologies produce extremely large amounts of raw data, the ICT costs of storing, transmitting, and processing DNA sequence data and related information are very high. The lack of universal standards contributes to these costs and prevents the timely application of effective treatments.

The MPEG-G standard, jointly developed by MPEG and the ISO Technical Committee for biotechnology standards (ISO TC 276/WG 5), is the first international standard to address and solve the problem of efficient and cost-effective handling of genomic data. It provides not only new compression and transport technologies, but also a family of standard specifications associating relevant information in the form of metadata and a rich set of Application Programming Interfaces (APIs), with the aim of building a full ecosystem of interoperable applications and services capable of efficiently processing sequencing data.

At its 122nd meeting, MPEG promoted its core set of MPEG-G specifications, i.e., transport and compression technologies, to the Draft International Standard (DIS) stage. These parts of the standard provide new transport technologies (ISO/IEC 23092-1) and compression technologies (ISO/IEC 23092-2) supporting rich functionality for the access and transport, including streaming, of genomic data by interoperable applications. This will enable the industry to rely on a final specification in October 2018. Reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5) will reach this stage in the next 12 months.

Besides these standardization achievements, a workshop on the “applications of genomic information processing” was held in conjunction with the 122nd MPEG meeting to discuss requirements, open problems of genome information processing, and the solutions provided by the MPEG-G standards. Use cases representative of selective remote access with streaming, and the execution of the Genome Analysis Toolkit (GATK) and equivalent processing pipelines on sequencing data in MPEG-G compressed form, were also demonstrated.


A workshop on applications of genomic information processing was held on 18th April 2018, co-located with the 122nd MPEG meeting in San Diego.

The workshop explored the opportunities for improved genome sequencing data processing services enabled by the availability, in late 2018, of MPEG-G, an ISO standard on the compression of genomic information, and the standard's impact on the relevant industry.

Specifically, the workshop addressed:

  • The perspectives and potential of genomic information usage in medicine and public health
  • The vision of interdisciplinary approaches to the analysis of genome sequencing data
  • The challenges for the generation and management of very large volumes of genome sequencing data
  • The status and progress of sequencing technology and the associated data generation features
  • The reasons for supporting seamless availability and exchange of genome sequencing data, so that analysis of larger data volumes can advance scientific progress
  • A status report on the development of the ISO genomic compression standard and an overview of its new features and performance

Venue:
San Diego Marriott La Jolla, 4240 La Jolla Village Drive
San Diego, CA 92037, United States
(see also the 122nd MPEG meeting for more details)

Organizing Committee:
Joern Ostermann (TNT-LUH), Claudio Alberti (GenomSys), Rongshan Yu (Aginome Scientific), Tom Paridaens (imec and UGent)

Program

Start | End | What | Who
12:30 | 13:00 | Registration |
13:00 | 13:15 | Welcome & workshop goals |
13:15 | 13:40 | “Genome and medical information portability, retrieval and analysis” | Amalio Telenti (Scripps Research Institute, USA)
13:40 | 14:05 | “From womb to tomb sequencing: on the advantages on bringing multidisciplinary R&D to develop standards and analytics” | Ioannis Xenarios (SIB Switzerland)
14:05 | 14:30 | “Future of Genomics and Big Data” | Dawn Barry (Luna DNA, USA)
14:30 | 14:55 | “Generation and Management of Large Sequence Files: Perspectives from the DNA Sequencing Core” | Alvaro G. Hernandez (UIUC DNA Services, USA)
14:55 | 15:10 | Presentation of demonstrations | GenomSys, Aginome Scientific
15:10 | 15:40 | Demo session and Coffee Break |
15:40 | 16:05 | “The role of compression in the genomics data life cycle” | Come Raczy (Illumina Inc., USA)
16:05 | 16:30 | Genomics at Rady’s Children Hospital San Diego | Ray Veeraraghavan (Rady’s Children Hospital San Diego, USA)
16:30 | 16:55 | “An overview of the MPEG-G standard for the compression and processing of genomic sequencing data” | Marco Mattavelli (EPFL, Switzerland)

Demonstrations of genome sequencing data processing prototypes and products

Co-located with the workshop, GenomSys and Aginome Scientific showed demos, prototypes and products related to genome sequencing data processing analytics, compression and storage.