Documents

This page contains documents relevant to the MPEG-G standard and the ecosystem of applications built around it.

MPEG-G documents

Title | Description
MPEG-G white paper | An overview of the MPEG-G standard and its main features
3rd MPEG Workshop on Genomic Information Representation | Slides presented at the Workshop on Genomic Information Representation held in San Diego on 18th April 2018
2nd MPEG Workshop on Genomic Information Representation, "From Standards to Deployment" | Slides presented at the Workshop on Genomic Information Representation held in Torino on 19th July 2017
1st MPEG Workshop on Genomic Information Representation | Slides presented at the Workshop on Genomic Information Representation held in San Diego on 23rd February 2016
MPEG Seminar on Prospects on Genome Compression Standardization | Slides presented at the Seminar on Prospects on Genome Compression Standardization held in Geneva on 20th October 2015

Publications

Title | Reference | Description
MPEG-G Reference-Based Compression of Unaligned Reads Through Ultra-Fast Alignments | U. Ozturk, S. Casale-Brunet, P. Ribeca, M. Mattavelli; 2022 Data Compression Conference (DCC), Snowbird, UT, USA, 2022, pp. 01-01 | This publication illustrates how the ISO/IEC MPEG-G standard compresses raw and aligned data to alleviate the bandwidth, transfer, and storage requirements of genomics pipelines.
A Benchmark of Entropy Coders for the Compression of Genome Sequencing Data | S. Casale-Brunet, P. Ribeca, C. Alberti, U. Ozturk, M. Mattavelli; Journal of Personalized Medicine 12.6 (2022): 915 | This paper benchmarks a variety of entropy encoders and compression algorithms in terms of compression-decompression rates and times, separately for each data field, both as raw data from FASTQ files and as MPEG-G uncompressed descriptor symbols decoded from MPEG-G bitstreams.
Implementation of Privacy and Security for a Genomic Information System Based on Standards | Silvia Llorente, Jaime Delgado; Proceedings of the IEEE, vol. 109, no. 9, pp. 1607-1622, Sept. 2021 | This paper describes the key role of privacy provision in protecting genomic information from unauthorized access and proposes the GIPAMS (Genomic Information Protection And Management System) modular architecture, which is based on standards such as ISO/IEC 23092 and other initiatives.
An Introduction to MPEG-G: The First Open ISO/IEC Standard for the Compression and Exchange of Genomic Sequencing Data | J. Voges, M. Hernaez, M. Mattavelli, J. Ostermann; Stud Health Technol Inform. 2021 Oct 27;285:253-258. | This publication describes the benefits of the ISO/IEC 23092 series, known as MPEG-G: besides higher levels of compression, it provides new functionalities such as built-in support for random access in the compressed domain, support for data protection mechanisms, flexible storage, and streaming capabilities.
Implementation of Privacy and Security for a Genomic Information System | Delgado J, Llorente S, Reig G; Multimedia Tools and Applications 80.13 (2021): 20599-20618. | This paper describes how to handle genomic information given its high privacy and security requirements. The proposed GIPAMS modular architecture provides secure and controlled access to genomic information, which may help improve personalized medicine.
Side channel attack on a partially encrypted MPEG-G file | Daniel Naro, Jaime Delgado Mercè, Silvia Llorente Viejo; Bioinformatics 36.7 (2020): 2275-2277. | This paper discusses how an unencrypted stream can be used to deduce encrypted content when streams are encrypted separately. It presents two different attacks, one based on signal processing and the other on neural networks.
GABAC: an arithmetic coding solution for genomic data | Jan Voges, Tom Paridaens, Fabian Müntefering, Liudmila S Mainzer, Brian Bliss, Mingyu Yang, Idoia Ochoa, Jan Fostier, Jörn Ostermann, Mikel Hernaez; Stud Health Technol Inform 275 (2020): 37-41. | This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data.
Security and privacy when applying FAIR principles to genomic information | Jaime Delgado Mercè, Silvia Llorente Viejo; Multimedia Tools and Applications 79.11-12 (2020): 8161-8180. | This paper analyses some of the issues related to the FAIRification process when the objective is sharing genomic information. The main results are the identification of the already existing standards that could be used for this purpose and how to combine them.
Reversible fingerprinting for genomic information | Daniel Naro, Jaime Delgado Mercè, Silvia Llorente Viejo; Studies in Health Technology and Informatics 258 (2019): 75-79 | Paper on watermarking genomic information. Each read in a genomic file is modified depending on its content and a secret key. This allows generating different watermarked instances of the original file. Each watermark acts as a fingerprint: if a leak occurs, the unique modifications of the instance point to who originated the unauthorized publication.
Adding security and privacy to genomic information representation | Jaime Delgado Mercè, Silvia Llorente Viejo, Daniel Naro; In Proceedings of the 2019 4th International Conference on Biomedical Imaging, Signal Processing (ICBSP '19). Association for Computing Machinery, New York, NY, USA, 12–17. | Overview paper on how the ISO/IEC 23092 (MPEG-G) standard series can provide flexible protection for the genomic information stored inside the MPEG-G format with a combination of security techniques and privacy rules.
On the Privacy of Genomic Big Data and EHR Standardization and Regulation | Itaru Kaneko, Emi Yuda; Annual Review of Biomedical Data Science 2 (2019): 19-37. | Summary of the recent situation of genomic information and electronic health records (EHR), including the standardization of genomic information representation and regulations in various countries on the privacy of medical and health information. It also gives an outlook on the possible technologies and social practices to strengthen the privacy of genomic information.
Genomic data compression | Mikel Hernaez, Dmitri Pavlichin, Tsachy Weissman, Idoia Ochoa; bioRxiv (2018): 426353. | Review paper on the need for specialized compressors tailored to genomic data. It suggests general guidelines for storing these data and concludes with the authors' thoughts on the future of genomic formats and compressors.
An introduction to MPEG-G, the new ISO standard for genomic information representation | Claudio Alberti, Tom Paridaens, Jan Voges, Daniel Naro, Junaid J. Ahmad, Massimo Ravasi, Daniele Renzi, Giorgio Zoia, Paolo Ribeca, Idoia Ochoa, Marco Mattavelli, Jaime Delgado, Mikel Hernaez; 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2018. | Overview paper of the MPEG-G specification, with particular focus on the main advantages and novel functionality it offers. As the standard only specifies the decoding process, encoding performance, both in terms of speed and compression ratio, can vary depending on specific encoder implementations, and will likely improve during the lifetime of MPEG-G. Hence, the performance statistics provided in the paper are only indicative baseline examples of the technologies included in the standard.
Improving coding efficiency of MPEG-G standard using context-based arithmetic coding | Wenxian Yang, Yating Lin, Shiyao Wu, Rongshan Yu; Bioinformatics, Oxford University Press, Vol. 34, No. 10, pp. 1650-1658, May 2018, edited by Bonnie Berger | A paper on an improved method for lossless compression of nucleobase quality values, one of the most challenging parts of genomic data to compress due to their high entropy.
CALQ: compression of quality values of aligned sequencing data | Jan Voges, Jörn Ostermann, Mikel Hernaez; Nature Methods, Nature Publishing Group, Vol. 13, No. 12, pp. 1005-1008, October 2016 | A paper on the quality score compression technology that can be used for aligned data in MPEG-G
Comparison of high-throughput sequencing data compression tools | Ibrahim Numanagic, James K Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, Claudio Alberti, Marco Mattavelli, S Cenk Sahinalp; Poster abstracts of the 25th German Conference on Bioinformatics (PeerJ Preprints), PeerJ, Vol. 5, p. 2, Tübingen (DE), September 2017 | An overview paper about the performance of a comprehensive set of sequencing data compression tools, most of which can be used in an MPEG-G compliant encoder
MPEG-G: The Emerging Standard for Genomic Data | Jan Voges, Jörn Ostermann; F1000Research (Presented at: Joint 25th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 16th European Conference on Computational Biology (ECCB) 2017), International Society for Computational Biology (ISCB), Vol. 6, p. 1382 (poster), Prague (CZ), August 2017 | A poster summarizing the MPEG-G features and performance as of mid-2017
CALQ: compression of quality values of aligned sequencing data | Jan Voges, Jörn Ostermann, Mikel Hernaez; 2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 221-230, Snowbird, UT (US), April 2016 | A poster on the quality score compression technology that can be used for aligned data in MPEG-G
An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values | Claudio Alberti, Noah Daniels, Mikel Hernaez, Jan Voges, Rachel L Goldfeder, Ana A Hernandez-Lopez, Marco Mattavelli, Bonnie Berger; 2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 241-250, Snowbird, UT (US), April 2016 | A paper on the framework that was used to assess the impact of quantizing quality scores
Predictive Coding of Aligned Next-Generation Sequencing Data | Jan Voges, Marco Munderloh, Jörn Ostermann; 2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 241-250, Snowbird, UT (US), April 2016 | A paper on a reference-free read compression algorithm that can be used in an MPEG-G compliant encoder

Other documents

TBD