Unlock Rapid Analyses Across the Whole PDB Using BinaryCIF
Webinar hosted by RCSB PDB and the Rutgers Institute for Quantitative Biomedicine, part of ISCBAcademy and the Rutgers Artificial Intelligence and Data Collaboratory | November 4, 2024
As macromolecular structures available through the Protein Data Bank (PDB) archive continue to grow in complexity and size, traditional text data formats like PDBx/mmCIF and the legacy PDB file format are becoming increasingly inefficient for transfer and parsing. To support scalable data analysis, binary formats and compression techniques are now essential. Learn how to future-proof your data analysis with BinaryCIF, a fully interchangeable yet drastically more efficient flavor of the PDBx/mmCIF format. BinaryCIF not only boosts storage efficiency, but also substantially improves parsing speed, making it ideal for large-scale analyses. BinaryCIF is supported by resources such as RCSB PDB, PDBe, and AlphaFold DB.
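To give a feel for where the storage gains come from, here is a toy sketch of two of the integer encodings that BinaryCIF can chain on a data column: delta encoding followed by run-length encoding. It is an illustration only, not the RCSB parser and not a byte-accurate implementation of the format; the function names and the sample column are made up for this example.

```python
from typing import List, Tuple


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Return (origin, deltas): the first value plus neighbor-to-neighbor differences."""
    if not values:
        return 0, []
    origin = values[0]
    deltas = [0] + [values[i] - values[i - 1] for i in range(1, len(values))]
    return origin, deltas


def delta_decode(origin: int, deltas: List[int]) -> List[int]:
    """Rebuild the original column by accumulating the differences onto the origin."""
    out: List[int] = []
    current = origin
    for d in deltas:
        current += d
        out.append(current)
    return out


def run_length_encode(values: List[int]) -> List[int]:
    """Collapse runs of equal values into flat [value, count, value, count, ...] pairs."""
    encoded: List[int] = []
    for v in values:
        if encoded and encoded[-2] == v:
            encoded[-1] += 1  # extend the current run
        else:
            encoded.extend([v, 1])  # start a new run
    return encoded


def run_length_decode(encoded: List[int]) -> List[int]:
    """Expand [value, count, ...] pairs back into the full sequence."""
    out: List[int] = []
    for i in range(0, len(encoded), 2):
        out.extend([encoded[i]] * encoded[i + 1])
    return out


# Hypothetical column of consecutive atom serial numbers.
atom_ids = list(range(1, 9))
origin, deltas = delta_encode(atom_ids)
packed = run_length_encode(deltas)

# Decoding reverses the chain: run-length first, then delta.
restored = delta_decode(origin, run_length_decode(packed))
```

In this sketch the eight serial numbers shrink to one origin and two (value, count) pairs, and the saving grows with column length. Monotonic columns such as atom or residue numbering are typical cases where this kind of chained encoding pays off.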
After watching the videos featured in this course, you will be able to:
- Understand the basics of the PDBx/mmCIF schema
- Access BinaryCIF files and related APIs on RCSB.org
- Programmatically consume BinaryCIF data and convert between formats
- Compute archive-wide statistics across the entire PDB
- Gain hands-on experience with our Python parser
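As a small preview of the "access BinaryCIF files" objective, the sketch below builds a download URL for a single entry and fetches the raw bytes using only the Python standard library. It assumes the `https://models.rcsb.org/<id>.bcif.gz` URL pattern; check the RCSB.org file download documentation before relying on it. The helper names are hypothetical and are not part of any RCSB package. Decoding the bytes into categories and columns is left to a BinaryCIF-aware parser such as the ones covered in the videos.

```python
import gzip
import urllib.request


def bcif_url(entry_id: str, gzipped: bool = True) -> str:
    """Build a BinaryCIF download URL for one entry (URL pattern is an assumption)."""
    suffix = ".bcif.gz" if gzipped else ".bcif"
    return f"https://models.rcsb.org/{entry_id.strip().lower()}{suffix}"


def fetch_bcif_bytes(entry_id: str) -> bytes:
    """Download a gzipped BinaryCIF file and return the decompressed bytes.

    Requires network access. The result is still binary (MessagePack-encoded)
    and needs a BinaryCIF parser to be turned into usable data.
    """
    with urllib.request.urlopen(bcif_url(entry_id)) as response:
        return gzip.decompress(response.read())


# Building the URL needs no network access.
example_url = bcif_url("4HHB")
```

Calling `fetch_bcif_bytes("4hhb")` in a loop over entry IDs is the simplest starting point for archive-wide work, though throttling and local caching would be sensible for anything beyond a handful of entries.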
Additional materials for this course are available:
- Presentation Slides
- Python and Java tools/resources
- Open the Jupyter Python notebook in Google Colab
Introduction
Yana Rose
Scientific Software Developer and Data Architect, RCSB PDB/UCSD
CIF, mmCIF, and BinaryCIF Basics. Create archive-wide statistics with Java
Sebastian Bittrich
Scientific Software Developer, RCSB PDB/UCSD
Working with mmCIF and BinaryCIF in Python
Dennis W. Piehl
Scientific Software Developer and FAIR Manager, RCSB PDB/Rutgers
Q&A
Yana Rose, Sebastian Bittrich, Dennis W. Piehl