Open Molecular Crystals 2025 Dataset and Models

May 21, 2026

The development of accurate and efficient machine learning models for predicting the structure and properties of molecular crystals has been hindered by the scarcity of publicly available datasets with property labels. To address this challenge, the Open Molecular Crystals 2025 (OMC25) dataset was introduced. It is a collection of over 27 million molecular crystal structures containing 12 elements and up to 300 atoms in the unit cell. The dataset was created by relaxing over 230,000 randomly constructed molecular crystal structures, representing approximately 50,000 organic molecules, using dispersion-inclusive density functional theory (DFT). OMC25 comprises diverse chemical compounds capable of forming different intermolecular interactions and a wide range of crystal packing motifs. Information is provided on the dataset’s construction, composition, and properties. To demonstrate the quality and use cases of OMC25, state-of-the-art open-source machine learning interatomic potentials were trained and evaluated on it. Making the dataset publicly available is intended to accelerate the development of accurate and efficient machine learning models for molecular crystals.

Authors

Noa Marom (Carnegie Mellon U.)

Any opinions, findings, and conclusions or recommendations expressed on this website are those of the participants and do not necessarily reflect the views of the National Science Foundation or the participating institutions. This site is maintained collaboratively by principal investigators with Designing Materials to Revolutionize and Engineer our Future awards, independent of the NSF.
