Open Molecular Crystals 2025 Dataset and Models

May 21, 2026

The development of accurate and efficient machine learning models for predicting the structure and properties of molecular crystals has been hindered by the scarcity of publicly available datasets with property labels. To address this challenge, the Open Molecular Crystals 2025 (OMC25) dataset was introduced. It is a collection of over 27 million molecular crystal structures spanning 12 elements, with up to 300 atoms in the unit cell. The dataset was created by relaxing over 230,000 randomly constructed molecular crystal structures, representing approximately 50,000 organic molecules, using dispersion-inclusive density functional theory (DFT). OMC25 comprises diverse chemical compounds capable of forming different intermolecular interactions and a wide range of crystal packing motifs. Information is provided on the dataset’s construction, composition, and properties. To demonstrate the quality and use cases of OMC25, state-of-the-art open-source machine learning interatomic potentials were trained and evaluated. Making this dataset publicly available is intended to accelerate the development of accurate and efficient machine learning models for molecular crystals.
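The headline figures can be related with some back-of-the-envelope arithmetic. The sketch below uses only the rounded numbers quoted in the abstract ("over" and "approximately" values, so the results are rough estimates). It also assumes that the additional structures beyond the 230,000 starting crystals are intermediate frames from the DFT relaxations. That reading is an assumption, not something the abstract states.

```python
# Rounded figures quoted in the abstract (lower bounds / approximations).
total_structures = 27_000_000  # "over 27 million" structures in OMC25
relaxed_crystals = 230_000     # "over 230,000" relaxed starting structures
molecules = 50_000             # "approximately 50,000" organic molecules

# Assumption: the extra structures are intermediate relaxation frames,
# so this ratio estimates the average number of frames kept per relaxation.
frames_per_relaxation = total_structures / relaxed_crystals

# Average number of randomly constructed crystals per molecule.
crystals_per_molecule = relaxed_crystals / molecules

print(f"~{frames_per_relaxation:.0f} structures per relaxation")
print(f"~{crystals_per_molecule:.1f} starting crystals per molecule")
```

Under these assumptions, each relaxation contributes on the order of a hundred structures, and each molecule is represented by several distinct starting packings.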

Authors

Noa Marom (Carnegie Mellon U.)

U.S. National Science Foundation and NSF DMREF, Materials for Our Future

This material is based upon work supported by the U.S. National Science Foundation Award No. 2015237. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the U.S. National Science Foundation. This site is maintained collaboratively by principal investigators with NSF DMREF awards, independent of the NSF.