Open Molecular Crystals 2025 Dataset and Models
The development of accurate and efficient machine learning models for predicting the structure and properties of molecular crystals has been hindered by the scarcity of publicly available datasets with property labels. To address this challenge, the Open Molecular Crystals 2025 (OMC25) dataset was introduced. It is a collection of over 27 million molecular crystal structures spanning 12 elements, with up to 300 atoms in the unit cell. The dataset was created by relaxing over 230,000 randomly constructed molecular crystal structures, representing approximately 50,000 organic molecules, using dispersion-inclusive density functional theory (DFT). OMC25 comprises diverse chemical compounds capable of forming different intermolecular interactions and covers a wide range of crystal packing motifs. Information is provided on the dataset’s construction, composition, and properties. To demonstrate the quality and use cases of OMC25, state-of-the-art open-source machine learning interatomic potentials were trained on it and evaluated. Making this dataset publicly available is expected to accelerate the development of accurate and efficient machine learning models for molecular crystals.