bio-datasets

PubChem Compound Dataset

The code for processing and converting the PubChem Compound dataset is in datasets/pubchem. The process_data.py script downloads the SDF file, converts each compound's canonical SMILES representation to SELFIES, and saves the results to a JSONL file.
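
As a rough illustration of the conversion step (not the actual contents of process_data.py), the sketch below reads SDF records, looks up the SMILES field, encodes it, and writes one JSON object per line. The function names `iter_sdf_records` and `sdf_to_jsonl`, the output keys, and the default tag `PUBCHEM_OPENEYE_CAN_SMILES` are assumptions made for this example; the tag name in particular may differ between PubChem releases, so it is a parameter. The download step is omitted, and the encoder is passed in so the sketch has no third-party dependency.

```python
import json
from typing import Callable, Iterable, Iterator, TextIO


def iter_sdf_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {tag: value} dict per SDF record (records end with '$$$$')."""
    record, tag = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "$$$$":
            yield record
            record, tag = {}, None
        elif line.startswith(">") and "<" in line and ">" in line[1:]:
            # Data header such as "> <PUBCHEM_COMPOUND_CID>"
            tag = line[line.index("<") + 1 : line.rindex(">")]
        elif tag is not None and line.strip():
            # Keep only the first value line that follows a header
            record.setdefault(tag, line.strip())
        elif not line.strip():
            # A blank line ends the current data item
            tag = None


def sdf_to_jsonl(
    lines: Iterable[str],
    out: TextIO,
    encoder: Callable[[str], str],
    smiles_tag: str = "PUBCHEM_OPENEYE_CAN_SMILES",  # assumed tag name
) -> int:
    """Write one JSON line per compound; return the number of lines written."""
    written = 0
    for rec in iter_sdf_records(lines):
        smiles = rec.get(smiles_tag)
        if smiles is None:
            continue  # record has no SMILES under the assumed tag
        try:
            selfies = encoder(smiles)
        except Exception:
            continue  # skip molecules the encoder rejects
        row = {
            "cid": rec.get("PUBCHEM_COMPOUND_CID"),
            "smiles": smiles,
            "selfies": selfies,
        }
        out.write(json.dumps(row) + "\n")
        written += 1
    return written
```

With the `selfies` package installed, `selfies.encoder` can be passed as the `encoder` argument, for example `sdf_to_jsonl(sdf_file, out_file, selfies.encoder)` with both files opened in text mode.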