NicolaiRee/smi2gcs

SMI2GCS

With SMI2GCS you can generate atomic descriptors from SMILES. The atomic descriptors are based on convolutions of CM5 atomic charges computed using semiempirical tight binding (GFN1-xTB).

More information about the method is available in the RegioML paper. Including: 1, 2, 3, 4, and 5.

Installation

We recommend using Anaconda to set up the Python 3 environment:

conda env create -f environment.yml && conda activate smi2gcs

Then download the binaries of xtb version 6.4.0:

mkdir dep; cd dep; wget https://github.com/grimme-lab/xtb/releases/download/v6.4.0/xtb-210201.tar.xz; tar -xvf ./xtb-210201.tar.xz; cd ..
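After unpacking, it can be useful to confirm that an executable named xtb actually exists somewhere under dep. The following is a minimal sketch of such a check; the helper name find_xtb is hypothetical and not part of this repository, and it searches recursively because the exact directory layout inside the archive is not assumed here.

```python
import os
from pathlib import Path
from typing import Optional


def find_xtb(dep_dir: str = "dep") -> Optional[Path]:
    """Return the first executable file named 'xtb' below dep_dir, or None.

    Hypothetical helper for illustration only; it is not part of smi2gcs.
    """
    root = Path(dep_dir)
    if not root.is_dir():
        return None
    # Search recursively so no particular archive layout is assumed.
    for path in sorted(root.rglob("xtb")):
        if path.is_file() and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    xtb_path = find_xtb("dep")
    if xtb_path is None:
        print("No xtb executable found under ./dep")
    else:
        print(f"Found xtb at {xtb_path}")
```

If nothing is found, re-running the download and extraction command above is the first thing to try.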

Details on the sorting algorithm

Each shell is sorted according to a modified version of the Cahn-Ingold-Prelog (CIP) priority rules, falling back to the CM5 charges if the CIP ordering is ambiguous:

  1. Sort according to atomic number in descending order.
  2. If (1) is not unique, for each atom with the same priority (A*):
    1. Visit the atoms bonded to A* that are not yet included and sum their atomic numbers. Set the priority of A* according to this sum.
    2. If (2i) does not give an unambiguous result, expand the shell of each atom A* by one bond.
    3. Repeat (2ii) until a unique order is found.
  3. If no unique order is found in (2) and all bound atoms are included, sort the atoms according to their CM5 charges in descending order.
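
The steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the code used in this repository: the molecule is given as a list of atomic numbers plus an adjacency dictionary, the names shell_sums and sort_shell are hypothetical, "included" is taken to mean the atoms already placed in earlier shells, and the charges in the example are made-up numbers rather than computed CM5 values. Comparing the per-bond sums lexicographically has the same effect as expanding the shell one bond at a time until the tie is broken.

```python
from typing import Dict, List, Sequence, Set


def shell_sums(atom: int,
               atomic_numbers: Sequence[int],
               neighbors: Dict[int, List[int]],
               included: Set[int]) -> List[int]:
    """Sum the atomic numbers of newly reached atoms, one bond at a time."""
    visited = set(included) | {atom}
    frontier = [atom]
    sums: List[int] = []
    while True:
        reached = []
        for a in frontier:
            for b in neighbors[a]:
                if b not in visited:
                    visited.add(b)
                    reached.append(b)
        if not reached:  # all bound atoms are included
            return sums
        sums.append(sum(atomic_numbers[b] for b in reached))
        frontier = reached


def sort_shell(shell: Sequence[int],
               atomic_numbers: Sequence[int],
               neighbors: Dict[int, List[int]],
               charges: Sequence[float],
               included: Set[int]) -> List[int]:
    """Order a shell by atomic number, then expansion sums, then charge."""
    sums = {a: shell_sums(a, atomic_numbers, neighbors, included) for a in shell}
    depth = max((len(s) for s in sums.values()), default=0)

    def key(a: int):
        # Pad with zeros so atoms whose expansion ended early compare lower.
        padded = sums[a] + [0] * (depth - len(sums[a]))
        return (atomic_numbers[a], *padded, charges[a])

    return sorted(shell, key=key, reverse=True)


# Toy example: 1-propanol graph, centred on the middle carbon (atom 0).
atomic_numbers = [6, 6, 6, 1, 1, 1, 1, 8, 1, 1, 1, 1]
neighbors = {
    0: [1, 2, 3, 10], 1: [0, 4, 5, 6], 2: [0, 7, 8, 9], 3: [0],
    4: [1], 5: [1], 6: [1], 7: [2, 11], 8: [2], 9: [2], 10: [0], 11: [7],
}
# Made-up charges for illustration only (not real CM5 values).
charges = [-0.10, -0.20, 0.05, 0.10, 0.08, 0.08, 0.08, -0.50, 0.07, 0.07, 0.12, 0.33]

first_shell = sort_shell([3, 1, 10, 2], atomic_numbers, neighbors, charges, included={0})
print(first_shell)
```

In this toy case the two carbons tie on atomic number, and the carbon bonded to oxygen (atom 2) wins on the sum of its neighbours' atomic numbers. The two hydrogens tie on every structural criterion, so only the charge in step 3 separates them.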

Citation

@article{Ree2022,
  title = {RegioML: predicting the regioselectivity of electrophilic aromatic substitution reactions using machine learning},
  volume = {1},
  ISSN = {2635-098X},
  url = {http://dx.doi.org/10.1039/D1DD00032B},
  DOI = {10.1039/d1dd00032b},
  number = {2},
  journal = {Digital Discovery},
  publisher = {Royal Society of Chemistry (RSC)},
  author = {Nicolai Ree and Andreas H. G\"{o}ller and Jan H. Jensen},
  year = {2022},
  pages = {108–114}
}
