Software for the paper [doi]
Dependencies:
- Python 3
- PyTorch
- RDKit
Some environment variables need to be exported:
export GACONF=path/to/GACONF/download
export EVOSRC=path/to/EVOSRC/download
export SCRIPTS=path/to/SCRIPTS/download
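Before launching a run, it can help to confirm that the three variables above are visible to the process. The following is a minimal Python sketch of such a check; the helper name missing_env_vars is hypothetical and is not part of GENERA.

```python
import os

# Variable names taken from this README; the helper itself is illustrative only.
REQUIRED_VARS = ("GACONF", "EVOSRC", "SCRIPTS")


def missing_env_vars(environ=os.environ):
    """Return the required variables that are unset or empty in `environ`."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary instead of os.environ makes the check easy to try without touching the real environment.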
GENERA runs in a working directory where all temporary and output files are saved. The working directory must initially contain a file called done_so_far that holds the starting population. GENERA updates this file with new entries at each iteration. At the end of the execution, the done_so_far file contains the generated molecules together with all their objective values.
The done_so_far file is therefore both the starting input file and the final output file. It holds one entry per line, in the form:
<SMILES> = <objective 1> <objective 2> …
Where:
<SMILES> is the SMILES string of the molecule, without stereochemical notation.
<objective n> is the value of the nth objective function for the Pareto multiobjective optimization. These values depend on the chosen mode. See the paper for examples of objective functions.
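The entry format described above can be read with a few lines of Python. This is only a sketch under stated assumptions: it assumes that the SMILES string, the "=" sign and the objective values are separated by whitespace and that the objectives are plain numbers. The function names and the sample values are hypothetical and do not come from GENERA.

```python
def parse_entry(line):
    """Split one done_so_far line into (smiles, [objective values]).

    Assumption: fields are whitespace-separated, so the "=" separator is the
    second token. This avoids confusion with "=" double bonds inside SMILES.
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[1] != "=":
        raise ValueError(f"unexpected entry format: {line!r}")
    return tokens[0], [float(tok) for tok in tokens[2:]]


def read_done_so_far(path):
    """Parse every non-empty line of a done_so_far file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_entry(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    print(parse_entry("CC(=O)O = -7.5 0.42"))
```

Splitting on whitespace rather than on the "=" character matters because SMILES strings use "=" for double bonds.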
The software should be called using the command:
$GACONF/pilot_local.csh workdir=<path_to_working_directory> recdir=<argument_for_specific_mode> minpop=1 mode=<mode_name> cont=yes nc=<number_of_CPUs> noprog=<stop_criterion_1> maxconfigs=<stop_criterion_2>
Where:
- workdir should be set to the path (either absolute or relative) to the chosen working directory
- mode is the name for the chosen mode. The software will look for a .csh file in the $GACONF folder. We provide the denovoDockpareto mode that reproduces Experiment 1 from the paper.
- nc is the number of CPUs to be used. By default, nc equals the number of CPUs available on the current machine.
- noprog is a stopping criterion on the GA iterations. The algorithm will stop running when fewer new molecules than noprog are produced.
- maxconfigs is a stopping criterion on the GA iterations. The algorithm will stop when the done_so_far file contains at least maxconfigs molecules.
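When the pilot is launched from a Python script, the command line can be assembled from the parameters listed above. The sketch below only builds the argument list; the function name build_pilot_command and all values in the usage example are hypothetical placeholders, and the fixed minpop=1 and cont=yes arguments are copied from the command shown in this README.

```python
import os


def build_pilot_command(workdir, recdir, mode, nc, noprog, maxconfigs, gaconf=None):
    """Return the pilot_local.csh call as a list of arguments."""
    # Fall back to the GACONF environment variable described in this README.
    gaconf = gaconf if gaconf is not None else os.environ["GACONF"]
    return [
        f"{gaconf}/pilot_local.csh",
        f"workdir={workdir}",
        f"recdir={recdir}",  # mode-specific argument
        "minpop=1",
        f"mode={mode}",
        "cont=yes",
        f"nc={nc}",
        f"noprog={noprog}",
        f"maxconfigs={maxconfigs}",
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    cmd = build_pilot_command("run1", "rec", "denovoDockpareto", 4, 10, 1000,
                              gaconf="/opt/GACONF")
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once real values are filled in.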
The execution can stop in three ways:
- noprog: when the number of novel valid molecules falls below noprog, the pilot script stops execution.
- maxconfigs: when the done_so_far file contains a number of rows greater than or equal to maxconfigs, the pilot script stops.
- Manual stopping: to stop the execution cleanly, create a file called stop_now in the working directory. When the pilot finds this file, it finishes the current iteration and stops. NB: killing the pilot script is not guaranteed to stop all the running subprocesses, so please use the stop_now file method.
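The manual stop can also be requested from a script. This is a minimal sketch, assuming that an empty stop_now file is enough (the README only states that the file must be present in the working directory). The helper name request_stop is hypothetical.

```python
from pathlib import Path


def request_stop(workdir):
    """Create an empty stop_now file in the working directory and return its path."""
    stop_file = Path(workdir) / "stop_now"
    stop_file.touch()  # assumption: the file's presence alone triggers the stop
    return stop_file
```

The pilot still finishes its current iteration before stopping, so the run does not end immediately after this call.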
We provide the files and data needed to replicate Experiment 1 from the paper. Note that this mode has specific dependencies, namely the S4MPLE software (available at https://infochim.u-strasbg.fr/) and the ChemAxon package.