Skip to content

GiuseppeLamanna/GENERA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

README for GENERA

Software for the paper [doi]

REQUIREMENTS

python 3
pytorch
rdkit

GENERAL SETUP

Some environment variables need to be exported:
export GACONF=path/to/GACONF/download
export EVOSRC=path/to/EVOSRC/download
export SCRIPTS=path/to/SCRIPTS/download

WORKDIR SETUP

GENERA runs in a working directory where all temporary and output files will be saved. The working directory should start with a file called done_so_far that contains the starting population. GENERA will update this file at each iteration with new entries. At the end of the execution, the done_so_far file will contain the generated molecules and with all the required objectives.

THE DONE_SO_FAR FILE

The done_so_far file is the starting input file and will be the final output file. This file will contain each entry as a line. For each entry, there should be: <SMILES> = <objective 1> <objective 2> …

Where:
<SMILES> is the SMILES string of the generated molecules devoid of stereochemical notation.
<objective n> is the nth objective function for the Pareto multiobjective optimization. The content of these lines depends on the chosen mode. See the paper for examples of objective functions.

START COMMAND

The software should be called using the command: $GACONF/pilot_local.csh workdir=<path_to_working_directory> recdir=<argument_for_specific_mode> minpop=1 mode=<mode_name> cont=yes nc=<number_of_CPUs> noprog=<stop_criterion_1> maxconfigs=<stop_criterion_2>

Where:

  • workdir should be set to the path (either absolute or relative) to the chosen working directory
  • mode is the name for the chosen mode. The software will look for a .csh file in the $GACONF folder. We provide the denovoDockpareto mode that reproduces Experiment 1 from the paper.
  • nc is the number of CPUs to be used. By default, nc equals the number of CPUs available on the current machine.
  • noprog is a stopping criterion on the GA iterations. The algorithm will stop running when fewer new molecules than noprog are produced.
  • maxconfings is a stopping criterion on GA iterations. The algorithm will stop when a total number of maxconfig molecules is found in the done_so_far file.
The user may rely on the currently provided $GACONF/local_denovoDockpareto.csh to change the score criteria defining the Pareto front. The resulting script may be saved under a new name local_.csh, which will run when calling pilot_local.csh with mode=new_mode_name as an argument. However, for consistency reasons, before the execution, two parameter files .rng and .pars need to be formally created in $GACONF. These are independent of the scoring details in .csh and therefore are best created by copying or linking to provided denovoDockpareto.{pars,rng} files.

EXIT CONDITIONS

-noprog. When the number of novel valid molecules is below a given number, the pilot script will stop execution.

-reaching maxconfig. When the done_so_far file contains a number of rows greater or equal to maxconfigs, the pilot script will stop.

-Manual stopping. To stop the execution correctly and cleanly, the user should create a file called stop_now in the working directory. When the stop_now file is found in the working directory, the pilot will end the current iteration and stop. NB: killing the pilot script execution is not guaranteed to stop all the running subprocesses, so please use the "stop_now file" method

EXAMPLE

We provide the necessary files and data to replicate the Experiment 1 from the paper. Remember that this mode has specific dependencies, namely the S4MPLE software (available at https://infochim.u-strasbg.fr/) and the ChemAxon package.

CONTACTS

[email protected]
[email protected]
[email protected]

About

Companion code to the paper.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published