Commit 0b15c69

Update README.md
1 parent d1395f9 commit 0b15c69

1 file changed

+38
-11
lines changed

example/README.md

@@ -7,31 +7,58 @@ For a detailed description of the file structure please take a look at [this REA
77
**This example assumes it is run from within the base directory of this repository (`image-dms`).**
88

99
## Pre-training dataset creation (optional)
10-
For a detailed description of the used parameters run `python3 d4_pt_dataset.py`
10+
This will create a dataset containing pseudo-scores in the `out_path`, and the result will be similar to the output shown in `gb1None.tsv`.
1111

12-
`python3 d4_pt_dataset.py --p_name gb1 --pdb_file example/gb1.pdb --algn_path example/gb1_1000_experimental.clustal --p_data --p_firstind 0 --p_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --out_path example/ --name_var variant --name_nmut num_mutations --name_score score`
12+
```sh
13+
python3 d4_pt_dataset.py --p_name gb1 --pdb_file example/gb1.pdb --algn_path example/gb1_1000_experimental.clustal --p_data --out_path example/ --p_firstind 0 --p_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --name_var variant --name_nmut num_mutations --name_score score
14+
```
15+
16+
For a detailed description of the parameters used, run:
17+
```sh
18+
python3 d4_pt_dataset.py -h
19+
```
1320

1421
## Pre-training a network (optional)
15-
For a detailed description of the used parameters run `python3 d4_cmd_driver.py`
22+
This will pre-train a model on the pseudo-scores in `gb1None.tsv`. It stores the model with the best weights (based on the validation statistics) and the model from the end of training, which has the suffix `_end`, in `result_files/saved_models/gb1None_DD_MM_YYYY_HHMMSS/`. The parameters used are stored in `result_files/log_file.csv` and the training statistics in `result_files/results.csv`.
23+
It can be used in the next step as a starting point for training a network on experimentally determined data.
1624

17-
`python3 d4_cmd_driver.py --query_name gb1 --alignment_file example/gb1_1000_experimental.clustal --tsv_filepath example/gb1None.tsv --pdb_filepath example/gb1.pdb --number_mutations num_mutations --variants variant --score score --wt_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --first_ind 0 --training_epochs 100 --split0 0.7 --split1 0.15 --split2 0.05 --save_model `
25+
```sh
26+
python3 d4_cmd_driver.py --query_name gb1 --alignment_file example/gb1_1000_experimental.clustal --tsv_filepath example/gb1None.tsv --pdb_filepath example/gb1.pdb --number_mutations num_mutations --variants variant --score score --wt_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --first_ind 0 --training_epochs 100 --split0 0.7 --split1 0.15 --split2 0.05 --save_model
27+
```
1828

19-
This will save the network trained on the pre-training dataset created in the previous step. It can be used in the next step as a starting point for training a network on experimentally determined data.
20-
The saved network can be found in `result_files/saved_models/gb1None_DD_MM_YYYY_HHMMSS/` where the time stamp will depend on the time the training was started.
29+
For a detailed description of the parameters used, run:
30+
```sh
31+
python3 d4_cmd_driver.py -h
32+
```
2133

2234
## Training a network with experimental data
2335
In order to train a network on experimentally determined data use one of the methods mentioned below.
2436

25-
This will train a network on a 0.7-0.15-0.05 training-validation-test split and save the trained model in `result_files/saved_models/gb1None_DD_MM_YYYY_HHMMSS/`
37+
This will train a network on a 0.7-0.15-0.05 training-validation-test split and save the trained model in `result_files/saved_models/gb1None_DD_MM_YYYY_HHMMSS/`. The parameters used are stored in `result_files/log_file.csv` and the training and test statistics in `result_files/results.csv`.
2638

2739
### Training without using a pre-trained model
28-
`python3 d4_cmd_driver.py --query_name gb1 --alignment_file example/gb1_1000_experimental.clustal --tsv_filepath example/gb1None.tsv --pdb_filepath example/gb1.pdb --number_mutations num_mutations --variants variant --score score --wt_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --first_ind 0 --training_epochs 100 --split0 0.7 --split1 0.15 --split2 0.05 --save_model`
40+
```sh
41+
python3 d4_cmd_driver.py --query_name gb1 --alignment_file example/gb1_1000_experimental.clustal --tsv_filepath example/gb1None.tsv --pdb_filepath example/gb1.pdb --number_mutations num_mutations --variants variant --score score --wt_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --first_ind 0 --training_epochs 100 --split0 0.7 --split1 0.15 --split2 0.05 --save_model
42+
```
2943

3044
### Training by using a pre-trained model
31-
`python3 d4_cmd_driver.py --query_name gb1 --alignment_file example/gb1_1000_experimental.clustal --tsv_filepath example/gb1None.tsv --pdb_filepath example/gb1.pdb --number_mutations num_mutations --variants variant --score score --wt_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --first_ind 0 --training_epochs 100 --split0 0.7 --split1 0.15 --split2 0.05 --save_model --transfer_conv_weights result_files/saved_models/gb1None_DD_MM_YYYY_HHMMSS/ --train_conv_layers`
45+
*CAUTION: make sure the time stamp in the path matches your pre-trained model.*
46+
```sh
47+
python3 d4_cmd_driver.py --query_name gb1 --alignment_file example/gb1_1000_experimental.clustal --tsv_filepath example/gb1None.tsv --pdb_filepath example/gb1.pdb --number_mutations num_mutations --variants variant --score score --wt_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --first_ind 0 --training_epochs 100 --split0 0.7 --split1 0.15 --split2 0.05 --save_model --transfer_conv_weights result_files/saved_models/gb1None_DD_MM_YYYY_HHMMSS/ --train_conv_layers
48+
```
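Because the time stamp in the directory name changes with every run, it can help to look up the most recent saved model instead of typing the path by hand. The following is a minimal sketch, not part of this repository: `latest_model_dir` is a hypothetical helper, and it assumes that directory names follow the `<name>_DD_MM_YYYY_HHMMSS` pattern shown above, optionally followed by `_end`.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed naming scheme (taken from this README): <name>_DD_MM_YYYY_HHMMSS[_end]
_STAMP = re.compile(r"^(?P<name>.+)_(?P<stamp>\d{2}_\d{2}_\d{4}_\d{6})(?P<end>_end)?$")


def latest_model_dir(models_root, name="gb1None", end=False):
    """Return the newest saved-model directory for `name`, or None if none exists.

    With end=True only directories carrying the `_end` suffix are considered.
    """
    candidates = []
    for path in Path(models_root).iterdir():
        match = _STAMP.match(path.name)
        if not path.is_dir() or match is None:
            continue
        if match["name"] != name or bool(match["end"]) != end:
            continue
        try:
            # Parse the stamp as a date; sorting the names as text would
            # order by day before month and pick the wrong directory.
            started = datetime.strptime(match["stamp"], "%d_%m_%Y_%H%M%S")
        except ValueError:
            continue  # digits that do not form a valid date
        candidates.append((started, path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path (for example from `latest_model_dir("result_files/saved_models")`) can then be passed to `--transfer_conv_weights`.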
3249

3350
## Making predictions
51+
*CAUTION: make sure the time stamp in the path matches your trained model.*
52+
3453
The following command predicts the fitness scores for the single variant `G1W` and the double variant `K42G,I24P`.
35-
`d4_predict.py --model_filepath result_files/saved_models/gb1None_10_01_2024_110517_end/ --protein_pdb example/gb1.pdb --alignment_file example/gb1_1000_experimental.clustal --query_name gb1 --variant_s G1W_K42G,I24P --wt_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --first_ind 0`
36-
(`| awk -F '_' '{print $1} $2 > 0' | grep _` can be added if only positive scores shoud be shown [assumes a UNIX machine])
3754

55+
```sh
56+
d4_predict.py --model_filepath result_files/saved_models/gb1None_10_01_2024_110517_end/ --protein_pdb example/gb1.pdb --alignment_file example/gb1_1000_experimental.clustal --query_name gb1 --variant_s G1W_K42G,I24P --wt_seq MQYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE --first_ind 0
57+
```
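The value passed to `--variant_s` above packs several variants into one string. Judging from the example, `_` separates variants and `,` separates the mutations within one variant. The sketch below splits such a string under that assumption. `parse_variant_string` is a hypothetical helper that is not part of this repository, and `d4_predict.py` may parse the argument differently internally.

```python
import re

# One mutation: wild-type residue, position, new residue (e.g. "K42G").
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant_string(variant_s):
    """Split a string such as 'G1W_K42G,I24P' into a list of variants.

    Each variant is a list of (wild_type, position, mutant) tuples.
    """
    variants = []
    for variant in variant_s.split("_"):      # assumed: "_" separates variants
        mutations = []
        for mutation in variant.split(","):   # assumed: "," separates mutations
            match = _MUTATION.match(mutation)
            if match is None:
                raise ValueError(f"cannot parse mutation {mutation!r}")
            wild_type, position, mutant = match.groups()
            mutations.append((wild_type, int(position), mutant))
        variants.append(mutations)
    return variants


print(parse_variant_string("G1W_K42G,I24P"))
# [[('G', 1, 'W')], [('K', 42, 'G'), ('I', 24, 'P')]]
```

The sketch only checks the syntax of each mutation. It does not compare the wild-type residues against `--wt_seq` or take `--first_ind` into account.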
58+
59+
If only positive scores should be shown, one can pipe the output through the following commands:
60+
61+
```bash
62+
| awk -F '_' '{print $1} $2 > 0' | grep _
63+
```
64+
This assumes a UNIX machine.
