add reactions based on KEGG and MetaCyc annotations #304

cheng-yu-zhang · 2022-03-31T12:46:00Z

Main improvements in this PR:

1. Construct two draft models using the RAVEN Toolbox. The model files are in Saccharomyces_cerevisiae_draftmodel_kegg and Saccharomyces_cerevisiae_draftmodel_metacyc.
2. Compare the draft models with yeast8 to find the new reactions.
3. For the new reactions from step 2, check whether they are reasonable using MetaCyc, YeastCyc, UniProt, SGD and KEGG.
4. Get the final new reactions.
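
The comparison step could look roughly like the sketch below. It assumes that each yeast8 reaction carries a list of database reaction identifiers and that the draft model uses the same identifier namespace. The function name and all identifiers (`RXN_A`, `r_x1`, ...) are hypothetical placeholders, not taken from the actual model files.

```python
from typing import Dict, Iterable, List


def candidate_new_reactions(
    draft_ids: Iterable[str],
    reference_annotations: Dict[str, List[str]],
) -> List[str]:
    """Return draft reaction IDs that are not annotated on any reference reaction."""
    known = set()
    for ids in reference_annotations.values():
        known.update(ids)
    return sorted(set(draft_ids) - known)


# Hypothetical placeholder data: three draft reactions, two already covered.
draft = ["RXN_A", "RXN_B", "RXN_C"]
reference = {"r_x1": ["RXN_A"], "r_x2": ["RXN_C"], "r_x3": []}
print(candidate_new_reactions(draft, reference))  # ['RXN_B']
```

A set difference like this only yields candidates. A reaction without a matching annotation can still duplicate an existing one, so the manual check remains necessary.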

I hereby confirm that I have:

Tested my code with all requirements for running the model
Selected develop as a target branch (top left drop-down menu)
If needed, asked first in the Gitter chat room about this PR

edkerk · 2022-03-31T13:26:12Z

@cheng-yu-zhang could you please explain a bit what exactly was done in this PR (and the other two that you opened)? For example: where did you get the information from, why did you make these changes, and were there any special cases or considerations? What solved the growth problem that you encountered?

cheng-yu-zhang · 2022-03-31T13:51:47Z

@edkerk That's my fault, I will provide more detailed information.

edkerk · 2022-05-09T21:08:52Z

I have reorganized the data, to fit #302.

Make sure that there are no more duplicated metabolites (3f1646d)
Make sure that there are no unused metabolites (26b55d8)
There should be much more metadata provided in DBnewRxnsMets.tsv, DBnewRxnsRxns.tsv and now also DBnewRxnsGenes.tsv
Many of the subsystems are unique to single reactions, which is not consistent with fix: subSystems field #11 and Unique subsystem of rxn #307.
How is the compartmental localization determined?

But overall, I'm not convinced whether all these reactions should be included. What criteria were used to include them? What experimental evidence is there to support them? [To facilitate this, I changed the layout of the yeast-GEM.txt file (cb966bc, using exportForGit), which makes for easier diff-ing in 25b724b.]

Some examples:

rxnID	reaction equation	grRule
r_4855	oxygen[c] + Melatonin[c] => Formyl-N-acetyl-5-methoxykynurenamine[c]	YJR078W
r_4810	oxygen[c] + Serotonin[c] => Formyl-5-hydroxykynurenamine[c]	YJR078W

These are probably not correct. The breakdown of melatonin and serotonin, which are not yeast metabolites, has the same EC number as the reaction from tryptophan to N-formyl-kynurenine, which is a reaction in NAD biosynthesis. Actually, there are four reactions in this map with the same EC number, but only one of these is part of a functional pathway.

There are more examples like this, also based on MetaCyc. So how were these reactions selected?

Then there are also other problematic reactions. The following two reactions are modifying proteins, which is outside the scope of a metabolic network. Moreover, they are actually half-reactions of pyruvate dehydrogenase and alpha-ketoglutarate dehydrogenase (both already in the model, and associated with the same genes). So no need to include these:

rxnID	reaction equation	grRule
r_4833	coenzyme A[m] + pyruvate-dehydrogenase-acetylDHlipoyl[m] => acetyl-CoA[m] + pyruvate-dehydrogenase-dihydrolipoate[m]	YNL071W
r_4834	succinyl-CoA[m] + N6-dihydrolipoyl-L-lysine[m] <=> coenzyme A[m] + N6-S-succinyldihydrolipoyl-L-lysine[m]	YDR148C

There are other reactions that act on non-specific substrates:

rxnID	reaction equation	grRule
r_4755	2 H+[c] + H2O[c] + L-Selenocystathionine[c] => ammonium[c] + pyruvate[c] + Selenohomocysteine[c]	YGL184C or YHR112C or YFR055W
r_4835	H2O[c] + S-Substituted-L-Cysteines[c] => ammonium[c] + pyruvate[c] + Thiols[c]	YGL184C or YFR055W

There has been some discussion about including non-specific substrates (#219), but these genes are already associated with existing reactions (r_0308), so there is no value in including them as non-specific reactions.
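
A check like this can be scripted. The sketch below extracts the genes from a new grRule and lists the existing reactions that already use them. The grRule shown for r_0308 is an assumption for illustration only, and the helper names are hypothetical.

```python
import re
from typing import Dict, List, Set


def genes_in_rule(gr_rule: str) -> Set[str]:
    """Extract gene IDs from a grRule string that uses 'and'/'or' operators."""
    tokens = re.findall(r"[A-Za-z0-9_-]+", gr_rule)
    return {t for t in tokens if t.lower() not in {"and", "or"}}


def already_covered(new_rule: str, existing_rules: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each gene in new_rule to the existing reactions that already use it."""
    covered = {}
    for gene in sorted(genes_in_rule(new_rule)):
        covered[gene] = sorted(
            rid for rid, rule in existing_rules.items() if gene in genes_in_rule(rule)
        )
    return covered


# grRule of r_4835 as listed above; the rule for r_0308 is assumed.
existing = {"r_0308": "YFR055W or YGL184C"}
print(already_covered("YGL184C or YFR055W", existing))
```

Genes that map to a non-empty list already have a metabolic role in the model, so a new non-specific reaction for them adds no gene coverage.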

There are also examples of fluorinated and chlorinated compounds that would not occur in S. cerevisiae.

Overall: The list of new reactions should be carefully curated, to make sure that the models that are added make sense. More reactions is not per se better, even if it would not directly affect some of the model metrics (predicted growth rate, gene essentiality etc.).

cheng-yu-zhang · 2022-09-04T12:17:30Z

@edkerk Is there any issue about the new reactions that I need to fix?

edkerk · 2022-09-05T16:14:52Z

I have refactored the script and location of datafiles to match the generic curation format introduced in #313. See code/modelCuration/v8_6_1.m for how the model curation is performed.

I reiterate the last sentence of the previous comment: The list of new reactions should be carefully curated, to make sure that the models that are added make sense. More reactions is not per se better, even if it would not directly affect some of the model metrics (predicted growth rate, gene essentiality etc.).

So you should go through the list of reactions 1-by-1 and manually check whether they make sense. You uploaded draft models from KEGG and MetaCyc, but there is no explanation given which reactions are then included and why. I quickly looked through the new reactions, and found some more issues:

Double check that it is not a duplicate of an existing reaction.

rxnID	reaction equation	grRule
r_0916	ATP[c] + ribose-5-phosphate[c] => AMP[c] + H+[c] + PRPP[c]	(YKL181W and YER099C) or (YKL181W and YHL011C) or (YKL181W and YBL068W) or (YER099C and YOL061W) or (YBL068W and YOL061W)
r_4723	ATP[c] + D-ribose 5-phosphate[c] <=> AMP[c] + H+[c] + 5-Phospho-alpha-D-ribose 1-diphosphate[c]	YBL068W or YHL011C or YER099C or YOL061W or YKL181W

The first reaction was already present. Although the second reaction uses different metabolite names, it represents the same reaction. This also highlights that there are duplicate metabolites; without them, the duplicate reaction would have been easier to spot.
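
Such duplicates can be flagged automatically once metabolite names are mapped to one canonical name. A minimal sketch, assuming the equation format used in the tables here and a hand-made synonym table (`SYNONYMS` and the function names are hypothetical):

```python
import re
from typing import Dict

# Hand-made synonym table for illustration: new name -> name already in the model.
SYNONYMS = {
    "D-ribose 5-phosphate": "ribose-5-phosphate",
    "5-Phospho-alpha-D-ribose 1-diphosphate": "PRPP",
}


def canonical(term: str) -> str:
    """Replace the metabolite name in 'name[comp]' by its canonical synonym."""
    name, _, comp = term.rpartition("[")
    return SYNONYMS.get(name, name) + "[" + comp


def stoichiometry(equation: str) -> Dict[str, float]:
    """Parse 'A[c] + 2 B[c] => C[c]' into {metabolite: signed coefficient}."""
    left, right = re.split(r"\s*<?=>\s*", equation)
    stoich: Dict[str, float] = {}
    for side, sign in ((left, -1.0), (right, 1.0)):
        for term in side.split(" + "):
            term = term.strip()
            head, _, rest = term.partition(" ")
            # A leading integer is a coefficient; otherwise the term is the name.
            coef, met = (float(head), rest) if rest and head.isdigit() else (1.0, term)
            met = canonical(met)
            stoich[met] = stoich.get(met, 0.0) + sign * coef
    return stoich


def same_reaction(eq_a: str, eq_b: str) -> bool:
    """True if both equations have the same stoichiometry, ignoring direction."""
    a, b = stoichiometry(eq_a), stoichiometry(eq_b)
    return a == b or a == {met: -coef for met, coef in b.items()}


r_0916 = "ATP[c] + ribose-5-phosphate[c] => AMP[c] + H+[c] + PRPP[c]"
r_4723 = ("ATP[c] + D-ribose 5-phosphate[c] <=> AMP[c] + H+[c] + "
          "5-Phospho-alpha-D-ribose 1-diphosphate[c]")
print(same_reaction(r_0916, r_4723))  # True
```

The comparison deliberately ignores reversibility and gene associations, so a match means "inspect this pair", not "delete the new reaction".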

Double check that there are no duplicate metabolites.

See above: even if the reaction had not been a duplicate, `ribose-5-phosphate` and `D-ribose 5-phosphate` are highly likely the same metabolite, so make sure that only one of them is present.
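
Candidate duplicate metabolites can be listed with a simple name normalization. The heuristic below is an assumption for illustration: it drops punctuation, case and stereo descriptors such as D/L, which also merges genuinely different stereoisomers, so its output is only a list to review by hand.

```python
import re
from collections import defaultdict
from typing import Iterable, List

# Tokens ignored when comparing names (heuristic, may merge distinct isomers).
IGNORED_TOKENS = {"d", "l", "alpha", "beta"}


def name_key(name: str) -> str:
    """Normalize a metabolite name to a loose comparison key."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return "".join(t for t in tokens if t and t not in IGNORED_TOKENS)


def duplicate_candidates(names: Iterable[str]) -> List[List[str]]:
    """Group names that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = ["ribose-5-phosphate", "D-ribose 5-phosphate", "pyruvate", "ATP"]
print(duplicate_candidates(names))
# [['ribose-5-phosphate', 'D-ribose 5-phosphate']]
```

Comparing shared database annotations of the metabolites, where available, would be a more reliable criterion than names.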

Double check whether the reaction is likely to be present in S. cerevisiae

rxnID	reaction equation	grRule
r_0481	glutathione disulfide[c] + H+[c] + NADPH[c] => 2 glutathione[c] + NADP(+)[c]	(YCL035C and YPL091W) or (YDR098C and YPL091W) or (YDR513W and YPL091W) or (YER174C and YPL091W)
r_4711	2 glutathione[c] + NAD[c] <=> glutathione disulfide[c] + H+[c] + NADH[c]	YPL091W

The first reaction is how glutathione oxidoreductase is widely accepted to function. The new reaction is reversible, uses NADH and has a much simplified gene association. What strong evidence is there to include the second one?

Double check the gene associations

See both examples above: the new reactions have much simplified gene associations, while the old reactions indicate complexes with subunits. What strong evidence is there to have the simplified gene association?
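
The difference between two gene associations can be made explicit by parsing each grRule into its complexes. This sketch assumes flat rules of the form "(A and B) or (C and D)" without nested parentheses; the function names are hypothetical.

```python
from typing import FrozenSet, List, Set


def complexes(gr_rule: str) -> Set[FrozenSet[str]]:
    """Parse a flat 'or of ands' grRule into a set of gene sets (one per complex)."""
    result = set()
    for clause in gr_rule.split(" or "):
        genes = clause.strip().strip("()").split(" and ")
        result.add(frozenset(g.strip() for g in genes))
    return result


def dropped_partners(old_rule: str, new_rule: str) -> List[str]:
    """Genes that accompany a new-rule gene set in an old complex but are missing from it."""
    dropped = set()
    for new in complexes(new_rule):
        for old in complexes(old_rule):
            if new < old:  # new gene set is a strict subset of an old complex
                dropped |= old - new
    return sorted(dropped)


r_0481 = ("(YCL035C and YPL091W) or (YDR098C and YPL091W) or "
          "(YDR513W and YPL091W) or (YER174C and YPL091W)")
r_4711 = "YPL091W"
print(dropped_partners(r_0481, r_4711))
# ['YCL035C', 'YDR098C', 'YDR513W', 'YER174C']
```

A non-empty result means the new rule treats a gene as sufficient on its own where the existing rule requires a partner, which is exactly the case that needs supporting evidence.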

See previous comment

But it's worthwhile to have another look at the previous comment as well, as these issues are not fully resolved. How is the localization determined? Be very careful with reactions predicted by MetaCyc, as they can quickly draw in non-native substrates.

cheng-yu-zhang · 2022-10-19T00:08:51Z

@edkerk Hi, Ed. I have encountered a problem: running deletion = cobra.flux_analysis.deletion.double_gene_deletion(model, gene_list1=pair1, gene_list2=pair2) in Python fails with yeast-GEM from both the main branch and the develop branch. Changing the version of cobrapy does not solve it, so I am wondering whether saveYeastModel.m has changed.
The error is below:

Traceback (most recent call last):
  File "D:\Anaconda\envs\python38\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 47, in
    deletion = double_gene_deletion(model,
  File "D:\Anaconda\envs\python38\lib\site-packages\cobra\flux_analysis\deletion.py", line 393, in double_gene_deletion
    return _multi_deletion(
  File "D:\Anaconda\envs\python38\lib\site-packages\cobra\flux_analysis\deletion.py", line 144, in _multi_deletion
    with ProcessPool(
  File "D:\Anaconda\envs\python38\lib\site-packages\cobra\util\process_pool.py", line 56, in __init__
    pickle.dump((initializer,) + initargs, handle)
TypeError: cannot pickle 'SwigPyObject' object
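
The traceback shows the failure while the process pool pickles its arguments. One generic way to narrow down such an error is to test which attribute of an object cannot be pickled. The helper below is a hypothetical, library-independent sketch; objects that customize their own pickling (as a model class may do) can behave differently from what this per-attribute check reports.

```python
import pickle
import threading
from typing import Any, Dict


def unpicklable_attributes(obj: Any) -> Dict[str, str]:
    """Return {attribute name: error message} for attributes that fail to pickle."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle may raise TypeError, PicklingError, ...
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class Demo:
    """Stand-in object: one picklable attribute and one that is not."""

    def __init__(self) -> None:
        self.name = "demo"
        self.handle = threading.Lock()  # locks cannot be pickled


print(unpicklable_attributes(Demo()))
```

cobrapy's deletion functions also accept a `processes` argument. Whether `processes=1` avoids the process pool, and with it the pickling step, would need to be confirmed for the installed version.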

cheng-yu-zhang · 2022-11-06T05:10:34Z

@edkerk @hongzhonglu Are there any methods to solve the above problem?

edkerk · 2022-11-10T23:24:23Z

Hmm, even if saveYeastModel is changed, it would still produce a valid SBML file that cobrapy should be able to import without issues. Just to confirm that it is really a problem with the model itself, have you tried running it on another model (non yeast-GEM, maybe E. coli)?

cheng-yu-zhang · 2022-11-11T02:28:55Z

@edkerk @hongzhonglu double_gene_deletion and single_gene_deletion run without problems on iML1515 and yeast-GEM 8.5, but something goes wrong with the latest yeast-GEM.

However, MATLAB can run double_gene_deletion with a solvable problem. I am working on it.

data/modelCuration/v8.6.1/DBnewRxnsRxns.tsv

^ This is the 1st commit message: refactor: reformat for new curateMetsRxnsGenes function ^ The commit message #2 will be skipped: ^ fix: remove duplicated metabolites ^ The commit message #3 will be skipped: ^ fix: remove unused metabolites ^ The commit message #4 will be skipped: ^ feat: add new genes file for metadata ^ The commit message #5 will be skipped: ^ fix: remove already existing metabolites ^ The commit message #6 will be skipped: ^ fix: remove cytidine nucleosidase stoichiometry

^ This is the 1st commit message: remove improper new rxns; check the subsystem of new rxns ^ The commit message #2 will be skipped: ^ add draft model ^ The commit message #3 will be skipped: ^ update DBnewRxnsGenes.tsv ^ The commit message #4 will be skipped: ^ add more mets' annotation

edkerk · 2023-07-01T21:55:26Z

I went through all suggested reactions and checked them one-by-one. Given the quality of the current yeast-GEM, one should be careful about including new reactions: there should be more evidence than a reaction merely appearing in KEGG. I checked with the following strategy:

Check if the reaction is not a partial reaction, which is already represented in the model as the complete reaction.
Compare the new reaction with existing reactions annotated to the same gene: if there is a difference (in e.g. substrate or co-factor), find evidence in literature if the new reaction is supported and/or likely to be present. Not only guided by KEGG or UniProt, but search for more solid evidence.
If the above are true, then see if the reactants and/or products connect to existing metabolites. If so, then include the reaction in that compartment, but do not add it to other compartments. This should rather be addressed by a thorough curation of all reaction compartmentalizations. If the reaction does not connect to the existing metabolic network, then just add it to whatever compartment is suggested.
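
The compartment rule in the last step can be written down as a small decision function. This is a sketch of my reading of that rule, with hypothetical names and toy data; it is not the code used for the curation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def choose_compartment(
    reaction_mets: Iterable[str],
    suggested: List[str],
    model_mets: Dict[str, Set[str]],
) -> Tuple[str, bool]:
    """
    Pick a single compartment for a new reaction.

    reaction_mets: metabolite names of the reaction, without compartment
    suggested:     compartments suggested for the reaction, in order of preference
    model_mets:    compartment -> metabolite names already present in the model
    Returns (compartment, connected_to_existing_network).
    """
    mets = set(reaction_mets)
    for comp in suggested:
        if mets & model_mets.get(comp, set()):
            return comp, True  # connects here; do not add to other compartments
    return suggested[0], False  # no connection anywhere: take the suggestion as-is


# Toy data for illustration only.
model_mets = {"c": {"ATP", "pyruvate"}, "m": {"acetyl-CoA"}}
print(choose_compartment(["acetyl-CoA", "X"], ["c", "m"], model_mets))  # ('m', True)
print(choose_compartment(["X", "Y"], ["c", "m"], model_mets))           # ('c', False)
```

Returning the connectivity flag keeps the disconnected reactions easy to list later, when reaction compartmentalization is curated more thoroughly.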

cheng-yu-zhang · 2023-07-06T06:20:41Z

I agree with the detailed strategy. With a standard workflow, we can add new reactions more efficiently and credibly.

hongzhonglu requested a review from edkerk April 20, 2022 07:05

This was referenced May 9, 2022

Complex annotation #305

Merged

Transporter annotation for transport rxns #306

Merged

edkerk added this to the 8.7.0 milestone Sep 5, 2022

edkerk reviewed Mar 16, 2023

data/modelCuration/v8.6.1/DBnewRxnsRxns.tsv

cheng-yu-zhang and others added 11 commits July 1, 2023 20:54

feat: add 209 genes and 357 mets after solving the growth problem

cb19939

fix: invalid YAML due to incorrect " in rxn name

7ef9b9a

chore: update the folder v8.6.1

fc99c0e

the rxn compartments are based on its enzyme subcellular location and it is detailed in 'compartmental localization.tsv'

update folder "v8.6.1"

d902d93

add confidence score and ref; add function annotation about new reactions in "function annotation.tsv"

recheck the new rxns

ec38efa

provide the screen shots of databases in the file "v8.6.1\straightforward_proof", which describe existence of the reaction in S288C and easy to check by other reviewers.

remove R00245

8479276

fix: remove partial reactions, where the full rxn is already in model

997d04f

fix: manual curation of new reactions

12e3510

fix: remove figures (> 12 MB)

1b5fa8e

edkerk force-pushed the add-new-reaction-after-correction branch from ca7907d to 1b5fa8e Compare July 1, 2023 19:04

edkerk added 3 commits July 1, 2023 21:16

chore: remove intermediate recon files

02c9388

refactor: move curation data + script to v8.6.3

4178bb1

chore: run v8_6_3

662cdef

based on 'develop' version as 8.6.3 is not yet released

edkerk approved these changes Jul 1, 2023

edkerk mentioned this pull request Jul 2, 2023

feat: add metDeltaG and rxnDeltaG fields #330

Merged

3 tasks

edkerk changed the title ~~Add 209 new reactions after fixing the growth problem~~ add reactions based on KEGG and MetaCyc annotations Jul 4, 2023

edkerk added 2 commits July 8, 2023 13:06

refactor: reorganize curation scripts

2681871

fix: small corrections of incorrect annotation etc

88968da

edkerk merged commit be5dab0 into develop Jul 8, 2023

edkerk deleted the add-new-reaction-after-correction branch July 8, 2023 12:19

edkerk mentioned this pull request Jul 8, 2023

yeast 8.7.0 #343

Merged

3 tasks
