add reactions based on KEGG and MetaCyc annotations #304

Merged
edkerk merged 16 commits into develop from add-new-reaction-after-correction
Jul 8, 2023

Conversation

cheng-yu-zhang
Collaborator

@cheng-yu-zhang cheng-yu-zhang commented Mar 31, 2022

Main improvements in this PR:


  1. First, construct two draft models using the RAVEN Toolbox. The model files are in Saccharomyces_cerevisiae_draftmodel_kegg and Saccharomyces_cerevisiae_draftmodel_metacyc.
  2. Then compare the draft models with yeast8 to find the new reactions.
  3. For the new reactions from step 2, check whether they are reasonable using MetaCyc, YeastCyc, UniProt, SGD and KEGG.
  4. Compile the final list of new reactions.
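
Step 2 above (comparing a draft model with yeast8) can be sketched as follows. This is only an illustration, not the script used in this PR: it assumes each model has been reduced to a plain dictionary of reaction ID to stoichiometry and that metabolite IDs are already harmonized between the two models. The names `signature`, `find_new_reactions` and the toy reaction IDs are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: a reaction is {metabolite_id: coefficient}, negative = consumed.
Reaction = Dict[str, float]
Signature = FrozenSet[Tuple[str, float]]


def signature(rxn: Reaction) -> Signature:
    """Direction-independent fingerprint of a reaction's stoichiometry."""
    forward = frozenset(rxn.items())
    backward = frozenset((met, -coef) for met, coef in rxn.items())
    # Pick one orientation deterministically so that A => B equals B => A.
    return min(forward, backward, key=sorted)


def find_new_reactions(draft: Dict[str, Reaction],
                       reference: Dict[str, Reaction]) -> Dict[str, Reaction]:
    """Return the draft reactions whose stoichiometry is absent from the reference."""
    known = {signature(rxn) for rxn in reference.values()}
    return {rid: rxn for rid, rxn in draft.items() if signature(rxn) not in known}


# Toy data (hypothetical IDs): d_1 is ref_1 written in reverse, d_2 is new.
reference = {"ref_1": {"A": -1, "B": -1, "C": 1}}
draft = {
    "d_1": {"C": -1, "A": 1, "B": 1},
    "d_2": {"A": -1, "D": 1},
}
new_rxns = find_new_reactions(draft, reference)
print(sorted(new_rxns))
```

Reactions that survive this filter are only candidates; they still need the manual checks listed in step 3.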

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Selected develop as a target branch (top left drop-down menu)
  • If needed, asked first in the Gitter chat room about this PR

@edkerk
Member

edkerk commented Mar 31, 2022

@cheng-yu-zhang could you please explain a bit what was exactly done in this PR (and the other two that you opened)? Like where did you get the information from, why did you make these changes, perhaps any special cases or considerations? What solved the growth problem that you encountered?

@cheng-yu-zhang
Collaborator Author

@edkerk That's my fault, I will add more detail.

@hongzhonglu hongzhonglu requested a review from edkerk April 20, 2022 07:05
@edkerk
Member

edkerk commented May 9, 2022

I have reorganized the data, to fit #302.

But overall, I'm not convinced that all these reactions should be included. What criteria were used to include them? What experimental evidence is there to support them? [To facilitate this, I changed the layout of the yeast-GEM.txt file (cb966bc, using exportForGit), which makes for easier diff-ing in 25b724b.]

Some examples:

| rxnID | reaction equation | grRule |
|---|---|---|
| r_4855 | oxygen[c] + Melatonin[c] => Formyl-N-acetyl-5-methoxykynurenamine[c] | YJR078W |
| r_4810 | oxygen[c] + Serotonin[c] => Formyl-5-hydroxykynurenamine[c] | YJR078W |

These are probably not correct. The breakdown of melatonin and serotonin, which are not yeast metabolites, has the same EC number as the reaction from tryptophan to N-formyl-kynurenine, which is a reaction in NAD biosynthesis. Actually, there are four reactions in this map with the same EC number, but only one of these is part of a functional pathway.

There are more examples like this, also based on MetaCyc. So how were these reactions selected?
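
One way to surface such cases early is to list, for every candidate reaction, the metabolites that are not yet part of the model, since a reaction that only introduces new metabolites is a likely sign of a non-native substrate. The following is a minimal sketch under stated assumptions: `unknown_metabolites` is a hypothetical helper, and the `model_mets` set and the `r_toy` entry are toy data, not the real yeast-GEM metabolite list.

```python
from typing import Dict, Iterable, List, Set


def unknown_metabolites(candidates: Dict[str, Iterable[str]],
                        model_mets: Set[str]) -> Dict[str, List[str]]:
    """Map each candidate reaction ID to its metabolites missing from the model."""
    flagged = {}
    for rxn_id, mets in candidates.items():
        missing = sorted(set(mets) - model_mets)
        if missing:
            flagged[rxn_id] = missing
    return flagged


# Toy subset of model metabolites (assumption, for illustration only).
model_mets = {"oxygen[c]", "L-tryptophan[c]", "N-formyl-kynurenine[c]"}
candidates = {
    "r_4855": ["oxygen[c]", "Melatonin[c]",
               "Formyl-N-acetyl-5-methoxykynurenamine[c]"],
    "r_toy": ["oxygen[c]", "L-tryptophan[c]", "N-formyl-kynurenine[c]"],
}
flagged = unknown_metabolites(candidates, model_mets)
for rxn_id, missing in flagged.items():
    print(rxn_id, "introduces", missing)
```

A flagged reaction is not automatically wrong, but it deserves a manual look at whether the substrate occurs in S. cerevisiae at all.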

Then there are also other problematic reactions. The following two reactions are modifying proteins, which is outside the scope of a metabolic network. Moreover, they are actually half-reactions of pyruvate dehydrogenase and alpha-ketoglutarate dehydrogenase (both already in the model, and associated with the same genes). So no need to include these:

| rxnID | reaction equation | grRule |
|---|---|---|
| r_4833 | coenzyme A[m] + pyruvate-dehydrogenase-acetylDHlipoyl[m] => acetyl-CoA[m] + pyruvate-dehydrogenase-dihydrolipoate[m] | YNL071W |
| r_4834 | succinyl-CoA[m] + N6-dihydrolipoyl-L-lysine[m] <=> coenzyme A[m] + N6-S-succinyldihydrolipoyl-L-lysine[m] | YDR148C |

There are other reactions that act on non-specific substrates:

| rxnID | reaction equation | grRule |
|---|---|---|
| r_4755 | 2 H+[c] + H2O[c] + L-Selenocystathionine[c] => ammonium[c] + pyruvate[c] + Selenohomocysteine[c] | YGL184C or YHR112C or YFR055W |
| r_4835 | H2O[c] + S-Substituted-L-Cysteines[c] => ammonium[c] + pyruvate[c] + Thiols[c] | YGL184C or YFR055W |

There has been some discussion about including non-specific substrates (#219), but these genes are already associated with existing reactions (r_0308), so there is no value in including them as non-specific reactions.

There are also examples of fluorinated and chlorinated compounds that would not occur in S. cerevisiae.

Overall: the list of new reactions should be carefully curated, to make sure that the reactions that are added make sense. More reactions is not per se better, even if it would not directly affect some of the model metrics (predicted growth rate, gene essentiality, etc.).

@cheng-yu-zhang
Collaborator Author

@edkerk Are there any issues with the new reactions that I need to fix?

@edkerk
Member

edkerk commented Sep 5, 2022

I have refactored the script and location of datafiles to match the generic curation format introduced in #313. See code/modelCuration/v8_6_1.m for how the model curation is performed.


I reiterate the last sentence of the previous comment: the list of new reactions should be carefully curated, to make sure that the reactions that are added make sense. More reactions is not per se better, even if it would not directly affect some of the model metrics (predicted growth rate, gene essentiality, etc.).

So you should go through the list of reactions 1-by-1 and manually check whether they make sense. You uploaded draft models from KEGG and MetaCyc, but there is no explanation given which reactions are then included and why. I quickly looked through the new reactions, and found some more issues:

Double check that it is not duplicate of an existing reaction.

| rxnID | reaction equation | grRule |
|---|---|---|
| r_0916 | ATP[c] + ribose-5-phosphate[c] => AMP[c] + H+[c] + PRPP[c] | (YKL181W and YER099C) or (YKL181W and YHL011C) or (YKL181W and YBL068W) or (YER099C and YOL061W) or (YBL068W and YOL061W) |
| r_4723 | ATP[c] + D-ribose 5-phosphate[c] <=> AMP[c] + H+[c] + 5-Phospho-alpha-D-ribose 1-diphosphate[c] | YBL068W or YHL011C or YER099C or YOL061W or YKL181W |

The first reaction was already present. Although the second reaction uses different metabolite names, it represents the same reaction. This also highlights that there are duplicate metabolites; without them, the duplicate reaction would have been easier to spot.
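
The duplicate above becomes visible once metabolite names are mapped to a single canonical name before comparing stoichiometries. The sketch below illustrates this with the two reactions from the table; the `SYNONYMS` mapping and the `canonical` helper are hypothetical and would have to be curated by hand, and reversibility (`=>` versus `<=>`) is deliberately ignored.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical, hand-curated mapping from alternative names to model names.
SYNONYMS = {
    "D-ribose 5-phosphate[c]": "ribose-5-phosphate[c]",
    "5-Phospho-alpha-D-ribose 1-diphosphate[c]": "PRPP[c]",
}


def canonical(rxn: Dict[str, float]) -> FrozenSet[Tuple[str, float]]:
    """Stoichiometry with every metabolite name replaced by its canonical name."""
    return frozenset((SYNONYMS.get(met, met), coef) for met, coef in rxn.items())


r_0916 = {"ATP[c]": -1, "ribose-5-phosphate[c]": -1,
          "AMP[c]": 1, "H+[c]": 1, "PRPP[c]": 1}
r_4723 = {"ATP[c]": -1, "D-ribose 5-phosphate[c]": -1,
          "AMP[c]": 1, "H+[c]": 1,
          "5-Phospho-alpha-D-ribose 1-diphosphate[c]": 1}

is_duplicate = canonical(r_0916) == canonical(r_4723)
print("duplicate:", is_duplicate)
```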

Double check that there are no duplicate metabolites.

See above: even if the reaction had not been a duplicate, `ribose-5-phosphate` and `D-ribose 5-phosphate` are highly likely the same metabolite, so make sure only one of them is present.
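
Because names alone are unreliable, duplicate metabolites are easier to find through shared database annotations. The following is a rough sketch that assumes every metabolite carries an external identifier (for example a KEGG or ChEBI ID); the helper `duplicates_by_annotation`, the model IDs and the `EXT_...` identifiers are all placeholders, not real yeast-GEM or database entries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: each metabolite is (model_id, compartment, external_id or None).
Metabolite = Tuple[str, str, Optional[str]]


def duplicates_by_annotation(
        mets: Iterable[Metabolite]) -> Dict[Tuple[str, str], List[str]]:
    """Group model IDs that share an external ID within the same compartment."""
    groups = defaultdict(list)
    for model_id, compartment, external_id in mets:
        if external_id:  # metabolites without annotation cannot be matched
            groups[(external_id, compartment)].append(model_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# Placeholder data: m_old and m_new point to the same external entry.
mets = [
    ("m_old", "c", "EXT_0001"),   # e.g. ribose-5-phosphate
    ("m_new", "c", "EXT_0001"),   # e.g. D-ribose 5-phosphate
    ("m_other", "c", "EXT_0002"),
    ("m_mito", "m", "EXT_0001"),  # same compound, other compartment: not a duplicate
    ("m_unannotated", "c", None),
]
dupes = duplicates_by_annotation(mets)
print(dupes)
```

Metabolites without any annotation are skipped here, so they still need a manual name-based check.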

Double check whether the reaction is likely to be present in S. cerevisiae

| rxnID | reaction equation | grRule |
|---|---|---|
| r_0481 | glutathione disulfide[c] + H+[c] + NADPH[c] => 2 glutathione[c] + NADP(+)[c] | (YCL035C and YPL091W) or (YDR098C and YPL091W) or (YDR513W and YPL091W) or (YER174C and YPL091W) |
| r_4711 | 2 glutathione[c] + NAD[c] <=> glutathione disulfide[c] + H+[c] + NADH[c] | YPL091W |

The first reaction is how glutathione oxidoreductase is widely accepted to function. The new reaction is reversible, uses NADH and has a much simplified gene association. What strong evidence is there to include the second one?

Double check the gene associations

See both examples above, the new reactions have much simplified gene associations, while the old reactions indicate complexes with subunits. What strong evidence is there to have the simplified gene association?
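
Simplified gene associations of this kind can be flagged automatically. The sketch below parses a grRule into its alternative complexes and reports where a new rule allows a reaction with fewer genes than an existing complex requires. It assumes rules are written in the flat "(A and B) or (C and D)" form seen in the tables above; nested rules are not handled, and `parse_gpr` and `relaxed_complexes` are hypothetical helper names.

```python
from typing import FrozenSet, Set


def parse_gpr(rule: str) -> Set[FrozenSet[str]]:
    """Parse a flat 'or of ands' rule into a set of gene complexes."""
    complexes = set()
    for clause in rule.split(" or "):
        genes = clause.strip().strip("()").split(" and ")
        complexes.add(frozenset(gene.strip() for gene in genes))
    return complexes


def relaxed_complexes(old_rule: str, new_rule: str) -> Set[FrozenSet[str]]:
    """New alternatives that are a strict subset of some existing complex."""
    old = parse_gpr(old_rule)
    return {new for new in parse_gpr(new_rule) if any(new < o for o in old)}


old_rule = ("(YCL035C and YPL091W) or (YDR098C and YPL091W) or "
            "(YDR513W and YPL091W) or (YER174C and YPL091W)")  # r_0481
new_rule = "YPL091W"                                            # r_4711
relaxed = relaxed_complexes(old_rule, new_rule)
print(relaxed)
```

A non-empty result does not prove the new rule wrong, but it marks the gene association as one that needs supporting evidence.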

See previous comment

But it's worthwhile to have another look at the previous comment as well, as these issues are not fully resolved. How is the localization determined? Be very careful with reactions predicted by MetaCyc, as they can quickly draw in non-native substrates.

@edkerk edkerk added this to the 8.7.0 milestone Sep 5, 2022
@cheng-yu-zhang
Collaborator Author

@edkerk Hi, Ed. I have encountered a problem: `deletion = cobra.flux_analysis.deletion.double_gene_deletion(model, gene_list1=pair1, gene_list2=pair2)` fails in Python with yeast-GEM from both the main and the develop branch. Changing the cobrapy version does not solve it, so I am wondering whether saveYeastModel.m has changed.
The error is below:

Traceback (most recent call last):
  File "D:\Anaconda\envs\python38\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 47, in
    deletion = double_gene_deletion(model,
  File "D:\Anaconda\envs\python38\lib\site-packages\cobra\flux_analysis\deletion.py", line 393, in double_gene_deletion
    return _multi_deletion(
  File "D:\Anaconda\envs\python38\lib\site-packages\cobra\flux_analysis\deletion.py", line 144, in _multi_deletion
    with ProcessPool(
  File "D:\Anaconda\envs\python38\lib\site-packages\cobra\util\process_pool.py", line 56, in __init__
    pickle.dump((initializer,) + initargs, handle)
TypeError: cannot pickle 'SwigPyObject' object
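
The traceback shows the failure happening while the model is pickled for the worker processes, not during the deletion itself. A possible way to narrow this down, sketched here as an assumption rather than a confirmed fix, is to test whether the model object can be pickled and to fall back to a single process if it cannot. `can_pickle` and `safe_process_count` are hypothetical helpers; cobrapy's `double_gene_deletion` does accept a `processes` argument, but whether `processes=1` avoids this particular error has not been verified in this thread.

```python
import pickle


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


def safe_process_count(model, requested: int) -> int:
    """Use the requested number of processes only if the model can be pickled."""
    return requested if can_pickle(model) else 1


# Hypothetical usage with cobrapy (not executed here):
#   from cobra.flux_analysis import double_gene_deletion
#   deletion = double_gene_deletion(
#       model, gene_list1=pair1, gene_list2=pair2,
#       processes=safe_process_count(model, 4))
```

If `can_pickle(model)` returns False for the latest yeast-GEM but True for yeast-GEM 8.5, that would point at something attached to the loaded model object rather than at the deletion routine.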

@cheng-yu-zhang
Collaborator Author

cheng-yu-zhang commented Nov 6, 2022

@edkerk @hongzhonglu Are there any methods to solve the above problem?

@edkerk
Member

edkerk commented Nov 10, 2022

Hmm, even if saveYeastModel is changed, it would still produce a valid SBML file that cobrapy should be able to import without issues. Just to confirm that it is really a problem with the model itself, have you tried running it on another model (non yeast-GEM, maybe E. coli?).

@cheng-yu-zhang
Collaborator Author

cheng-yu-zhang commented Nov 11, 2022

@edkerk @hongzhonglu double_gene_deletion and single_gene_deletion run without problems on iML1515 and yeast-GEM 8.5, but something goes wrong with the latest yeast-GEM.
[Two screenshots attached: "WeChat image 20221111102522" and "cf11613f2f0754e144aba0c78389195"]

However, MATLAB can run the double gene deletion and the problem is solvable there. I am working on it.

cheng-yu-zhang and others added 11 commits July 1, 2023 20:54
First squashed commit:

  • refactor: reformat for new curateMetsRxnsGenes function
  • fix: remove duplicated metabolites
  • fix: remove unused metabolites
  • feat: add new genes file for metadata
  • fix: remove already existing metabolites
  • fix: remove cytidine nucleosidase stoichiometry

Second squashed commit:

  • remove improper new rxns; check the subsystem of new rxns
  • add draft model
  • update DBnewRxnsGenes.tsv
  • add more mets' annotation
  • the rxn compartments are based on the enzyme's subcellular location, as detailed in 'compartmental localization.tsv'
  • add confidence score and ref
  • add function annotation for the new reactions in "function annotation.tsv"
  • provide screenshots of the databases in "v8.6.1\straightforward_proof", which show that each reaction exists in S288C and are easy for other reviewers to check
@edkerk edkerk force-pushed the add-new-reaction-after-correction branch from ca7907d to 1b5fa8e Compare July 1, 2023 19:04
@edkerk
Member

edkerk commented Jul 1, 2023

I went through all suggested reactions and checked them one by one. Given the quality of the current yeast-GEM, one should be careful about including new reactions: there should be more evidence than a reaction merely appearing in KEGG. I checked with the following strategy:

  • Check that the reaction is not a partial reaction that is already represented in the model as the complete reaction.
  • Compare the new reaction with existing reactions annotated to the same gene: if there is a difference (in e.g. substrate or co-factor), find evidence in literature if the new reaction is supported and/or likely to be present. Not only guided by KEGG or UniProt, but search for more solid evidence.
  • If the above are true, then see if the reactants and/or products connect to existing metabolites. If so, then include the reaction in that compartment, but do not add it to other compartments. This should rather be addressed by a thorough curation of all reaction compartmentalizations. If the reaction does not connect to the existing metabolic network, then just add it to whatever compartment is suggested.
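
The strategy above can be written down as a small triage function. This is only a sketch of the decision order in the three bullets; the judgement fields (`is_partial_of_existing`, `differs_from_same_gene_rxn`, `has_literature_support`) are hypothetical names for answers that have to come from manual curation, and the code is not part of the curation scripts in this PR.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Candidate:
    rxn_id: str
    metabolites: Set[str]
    is_partial_of_existing: bool      # manual check against complete reactions
    differs_from_same_gene_rxn: bool  # e.g. other substrate or co-factor
    has_literature_support: bool      # evidence beyond KEGG or UniProt


def triage(cand: Candidate, model_mets: Set[str]) -> str:
    """Apply the three checks in order and return a short verdict."""
    if cand.is_partial_of_existing:
        return "reject: partial reaction already covered"
    if cand.differs_from_same_gene_rxn and not cand.has_literature_support:
        return "reject: no evidence beyond database annotation"
    if cand.metabolites & model_mets:
        return "include: in the compartment where it connects"
    return "include: in the suggested compartment"


# Toy example with placeholder metabolite IDs.
model_mets = {"a[c]", "b[c]"}
verdict = triage(Candidate("cand_1", {"a[c]", "x[c]"}, False, True, True),
                 model_mets)
print(verdict)
```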

@edkerk edkerk changed the title from "Add 209 new reactions after fixing the growth problem" to "add reactions based on KEGG and MetaCyc annotations" Jul 4, 2023
@cheng-yu-zhang
Collaborator Author

> I went through all suggested reactions and checked them one by one. Given the quality of the current yeast-GEM, one should be careful about including new reactions: there should be more evidence than a reaction merely appearing in KEGG. I checked with the following strategy:
>
>   • Check that the reaction is not a partial reaction that is already represented in the model as the complete reaction.
>   • Compare the new reaction with existing reactions annotated to the same gene: if there is a difference (in e.g. substrate or co-factor), find evidence in literature if the new reaction is supported and/or likely to be present. Not only guided by KEGG or UniProt, but search for more solid evidence.
>   • If the above are true, then see if the reactants and/or products connect to existing metabolites. If so, then include the reaction in that compartment, but do not add it to other compartments. This should rather be addressed by a thorough curation of all reaction compartmentalizations. If the reaction does not connect to the existing metabolic network, then just add it to whatever compartment is suggested.

I agree with the detailed strategy. With a standard workflow, we can add new reactions more efficiently and credibly.

@edkerk edkerk merged commit be5dab0 into develop Jul 8, 2023
@edkerk edkerk deleted the add-new-reaction-after-correction branch July 8, 2023 12:19
@edkerk edkerk mentioned this pull request Jul 8, 2023
3 tasks