add reactions based on KEGG and MetaCyc annotations #304
@cheng-yu-zhang Could you please explain what exactly was done in this PR (and in the other two that you opened)? For example, where did you get the information from, why did you make these changes, and were there any special cases or considerations? What solved the growth problem that you encountered?
@edkerk That's my fault; I will provide more details.
I have reorganized the data to fit #302.
But overall, I'm not convinced that all these reactions should be included. What criteria were used to include them? What experimental evidence supports them? [To facilitate this, I changed the layout of the …] Some examples:
These are probably not correct. The breakdown of melatonin and serotonin, which are not yeast metabolites, has the same EC number as the reaction from tryptophan to N-formyl-kynurenine, which is part of NAD biosynthesis. In fact, there are four reactions in this map with the same EC number, but only one of them is part of a functional pathway. There are more examples like this, also based on MetaCyc. So how were these reactions selected?

Then there are other problematic reactions. The following two reactions modify proteins, which is outside the scope of a metabolic network. Moreover, they are actually half-reactions of pyruvate dehydrogenase and alpha-ketoglutarate dehydrogenase (both already in the model and associated with the same genes), so there is no need to include these:
There are other reactions that act on non-specific substrates:
There has been some discussion about including non-specific substrates (#219), but these genes are already associated with existing reactions (…). There are also examples of fluorinated and chlorinated compounds that would not occur in S. cerevisiae. Overall: the list of new reactions should be carefully curated, to make sure that the reactions that are added make sense. More reactions are not per se better, even if they do not directly affect some of the model metrics (predicted growth rate, gene essentiality, etc.).
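Two of the screening points raised in this comment, several candidate reactions sharing one EC number and halogenated compounds that would not occur in yeast, can be flagged mechanically before the manual review. The following is a minimal Python sketch, assuming candidate reactions are available as (ID, EC number, substrate) tuples and metabolite formulas as plain strings; the reaction IDs, the `EC_A`/`EC_B` labels, the formulas table and all helper names are hypothetical and not part of yeast-GEM.

```python
import re
from collections import defaultdict

# Hypothetical candidate reactions: (reaction ID, EC number, substrate).
# IDs and EC labels are placeholders, not entries from the PR's data files.
CANDIDATES = [
    ("r_new_1", "EC_A", "L-tryptophan"),
    ("r_new_2", "EC_A", "serotonin"),
    ("r_new_3", "EC_A", "melatonin"),
    ("r_new_4", "EC_B", "D-glucose"),
]

# Hypothetical metabolite formulas keyed by metabolite name.
FORMULAS = {
    "4-chlorobenzoate": "C7H4ClO2",
    "fluoroacetate": "C2H2FO2",
    "D-glucose": "C6H12O6",
}

HALOGENS = {"F", "Cl", "Br", "I"}


def shared_ec_numbers(reactions):
    """Return EC numbers that are annotated to more than one candidate reaction."""
    by_ec = defaultdict(list)
    for rxn_id, ec, _substrate in reactions:
        by_ec[ec].append(rxn_id)
    return {ec: ids for ec, ids in by_ec.items() if len(ids) > 1}


def elements(formula):
    """Return the element symbols in a formula string such as 'C7H4ClO2'."""
    # An element symbol is one capital letter, optionally followed by one lowercase letter.
    return set(re.findall(r"[A-Z][a-z]?", formula))


def halogenated(formulas):
    """Return the sorted names of metabolites whose formula contains a halogen."""
    return sorted(name for name, f in formulas.items() if elements(f) & HALOGENS)
```

With these inputs, `shared_ec_numbers(CANDIDATES)` groups `r_new_1` to `r_new_3` under `EC_A`, and `halogenated(FORMULAS)` returns the chlorinated and fluorinated entries. The sketch only produces a worklist: deciding which of the reactions sharing an EC number belongs to a functional pathway still requires manual curation.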
@edkerk Are there any remaining issues with the new reactions that I need to fix?
I have refactored the script and the location of the data files to match the generic curation format introduced in #313. See …

I reiterate the last sentence of the previous comment: the list of new reactions should be carefully curated, to make sure that the reactions that are added make sense. More reactions are not per se better, even if they do not directly affect some of the model metrics (predicted growth rate, gene essentiality, etc.). So you should go through the list of reactions one by one and manually check whether they make sense. You uploaded draft models from KEGG and MetaCyc, but there is no explanation of which reactions were then included and why. I quickly looked through the new reactions and found some more issues:

Double check that it is not a duplicate of an existing reaction.
The first reaction was already present in the model. The second reaction has different metabolite names, but it represents the same reaction. This also highlights that there are duplicate metabolites; without them, the duplicate reaction would have been easier to spot.

Double check that there are no duplicate metabolites.

See above. Even if the reaction had not been a duplicate, then …

Double check whether the reaction is likely to be present in S. cerevisiae.
The first reaction is how glutathione oxidoreductase is widely accepted to function. The new reaction is reversible, uses NADH, and has a much simplified gene association. What strong evidence is there to include the second one?

Double check the gene associations.

See both examples above: the new reactions have much simplified gene associations, while the old reactions indicate complexes with subunits. What strong evidence is there for the simplified gene associations?

See previous comment.

It is worthwhile to have another look at the previous comment as well, as those issues are not fully resolved. How is the localization determined? Be very careful with reactions predicted by MetaCyc, as it can quickly draw in non-native substrates.
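Two of the checks requested here, spotting duplicates of existing reactions and spotting simplified gene associations, can be partly automated. The Python sketch below assumes reactions are plain dictionaries mapping metabolite names to stoichiometric coefficients, a hand-made synonym table, and gene rules written with `and`/`or`; all reaction IDs, gene names, tables and helper names are hypothetical and not taken from the yeast-GEM repository. Reaction direction is ignored on purpose, since a reversed copy of an existing reaction is still worth inspecting.

```python
import re

# Hypothetical synonym table mapping alternative metabolite names to one canonical name.
ALIASES = {"alpha-ketoglutarate": "2-oxoglutarate"}

# Hypothetical reactions as {metabolite: coefficient}; negative = consumed, positive = produced.
EXISTING = {
    "r_existing": {"L-glutamate": -1, "NADP(+)": -1, "H2O": -1,
                   "2-oxoglutarate": 1, "ammonium": 1, "NADPH": 1, "H+": 1},
}
NEW = {
    "r_new": {"alpha-ketoglutarate": -1, "ammonium": -1, "NADPH": -1, "H+": -1,
              "L-glutamate": 1, "NADP(+)": 1, "H2O": 1},
}

# Hypothetical gene association rules with placeholder gene IDs.
EXISTING_RULES = {"r_old": "geneA and geneB and geneC"}
NEW_RULES = {"r_simplified": "geneA", "r_unrelated": "geneD"}


def canonical(stoich, aliases):
    """Return a hashable key that ignores metabolite synonyms and reaction direction."""
    merged = {}
    for met, coef in stoich.items():
        name = aliases.get(met, met)
        merged[name] = merged.get(name, 0) + coef
    forward = tuple(sorted((m, c) for m, c in merged.items() if c != 0))
    backward = tuple(sorted((m, -c) for m, c in forward))
    # Pick one orientation deterministically so that A => B and B => A give the same key.
    return min(forward, backward)


def find_duplicates(new_rxns, existing_rxns, aliases):
    """Map each new reaction ID to an existing reaction with the same canonical key."""
    index = {canonical(s, aliases): rid for rid, s in existing_rxns.items()}
    hits = {}
    for rid, stoich in new_rxns.items():
        key = canonical(stoich, aliases)
        if key in index:
            hits[rid] = index[key]
    return hits


def genes(rule):
    """Extract gene IDs from a rule such as '(g1 and g2) or g3'."""
    tokens = re.findall(r"[A-Za-z0-9_.\-]+", rule)
    return {t for t in tokens if t.lower() not in {"and", "or"}}


def simplified_rules(new_rules, existing_rules):
    """Flag new rules whose genes form a strict subset of an existing rule's genes."""
    flagged = []
    for new_id, new_rule in new_rules.items():
        new_genes = genes(new_rule)
        for old_id, old_rule in existing_rules.items():
            old_genes = genes(old_rule)
            if new_genes and new_genes < old_genes:
                flagged.append((new_id, old_id, sorted(old_genes - new_genes)))
    return flagged
```

Without the `ALIASES` entry, `find_duplicates(NEW, EXISTING, {})` returns an empty dict, which mirrors the point above: duplicate metabolites make duplicate reactions harder to spot. The sketch ignores compartments, and a flagged strict subset is only a prompt to look for evidence, not proof that the new rule is wrong.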
@edkerk Hi Ed, I have encountered a problem: I fail to run …
@edkerk @hongzhonglu Is there any way to solve the above problem?
Hmm, even if …
@edkerk @hongzhonglu However, MATLAB can run …
refactor: reformat for new curateMetsRxnsGenes function
fix: remove duplicated metabolites
fix: remove unused metabolites
feat: add new genes file for metadata
fix: remove already existing metabolites
fix: remove cytidine nucleosidase stoichiometry
The reaction compartments are based on the subcellular localization of the enzymes, as detailed in 'compartmental localization.tsv'.
add confidence scores and references; add functional annotations of the new reactions in "function annotation.tsv"
provide screenshots of database entries in "v8.6.1\straightforward_proof", which document the existence of the reactions in S288C and make them easy for other reviewers to check.
ca7907d to 1b5fa8e
based on the 'develop' version, as 8.6.3 is not yet released
I went through all suggested reactions and checked them one by one. Given the quality of the current yeast-GEM, one should be careful when including new reactions: there should be more evidence than the reaction appearing in KEGG. I checked them with the following strategy:
I agree with the detailed strategy. With a standard workflow, we can add new reactions more efficiently and credibly.
Main improvements in this PR:
Saccharomyces_cerevisiae_draftmodel_kegg and Saccharomyces_cerevisiae_draftmodel_metacyc
I hereby confirm that I have:
selected develop as a target branch (top left drop-down menu)