Complex annotation #305

Merged
merged 21 commits into from
May 28, 2022

Collaborator

@cheng-yu-zhang cheng-yu-zhang commented Mar 31, 2022

Main improvements in this PR:

I manually checked all 209 complex annotations in yeast8.5 against UniProt, SGD and Complex Portal, and applied "addDBNewGeneAnnotation.m" to correct 45 complex annotations that were wrong or incomplete.

  1. The corrected annotations are in the file "databasenewGPR.tsv".
  2. The corresponding reasons are in "databasenewGPR_proof.tsv".
  3. New genes are detailed in "DBnewRxnsGenes.tsv".
  4. The result of the gene essentiality analysis remains the same: 0.8980.
  5. The latest complex annotation downloaded from Complex Portal is in the file "Yeast_complex_portal_2022.tsv".
  6. The explanation is in the file "explanation.docx".

Explanation

Yeast_complex_portal_2022.tsv is the latest complex information downloaded from Complex Portal. This file and the Complex Portal website are the most important references; UniProt and SGD are used as supplements.

  • First, compare the file with yeast-GEM to find each complex annotation (A) in the file and its counterpart (B) in yeast-GEM.
  • If A contains more complexes than B, add the extra complexes, e.g. “r_0831”.
  • For the same complex in A and B, if A contains more subunits than B, add the extra subunits, e.g. “r_3216”.
  • For more complicated situations (e.g. “r_0963”, “r_0263”, “r_0886”, “r_0021”), UniProt and SGD are consulted to determine whether a single subunit can catalyse the reaction alone and whether a given subunit is necessary.
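
The comparison steps above can be sketched as follows, in Python for illustration (the repository's curation scripts are MATLAB). The function name, the dictionary layout, and the placeholder names `complex_1`, `geneA`, etc. are assumptions made for this example; they are not taken from Yeast_complex_portal_2022.tsv or from yeast-GEM.

```python
def missing_subunits(portal_complexes, model_complexes):
    """List, per complex, the Complex Portal subunits absent from the model.

    Both arguments map a complex name to a set of gene IDs (hypothetical
    layout). A complex that is missing from the model entirely is reported
    with all of its subunits.
    """
    report = {}
    for name, portal_genes in portal_complexes.items():
        model_genes = model_complexes.get(name, set())
        extra = portal_genes - model_genes
        if extra:
            report[name] = sorted(extra)
    return report


# Placeholder data, not real annotations.
portal = {"complex_1": {"geneA", "geneB", "geneC"}, "complex_2": {"geneA"}}
model = {"complex_1": {"geneA", "geneB"}}

print(missing_subunits(portal, model))
```

A report like this only flags candidates; the more complicated cases still need the manual UniProt and SGD check described in the last bullet.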

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Selected develop as a target branch (top left drop-down menu)
  • If needed, asked first in the Gitter chat room about this PR

@hongzhonglu hongzhonglu requested a review from edkerk April 20, 2022 07:06
@hongzhonglu
Collaborator

@cheng-yu-zhang For each pull request, please summarize the detailed work that you have done so that it will be easier for other people to review it.

@hongzhonglu hongzhonglu requested a review from feiranl April 20, 2022 07:10
@cheng-yu-zhang
Collaborator Author

> @cheng-yu-zhang For each pull request, please summarize the detailed work that you have done so that it will be easier for other people to review it.

@hongzhonglu I have added more details to the comments.

@feiranl
Collaborator

feiranl commented Apr 20, 2022

Hi @cheng-yu-zhang, Thanks for this update! Nice work!

The growth test for the updated model gives essentially the same result as the model in the develop branch. The accuracy of the gene essentiality test also remains the same (0.89). However, two genes, YKR072C and YOR054C, are now false negatives (experimental_viable, model_inviable for deletion). Please double-check the reactions associated with these two genes.

You mentioned you added 7 new genes, but according to the README file, the gene number has been changed from 1150 to 1161. Please check this.

It would be better to have a literature or database reference for every change so that we can trace each annotation back to its source. This could either be an extra column in "databasenewGPR.tsv" or be summarized as a table here (see below for an example). It would improve the transparency of the model curation. @edkerk @hongzhonglu, what do you think?

For example:

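  • List of genes removed in this version:

| Genes | Related reactions | Reference |
| --- | --- | --- |
| YGL119W | fill this | fill this |
| YGR147C | fill this | fill this |

  • List of genes added in this version:

| Genes | Related reactions | Reference |
| --- | --- | --- |
| YPR165W | | |
| YFR049W | | |
| YOR253W | | |
| YGR038W | | |
| YLR350W | | |
| YBR128C | | |
| YLR211C | | |
| YLR360W | | |
| YPL120W | | |
| YFR021W | | |
| YNL054W | | |
| YGR106C | | |
| YPR170W-B | | |

  • List of genes modified in this version:

| Genes | Related reactions | Reference |
| --- | --- | --- |
| gene | | |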
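
As a quick consistency check, the example lists above can be compared with the README gene counts quoted earlier (1150 to 1161). This is a minimal sketch in Python for illustration, taking the lists at face value; the helper name `net_gene_change` is an assumption made for this example.

```python
removed = ["YGL119W", "YGR147C"]
added = [
    "YPR165W", "YFR049W", "YOR253W", "YGR038W", "YLR350W", "YBR128C",
    "YLR211C", "YLR360W", "YPL120W", "YFR021W", "YNL054W", "YGR106C",
    "YPR170W-B",
]


def net_gene_change(added_genes, removed_genes):
    """Net change in gene count, ignoring duplicate entries in either list."""
    return len(set(added_genes)) - len(set(removed_genes))


old_count = 1150  # gene number before the update, as quoted from the README
new_count = old_count + net_gene_change(added, removed)
print(new_count)  # 1161
```

Thirteen additions and two removals give a net change of eleven, which matches the README difference rather than the seven new genes mentioned in the PR description.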

@edkerk edkerk force-pushed the complex-annotation branch from 7f83202 to 123eef9 Compare May 9, 2022 21:34
@edkerk
Member

edkerk commented May 9, 2022

@feiranl There should indeed be an explanation of why these curations were performed. The PR text mentions that these were manually curated by looking at different databases, but which database is then suggesting which change? Do the databases agree? Is there a conflict? Also some genes are removed, how confident are we of this?

I have rebased this PR onto the latest develop branch, so that the model files can be generated. I also refactored the code to use only RAVEN functions, following #301.

Instead of modifying existing files that were used for previous curations (databasenewGPR.tsv), it is better to make a dedicated file for this particular curation. See for instance #300 and #304, where separate folders with those files are made (here just 1 file would be sufficient).

@cheng-yu-zhang
Collaborator Author

cheng-yu-zhang commented May 21, 2022

@edkerk Instead of making a new file "DBnewRxnsGenes.tsv", which details the new genes, could I add another file, perhaps named "databasenewGPR_proof.tsv", to explain why these curations were performed?
For example:

| rxnID_yeast_model | genes_yeast_model | final_GPR | reference |
| --- | --- | --- | --- |
| r_0005 | YGR032W or YMR306W | YMR306W | web link or paper |
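
A file with these columns could be checked automatically so that no curated rule is left without a reference. This is a minimal sketch in Python for illustration, assuming a tab-separated layout with the column names proposed above; the function name and the second data row (`r_XXXX`, `geneA`, `geneB`) are hypothetical placeholders.

```python
import csv
import io


def rows_missing_reference(tsv_text):
    """Return the reaction IDs of rows whose 'reference' column is empty."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["rxnID_yeast_model"]
        for row in reader
        if not (row.get("reference") or "").strip()
    ]


example = (
    "rxnID_yeast_model\tgenes_yeast_model\tfinal_GPR\treference\n"
    "r_0005\tYGR032W or YMR306W\tYMR306W\tweb link or paper\n"
    "r_XXXX\tgeneA\tgeneA and geneB\t\n"  # hypothetical row without a reference
)

print(rows_missing_reference(example))
```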

cheng-yu-zhang and others added 8 commits May 23, 2022 15:27
# Conflicts:
#	model/dependencies.txt
#	model/yeast-GEM.txt
#	model/yeast-GEM.yml
(cherry picked from commit bbaefdb)

# Conflicts:
#	code/modelCuration/addTransNewGPR.m
#	data/modelCuration/TransRxnNewGPR.tsv
@edkerk
Member

edkerk commented May 23, 2022

@cheng-yu-zhang

  1. I have refactored your data and code; two of the files were mostly duplicates.
  2. While you give detailed links to the webpages where each complex is described, it would be good to also have a higher-level explanation of what curation was done, similar to what one would write for a paper or report. This can be put in the PR message.
  3. Please see Transporter annotation for transport rxns #306 (comment) for a comment on what should be in the refseq column of the gene metadata: not the nucleotide sequence, but a nucleotide NCBI identifier.
  4. Can you show results from the gene essentiality analysis? There is a function for this: code/modelTests/essentialGenes.m

Member

@edkerk edkerk left a comment

I've also run `[accurancy,tp,tn,fn,fp] = essentialGenes(model);` and compared the results with yeast-GEM 8.6.0:

| Metric | yeast-GEM 8.6.0 | this PR |
| --- | --- | --- |
| Number of genes | 1151 | 1162 |
| Accuracy | 0.8801 | 0.8802 |
| TP | 923 | 930 |
| TN | 61 | 62 |
| FP | 98 | 97 |
| FN | 36 | 38 |
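
The accuracy values are consistent with the usual definition (TP + TN) / (TP + TN + FP + FN). This is a minimal sketch in Python for illustration, assuming that definition; it is not a copy of how essentialGenes.m computes the value.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions among all tested genes."""
    return (tp + tn) / (tp + tn + fp + fn)


baseline = accuracy(tp=923, tn=61, fp=98, fn=36)  # yeast-GEM 8.6.0 counts
this_pr = accuracy(tp=930, tn=62, fp=97, fn=38)   # counts for this PR

print(f"{baseline:.4f} {this_pr:.4f}")  # 0.8801 0.8802
```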

Two new false negatives (model predicts their essentiality, but experimental data indicates that they are not essential): YKR072C and YOR054C. Both are in reaction r_0906: H+[c] + N-[(R)-4-phosphonopantothenoyl]-L-cysteine[c] => carbon dioxide[c] + pantetheine 4'-phosphate[c] (part of coenzyme A biosynthesis).

The old grRule: (YKL088W and YKR072C and YOR054C) or (YKL088W and YKR072C) or (YKL088W and YOR054C) or YKL088W
The new grRule: YKL088W and YKR072C and YOR054C

The links (SGD: YKL088W, YKR072C, YOR054C; Complex Portal; and UniProt: YKL088W, YKR072C, YOR054C) all tend to agree that they form a complex, though, so this change to the model should be approved, even though the FN count goes up slightly.
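
To see why the two genes change status, the old and new grRules for r_0906 can be evaluated under single-gene deletions. This is a minimal sketch in Python for illustration; `rule_active` is a hypothetical helper, not a RAVEN function, and it only tells whether the rule can still be satisfied, not whether the model can still grow.

```python
import re


def rule_active(gr_rule, deleted):
    """Return True if the grRule is still satisfied after deleting genes."""

    def to_bool(match):
        token = match.group(0)
        if token in ("and", "or"):
            return token  # keep the logical operators untouched
        return str(token not in deleted)  # gene present -> "True"

    # Replace every gene ID by True/False; parentheses and spaces remain.
    expression = re.sub(r"[A-Za-z0-9_\-]+", to_bool, gr_rule)
    return eval(expression, {"__builtins__": {}}, {})


old_rule = ("(YKL088W and YKR072C and YOR054C) or (YKL088W and YKR072C) "
            "or (YKL088W and YOR054C) or YKL088W")
new_rule = "YKL088W and YKR072C and YOR054C"

for gene in ["YKL088W", "YKR072C", "YOR054C"]:
    print(gene, rule_active(old_rule, {gene}), rule_active(new_rule, {gene}))
```

The old rule reduces logically to YKL088W alone, so deleting YKR072C or YOR054C left the reaction available. Under the new rule any single deletion blocks r_0906, which is why both genes are now predicted to be essential.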

Note: essentialGenes currently needs RAVEN from SysBioChalmers/RAVEN#421

@feiranl
Collaborator

feiranl commented May 26, 2022

Could you also run the growth tests? These normally run successfully, but it is worth confirming that we have a functional model. @hongzhonglu @edkerk @cheng-yu-zhang I think it may be time to add more tests after each update to ensure quality. We now have essentialGenes and growth, but maybe we could add a separate flux check based on C13 data? That way we would know whether we are making the flux predictions better, or at least not worse. What do you think?

Collaborator

@feiranl feiranl left a comment

Great! I can see you have fixed some long-existing issues about complex annotations especially the wrong gene annotation for r_0943. Nice work! Please resolve the questions and then the pull request is ready to go.

@hongzhonglu
Collaborator

> Could you also run the growth tests? These normally run successfully, but it is worth confirming that we have a functional model. @hongzhonglu @edkerk @cheng-yu-zhang I think it may be time to add more tests after each update to ensure quality. We now have essentialGenes and growth, but maybe we could add a separate flux check based on C13 data? That way we would know whether we are making the flux predictions better, or at least not worse. What do you think?

That is a very nice suggestion. More tests will help ensure that the model's prediction quality improves consistently. @cheng-yu-zhang @feiranl

cheng-yu-zhang and others added 3 commits May 28, 2022 21:55
# Conflicts:
#	README.md
#	code/modelCuration/addDBNewGeneAnnotation.m
#	model/yeast-GEM.xml
#	model/yeast-GEM.yml
@edkerk edkerk merged commit 42a3fb0 into develop May 28, 2022
@edkerk edkerk deleted the complex-annotation branch May 28, 2022 20:18
@edkerk edkerk mentioned this pull request Jun 16, 2022