fix: incorrect metabolite annotation #108

Closed
3 tasks done
edkerk opened this issue May 15, 2018 · 33 comments · Fixed by #122
Labels
fixed in devel: this issue is already fixed in devel and will be closed after the next release

Comments

@edkerk
Member

edkerk commented May 15, 2018

Description of the issue:

There are multiple metabolites with incorrect annotations (CHEBI, KEGG).

Expected feature/value/output:

s_0511, s_0512 and s_0513 are all choline in different compartments, correctly annotated with CHEBI:15354 and KEGG C00114.

However, s_2807 is also annotated with those same two CHEBI and KEGG IDs, even though it is (S)-3-hydroxyhexacosanoyl-CoA. Meanwhile s_0045, which is the same compound but located in the peroxisome instead of the ER membrane, is correctly annotated with CHEBI:52976.

Current feature/value/output:

The RAVEN function checkModelStruct indicates that the following annotations are repeated for metabolites with different names:

WARNING: The following MIRIAM strings are associated to more than one unique metabolite name:
	chebi/CHEBI:15354
	chebi/CHEBI:15377
	chebi/CHEBI:15428
	chebi/CHEBI:15471
	chebi/CHEBI:15676
	chebi/CHEBI:15698
	chebi/CHEBI:15699
	chebi/CHEBI:15740
	chebi/CHEBI:15837
	chebi/CHEBI:15873
	chebi/CHEBI:15891
	chebi/CHEBI:16004
	chebi/CHEBI:16024
	chebi/CHEBI:16182
	chebi/CHEBI:16235
	chebi/CHEBI:16335
	chebi/CHEBI:16347
	chebi/CHEBI:16450
	chebi/CHEBI:16638
	chebi/CHEBI:16643
	chebi/CHEBI:16708
	chebi/CHEBI:16750
	chebi/CHEBI:16810
	chebi/CHEBI:16865
	chebi/CHEBI:16933
	chebi/CHEBI:16947
	chebi/CHEBI:16977
	chebi/CHEBI:16988
	chebi/CHEBI:17038
	chebi/CHEBI:17071
	chebi/CHEBI:17108
	chebi/CHEBI:17115
	chebi/CHEBI:17191
	chebi/CHEBI:17203
	chebi/CHEBI:17295
	chebi/CHEBI:17368
	chebi/CHEBI:17536
	chebi/CHEBI:17549
	chebi/CHEBI:17562
	chebi/CHEBI:17596
	chebi/CHEBI:17600
	chebi/CHEBI:17754
	chebi/CHEBI:17836
	chebi/CHEBI:17924
	chebi/CHEBI:18050
	chebi/CHEBI:24636
	chebi/CHEBI:27689
	chebi/CHEBI:27750
	chebi/CHEBI:27989
	chebi/CHEBI:28789
	chebi/CHEBI:28938
	chebi/CHEBI:29033
	chebi/CHEBI:29806
	chebi/CHEBI:29985
	chebi/CHEBI:30849
	chebi/CHEBI:31725
	chebi/CHEBI:32682
	chebi/CHEBI:36655
	chebi/CHEBI:4167
	chebi/CHEBI:48943
	chebi/CHEBI:48945
	chebi/CHEBI:49000
	chebi/CHEBI:50569
	chebi/CHEBI:50585
	chebi/CHEBI:57457
	chebi/CHEBI:57925
	chebi/CHEBI:58210
	chebi/CHEBI:58297
	chebi/CHEBI:58343
	chebi/CHEBI:62014
	chebi/CHEBI:62501
	chebi/CHEBI:70712
	chebi/CHEBI:72001
	chebi/CHEBI:75074
	chebi/CHEBI:87781
	chebi/CHEBI:88008
	chebi/CHEBI:88980
	chebi/CHEBI:88984
	chebi/CHEBI:89019
	chebi/CHEBI:89763
	chebi/CHEBI:89765
	chebi/CHEBI:89959
	chebi/CHEBI:90051
	kegg.compound/C00001
	kegg.compound/C00025
	kegg.compound/C00026
	kegg.compound/C00031
	kegg.compound/C00037
	kegg.compound/C00041
	kegg.compound/C00048
	kegg.compound/C00051
	kegg.compound/C00054
	kegg.compound/C00058
	kegg.compound/C00061
	kegg.compound/C00062
	kegg.compound/C00064
	kegg.compound/C00065
	kegg.compound/C00073
	kegg.compound/C00079
	kegg.compound/C00080
	kegg.compound/C00114
	kegg.compound/C00116
	kegg.compound/C00121
	kegg.compound/C00122
	kegg.compound/C00127
	kegg.compound/C00147
	kegg.compound/C00148
	kegg.compound/C00158
	kegg.compound/C00159
	kegg.compound/C00212
	kegg.compound/C00216
	kegg.compound/C00242
	kegg.compound/C00245
	kegg.compound/C00256
	kegg.compound/C00259
	kegg.compound/C00262
	kegg.compound/C00263
	kegg.compound/C00266
	kegg.compound/C00294
	kegg.compound/C00318
	kegg.compound/C00334
	kegg.compound/C00352
	kegg.compound/C00387
	kegg.compound/C00407
	kegg.compound/C00430
	kegg.compound/C00475
	kegg.compound/C00499
	kegg.compound/C00504
	kegg.compound/C00517
	kegg.compound/C00526
	kegg.compound/C00568
	kegg.compound/C00794
	kegg.compound/C00849
	kegg.compound/C00881
	kegg.compound/C01342
	kegg.compound/C01551
	kegg.compound/C01571
	kegg.compound/C01694
	kegg.compound/C01722
	kegg.compound/C02223
	kegg.compound/C02504
	kegg.compound/C02944
	kegg.compound/C03221
	kegg.compound/C03479
	kegg.compound/C04525
	kegg.compound/C05272
	kegg.compound/C05275
	kegg.compound/C05853
	kegg.compound/C07328
	kegg.compound/C07329
	kegg.compound/C12296
	kegg.compound/C14818
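
To illustrate what this warning means, here is a minimal Python sketch of the same kind of check. It is not RAVEN's implementation of checkModelStruct; it only assumes that a MIRIAM string is flagged when it is attached to more than one unique metabolite name (compartment ignored). The five metabolites are the ones named in the description above; the function name `repeated_miriams` is made up for this example.

```python
from collections import defaultdict

# (metabolite ID, name without compartment, MIRIAM strings), as described above
METABOLITES = [
    ("s_0511", "choline", ["chebi/CHEBI:15354", "kegg.compound/C00114"]),
    ("s_0512", "choline", ["chebi/CHEBI:15354", "kegg.compound/C00114"]),
    ("s_0513", "choline", ["chebi/CHEBI:15354", "kegg.compound/C00114"]),
    ("s_2807", "(S)-3-hydroxyhexacosanoyl-CoA",
     ["chebi/CHEBI:15354", "kegg.compound/C00114"]),
    ("s_0045", "(S)-3-hydroxyhexacosanoyl-CoA", ["chebi/CHEBI:52976"]),
]


def repeated_miriams(metabolites):
    """Return MIRIAM strings attached to more than one unique metabolite name."""
    names_by_miriam = defaultdict(set)
    for _met_id, name, miriams in metabolites:
        for miriam in miriams:
            names_by_miriam[miriam].add(name)
    return sorted(m for m, names in names_by_miriam.items() if len(names) > 1)


print(repeated_miriams(METABOLITES))
# ['chebi/CHEBI:15354', 'kegg.compound/C00114']
```

The same compound in several compartments (choline here) does not trigger the warning, because only distinct names are counted.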

Reproducing these results:

checkModelStruct(model,true,false)

(Note that the current checkModelStruct version has a bug that limits the output to the first 10 mistakes.)

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
edkerk changed the title from "incorrect metabolite annotation" to "fix: incorrect metabolite annotation" on May 15, 2018
edkerk added a commit to SysBioChalmers/Yarrowia_lipolytica_W29-GEM that referenced this issue May 15, 2018
Due to a bug in yeast-GEM (SysBioChalmers/yeast-GEM#108), incorrect metabolite annotations were introduced. Yeast-GEM derived annotations are removed, while KEGG-derived annotations are kept.
Will be updated once bug in yeast-GEM is fixed.
@hongzhonglu
Collaborator

@edkerk Hi, which version of the model did you use for the check? From a quick check, these are the annotations for s_2807 and s_0045, which are right in the latest version:

s_2807[erm] (S)-3-hydroxyhexacosanoyl-CoA[erm] C47H86N7O18P3S NA CHEBI:52976
s_0045[p] (S)-3-hydroxyhexacosanoyl-CoA[p] C47H86N7O18P3S NA CHEBI:52976 NA 0

@edkerk
Member Author

edkerk commented May 16, 2018

@hongzhonglu As specified at the bottom of the first post, this was in the master branch. Which version do you mean when referring to "the latest version"?

Excerpt from the xml from the master branch, showing that s_2807[erm] is still annotated with CHEBI:15354:

      <species metaid="s_2807__91__erm__93__" id="s_2807__91__erm__93__" name="(S)-3-hydroxyhexacosanoyl-CoA [endoplasmic reticulum membrane]" compartment="erm" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="0" fbc:chemicalFormula="C47H86N7O18P3S">
        <annotation xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core">
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
            <rdf:Description rdf:about="#s_2807__91__erm__93__">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/chebi/CHEBI:15354"/>
                  <rdf:li rdf:resource="http://identifiers.org/kegg.compound/C00114"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>

Even in the curation/metabolites branch this is still annotated as such.
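
For anyone who wants to check this without opening the XML by hand, a minimal Python sketch using only the standard library could look like the following. The XML string is a trimmed copy of the excerpt above (attributes and namespace declarations not needed for the lookup are dropped), and `miriam_strings` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
IDENTIFIERS_PREFIX = "http://identifiers.org/"

# Trimmed version of the s_2807 excerpt from the master branch
SPECIES_XML = """
<species xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
         id="s_2807__91__erm__93__"
         name="(S)-3-hydroxyhexacosanoyl-CoA [endoplasmic reticulum membrane]">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#s_2807__91__erm__93__">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/chebi/CHEBI:15354"/>
            <rdf:li rdf:resource="http://identifiers.org/kegg.compound/C00114"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def miriam_strings(species_xml):
    """Return the identifiers.org annotations of one <species> element."""
    root = ET.fromstring(species_xml.strip())
    found = []
    # rdf:li elements carry the annotation URL in their rdf:resource attribute
    for li in root.iter(f"{{{RDF_NS}}}li"):
        resource = li.get(f"{{{RDF_NS}}}resource", "")
        if resource.startswith(IDENTIFIERS_PREFIX):
            found.append(resource[len(IDENTIFIERS_PREFIX):])
    return found


print(miriam_strings(SPECIES_XML))
# ['chebi/CHEBI:15354', 'kegg.compound/C00114']
```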

@hongzhonglu
Collaborator

hongzhonglu commented May 16, 2018

@edkerk I will check where we made the mistake. In the Excel file we used for the correction, the metabolite information is right.

@hongzhonglu
Collaborator

@edkerk @BenjaSanchez The error exists in the metabolite data file used to update the metabolite annotation, which missed some changes. I will recheck and update the metabolite data file on GitHub, and then you can check it again.

@hongzhonglu
Collaborator

hongzhonglu commented May 16, 2018

@edkerk @BenjaSanchez After checking, I find that the update should cover the whole list of metabolites in our model. In our previous update, we only uploaded the metabolite annotations that had been corrected. If a metabolite in the previous model had both a right and a wrong annotation, it was not included in the uploaded data, so the wrong annotation was kept and not updated. @BenjaSanchez so we should change the way we update the metabolites.

@BenjaSanchez
Contributor

@hongzhonglu this is very easily solvable if you just update the table metabolite_manual_curation.tsv to also include all of those changes (i.e. an additional row for each metabolite ID that had a wrong ID, with a blank in the corresponding ID_new). Then you can just re-run updateMetaboliteAnnotation.m and the model should be updated. Please do so in the curation/metabolites branch, and let me know if you have any questions.
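
The idea of "a blank ID_new removes the wrong ID" can be sketched in Python as below. This is not updateMetaboliteAnnotation.m and does not reflect the real layout of metabolite_manual_curation.tsv: only the column name ID_new comes from the comment above, while `met_id`, `database` and the function name `apply_curation` are assumptions made for this example. The s_2807 values are the ones discussed earlier in the thread.

```python
def apply_curation(annotations, rows):
    """Return a copy of `annotations` with the curation rows applied.

    A non-blank ID_new replaces the stored ID; a blank ID_new deletes it.
    """
    updated = {met: dict(ids) for met, ids in annotations.items()}
    for row in rows:
        ids = updated.setdefault(row["met_id"], {})
        new_id = row["ID_new"].strip()
        if new_id:
            ids[row["database"]] = new_id
        else:
            # Blank ID_new: the old ID was wrong and has no replacement
            ids.pop(row["database"], None)
    return updated


model_annotations = {
    "s_2807": {"chebi": "CHEBI:15354", "kegg.compound": "C00114"},
}
curation_rows = [
    {"met_id": "s_2807", "database": "chebi", "ID_new": "CHEBI:52976"},
    {"met_id": "s_2807", "database": "kegg.compound", "ID_new": " "},
]

print(apply_curation(model_annotations, curation_rows))
# {'s_2807': {'chebi': 'CHEBI:52976'}}
```

With rows like these, a wrong annotation that has no replacement is still recorded as a change, so no second table is needed to carry it.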

@hongzhonglu
Collaborator

@edkerk I just updated the metabolite annotation: https://github.com/SysBioChalmers/yeast-GEM/tree/curation/metabolites
@BenjaSanchez If there is no problem, you can update the metabolites with "R". During the update, exporting the yml format gave an error:
Error in saveYeastModel (line 17)
exportForGit(model,'yeastGEM','..',{'yml'});

@edkerk
Member Author

edkerk commented May 21, 2018

@hongzhonglu are you running the latest (2.0.0) version of RAVEN? If that does not fix the error, can you please give the full error message? The yml file is very useful for tracking changes.

A quick manual look at the XML file in your latest commit on that branch, 59a9e2c, suggests that it does fix some of the problems, but it also seems to revert some other recent changes, such as the updated metabolite formulae. It seems like @BenjaSanchez might be on to this as well: #113?

@BenjaSanchez
Contributor

@edkerk no, that was just a specific conflict that arose on dependencies between devel and master and it is fixed already.

@hongzhonglu as @edkerk points out, some changes are being reverted, also the [compartment] information is being cleared out from some metNames, which we want to keep in the COBRA version... maybe it's better if we create a new branch to address this once we merge PR #112, with all changes from devel, and then you can add all changes. I see that you created 2 extra tables with data, why not just expand the already existing metabolite_manual_curation.tsv? Is there any extra information needed for the script in the other 2 tables?

@hongzhonglu
Collaborator

@BenjaSanchez
metabolite_manual_curation.tsv just records all the changed information, while metabolite_manual_curation_full_list.tsv is used for the update.

@BenjaSanchez
Contributor

@hongzhonglu I don't see the difference: shouldn't the data used for the update record all information that changed as well? How can something be updated if it's not changing anything?

@hongzhonglu
Collaborator

@BenjaSanchez "In our previous update, we only uploaded the metabolite annotations that had been corrected. If a metabolite in the previous model had both a right and a wrong annotation, it was not included in the uploaded data, so the wrong annotation was kept and not updated. @BenjaSanchez so we should change the way we update the metabolites."

@BenjaSanchez
Contributor

@hongzhonglu I understand that, my point is that a metabolite annotation that was wrong before and now has been removed is a change that can be easily included in metabolite_manual_curation.tsv, as the columns for that are already there. No need to create extra tables for it (will just confuse users), or am I missing something?

@hongzhonglu
Collaborator

hongzhonglu commented May 21, 2018

@BenjaSanchez So it is better to just delete metabolite_manual_curation.tsv and keep metabolite_manual_curation_full_list.tsv, which is a fast and safe way to do the update. The change information can also be found in the latter file.

@hongzhonglu
Collaborator

@edkerk As you suggested, I ran RAVEN 2, but the error still occurred when I ran saveYeastModel. The following is the error information:

saveYeastModel(model)
Document written
Error using exportForGit
Too many input arguments.

Error in saveYeastModel (line 17)
exportForGit(model,'yeastGEM','..',{'yml'});

I think it is better to make this process simple so that everyone can save it normally.

@edkerk
Member Author

edkerk commented May 21, 2018

@hongzhonglu can you let me know what is in the version.txt file that is in your RAVEN folder? It seems like you do not have RAVEN 2.0.0 installed.

@hongzhonglu
Collaborator

@edkerk This is my version.txt: 2.0.0-rc.2

@edkerk
Member Author

edkerk commented May 21, 2018

@hongzhonglu This is not version 2.0.0, but a release candidate. Please install the latest release from here, or update your master branch.

@BenjaSanchez
Contributor

@hongzhonglu I see that the new file has 2 extra columns with formulas, are you updating any formulas? This is probably why some formulas are being reverted to the non-SBML compliant format. Also, what is general_chebiID?

@hongzhonglu
Collaborator

@BenjaSanchez Yes, I also updated the formulas this time. I just remembered that Feiran has done it before. general_chebiID means that we could not find a specific ID for the metabolite.

@hongzhonglu
Collaborator

@edkerk Thanks for your help. Now it works.

@BenjaSanchez
Contributor

@hongzhonglu sounds then like that information should just be in remarks? If the formulas are already updated then those 2 columns can also go away, and then it's better to just use the old file metabolite_manual_curation.tsv with new rows added, that way the script doesn't have to be changed. If you agree with this I can delete this branch and we start over with just 2 commits:

  1. fix-data: updated metabolite_manual_curation.tsv
  2. fix-met.prop: updated model with metabolite curation

@hongzhonglu
Collaborator

@BenjaSanchez I agree to redo the update. I suggest we keep the two columns even though we don't use them this time, so that we have a record of all the changes.

@BenjaSanchez
Contributor

@hongzhonglu the problem with those columns is that many of the formulas are not SBML-compliant, so they had to be removed from the model. If you indicate this for the corresponding rows in remarks, then ok if those columns stay. general_chebiID can also go to remarks for consistency.

@hongzhonglu
Collaborator

@BenjaSanchez which formulas did you change to make them SBML-compliant? Can you send me the list?

@BenjaSanchez
Contributor

BenjaSanchez commented May 21, 2018

@hongzhonglu any that has the characters )n in it; in the file these are all aminoacyl tRNAs
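
A quick way to list those rows could be sketched in Python as follows. The only rule applied is the one stated above (the formula contains the characters `)n`); it is not a full SBML formula validator. The first formula is the one for s_2807 quoted earlier in the thread, the second is a made-up placeholder rather than an entry from the curation file, and the function name is hypothetical.

```python
def formulas_with_repeat_unit(formulas):
    """Return the formulas that contain the characters ')n'."""
    return [formula for formula in formulas if ")n" in formula]


examples = [
    "C47H86N7O18P3S",  # s_2807, plain element counts
    "(C5H8O6PR)n",     # made-up placeholder with a repeat unit
]

print(formulas_with_repeat_unit(examples))
# ['(C5H8O6PR)n']
```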

@hongzhonglu
Collaborator

@BenjaSanchez I see. Then I will add a remark for them.

BenjaSanchez mentioned this issue on Jun 5, 2018
3 tasks
@BenjaSanchez
Contributor

update: after including all missing changes from the manual curation (PR #119), a total of 68 warnings for repeated CHEBI ids and 65 warnings for repeated KEGG ids were solved. However, 19 warnings for repeated ids are still present when checkModelStruct.m is used:

WARNING: The following MIRIAM strings are associated to more than one unique metabolite name:
	chebi/CHEBI:15471
	chebi/CHEBI:27989
	chebi/CHEBI:62014
	chebi/CHEBI:70712
	chebi/CHEBI:72001
	chebi/CHEBI:75074
	chebi/CHEBI:87781
	chebi/CHEBI:88008
	chebi/CHEBI:88980
	...and 10 more

@hongzhonglu
Collaborator

@BenjaSanchez
I have checked all the information again and found that all the changes are included in this update.
The repeated ChEBI IDs arise because the ChEBI ID was found based on the metabolite's formula, not its name. Here is an example:

| Description | Charged formula | Charge | Compartment | KEGG ID | ChEBI ID |
|---|---|---|---|---|---|
| triglyceride (1-18:1, 2-18:1, 3-18:0) [endoplasmic reticulum membrane] | C57H106O6 | 0 | endoplasmic reticulum membrane | | CHEBI:88980 |
| triglyceride (1-18:0, 2-18:1, 3-18:1) [endoplasmic reticulum membrane] | C57H106O6 | 0 | endoplasmic reticulum membrane | | CHEBI:88980 |
| triglyceride (1-18:1, 2-18:1, 3-18:0) [lipid particle] | C57H106O6 | 0 | lipid particle | | CHEBI:88980 |
| triglyceride (1-18:0, 2-18:1, 3-18:1) [lipid particle] | C57H106O6 | 0 | lipid particle | | CHEBI:88980 |

For each specific form of triglyceride it is difficult to find the matching ChEBI ID, so we gave these metabolites the same ChEBI ID.

The other repeated ChEBI IDs are cases like this.
If you have any doubts, let me know!

@BenjaSanchez
Contributor

@hongzhonglu ok for me as long as all cases are like that. Remember to review PR #119 so we can merge changes to devel.
@edkerk thoughts on this? Is it ok to leave cases like these ones in the model?

@edkerk
Member Author

edkerk commented Jun 7, 2018

I'm not so sure. It might be better to annotate all TAGs with CHEBI:17855, the generic CHEBI for all triacylglycerols. CHEBI:88980 specifies the exact chemical structure in InChI, so it would be incorrect for most of the metabolites it is annotated to. Using CHEBI:17855 doesn't get rid of the checkModelStruct warnings, but at least the annotations are not misleading.

Instead, include annotations to SwissLipids for precise identification. Note that SwissLipids is also included in MetaNetX.

When annotating this, make sure that the location of the desaturation is correct (now I don't remember whether it should be 9Z or 11Z in yeast?).

@BenjaSanchez
Contributor

@edkerk SwissLipids is a good idea, as MetaNetX ids for metabolites can be stored in the COBRA field metMetaNetXID. For now, I would suggest as a compromise to leave only the CHEBI ids that match EXACTLY and remove the rest. Same with KEGG ids @hongzhonglu

BenjaSanchez added the "fixed in devel" label on Jun 8, 2018
@hongzhonglu
Collaborator

@edkerk All the repeated ChEBI and KEGG IDs have been removed now. Only correct IDs are kept in our model.
