Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique subsystem of rxn #307

Merged
merged 19 commits into from
Jun 16, 2022
Merged

Unique subsystem of rxn #307

merged 19 commits into from
Jun 16, 2022

Conversation

cheng-yu-zhang
Copy link
Collaborator

@cheng-yu-zhang cheng-yu-zhang commented Apr 30, 2022

Main improvements in this PR:

*Try to be as clear as possible: Is it fixing/adding something in the model? Is it an additional test/function/dataset? PLEASE
Add subsystem of rxns systematacially and the data is in file ‘Rxn_unique_subsystem.tsv’
The maping file is in 'KEGG_pathways_maping.tsv'.
I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Selected develop as a target branch (top left drop-down menu)
  • If needed, asked first in the Gitter chat room about this PR

hongzhonglu and others added 3 commits July 23, 2020 17:41
Besides the current subsystem defined for each rxn based on kegg database. A manually curated subsystem is added for each rxn.
Add the missing subsystem information for newly added reactions.
Update the column names.
hongzhonglu and others added 3 commits May 9, 2022 23:52
Besides the current subsystem defined for each rxn based on kegg database. A manually curated subsystem is added for each rxn.
Add the missing subsystem information for newly added reactions.
Update the column names.
@edkerk edkerk force-pushed the unique_subsystem_of_rxn branch from 80fbf30 to 28cdacb Compare May 9, 2022 21:55
@edkerk
Copy link
Member

edkerk commented May 9, 2022

I rebased it to the latest develop branch. But it now requires a function to apply these changes.

A few comments:

  • Remove the sceXXXXX codes at the end of some subsystems. KEGG pathways can be stored as separate annotation.
  • Simplify glycerolipid metabolism / glycerophospholipid metabolism ( sce00561,sce00564 ) into glycerolipid metabolism

@edkerk
Copy link
Member

edkerk commented May 26, 2022

Will you also write a function to apply those curations? And are the two files not mostly containing the same information?

@hongzhonglu
Copy link
Collaborator

Will you also write a function to apply those curations? And are the two files not mostly containing the same information?

Hi Ed, do you mean find some applications for this curations? The unique subsystem is mainly used to generate the metabolic map. May be also helpful for the function classification of reactions.

@edkerk
Copy link
Member

edkerk commented May 28, 2022

No application. But at the moment it is just a text file with subsystems, it's not incorporated in the model. I can also do that, just wondered if this was worked on already, otherwise it would be double work.

@mihai-sysbio
Copy link
Member

I too am looking forward to having the unique subsystem integrated in the model file(s).

@hongzhonglu
Copy link
Collaborator

Yes. These unique subsystem should be worked directly. But in future maybe we need to consider how to organize the subsystem in a more reasonable way as there are different standards-pathway definition in KEGG and biocyc. @edkerk

@edkerk
Copy link
Member

edkerk commented May 30, 2022

KEGG pathways are on identifiers.org, so they can easily be incorporated as kegg.pathway annotations. Does KEGG_pathways_maping.tsv supposed to contain those annotations? It seems to be the right length (numel(model.rxns)), but the file doesn't mention reaction IDs, so a bit dangerous to match just by order in the file.

Copy link
Member

@edkerk edkerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very valuable work! I just have a few comments to ensure consistency and that most reactions are assigned to a subSystem.

data/modelCuration/Rxn_unique_subsystem.tsv Outdated Show resolved Hide resolved
data/modelCuration/Rxn_unique_subsystem.tsv Outdated Show resolved Hide resolved
data/modelCuration/Rxn_unique_subsystem.tsv Outdated Show resolved Hide resolved
data/modelCuration/Rxn_unique_subsystem.tsv Outdated Show resolved Hide resolved
data/modelCuration/Rxn_unique_subsystem.tsv Outdated Show resolved Hide resolved
data/modelCuration/Rxn_unique_subsystem.tsv Outdated Show resolved Hide resolved
data/modelCuration/Rxn_unique_subsystem.tsv Outdated Show resolved Hide resolved
data/modelCuration/Rxn_unique_subsystem.tsv Outdated Show resolved Hide resolved
data/modelCuration/KEGG_pathways_maping.tsv Outdated Show resolved Hide resolved
@edkerk
Copy link
Member

edkerk commented May 30, 2022

I've also pushed a v8_6_0 script that applies the change to the model. Note that complete functioning of that script would require #313 to be merged to develop and develop merged in this branch. But I will not yet do this, as I first want to test how it works when this PR is contributing another change to the same v8_6_0 script as used in #313. From release 8.6.1 this will all be resolved.

@edkerk
Copy link
Member

edkerk commented Jun 5, 2022

Keep in mind what the purpose is of subSystems, to group reactions that are involved in similar metabolic processes. Often this refers to pathways, but this is not always the case. Having subsystems with just 1 or very few reactions is not very helpful, as it doesn't help with grouping reactions.

Subsystems with very few reactions that might be better merged:

  • Merge Various types of N-glycan biosynthesis into N-glycan biosynthesis.
  • One reaction r_0362 is assigned to Other types of O-glycan biosynthesis, but this is a weird name for a subsystem if there are no other subsystems with O-glycan biosynthesis. As the reaction is somewhat similar to the reverse of DPM1 (making mannan, instead of man-GDP), it can be merged with N-glycan biosynthesis.
  • Merge Atrazine degradation, Aminobenzoate degradation, Styrene degradation, Chloroalkane and chloroalkene degradation, Limonene and pinene degradation and Bisphenol degradation in Alternate carbon metabolism, otherwise they are mostly just 1 reaction per subsystem.

Then there are a few subsystems that are somewhat duplications, which perhaps should be better merged? I'm not 100% certain, but something to consider (@feiranl @mihai-sysbio, what do you think?):

  • There are separate Arginine and proline metabolism and Arginine biosynthesis subsystems, should the latter not be merged with the first?
  • Similar with Phenylalanine metabolism and Phenylalanine, tyrosine and tryptophan biosynthesis?
  • Can Beta-alanine metabolism be merged with Alanine, aspartate and glutamate metabolism?

Finally there are still many reactions with None or Other as subsystem. This should be avoided.

  • Assign r_0454 fumarate reductase to TCA cycle?
  • There are many peptidase reactions, they can be in a subsystem Dipeptidases?
  • Assign r_4491 arabinose reductase to Alternate carbon metabolism?
  • In general, just go through the list of reaction associated to None and Other and try to assign them to existing subsystems, or define new ones if needed (e.g. Dipeptidases, as I cannot think of any other subsystem where they would fit in, but at least they are now all grouped).

Particularly for the last set of changes, it might be good to explicitly document the changes and arguments for each assignment.

@edkerk
Copy link
Member

edkerk commented Jun 5, 2022

@cheng-yu-zhang Why are there two text files, uniqueSubsystems.tsv and keggPathways.tsv? In addRxnSubsystem.m, it uses the keggPathways.tsv file to write the subSystems, which seems incorrect, KEGG pathway annotation is something that will be fixed in a separate PR. In addition, there is no need for addRxnsSubsystem.m, as v8_6_0.m has the required code to make those changes. And this is not a type of curation that will routinely be performed, so there is no need for a generic function.

Now with #313 merged and develop merged into this PR, you should be able to run v8_6_0 to apply all model curations. The model tests are disabled for now, until SysBioChalmers/RAVEN#421 is merged and released.

edkerk added 3 commits June 5, 2022 12:47
@edkerk edkerk mentioned this pull request Jun 5, 2022
4 tasks
@cheng-yu-zhang
Copy link
Collaborator Author

Most Other and None have been avoided and the corresponding reasons are provided in uniqueSubsystem.tsv, but I can not find proper subsystems for r_0331, r_4198, r_4231.

@edkerk edkerk self-requested a review June 7, 2022 20:43
@haowang-bioinfo
Copy link
Member

@cheng-yu-zhang this feature of providing unique subSystem info for reactions is great!

Just a minor comment to r_2126, would be nice to adjust its subsystem to Glycolysis / gluconeogenesis (use lower case g for the first letter of gluconeogenesis) which then would be consistent with the others in this group.

@edkerk edkerk merged commit 5f49f66 into develop Jun 16, 2022
@edkerk edkerk deleted the unique_subsystem_of_rxn branch June 16, 2022 12:07
@edkerk edkerk mentioned this pull request Jun 16, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants