
modify DE results dataframe to have columns "group1" and "group2" in addition to "comparison" #1074

Merged: 3 commits merged on May 29, 2021


@Munfred (Contributor) commented May 26, 2021

I often work with big dataframes of concatenated DE results between arbitrary groups, so I like to have the group1 and group2 labels in separate columns instead of joined by a " vs " string (e.g., Endothelial vs Fibroblast).

So I often end up doing:

# Split "A vs B" once on " vs " so the labels carry no stray spaces
parts = de_df['comparison'].str.split(' vs ', n=1, expand=True)
de_df['group1'] = parts[0]
de_df['group2'] = parts[1]
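The split workaround can be checked on a toy DataFrame. Splitting on the bare "vs" leaves a trailing or leading space on each label, while splitting once on " vs " gives clean labels. The gene names and probabilities below are made up for illustration.

```python
import pandas as pd

# Toy DE results (made-up values): gene names as index, one comparison string per row.
de_df = pd.DataFrame(
    {"proba_de": [0.96, 0.95], "comparison": ["Endothelial vs Fibroblast"] * 2},
    index=["GENE_A", "GENE_B"],
)

# Splitting on the bare "vs" keeps the surrounding spaces in the labels.
naive = de_df["comparison"].str.split("vs", expand=True)
print(repr(naive.iloc[0, 0]))  # 'Endothelial '

# Splitting at most once on " vs " yields clean labels.
parts = de_df["comparison"].str.split(" vs ", n=1, expand=True)
de_df["group1"] = parts[0]
de_df["group2"] = parts[1]
print(de_df[["group1", "group2"]])
```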

I made a small change so that the returned de_df is already in this format. For example, the output from the api_overview tutorial would now be:

de_df = model.differential_expression(
    groupby="cell_type",
    group1="Endothelial",
    group2="Fibroblast"
)
de_df.head()

Out[16]: 
        proba_de  proba_not_de  ...       group1      group2
ABTB2     0.9646        0.0354  ...  Endothelial  Fibroblast
DANT2     0.9642        0.0358  ...  Endothelial  Fibroblast
TMBIM4    0.9640        0.0360  ...  Endothelial  Fibroblast
LRRTM3    0.9640        0.0360  ...  Endothelial  Fibroblast
CLDN5     0.9638        0.0362  ...  Endothelial  Fibroblast

I implemented this as a breaking change; however, if preferable, I can instead make it an additional argument that defaults to the old behavior.
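A rough illustration of the workflow described at the top of this comment: once group1 and group2 are stored as their own columns, concatenated results from several comparisons can be filtered or pivoted without any string parsing. The helper `toy_de`, the gene names, and the probabilities are hypothetical and made up for illustration; only the column names mirror the output above.

```python
import pandas as pd


def toy_de(group1, group2, genes, probs):
    """Hypothetical helper: build a made-up DE table with the new columns."""
    df = pd.DataFrame({"proba_de": probs}, index=genes)
    df["comparison"] = f"{group1} vs {group2}"
    df["group1"] = group1
    df["group2"] = group2
    return df


# Concatenate results from several comparisons into one long table.
all_de = pd.concat(
    [
        toy_de("Endothelial", "Fibroblast", ["GENE_A", "GENE_B"], [0.96, 0.40]),
        toy_de("Endothelial", "Astrocyte", ["GENE_A", "GENE_C"], [0.10, 0.97]),
        toy_de("Fibroblast", "Astrocyte", ["GENE_B", "GENE_C"], [0.88, 0.20]),
    ]
)

# Every comparison with Endothelial as group1, selected by plain column equality.
endo = all_de[all_de["group1"] == "Endothelial"]

# Wide view: one row per gene, one column per (group1, group2) pair.
wide = all_de.reset_index().pivot_table(
    index="index", columns=["group1", "group2"], values="proba_de"
)
print(endo)
print(wide)
```

Genes missing from a comparison show up as NaN in the wide view.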

@codecov bot commented May 26, 2021

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.70%. Comparing base (03f0b49) to head (1cbca2d).

@@           Coverage Diff           @@
##           master    #1074   +/-   ##
=======================================
  Coverage   90.70%   90.70%           
=======================================
  Files          90       90           
  Lines        6743     6745    +2     
=======================================
+ Hits         6116     6118    +2     
  Misses        627      627           


@adamgayoso (Member)

We were actually meaning to do this. Could we keep both for now? Then we can either add a deprecation message or just keep both, since it's only one column.

@Munfred (Contributor Author) commented May 27, 2021

Keeping both seems like a good idea for now, but shouldn't it be two new columns? res["group1"] = g1 and res["group2"] = g2
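A minimal sketch of the "keep both" option under discussion: the legacy "comparison" string stays, and the two labels are added as separate columns. The helper name `add_group_columns` is hypothetical and this is not the actual scvi-tools implementation; `res`, `g1`, and `g2` follow the names used in the comment above, and the two rows reuse values from the earlier output.

```python
import pandas as pd


def add_group_columns(res: pd.DataFrame, g1: str, g2: str) -> pd.DataFrame:
    """Hypothetical helper: tag a DE result table with its two groups.

    Keeps the old "comparison" string and adds group1/group2 as
    separate columns, so existing code keeps working.
    """
    res = res.copy()
    res["comparison"] = f"{g1} vs {g2}"  # old format, kept for compatibility
    res["group1"] = g1
    res["group2"] = g2
    return res


res = pd.DataFrame({"proba_de": [0.9646, 0.9642]}, index=["ABTB2", "DANT2"])
res = add_group_columns(res, "Endothelial", "Fibroblast")
print(res)
```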

@adamgayoso (Member)


Yes! My change didn't affect your addition of g2; you don't have to accept my change if you'd rather do it yourself.

@Munfred (Contributor Author) commented May 28, 2021

Ah! My bad, thanks!

@adamgayoso adamgayoso merged commit f20ae55 into scverse:master May 29, 2021
@adamgayoso adamgayoso changed the title modify DE results dataframe to have columns "group1" and "group2" instead of "comparison" modify DE results dataframe to have columns "group1" and "group2" in addition to "comparison" Jun 3, 2021