
How are we handling promoter vs enhancer motif outputs from evaluation? #21

Open
adamklie opened this issue Aug 21, 2024 · 4 comments
Labels: dashboard, enhancement (New feature or request), question (Further information is requested)


@adamklie
Collaborator

We used to have a column for this in the motif enrichment output file, but it is not in the latest outputs (presumably because the E2G links weren't available yet).

ProgramID  EPType    TFMotif  PValue    FDR       Enrichment
K60_1      Promoter  AHR      0.044631  0.210088  1.594955
K60_10     Promoter  AHR      0.351685  0.67633   1.242518
K60_11     Promoter  AHR      0.681555  0.885289  0.901666
K60_12     Promoter  AHR      0.446282  0.745748  1.204339

How do we want to handle this more generally? I see two options for the dashboard:

  1. Include a type column regardless of what the enrichment is run on.
  2. Output a separate file for each type and name the files accordingly.

Option 1 seems better and more flexible, but either will work.
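Both options carry the same information, so converting between them is mechanical and the choice mostly affects how the dashboard loads data. A minimal sketch using only the standard library, assuming each per-type result is a list of row dicts keyed by the column names in the table above; `combine_with_type`, `split_by_type` and `to_tsv` are hypothetical helpers, not existing project code:

```python
import csv
import io
from collections import defaultdict

# Column order taken from the example output table in this issue.
COLUMNS = ["ProgramID", "EPType", "TFMotif", "PValue", "FDR", "Enrichment"]


def combine_with_type(tables_by_type):
    """Option 1: stack per-type tables into one table with an EPType column."""
    combined = []
    for ep_type, rows in tables_by_type.items():
        for row in rows:
            combined.append({**row, "EPType": ep_type})
    return combined


def split_by_type(rows):
    """Option 2: split a combined table into one table per EPType value."""
    tables = defaultdict(list)
    for row in rows:
        without_type = {k: v for k, v in row.items() if k != "EPType"}
        tables[row["EPType"]].append(without_type)
    return dict(tables)


def to_tsv(rows, columns=COLUMNS):
    """Serialize rows as tab-separated text; columns missing from a row are left empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For per-type tables without an `EPType` key, `split_by_type(combine_with_type(t))` returns `t`, so the pipeline could write separate files while the dashboard still concatenates them into a single typed table if that proves more convenient.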
adamklie added the question and dashboard labels on Aug 21, 2024
@aron0093
Collaborator

aron0093 commented Sep 5, 2024

The config file for the Snakemake pipeline has separate parameters for P2G and E2G links, and the pipeline will store the outputs for these two separately with appropriate naming. The motif enrichment code itself will now accept any genomic coordinates mapped to genes and run the enrichment without requiring a "class" column in the input or requiring the user to specify a class.

I would prefer option 2 for the dashboard, since the idea is to use the pipeline outputs as the default input for the dashboard.

aron0093 added the enhancement label on Sep 5, 2024
aron0093 self-assigned this on Sep 5, 2024
@aron0093
Collaborator

aron0093 commented Sep 5, 2024

The code still needs to be modified so that it does not expect a seq_class column. This brings it in line with standard E2G output formats, so users won't have to add the column manually before running our pipeline.
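One way to drop the `seq_class` requirement is to take the link type from the config entry (P2G or E2G) that a file was listed under, and to validate only the coordinate and gene columns. A minimal sketch; the column names in `REQUIRED_COLUMNS` and the function name `load_links` are placeholders, not the actual E2G output schema or the pipeline's code:

```python
import csv

# Placeholder column names; map these to whatever the real E2G/P2G files use.
REQUIRED_COLUMNS = ("chrom", "start", "end", "gene")


def load_links(lines, link_type, delimiter="\t"):
    """Read gene-linked regions and tag each one with a link type from the config.

    `lines` is any iterable of text lines (e.g. an open file). No seq_class
    column is required; extra columns, including seq_class if present, are
    ignored, so the link type always comes from `link_type`.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"input is missing required columns: {missing}")
    return [
        {
            "chrom": row["chrom"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "gene": row["gene"],
            "link_type": link_type,
        }
        for row in reader
    ]
```

With this shape, `link_type` could simply be the config key the input came from, so the same code path would handle both link sets.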

adamklie added this to the 2024 Gene Program Jamboree milestone on Sep 10, 2024
@adamklie
Collaborator Author

Here is our proposed format for output files from this step:

  • {prog_key}_{E/P_type}_{database}_{test_type}_{stratification_key}_{level_key}_enrichment.txt
    • prog_key from config
    • stratification_key from config
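Because every field is joined with `_`, file names can only be parsed back unambiguously if no field value itself contains an underscore. A minimal sketch of building and parsing names under that assumption; the field name `ep_type` stands in for `{E/P_type}`, and both helpers are hypothetical:

```python
# Field order follows the proposed template; "ep_type" stands in for {E/P_type}.
FIELDS = ("prog_key", "ep_type", "database", "test_type", "stratification_key", "level_key")
SUFFIX = "_enrichment.txt"


def enrichment_filename(**fields):
    """Build {prog_key}_{E/P_type}_{database}_{test_type}_{stratification_key}_{level_key}_enrichment.txt."""
    missing = [f for f in FIELDS if f not in fields]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    values = [str(fields[f]) for f in FIELDS]
    for name, value in zip(FIELDS, values):
        # An underscore inside a value would shift every later field when parsing.
        if not value or "_" in value:
            raise ValueError(f"{name}={value!r} must be non-empty and contain no '_'")
    return "_".join(values) + SUFFIX


def parse_enrichment_filename(filename):
    """Invert enrichment_filename; raise ValueError for names that don't fit the template."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"{filename!r} does not end with {SUFFIX!r}")
    parts = filename[: -len(SUFFIX)].split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)} in {filename!r}")
    return dict(zip(FIELDS, parts))
```

If some config values do need underscores, a different field separator would keep the names parseable.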

For the jamboree we will just run each configuration individually and save the results under this naming scheme. The main idea is to have a separate file for each configuration that the dash app will load; the user will pick what to visualize from several dropdowns.
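Given a directory of files that follow the naming scheme, the app could derive each dropdown's choices from the file names themselves. A minimal sketch, again assuming no field contains `_`; `index_enrichment_files` and `lookup` are hypothetical names, and wiring the option lists into the app's dropdown components is left out:

```python
from collections import defaultdict
from pathlib import Path

# Same field order as the proposed template; "ep_type" stands in for {E/P_type}.
FIELDS = ("prog_key", "ep_type", "database", "test_type", "stratification_key", "level_key")
SUFFIX = "_enrichment.txt"


def index_enrichment_files(paths):
    """Map each field combination to its file and collect sorted dropdown options.

    Files whose names don't match the template are skipped.
    """
    files = {}
    options = defaultdict(set)
    for path in map(Path, paths):
        name = path.name
        if not name.endswith(SUFFIX):
            continue
        parts = name[: -len(SUFFIX)].split("_")
        if len(parts) != len(FIELDS):
            continue
        files[tuple(parts)] = path
        for field, value in zip(FIELDS, parts):
            options[field].add(value)
    return files, {field: sorted(values) for field, values in options.items()}


def lookup(files, selection):
    """Return the file for a dict of dropdown selections, or None if that combination was not run."""
    return files.get(tuple(selection[f] for f in FIELDS))
```

Returning `None` for combinations that were never run lets the app show an empty state instead of failing.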

@adamklie
Collaborator Author

I've also implemented something I think we should discuss at some point. For convenience, I adjusted the p-values after computing all the Pearson tests across all programs, so a single FDR correction covers every test. Would it be better to do the FDR correction at the program level instead?
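The two choices differ only in which set of p-values is adjusted together. A minimal sketch, assuming the correction is Benjamini-Hochberg (the issue only says "FDR correction") and that results are row dicts with `ProgramID` and `PValue` keys as in the output table; the function names are hypothetical:

```python
from collections import defaultdict


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value down so each q-value is the running
    # minimum of p * m / rank, which keeps the q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fdr_global(rows):
    """One correction over every test from every program."""
    qvals = bh_adjust([r["PValue"] for r in rows])
    return [dict(r, FDR=q) for r, q in zip(rows, qvals)]


def fdr_per_program(rows):
    """A separate correction within each ProgramID."""
    groups = defaultdict(list)
    for idx, r in enumerate(rows):
        groups[r["ProgramID"]].append(idx)
    out = [dict(r) for r in rows]  # copy so the input rows are not modified
    for idxs in groups.values():
        qvals = bh_adjust([rows[i]["PValue"] for i in idxs])
        for i, q in zip(idxs, qvals):
            out[i]["FDR"] = q
    return out
```

Correcting within each program divides by that program's number of tests rather than the total, so q-values are typically smaller, but the FDR guarantee then applies to each program's tests separately rather than to the whole output table.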

