Skip to content

Latest commit

 

History

History
133 lines (133 loc) · 7.94 KB

todo.org

File metadata and controls

133 lines (133 loc) · 7.94 KB

AcidGSEA

Development

Consider the warning / error approach on gene set (symbol / entrez id) mismatches.

Consider adding support for `ignoreVersion` for RankedList.

Consider adding numeric vector method support for RankedList.

Consider including alpha cutoff in subtitle for `plotGeneSet` and `plotNES`.

`RankedList` rework.

  • Currently expecting Ensembl-processed input here.
  • Direct NCBI Entrez/RefSeq-processed input requires package update.
  • Rework to not require gene2symbol input here?

`FGSEAList` rework.

  • Need to rework ‘keyType’ and ‘gene2symbol’ here.
  • Rework `geneSetFiles` argument. Assign the names from geneSetFiles automatically if necessary.
  • Add option to match via Entrez IDs instead of gene symbols.
  • Rethink our symbol averaging expression approach. Consider allowing the user to set this?
  • Rework internal handling of gene2symbol as rowRanges instead.
  • Get the keyType from metadata here.
  • Ensure that gene sets match expected metadata…parse the first one and check for identifiers…

`leadingEdge` rework?

  • Rethink this, unit test?
  • Allow the user to pass in positional contrast or collection.
  • Inform the user in the results call what we matched…

`plotGeneSet` rework.

  • Support multiple contrasts here.

`plotNES` rework.

  • Allow the user to set title and subtitle here.
  • Allow the user to select which contrasts to plot.
  • Rework contrast input here, supporting multiple.
  • Rework this as multi-contrast support.
  • Rework this into a single call, and check for “contrast” column.
  • Add an option here to plot all contrasts.
  • Allow the user to input specific contrasts.
  • Also allow the user to select which pathways to plot. This is super useful for non-hallmark gene sets…
  • Allow user to pick specific gene sets from collection.

`rankedList` rework.

  • Allow the user to set how to handle duplicate identifiers / symbols here. Default behavior is to average.
  • Need to add code coverage for duplicate handling.
  • Increase the verbosity about matching here.
  • Consider reworking how we handle Entrez identifier matching from input data that contains Ensembl identifiers.
  • Can we add support for edgeR analysis here?
  • Consider adding support for limma as well.
  • Exclude identifiers that are scaffolds, etc.
  • Ensure we exclude scaffolds and stuff that shouldn’t be averaged here by default.
  • Consider adding method support for matrix here, which is useful for a table of values across multiple contrasts.
  • Rename keyType to “entrez” or “symbols” here....simpler
  • Assign names from gene set files automatically.
  • Need to rethink our geneId, entrezId approach here…
  • Don’t set the keyType as “geneName” or “geneId” here?.
  • Rework this, consider not reexporting…too generic…
  • Change the gene2symbol to rowData here.
  • rowData MUST contain geneName, geneId, entrezId.
  • Drop elements that map to scaffolds here…
  • Check Entrez identifier metadata here…
  • Can we get information on Entrez version for the gene sets here? Maybe we can improve our identifier matching a bit…
  • Switch to using `EntrezGeneInfo()` approach here to map input against
  • MSigDB sets…
  • Censor Entrez identifiers that are no longer active.
  • Always require rowRanges.
  • Inform the user about what type of keyType we’re using for matching.
  • What do we do with Ensembl-to-Entrez matches that aren’t 1:1?
  • Consider defaulting here to entrezId.
  • Define the RankedList pattern here based on geneId.
  • geneId is too vague here, is this ever used?
  • rename keyType to “entrez” and “symbols”
  • Add support and code coverage for direct NCBI Entrez / RefSeq input.
  • Add support for edgeR and limma/voom here, based on DataFrame.

`results,FGSEAList` rework.

  • Support input of “all” or specific contrasts.
  • Need to slot contrast and collection in metadata here…
  • Rethink our approach that works on leading edge.

`.resultsForAllContrasts` rework.

  • Rework this directly into main `results` function…
  • Allow user to look up by position.
  • Allow results extract of multiple contrasts here.
  • Slot multiContrast in metadata here?

`enrichedGeneSets`: Allow user to pass in positional collection.

See also `DESeqAnalysis::plotDEGUpset()`, for looping inspiration.

Rework object to store gene sets directly in the object.

Currently relies on external file paths, which breaks easily.

Work on adding support for edgeR DGEList and limma into RankedList.

Explore alternative approaches to averaging gene symbols. Should we

rework our default approach to map to Entrez gene identifiers instead? Is this less problematic? Alternatively, can we select for unique gene symbols that don’t map to gene scaffolds, non-coding genes? Think about this one…

Compare results against GSEA Java client and gseapy Python approach.

`convertToHuman: May need to ensure that genomic ranges are sorted by

identifier name, or we may need to filter out scaffold identifiers. Otherwise, could potentially run into unwanted matches: e.g. “FH” vs. “LRG_504”.

Rework object to store gene sets directly in the object.

Currently relies on external file paths, which breaks easily.

Need to include which contrast is running in output.

Need to improve export message too.

Finish adding the heatmap support and release update.

Does topTables have lfcShrink support?

Add support for PC3, PC4, etc. in plotPCA.

Refer to bcbio-rnaseq code repo (roryk) for inspiration on this point.

Shade P values (by opacity) in MA and volcano plot?

Remove incorrect rowRanges metadata in example object.

Stash the date automatically in metadata.

Consider also saving sessionInfo?

Need to rethink the humanize support step here.

Make `humanize()` a separate function call, and add method support. Dispatch onto SummarizedExperiment for DESeqDataSet and DESeqTransform. Need to define an internal humanize method here for DESeqResults.

Check DESeqResults and lfcShrink consistency in validity check.

Check that all stashed res objects use the same alpha level cutoff as a validity check

Add a tighter assert check to ensure that `lfcShrink` contains shrunken values.

Can use `priorInfo` to test for this.

Need to slot DESeqAnalysis package version in object.

Define a `metadata` list and slot prototype metadata.

Check for metadata mismatch in DESeqTransform (e.g. interestingGroups) and update automatically in `DESeqAnalysis()` call.

`export()`: humanize mode needs to ALWAYS include `geneID` and `geneName` columns.

Add plotting support for `svalue` column generated from DESeq2, when shrinkage is applied.

Adaptation of plotMA() to show how much the lfcShrink() function affects the LFC shrinkage.

Need to add dendrogram support for getting a module of enriched genes.

GSEA table messes up when rendered inside an R Markdown header block.

Switch to using “collection” instead of “geneSet” or “pathway”.

Use “pathway” or “geneSet” as argument?

Need to figure out the language here.

Compare Broad GSEA pre-ranked to fgsea.

https://bioinformatics.stackexchange.com/questions/149/are-fgsea-and-broad-institute-gsea-equivalent

Add clusterProfiler GSEA function support.

`topTables()` conflicts with basejump?

`pfgsea()`: Switch to `matchArgsToDoCall()` approach here too, so the formals are clearer.

`statsList()`: Work toward returning as `List` instead.

We can stash metadata in the `metadata()` slot. Particularly useful is including the value type here.

Look into using GSEABase classes.

Use `GSEABase::getGmt()`.