- Currently expecting Ensembl-processed input here.
- Direct NCBI Entrez/RefSeq-processed input requires package update.
- Rework to not require gene2symbol input here?
- Need to rework ‘keyType’ and ‘gene2symbol’ here.
- Rework `geneSetFiles` argument. Assign the names from geneSetFiles automatically if necessary.
- Add option to match via Entrez IDs instead of gene symbols.
- Rethink our approach to averaging expression across duplicate symbols. Consider allowing the user to set this?
- Rework internal handling of gene2symbol as rowRanges instead.
- Get the keyType from metadata here.
- Ensure that gene sets match expected metadata…parse the first one and check for identifiers…
- Rethink this, unit test?
- Allow the user to pass in positional contrast or collection.
- Inform the user in the results call what we matched…
- Support multiple contrasts here.
- Allow the user to set title and subtitle here.
- Allow the user to select which contrasts to plot.
- Rework contrast input here, supporting multiple.
- Rework this as multi-contrast support.
- Rework this into a single call, and check for “contrast” column.
- Add an option here to plot all contrasts.
- Allow the user to input specific contrasts.
- Also allow the user to select which pathways to plot. This is super useful for non-hallmark gene sets…
- Allow user to pick specific gene sets from collection.
- Allow the user to set how to handle duplicate identifiers / symbols here. Default behavior is to average.
- Need to add code coverage for duplicate handling.
- Increase the verbosity about matching here.
- Consider reworking how we handle Entrez identifier matching from input data that contains Ensembl identifiers.
- Can we add support for edgeR analysis here?
- Consider adding support for limma as well.
- Exclude identifiers that are scaffolds, etc.
- Ensure we exclude scaffolds and other identifiers that shouldn’t be averaged here by default.
- Consider adding method support for matrix here, which is useful for a table of values across multiple contrasts.
- Rename keyType to “entrez” or “symbols” here, for simplicity.
- Assign names from gene set files automatically.
- Need to rethink our geneId, entrezId approach here…
- Don’t set the keyType as “geneName” or “geneId” here?
- Rework this, consider not reexporting…too generic…
- Change the gene2symbol to rowData here.
- rowData MUST contain geneName, geneId, entrezId.
- Drop elements that map to scaffolds here…
- Check Entrez identifier metadata here…
- Can we get information on Entrez version for the gene sets here? Maybe we can improve our identifier matching a bit…
- Switch to using `EntrezGeneInfo()` approach here to map input against MSigDB sets…
- Censor Entrez identifiers that are no longer active.
- Always require rowRanges.
- Inform the user about what type of keyType we’re using for matching.
- What do we do with Ensembl-to-Entrez matches that aren’t 1:1?
- Consider defaulting here to entrezId.
- Define the RankedList pattern here based on geneId.
- geneId is too vague here, is this ever used?
- Rename keyType to “entrez” and “symbols”.
- Add support and code coverage for direct NCBI Entrez / RefSeq input.
- Add support for edgeR and limma/voom here, based on DataFrame.
- Support input of “all” or specific contrasts.
- Need to slot contrast and collection in metadata here…
- Rethink our approach that works on leading edge.
- Rework this directly into main `results` function…
- Allow user to look up by position.
- Allow results extract of multiple contrasts here.
- Slot multiContrast in metadata here?
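
Several items above concern how duplicate identifiers or symbols are collapsed, with averaging as the default. A minimal base R sketch of that idea follows, assuming the input is a numeric vector of per-gene statistics and a parallel vector of symbols. The helper name `collapseBySymbol()`, its `fun` argument, and the example values are hypothetical and not part of the package.

```r
# Hypothetical input: per-gene test statistics with one symbol per element.
# The values below are made up for illustration.
stat <- c(2.0, 4.0, -1.5, 0.5, 3.0)
symbol <- c("TP53", "TP53", "FH", "BRCA1", NA)

# Hypothetical helper, not part of the package API.
# `fun` lets the caller choose how duplicates are collapsed (default: mean).
collapseBySymbol <- function(stat, symbol, fun = mean) {
    # Drop elements without a symbol or without a statistic.
    keep <- !is.na(symbol) & !is.na(stat)
    # `tapply()` groups the values by symbol and applies `fun` per group.
    out <- tapply(X = stat[keep], INDEX = symbol[keep], FUN = fun)
    # Convert the 1-d array into a plain named numeric vector.
    out <- setNames(as.numeric(out), names(out))
    # Sort decreasing, as is typical for a ranked list.
    sort(out, decreasing = TRUE)
}

ranked <- collapseBySymbol(stat, symbol)
print(ranked)
```

Passing a different function, for example `fun = max`, would be one way to expose the duplicate-handling choice to the user.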
See also `DESeqAnalysis::plotDEGUpset()`, for looping inspiration.
Currently relies on external file paths, which breaks easily.
Rework our default approach to map to Entrez gene identifiers instead? Is this less problematic? Alternatively, can we select for unique gene symbols that don’t map to gene scaffolds or non-coding genes? Think about this one…
identifier name, or we may need to filter out scaffold identifiers. Otherwise, could potentially run into unwanted matches: e.g. “FH” vs. “LRG_504”.
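
The “FH” vs. “LRG_504” collision can be sketched in base R as follows. This assumes the unwanted rows can be recognised by an `LRG_` prefix on the gene identifier; that pattern is an assumption and would not catch other scaffold or patch identifiers. The table contents are illustrative only.

```r
# Hypothetical gene-to-symbol table; rows are for illustration only.
gene2symbol <- data.frame(
    geneId = c("ENSG00000091483", "LRG_504", "ENSG00000141510"),
    geneName = c("FH", "FH", "TP53"),
    stringsAsFactors = FALSE
)

# Assumption: LRG records are identified by an "LRG_" prefix on `geneId`.
isLrg <- grepl(pattern = "^LRG_", x = gene2symbol[["geneId"]])

# Drop the LRG rows before any symbol-level matching or averaging.
filtered <- gene2symbol[!isLrg, , drop = FALSE]

# After filtering, each symbol in this toy table maps to a single identifier.
print(filtered)
print(anyDuplicated(filtered[["geneName"]]) == 0L)
```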
Refer to bcbio-rnaseq code repo (roryk) for inspiration on this point.
Consider also saving sessionInfo?
Make `humanize()` a separate function call, and add method support. Dispatch onto SummarizedExperiment for DESeqDataSet and DESeqTransform. Need to define an internal humanize method here for DESeqResults.
Can use `priorInfo` to test for this.
Define a `metadata` list and slot prototype metadata.
Check for metadata mismatch in DESeqTransform (e.g. interestingGroups) and update automatically in `DESeqAnalysis()` call.
Need to figure out the language here.
https://bioinformatics.stackexchange.com/questions/149/are-fgsea-and-broad-institute-gsea-equivalent
We can stash metadata in the `metadata()` slot. Particularly useful is including the value type here.
Use `GSEABase::getGmt()`.
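
For reference, the GMT layout that `GSEABase::getGmt()` reads is one gene set per line: set name, description, then member genes, all tab-separated. The base R sketch below only illustrates that layout and is not a replacement for the real parser. `readGmtSketch()`, the file name, and the set contents are hypothetical. It also shows one possible way to derive a collection name from the file name, as suggested in the notes on `geneSetFiles`.

```r
# Write a tiny, made-up GMT file so the example is self-contained.
# Each line: set name, description, then member genes, tab-separated.
gmtFile <- file.path(tempdir(), "example.symbols.gmt")
writeLines(
    text = c(
        "SET_A\thttp://example.com/a\tTP53\tFH\tBRCA1",
        "SET_B\thttp://example.com/b\tFH\tMYC"
    ),
    con = gmtFile
)

# Hypothetical base R reader; `GSEABase::getGmt()` is the real parser.
readGmtSketch <- function(file) {
    lines <- readLines(file)
    fields <- strsplit(lines, split = "\t", fixed = TRUE)
    # Drop the name and description columns, keeping only the genes.
    sets <- lapply(fields, function(x) x[-c(1L, 2L)])
    # Name each element by the first column of its line.
    names(sets) <- vapply(fields, function(x) x[[1L]], character(1L))
    sets
}

geneSets <- readGmtSketch(gmtFile)

# Derive a collection name from the file name (strips the final extension).
collectionName <- tools::file_path_sans_ext(basename(gmtFile))

print(geneSets)
print(collectionName)
```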