Allow running of metadata subworkflow on multiple specimen IDs #114

BethYates · 2024-06-13T11:51:21Z

Description of feature

A genome note provides metadata for the specimen used to produce the genome assembly, the specimen used to generate HiC data, and the specimen used to produce RNA-Seq data. These may all be different specimens. The genome note pipeline should be able to take in each of these IDs and run the metadata subworkflow on each, recording the relevant data for use in the publication.

BethYates · 2024-06-26T15:15:54Z

The genome_metadata subworkflow will be introduced in version 2.0 of the genome note pipeline and is currently only present on the public_dev branch of the repository. To work on this issue you will need to create a feature branch from the public_dev branch rather than the dev branch. Pushing development for the 2.0 release to the public_dev branch allows us to keep the dev branch clean in case we need to push some bug fixes from there to the main release branch.

BethYates · 2024-07-25T12:22:32Z

To close this issue:

Rename the biosample parameter to biosample_wgs and add two additional parameters, biosample_hic and biosample_rna, to nextflow.config. The value of these should be set to null.
Update test.config and test_full.config to contain values for the parameters that you have added or changed. For the test profile, biosample_hic="SAMEA7520846" and biosample_rna="SAMEA7521081"; for the test_full profile, biosample_hic="SAMEA7519968" and biosample_rna=null.
Modify genome_metadata.nf so that all of the files in ch_file_list that contain a "BIOSAMPLE_ACCESSION" are added to the file_list channel for each of the biosample parameters. In some cases (as in the test_full profile) biosample_rna will be null and should be ignored; the code needs to handle this.
Modify the metadata in genome_metadata.nf to include a biosample_type. The value should be "WGS", "HIC" or "RNA", or "" if the file is not related to a biosample.
Modify run_wget.nf to include the biosample_type in the output file name where the biosample_type is not an empty string.
Modify parse_metadata.nf to include the biosample_type in the output file name where the biosample_type is not an empty string.
Modify parse_xml_ena_biosample.py to extract the biosample_type from the output file name passed to the script. For the HiC and RNASeq biosample accessions, use this biosample_type to prefix the parameter names written to the output file (e.g. for biosample_hic, IDENTIFIER would become HIC_IDENTIFIER; for biosample_rna, SPECIMEN_ID would become RNA_SPECIMEN_ID).
Update docs/usage.md and nextflow_schema.json to include the new/renamed parameters.
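
As a rough illustration of the logic described in the steps above, here is a minimal Python sketch. It is not the pipeline's actual code. The function names, the `params` dictionary, the `<accession>_<source>[_<TYPE>].csv` file naming convention and the `ena_biosample` source label are all assumptions made for the example. Only the parameter names (biosample_wgs, biosample_hic, biosample_rna), the type values ("WGS", "HIC", "RNA", "") and the prefixing rule come from this issue.

```python
from __future__ import annotations

from pathlib import Path

# Biosample types named in this issue; "" means "not related to a biosample".
BIOSAMPLE_TYPES = ("WGS", "HIC", "RNA")


def active_biosamples(params: dict) -> list[tuple[str, str]]:
    """Return (biosample_type, accession) pairs, skipping null parameters.

    `params` is a hypothetical stand-in for the Nextflow params map.
    """
    pairs = []
    for biosample_type in BIOSAMPLE_TYPES:
        accession = params.get(f"biosample_{biosample_type.lower()}")
        if accession:  # e.g. biosample_rna is null in the test_full profile
            pairs.append((biosample_type, accession))
    return pairs


def output_name(accession: str, source: str, biosample_type: str = "") -> str:
    """Build an output file name, adding the type only when it is non-empty.

    The naming convention here is assumed, not taken from the pipeline.
    """
    parts = [accession, source]
    if biosample_type:
        parts.append(biosample_type)
    return "_".join(parts) + ".csv"


def biosample_type_from_name(file_name: str) -> str:
    """Recover the biosample_type from a file name built by output_name()."""
    suffix = Path(file_name).stem.rsplit("_", 1)[-1]
    return suffix if suffix in BIOSAMPLE_TYPES else ""


def prefixed(param_name: str, biosample_type: str) -> str:
    """Prefix parameter names for HiC and RNA biosamples only."""
    if biosample_type in ("HIC", "RNA"):
        return f"{biosample_type}_{param_name}"
    return param_name


if __name__ == "__main__":
    # "WGS_ACCESSION" is a placeholder; the other values are from test_full.
    params = {
        "biosample_wgs": "WGS_ACCESSION",
        "biosample_hic": "SAMEA7519968",
        "biosample_rna": None,
    }
    for biosample_type, accession in active_biosamples(params):
        name = output_name(accession, "ena_biosample", biosample_type)
        detected = biosample_type_from_name(name)
        print(name, prefixed("IDENTIFIER", detected))
```

Keeping the WGS parameter names unprefixed in this sketch follows the wording above, which only asks for a prefix on the HiC and RNASeq accessions.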

BethYates added the enhancement Improvement of the existing features label Jun 13, 2024

reichan1998 self-assigned this Jul 3, 2024

BethYates referenced this issue in reichan1998/genomenote Jul 25, 2024

Allow adding multiple biosample

a2446eb

reichan1998 mentioned this issue Jul 30, 2024

Allow running of metadata subworkflow on different biosample types #132

Merged

9 tasks

reichan1998 linked a pull request Jul 30, 2024 that will close this issue

Allow running of metadata subworkflow on different biosample types #132

Merged

9 tasks

BethYates closed this as completed Jul 30, 2024
