forked from sanger-tol/genomenote
1 parent 12bff64
commit a2446eb
Showing 6 changed files with 80 additions and 26 deletions.
a2446eb
This is really close to doing what we need, but there are a few changes that I think could improve things. I've updated the original issue to try to explain things a bit better, but I'll add some specific comments here too.
Rather than passing a single parameter containing a list of biosample IDs, I would prefer to pass them as individual parameters. The data type of each biosample is important and will be needed later when generating the genome note document. There are three types of biosample data that we are concerned with: WGS, HiC and RNA-Seq.
I would like to see a parameter for each of these added to the config. You could then replace your line in genome_metadata.nf
params.biosample.split(',').each { biosample ->
with something like
def biosamples = [["WGS", params.biosample_wgs], ["HIC", params.biosample_hic], ["RNA", params.biosample_rna]]
biosamples.each { biosampleType, biosampleID ->
In your metadata array you could then include biosampleType as a new item, "biosample_type", and remove the "biosample" entry. biosampleID would then be used to replace "BIOSAMPLE_ACCESSION" in the replaceAll() call.
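To make the intended data flow concrete, here is a minimal Python sketch of the same logic as the Groovy loop above. It is only an illustration, not the pipeline code: the function name, the URL template, and the rule of skipping unset parameters are all assumptions; only the parameter names and the BIOSAMPLE_ACCESSION placeholder come from the comment.

```python
# Hypothetical URL template; the real one lives in the pipeline's own config.
URL_TEMPLATE = "https://example.org/biosamples/BIOSAMPLE_ACCESSION"


def build_biosample_requests(params, url_template=URL_TEMPLATE):
    """Return one metadata dict per biosample type that has an accession."""
    biosamples = [
        ("WGS", params.get("biosample_wgs")),
        ("HIC", params.get("biosample_hic")),
        ("RNA", params.get("biosample_rna")),
    ]
    requests = []
    for biosample_type, biosample_id in biosamples:
        if not biosample_id:
            continue  # assumption: parameters that are not set are skipped
        requests.append(
            {
                # the type replaces the old "biosample" entry in the metadata
                "biosample_type": biosample_type,
                "url": url_template.replace("BIOSAMPLE_ACCESSION", biosample_id),
            }
        )
    return requests
```

Carrying the type in the metadata, rather than the accession, is what lets the later steps label their outputs by data type.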
a2446eb
In run_wget.nf it is more informative for us to know what type of biosample data we are dealing with than to know the biosample ID in the output file, so I would suggest modifying the name of the output file. You could do something like this:
def is_biosample = (meta.biosample_type == "WGS" || meta.biosample_type == "HIC" || meta.biosample_type == "RNA") ? "_${meta.biosample_type}" : ""
def output = "${meta.id}_${meta.source}_${meta.type}${is_biosample}.${meta.ext}"
I would make the same change to the output file in parse_metadata.nf too.
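As a quick check of the naming scheme, the following Python sketch mirrors the two Groovy lines above. The function name and the example values are hypothetical; the meta keys are the ones used in the suggestion.

```python
BIOSAMPLE_TYPES = ("WGS", "HIC", "RNA")


def output_name(meta):
    """Build the output file name, adding a type suffix only for biosample files."""
    biosample_type = meta.get("biosample_type")
    suffix = f"_{biosample_type}" if biosample_type in BIOSAMPLE_TYPES else ""
    return f"{meta['id']}_{meta['source']}_{meta['type']}{suffix}.{meta['ext']}"


# Hypothetical meta values, for illustration only.
example = {"id": "sp1", "source": "ena", "type": "biosample", "ext": "xml"}
print(output_name({**example, "biosample_type": "HIC"}))  # sp1_ena_biosample_HIC.xml
print(output_name(example))  # sp1_ena_biosample.xml
```

Files that are not biosample-related keep their existing names, because the suffix is empty when no recognised type is present.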
a2446eb
I would then modify the script that processes the biosample-related files, parse_xml_ena_biosample.py. As the output file name in the FILE_OUT argument passed to the script now contains information on the biosample data type, you can extract this from the file name and use it to prefix the parameter names in the output file, e.g. COLLECTOR would become either "COLLECTOR", "HIC_COLLECTOR" or "RNA_COLLECTOR". That way, when we merge all the individual metadata files, we will know which values correspond to which type of biosample, and we will be able to use these values to fill in the genome note template file. I would only prefix the parameters that correspond to the HiC and RNASeq biosample accessions, as this is what the template file we are using to produce the genome note expects.
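A minimal sketch of that prefixing step is shown below. It is not the existing code in parse_xml_ena_biosample.py: the function names are hypothetical, and it assumes the biosample type is the last underscore-separated token of the file name stem, as produced by the renaming suggested earlier.

```python
import os
import re


def biosample_prefix(file_out):
    """Return "HIC_" or "RNA_" based on the output file name, else ""."""
    stem = os.path.splitext(os.path.basename(file_out))[0]
    # Assumption: the type is the final "_TYPE" token of the file name stem.
    match = re.search(r"_(WGS|HIC|RNA)$", stem)
    if match and match.group(1) in ("HIC", "RNA"):
        return match.group(1) + "_"
    return ""  # WGS (or no type) keeps the unprefixed names the template expects


def prefix_params(params, file_out):
    """Prefix each (name, value) pair according to the biosample type."""
    prefix = biosample_prefix(file_out)
    return [(prefix + name, value) for name, value in params]
```

For example, `prefix_params([("COLLECTOR", "A. Collector")], "sp1_ena_biosample_HIC.csv")` yields `[("HIC_COLLECTOR", "A. Collector")]`, while the same call with a `_WGS` file name leaves `COLLECTOR` unchanged.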
a2446eb
If you make the changes suggested above I think that the changes you have made to combine_statistics_data.py and combine_metadata.nf should not be needed.
In combine_metadata.nf you will still need to change
def file_name = "--" + item.getName().minus("${prefix}_").minus(".${file_ext}") + "_file"
to
def file_name = "--" + item.getName().minus("${prefix}_").minus(".${file_ext}").toLowerCase() + "_file"
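The effect of the added toLowerCase() can be seen in this small Python sketch of the same string manipulation. The function name and example values are hypothetical; the point is that the upper-case type suffix in the file name would otherwise end up in the option name, which would then need to match an option defined in the downstream script exactly.

```python
def file_argument(item_name, prefix, file_ext):
    """Derive the command-line option name for one metadata file."""
    # Like Groovy's String.minus(), remove only the first occurrence of each part.
    stem = item_name.replace(f"{prefix}_", "", 1).replace(f".{file_ext}", "", 1)
    return "--" + stem.lower() + "_file"


# Hypothetical file name, following the naming scheme suggested earlier.
print(file_argument("sp1_ena_biosample_HIC.csv", "sp1", "csv"))
# --ena_biosample_hic_file
```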
And in combine_parsed_data.nf you can simplify things if you remove your changes and instead add the following to the files list
and add
to the parse_args function