I am new to using your ncbi-datasets tools.
Recently, I have been passing a list of taxids through the '--inputfile' option to get genome reports with the command 'datasets summary genome taxon'. The problem is that I cannot know in advance whether each taxonomic ID in my list corresponds to an existing genome assembly. As a result, the program always aborts at any taxonomic ID that has no known genome on NCBI.
How can I skip taxonomic IDs that have no corresponding genome in my genome-report workflow, while writing those IDs to stderr? I did not find anything relevant in the help manual.
For now, I have had to give up the '--inputfile' option and use a loop instead.
Looking forward to your reply and help.
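The loop-based workaround mentioned above might look roughly like the following. This is a hedged sketch, not a datasets feature: `summarize_each_taxid` is a hypothetical helper name, and it assumes that `datasets` exits with a non-zero status when a taxid has no genome data. Each successful call prints its own JSON report, so the combined stdout is a stream of JSON documents rather than one report.

```shell
# Hypothetical helper (not part of datasets): query one taxid at a time so a
# taxid without genome data does not abort the whole run.
# Assumption: datasets exits non-zero when a taxid has no genome data.
summarize_each_taxid() {
  local list_file="$1" taxid
  while IFS= read -r taxid || [ -n "$taxid" ]; do
    [ -z "$taxid" ] && continue   # skip blank lines
    # </dev/null keeps datasets from consuming the taxid list on stdin
    if ! datasets summary genome taxon "$taxid" </dev/null; then
      echo "no genome data for taxid: $taxid" >&2   # report skipped taxid on stderr
    fi
  done < "$list_file"
}
```

For example, `summarize_each_taxid tax.list > reports.json 2> skipped.txt` would keep the per-taxid reports and the skipped taxids in separate files.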
Thanks for creating this issue. As you point out above, the current behavior is to abort when datasets encounters a taxid without genome data. Based on your feedback, we are going to change this behavior so that datasets still returns genome data for the other taxids even when it encounters one without genome data. This could take a little while due to competing priorities.
In the meantime, you may be interested in using the counts in the taxonomy data report to check whether a particular taxid has genome data.
Given a list, tax.list, where taxids 9606 and 10090 have genome data and 105513 does not, you can check the assembly counts in the taxonomy data report and filter out the taxids without genome data:
```shell
# Given a taxid list with a mixture of taxids with and without genome data
cat tax.list
9606
105513
10090
# Use datasets to check the taxonomy data report for genome assembly counts, then filter out taxids without genomes
datasets summary taxonomy taxon --inputfile tax.list | \
  jq -r '.reports[].taxonomy | (if .counts[]?.type=="COUNT_TYPE_ASSEMBLY" then .tax_id else empty end)'
10090
9606
```
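Building on the jq filter above, the original request (skip taxids without genomes and report them on stderr) could be approximated with a small shell function. This is a minimal sketch, not part of datasets: `filter_taxids_with_genomes` is a hypothetical name, and it assumes, as the filter above does, that a `COUNT_TYPE_ASSEMBLY` count appears in the taxonomy report only for taxids with genome data.

```shell
# Hypothetical helper (not part of datasets): print taxids that have genome
# assembly counts on stdout and report the rest on stderr.
# Assumption: COUNT_TYPE_ASSEMBLY appears only for taxids with genome data.
filter_taxids_with_genomes() {
  local list_file="$1" have taxid
  # taxids that have an assembly count in the taxonomy data report
  have=$(datasets summary taxonomy taxon --inputfile "$list_file" |
    jq -r '.reports[].taxonomy | (if .counts[]?.type=="COUNT_TYPE_ASSEMBLY" then .tax_id else empty end)')
  while IFS= read -r taxid || [ -n "$taxid" ]; do
    [ -z "$taxid" ] && continue   # skip blank lines
    if printf '%s\n' "$have" | grep -qxF -e "$taxid"; then
      printf '%s\n' "$taxid"      # keep: genome data available
    else
      echo "skipping taxid without genome data: $taxid" >&2
    fi
  done < "$list_file"
}
```

For example, running `filter_taxids_with_genomes tax.list > tax.with_genomes.list` and then `datasets summary genome taxon --inputfile tax.with_genomes.list` should query only taxids that have assembly counts, while the skipped ones are listed on stderr. The function preserves the order of the input list, unlike the raw taxonomy report output.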
Well, I didn't realize I could first check the genome assembly counts with datasets summary taxonomy. Of course, I think reporting genome availability for each taxid directly in the output of datasets summary genome would be more convenient and straightforward. Anyway, thank you for taking my question seriously.
Hello,