A question on the flag '--inputfile' in the command 'datasets summary genome taxon' : taxid with no genome #450

DongHRLZU · 2025-02-09T06:05:44Z

Hello,

  I am new to the ncbi-datasets tools.
  Recently, I have been passing a list of taxids to the '--inputfile' option of 'datasets summary genome taxon' to get a genome report. The problem is that I cannot know in advance whether each taxonomic ID in my list corresponds to an existing genome assembly, so the program aborts whenever it reaches a taxonomic ID that has no genome on NCBI.
  How can I skip taxonomic IDs that have no corresponding genome when generating the genome report, and at the same time write those IDs to stderr? I could not find anything relevant in the help manual.
  For now, I have had to give up on the '--inputfile' option and use a loop instead.

  Looking forward to your reply and help.
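A minimal sketch of the per-taxid loop workaround, assuming bash and assuming that `datasets summary genome taxon <taxid>` exits with a non-zero status for a taxid without genome data (the abort behavior described in this issue). The function name `summarize_each_taxid` and the file names are hypothetical, not part of datasets.

```shell
# Hypothetical helper (not part of datasets):
#   summarize_each_taxid <taxid_list_file> <output_file>
# Assumes `datasets summary genome taxon <taxid>` exits with a non-zero status
# when a taxid has no genome data, as reported in this issue.
summarize_each_taxid() {
  local list="$1" out="$2" taxid report
  : > "$out"                                   # create or truncate the output file
  # `|| [ -n "$taxid" ]` also processes a last line without a trailing newline.
  while IFS= read -r taxid || [ -n "$taxid" ]; do
    [ -z "$taxid" ] && continue                # skip blank lines
    # </dev/null guards against datasets consuming the rest of the list from stdin;
    # 2>/dev/null hides its own error so stderr only lists skipped taxids.
    if report=$(datasets summary genome taxon "$taxid" </dev/null 2>/dev/null); then
      printf '%s\n' "$report" >> "$out"        # keep the report for this taxid
    else
      echo "$taxid" >&2                        # taxid without genome data
    fi
  done < "$list"
}
```

For example, `summarize_each_taxid tax.list genome_reports.json 2> taxids_without_genomes.txt`. Each call writes its own JSON document, so the output file is a stream of JSON objects that jq can read directly. The trade-off is one `datasets` call per taxid, which is slower than a single `--inputfile` call, and any other failure (for instance a network error) is also reported as a skipped taxid.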


ericcox1 · 2025-02-10T18:15:17Z

Hi @DongHRLZU,

Thanks for creating this issue. As you point out above, the current behavior is to abort if datasets encounters a taxid without genome data. Based on your feedback, we are going to change this behavior so it will still return genome data even if it encounters a taxid without genome data. This could take a little while due to competing priorities.

In the meantime, you may be interested in using the counts in the taxonomy data report to check whether a particular taxid has genome data.

Given a list, tax.list, where 9606 and 10090 have genome data and 105513 does not, you can check the taxonomy data report to see whether genome data is available, and filter out taxids without genome data:

```shell
# Given a taxid list with a mixture of taxids with and without genome data
cat tax.list
9606
105513
10090

# Use datasets to check the taxonomy data report for genome assembly counts, then filter out taxids without genomes
datasets summary taxonomy taxon --inputfile tax.list | \
jq -r '.reports[].taxonomy | (if .counts[]?.type=="COUNT_TYPE_ASSEMBLY" then .tax_id else empty end)'
10090
9606
```
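Building on the filter above, a sketch that writes the skipped taxids to stderr (as asked in the original question) and passes only the remaining taxids to `datasets summary genome taxon --inputfile`. It assumes the jq output was first saved to a file; the helper name `summarize_available_taxids` and the file names are hypothetical.

```shell
# Hypothetical helper (not part of datasets):
#   summarize_available_taxids <all_taxids_file> <taxids_with_genomes_file>
# The second file is assumed to hold the output of the filter above, e.g.
#   datasets summary taxonomy taxon --inputfile tax.list |
#     jq -r '.reports[].taxonomy | (if .counts[]?.type=="COUNT_TYPE_ASSEMBLY" then .tax_id else empty end)' > tax_with_genomes.list
summarize_available_taxids() {
  local all="$1" available="$2"
  # Whole-line, fixed-string match: print taxids from the full list that the
  # filter did not keep (i.e. taxids without genome data) on stderr.
  # `|| true` keeps grep's "no lines selected" status from counting as failure.
  grep -vxF -f "$available" "$all" >&2 || true
  # Nothing left to summarize: avoid calling datasets with an empty input file.
  if [ ! -s "$available" ]; then
    echo "no taxids with genome data" >&2
    return 1
  fi
  # Only taxids known to have genome assemblies are passed to --inputfile.
  datasets summary genome taxon --inputfile "$available"
}
```

`grep -vxF -f` is used rather than `comm` because the jq output (10090, 9606) is not in the same order as tax.list, and `comm` expects sorted input.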

Best,
Eric

DongHRLZU · 2025-02-11T02:36:00Z

Well, I didn't realize at first that I could check the genome assembly counts with datasets summary taxonomy. Of course, I think it would be more convenient and straightforward if the output of datasets summary genome reported genome availability for each taxid. Anyway, thank you for taking my question seriously.

ericcox1 added the enhancement (New feature or request) label on Feb 10, 2025
