Entrez.md


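I regularly use the NCBI Entrez Direct (EDirect) UNIX E-utilities to retrieve sequences and other data from NCBI. Each utility reads from standard input and writes to standard output, so they can be piped into one another and into any other UNIX command.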

Looking for specific bugs (all nucleotide records for one organism):

esearch -db "nucleotide" -query "Faecalibacterium prausnitzii[ORGN]"|  efetch -format fasta

Download genes using an assembly accession ID:

esearch -db assembly -query  AP014658.1 | elink -target nuccore -name assembly_nuccore_insdc | efetch -format fasta_cds_na > B_longum_ldh4.genes.fna

Rhinovirus (show only the first few FASTA headers):

esearch -db "nucleotide" -query "Rhinovirus[ORGN]"|  efetch -format fasta | grep '>' | head

specific genes:

esearch -db gene -query "Liver cancer AND Homo sapiens" |elink -target nuccore | efetch -format fasta

Restrict bacterial records with a filter (here RefSeq):

esearch  -db "nucleotide" -query "Bacteria[Organism] AND Refseq[Filter]" | efetch -format fasta  
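A query this broad can match a very large number of records, so it may be worth checking the hit count before piping into efetch. The sketch below is one way to do that. It assumes the XML message that esearch prints contains a single `<Count>` element, and `hit_count` is a hypothetical helper name, not part of EDirect. The example feeds it a canned message so it runs without network access.

```shell
# hit_count: read an esearch XML message on stdin and print its <Count> value.
# Assumption: the message contains a line of the form <Count>N</Count>.
hit_count() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p' | head -n 1
}

# Canned message (made-up values) standing in for real esearch output.
printf '<ENTREZ_DIRECT>\n  <Db>nucleotide</Db>\n  <Count>42</Count>\n</ENTREZ_DIRECT>\n' | hit_count
```

With EDirect installed, the same helper could be placed after the search step, for example `esearch -db nucleotide -query "Bacteria[Organism] AND Refseq[Filter]" | hit_count`.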

download based on accession

esearch -db assembly -query GCF_000508965.1 | elink -target nucleotide -name assembly_nuccore_insdc | efetch -format fasta > GCF_000508965.1.fna
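For more than a handful of accessions, the same pipeline can be wrapped in a loop. This is a minimal sketch, assuming one accession per line on standard input. `fetch_assemblies` is a hypothetical helper name, and the pipeline inside it is copied from the command above. The `< /dev/null` is a precaution so that esearch cannot consume the rest of the accession list.

```shell
# fetch_assemblies: read assembly accessions (one per line) from stdin and
# write each assembly's nucleotide FASTA to <accession>.fna in the current directory.
fetch_assemblies() {
  while IFS= read -r acc; do
    # Skip blank lines in the list.
    [ -n "$acc" ] || continue
    esearch -db assembly -query "$acc" < /dev/null |
      elink -target nucleotide -name assembly_nuccore_insdc |
      efetch -format fasta > "$acc.fna"
  done
}
```

Usage would look like `fetch_assemblies < accessions.txt`, where `accessions.txt` is a hypothetical file listing accessions such as GCF_000508965.1.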

get assembly status

esearch -db assembly -query "Veillonella sp. DNF00869" | esummary | xtract -pattern DocumentSummary -element SpeciesName,assembly-status >> assembly_status.txt
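Once assembly_status.txt has accumulated rows for several organisms, a quick tally per status can be useful. The sketch below assumes each row is `SpeciesName<TAB>status`, which is what xtract produces with its default tab separator. `status_tally` is a hypothetical helper name, and the sample rows are made up for illustration.

```shell
# status_tally: count rows per value of the second tab-separated column
# and print "count<TAB>status", sorted by status.
status_tally() {
  awk -F '\t' 'NF >= 2 { n[$2]++ } END { for (s in n) printf "%d\t%s\n", n[s], s }' |
    LC_ALL=C sort -t "$(printf '\t')" -k2,2
}

# Made-up sample rows standing in for assembly_status.txt.
printf 'Example species A\tContig\nExample species B\tComplete Genome\nExample species C\tContig\n' | status_tally
```

On the real file this would be run as `status_tally < assembly_status.txt`.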

get proteins

esearch -db "protein" -query "baif[gene]" | efetch -format fasta > test; awk -v RS="\n>" -v FS="\n" '{ sub(/^>/, "") } $1 ~ /baif/ { print ">" $0 }' test > Baib.faa

esearch -db "protein" -query ACJ51375.1 | efetch -format fasta >> humann2_hmo_genes.faa
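The awk step above keeps only the FASTA records whose header mentions the gene. An alternative sketch of the same idea, which avoids the multi-character record separator and matches case-insensitively, is shown below. `fasta_grep` is a hypothetical helper name and the sample records are made up. Because the pattern is lowercased before matching, it is only suitable for simple patterns such as a gene name.

```shell
# fasta_grep: print the FASTA records on stdin whose header line matches
# the pattern given as $1 (case-insensitive).
fasta_grep() {
  awk -v pat="$1" '
    /^>/ { keep = (tolower($0) ~ tolower(pat)) }  # decide once per header line
    keep                                          # print header and sequence lines of kept records
  '
}

# Made-up sample: only the first record mentions BaiF in its header.
printf '>P1 BaiF protein\nMKT\nLLV\n>P2 other protein\nGGA\n' | fasta_grep baif
```

Applied to the download above, this would be `fasta_grep baif < test > Baib.faa`.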

Great resource for installing Entrez Direct on your computer

Tips on efetch values