Very slow download of dehydrated .zip file #446

Open
Jtrachsel opened this issue Feb 4, 2025 · 6 comments

@Jtrachsel

Hello,

I am downloading a large collection of bacterial genomes using the datasets download genome accession --dehydrated ... approach.
The initial download of the dehydrated .zip file is proceeding very slowly, at about 3.0 kB/s. Is this rate expected, and is there anything I can do to speed it up?

Once the .zip file is downloaded, the rehydration proceeds very quickly.

Thanks
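
For readers unfamiliar with the dehydrated workflow mentioned above, here is a minimal sketch of the three steps (download the dehydrated package, unpack it, rehydrate). The function name fetch_dehydrated and the file names in the usage line are placeholders chosen for illustration, not anything from the original report, and the flags shown are my assumption about a typical invocation rather than the exact command used here.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_dehydrated and its arguments are hypothetical names.

fetch_dehydrated() {
  local accession_file="$1" zip_file="$2" out_dir="$3"

  # Step 1: download the dehydrated package (the step reported as slow
  # in this issue). The accession list is read from a text file.
  datasets download genome accession --inputfile "$accession_file" \
    --dehydrated --filename "$zip_file" || return 1

  # Step 2: unpack the package into a working directory.
  unzip -q "$zip_file" -d "$out_dir" || return 1

  # Step 3: rehydrate, i.e. fetch the actual data files
  # (the step reported as fast in this issue).
  datasets rehydrate --directory "$out_dir"
}
```

Usage would look like: fetch_dehydrated accessions.txt genomes.zip genomes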

@ericcox1
Collaborator

ericcox1 commented Feb 5, 2025

Hi @Jtrachsel,

Thanks for opening this issue.

We are investigating the slow download speed and I will comment on this thread with any updates.

Best,
Eric

@peterlaurin

Was this ever resolved? I'm experiencing the same issue, and in addition it's throwing the following error:

datasets download genome accession $genome_accession --dehydrated --annotated --assembly-source 'RefSeq'

> Collecting 186 genome records [================================================] 100% 186/186
> Downloading: ncbi_dataset.zip    18.9kB 170B/s
> Error: Download error: stream error: stream ID 5; INTERNAL_ERROR; received from peer
>
> Use datasets download genome accession <command> --help for detailed help about a command.

Thanks for your help!
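
Until a fix is available, one possible stopgap for the stream error above is to wrap the download in a retry loop with increasing delays. This is only a sketch of a generic workaround: the retry function is a hypothetical helper, not part of the datasets CLI, and since the underlying problem appears to be on the server side, retrying may not help.

```shell
#!/usr/bin/env bash
# Sketch only: retry is a hypothetical helper, not a datasets feature.
# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]

retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1

  # Run the command until it succeeds or the attempts are used up.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))  # double the wait before the next attempt
  done
}
```

For example: retry 5 10 datasets download genome accession "$genome_accession" --dehydrated --annotated --assembly-source 'RefSeq'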

@ericcox1
Collaborator

Thank you for your comment, @peterlaurin. We have identified the problem and are working on a fix.

-Eric

@valery-shap

I have the same issue.

@ericcox1
Collaborator

Hi @valery-shap,

Thanks for your report. We are still working on a fix and hope to get this out soon.

Best,
Eric

ericcox1 marked this as a duplicate of #455 on Mar 7, 2025
@ericcox1
Collaborator

ericcox1 commented Mar 7, 2025

We believe that the slow download reported here as well as the errors reported in #455 both stem from the same problem related to slow retrieval of md5 checksums. We are continuing to work on a fix and hope to have this resolved sometime next week.

Best,
Eric
