
Datasets not working from one of our servers: getting timeouts connecting to api.ncbi.nlm.nih.gov #449

Open
corneliusroemer opened this issue Feb 6, 2025 · 11 comments
Labels
bug Something isn't working

Comments

@corneliusroemer

corneliusroemer commented Feb 6, 2025

As mentioned in #448, we at Loculus (https://github.com/loculus-project/loculus/actions) have consistently been getting errors using datasets since about 2025-02-04, but only on the Hetzner servers we run our previews on. We can't reproduce the errors locally. Our servers make a fair number of requests, so it's possible we've been rate limited or blocked, though without receiving a proper 429 HTTP status code.

We noticed that from this server, requests to api.ncbi.nlm.nih.gov time out:

curl -v api.ncbi.nlm.nih.gov
* Host api.ncbi.nlm.nih.gov:80 was resolved.
* IPv6: 2607:f220:41e:4290::110
* IPv4: 130.14.29.110
*   Trying [2607:f220:41e:4290::110]:80...
* Immediate connect fail for 2607:f220:41e:4290::110: Network is unreachable
*   Trying 130.14.29.110:80...
* connect to 130.14.29.110 port 80 from 172.17.0.2 port 35892 failed: Connection timed out
* Failed to connect to api.ncbi.nlm.nih.gov port 80 after 129534 ms: Could not connect to server
* closing connection #0
curl: (28) Failed to connect to api.ncbi.nlm.nih.gov port 80 after 129534 ms: Could not connect to server
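The curl output shows two different failure modes. The IPv6 address fails immediately because the server has no IPv6 route. The IPv4 connection times out with no response at all. Below is a minimal Python sketch of the same per-address check, using only the standard library. `probe` and `classify` are hypothetical helper names and are not part of the datasets CLI.

```python
import errno
import socket
from typing import List, Tuple


def classify(exc: OSError) -> str:
    """Map a connect() failure to the categories curl reports above."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed out"  # no reply at all, e.g. packets silently dropped
    if isinstance(exc, ConnectionRefusedError):
        return "refused"  # host answered with a reset: reachable, nothing listening
    if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
        return "unreachable"  # no route, e.g. no IPv6 connectivity on this host
    return f"error: {exc}"


def probe(host: str, port: int = 443, timeout: float = 10.0) -> List[Tuple[str, str]]:
    """Try every resolved address (IPv6 and IPv4), like curl -v, and report each outcome."""
    results = []
    for family, socktype, proto, _, addr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            results.append((addr[0], "connected"))
        except OSError as exc:
            results.append((addr[0], classify(exc)))
        finally:
            sock.close()
    return results
```

For example, `probe("api.ncbi.nlm.nih.gov", 80)` would return one `(address, outcome)` pair per resolved address. A silent timeout is typical of dropped packets, whereas a refusal means the host actively answered. From the client side alone, a block can be hard to tell apart from an ordinary network fault.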

I'm just opening this issue for general awareness, and to help other datasets users who might be getting the same error.

I'm aware of the banner on NCBI websites and will email the help desk:

Some NLM-NCBI services and products are experiencing heavy traffic, which may affect performance and availability. We apologize for the inconvenience and appreciate your patience. For assistance, please contact our Help Desk at [email protected].

For debugging purposes, our requests should be coming from 65.108.0.35.

@corneliusroemer corneliusroemer added the bug Something isn't working label Feb 6, 2025
@ericcox1
Collaborator

Hi @corneliusroemer,

Are you still seeing the same error?

-Eric

@theosanderson

Hi @ericcox1,

(I am part of the same team as Cornelius)

We are still seeing the same issue, which we can now reproduce more directly:

curl -X POST 'https://api.ncbi.nlm.nih.gov/datasets/v2/taxonomy/taxon_suggest' \
-H 'User-Agent: OpenAPI-Generator/1.0.0/go' \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-H 'X-Datasets-Client: datasets-cli' \
-H 'X-Datasets-Client-Arch: amd64' \
-H 'X-Datasets-Client-Os: linux' \
-H 'X-Datasets-Client-Version: 16.40.1' \
-d '{
 "exact_match": true,
 "tax_rank_filter": "higher_taxon",
 "taxon_query": "186540",
 "taxon_resource_filter": "TAXON_RESOURCE_FILTER_ALL"
}'

We get:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://misuse.ncbi.nlm.nih.gov/error/abuse.shtml">here</a>.</p>
</body></html>

This makes it fairly clear that you think we are sending you too many requests. To prevent this from recurring, we are actively working on mirroring our own copy of the datasets we use. We don't believe we have issued many requests from this IP in recent weeks, so if it is possible to lift the ban we'd be grateful.
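For reference, the same `taxon_suggest` request can also be built with Python's standard library. This is a sketch only. The endpoint, body, and headers are copied from the curl command above, and `build_taxon_suggest_request` is a hypothetical helper name.

```python
import json
import urllib.request

TAXON_SUGGEST_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2/taxonomy/taxon_suggest"


def build_taxon_suggest_request(taxon_query: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl reproduction."""
    body = {
        "exact_match": True,
        "tax_rank_filter": "higher_taxon",
        "taxon_query": taxon_query,
        "taxon_resource_filter": "TAXON_RESOURCE_FILTER_ALL",
    }
    # Same headers as the curl command, including the datasets CLI identification.
    headers = {
        "User-Agent": "OpenAPI-Generator/1.0.0/go",
        "Accept": "application/json",
        "Content-Type": "application/json",
        "X-Datasets-Client": "datasets-cli",
        "X-Datasets-Client-Arch": "amd64",
        "X-Datasets-Client-Os": "linux",
        "X-Datasets-Client-Version": "16.40.1",
    }
    return urllib.request.Request(
        TAXON_SUGGEST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req, timeout=30)` follows redirects by default. A blocked client may therefore end up with the HTML misuse page instead of JSON, rather than an explicit error.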

@ericcox1
Collaborator

Hi @theosanderson,

Thanks for this information. There isn't a block at the datasets level but we're going to reach out to another team to get this ban lifted.

-Eric

@corneliusroemer
Author

Thanks @ericcox1 and @theosanderson. I can add that the blocking doesn't just affect Loculus servers. It also affects other teams and individuals we work with, some also on Hetzner servers and some on entirely different university subnets.

@olearyna
Contributor

Hi @corneliusroemer and @theosanderson

We're in the process of unblocking some IPs. However, it's important to note that recent attempts to download the entire NCBI database have significantly strained our systems. These actions have not only slowed down our services but have also impacted access for other users.
To prevent further issues, we must caution that if large-scale downloads are attempted again, even from a single IP within the data center, we will have no choice but to reimpose blocks. Your help in spreading the word that such activities will trigger blocks would be greatly appreciated.
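Client-side throttling is one way to keep automated jobs well below the request volume that triggers these blocks. The following is a minimal sketch that assumes a fixed maximum request rate. The rate passed in is a placeholder, so check NCBI's documentation for the current limits rather than relying on any value here. `Throttle` is a hypothetical helper. Its clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests (not thread-safe)."""

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Call `throttle.wait()` before every request. Parallel jobs that each create their own instance multiply the aggregate rate seen from the shared IP, so the limit has to be budgeted per host, not per job.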

Thank you for your understanding and cooperation.

Nuala

Nuala O'Leary, Ph.D.
Product Owner, NCBI Datasets
NCBI/NLM/NIH

@theosanderson

theosanderson commented Feb 21, 2025

Thank you @olearyna ,

For our part we have never made any attempts to download the entire database. We have made repeated requests for a limited set of organisms and have now worked to de-duplicate these to minimize impact on NCBI.
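De-duplicating requests can be as simple as collapsing repeated taxon queries before they are sent. This minimal sketch assumes the queries are plain taxon ID strings. `dedupe_queries` is a hypothetical helper, and `11082` below is only a placeholder ID.

```python
from typing import Iterable, List


def dedupe_queries(taxon_ids: Iterable[str]) -> List[str]:
    """Return each non-empty taxon ID once, in first-seen order."""
    cleaned = (t.strip() for t in taxon_ids)
    # dict preserves insertion order (Python 3.7+), so the first occurrence wins.
    return list(dict.fromkeys(t for t in cleaned if t))
```

For example, `dedupe_queries(["186540", "11082", "186540", " 11082 "])` returns `["186540", "11082"]`, which means two requests instead of four.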

All the best,

Theo
Pathoplexus / Loculus

@olearyna
Contributor

Hi @corneliusroemer and @theosanderson,

You should now be unblocked. Thanks for your patience. If you're still having any trouble, could you share the URL and your IP address? We'd be happy to help troubleshoot.

Nuala

@theosanderson

Thank you @olearyna, but unfortunately we are still seeing this. Our IP is 65.108.0.35.

Our command to reproduce is curl https://api.ncbi.nlm.nih.gov

which returns:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://misuse.ncbi.nlm.nih.gov/error/abuse.shtml">here</a>.</p>
</body></html>
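The block shows up as a 302 redirect to `misuse.ncbi.nlm.nih.gov` rather than a 429, so a script has to check for that redirect explicitly. Below is a minimal Python sketch that assumes the redirect target host stays the same. `urllib` follows redirects by default, so the sketch installs a handler that stops at the first 3xx. `looks_blocked` and `check` are hypothetical names.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlparse

MISUSE_HOST = "misuse.ncbi.nlm.nih.gov"  # redirect target seen in the responses above


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop at the first redirect so the 3xx status and Location header stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


def looks_blocked(status: int, location: Optional[str]) -> bool:
    """True if the response is a redirect to NCBI's misuse page."""
    return (
        status in (301, 302, 303, 307, 308)
        and location is not None
        and urlparse(location).hostname == MISUSE_HOST
    )


def check(url: str, timeout: float = 30.0) -> str:
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"ok ({resp.status})"
    except urllib.error.HTTPError as err:
        if looks_blocked(err.code, err.headers.get("Location")):
            return "blocked: redirected to misuse page"
        return f"HTTP error {err.code}"
    except OSError as err:  # URLError, timeouts, unreachable networks
        return f"connection failed: {err}"
```

`check("https://api.ncbi.nlm.nih.gov")` corresponds to the plain `curl https://api.ncbi.nlm.nih.gov` command above, which also does not follow redirects.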

@olearyna
Contributor

OK, thank you. We're discussing this with our systems team. We'll keep you posted.

Nuala

@olearyna
Contributor

olearyna commented Feb 28, 2025

Hi @corneliusroemer and @theosanderson,

You should be unblocked now. Can you try again and let us know if you're still having problems?

Thanks
Nuala

@theosanderson

Apologies that we were so slow to respond. I have just checked, and we still appear to be blocked. This is less pressing for us now, as we have workarounds to mirror the relevant organisms, but I wanted to let you know.
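One way to build such a mirror is a local cache keyed on the request payload, so each distinct query reaches NCBI at most once. This minimal sketch assumes responses can be stored as raw bytes and that stale data is acceptable until the cache is cleared. `cached_request` and the `fetch` callable are hypothetical, and this is not necessarily the approach Loculus uses.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict


def cached_request(
    payload: Dict[str, Any],
    fetch: Callable[[Dict[str, Any]], bytes],
    cache_dir: Path,
) -> bytes:
    """Return the cached response for `payload`, calling `fetch` only on a cache miss."""
    # Canonical JSON (sorted keys) so equivalent payloads map to the same file.
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.bin"
    if path.exists():
        return path.read_bytes()
    data = fetch(payload)
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename into place so readers never see a half-written file
    return data
```

With this design, refreshing the mirror becomes an explicit, scheduled step, for example clearing the cache directory before a periodic sync. It is no longer a side effect of every preview deployment.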
