I apologize for the slight abuse of the term "Issues", as I don't think the problem I'm encountering is a true issue of your project.
While using `wayback`, I've run into issues with the connection being closed by the remote host. I've been performing a lot of search requests and pulling mementos, and I suspect I'm hitting a rate limit. However, I have already put a large delay between queries (roughly 5 seconds).
Is there a best practice for how much we should throttle usage? And beyond just looping over all our searches with a `time.sleep` call, is there anything else we should do to avoid slamming the server?
No worries! TBH, I’ve lost track of the current rate limits the Wayback Machine imposes, but I think earlier this year it was at 10 requests/second for both CDX search (i.e. `WaybackClient.search()`) and mementos (`WaybackClient.get_memento()`).
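If it helps, here’s a minimal client-side throttle you could put in front of each request instead of a fixed sleep; it only sleeps when calls come faster than the target rate, so slow requests don’t add unnecessary delay. This is just a sketch of the general technique — the `Throttle` name and the 5 requests/second default are my own, not part of this package:

```python
import time

class Throttle:
    """Enforce a minimum interval between calls, sleeping only when needed."""

    def __init__(self, requests_per_second=5):
        # Stay comfortably under the server's limit (e.g. 10 req/s).
        self.min_interval = 1.0 / requests_per_second
        self._last_call = 0.0

    def wait(self):
        # Sleep just long enough to keep under the target rate, then
        # record when this call was allowed through.
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()
```

Then call `throttle.wait()` immediately before each `client.search()` or `client.get_memento()` call, rather than sleeping a fixed amount after every request.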
If you are using multiple threads, you can do some messy stuff to share connections across threads, which has helped us reduce connection errors with Wayback in these code samples:
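For illustration, one simpler version of that idea (a generic sketch, not what the linked samples actually do) is a worker pool where a lock serializes access to a single shared client, so every thread reuses one connection instead of opening its own. The `fetch` parameter here is a stand-in for something like `WaybackClient.get_memento()`:

```python
import queue
import threading

def fetch_all(urls, fetch, num_threads=4):
    """Fetch every URL using a pool of workers sharing one lock-guarded client.

    `fetch` stands in for a call on a shared client (e.g. get_memento());
    the lock serializes those calls so the underlying connection is reused
    safely across threads.
    """
    work = queue.Queue()
    for url in urls:
        work.put(url)

    results = []
    client_lock = threading.Lock()

    def worker():
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # no work left; let this thread exit
            with client_lock:  # one request at a time on the shared client
                result = fetch(url)
            results.append(result)  # list.append is atomic under the GIL

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note that serializing every request through one lock gives up most of the parallelism; it mainly helps when connection churn, not throughput, is the problem.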
That’s way over-complicated, and I hope to get that functionality built into this package as part of #58.
You also might find some useful inspiration in other parts of the script above, which we use to pull in ~20 GB of data from Wayback every night. It’s really messy and a bit hard to follow, though. (It’s been through a lot of iterations, but there’s been limited time to really clean it up over the last few years; it’s what this package was originally extracted from.)
(Sorry about the slow feedback here, @jordannickerson. I’ve been semi-offline for the last couple weeks.)