Version 0.4.0
Breaking Changes
This release includes a significant overhaul of parameters for WaybackClient.search.
- Removed parameters that did nothing, could break search, or that were for internal use only: gzip, showResumeKey, resumeKey, page, pageSize, previous_result.
- Removed support for extra, arbitrary keyword parameters that could be added to each request to the search API.
- All parameters now use snake_case. (Previously, parameters that were passed unchanged to the HTTP API used camelCase, while others used snake_case.) The old, non-snake-case names are deprecated, but still work. They’ll be completely removed in v0.5.0.
  - matchType → match_type
  - fastLatest → fast_latest
  - resolveRevisits → resolve_revisits
- The limit parameter now has a default value. There are very few cases where you should not set a limit (not doing so will typically break pagination), so the default is there to help prevent mistakes. We’ve also added documentation to explain how and when to adjust this value, since it is pretty complex. (#65)
- Expanded the method documentation to explain things in more depth and link to more external references.
While we were at it, we also renamed the datetime parameter of WaybackClient.get_memento to timestamp for consistency with the CdxRecord and Memento classes. The old name still works for now, but it will be fully removed in v0.5.0.
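To illustrate what "deprecated, but still work" can look like in practice, here is a minimal sketch of mapping old keyword names onto their snake_case replacements. It is not the library's actual implementation: the normalize_kwargs helper and the RENAMED table are hypothetical names, and emitting a DeprecationWarning is an assumption about how such a shim would typically behave.

```python
import warnings

# Hypothetical table of deprecated names and their replacements, based on
# the renames listed in these notes.
RENAMED = {
    'matchType': 'match_type',
    'fastLatest': 'fast_latest',
    'resolveRevisits': 'resolve_revisits',
}


def normalize_kwargs(kwargs, renamed=RENAMED):
    """Return a copy of kwargs with deprecated names mapped to new ones."""
    result = {}
    for name, value in kwargs.items():
        new_name = renamed.get(name)
        if new_name is None:
            # Not a deprecated name, so pass it through unchanged.
            result[name] = value
            continue
        if new_name in kwargs:
            # Refuse ambiguous calls that supply both spellings.
            raise TypeError(
                f'Got both {name!r} and {new_name!r}; use only {new_name!r}'
            )
        warnings.warn(
            f'{name!r} is deprecated; use {new_name!r} instead',
            DeprecationWarning,
            stacklevel=2,
        )
        result[new_name] = value
    return result
```

If you maintain code that calls search() with the old names, switching to the snake_case spellings now avoids a break when v0.5.0 removes them.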
Features
- Memento.headers is now case-insensitive. The keys of the headers dict are returned with their original case when iterating, but lookups are performed case-insensitively. For example:

      list(memento.headers) == ['Content-Type', 'Date']
      memento.headers['Content-Type'] == memento.headers['content-type']

  (#98)
- There are now built-in, adjustable rate limits for calls to both search() and get_memento(). The default values should keep you from getting temporarily blocked by the Wayback Machine servers, but you can also adjust them when instantiating WaybackSession:

      # Limit get_memento() calls to 2 per second (or one every 0.5 seconds):
      client = WaybackClient(WaybackSession(memento_calls_per_second=2))

      # These now take a minimum of 0.5 seconds, even if the Wayback Machine
      # responds instantly (there's no delay on the first call):
      client.get_memento('http://www.noaa.gov/', timestamp='20180816111911')
      client.get_memento('http://www.noaa.gov/', timestamp='20180829092926')

  A huge thanks to @LionSzl for implementing this. (#12)
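For readers curious how a "calls per second" limit produces the behavior described above (no delay on the first call, then a minimum spacing between calls), here is a rough sketch of the general technique. It is not the code from this release: the RateLimit class and its clock and sleep parameters are hypothetical, and this version is not thread-safe.

```python
import time


class RateLimit:
    """Allow at most `per_second` calls per second by sleeping between calls."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        # Minimum number of seconds that must separate two calls.
        self._min_interval = 1.0 / per_second
        # clock and sleep are injectable so the class can be tested
        # without actually waiting.
        self._clock = clock
        self._sleep = sleep
        # No previous call yet, so the first call never waits.
        self._last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self._min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

With RateLimit(2), calling wait() before each request spaces requests at least 0.5 seconds apart, which matches the memento_calls_per_second=2 example above.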
Fixes & Maintenance
- All API requests to archive.org now use HTTPS instead of HTTP. Thanks to @sundhaug92 for calling this out. (#81)
- Headers from the original archived response are again included in Memento.headers. As part of this, the headers attribute is now case-insensitive (see new features above), since the Internet Archive servers now return headers with different cases depending on how the request was made. (#98)
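As a rough illustration of the case-insensitive behavior described for Memento.headers, the sketch below shows one common way to build a mapping that preserves the original key case when iterating but ignores case on lookup. The CaseInsensitiveHeaders class is a hypothetical stand-in written for these notes, not the type the library actually uses.

```python
from collections.abc import Mapping


class CaseInsensitiveHeaders(Mapping):
    """Read-only mapping with case-insensitive lookups and original-case keys."""

    def __init__(self, headers):
        # Store each entry under its lowercased name, keeping the original
        # spelling alongside the value so iteration can return it.
        self._store = {
            name.lower(): (name, value) for name, value in headers.items()
        }

    def __getitem__(self, name):
        return self._store[name.lower()][1]

    def __iter__(self):
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)
```

Because lookups go through the lowercased name, code that reads headers keeps working no matter which casing the Internet Archive servers return.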