Version 0.4.0

@Mr0grog released this 10 Nov 18:35
e2af777

Breaking Changes

This release includes a significant overhaul of parameters for WaybackClient.search.

  • Removed parameters that did nothing, could break search, or that were for internal use only: gzip, showResumeKey, resumeKey, page, pageSize, previous_result.

  • Removed support for extra, arbitrary keyword parameters that could be added to each request to the search API.

  • All parameters now use snake_case. (Previously, parameters that were passed unchanged to the HTTP API used camelCase, while others used snake_case.) The old, non-snake-case names are deprecated, but still work. They’ll be completely removed in v0.5.0.

    • matchType → match_type
    • fastLatest → fast_latest
    • resolveRevisits → resolve_revisits
  • The limit parameter now has a default value to help prevent mistakes. There are very few cases where you should not set a limit, since omitting it will typically break pagination. We’ve also added documentation that explains how and when to adjust this value, since the details are fairly complex. (#65)

  • Expanded the method documentation to explain things in more depth and link to more external references.

While we were at it, we also renamed the datetime parameter of WaybackClient.get_memento to timestamp for consistency with the CdxRecord and Memento classes. The old name still works for now, but it will be fully removed in v0.5.0.
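
As an illustration of how deprecated names can keep working during a transition like this, here is a minimal sketch. The normalize_search_params helper is hypothetical and is not the library's actual implementation; only the old and new parameter names come from the list above. It renames old camelCase keywords to their snake_case equivalents and emits a DeprecationWarning for each one.

```python
import warnings

# Old camelCase names mapped to their snake_case replacements
# (taken from the renames listed in these release notes).
DEPRECATED_NAMES = {
    'matchType': 'match_type',
    'fastLatest': 'fast_latest',
    'resolveRevisits': 'resolve_revisits',
}


def normalize_search_params(**params):
    """Hypothetical helper: return params with deprecated names renamed."""
    normalized = {}
    for name, value in params.items():
        new_name = DEPRECATED_NAMES.get(name, name)
        if new_name != name:
            warnings.warn(
                f'"{name}" is deprecated; use "{new_name}" instead.',
                DeprecationWarning,
                stacklevel=2,
            )
        if new_name in normalized:
            # The caller passed both the old and the new spelling.
            raise TypeError(f'Got multiple values for "{new_name}".')
        normalized[new_name] = value
    return normalized
```

Under this sketch, calling normalize_search_params(matchType='prefix', limit=10) would return a dict keyed by match_type and limit, with one warning for the old name.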

Features

  • Memento.headers is now case-insensitive. The keys of the headers dict are returned with their original case when iterating, but lookups are performed case-insensitively. For example:

    list(memento.headers) == ['Content-Type', 'Date']
    memento.headers['Content-Type'] == memento.headers['content-type']

    (#98)

  • There are now built-in, adjustable rate limits for calls to both search() and get_memento(). The default values should keep you from getting temporarily blocked by the Wayback Machine servers, but you can also adjust them when instantiating WaybackSession:

    from wayback import WaybackClient, WaybackSession

    # Limit get_memento() calls to 2 per second (or one every 0.5 seconds):
    client = WaybackClient(WaybackSession(memento_calls_per_second=2))

    # Consecutive calls are now spaced at least 0.5 seconds apart, even if the
    # Wayback Machine responds instantly (the first call is not delayed):
    client.get_memento('http://www.noaa.gov/', timestamp='20180816111911')
    client.get_memento('http://www.noaa.gov/', timestamp='20180829092926')

    A huge thanks to @LionSzl for implementing this. (#12)
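
To make the rate-limiting behavior concrete, here is a minimal sketch of a limiter that enforces a minimum interval between calls. The RateLimit class and its clock and sleep parameters are hypothetical, chosen for illustration, and this is not necessarily how the library implements its limits. It is also not thread-safe.

```python
import time


class RateLimit:
    """Hypothetical limiter: allow at most `per_second` calls per second."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # No previous call yet, so the first is not delayed.

    def wait(self):
        """Block until the minimum interval since the previous call has passed."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self._min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

With RateLimit(2), calling wait() before each request spaces requests at least 0.5 seconds apart, and a request that arrives after a longer pause goes through immediately. The injectable clock and sleep make the limiter testable without real delays.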

Fixes & Maintenance

  • All API requests to archive.org now use HTTPS instead of HTTP. Thanks to @sundhaug92 for calling this out. (#81)

  • Headers from the original archived response are again included in Memento.headers. As part of this, the headers attribute is now case-insensitive (see new features above), since the Internet Archive servers now return headers with different cases depending on how the request was made. (#98)
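
The case-insensitive lookup described for Memento.headers can be sketched with a small read-only mapping. The CaseInsensitiveHeaders class below is hypothetical and only illustrates the idea (original-case keys when iterating, case-insensitive lookups); it is not the library's actual implementation, and the header values are made up for the example.

```python
from collections.abc import Mapping


class CaseInsensitiveHeaders(Mapping):
    """Hypothetical mapping: keeps original key case, looks up case-insensitively."""

    def __init__(self, headers):
        # Map each lowercased name to (original name, value).
        self._store = {
            name.lower(): (name, value) for name, value in dict(headers).items()
        }

    def __getitem__(self, name):
        return self._store[name.lower()][1]

    def __iter__(self):
        # Iteration yields the keys with their original case.
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


# Example values are illustrative only.
headers = CaseInsensitiveHeaders({
    'Content-Type': 'text/html',
    'Date': 'Thu, 16 Aug 2018 11:19:11 GMT',
})
```

Here list(headers) gives the keys as originally written, while headers['content-type'] and headers['Content-Type'] return the same value.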