Fix target_window when redirect is > 1 day (#53)
The target_window parameter for `get_memento()` is supposed to limit how far off in time from the requested time you can get a memento for when the `exact` parameter is `False`. However, it only works correctly when the target is off by less than a day! (The scenarios we were originally concerned with in EDGI almost invariably were on the scale of a few hours, so I guess that's how this error snuck in.) If you set `exact=True` (the default), this bug wouldn’t be triggered.

We were comparing the target offset's `seconds` attribute, which holds only the sub-day remainder and ignores any whole days in the offset. This fixes the issue by comparing `total_seconds()` instead, which converts the days into seconds and includes them.
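The difference between the two `timedelta` accessors can be shown with a minimal sketch. The two timestamps are taken from the redirect recorded in this commit's test cassette; the one-day `target_window` value is an assumption chosen for illustration, not necessarily the library's default.

```python
from datetime import datetime

# Requested time and the time of the memento the redirect pointed to,
# both taken from the recorded cassette in this commit.
requested = datetime(2017, 11, 1, 0, 0, 0)
redirected = datetime(2017, 11, 24, 15, 13, 15)

# Assumed window of one day, in seconds (illustrative value only).
target_window = 24 * 60 * 60

offset = abs(redirected - requested)

# `.seconds` is only the sub-day remainder (0 to 86399); whole days are
# stored separately in `.days`.
print(offset.days, offset.seconds)

# `.total_seconds()` folds the days back into a single number.
print(offset.total_seconds())

# The old comparison accepts a memento that is more than 23 days off;
# the new comparison rejects it.
old_check = offset.seconds <= target_window
new_check = offset.total_seconds() <= target_window
print(old_check, new_check)
```

With these inputs the old comparison passes even though the memento is more than three weeks away from the requested time, which is the behavior described above.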
Mr0grog committed Oct 19, 2020
1 parent 4dd3814 commit a3372fe
Showing 5 changed files with 624 additions and 2 deletions.
6 changes: 6 additions & 0 deletions docs/source/release-history.rst
@@ -2,6 +2,12 @@
Release History
===============

v0.2.5 (2020-10-19)
-------------------

This release fixes a bug where the ``target_window`` parameter for :meth:`wayback.WaybackClient.get_memento` did not work correctly if the memento you were redirected to was off by more than a day from the requested time. See `#53 <https://github.com/edgi-govdata-archiving/wayback/pull/53>`_ for more.


v0.2.4 (2020-09-07)
-------------------

2 changes: 1 addition & 1 deletion wayback/_client.py
@@ -748,7 +748,7 @@ def get_memento(self, url, exact=True, exact_redirects=None,
# been produced by an earlier memento redirect -- it's
# just the *closest* one. The first job here is to make
# sure it fits within our target window.
- if abs(target_date - original_date).seconds <= target_window:
+ if abs(target_date - original_date).total_seconds() <= target_window:
# The redirect will point to the closest-in-time
# SURT URL, which will often not be an exact URL
# match. If we aren't looking for exact matches,
@@ -0,0 +1,48 @@
interactions:
- request:
body: null
headers:
Accept-Encoding:
- gzip, deflate
User-Agent:
- wayback/0.2.4.post2.dev0+g0c2a63c (+https://github.com/edgi-govdata-archiving/wayback)
method: GET
uri: http://web.archive.org/web/20171101000000id_/https://www.fws.gov/birds/
response:
body:
string: ''
headers:
Connection:
- keep-alive
Content-Length:
- '0'
Content-Type:
- text/plain; charset=utf-8
Date:
- Mon, 19 Oct 2020 08:07:14 GMT
Location:
- http://web.archive.org/web/20171124151315id_/https://www.fws.gov/birds/
Server:
- nginx/1.15.8
Server-Timing:
- RedisCDXSource;dur=220.587665, PetaboxLoader3.resolve;dur=21.099726, exclusion.robots.policy;dur=0.321841,
CDXLines.iter;dur=21.136237, LoadShardBlock;dur=1556.284805, captures_list;dur=1886.300549,
PetaboxLoader3.datanode;dur=119.223321, esindex;dur=0.009743, exclusion.robots;dur=0.338801
X-App-Server:
- wwwb-app13
X-Archive-Redirect-Reason:
- found capture at 20171124151315
X-Archive-Screenname:
- '0'
X-Cache-Key:
- httpweb.archive.org/web/20171101000000id_/https://www.fws.gov/birds/US
X-Page-Cache:
- HIT
X-location:
- All
X-ts:
- '302'
status:
code: 302
message: FOUND
version: 1
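
To see why this recorded redirect exercises the fix, the offset between the requested and redirected mementos can be read from the two URLs in the cassette. This is a rough sketch: `memento_time` is a hypothetical helper written for this example, not part of the `wayback` package, and it assumes the 14-digit timestamp follows `/web/` as it does in the URLs above.

```python
import re
from datetime import datetime


def memento_time(url):
    """Hypothetical helper: pull the 14-digit timestamp out of a memento URL."""
    match = re.search(r"/web/(\d{14})", url)
    if match is None:
        raise ValueError(f"No memento timestamp found in {url!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


# The request URI and the Location header from the recorded interaction.
request_uri = "http://web.archive.org/web/20171101000000id_/https://www.fws.gov/birds/"
location = "http://web.archive.org/web/20171124151315id_/https://www.fws.gov/birds/"

offset = abs(memento_time(location) - memento_time(request_uri))

# The redirect target is more than 23 days from the requested time, so only
# a comparison that counts whole days can reject it for a short window.
print(offset.days, offset.total_seconds())
```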