Add info about link header relationships to Memento #57

Closed
Mr0grog opened this issue Oct 21, 2020 · 0 comments · Fixed by #108
Labels
enhancement New feature or request question Further information is requested
Milestone

Comments

@Mr0grog
Member

Mr0grog commented Oct 21, 2020

Wayback Mementos carry additional info about related resources in the Link header. For example, here’s the header from https://web.archive.org/web/20171124151315id_/https://www.fws.gov/birds/:

<https://www.fws.gov/birds/>; rel="original",
<http://web.archive.org/web/timemap/link/https://www.fws.gov/birds/>; rel="timemap"; type="application/link-format",
<http://web.archive.org/web/https://www.fws.gov/birds/>; rel="timegate",
<http://web.archive.org/web/20050323155300/http://www.fws.gov:80/birds>; rel="first memento"; datetime="Wed, 23 Mar 2005 15:53:00 GMT",
<http://web.archive.org/web/20170929002712/https://www.fws.gov/birds/>; rel="prev memento"; datetime="Fri, 29 Sep 2017 00:27:12 GMT",
<http://web.archive.org/web/20171124151315/https://www.fws.gov/birds/>; rel="memento"; datetime="Fri, 24 Nov 2017 15:13:15 GMT",
<http://web.archive.org/web/20171228222143/https://www.fws.gov/birds/>; rel="next memento"; datetime="Thu, 28 Dec 2017 22:21:43 GMT",
<http://web.archive.org/web/20201011123440/http://www.fws.gov/birds>; rel="last memento"; datetime="Sun, 11 Oct 2020 12:34:40 GMT"

(Line breaks added for clarity.)

This follows the standard format for the Link header (RFC 8288, Web Linking). The most accessible documentation is on MDN.
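A minimal sketch of what "parse it generically" could mean in Python, using only the standard library. `parse_link_header()` is a hypothetical helper (not part of this project or of Requests). Like Requests, it keys each link by its full `rel` string, and it also converts `datetime` parameters to timezone-aware datetimes. It copes with the quoted, comma-containing dates in the header above, but it is not a complete RFC 8288 parser.

```python
import re
from email.utils import parsedate_to_datetime

# A link-value is "<url>" followed by zero or more ";name=value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,=\s]+\s*=\s*(?:"[^"]*"|[^;,]*))*)')
# A single parameter; quoted values may contain commas (e.g. HTTP dates).
PARAM_RE = re.compile(r';\s*([^;,=\s]+)\s*=\s*(?:"([^"]*)"|([^;,]*))')


def parse_link_header(value):
    """Parse a Link header into a dict keyed by each link's full rel string."""
    links = {}
    for match in LINK_RE.finditer(value):
        url, params = match.groups()
        link = {'url': url}
        for param in PARAM_RE.finditer(params):
            name, quoted, bare = param.groups()
            link[name.lower()] = quoted if quoted is not None else bare.strip()
        # Memento links carry HTTP dates; turn them into aware datetimes.
        if 'datetime' in link:
            link['datetime'] = parsedate_to_datetime(link['datetime'])
        # Like Requests, fall back to the URL as the key when rel is missing.
        links[link.get('rel', url)] = link
    return links


header = (
    '<https://www.fws.gov/birds/>; rel="original", '
    '<http://web.archive.org/web/timemap/link/https://www.fws.gov/birds/>; '
    'rel="timemap"; type="application/link-format", '
    '<http://web.archive.org/web/20050323155300/http://www.fws.gov:80/birds>; '
    'rel="first memento"; datetime="Wed, 23 Mar 2005 15:53:00 GMT"'
)
links = parse_link_header(header)
print(links['first memento']['datetime'])  # 2005-03-23 15:53:00+00:00
```

Note that per RFC 8288 a value like `rel="first memento"` names two relation types ("first" and "memento"); keying by the whole string keeps lookups simple but means `links['memento']` only finds the link whose rel is exactly `memento`.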

It would probably be nice to surface this information on the Memento class in some useful way. We originally planned to do this in #2, but it wasn’t critical, and there were enough open questions and options that it seemed worth waiting until we had a better design.

Some possibilities:

  • Simply parse the data generically, like the Requests package does:

    from datetime import datetime, timezone

    memento.links = {
        'original': {
            'url': 'https://www.fws.gov/birds/',
            'rel': 'original'
        },
        'timemap': {
            'url': 'http://web.archive.org/web/timemap/link/https://www.fws.gov/birds/',
            'rel': 'timemap',
            'type': 'application/link-format'
        },
        'timegate': {
            'url': 'http://web.archive.org/web/https://www.fws.gov/birds/',
            'rel': 'timegate'
        },
        'first memento': {
            'url': 'http://web.archive.org/web/20050323155300/http://www.fws.gov:80/birds',
            'rel': 'first memento',
            'datetime': datetime(2005, 3, 23, 15, 53, tzinfo=timezone.utc)
        },
        'prev memento': {
            'url': 'http://web.archive.org/web/20170929002712/https://www.fws.gov/birds/',
            'rel': 'prev memento',
            'datetime': datetime(2017, 9, 29, 0, 27, 12, tzinfo=timezone.utc)
        },
        'memento': {
            'url': 'http://web.archive.org/web/20171124151315/https://www.fws.gov/birds/',
            'rel': 'memento',
            'datetime': datetime(2017, 11, 24, 15, 13, 15, tzinfo=timezone.utc)
        },
        'next memento': {
            'url': 'http://web.archive.org/web/20171228222143/https://www.fws.gov/birds/',
            'rel': 'next memento',
            'datetime': datetime(2017, 12, 28, 22, 21, 43, tzinfo=timezone.utc)
        },
        'last memento': {
            'url': 'http://web.archive.org/web/20201011123440/http://www.fws.gov/birds',
            'rel': 'last memento',
            'datetime': datetime(2020, 10, 11, 12, 34, 40, tzinfo=timezone.utc)
        }
    }
  • Since the original and memento relationships are redundant (all that info is already on Memento), we could drop them.

  • We could make these a more specialized type than a dict, so they can be passed directly to get_memento(), e.g.:

    get_memento(memento.links['next memento'])
  • We could add get_next_memento(), etc. as a shortcut to get_memento(), e.g.:

    memento.get_next_memento()
    # Same as:
    get_memento(memento.links['next memento']['url'],
                memento.links['next memento']['datetime'],
                memento.mode)
  • Since these are known, predictable links, we could add attributes directly to Memento instead of Memento.links:

    memento.first_memento = {...}
    memento.previous_memento = {...}
    memento.next_memento = {...}
    memento.last_memento = {...}
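One way the "more specialized type" option could look, sketched with a frozen dataclass. `MementoLink` and `memento_target()` are hypothetical names for illustration only. `memento_target()` stands in for the argument handling that a get_memento() accepting either a URL plus datetime or a link object might do; it is not this project's actual API.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class MementoLink:
    """Hypothetical typed version of one entry in memento.links."""
    url: str
    rel: str
    datetime: Optional[dt.datetime] = None
    type: Optional[str] = None


def memento_target(
    target: Union[str, MementoLink], when: Optional[dt.datetime] = None
) -> Tuple[str, Optional[dt.datetime]]:
    """Hypothetical helper: normalize get_memento()-style arguments.

    A MementoLink supplies its own URL and datetime; a plain URL string is
    paired with whatever datetime the caller passed.
    """
    if isinstance(target, MementoLink):
        return target.url, target.datetime
    return target, when


next_link = MementoLink(
    url='http://web.archive.org/web/20171228222143/https://www.fws.gov/birds/',
    rel='next memento',
    datetime=dt.datetime(2017, 12, 28, 22, 21, 43, tzinfo=dt.timezone.utc),
)
url, when = memento_target(next_link)
```

Because the dataclass is frozen, links stay hashable and read-only, which fits data that simply mirrors a response header.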

Another thing to consider is that we don’t currently have any special support for timemaps or timegates, so we can’t do anything special with those links. Basic dict parsing is probably the lowest common denominator here: it’s not very special, but it lets us treat every link the same way.

Lots of ways we could go here. ¯\_(ツ)_/¯

Mr0grog added the enhancement and question labels Oct 21, 2020
Mr0grog added this to the v0.4.x milestone Nov 10, 2022
Mr0grog added a commit that referenced this issue Nov 14, 2022
Mr0grog linked a pull request Nov 14, 2022 that will close this issue
Mr0grog added a commit that referenced this issue Nov 14, 2022
This fixes an issue where the `Memento.url` property could be slightly incorrect, since it was based on the URL you requested the memento from (e.g. `https://web.archive.org/web/20221010000000/<url>`), rather than the actual URL the memento was captured from. Wayback matches the requested URL to captures by SURT key (a normalized form of the URL) rather than by the exact URL, so the capture it returns can have a different URL than the one requested.

For example, requesting an archived copy of `http://fws.gov/` might return a capture of `https://www.fws.gov/` instead. The returned Memento object’s `url` property used to be `http://fws.gov/` in this case, but this changes it to be `https://www.fws.gov/`.
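To make the SURT point concrete, here is a deliberately simplified SURT-style key function. `simple_surt()` is hypothetical and only approximates the Internet Archive's real canonicalization, which does more normalization than this. It just shows why `http://fws.gov/` and `https://www.fws.gov/` can land on the same key, and so why the requested URL isn't a reliable stand-in for the captured one.

```python
from urllib.parse import urlsplit


def simple_surt(url):
    """Rough SURT-style key; NOT the full Internet Archive canonicalization."""
    parts = urlsplit(url)
    host = (parts.hostname or '').lower()  # .hostname already drops the port
    # This sketch ignores the scheme and strips a leading "www.".
    if host.startswith('www.'):
        host = host[4:]
    # SURT-style keys reverse the host labels, comma-separated, then ")" + path.
    key = ','.join(reversed(host.split('.'))) + ')' + (parts.path or '/')
    if parts.query:
        key += '?' + parts.query
    return key


print(simple_surt('http://fws.gov/'))       # gov,fws)/
print(simple_surt('https://www.fws.gov/'))  # gov,fws)/
```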

Since this required checking the `Link` header, I also went ahead and made the parsed `links` data available on `Memento`.
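A minimal sketch of the fix described above, assuming `links` is a dict of parsed Link entries keyed by `rel` (as in the generic-parsing option from the issue). `original_url()` is a hypothetical helper, not necessarily how #108 implements it: it prefers the `rel="original"` link and falls back to the requested URL when the header has none. The sample data mirrors the `fws.gov` example above.

```python
def original_url(links, requested_url):
    """Return the URL the memento was captured from, if the Link data says."""
    for link in links.values():
        # rel can hold several space-separated relation types (RFC 8288).
        if 'original' in link.get('rel', '').split():
            return link['url']
    # No rel="original" link: fall back to the URL that was requested.
    return requested_url


links = {'original': {'url': 'https://www.fws.gov/', 'rel': 'original'}}
print(original_url(links, 'http://fws.gov/'))  # https://www.fws.gov/
```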

Fixes #57.
Fixes #99.
Mr0grog modified the milestones: v0.5.x, v0.4.x Dec 13, 2023
Projects
Archived in project

1 participant