Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul search() signature and docs #94

Merged
merged 9 commits into from
Oct 27, 2022

Conversation

Mr0grog
Copy link
Member

@Mr0grog Mr0grog commented Oct 26, 2022

⚠️ Breaking Changes! ⚠️

The signature for search() was unwieldy, complicated, and included a bunch of parameters that did nothing or that might cause broken behavior and should never be used (e.g. page and pageSize). While adding a default value for limit to resolve #65, I went ahead and cleaned up search():

  • Adds a default value for limit in order to fix a very bad footgun. (See Add a default limit to WaybackClient.search() #65)
  • Adds a lot more detail to the docs, explains special formatting for url and complex considerations for limit.
  • Removes non-functional or breakage-prone parameters that were only included because they were part of the HTTP API. In some cases, they are things this library does automatically for you and aren’t useful to adjust (e.g. gzip), in others you could break things (e.g. page and pageSize), and some are implementation details that users shouldn’t be bothered with (e.g. resumeKey, previous_result).
  • Removes the ability to specify arbitrary extra keyword parameters to be passed directly to the API (there are so many ways to break things here; I argued for this originally so we didn’t have to maintain as much, but it’s just not good).
  • Makes all parameters use snake_case.

Internally, the only real change is that this is now a loop instead of a recursive call. This was required in order to not expose internal details as parameters, but is also probably better for call stack and memory management on large queries.

Fixes #65.

@edsu are these API changes going to be OK for you? I’ll make sure this is released as 0.4.0 rather than 0.3.x. If you have any feedback on the docs changes, too, I’d appreciate it (you can preview the build docs from the “checks” at the bottom of the PR).

I might also try to include a first pass at #8 before releasing.

It turns out that the pagination strategy we use with Wayback's CDX search does not work correctly if you don't set a `limit`, so this change sets one by default. However, there is no value that's always safe, so this needs to be used with some care. A limit that's too low may mean you never get any results even when you should, and a limit that's too high may get ignored (or cause other unknown issues). The right values really depend on how frequently you expect the Internet Archive to have captured versions of the URLs you are looking for. :(

Fixes #65.
Also use a loop instead of recursion for pagination.
We originally used camelCase when the name and functionality were the same as a parameter in the HTTP API. It was intended to make things clear and to support arbitrarily specifying additional keyword arguments that would get sent in the request. However, we no longer support additional, arbitrary keyword arguments, and the mixed case created more inconvenience and mess than help. This makes all parameters use the same snake_case scheme.
@Mr0grog
Copy link
Member Author

Mr0grog commented Oct 26, 2022

Note: the size reduction here is mostly because recorded responses for tests are gzipped, which was not the case before. (I think what happened is that the CDX server now respects the Accept-Encoding header, which we have always sent, instead of only using the gzip query parameter, which we do not send.)

Copy link
Contributor

@edsu edsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like these changes to simplify usage and guide users down the happy path -- also the snake_case changes are most welcome.

@Mr0grog Mr0grog merged commit bce65fd into main Oct 27, 2022
@Mr0grog Mr0grog deleted the 65-search-requires-limit-but-limit-is-complicated branch October 27, 2022 22:30
Mr0grog added a commit that referenced this pull request Nov 10, 2022
In #94 and #102, we renamed several method parameters. This brings back the old names, but with deprecation warnings. We'll remove them fully in v0.5.0. (This does *not* support parameters that were completely *removed*, though. I did a search on GitHub, and it doesn't look like anybody is actually using them, so it didn't seem worth keeping support for them.)
Mr0grog added a commit that referenced this pull request Nov 10, 2022
In #94 and #102, we renamed several method parameters. This brings back the old names, but with deprecation warnings. We'll remove them fully in v0.5.0. This does *not* support parameters that were completely *removed*, though.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a default limit to WaybackClient.search()
2 participants