Multiple filter? #119

Closed
BilibalaX opened this issue Jul 5, 2023 · 1 comment · Fixed by #127

@BilibalaX

Is it possible to add multiple filter fields?

@Mr0grog
Member

Mr0grog commented Jul 5, 2023

Not currently, but I’d be happy to accept a PR to add it!

There’s actually a big TODO comment in the code about it:

wayback/wayback/_client.py

Lines 639 to 657 in e633e86

    # TODO: support args that can be set multiple times: filter, collapse
    # Should take input as a sequence and convert to repeat query args
    # TODO: Check types
    query_args = {'url': url, 'matchType': match_type, 'limit': limit,
                  'offset': offset, 'from': from_date,
                  'to': to_date, 'filter': filter_field,
                  'fastLatest': fast_latest, 'collapse': collapse,
                  'showResumeKey': True,
                  'resolveRevisits': resolve_revisits}

    query = {}
    for key, value in query_args.items():
        if value is not None:
            if isinstance(value, str):
                query[key] = value
            elif isinstance(value, date):
                query[key] = _utils.format_timestamp(value)
            else:
                query[key] = str(value).lower()

You need to change the part in lines 650-657 (the `for` loop) to do a different conversion for each option, instead of the same generic conversion for every option. For options that can be repeated (`filter` and `collapse`), the value stored in `query` should be a list or tuple, which will get serialized correctly as repeated items in the query string of the actual request.
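As a rough illustration of that suggestion (not the change that was eventually merged), the conversion could gain a branch that passes sequences through as lists. In this sketch, `build_query` is a hypothetical helper name, the `strftime` call is only a stand-in for `_utils.format_timestamp`, and the standard library's `urlencode(..., doseq=True)` stands in for whatever the client's HTTP layer does when it builds the final URL:

```python
from datetime import date
from urllib.parse import urlencode


def build_query(query_args):
    """Hypothetical helper: convert search options to query string values."""
    query = {}
    for key, value in query_args.items():
        if value is None:
            # Unset options are left out of the request entirely.
            continue
        if isinstance(value, str):
            query[key] = value
        elif isinstance(value, (list, tuple)):
            # Keep sequences as lists so they become repeated query args.
            query[key] = [str(item) for item in value]
        elif isinstance(value, date):
            # Assumption: stand-in for _utils.format_timestamp().
            query[key] = value.strftime('%Y%m%d%H%M%S')
        else:
            # Booleans and numbers use the same generic conversion as before.
            query[key] = str(value).lower()
    return query


query = build_query({
    'url': 'nasa.gov/',
    'matchType': 'prefix',
    'from': date(2022, 1, 1),
    'filter': ['statuscode:404', 'urlkey:.*feature.*'],
    'showResumeKey': True,
    'limit': None,
})

# doseq=True emits one `filter=...` pair per list item.
encoded = urlencode(query, doseq=True)
print(encoded)
```

The string check comes before the sequence check so that a single filter string keeps working unchanged, while a list or tuple produces one `filter=` entry per item.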

@Mr0grog Mr0grog added this to the v0.4.3 milestone Sep 22, 2023
@Mr0grog Mr0grog linked a pull request Sep 25, 2023 that will close this issue
Mr0grog added a commit that referenced this issue Sep 25, 2023
The `filter_field` parameter for `WaybackClient.search()` can now be a list or tuple of strings, letting you add multiple filters. For example, to search for all captures at `nasa.gov` with a 404 status and “feature” somewhere in the URL:

```python
from datetime import date
from wayback import WaybackClient

client = WaybackClient()
client.search('nasa.gov/',
              match_type='prefix',
              from_date=date(2022, 1, 1),
              to_date=date(2022, 2, 1),
              filter_field=['statuscode:404',
                            'urlkey:.*feature.*'])
```

Thanks to @BilibalaX for starting this in #120.

Fixes #119.