diff --git a/docs/source/release-history.rst b/docs/source/release-history.rst
index e2594c89..bc98f4ae 100644
--- a/docs/source/release-history.rst
+++ b/docs/source/release-history.rst
@@ -2,35 +2,32 @@
 Release History
 ===============
 
-In Development
---------------
-
-Breaking Changes
-^^^^^^^^^^^^^^^^
+v0.4.3 (2023-09-26)
+-------------------
 
-N/A
+This is mainly a compatibility release: it adds support for urllib3 v2.x and the next upcoming major release of Python, v3.12.0. It also adds support for multiple filters in :meth:`wayback.WaybackClient.search`. There are no breaking changes.
 
 
 Features
 ^^^^^^^^
 
-- You can now apply multiple filters to a search by using a list or tuple for the ``filter_field`` parameter of :meth:`wayback.WaybackClient.search`. (:issue:`119`)
+You can now apply multiple filters to a search by using a list or tuple for the ``filter_field`` parameter of :meth:`wayback.WaybackClient.search`. (:issue:`119`)
 
-  For example, to search for all captures at ``nasa.gov`` with a 404 status and “feature” somewhere in the URL:
+For example, to search for all captures at ``nasa.gov`` with a 404 status and “feature” somewhere in the URL:
 
-  .. code-block:: python
+.. code-block:: python
 
-    client.search('nasa.gov/',
-                  match_type='prefix',
-                  filter_field=['statuscode:404',
-                                'urlkey:.*feature.*'])
+  client.search('nasa.gov/',
+                match_type='prefix',
+                filter_field=['statuscode:404',
+                              'urlkey:.*feature.*'])
 
 
 Fixes & Maintenance
 ^^^^^^^^^^^^^^^^^^^
 
 - Add support for Python 3.12.0. (:issue:`123`)
-- Add support for urllib3 v2.x. (:issue:`116`)
+- Add support for urllib3 v2.x (urllib3 v1.20+ also still works). (:issue:`116`)
 
 
 v0.4.3a1 (2023-09-22)
diff --git a/wayback/_client.py b/wayback/_client.py
index 216e94fb..3383f53f 100644
--- a/wayback/_client.py
+++ b/wayback/_client.py
@@ -605,10 +605,23 @@ def search(self, url, *, match_type=None, limit=1000, offset=None,
             assumed to be in UTC.
         filter_field : str or list of str or tuple of str, optional
             A filter or list of filters for any field in the results. Equivalent
-            to the ``filter`` argument in the CDX API. To apply multiple
-            filters, use a list of strings instead of a single string. Format:
-            ``[!]field:regex``, e.g. ``'!statuscode:200'`` to select only
-            captures with a non-200 status code.
+            to the ``filter`` argument in the CDX HTTP API. Format:
+            ``[!]field:regex`` or ``~[!]field:substring``, e.g.
+            ``'!statuscode:200'`` to select only captures with a non-200 status
+            code, or ``'~urlkey:feature'`` to select only captures where the
+            SURT-formatted URL key has "feature" somewhere in it.
+
+            To apply multiple filters, use a list or tuple of strings instead of
+            a single string, e.g. ``['statuscode:200', 'urlkey:.*feature.*']``.
+
+            Regexes are matched against the *entire* field value. For example,
+            ``'statuscode:2'`` will never match, because all ``statuscode``
+            values are three characters. Instead, to match all status codes with
+            a "2" in them, use ``'statuscode:.*2.*'``. Add a ``!`` before the
+            field name to negate the match.
+
+            Valid field names are: ``urlkey``, ``timestamp``, ``original``,
+            ``mimetype``, ``statuscode``, ``digest``, ``length``.
         collapse : str, optional
             Collapse consecutive results that match on a given field. (format:
             `fieldname` or `fieldname:N` -- N is the number of chars to match.)
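
The filter semantics described in the new docstring (whole-value regex matching, the ``~`` substring prefix, ``!`` negation, and combining several filters) can be illustrated locally. The following is only a rough sketch: ``matches_filter`` and ``matches_all`` are hypothetical helpers, not part of ``wayback`` or the CDX API, and the sketch assumes that Python's ``re.fullmatch`` is a close enough stand-in for the server's regex engine and that multiple filters must all match, as the changelog example suggests.

```python
import re


def matches_filter(filter_spec, record):
    """Hypothetical local emulation of one ``[~][!]field:pattern`` filter.

    ``record`` is a plain dict mapping field names to string values.
    """
    # A leading "~" switches from regex matching to substring matching.
    substring = filter_spec.startswith('~')
    if substring:
        filter_spec = filter_spec[1:]
    # A "!" before the field name negates the match.
    negate = filter_spec.startswith('!')
    if negate:
        filter_spec = filter_spec[1:]

    field, _, pattern = filter_spec.partition(':')
    value = record[field]
    if substring:
        matched = pattern in value
    else:
        # Regexes must match the entire field value, not just part of it.
        matched = re.fullmatch(pattern, value) is not None
    return matched != negate


def matches_all(filters, record):
    """Accept a single filter string or a list/tuple; all must match (assumed)."""
    if isinstance(filters, str):
        filters = [filters]
    return all(matches_filter(f, record) for f in filters)


# Made-up record for illustration only.
record = {'urlkey': 'gov,nasa)/feature/missing', 'statuscode': '404'}
print(matches_filter('statuscode:4', record))
print(matches_filter('statuscode:.*4.*', record))
print(matches_all(['statuscode:404', 'urlkey:.*feature.*'], record))
```

The first call shows why a partial pattern such as ``'statuscode:4'`` never matches a three-character status code, while the ``.*4.*`` form does.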