Changelog
!!! note
    spatula 1.0 should be ready in a few months and will provide a more stable interface to build upon. Until then, interfaces may change between releases.
0.9.1 - 2024-07-10
- add support for new versions of lxml and Python
0.9.0 - 2022-02-10
- add `Page.accept_response` method that can be overridden to trigger custom retry logic
- add preliminary `spatula.config` for setting/overriding global defaults (this feature is not yet considered stable and will likely be modified before 1.0)
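The `accept_response` hook lets a page reject a response that succeeded at the HTTP level but is not usable, such as a 200 response that only shows a maintenance notice. The following is a minimal, library-free sketch of that general pattern. `FakeResponse`, `BasePage`, `MaintenanceAwarePage`, and `fetch_with_retries` are hypothetical stand-ins rather than spatula classes. The assumption that the hook receives a response and returns a `bool` should be checked against spatula's own documentation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    status_code: int
    text: str


class BasePage:
    """Hypothetical base class: accepts every response by default."""

    def accept_response(self, response: FakeResponse) -> bool:
        return True


class MaintenanceAwarePage(BasePage):
    """Rejects responses that return 200 but only show a maintenance notice."""

    def accept_response(self, response: FakeResponse) -> bool:
        return "down for maintenance" not in response.text


def fetch_with_retries(
    page: BasePage, fetch: Callable[[], FakeResponse], retries: int = 3
) -> FakeResponse:
    """Call fetch() until page.accept_response() approves the result.

    Raises RuntimeError if all 1 + retries attempts are rejected.
    A real scraper would usually also wait between attempts.
    """
    for _ in range(retries + 1):
        response = fetch()
        if page.accept_response(response):
            return response
    raise RuntimeError(f"response rejected after {retries + 1} attempts")


responses = iter([
    FakeResponse(200, "Site down for maintenance"),
    FakeResponse(200, "<html>real data</html>"),
])
print(fetch_with_retries(MaintenanceAwarePage(), lambda: next(responses)).text)
# <html>real data</html>
```

Keeping the acceptance check in an overridable method leaves the retry loop generic. Each page type decides what counts as a usable response.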
0.8.10 - 2022-01-31
- update click dependency
0.8.9 - 2021-12-14
- fix `--rmdir` not recreating the directory
0.8.8 - 2021-12-09
- add `--rmdir` flag to `spatula scrape`
0.8.7 - 2021-11-09
- add support for raising `SkipItem` from a detail page to resume processing without yielding data from the page
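Raising an exception to skip an item means the driver catches it, drops that one item, and carries on with the rest. The following is a minimal standard-library imitation of that control flow. `SkipItem` here is a hypothetical re-creation and is not imported from spatula. `process_detail` and `scrape_all` are illustrative names that do not come from the library.

```python
from typing import Iterable, Iterator


class SkipItem(Exception):
    """Hypothetical re-creation of a 'skip this item' signal."""


def process_detail(record: dict) -> dict:
    """Stand-in for a detail page's processing step."""
    if record.get("status") == "vacant":
        # Nothing useful on this page: skip it instead of yielding partial data.
        raise SkipItem(f"vacant seat: {record['name']}")
    return {"name": record["name"], "party": record["party"]}


def scrape_all(records: Iterable[dict]) -> Iterator[dict]:
    """Yield processed items, skipping any whose processing raises SkipItem."""
    for record in records:
        try:
            yield process_detail(record)
        except SkipItem as exc:
            print(f"skipped: {exc}")


rows = [
    {"name": "Ada", "party": "Independent", "status": "active"},
    {"name": "Seat 4", "party": "", "status": "vacant"},
]
# Logs the skipped seat, then prints a list containing only Ada's record.
print(list(scrape_all(rows)))
```

Using an exception rather than a return value such as `None` lets the detail-page code bail out from any depth without every caller checking for a sentinel.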
0.8.6 - 2021-10-13
- add `timeout` argument to `URL` source
- add `--subpages` argument to `spatula test`, which runs similarly to `spatula scrape` but writes output to the terminal
0.8.5 - 2021-08-09
- add `verify` argument to `URL` source
- improve messaging when using `spatula test`
- add `--dump` flag to `spatula scrape` to control output format
0.8.4 - 2021-07-15
- `self.skip` is deprecated in favor of raising `SkipItem`
- add experimental support for module arguments to the `scrape` command
0.8.3 - 2021-06-23
- fix bug where default headers were cleared by default
- update to scrapelib 2.0.6, which fixes a bug in following redirects
0.8.2 - 2021-06-22
- fix `spatula --version` to report the correct version
- allow `--data` command line flags to override `example_input` values
- add caching of dependencies
- fix pagination on non-list pages
- add advanced documentation & anatomy of a scrape
0.8.1 - 2021-06-17
- remove undocumented `page_to_items` function
- add `Page.do_scrape` to programmatically get all items from a scrape
- add `--source` parameter to scout & scrape commands
0.8.0 - 2021-06-15
- remove undocumented `Workflow`
- allow using `Page` instances (as opposed to just the type) for scout & scrape
- add check for `get_filename` on output classes to override default filename
- improve automatic `pydantic` support
- add `--timeout`, `--no-verify`, `--retries`, `--retry-wait` options
- add `--fastmode` option to use local cache
- fix all CLI commands to obey various scraper options
0.7.1 - 2021-06-14
- remove undocumented default behavior for `get_source_from_input`
- major documentation overhaul
- fixes for scout scrape when working with raw data returns
0.7.0 - 2021-06-04
- add `spatula scout` command
- make error messages clearer
- improvements to documentation
- add more CLI options to control verbosity, user agent, etc.
- if module cannot be found, search current directory
0.6.0 - 2021-04-12
- add full typing to library
- small bugfixes
0.5.0 - 2021-02-04
- add `ExcelListPage`
- improve `Page.logger` and CLI output
- move to simpler `Workflow` class
- `spatula scrape` can now take the name of a page and will use the default `Workflow`
- bugfix: inconsistent name for `process_error_response`
0.4.1 - 2021-02-01
- bugfix: dependencies are instantiated from parent page input
0.4.0 - 2021-02-01
- restore Python 3.7 compatibility
- add behavior to handle returning additional `Page` subclasses to continue scraping
- add default behavior when `Page.input` has a `url` attribute
- add `PdfPage`
- add `page_to_items` helper
- add `Page.example_input` and `Page.example_source` for the test command
- add `Page.logger` for logging
- allow use of `dataclasses` in addition to `attrs` as input objects
- improve output of HTML elements
- bugfix: not specifying a page processor on workflow is no longer an error
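Returning additional `Page` subclasses means a page can hand back further pages to scrape, not only data. For example, a list page can yield one detail page per entry. The following is a minimal, library-free sketch of that idea, assuming a simple breadth-first queue. `Page`, `ListPage`, `DetailPage`, and `run` are hypothetical stand-ins and do not reflect spatula's actual classes or traversal order.

```python
from collections import deque
from typing import Any, Iterator


class Page:
    """Hypothetical minimal page: wraps an input and yields results."""

    def __init__(self, input_: Any) -> None:
        self.input = input_

    def process_page(self) -> Iterator[Any]:
        raise NotImplementedError


class ListPage(Page):
    """Yields one DetailPage per entry instead of yielding data directly."""

    def process_page(self) -> Iterator[Any]:
        for name in self.input:
            yield DetailPage(name)


class DetailPage(Page):
    """Yields a plain dict: a final item, not another page."""

    def process_page(self) -> Iterator[Any]:
        yield {"name": self.input, "length": len(self.input)}


def run(start: Page) -> list:
    """Scrape start and every Page it returns; collect everything else as items."""
    queue = deque([start])
    items = []
    while queue:
        page = queue.popleft()
        for result in page.process_page():
            if isinstance(result, Page):
                queue.append(result)  # another page: scrape it later
            else:
                items.append(result)  # data: keep it
    return items


print(run(ListPage(["alpha", "beta"])))
# [{'name': 'alpha', 'length': 5}, {'name': 'beta', 'length': 4}]
```

Distinguishing results by type keeps the driver generic. Any page can return either data or more pages without the driver knowing the specific page classes.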
0.3.0 - 2021-01-18
- first documented major release