Overview

spatula is a modern Python library for writing maintainable web scrapers.

Features

Page-oriented design: Encourages writing understandable & maintainable scrapers.
Not Just HTML: Provides built-in handlers for common data formats, including CSV, JSON, XML, PDF, and Excel, or lets you write your own.
Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
Flexible Data Model Support: Compatible with dataclasses, attrs, and pydantic; or bring your own data model classes for storing & validating your scraped data.
CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
Fully Typed: Makes full use of Python 3 type annotations.
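To illustrate how the page-oriented design, pluggable format handlers, and a separate data model can fit together, here is a library-independent sketch in plain Python. It is not spatula's API: every class and method name below (RecordPage, JsonRecordPage, CsvRecordPage, StaffItems, process_item, process_page) is hypothetical, and the raw response bodies are passed in as strings so the sketch runs without network access. Only the standard library is used, with json and csv standing in for format handlers and a dataclass as the data model.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Employee:
    # Data model: a plain dataclass; an attrs or pydantic class could fill the same role.
    name: str
    position: str


class RecordPage:
    """Hypothetical base class: one page of content that yields a list of items."""

    def __init__(self, content: str) -> None:
        # Raw response body, passed in directly so the sketch needs no network access.
        self.content = content

    def parse(self) -> list[dict[str, Any]]:
        # Format-specific step, supplied by a subclass.
        raise NotImplementedError

    def process_item(self, item: dict[str, Any]) -> Any:
        # Page-specific step: turn one parsed record into a data model instance.
        raise NotImplementedError

    def process_page(self) -> list[Any]:
        # Parse the whole body, then convert each record into an item.
        return [self.process_item(item) for item in self.parse()]


class JsonRecordPage(RecordPage):
    def parse(self) -> list[dict[str, Any]]:
        return json.loads(self.content)


class CsvRecordPage(RecordPage):
    def parse(self) -> list[dict[str, Any]]:
        # DictReader uses the header row as the keys for each following row.
        return list(csv.DictReader(io.StringIO(self.content)))


class StaffItems:
    # Extraction logic for a hypothetical staff listing, shared by both formats.
    def process_item(self, item: dict[str, Any]) -> Employee:
        return Employee(name=item["name"], position=item["position"])


class StaffJsonPage(StaffItems, JsonRecordPage):
    pass


class StaffCsvPage(StaffItems, CsvRecordPage):
    pass


json_body = '[{"name": "Ada", "position": "Engineer"}]'
csv_body = "name,position\nGrace,Admiral\n"
print(StaffJsonPage(json_body).process_page())  # [Employee(name='Ada', position='Engineer')]
print(StaffCsvPage(csv_body).process_page())  # [Employee(name='Grace', position='Admiral')]
```

In this sketch only the parse step differs between the JSON and CSV pages, while the extraction logic and the data model are shared; keeping those concerns in separate, small classes is the kind of structure a page-oriented design encourages.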