Skip to content

A modern Python library for writing maintainable web scrapers.

License

Notifications You must be signed in to change notification settings

openstates/spatula

 
 

Repository files navigation

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Source: https://github.com/jamesturk/spatula

Documentation: https://jamesturk.github.io/spatula/

Issues: https://github.com/jamesturk/spatula/issues

PyPI badge Test badge

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that can help streamline development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.

About

A modern Python library for writing maintainable web scrapers.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%