The purpose of this API is to expose all the data that was available to scrape from https://www.urparts.com/index.cfm/page/catalogue/.
We're using `uv` to manage our dependencies.
There is a Makefile with a couple of handy recipes:

- `make format` reformats the whole codebase using `ruff`;
- `make check` runs static code analysis using `ruff` & `mypy`;
- `make test` runs the test suite for the whole application;
- `make build` builds a Docker image for the application;
- `make migrate` applies `alembic` migrations to the database;
- `make scrape` runs the data scraping process;
- `make up` runs the API;
- `make down` stops all Docker containers.
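A typical end-to-end local run might chain these recipes in the following order (a hypothetical sequence, assuming Docker is running and the database starts empty):

```
make build     # build the Docker image
make migrate   # bring the database schema up to date
make scrape    # populate the database from urparts.com
make up        # serve the API on localhost:8080
```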
We're using `aiohttp` to request all the pages in the parts catalogue, and then `BeautifulSoup4` to parse the data of interest.
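To make the fetch-then-parse flow concrete, here is a minimal sketch. The HTML snippet, the `allmakes` CSS selector, and the function names are illustrative assumptions, not the project's actual code or the site's actual markup:

```python
import aiohttp
from bs4 import BeautifulSoup

# Hypothetical sample of a catalogue page; the real markup may differ.
SAMPLE_HTML = """
<div class="c_container allmakes">
  <ul>
    <li><a href="index.cfm/page/catalogue/Ammann">Ammann</a></li>
    <li><a href="index.cfm/page/catalogue/Case">Case</a></li>
  </ul>
</div>
"""


def parse_manufacturers(html: str) -> list[str]:
    """Extract manufacturer names from a catalogue page."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select("div.allmakes a")]


async def fetch_page(session: aiohttp.ClientSession, url: str) -> str:
    """Download one catalogue page as text."""
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.text()


manufacturers = parse_manufacturers(SAMPLE_HTML)
```

In the real scraper the same pattern repeats one level down for categories, models and parts.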
Everything runs in an asynchronous context, although not every possible step is parallelized.
Manufacturers are scraped sequentially (one by one), while the categories, models & parts are fetched
in parallel.
A semaphore is used to limit concurrency and avoid hammering the website (it was raising timeouts otherwise).
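The semaphore pattern described above can be sketched as follows. The limit of 3 and the simulated fetch are illustrative, not the project's actual values:

```python
import asyncio

CONCURRENCY_LIMIT = 3  # illustrative; the real limit is tuned to the site

peak = 0    # highest number of simultaneously running fetches observed
active = 0  # fetches currently inside the semaphore


async def fetch_one(semaphore: asyncio.Semaphore, item: int) -> int:
    global peak, active
    async with semaphore:          # blocks while LIMIT fetches are in flight
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for the real HTTP request
        active -= 1
        return item


async def scrape_all(n: int) -> list[int]:
    # Create the semaphore inside the running event loop and share it.
    semaphore = asyncio.Semaphore(CONCURRENCY_LIMIT)
    return list(await asyncio.gather(*(fetch_one(semaphore, i) for i in range(n))))


results = asyncio.run(scrape_all(10))
```

`asyncio.gather` preserves input order, so results come back sorted even though the fetches overlap.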
A full scrape takes between 6 and 7 minutes for 4.4M parts (and far fewer manufacturers, categories & models) on a 2021 M1 Pro, i.e. roughly 11,000 parts per second.
The task is now largely CPU-bound, so it will consume a lot of CPU; thanks to the parallelism it is no longer dominated by I/O waits.
Once the application is running, the OpenAPI specification can be found at http://localhost:8080/docs.
The overall structure of the repository & API is based on domains. Each domain has its own category & directory.
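A hypothetical layout for such a domain-based structure might look like this (the actual directory names in the repository may differ):

```
app/
├── manufacturers/   # models, repository & routes for manufacturers
├── categories/
├── models/
├── parts/
└── scraper/         # aiohttp + BeautifulSoup scraping code
```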
To keep the solution simple, a service/controller layer wasn't introduced.
For demonstration purposes, only one test is written.
It will fail if there are already any categories in the database (there's no schema separation).