Work In Progress

Scraping Pokémon Database using the Playwright library combined with asynchronous Python.

Todo

Poetry Bash scripts

Think of something elegant for the following:

First run
- poetry install
- poetry run playwright install firefox
Scrape
- poetry run scrape
Query
- poetry run query

Docstrings

Write docstrings for all classes, functions, etc.

Logging

Add logging calls to all operations to log in the console, and logging dumps.

Pokédex detail page: `feature-scraper`

Example Bulbasaur

Data dumps: `feature-scraper`

Find a way to properly tie the data together.
- Such that everything is properly grouped by e.g. Pokémon.

NOSQL database: `feature-db-nosql`

Create document table for Pokémon details.
- Insert data into it from the coroutines.
Create document tables for other document objects.

Relational database: `feature-db-sql`

Set up initial data models.
Finish and validate the data models.
Create CRUD methods in the classes.
Insert data into the models from the coroutines.
Optimize the models
- Convert to proper data types, instead of using TextField.
- Set appropriate contrains (null, unique, etc.)
- Create appropriate indices for the tables.

Bugs

Pokédex detail page: `feature-scraper`

Example Pikachu

Data below the table Base stats does not get scraped.
- This happens when the table does not occur in the expected nth position.
- Suggestion: Find tables in relation to position of the header (e.g. Base stats), in order to properly determine its location.

Concurrent Pokémon details: `feature-concurrent-details`

This branch still needs major work done.

Fix timeout issue
- Currently, the asyncio.gather() call for the Pokémon detail concurrent batch scraping causes a timeout.

Name		Name	Last commit message	Last commit date
Latest commit History 121 Commits
docs		docs
scraping_pokemon		scraping_pokemon
.gitignore		.gitignore
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
query.sh		query.sh
scrape.sh		scrape.sh
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Work In Progress

Todo

Poetry Bash scripts

Docstrings

Logging

Pokédex detail page: `feature-scraper`

Data dumps: `feature-scraper`

NOSQL database: `feature-db-nosql`

Relational database: `feature-db-sql`

Bugs

Pokédex detail page: `feature-scraper`

Concurrent Pokémon details: `feature-concurrent-details`

About

Uh oh!

Uh oh!

Languages

LPvdT/scraping-pokemon

Folders and files

Latest commit

History

Repository files navigation

Work In Progress

Todo

Poetry Bash scripts

Docstrings

Logging

Pokédex detail page: feature-scraper

Data dumps: feature-scraper

NOSQL database: feature-db-nosql

Relational database: feature-db-sql

Bugs

Pokédex detail page: feature-scraper

Concurrent Pokémon details: feature-concurrent-details

About

Resources

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages

Pokédex detail page: `feature-scraper`

Data dumps: `feature-scraper`

NOSQL database: `feature-db-nosql`

Relational database: `feature-db-sql`

Pokédex detail page: `feature-scraper`

Concurrent Pokémon details: `feature-concurrent-details`