
Data imports #5588

Merged: 17 commits merged on Sep 4, 2021


@mekarpeles (Member) commented Aug 28, 2021

This is a follow-up to #5583, which updates the schema for the `import_item` table; #5583 should be merged first.

Closes the first half of #5130.

Allows `manage-imports.py` to add `{"data": {...}}` records, in addition to `{"ia_id": "..."}` records, to the https://openlibrary.org/admin/imports pipeline.
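To illustrate the two record shapes the pipeline now accepts, here is a minimal sketch. The `ImportRecord` class, its validation rules, and the `kind` property are hypothetical, written for illustration only; they are not the actual `manage-imports.py` code or Open Library's import model.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ImportRecord:
    """Hypothetical queue entry: exactly one of ia_id or data is set."""
    ia_id: Optional[str] = None
    data: Optional[Dict[str, Any]] = None

    @classmethod
    def from_json(cls, line: str) -> "ImportRecord":
        obj = json.loads(line)
        has_ia, has_data = "ia_id" in obj, "data" in obj
        if has_ia == has_data:
            # Reject records with both keys or with neither key.
            raise ValueError('expected exactly one of "ia_id" or "data"')
        if has_ia:
            return cls(ia_id=str(obj["ia_id"]))
        if not isinstance(obj["data"], dict):
            raise ValueError('"data" must be a JSON object')
        return cls(data=obj["data"])

    @property
    def kind(self) -> str:
        # IA records point at an archive.org identifier; data records
        # carry the book metadata inline.
        return "ia" if self.ia_id is not None else "data"


# Usage: both shapes parse into the same (hypothetical) queue type.
ia_rec = ImportRecord.from_json('{"ia_id": "exampleidentifier"}')
data_rec = ImportRecord.from_json('{"data": {"title": "Example Title"}}')
print(ia_rec.kind, data_rec.kind)
```

Keeping both shapes in one type lets a single queue and worker serve IA and data-based imports, which is the general idea this PR describes; how the real code models it may differ.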

⚠️ Please squash

Testing

tty1

```bash
ssh -A ol-home0.us.archive.org
# Monitor the ImportBot container (running the old code) to make sure it's not
# the one attempting to import records. We want to manually start another
# `import-all` service from the container right after we enqueue records,
# so we can be confident that our new code is performing the imports:
sudo docker logs -f --tail 100 openlibrary_importbot_1
```

tty2

```
# Connect to the db
ssh -A ol-db1.us.archive.org
sudo -upostgres psql
-- (inside psql) Delete old records you want to re-import, e.g.:
delete from import_item where data is NOT NULL;
-- Confirm only the records you expect are in the db
select * from import_item where data is NOT NULL;
```
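The `data is NOT NULL` filter is what scopes the delete to data-based records and leaves IA records alone. The sketch below demonstrates that with Python's built-in `sqlite3` as a stand-in for Postgres; the simplified table layout (`id`, `ia_id`, `data`, `status`) and the assumption that data records have a NULL `ia_id` are for illustration only, not the real `import_item` schema from #5583.

```python
import json
import sqlite3

# Hypothetical, simplified stand-in for the import_item table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE import_item ("
    " id INTEGER PRIMARY KEY,"
    " ia_id TEXT,"
    " data TEXT,"  # JSON payload for data-based records, else NULL
    " status TEXT DEFAULT 'pending')"
)
conn.executemany(
    "INSERT INTO import_item (ia_id, data) VALUES (?, ?)",
    [
        ("someiaidentifier", None),                      # IA-style record
        (None, json.dumps({"title": "Example Title"})),  # data-style record
    ],
)

# Same filters as the psql session: clear out data records, then confirm.
conn.execute("DELETE FROM import_item WHERE data IS NOT NULL")
remaining_data = conn.execute(
    "SELECT COUNT(*) FROM import_item WHERE data IS NOT NULL"
).fetchone()[0]
remaining_ia = conn.execute(
    "SELECT COUNT(*) FROM import_item WHERE ia_id IS NOT NULL"
).fetchone()[0]
print(remaining_data, remaining_ia)
```

After the delete, no data-bearing rows remain while the IA row is untouched, which is why this cleanup is safe to run against a queue that also holds IA imports.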

tty3

```bash
ssh -A ol-home0.us.archive.org
# Connect to the cron container
sudo docker exec -uroot -it openlibrary_cron-jobs_1 bash
# Remove the state file, which records our progress processing & queuing data
rm /1/var/tmp/imports/2021-08/Bibliographic/2021-08-04/import.log
# Re-run the test.py script to queue up a test batch of imports
python3 /olsystem/bin/bwb_etl/test.py /1/var/tmp/imports/2021-08/Bibliographic/*/bettworldbks*
```
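Removing `import.log` matters because the ETL keeps a state file of what it has already processed and skips those inputs on later runs. Below is a minimal sketch of that state-file pattern; the `process_new` function, the one-name-per-line log format, and the `file_a`/`file_b` inputs are assumptions for illustration, not how `test.py` is actually implemented.

```python
import os
import tempfile
from typing import Callable, Iterable, List


def process_new(inputs: Iterable[str], statefile: str,
                handle: Callable[[str], None]) -> List[str]:
    """Hypothetical: handle each input not yet in statefile, then record it."""
    done = set()
    if os.path.exists(statefile):
        with open(statefile) as f:
            done = {line.strip() for line in f if line.strip()}
    processed = []
    with open(statefile, "a") as log:
        for name in inputs:
            if name in done:
                continue  # already queued on an earlier run
            handle(name)
            log.write(name + "\n")  # persist progress after each item
            log.flush()
            processed.append(name)
    return processed


# Usage: a second run is a no-op until the state file is removed.
with tempfile.TemporaryDirectory() as tmp:
    state = os.path.join(tmp, "import.log")
    files = ["file_a", "file_b"]
    first = process_new(files, state, handle=lambda n: None)
    second = process_new(files, state, handle=lambda n: None)
    os.remove(state)  # the equivalent of `rm .../import.log`
    third = process_new(files, state, handle=lambda n: None)
    print(first, second, third)
```

Writing progress after each item means an interrupted run resumes where it stopped; the flip side is that re-queuing a batch for testing requires deleting the state file first.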

tty4

```bash
ssh -A ol-home0.us.archive.org
# Connect to the ImportBot container
sudo docker exec -it -uroot openlibrary_importbot_1 bash
# Run the continuous import process
scripts/manage-imports.py --config /olsystem/etc/openlibrary.yml import-all
```
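`import-all` runs continuously, repeatedly pulling queued items and importing them. The loop below is a hedged sketch of such a polling worker; the `fetch_pending` and `import_one` callables, the batch size, the poll interval, the `"imported"` status string, and the `max_idle_polls` test hook are all hypothetical and are not the actual `manage-imports.py` internals.

```python
import time
from typing import Callable, List, Optional


def import_all(fetch_pending: Callable[[int], List[dict]],
               import_one: Callable[[dict], str],
               batch_size: int = 100,
               poll_seconds: float = 60.0,
               max_idle_polls: Optional[int] = None) -> int:
    """Hypothetical polling worker; returns the number of items imported.

    max_idle_polls stops the loop after that many consecutive empty polls
    (handy for tests); None means run forever, like a long-lived service.
    """
    imported = 0
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        batch = fetch_pending(batch_size)
        if not batch:
            idle += 1
            time.sleep(poll_seconds)  # nothing queued; wait before polling
            continue
        idle = 0
        for item in batch:
            if import_one(item) == "imported":
                imported += 1
    return imported


# Usage with an in-memory queue holding one IA and one data record.
queue = [{"ia_id": "exampleidentifier"}, {"data": {"title": "Example Title"}}]


def fetch_pending(n: int) -> List[dict]:
    batch = queue[:n]
    del queue[:n]  # simulate marking items as taken
    return batch


count = import_all(fetch_pending, lambda item: "imported", batch_size=1,
                   poll_seconds=0, max_idle_polls=1)
print(count)
```

Because a worker like this keeps draining the queue, running a fresh one from the container (while watching the old ImportBot's logs in tty1) is a way to tell which code actually performed the imports.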

Go back to tty2

```
-- Re-run this command after you've enqueued items using test.py
select * from import_item where data is NOT NULL;
```

Stakeholders

@mekarpeles marked this pull request as ready for review September 3, 2021 15:31
@cdrini (Collaborator) left a comment


Mek's en route with the changes. Code LGTM after these. We tested the import flow with BWB, and it looks like it's running smoothly!

Might want to test that IA imports are still running smoothly too.

Merge at your discretion.


@mekarpeles merged commit cf7d35a into master Sep 4, 2021
@mekarpeles deleted the data-imports branch September 4, 2021 00:25