Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check Kingfisher Collect log file #44

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Check Kingfisher Collect log file #44

wants to merge 3 commits into from

Conversation

duncandewhurst
Copy link
Contributor

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@duncandewhurst
Copy link
Contributor Author

@jpmckinney anything else I should be using from scrapyloganalyzer here? Currently, I'm just using ScrapyLogFile.logparser.

@jpmckinney
Copy link
Member

jpmckinney commented Apr 19, 2022

You probably want to use:

Note that error_rate is just the ratio of FileError items to File + FileError items.

I don't think you need to use logparser itself.

@duncandewhurst
Copy link
Contributor Author

From open-contracting/kingfisher-collect#917 (comment):

Kingfisher Collect (unless there's a bug in a spider) always yields a FileError item if any URL ultimately fails (retries don't yield, but intermediary URLs do). Kingfisher Process stores FileErrors in the DB. So, they should have appeared in the DB.

As such, let's hold fire on this PR until the new version of Kingfisher Process has been deployed, after which we'll be able to get the log URL from the database. Otherwise, I don't think looking at error_rate and item_counts for 'File' and 'FileError' adds anything over the existing step of looking in collection_file.

@jpmckinney
Copy link
Member

This PR's process is more robust, but Process should be storing the errors, yes. It's possible that the old version of the spider wasn't yielding FileErrors correctly. The current version does.

@jpmckinney
Copy link
Member

jpmckinney commented Jul 4, 2023

Noting that the current file was deleted, but this PR could still be useful. Nevermind, the file is restored.

@jpmckinney
Copy link
Member

I reverted the pre-commit commits, to avoid merge conflicts, until this PR is ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants