Check Kingfisher Collect log file #44

duncandewhurst · 2022-04-19T21:36:11Z

Related to open-contracting/kingfisher-collect#917 (comment)

review-notebook-app · 2022-04-19T21:36:15Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

duncandewhurst · 2022-04-19T21:44:46Z

@jpmckinney anything else I should be using from scrapyloganalyzer here? Currently, I'm just using ScrapyLogFile.logparser.

jpmckinney · 2022-04-19T21:58:12Z

You probably want to use:

is_finished()
is_complete()
error_rate
item_counts for 'File' and 'FileError'

Note that error_rate is just the ratio of FileError items to File + FileError items.
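To make the ratio concrete, here is a minimal standalone sketch of that calculation. It does not call scrapyloganalyzer; the function name `file_error_rate`, the dict shape, and the zero-item fallback are assumptions for illustration only, and the library's own behaviour may differ.

```python
def file_error_rate(item_counts):
    """Return FileError / (File + FileError) for a dict of item counts.

    Hypothetical helper: returns 0.0 when neither item type was counted,
    which is an assumption and not necessarily what the library does.
    """
    errors = item_counts.get("FileError", 0)
    files = item_counts.get("File", 0)
    total = files + errors
    return errors / total if total else 0.0


# Example counts (made up for illustration).
counts = {"File": 95, "FileError": 5}
rate = file_error_rate(counts)
print(f"{rate:.1%} of File/FileError items were errors")
```

So looking at `error_rate` tells you nothing that the two item counts don't already, but it is a convenient single number to threshold on.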

I don't think you need to use logparser itself.
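A rough sketch of how the notebook check could combine these, assuming the log object exposes `is_finished()` and `is_complete()` as methods and `error_rate` and `item_counts` as attributes (as the list above suggests). The helper `check_crawl`, its `max_error_rate` parameter, and the stub object are hypothetical; a real `ScrapyLogFile` instance would be passed in place of the stub.

```python
from types import SimpleNamespace


def check_crawl(log, max_error_rate=0.0):
    """Return a list of human-readable problems found for a crawl log.

    `log` is any object with is_finished(), is_complete(), error_rate
    and item_counts (duck-typed; assumed to match the names in this thread).
    """
    problems = []
    if not log.is_finished():
        problems.append("is_finished() returned False")
    if not log.is_complete():
        problems.append("is_complete() returned False")
    if log.error_rate > max_error_rate:
        errors = log.item_counts.get("FileError", 0)
        files = log.item_counts.get("File", 0)
        problems.append(
            f"error_rate is {log.error_rate:.1%} "
            f"({errors} FileError vs {files} File items)"
        )
    return problems


# Stand-in for a real log object, with made-up values.
stub_log = SimpleNamespace(
    is_finished=lambda: True,
    is_complete=lambda: False,
    error_rate=0.02,
    item_counts={"File": 98, "FileError": 2},
)

for problem in check_crawl(stub_log):
    print("WARNING:", problem)
```

Returning a list rather than raising keeps the notebook cell usable as a report: an empty list means nothing was flagged.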

duncandewhurst · 2022-04-21T23:19:38Z

From open-contracting/kingfisher-collect#917 (comment):

Kingfisher Collect (unless there's a bug in a spider) always yields a FileError item if any URL ultimately fails (retries don't yield, but intermediary URLs do). Kingfisher Process stores FileErrors in the DB. So, they should have appeared in the DB.

As such, let's hold fire on this PR until the new version of Kingfisher Process has been deployed, after which we'll be able to get the log URL from the database. Otherwise, I don't think looking at error_rate and item_counts for 'File' and 'FileError' adds anything over the existing step of looking in collection_file.

jpmckinney · 2022-04-21T23:24:11Z

This PR's process is more robust, but Process should be storing the errors, yes. It's possible that the old version of the spider wasn't yielding FileErrors correctly. The current version does.

jpmckinney · 2023-07-04T16:28:11Z

~~Noting that the current file was deleted, but this PR could still be useful.~~ Never mind, the file is restored.

jpmckinney · 2023-07-04T19:45:53Z

I reverted the pre-commit commits, to avoid merge conflicts, until this PR is ready.

duncandewhurst added 2 commits April 20, 2022 09:33

setup_environment: collect log file url and install packages

5e6259a

check_for_errors: check scrapy log file

231df67

jpmckinney force-pushed the log-file branch from 389e070 to fef94be Compare July 4, 2023 19:44

setup_environment: import getpass before use, correct syntax error

dd5fb49

jpmckinney force-pushed the log-file branch from fef94be to dd5fb49 Compare July 4, 2023 19:45
