-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update kingfisher process URLs and use metadata endpoint #343
Conversation
…variables names and log/error messages.
Ah, because of my refactoring we'll also have to do #345 as part of this. |
I have to switch tasks for a bit, so let me know if you want to do that @yolile. |
Sure, If you have pushed all your changes I can take it from here. |
- Collect: Don't try to request a null URL if scrapy_log is not set - Collect: Don't set process_id if not found in crawl log (task fails later, in any case) - Process: Task should not start if process_id is not set - Pelican: pelican_dataset_name should not be set if spider or process_data_version are not set - Pelican: If "id" is in the response, it is guaranteed to not be None Cleanup: - Collect: Clarify variables names - Visually group as: request preparation, request, request parsing, response usage - Don't use "json" as a variable name, as it can shadow the json library (if used) - Rewrite code to avoid potential refactoring errors, like: x = mydict.get("x") if not x: # Oops! Should have set `x = calculate()` and then `mydict["x"] = x`. mydict["x"] = calculate() return x
2e18bae
to
2f3b057
Compare
All pushed now |
Also check for procces_id_pelican and process_data_version before wiping, as they are not set if Kingfisher Collect failed
…n in dependency injections style, without any actual injection). Extract scrapyd_url method.
…run() - Save the pelican_dataset_name to the job context only if the API request succeeded - pelican_dataset_name is guaranteed to be set by the time get_status() runs - Check whether run() was called before attempting to wipe() - Merge get_pelican_dataset_name() into __init__(), for readability
… docs: Update TaskManager class. Add comments about where job context fields are set.
…lection_availability() into get_status().
… task manager with a task. Clarify variable names.
This also sets note="" when a job completes successfully. This handles another case like #168 (i.e. the task recovers in a completed status, not a waiting/running status).
f5f7241
to
96baf3d
Compare
…r Process has started
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phew, should be easier to make changes to this next time
What a refactoring! The code looks more consistent and easier to maintain indeed! |
ref #291
closes #345