I've been working on a Rust version of OCDS Merge. If it performs well, then we could maybe change our analysis process to:
Run OCDS Merge directly on a Scrapy data directory. If an analyst wants to use data before the crawl is complete, that's fine: OCDS Merge can run on the files that are already closed (e.g. as reported by lsof). Note that merging should be done before upgrading.
We don't need to upgrade many collections. If there is one that is too large to complete quickly with ocdskit, I can write a Rust version.
Add a SQL loader command (remember to replace control codes), so that Kingfisher Summarize can still work. We can probably simplify Summarize, as I'm not sure how frequently we analyze release/record collections.
We might prefer analysts to load data into separate tables (e.g. into a schema under their own name). This makes it very easy to clean up old data. That'll require changes to Summarize (mostly release_*.sql and JOIN data).
libcoveocds is too slow to run in sequence on an entire dataset. We can instead run a fast JSON Schema validator on the full dataset, and run libcoveocds' other checks only on a sample.
For all the above, the instructions in the documentation for data support managers should redirect output to files, for easier review of warnings. The instructions could maybe be organized into a Makefile.
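As a rough illustration of the "replace control codes" step for the SQL loader, here is a minimal Python sketch. It assumes the control code in question is the NUL character (`\u0000`), which PostgreSQL's `jsonb` type can't store, and that the data is line-delimited JSON; both are assumptions, and the function names `strip_nul` and `clean_line` are hypothetical, not part of any existing tool.

```python
import json


def strip_nul(value):
    """Recursively remove NUL characters from every string in a parsed JSON value.

    Assumption: NUL (\u0000) is the only control code that needs handling.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        # Object keys are strings too, so clean them as well as the values.
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    # Numbers, booleans and null are returned unchanged.
    return value


def clean_line(line):
    """Parse one line of line-delimited JSON and re-serialize it without NULs."""
    return json.dumps(strip_nul(json.loads(line)))
```

A loader could pass each line through `clean_line` before inserting it. Parsing and re-serializing is slower than a plain text replacement, but it avoids accidentally touching an escaped backslash followed by `u0000`.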
Edit: Moving comment from #402 (comment)
If the above changes don't go ahead, then work on https://github.com/open-contracting/kingfisher-process/milestone/7
Note: There is a similar issue for the registry at open-contracting/data-registry#292