Data Review Tool rewrite #223

jpmckinney · 2024-10-21T20:27:07Z

Create new repositories for non-CoVE version, e.g. plover and plover_web (still Django)

Web frontend (templates and text)

(Quick fix) Adjust the header and footer to match the Data Registry or Spoonbill
Update error messages
Implement new designs and requirements (folder)

Web backend

Use Celery (with Redis queue) for background tasks
Implement async upload and async processing
Don't use the filesystem as the cache
- Once we cache validation results, we no longer need to write/cache the metatab, conversion warnings, extended schema, cell_source_map, heading_source_map.
- See Unflatten in Kingfisher Collect to get unflatten results from temporary directory.
- We do want to write the original file and the converted file to the media directory, as it is helpful to users and analysts to download the file (especially if they did not upload it, lost track of it, or if the data at the URL has changed).
Use Django page cache (with Redis, since it's already installed)
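
A minimal sketch of the "don't use the filesystem as the cache" idea above, using only the standard library. Conversion runs in a temporary directory, so intermediate files are discarded, and only the original and converted files are copied to the media directory. The names `process_upload` and `convert`, and the directory layout, are hypothetical. This is not the Unflatten code from Kingfisher Collect.

```python
import shutil
import tempfile
from pathlib import Path


def process_upload(original: Path, media_dir: Path, convert) -> dict:
    """Keep the original and converted files; discard everything else.

    `convert(source, workdir)` is a hypothetical callable that may write any
    intermediate files to `workdir` and returns the converted file's path.
    Assumes `original` is not already inside `media_dir`.
    """
    media_dir.mkdir(parents=True, exist_ok=True)

    # Keep the original, so that users and analysts can download it later.
    kept_original = media_dir / original.name
    shutil.copy2(original, kept_original)

    with tempfile.TemporaryDirectory() as workdir:
        converted = Path(convert(original, Path(workdir)))
        # Copy the converted file out before the directory is deleted.
        kept_converted = media_dir / converted.name
        shutil.copy2(converted, kept_converted)
    # The temporary directory and its intermediate files no longer exist here.

    return {"original": kept_original, "converted": kept_converted}
```

In a Celery task, the same function could be called from the task body, with the validation results going to the cache and not to disk.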

Library (libcoveocds)

Extract relevant logic from lib-cove (only common.py is very relevant)
Simplify and refactor the extracted code
Stop writing to validation_errors-3.json and remove corresponding logic from web backend
Remove keys from output/context that are unused (check what other projects read this library's output)
~~Remove JSON serializing of errors (originates in lib-cove)~~ [This is needed to aggregate similar errors]
Try switching to jsonschema-rs (see "Try substituting jsonschema-rs for jsonschema", lib-cove-ocds#123)
Ask ODS about dropping AGPL in our code

Learning

While everything is fresh, read the latest JSON Schema specification to see if anything can be simplified by adopting newer versions

jpmckinney · 2024-10-22T14:53:13Z

Ideas for new checks, from internal discussions, that better address what data support managers find useful in the current Key Field Information:

“who bought what from whom, for how much, when and how”

Unlike other checks, it is useful to report on the details of these checks even when they pass.

It is also useful to report (similar to Pelican):

number of contracting processes
stages covered
date ranges: at least release date and tender period, but ideally also awards' date and contracts' period

Plus, if possible (Slack):

tag counts
party role counts

An important design caveat is that users are not uploading full datasets or representative samples. The checks need to make sense even for a sample. We'll need to word any messages carefully. e.g. "the sample doesn't contain awards" not "your dataset doesn't contain awards" or something.
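
A rough sketch of what such a summary could look like, assuming the input is a release package already parsed into a dict. The function name `summarize_sample`, the output keys and the message wording are all hypothetical. Only release dates are covered here; tender, award and contract dates would follow the same pattern.

```python
from collections import Counter

# Top-level release sections treated as "stages" in this sketch.
STAGES = ("planning", "tender", "awards", "contracts")


def summarize_sample(package):
    """Summarize a release package (a dict with a "releases" list)."""
    releases = package.get("releases", [])

    # One contracting process per distinct ocid.
    ocids = {r["ocid"] for r in releases if "ocid" in r}
    # A stage is covered if any release has a non-empty section for it.
    stages = [s for s in STAGES if any(r.get(s) for r in releases)]
    # String sorting assumes ISO 8601 dates that share one UTC offset.
    dates = sorted(r["date"] for r in releases if r.get("date"))
    tags = Counter(tag for r in releases for tag in r.get("tag", []))
    roles = Counter(
        role
        for r in releases
        for party in r.get("parties", [])
        for role in party.get("roles", [])
    )

    return {
        "contracting_processes": len(ocids),
        "stages": stages,
        "release_date_range": (dates[0], dates[-1]) if dates else None,
        "tag_counts": dict(tags),
        "party_role_counts": dict(roles),
        # Messages refer to "the sample", never to "your dataset".
        "messages": [
            f"The sample doesn't contain any {s} data."
            for s in STAGES
            if s not in stages
        ],
    }
```

The counts are reported whether or not anything is missing, in line with the point that these details are useful even when the checks pass.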

In general, along the lines of the earlier user research, we need the DRT to be useful, interpretable and actionable for OCDS implementers. If something is needed for data support, we might prefer to implement it as a notebook. (Of course, it is more convenient for team members to not load another tab.)

jpmckinney · 2024-12-05T13:50:28Z

See also open-contracting/ocds-extensions#128 about ensuring that oneOf reports subschema errors correctly, for more than just the oneOf used for embedded vs linked releases.

jpmckinney · 2024-12-21T23:45:50Z

I am using the jsonschema library wherever possible, instead of writing separate checks. Storing old code for checks here, in case useful in future:

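import re

# An ocid is expected to start with "ocds-" followed by six lowercase letters or digits.
OCID_PREFIX_RE = re.compile(r"^ocds-[a-z0-9]{6}")


def ocid_prefix_format(data_paths):
    # As inferred from usage: data_paths maps a schema path (a tuple of keys) to a
    # dict, which maps each full path in the data (a tuple) to the value found there.
    values = [
        # Report the offending value and its location as a "/"-joined path.
        (value, "/".join(map(str, full_path)))
        for path in (
            ("releases", "ocid"),
            ("records", "ocid"),
            ("records", "releases", "ocid"),
            ("records", "compiledRelease", "ocid"),
        )
        if (full_paths := data_paths.get(path))
        for full_path, value in full_paths.items()
        # Non-string values are skipped here.
        if isinstance(value, str) and not OCID_PREFIX_RE.match(value)
    ]

    if values:
        return {"conformance_errors": {"ocds_prefixes_bad_format": values}}
    return {}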

jpmckinney mentioned this issue Oct 31, 2024

Pelican integration open-contracting/kingfisher-process#441

Open

duncandewhurst mentioned this issue Dec 5, 2024

Lots: Error when maximumLotsBidPerSupplier is set to infinity (1e9999) open-contracting/ocds-extensions#128

Open

jpmckinney mentioned this issue Dec 17, 2024

Add links to the docs in OCDS schema validation messages #103

Closed
