Create a high level diagram of the data validation process #166

Open
jwestw opened this issue Jan 9, 2025 · 2 comments
Labels
data_validator Tasks for the RDSA data validator documentation Improvements or additions to documentation

Comments

@jwestw
Contributor

jwestw commented Jan 9, 2025

Task: Create the diagram and insert it into the correct place (maybe creating a new header) in docs/validation_schema_toml.md

The data validator, its use of schemas, and the TOML schema validator, which validates those schemas and has its own config file (also a TOML file), are potentially a bit confusing. Create a diagram to show the validation process at a high level and at what points

Make a process diagram for the validation of a data source "example_survey_results.csv"

Process

  • the schema, example_survey_results_schema.toml is loaded
  • the config config_validator_config.toml is loaded
  • toml_schema_validator.py validates the schema
    • errors are logged
    • pipeline can optionally be stopped
  • data_validation uses the validated schema to create a suite of "Expectations"
  • data_validation loads the data from "example_survey_results.csv"
  • example_survey_results_df is created
  • the dataframe is validated against the expectation suite
  • a validation report is produced and the pipeline is optionally stopped

Microsoft Whiteboard can be used to create the diagram

This can create a key/legend like this:

image

Feel free to copy from here: https://officenationalstatistics-my.sharepoint.com/:wb:/g/personal/alex_westwood_ons_gov_uk/EaotJkUNLjZMh5zdnEII7SkBZqvxkvLaN8NHktv7G-Rmug?e=okCDkD

@jwestw jwestw added documentation Improvements or additions to documentation data_validator Tasks for the RDSA data validator labels Jan 9, 2025
@jwestw
Contributor Author

jwestw commented Jan 10, 2025

Steps for the developer:

  • Create branch create_high_level_diagram
  • Create diagram in Whiteboard
  • Save diagram in folder docs/img (create folder img)
  • Insert diagram into docs/validation_schema_toml.md file so that image displays (link to image file will be relative)
  • commit and push your work to create_high_level_diagram
  • check that it is viewable and working in GitHub via the browser
  • create a pull request
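
Before pushing, the relative image link can also be checked locally. The helper below is a hypothetical sketch, not part of the repository: it scans a Markdown file for image links and reports any relative targets that do not exist on disk. It does not replace checking the rendered page in the browser.

```python
import re
from pathlib import Path

# Matches the target of a Markdown image link: ![alt text](target)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
# Matches targets that start with a URL scheme such as https:
URL_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def missing_relative_images(markdown_path):
    """Return relative image targets in the file that are missing on disk."""
    markdown_path = Path(markdown_path)
    text = markdown_path.read_text(encoding="utf-8")
    missing = []
    for target in IMAGE_LINK.findall(text):
        if URL_SCHEME.match(target):
            continue  # absolute URLs are not checked
        # Relative links resolve against the folder holding the Markdown file
        if not (markdown_path.parent / target).exists():
            missing.append(target)
    return missing
```

For example, `missing_relative_images("docs/validation_schema_toml.md")` returns an empty list when every relative image under docs/img resolves.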

@shauncommee
Collaborator

  • Change input csv and input list to one key
  • Change it so that config and schema are separate from the CSV
  • Create data object validated schema after validation process
  • data object from validator is an error log of the toml_schema_validator
  • put the module (uses the schema to validate the data)
  • Change to continue pipeline or stop pipeline at the end
  • Implement changes to change log and don't refer to first person
