
output_checker


output_checker is a tool that allows analysts to check whether their code and outputs are safe to export from a secure environment.

End Goal

  • Check a given repository to ensure it is safe to remove from a secure environment
    • flag any large files
      • function to calculate file size
    • flag any long files
      • function to calculate file length (both the size and length checks are sketched after this list)
    • flag any CSVs, or other known data files
    • flag any images, artifacts etc.
    • scan files for hardcoded data structures, e.g. embedded data, tokens
    • scan files for "entities" e.g. names, or other identifiers
    • scan notebooks to ensure they have had outputs cleared
  • Check "outputs" to ensure they are ok to remove from a secure environment
    • make sure "data" outputs, e.g. CSV/parquet, etc. follow disclosure control rules
    • scan output files for entities
    • flag any long files
    • flag any large files
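
The size and length checks themselves can be very small. A minimal sketch, with hypothetical names and thresholds rather than the repo's actual API:

    import os

    # Illustrative thresholds; real limits would be set by project policy.
    MAX_SIZE_BYTES = 1_000_000    # flag files larger than ~1 MB
    MAX_LENGTH_LINES = 500        # flag files longer than 500 lines

    def file_size(path: str) -> int:
        """Return the size of a file in bytes."""
        return os.path.getsize(path)

    def file_length(path: str) -> int:
        """Return the number of lines in a text file."""
        with open(path, encoding="utf-8", errors="ignore") as f:
            return sum(1 for _ in f)

    def is_too_large(path: str) -> bool:
        return file_size(path) > MAX_SIZE_BYTES

    def is_too_long(path: str) -> bool:
        return file_length(path) > MAX_LENGTH_LINES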

This will work by:

  • pointing a checking function at a given directory and indicating whether it contains code or data files
    • a function which scans all files in a directory and applies the necessary checks
  • returning a table detailing each file in which problems were found, and which problems they were
    • functionality which returns all failures in tabular form (a sketch follows this list)
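
A minimal sketch of that scan-and-report loop, reusing the hypothetical per-file checks above and pandas for the table (the function name is illustrative):

    from pathlib import Path
    import pandas as pd

    def check_directory(directory: str, checks: dict) -> pd.DataFrame:
        """Apply each named check to every file under `directory`,
        returning one (file, problem) row per failure."""
        failures = []
        for path in Path(directory).rglob("*"):
            if not path.is_file():
                continue
            for problem, check in checks.items():
                if check(str(path)):
                    failures.append({"file": str(path), "problem": problem})
        return pd.DataFrame(failures, columns=["file", "problem"])

    # e.g. check_directory("my_repo", {"too large": is_too_large,
    #                                  "too long": is_too_long})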

This repo will also contain test cases to show it works correctly, e.g.:

  • example code files with and without hardcoded data structures, entities, etc.
  • example data files with and without entities, and which pass and fail disclosure control, etc.
  • for large and long files, functions which can generate them on the fly (sketched below)
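
For example, test fixtures for the size and length checks could be generated like this (a sketch; in practice these might be pytest fixtures using tmp_path):

    import os
    import tempfile

    def make_large_file(size_bytes: int) -> str:
        """Create a temporary file of roughly `size_bytes` bytes; return its path."""
        fd, path = tempfile.mkstemp(suffix=".bin")
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * size_bytes)
        return path

    def make_long_file(n_lines: int) -> str:
        """Create a temporary text file with `n_lines` lines; return its path."""
        fd, path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as f:
            for i in range(n_lines):
                f.write(f"line {i}\n")
        return path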

Components

This will contain functions to perform a few different checks on both data outputs and code.

data outputs:

  • ToDo: Disclosure control rules (e.g. small counts, unrounded values)
  • ToDo: Entity recognition
  • ToDo: Large files
  • ToDo: Files which are too long

code outputs:

  • ToDo: Large files
  • ToDo: Files which are too long
  • ToDo: Entity recognition
  • ToDo: Embedded tables
  • ToDo: Notebooks without cleared outputs

Initially, we need to make simple functions which address the above, then we can build other functions which apply them to multiple files (one such simple check, for notebooks, is sketched below).
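
As one example of such a simple function, the notebook check can inspect the .ipynb JSON directly; a minimal sketch, not the repo's implementation:

    import json

    def notebook_outputs_cleared(path: str) -> bool:
        """True if a .ipynb file has no cell outputs or execution counts."""
        with open(path, encoding="utf-8") as f:
            nb = json.load(f)
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            if cell.get("outputs") or cell.get("execution_count") is not None:
                return False
        return True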

A core part of the use-case is that people can be alerted to which issues exist, and in which files.

The tool does this by running a few simple checks. The envisaged workflow is:

graph LR
    A[Statistical Disclosure Control] --> B{Values below a threshold?};
    A --> C{Values not rounded?};
    B -->|Pass| D[Ok to output];
    B -->|Fail| E[Identifies problems];
    C -->|Pass| D[Ok to output];
    C -->|Fail| E[Identifies problems];

    click A "https://github.com/SamHollings/output_checker/tree/main/src/disclosure_control_check" "Disclosure Control code" _blank
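
A minimal sketch of the pass/fail logic in that diagram, assuming the output being checked is a table of counts in a pandas DataFrame; the threshold and rounding base are placeholder values, not actual policy:

    import pandas as pd

    def disclosure_control_problems(df: pd.DataFrame,
                                    threshold: int = 5,
                                    round_to: int = 5) -> list:
        """Flag small counts and unrounded values in a table of counts."""
        problems = []
        counts = df.select_dtypes("number")
        if (counts < threshold).to_numpy().any():
            problems.append("values below threshold")
        if (counts % round_to != 0).to_numpy().any():
            problems.append("values not rounded")
        return problems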

Getting started

To start using this project, first make sure your system meets its requirements.

Requirements

Contributors have some additional requirements!

To install the Python requirements, open your terminal and enter:

pip install -r requirements.txt

Required secrets and credentials

To run this project, you need a .secrets file with secrets/credentials set as environment variables. The secrets/credentials should have the following environment variable name(s):

Secret/credential   Environment variable name   Description
Secret 1            SECRET_VARIABLE_1           Plain English description of Secret 1.
Credential 1        CREDENTIAL_VARIABLE_1       Plain English description of Credential 1.

Once you've added them, load these environment variables using .env.
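
One way to do this is with the python-dotenv package (a sketch, assuming that package is used, as in the govcookiecutter template this structure is based on):

    import os
    from dotenv import load_dotenv

    # Load variables from the .secrets file into the process environment.
    load_dotenv(".secrets")

    secret_1 = os.environ["SECRET_VARIABLE_1"]
    credential_1 = os.environ["CREDENTIAL_VARIABLE_1"]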

Licence

Unless stated otherwise, the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation. The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.

Contributing

If you want to help us build and improve output_checker, view our contributing guidelines.

Acknowledgements

This project structure is based on the govcookiecutter template project.