A data catalog for database tables and columns to track PII and PHI.

bballamudi/piicatcher

Pii Catcher for Files and Databases

Overview

PiiCatcher is a data catalog and scanner for PII and PHI. It finds PII in your databases and file systems and tracks critical data. The data catalog can be used as a foundation to build governance, compliance, and security applications.
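To make "finds PII" concrete, here is a minimal sketch of one common detection technique: matching sampled values against regular expressions. It is not PiiCatcher's own detector. The names `PII_PATTERNS` and `detect_pii_types` and the three patterns are hypothetical and deliberately simplified.

```python
import re

# Hypothetical detection rules: PII type -> compiled regular expression.
# Illustrative only; real detectors must handle international formats,
# validation (e.g. checksums) and false positives.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii_types(values):
    """Return the sorted list of PII types found in an iterable of values."""
    found = set()
    for value in values:
        if value is None:  # NULLs carry no information
            continue
        text = str(value)
        for pii_type, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                found.add(pii_type)
    return sorted(found)


print(detect_pii_types(["alice@example.com", "555-123-4567", "hello"]))
# prints ['email', 'phone']
```

Pattern matching is cheap and explainable, but it misses free-text PII such as names. That gap is why scanners often add a named-entity model on top.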

Check out AWS Glue & Lake Formation Privilege Analyzer for an example of how piicatcher is used in production.

Quick Start

PiiCatcher is available as a command-line application.

To install, use pip:

python3 -m venv .env
source .env/bin/activate
pip install piicatcher

# Install the spaCy English model
python -m spacy download en_core_web_sm

# run piicatcher on a sqlite db and print report to console
piicatcher db -c '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│   schema    │    table    │   column    │   has_pii   │
├─────────────┼─────────────┼─────────────┼─────────────┤
│        main │    full_pii │           a │           1 │
│        main │    full_pii │           b │           1 │
│        main │      no_pii │           a │           0 │
│        main │      no_pii │           b │           0 │
│        main │ partial_pii │           a │           1 │
│        main │ partial_pii │           b │           0 │
╰─────────────┴─────────────┴─────────────┴─────────────╯
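A report with this shape can be reproduced with only Python's standard library. The following is a minimal sketch that assumes a sample-based, regex-only scan. The sample rows and the names `build_sample_db`, `scan`, and `PII_RE` are hypothetical; the table names simply mirror the report above, and PiiCatcher's real detection logic is not reproduced.

```python
import re
import sqlite3

# Hypothetical, simplified PII rule: e-mail addresses or US-style SSNs.
PII_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+|\b\d{3}-\d{2}-\d{4}\b")


def build_sample_db(conn):
    """Create tables named like the report above; the data is made up."""
    conn.executescript(
        """
        CREATE TABLE full_pii (a TEXT, b TEXT);
        INSERT INTO full_pii VALUES ('alice@example.com', '123-45-6789');
        CREATE TABLE no_pii (a TEXT, b TEXT);
        INSERT INTO no_pii VALUES ('red', 'blue');
        CREATE TABLE partial_pii (a TEXT, b TEXT);
        INSERT INTO partial_pii VALUES ('bob@example.org', 'green');
        """
    )


def scan(conn, schema="main", sample_size=100):
    """Return (schema, table, column, has_pii) rows by sampling each column."""
    rows = []
    # Identifiers come from the database catalog, not from user input.
    tables = conn.execute(
        f"SELECT name FROM {schema}.sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        for col in conn.execute(f'PRAGMA {schema}.table_info("{table}")'):
            column = col[1]  # second field of table_info is the column name
            values = conn.execute(
                f'SELECT "{column}" FROM {schema}."{table}" LIMIT ?', (sample_size,)
            ).fetchall()
            has_pii = any(v is not None and PII_RE.search(str(v)) for (v,) in values)
            rows.append((schema, table, column, int(has_pii)))
    return rows


conn = sqlite3.connect(":memory:")
build_sample_db(conn)
for row in scan(conn):
    print(row)
```

Sampling a bounded number of rows per column (`sample_size`) keeps the scan cheap on large tables, at the cost of possibly missing rare PII values.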

Supported Technologies

PiiCatcher supports the following filesystems:

  • POSIX
  • AWS S3 (for files that are part of tables in AWS Glue and AWS Athena)
  • Google Cloud Storage (Coming Soon)
  • ADLS (Coming Soon)
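On a POSIX filesystem, a scan amounts to walking a directory tree and inspecting file contents. The following is a minimal sketch under stated assumptions: only e-mail addresses are detected, only `.txt` and `.csv` files are read, and the function name `scan_directory` is hypothetical rather than part of PiiCatcher.

```python
import os
import re

# Hypothetical rule: e-mail addresses only, for brevity.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scan_directory(root, suffixes=(".txt", ".csv")):
    """Walk a directory tree; return sorted paths of files that contain an e-mail address."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):  # skip file types we don't parse
                continue
            path = os.path.join(dirpath, name)
            # Ignore undecodable bytes so binary-ish files don't abort the scan.
            with open(path, encoding="utf-8", errors="ignore") as fh:
                if any(EMAIL_RE.search(line) for line in fh):
                    hits.append(path)
    return sorted(hits)
```

Reading line by line keeps memory flat for large files, and `any()` stops at the first match in each file.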

PiiCatcher supports the following databases:

  1. SQLite 3.24.0 or greater
  2. MySQL 5.6 or greater
  3. PostgreSQL 9.4 or greater
  4. AWS Redshift
  5. SQL Server
  6. Oracle
  7. AWS Glue/AWS Athena
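Before sampling any data, a database scanner has to list every column, and the catalog query differs by engine. The following is a hedged sketch using each engine's standard catalog views. `COLUMN_QUERIES` and `list_sqlite_columns` are hypothetical names, not PiiCatcher's API, and the AWS Glue/Athena catalog is not covered.

```python
import sqlite3

# Hypothetical mapping: dialect -> query returning (schema, table, column) rows.
COLUMN_QUERIES = {
    # information_schema.columns is available in MySQL, PostgreSQL, Redshift and SQL Server.
    "mysql": "SELECT table_schema, table_name, column_name FROM information_schema.columns",
    "postgresql": "SELECT table_schema, table_name, column_name FROM information_schema.columns",
    "redshift": "SELECT table_schema, table_name, column_name FROM information_schema.columns",
    "sqlserver": "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS",
    # Oracle exposes columns visible to the current user through ALL_TAB_COLUMNS.
    "oracle": "SELECT owner, table_name, column_name FROM all_tab_columns",
    # SQLite has no information_schema; join the schema table with pragma_table_info().
    "sqlite": (
        "SELECT 'main', m.name, p.name "
        "FROM sqlite_master AS m, pragma_table_info(m.name) AS p "
        "WHERE m.type = 'table' ORDER BY m.name, p.cid"
    ),
}


def list_sqlite_columns(conn):
    """Run the SQLite catalog query and return a list of (schema, table, column)."""
    return conn.execute(COLUMN_QUERIES["sqlite"]).fetchall()
```

In practice the non-SQLite queries would also filter out system schemas such as `information_schema` or `pg_catalog`.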

Documentation

For advanced usage, refer to the documentation on its website.

Contributing

For contribution guidelines, refer to the developer documentation.