kiprotect/data-privacy-for-data-scientists

Data Privacy for Data Scientists

A workshop on data privacy methods for data scientists.

This workshop will be presented as part of EuroPython 2018.

Motivation

As data and information security become core components of managing user data, data scientists are keen to expand their knowledge and skills relating to data privacy and security basics. As of May 2018, the European General Data Protection Regulation (GDPR) affects how European residents can access their data and grant consent for its use. As European data scientists, we now have both an obligation and a distinct motivation to practice data science with attention to data privacy.

In this workshop, we will introduce some of the basics of defining privacy within the realm of data collection, modeling and machine learning. With a focus on practical knowledge and code, we will cover how to implement some privacy-preserving algorithms in Python. Students will be presented with the underlying theory along with recent research on privacy-preserving models, so they can leave with a better understanding of how to apply privacy principles to data science in their work and study.

Installation

Please use the included requirements.txt to install the dependencies with pip (you can also do so in conda). The notebooks have only been tested with Python 3. 🙌🏻

We recommend using virtual environments or conda environments.
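
A minimal sketch of the setup with a virtual environment, assuming a Unix-like shell, Python 3 available as `python3`, and that the commands are run from the repository root. The directory name `.venv` is an arbitrary choice for illustration, not something the repository prescribes.

```shell
# Create an isolated environment in the (arbitrarily named) .venv directory
python3 -m venv .venv

# Activate it for the current shell session
. .venv/bin/activate

# Install the workshop dependencies listed in requirements.txt
pip install -r requirements.txt
```

On Windows the activation command differs. If you prefer conda, create and activate a conda environment first and then run the same `pip install` step inside it.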

Outline

Agenda

  • Introduction and Motivation
  • Pseudonymization
  • K-Anonymity
  • Differential Privacy
  • Case Study
  • Wrap-Up and Q&A

Recommended Reading

Each notebook has its own section of recommended reading. We may update this README with additional reading of interest on this topic.

Questions?

Questions about getting set up or the content covered in the workshop? Feel free to reach out via email at: info /at/ kiprotect (d o t) com