feat: Privacy Preserving Learning #3334

Merged


manavsinghal157
Contributor

@manavsinghal157 manavsinghal157 commented Sep 20, 2021

Part of the Empirical Analysis of Privacy Preserving Learning Project.

This PR introduces a command-line argument that enables aggregated learning: only features that have been seen by at least a minimum threshold of users are saved, which protects the privacy of individual users.

Methodology:

  • For each feature, a 32-bit vector is defined. (vowpalwabbit/array_parameters.h and vowpalwabbit/array_parameters_dense.h)
  • A 5-bit hash of the example's tag is calculated. (vowpalwabbit/parser.cc)
  • Whenever a feature weight is updated by a non-zero value, the 5-bit hash selects a bit in that feature's 32-bit vector, and the bit is set to 1. (vowpalwabbit/gd_predict.h -> vowpalwabbit/array_parameters.h and vowpalwabbit/array_parameters_dense.h)
  • When the weights are saved to a file, the number of bits set to 1 is counted for each feature. If the count is greater than the threshold, the weights for that feature are saved. (vowpalwabbit/gd.cc -> vowpalwabbit/array_parameters.h and vowpalwabbit/array_parameters_dense.h)

(The default value of the threshold is 10.)
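
The methodology above can be sketched in a few lines of standalone C++. This is a minimal illustration, not the code in this PR: `tag_bucket` and `feature_user_tracker` are hypothetical names, `std::hash` stands in for whatever hash the parser actually uses, and the strict "greater than" comparison simply follows the wording of the description.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical helper: reduce an example tag to a 5-bit bucket (0..31).
// std::hash is an assumption; it stands in for the parser's real hash.
inline std::uint32_t tag_bucket(const std::string& tag)
{
  return static_cast<std::uint32_t>(std::hash<std::string>{}(tag) & 0x1Fu);
}

// Hypothetical container: one 32-bit vector per feature index.
class feature_user_tracker
{
public:
  explicit feature_user_tracker(std::size_t threshold = 10) : _threshold(threshold) {}

  // Call when a feature weight receives a non-zero update.
  void record_update(std::uint64_t feature_index, std::uint32_t bucket)
  {
    assert(bucket < 32);  // a 5-bit hash can only address 32 bits
    _bits[feature_index].set(bucket);
  }

  // Number of distinct tag buckets that have touched this feature.
  std::size_t distinct_buckets(std::uint64_t feature_index) const
  {
    auto it = _bits.find(feature_index);
    return it == _bits.end() ? 0 : it->second.count();
  }

  // A feature's weights are written out only if enough buckets were seen.
  bool should_save(std::uint64_t feature_index) const
  {
    return distinct_buckets(feature_index) > _threshold;
  }

private:
  std::size_t _threshold;
  std::unordered_map<std::uint64_t, std::bitset<32>> _bits;
};
```

Because two different tags can hash to the same 5-bit bucket, the bit count is a lower bound on the number of distinct tags that updated a feature, and it can never exceed 32.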

This PR includes:

  • Command-line arguments to activate privacy preservation and set the threshold. (vowpalwabbit/parse_args.cc)
  • Runtests that check the expected output on a small dataset. (test/core.vwtest.json)
  • Unit tests that check the output both when the threshold is reached for a feature and when it is not. (test/unit_test/weights_test.cc)
  • Benchmarks that measure learning time with privacy preservation enabled. (test/benchmarks/standalone/benchmark_text_input.cc)

Implementation details:

--privacy_activation : To activate the feature
--privacy_activation_threshold arg (=10) : To set the threshold

Future Work:

  • Implement the feature for save_resume.
  • Work on aggregations in the online setting.

Wiki page for this feature: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Privacy-Preserving-Learning

…tency and removed is_activated in gd.cc to pass checks
…g-Learning

Patch for Privacy Preserving Learning
…g-Learning

Patch_for_privacy_preserving_learning #2
…g-Learning

Command line argument for privacy preserving learning
…g-Learning

Calculating tag_hash in parser.cc && RunTests
…g-Learning

Benchmarks for Privacy Preserving Learning
Removed extra } line 802
@olgavrou olgavrou closed this Nov 23, 2021
@olgavrou olgavrou reopened this Nov 29, 2021
@olgavrou olgavrou changed the title from "[wip] please ignore, running benchmarks" to "feat: Privacy Preserving Learning" Nov 29, 2021
@olgavrou olgavrou marked this pull request as ready for review November 29, 2021 21:58
@olgavrou olgavrou added this to the VW 9.0 milestone Nov 29, 2021
@olgavrou olgavrou merged commit f0e16ad into VowpalWabbit:master Nov 30, 2021
3 participants