-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for PBKDF2 for blind indexing #67
Add support for PBKDF2 for blind indexing #67
Conversation
For very sensitive information (Social Security numbers, tax identification numbers, or other ostensibly unique numbers assigned to people), it is a good idea to use a key stretching algorithm for the blind index to make it harder to brute force the hashed values. PBKDF2 is a key stretching algorithm that accomplishes this.
ca3e32e
to
8699e64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a third blind indexing algorithm (alongside SHA256 and HMAC). I was wondering if you would be open to improving the support for these. The things that are missing when compared to the encrypted fields are:
- Easy key rotation.
- Easy configuration for multiple ciphers (or, in this case MACs) to toggle between (i.e. your Vault concept).
- The documentation is slightly better for the encrypted fields than it is for the index fields.
What do you think of that idea? With that change, we could more easily add support for other algorithms like scrypt and Argon2.
Also, would you be open to a PR that updates the documentation to use the term "blind index" instead of "hash"? It's a term that is used in a Ruby gem, this blog post, and this one. With that, I'd like to suggest the examples change from the form of email_hash
to email_bidx
.
@@ -33,6 +33,7 @@ defmodule Cloak.Mixfile do | |||
[ | |||
{:ecto, ">= 1.0.0"}, | |||
{:flow, "~> 0.13.0"}, | |||
{:pbkdf2, "~> 2.0", optional: true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of ease and "let's use something that has been tested before", I went with adding this Erlang library as an optional dependency. Are you okay with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to be careful about adding core dependencies to the library, but this would be fine as an optional dependency, yes.
lib/cloak/fields/pbkdf2.ex
Outdated
sha384 | ||
sha512 | ||
]a | ||
@sizes [16, 32, 64, 128] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limiting the size of the blind index to specific byte lengths is an optimization for space within the database. I like the idea of same-size index keys. What do you think about this? There is also a version of the function that doesn't have a specified derived key length, so we can remove this easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should support the :size
option, but I don't see a reason to limit the options. Let people choose whatever length they prefer.
Thanks for this PR! I'll do a line-by-line on the PR shortly, but here are my overall thoughts on the questions you asked:
This is the first time I've ever heard the term, actually. "Hash" is a much more commonly used and understood term, and it's even used in one of the blog posts you cited:
The term "blind index" refers to the indexed DB column itself, not the contents of the column. Ecto fields are typically named after the content they contain, not the indexing technique they enable. (For example,
It's difficult to rotate keys when you no longer have the original information. HMAC, SHA256, and PBKDF2 are all irreversible, so you could not use the technique for key rotation that we use for AES. You cannot "decrypt" the fields and then "encrypt" them with a new key. I am not aware of a technique that can rotate keys for irreversibly encrypted fields which Ecto would support. Do you have any suggestions?
The purpose of multiple ciphers for the reversible encryption fields is to facilitate decrypting old data and migrating to a new key. This is irrelevant to irreversible algorithms such as PBKDF2. The purpose of these fields is for querying, which assumes that only one key has been used in the column that you are querying. If you want to have multiple PBKDF2 keys active in the app at the same time, you can create multiple field modules in your app, each with a different key.
Fair point! I'm accepting PRs! |
lib/cloak/fields/pbkdf2.ex
Outdated
@@ -0,0 +1,193 @@ | |||
defmodule Cloak.Fields.PBKDF2 do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module should be wrapped in if Code.ensure_loaded?(PBKDF2)
. If PBKDF2 is going to be an optional dependency, we need to ensure that the compiler won't attempt to compile this file when it isn't present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that's neat. I haven't seen that before. Thanks for teaching me about it!
lib/cloak/fields/pbkdf2.ex
Outdated
sha384 | ||
sha512 | ||
]a | ||
@sizes [16, 32, 64, 128] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should support the :size
option, but I don't see a reason to limit the options. Let people choose whatever length they prefer.
@@ -33,6 +33,7 @@ defmodule Cloak.Mixfile do | |||
[ | |||
{:ecto, ">= 1.0.0"}, | |||
{:flow, "~> 0.13.0"}, | |||
{:pbkdf2, "~> 2.0", optional: true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to be careful about adding core dependencies to the library, but this would be fine as an optional dependency, yes.
After a little further thought I realized that in most cases, you do still have the original data in a separate field. (e.g.
This will decrypt the data in the original It's such a simple operation that I'm not sure there's much need for Cloak to provide it, other than perhaps to describe how you should do it. Repo.transaction fn ->
MySchema
|> Repo.stream()
|> Stream.each(fn schema ->
schema
|> MySchema.changeset()
|> Repo.update!()
end)
|> Stream.run()
end |
cdade22
to
2ba74be
Compare
Yep, I agree: it's a very easy operation. I thought it would be a nice value-add to bring it up to "feature parity" with the main field encryption. An answer of "it's too easy, let's not do that" is perfectly acceptable to me. Removing as much impedance as possible to handling the case where you think your encryption (or in this case hashing) key might have been compromised seems like it would be a good thing to have. Since this field is a hash instead of an encrypted field, a compromised key would only allow an attacker to more easily create and verify a rainbow table from a stolen dataset.
That's totally fair. I have taken to using the
To be clear, your docs in the latest alpha are 👌. You're doing a fantastic job writing up the documentation for Cloak! I think the fuzziness in the hash fields is merely because they aren't organized as tightly as the standard encrypted fields; that's why I thought making their underpinnings as robust as the encrypted fields would be a good idea. I've addressed the two code-related issues you pointed out and now the build is passing too. Thank you for the in-depth feedback! |
For very sensitive information (Social Security numbers, tax identification numbers, or other ostensibly unique numbers assigned to people), it is a good idea to use a key stretching algorithm for the blind index to make it harder to brute force the hashed values. PBKDF2 is a key stretching algorithm that accomplishes this.