Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PBKDF2 for blind indexing #67

Merged
merged 3 commits into from
Aug 23, 2018

Conversation

michaelherold
Copy link
Contributor

For very sensitive information (Social Security numbers, tax identification numbers, or other ostensibly unique numbers assigned to people), it is a good idea to use a key stretching algorithm for the blind index to make it harder to brute force the hashed values. PBKDF2 is a key stretching algorithm that accomplishes this.

For very sensitive information (Social Security numbers, tax
identification numbers, or other ostensibly unique numbers assigned to
people), it is a good idea to use a key stretching algorithm for the
blind index to make it harder to brute force the hashed values. PBKDF2
is a key stretching algorithm that accomplishes this.
@coveralls
Copy link

coveralls commented Jun 24, 2018

Coverage Status

Coverage decreased (-0.7%) to 97.101% when pulling 8699e64 on michaelherold:add-pbkdf2 into eadb6a1 on danielberkompas:master.

Copy link
Contributor Author

@michaelherold michaelherold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a third blind indexing algorithm (alongside SHA256 and HMAC). I was wondering if you would be open to improving the support for these. The things that are missing when compared to the encrypted fields are:

  1. Easy key rotation.
  2. Easy configuration for multiple ciphers (or, in this case MACs) to toggle between (i.e. your Vault concept).
  3. The documentation is slightly better for the encrypted fields than it is for the index fields.

What do you think of that idea? With that change, we could more easily add support for other algorithms like scrypt and Argon2.

Also, would you be open to a PR that updates the documentation to use the term "blind index" instead of "hash"? It's a term that is used in a Ruby gem, this blog post, and this one. With that, I'd like to suggest the examples change from the form of email_hash to email_bidx.

@@ -33,6 +33,7 @@ defmodule Cloak.Mixfile do
[
{:ecto, ">= 1.0.0"},
{:flow, "~> 0.13.0"},
{:pbkdf2, "~> 2.0", optional: true},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of ease and "let's use something that has been tested before", I went with adding this Erlang library as an optional dependency. Are you okay with that?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to be careful about adding core dependencies to the library, but this would be fine as an optional dependency, yes.

sha384
sha512
]a
@sizes [16, 32, 64, 128]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limiting the size of the blind index to specific byte lengths is an optimization for space within the database. I like the idea of same-size index keys. What do you think about this? There is also a version of the function that doesn't have a specified derived key length, so we can remove this easily.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should support the :size option, but I don't see a reason to limit the options. Let people choose whatever length they prefer.

@danielberkompas
Copy link
Owner

danielberkompas commented Jun 26, 2018

Thanks for this PR! I'll do a line-by-line on the PR shortly, but here are my overall thoughts on the questions you asked:

[W]ould you be open to a PR that updates the documentation to use the term "blind index" instead of "hash"?

This is the first time I've ever heard the term, actually. "Hash" is a much more commonly used and understood term, and it's even used in one of the blog posts you cited:

The general idea is to store a keyed hash (e.g. HMAC) of the plaintext in a separate column.

The term "blind index" refers to the indexed DB column itself, not the contents of the column. Ecto fields are typically named after the content they contain, not the indexing technique they enable. (For example, :string, :integer, :datetime) For this reason, I prefer the term "Hash", because it accurately describes what the column will contain.

1. Easy key rotation.

It's difficult to rotate keys when you no longer have the original information. HMAC, SHA256, and PBKDF2 are all irreversible, so you could not use the technique for key rotation that we use for AES. You cannot "decrypt" the fields and then "encrypt" them with a new key.

I am not aware of a technique that can rotate keys for irreversibly encrypted fields which Ecto would support. Do you have any suggestions?

2. Easy configuration for multiple ciphers

The purpose of multiple ciphers for the reversible encryption fields is to facilitate decrypting old data and migrating to a new key.

This is irrelevant to irreversible algorithms such as PBKDF2. The purpose of these fields is for querying, which assumes that only one key has been used in the column that you are querying.

If you want to have multiple PBKDF2 keys active in the app at the same time, you can create multiple field modules in your app, each with a different key.

3. The documentation is slightly better for the encrypted fields than it is for the index fields.

Fair point! I'm accepting PRs!

@@ -0,0 +1,193 @@
defmodule Cloak.Fields.PBKDF2 do
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module should be wrapped in if Code.ensure_loaded?(PBKDF2). If PBKDF2 is going to be an optional dependency, we need to ensure that the compiler won't attempt to compile this file when it isn't present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's neat. I haven't seen that before. Thanks for teaching me about it!

sha384
sha512
]a
@sizes [16, 32, 64, 128]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should support the :size option, but I don't see a reason to limit the options. Let people choose whatever length they prefer.

@@ -33,6 +33,7 @@ defmodule Cloak.Mixfile do
[
{:ecto, ">= 1.0.0"},
{:flow, "~> 0.13.0"},
{:pbkdf2, "~> 2.0", optional: true},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to be careful about adding core dependencies to the library, but this would be fine as an optional dependency, yes.

@danielberkompas
Copy link
Owner

danielberkompas commented Jun 26, 2018

After a little further thought I realized that in most cases, you do still have the original data in a separate field. (e.g. :email) In that case, the key rotation process would be:

  • Change the key for your hashed fields
  • Loop through all your records, invoke the changeset/2 function on them (which will call your private put_hashed_fields/2 function)
  • Save the records with Repo.update!()

This will decrypt the data in the original :email field, and re-hash the contents into the _hash (or _ibdx) field.

It's such a simple operation that I'm not sure there's much need for Cloak to provide it, other than perhaps to describe how you should do it.

Repo.transaction fn ->
  MySchema
  |> Repo.stream()
  |> Stream.each(fn schema ->
    schema
    |> MySchema.changeset()
    |> Repo.update!()
  end)
  |> Stream.run()
end

@michaelherold
Copy link
Contributor Author

michaelherold commented Jun 27, 2018

1. Easy key rotation.
It's such a simple operation that I'm not sure there's much need for Cloak to provide it ...

Yep, I agree: it's a very easy operation. I thought it would be a nice value-add to bring it up to "feature parity" with the main field encryption. An answer of "it's too easy, let's not do that" is perfectly acceptable to me.

Removing as much impedance as possible to handling the case where you think your encryption (or in this case hashing) key might have been compromised seems like it would be a good thing to have. Since this field is a hash instead of an encrypted field, a compromised key would only allow an attacker to more easily create and verify a rainbow table from a stolen dataset.

The term "blind index" refers to the indexed DB column itself, not the contents of the column.

That's totally fair. I have taken to using the _bidx affix on blind index columns to be a little more communicative to someone looking at the ERD for a table with blind indexed columns. I'm happy to stick with the more well-known "hash" terminology.

Fair point! I'm accepting PRs!

To be clear, your docs in the latest alpha are 👌. You're doing a fantastic job writing up the documentation for Cloak! I think the fuzziness in the hash fields is merely because they aren't organized as tightly as the standard encrypted fields; that's why I thought making their underpinnings as robust as the encrypted fields would be a good idea.

I've addressed the two code-related issues you pointed out and now the build is passing too.

Thank you for the in-depth feedback!

@danielberkompas danielberkompas merged commit 50b9702 into danielberkompas:master Aug 23, 2018
@michaelherold michaelherold deleted the add-pbkdf2 branch August 24, 2018 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants