Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passphrase-based API of Secure Cell #577

Merged
merged 16 commits into from
Feb 12, 2020
Merged

Conversation

ilammy
Copy link
Collaborator

@ilammy ilammy commented Jan 24, 2020

Implement passphrase-based API of Secure Cell in Themis Core:

themis_status_t themis_secure_cell_encrypt_seal_with_passphrase(
        const uint8_t*  passphrase,
        size_t          passphrase_length,
        const uint8_t*  user_context,
        size_t          user_context_length,
        const uint8_t*  message,
        size_t          message_length,
        uint8_t*        encrypted_message,
        size_t*         encrypted_message_length
);

themis_status_t themis_secure_cell_decrypt_seal_with_passphrase(
        const uint8_t*  passphrase,
        size_t          passphrase_length,
        const uint8_t*  user_context,
        size_t          user_context_length,
        const uint8_t*  encrypted_message,
        size_t          encrypted_message_length,
        uint8_t*        plain_message,
        size_t*         plain_message_length
);

The API is described in Themis RFC 2 (not available publicly at the moment). This pull request implements the API in Themis Core on which we’ll base implementations in all high-level language wrappers.

Main issue that is solved by passphrase API is inadeqacy of ZRTP KDF used by master key API when applied to human-readable passphrases with low entropy. Previously we suggested and allowed such use in high-level languages, however it is not secure enough. Master key API should be used only with randomly-generated binary keys. Our symmetric key generation API produces 32-byte-long keys.

New API can be safely used with human-readable strings containing way less entropy. This is made possible by using a proper password key derivation function — PBKDF2 — to generate intermediate key used for encryption. Currently we are using 200,000 iterations — the slowdown is not noticeable for single-time use but is frustrating enough for repeated computation during brute-force attacks.

Passphrase support is currently implemented only for Seal mode of Secure Cell. It can be extended to Token Protect mode in the future. Context Imprint mode will never support passphrases directly because it does not provide storage space for KDF context. However, we may provide separate KDF interface later which will make it possible to use passphrases with Context Imprint mode too.

The implementation is split into five main parts:

  • Define authentication token format

    I’ve opted for more explicit approach to parsing and producing data formats. I believe that such approach is less error-prone and contains less magic. Though it is much more verbose that fast, direct, and incorrect casts to struct pointers that we are currently doing in master key API code.

    Give it a close look as this will be our main sponsor of hidden segmentation faults.

  • Prepend authentication token in Seal mode

    Token Protect mode keeps the token as a separate buffer while in Seal mode it is prepended to the encrypted message. We have to carefully gauge the output buffer size during encryption, and carefully parse input data during decryption.

  • Secure Cell encryption

    It is pretty straightforward. There are some Soter quirks that we need to deal with, and there are a couple of restrictions on the input data that we need to honor.

    Also note that we encrypt data using default KDF parameters which are fixed for Themis release. However, we may review and change them in future releases. Unless we introduce a new KDF algorithm, simple parameter tweaks are backwars and forward-compatible. With PBKDF2 we can adjust salt length (unlikely) and iteration count.

  • Secure Cell decryption

    This code path needs close attention as here we have to deal with actively hostile user input. Separate data format parsing makes it easier to follow, but still be alert.

  • Tests, documentation comments, changelog updates, etc.

    The tests are basic but they cover the most important points. New additions include explicit tests for all AES key lengths that we support. (Yes, I did fail to take that into account during development.)

    As for what they lack, we currently do not verify that Secure Cell is able to detect data corruption, for example. I hope to not forget to add them later.

Have fun reviewing these 1.5k lines of code next week!

Checklist

  • Change is covered by automated tests
  • Benchmark results are attached (coming soon)
  • The coding guidelines are followed
  • Public API has proper documentation
  • Example projects and code samples are updated (later)
  • Changelog is updated

Address feedback:

  • add stream_ prefix to parsing helpers
  • generalize KDF handling
  • try avoiding auth_tag_length comparisons

Add API definitions to <themis/secure_cell.h>. This is new public API
for passphrase-secured data where we apply a *password* key-derivation
function -- existing master key API uses ZRTP KDF which cannot be
securely used with passwords.
Here are some basic tests that are going to hold for the new API.

We verify a bunch of things:

- ability to encrypt and decrypt data
- reaction to invalid argument values
- incompatibility of passphrase and master-key API
- compatibility with older formats

We currently do *not* verify that Secure Cell is able to detect data
corruptions, invalid passphrases, etc. These tests will be added later.

Note the stability test suite. Some of language wrappers have something
like this but Themis Core generally did not verify that old data formats
are still supported. These tests verify that we are able to decrypt data
encrypted be previous versions of Themis. Passphrase API is going to
evolve and use different KDF configurations, effectively changing the
encryption algorithm. We need to be sure that we are still able to make
sense of the data after we change the implementation.

The tests also verify that we understand various AES key lengths that
may be used by different Themis builds. Existing Secure Cell code for
master key API handles them incorrectly. We'll fix it later.

And while we're here, move ARRAY_SIZE utility into common header file so
that it can be used across all tests.
Describe data structures used for Secure Cell authentication tokens.

Here we use much more explicit method than the existing master key
definitions. If we were to use the old approach, these structs would
look like

    struct themis_scell_auth_token_passphrase {
        uint32_t alg;
        uint32_t iv_length;
        uint32_t auth_tag_length;
        uint32_t message_length;
        uint32_t kdf_context_length;
    };
    // followed by IV data, auth tag data, KDF context

    struct themis_scell_pbkdf2_context {
        uint32_t iteration_count;
        uint16_t salt_length;
    };
    // followed by salt data

And then we'd go and cast byte buffers into this structs:

    (const struct themis_scell_seal_passphrase_header*)buffer
    buffer->iv_length;

While these structure definitions are definitely more compact, 'parsing'
code ends up scattered around the file and riddled with hard-to-reason
pointer magic. Furthermore, such pointer casts do not handle endianness
and pointer alignment correctly, leading to undefined behavior on some
ARM systems (and warnings from undefined behavior sanitizer -- that we
currently squelch because of such not-exactly-correct code).

New code is much more verbose -- in this file -- but the 'end-user' code
in Secure Cell implementation looks much cleaner and way easier to read
with all this complexity hidden elsewhere. The compiler does good job
at inlining all of the parsing code and it is as efficient as possible.
(Though yeah, it is *not* zero-copy as it was before.)

We put definitions in a private header file and make all parsing and
serialization functions "static inline" so that they do not cause
duplicate symbol errors and are always available to the compiler.

All of these are internal helpers and they are not intended to be reused
for other encryption modes or for any other purposes. They have rather
idiosyncratic API that might appear weird at first. Let it sink in.
Similar to master key API, introduce internal helpers that actually
implement encryption and leave only seal mode handling in secure_cell.c

Just like with master key API, the helpers can be shared between Seal
and Token Protect modes (though we do not implement Token Protect now).
Seal mode has to do a bit of trickery to prepend the authentication
token and to detect its length relative to the message for decryption.

So it's mostly the same ideas, but there are some differences:

- First of all, we use way fewer magical macros in the implementation.

- THEMIS_FAIL is returned when encrypted message buffer size does not
  match the size expected from the header. Master key API returns
  THEMIS_INVALID_PARAMETER in this case which is not entirely correct.
  New API treats this situation as data corruption.

  THEMIS_INVALID_PARAMETER should be used to indicate programming
  errors, such as passing a NULL pointer for a required argument.

- We also take care to not overwrite out-parameters for message length
  unless we are returning a 'success': either successful decryption
  indicated by THEMIS_SUCCESS, or a successful measurement of required
  output buffer size indicated by THEMIS_BUFFER_TOO_SMALL.

  Also note that the user may pass a buffer larger than necessary. For
  this we need to update the out pointer even in THEMIS_SUCCCESS case
  so that the user knows actual output size.

Also, expose "plain" encryption functions used by master key API so that
we can avoid dealing with Soter API intricacies and reuse the same code.
Implement Secure Cell (Seal) encryption with passphrases as described in
Themis RFC 2.

We use the same pattern as with master key code:

- themis_auth_sym_encrypt_message_with_passphrase() does all parameter
  validation, returns expected output buffer size if requested, and
  hands off processing to...

- themis_auth_sym_encrypt_message_with_passphrase_() which does all
  preparatory tasks for encryption: fills in the message header,
  generates derived key and ancillary data, calls encryption function,
  and then finally writes the header into output buffer

- themis_auth_sym_plain_encrypt() does actual encryption using Soter.
  Note that we have to clear the KDF field in algorithm to prevent
  Soter from using its own PBKDF2 implementation.

There are other points to note in the code:

- Before writing output buffer we verify its size again, now that we
  really know all the component sizes.

- We always wipe intermdidate data that we store on the stack. Derived
  key is never stored anywhere else. Other byte buffers are copied into
  output buffer.
Decryption is more complex that encryption. Here we need to tread
veeeery carefully, treating input data as actively hostile.

Function split is the same as with encryption. Notable points:

- Derived key length depends on the algorithm stored in the message.
  Current implementation of master key API makes a mistake of using
  the default key length always.

- Algorithm field has some reserved bits that are set to zero by our
  implementation. We check that they are indeed set to zero.

- Since we support only PBKDF2 code, we can cheat a little and keep
  everything in one function. I guess it will have to rewritten once
  we migrate from PBKDF2. But it's good enough for now.

- There are around 4 or 5 length checks in total for the output message
  because I really don't want to get a buffer overflow here.

- Don't forget to wipe the derived key from memory. Always.
@ilammy ilammy added the core Themis Core written in C, its packages label Jan 24, 2020
@ilammy ilammy mentioned this pull request Jan 27, 2020
4 tasks
Do not use wording "password" or "passphrase" with master key API of
Secure Cell. Instead, use either generate symmetric key, or something
that looks like a key and does not look like a passphrase.
Okay, sure, clang-format, that makes it much readable.
return (alg & SOTER_SYM_KEY_LENGTH_MASK) / 8;
}

static inline bool soter_alg_reserved_bits_valid(uint32_t alg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's very cool to use such helpers and encapsulate hard to read logic into function with a human name

src/themis/secure_cell_seal_passphrase.c Show resolved Hide resolved
src/themis/secure_cell_seal_passphrase.c Outdated Show resolved Hide resolved
src/themis/secure_cell_seal_passphrase.h Outdated Show resolved Hide resolved
src/themis/secure_cell_seal_passphrase.c Outdated Show resolved Hide resolved
src/themis/secure_cell_seal_passphrase.c Outdated Show resolved Hide resolved
src/themis/secure_cell_seal_passphrase.c Outdated Show resolved Hide resolved
@@ -94,6 +94,123 @@ themis_status_t themis_secure_cell_decrypt_seal(const uint8_t* master_key,
uint8_t* plain_message,
size_t* plain_message_length);

/**
* Encrypts and puts the provided message into a sealed cell.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool functions documentation

src/themis/themis_portable_endian.h Outdated Show resolved Hide resolved
@ilammy ilammy mentioned this pull request Jan 31, 2020
9 tasks
encrypted_message + auth_token_length,
&ciphertext_length);
if (res == THEMIS_SUCCESS || res == THEMIS_BUFFER_TOO_SMALL) {
*encrypted_message_length = auth_token_length + ciphertext_length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reuse total_length here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional because the themis_auth_sym_encrypt_message_with_passphrase() call might have modified those values. Of course, it does not modify them and accurately reports required lengths so we could reuse the value. But I'd leave that for future-proofing.

For all functions that return buffers, it's not strictly incorrect behavior to return slightly bigger required buffer size during measurement stage (BUFFER_TOO_SMALL) and then return accurate – and possibly smaller – actual size of the buffer used on SUCCESS. Currently we are able to accurately determine required length without actually processing the data, but it might not be the case in the future.

src/themis/secure_cell_seal_passphrase.c Outdated Show resolved Hide resolved
src/themis/secure_cell_seal_passphrase.h Outdated Show resolved Hide resolved
src/themis/themis_portable_endian.h Outdated Show resolved Hide resolved
Add a "stream" prefix so that they are better associated with file
streams and similar wStream utilities from WinPR.

Unify read and write API to use in/out pointers in last position, with
first argument always being the stream to modify, with its new value
returned from the function.

The behavior is the same, but the API is neater and easier to read.
Allow non-NULL pointer to associated context buffer when its length is
set to zero. This plays nicely with languages that do not return NULL
for their empty byte containers. (And it was not tested either...)
Encryption paths of master key and passphrase API has an unnecessary
and weird check of auth tag length output. Strictly speaking, we should
measure the size of tag but we just know that it's 12 bytes for AES-GCM.

Tweak the pointer type in internal helper function so that it is able to
directly write the auth tag length into the header struct. This way we
don't have to check the length is unchanged.

However, since Soter API outputs size_t, we do need to check that it's
safe to truncate the value returned by Soter.
Move KDF context parsing into seprate helper function with takes in
Secure Cell header and a passphrase, and outputs a derived key. This
will make it easier to extend the decryption code path to handle other
KDFs when we add them.
Similar to decryption, make KDF computation dependent on whatever we
write into hdr->alg so that it stays consistent. This should make adding
new KDFs later much easier: you only need to write a new function which
computes derived key and add it to the KDF switch.
Copy link
Collaborator

@Lagovas Lagovas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ilammy ilammy requested a review from ignatk February 11, 2020 08:24
Woops, this is a good typo. clang-analyzer catches it, by the way.
@ilammy ilammy merged commit a0a3ebe into cossacklabs:master Feb 12, 2020
@ilammy
Copy link
Collaborator Author

ilammy commented Feb 12, 2020

This PR blocks a number of other PRs so I'm merging it right now with only one approval. Since most of the issues have been addressed and this PR has received some reviewer attention, I guess it's okay. @ignatk, @storojs72, if you have any concerns about these changes, please write them here, I'll address them separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Themis Core written in C, its packages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants