Reduce session size with encoding and compression improvements #7703

Merged
mitchellhenke merged 10 commits into main from mitchellhenke/small-session
Feb 10, 2023

@mitchellhenke
Contributor

🛠 Summary of changes

This set of changes follows a similar path to #6315 to update the structure and processing of session data.

Currently, our sessions are encrypted and encoded in a few layers:

  • KMS Encrypt Sensitive Data in Session Hash
    • Split sensitive data into KMS-sized chunks
    • Base64 encode
    • KMS Encrypt
    • JSON encode concatenated chunks
  • Base64 encode
  • JSON encode
  • Fingerprint Data
    • Generate data fingerprint and Base64 encode
    • Base64 encode JSON and concatenate with encoded fingerprint
    • Base64 encode concatenated data
  • AES Encryption
    • AES Encrypt Data
    • Base64 encode AES initialization vector (IV)
    • Base64 encode ciphertext
    • Base64 encode AES tag
    • Build hash of IV, ciphertext and tag and then JSON encode
  • Base64 encode
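To give a rough sense of why the stacked Base64 steps above are costly, here is a minimal sketch (not code from this PR) that measures how a payload grows when it is Base64 encoded repeatedly. The 300-byte payload and the helper names `b64` and `layered_base64_size` are hypothetical, chosen only for illustration.

```ruby
require 'securerandom'

# Strict Base64 without line breaks. Array#pack('m0') is equivalent to
# Base64.strict_encode64 and needs no extra require.
def b64(bytes)
  [bytes].pack('m0')
end

# Apply `layers` rounds of Base64 encoding and return the final size in bytes.
def layered_base64_size(bytes, layers)
  layers.times.reduce(bytes) { |acc, _| b64(acc) }.bytesize
end

payload = SecureRandom.random_bytes(300) # hypothetical binary session data

(0..3).each do |layers|
  puts "#{layers} layer(s): #{layered_base64_size(payload, layers)} bytes"
end
```

Each Base64 layer grows its input by about a third, so nesting several layers compounds the overhead well beyond what a single encoding would cost.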

The shape of this process has existed since near the beginning of the project. It is conservative in its approach and has worked reliably. It also provides a lot of flexibility by returning encoded data at every step, which means the result can be saved to any data store without worrying about whether that store accepts unencoded binary data.

The current process trades some overhead in session size for that flexibility. We do not anticipate significant changes to how or where we'll save session data, and the volume of data sent to Redis is a potential bottleneck going forward. Redis and KMS both support storing and encrypting binary data, so we can trim that overhead while retaining some of the flexibility and keeping a path towards iterating on the structure if desired.

One of the larger complications in this refactor is the AesCipher and AesEncryptor classes, which are used for encryption in a variety of places, some of which we cannot easily convert to the updated formats. The most difficult is profile PII, which we can only re-encrypt once the user has decrypted it with their password. This PR moves the existing classes and replaces them with the Legacy prefix in the class name. One pattern I'd like to establish here is that the Encryptor classes should minimize processing overhead and storage size, and leave any encoding or post-processing to the caller. I've intentionally not renamed the new classes over the old ones for now to avoid collisions with any ongoing work. I've also not modified the KmsClient. It does include one extraneous Base64 encoding that we could drop, but the gains in other areas were already relatively significant. We could certainly explore updating that in the future.

Two significant changes here are compression, which we will now apply above a defined size threshold, and the use of MessagePack for encoding rather than JSON. The primary reason for MessagePack is that JSON does not support binary data natively.
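As an illustration of threshold-based compression, the following is a minimal sketch using Ruby's standard `Zlib`. The threshold value, the `maybe_compress` and `maybe_decompress` helper names, and the envelope shape are assumptions made for this example, not the values or names used in the PR.

```ruby
require 'json'
require 'zlib'

# Hypothetical threshold; the real value is defined in the PR, not here.
COMPRESSION_THRESHOLD_BYTES = 2_000

# GZIP the serialized data only when it is larger than the threshold, and
# record whether compression happened so the reader knows how to undo it.
def maybe_compress(serialized, threshold: COMPRESSION_THRESHOLD_BYTES)
  if serialized.bytesize > threshold
    { compressed: true, data: Zlib.gzip(serialized) }
  else
    { compressed: false, data: serialized }
  end
end

# Reverse of maybe_compress.
def maybe_decompress(envelope)
  envelope[:compressed] ? Zlib.gunzip(envelope[:data]) : envelope[:data]
end

small = JSON.generate(user_id: 1)
large = JSON.generate(attributes: Array.new(500) { |i| { name: "attribute_#{i}", value: 'x' * 10 } })

[small, large].each do |serialized|
  envelope = maybe_compress(serialized)
  puts "#{serialized.bytesize} bytes -> #{envelope[:data].bytesize} bytes (compressed: #{envelope[:compressed]})"
end
```

Skipping compression for small payloads avoids paying the fixed GZIP header and CPU cost where there is little to gain.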

The new process is:

  • KMS Encrypt Sensitive Data in Session Hash
    • Split sensitive data into KMS-sized chunks
    • Base64 encode
    • KMS Encrypt
    • JSON encode concatenated chunks
  • JSON encode
  • Optionally GZIP compress
  • Fingerprint Data
    • Generate data fingerprint
    • Concatenate data and data fingerprint and MessagePack encode
  • AES Encryption
    • AES Encrypt Data
    • Build hash of initialization vector (IV), ciphertext and tag and then MessagePack encode
  • Build hash of encrypted data, whether it was compressed and the schema version and then MessagePack encode
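To illustrate the AES step, here is a rough sketch comparing the raw binary parts of an AES encryption against the legacy style of Base64 and JSON wrapping. It assumes AES-256-GCM (the description only mentions an IV and a tag) and uses hypothetical helper names (`aes_encrypt`, `aes_decrypt`, `legacy_wrap`). It is not the PR's implementation, and the MessagePack encoding of the raw parts is omitted because it needs the msgpack gem.

```ruby
require 'json'
require 'openssl'

# Strict Base64 without line breaks (same as Base64.strict_encode64).
def b64(bytes)
  [bytes].pack('m0')
end

# Encrypt and return the raw binary IV, ciphertext and tag.
# Assumption: AES-256-GCM with empty associated data.
def aes_encrypt(plaintext, key)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  cipher.auth_data = ''
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

# Decrypt the raw parts produced by aes_encrypt.
def aes_decrypt(parts, key)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.decrypt
  cipher.key = key
  cipher.iv = parts[:iv]
  cipher.auth_tag = parts[:tag]
  cipher.auth_data = ''
  cipher.update(parts[:ciphertext]) + cipher.final
end

# Legacy-style wrapping: Base64 each part, JSON encode, then Base64 again.
def legacy_wrap(parts)
  b64(JSON.generate(iv: b64(parts[:iv]), ciphertext: b64(parts[:ciphertext]), tag: b64(parts[:tag])))
end

key = OpenSSL::Random.random_bytes(32) # hypothetical key for the example
parts = aes_encrypt(JSON.generate(session: 'x' * 200), key)
raw_size = parts.values.sum(&:bytesize)

puts "raw parts: #{raw_size} bytes, legacy wrapping: #{legacy_wrap(parts).bytesize} bytes"
```

A binary-safe encoding such as MessagePack can carry the three raw parts with only a few bytes of framing, which is where most of the savings in this step would come from.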

Preliminary testing suggests that we could reduce session size by 80+% in the common cases, and potentially more. This has been running in the load testing environment.

@mitchellhenke mitchellhenke requested a review from a team January 27, 2023 14:38
Contributor

@zachmargolis zachmargolis left a comment

LGTM! Is there any good spot to log what version a session is when we decrypt it? It would be good to get a sense of how many PII bundles are v2 vs. v3, how many sessions are v2 vs. v3, etc.

@mitchellhenke mitchellhenke force-pushed the mitchellhenke/small-session branch from ea9458f to ccfa365 Compare January 30, 2023 23:24
Contributor

PII::Fingerprinter.fingerprint returns a hex-digested fingerprint. We could save even more space by using raw bytes here.

Contributor Author

Oh, that's a good catch! I tried this a bit, and the improvements are relatively constant. It's 20-32 bytes saved regardless of the size of the total payload, so the savings may not necessarily scale. I think the only place it is used is once in the AES encryption, which would track with that.

244c4c063 is my draft attempt at this, I'm kind of on the fence on the tradeoff. If it's useful for the future, it would be more worthwhile though.
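For context on the size difference discussed in this thread, here is a small sketch comparing a hex digest with the raw digest bytes. It assumes an HMAC-SHA256 fingerprint, and the key, input, and helper names are hypothetical; the actual fingerprinter may differ.

```ruby
require 'openssl'

# Assumption: the fingerprint is an HMAC-SHA256 of the data.
def hex_fingerprint(key, data)
  OpenSSL::HMAC.hexdigest('SHA256', key, data)
end

def raw_fingerprint(key, data)
  OpenSSL::HMAC.digest('SHA256', key, data)
end

key = 'hypothetical-hmac-key'
data = 'serialized session data'

puts hex_fingerprint(key, data).bytesize # 64 (two hex characters per byte)
puts raw_fingerprint(key, data).bytesize # 32
```

The difference is a fixed 32 bytes per fingerprint no matter how large the payload is, which matches the observation that the savings do not scale with payload size.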

@jmhooper
Contributor

It looks like we get msgpack as a side-effect of having bootsnap installed. It doesn't look like a dep of rails. We should probably explicitly add the gem to the Gemfile

@mitchellhenke
Contributor Author

> It looks like we get msgpack as a side-effect of having bootsnap installed. It doesn't look like a dep of rails. We should probably explicitly add the gem to the Gemfile

Ah good catch, added

@mitchellhenke mitchellhenke force-pushed the mitchellhenke/small-session branch 3 times, most recently from 83aadcc to 5c54a88 Compare February 1, 2023 19:39
@mitchellhenke mitchellhenke force-pushed the mitchellhenke/small-session branch from 895a776 to 2bde069 Compare February 1, 2023 20:34
@mitchellhenke mitchellhenke marked this pull request as ready for review February 2, 2023 14:39
@mitchellhenke mitchellhenke force-pushed the mitchellhenke/small-session branch 3 times, most recently from 80ed2f4 to 37df769 Compare February 9, 2023 20:33
@mitchellhenke mitchellhenke force-pushed the mitchellhenke/small-session branch from 37df769 to 15ed9b1 Compare February 10, 2023 15:18
@mitchellhenke mitchellhenke merged commit 059afe8 into main Feb 10, 2023
@mitchellhenke mitchellhenke deleted the mitchellhenke/small-session branch February 10, 2023 15:46