Skip to content

Using String instead of LargeBinary for impl of EncryptedType#426

Merged
kvesteri merged 3 commits intokvesteri:masterfrom
aicioara-forks:0.36.1-fix-encrypted-type
May 3, 2020
Merged

Using String instead of LargeBinary for impl of EncryptedType#426
kvesteri merged 3 commits intokvesteri:masterfrom
aicioara-forks:0.36.1-fix-encrypted-type

Conversation

@aicioara
Copy link
Contributor

@aicioara aicioara commented Mar 15, 2020

Closes #425

LargeBinary is problematic in python3 (see issue description). I recommend that we use String instead of LargeBinary as the impl for EncryptedType.

self.fernet.encrypt() and all other encrypt() methods in this class already return base64 encodings of the data, they are just formatted as byte instead of str. It does not make much sense to store them as binary because:

  • encoding/decoding is problematic
  • They are BASE64 (i.e. string) already

I manually tested this change by encoding and decoding strings using all three encryption engines (AesEngine, AesGcmEngine, FernetEngine) and everything works as expected.

Additionally, I went ahead and included

git+git://github.com/aicioara-forks/sqlalchemy-utils@0.36.1-fix-encrypted-type#egg=SQLAlchemy-Utils

In my requirements.txt and my entire flask application works as expected.

Happy to take feedback to make this perfect.

@kvesteri kvesteri merged commit 0d9bee2 into kvesteri:master May 3, 2020
@leosussan
Copy link

leosussan commented May 7, 2020

This seems to have broken things for me & probably will for others.

In my implementation (PostgreSQL 11, using FernetEngine), up to this point, encrypted values are stored as a bytea column, which is serialized into a bytes in Python prior to encryption. bytes has no .encode() function and subsequently breaks here.

This would conceivably break on any non-string object when extracted from the database (e.g. the sa.Unicode implementation in the docs for EncryptedType).

Moving .encode() to the str() step & removing from value fixes extraction.

        if isinstance(value, six.text_type):
            value = str(value).encode()
        decrypted = self.fernet.decrypt(value)

This doesn't solve the problem of encryption into the DB, which expects bytea, not a string.

@aicioara
Copy link
Contributor Author

aicioara commented May 9, 2020

I am sorry. I made this PR as a suggestion, I was not expecting it to be merged as-is, but I assumed things are alright for all use cases given the merge. It's been a while since March 14 when I opened it first, so I cannot remember much, but is there a way of making this work for both myself and @leosussan ?

@leosussan
Copy link

leosussan commented May 11, 2020

I personally believe that your change is a good one, for the reasons you've outlined when you started this PR. To add to this, at least in Postgres, bytea is generally less performative than text when storing encoded data.

But - at the very least, the documentation / changelog should make clear that the change will break existing implementation.

For the sake of continuity, here's my current workaround. It works by subclassing EncryptedType, redefining impl, and reversing the adjustments:

class CustomEncryptedType(EncryptedType):
    impl = LargeBinary

    def process_bind_param(self, value, dialect):
        value = super().process_bind_param(value=value, dialect=dialect)
        if isinstance(value, str):
            value: bytes = value.encode()
        return value

    def process_result_value(self, value, dialect):
        if isinstance(value, bytes):
            value: str = value.decode()
        value: Optional[Any] = super().process_result_value(
            value=value, dialect=dialect
        )
        return value

It might make sense to introduce this as a legacy option, e.g. LegacyEncryptedType, but urge users of EncryptedType to write a migration for their respective databases.

rushilsrivastava added a commit to rushilsrivastava/sqlalchemy-utils that referenced this pull request May 23, 2020
kvesteri pushed a commit that referenced this pull request May 24, 2020
* update env and update dep

* add warnings

* clean up __init__

* fix references to StringEncryptedType

* fixed the init call to StringEncryptedType
@dsimidzija
Copy link

Hey folks, this warning is not good on its own; it implies that we can easily switch from one type to another, but the switch is not backwards compatible, and this is not made obvious. A better message would be to "switch to ByteEncryptedType/LegacyEncryptedType to keep working with LargeBinary" in the future, or alternatively make a roadmap on how to actually migrate the db from EncryptedType to StringEncryptedType without losing any data in the process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EnryptedType in python3 and postgres fails with 'string argument without an encoding'

4 participants