The same MRENCLAVE for both trusted and protected files #513

Open
AgentRX opened this issue Apr 10, 2022 · 44 comments
Labels
enhancement New feature or request P: 3

Comments

@AgentRX

AgentRX commented Apr 10, 2022

Hello,
My name is Andrei Pogoreltsev,
I'm the CTO of the Super Protocol team, and we are glad to use Gramine in our project. Thank you for the product you're developing!

We want to protect our clients' application code, which runs inside a TEE (enclave). The TEE hardware is not owned by the application owners. To do this, we want to use the protected files mechanism in the following way:

Prepare:

  1. We separate the "base image", which has some entrypoint, from the "application". For example, we can use a Python base image and run the /entrypoint/start.py script at startup. The base image does not need to be protected and can be reused by different applications.

  2. The application owner combines the base image and the application to create the Gramine manifest and obtain the MRENCLAVE. The "/entrypoint/start.py" path is both a trusted and a protected file. The main idea is that this allows checking the application before it runs and being sure nobody replaced the code, because Gramine can check trusted files against the hashes in the manifest.

  3. After that, we can protect the application (encrypt it with the PF mechanism) using some encryption key.

So the idea is simple:

  1. Separate the protected and unprotected code
  2. Prepare the Gramine manifest for the base image and the unencrypted application together
  3. Encrypt the application with some encryption key

Also, using the same base image we can create different protected applications with different MRENCLAVEs (see step 2).

To initialize the Gramine PF encryption key, we want to use the Secret Provisioning mechanism, which allows Gramine to get the key by connecting to our local server (which also runs inside an enclave). The server attests the enclave and sends the protection key that matches its MRENCLAVE.

One example of how this mechanism is used:

  1. First we run our local server
  2. We attest the server remotely
  3. Then all keys for the different protected applications are transferred to the server
  4. We copy all base images and protected applications to the server side
  5. Now we can run enclaves and send them keys using the Secret Provisioning mechanism

The problem
The main problem is that Gramine doesn't check a trusted file if it is also protected:
https://github.com/gramineproject/gramine/blob/master/Pal/src/host/Linux-SGX/db_files.c#L112-L168

So in this case an intruder (for example, the hardware owner) can use their own malware to obtain the encryption key of another protected application through a spoofing attack:

Prepare:

  1. The intruder steals the attacked application's Gramine manifest, which is possible because it's not encrypted
  2. Then the intruder creates their own protected application with the same base image BUT replaces its manifest with the attacked application's manifest
  3. Then the intruder sends the malware's encryption key to the trusted server

So now we have two protected applications with the same unencrypted base image and the same manifest, but the applications themselves are different.

Using:

  1. Now the intruder runs the malware application through the standard usage mechanism
  2. The malware is a simple client that tries to get a key from the trusted server using the Secret Provisioning mechanism, for example every second.
  3. Now we try to run the attacked application. To do so, the application owner sends the encryption key to the server
  4. The malware gets the key before the standard Secret Provisioning flow of the attacked application does, and decrypts the attacked application's content.

To solve the problem, we suggest checking trusted files even if they are also protected. We prepared a PR: https://github.com/Villain88/gramine/pull/1/files

You can also find an example of the attack here: https://github.com/Villain88/gramine-pf-poc

@boryspoplawski
Contributor

I think the problem is that you have one server holding all PF keys, which then forwards those keys to different enclaves. If you do that, then it's also your job to make sure that the enclaves are valid. If all enclaves (i.e. the one holding all these keys and those that are run later) are signed with the same key, then there is no problem - the attacker doesn't have this signing key. If they are signed using different keys, then I think your scenario is insecure by design - you effectively allow running arbitrary code in enclaves which you then supply with secrets.
Can this be made secure? Possibly.
Why can't the party providing the PF key just remotely attest the spawned enclave? You must attest the first one anyway - you can instead just attest the 2nd one.

Quoting your repo:

a flaw in the gramine protected files mechanism

I definitely fail to see what the flaw is in Gramine.

@dimakuv
Contributor

dimakuv commented Apr 11, 2022

@AgentRX Let me see if I understand your attack scenario correctly.

  • You have the "base image" (e.g., Python interpreter) with the generic manifest like this:
sgx.enclave_size = "4G"
sgx.thread_num = 4
... and so on ...
sgx.protected_files = [ "file:encrypted_script.py" ]
  • You also have the enhanced Secret Provisioning server, which keeps a bunch of PF keys, each corresponding to the MRENCLAVE + MRSIGNER tuple. So this server contains a map like this:
[
  (mrenclave1, mrsigner1) -> pfkey1,
  (mrenclave2, mrsigner1) -> pfkey2,
  (mrenclave3, mrsigner2) -> pfkey3,
]

Good run

  • You have a good "application" from Alice, with the same generic manifest (and thus the same MRENCLAVE).

    • Alice creates some random key pfkeyA and encrypts its helloworld.py script with this key, transforming it to encrypted_script.py,
    • Alice is now supposed to load her pfkeyA into the Secret Provisioning server,
    • so Alice connects to the Secret Provisioning server, attests it and verifies its MRENCLAVE+MRSIGNER,
    • Alice is sure now that the Secret Provisioning server is a good one, and sends pfkeyA to the server,
    • The Secret Provisioning server verifies that the MRENCLAVE+MRSIGNER of Alice is the expected "base" one.
  • After this preparatory step from Alice, the server contains:

[
  (mrenclaveA, mrsignerA) -> pfkeyA,
]
  • Now good "application" is registered by the Secret Provisioning server. The application can start now:
    • The application connects to the Secret Provisioning server using Secret Prov/RA-TLS flows of Gramine,
    • The Secret Provisioning server iterates through its map and finds the tuple (mrenclaveA, mrsignerA) and sends back the found key pfkeyA,
    • The application (more specifically, Gramine library that runs before the application itself) receives pfkeyA,
    • Now the application tries to open encrypted_script.py and it is decrypted with the pfkeyA,
    • The application runs the good helloworld.py script.

Attack run

  • There is a malicious "application" (malware) from Mallory, with the same generic manifest (and thus the same MRENCLAVE).

    • Mallory creates some random key pfkeyM and encrypts its maliciously-steal-pf-key.py script with this key, transforming it to encrypted_script.py,
    • Mallory now loads her pfkeyM into the Secret Provisioning server,
    • so Mallory connects to the Secret Provisioning server, attests it and verifies its MRENCLAVE+MRSIGNER,
    • Mallory is sure now that the Secret Provisioning server is a good one, and sends pfkeyM to the server,
    • The Secret Provisioning server verifies that the MRENCLAVE+MRSIGNER of Mallory is the expected "base" one.
  • After this preparatory step from Mallory, the server contains:

[
  (mrenclaveA, mrsignerA) -> pfkeyM,
]
  • Now Mallory's "malware application" is registered by the Secret Provisioning server. The application can start now:

    • The malware application connects to the Secret Provisioning server using Secret Prov/RA-TLS flows of Gramine,
    • The Secret Provisioning server iterates through its map and finds the tuple (mrenclaveM, mrsignerM) and sends back the found key pfkeyM,
    • The malware application (more specifically, Gramine library that runs before the application itself) receives pfkeyM,
    • Now the malware application tries to open encrypted_script.py and it is decrypted with the pfkeyM,
    • The malware application runs the attacking script maliciously-steal-pf-key.py, which spins forever and periodically sends requests to the Secret Provisioning server.
  • Finally, the attacked application from Alice is starting:

    • Alice registers the application in the Secret Provisioning server,
    • The server simply overwrites its mapping: [ (mrenclaveA, mrsignerA) -> pfkeyA, ],
  • Now Mallory's malware app (that was running in the background all this time) sends a request again and gets the key pfkeyA

    • The attacking script maliciously-steal-pf-key.py simply prints out this key,
    • Alice's key is stolen, game over.

@dimakuv
Contributor

dimakuv commented Apr 11, 2022

If the flows I outlined above are correct, then I think you need to harden the Secret Provisioning server.

The main problem is that the Secret Provisioning server updates keys and sends keys as many times as it is asked to. In particular, the attacker uses the fact that he/she can request the key again and again, even if the key was changed in the meantime.

So a simple defense against this attack would be a cryptographic nonce:

  • When registering the key, the application generates and sends a nonce to the Secret Provisioning server,
  • The Secret Provisioning server memorizes this nonce ([ (mrenclaveA, mrsignerA, nonceA) -> pfkeyA, ]),
  • Now the application must provide the same nonce to actually get the PF key.

The attacker (Mallory) won't be able to guess the nonce at the second-to-last step, where Alice sent her pfkeyA with nonceA to the Secret Provisioning server. Thus, Mallory can't steal the key of Alice.

Of course, this needs some changes to Gramine's Secret Provisioning library. But changing this library is far better than changing Gramine's core source code.
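
A minimal sketch of the nonce idea, assuming a toy in-memory server (the class, its methods and the 16-byte nonce size are hypothetical and not part of Gramine's Secret Provisioning library). Entries are keyed by identity plus nonce, so a client with the same MRENCLAVE and MRSIGNER but a different nonce only ever gets the key registered with its own nonce.

```python
import hmac
import secrets


class NonceProvisioningServer:
    """Toy server (hypothetical) storing (mrenclave, mrsigner, nonce) -> PF key."""

    def __init__(self):
        self._entries = []  # list of (mrenclave, mrsigner, nonce, pf_key)

    def register(self, mrenclave, mrsigner, nonce, pf_key):
        self._entries.append((mrenclave, mrsigner, nonce, pf_key))

    def request_key(self, mrenclave, mrsigner, nonce):
        for enc, signer, stored_nonce, pf_key in self._entries:
            same_identity = (enc, signer) == (mrenclave, mrsigner)
            # Constant-time comparison so the nonce is not leaked through timing.
            if same_identity and hmac.compare_digest(stored_nonce, nonce):
                return pf_key
        return None


IDENTITY = ("mrenclaveA", "mrsignerA")
nonce_a = secrets.token_bytes(16)  # known only to Alice's application
nonce_m = secrets.token_bytes(16)  # known only to Mallory's malware

server = NonceProvisioningServer()
server.register(*IDENTITY, nonce_m, b"pfkeyM")
server.register(*IDENTITY, nonce_a, b"pfkeyA")

mallory_gets = server.request_key(*IDENTITY, nonce_m)
alice_gets = server.request_key(*IDENTITY, nonce_a)
print(mallory_gets, alice_gets)
```

The sketch does not address how the nonce reaches the application inside the enclave; that part depends on the deployment.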

@mkow
Member

mkow commented Apr 11, 2022

Instead of nonces I'd just require the application to sign the add-key request with the mrsigner key. This way only the owner of mrsigner key could modify the corresponding entries on the provisioning server.

@boryspoplawski
Contributor

@dimakuv how do A and M both have the same mrsignerA??

@AgentRX
Author

AgentRX commented Apr 11, 2022

@dimakuv Yep, your pipeline is right, but you missed some things:

  1. You can initialize the malware with your own server. We use a unified secret provisioning server because that's our architecture decision, but an attacker can create their own initializer to run the malware.
  2. To use a nonce from the enclave, we would have to put it in the protected area and change either the startup mechanism or the secret provisioning library.

In our architecture the server also re-encrypts the applications, and we have an idea for a solution:

  1. During re-encryption we generate a nonce and add it to the application
  2. Then our server generates a new run manifest with a new MRENCLAVE, but it's not a beautiful idea

@AgentRX
Author

AgentRX commented Apr 11, 2022

@dimakuv how do A and M both have the same mrsignerA??

Because M stole manifest file of A

@boryspoplawski
Contributor

Because M stole manifest file of A

So I can send an arbitrary (mrenclave, mrsigner, PFkey) tuple to the server and it gets accepted? That sounds like a big issue.

@AgentRX
Author

AgentRX commented Apr 11, 2022

So I can send an arbitrary (mrenclave, mrsigner, PFkey) tuple to the server and it gets accepted? That sounds like a big issue.

The main idea of the malware is to get the PF key of A by presenting (mrenclave, mrsigner). If M steals the manifest of A, the secret provisioning server can't tell them apart based on the manifest alone, because Gramine doesn't check the protected part when the malware is run from a protected file. So M can use the manifest of A.

@boryspoplawski
Contributor

But why does your server allow anyone to change keys of other parties without verification?

@dimakuv
Contributor

dimakuv commented Apr 11, 2022

Unrelated to the current discussion of these attack scenarios:

We had a meeting today where we discussed the idea of hashed + encrypted Protected Files. Gramine developers are fine with adding such a feature, with the following (preliminary) implementation details:

  • Each such protected file must be mounted separately (in the new fs-mount scheme developed by Pawel), with a new field sha256 = "< hash >".
  • The hash is calculated over the first 4KB (the "metablock") of the protected file (i.e., the hash over the final, encrypted contents)
    • we do not want to hash over the plaintext file, because it lowers confidentiality guarantees on the file (knowing the hash, it would be easy to find the corresponding file, if it is "famous");
    • we do not want to hash over the whole encrypted file, because loading the whole encrypted file on each open() may lead to sub-optimal performance and won't give any additional security benefits, since the first 4KB ("metablock") already identifies the file uniquely.
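
A small sketch of how such a manifest hash could be computed, assuming the hash is a plain SHA-256 over the first 4096 bytes of the already encrypted file. The helper name is hypothetical, and the demo uses random bytes as a stand-in for a real encrypted file.

```python
import hashlib
import os
import tempfile

METABLOCK_SIZE = 4096  # the "first 4KB" mentioned above


def metablock_sha256(path: str) -> str:
    """Return the SHA-256 hex digest of the first 4 KB of an (already encrypted) file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(METABLOCK_SIZE)).hexdigest()


# Demo: random bytes stand in for an encrypted file (assumption for illustration).
blob = os.urandom(3 * METABLOCK_SIZE)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(blob)
digest = metablock_sha256(tmp.name)
os.remove(tmp.name)
print(f'sha256 = "{digest}"')
```

Only the first block is read, so the cost of the check does not grow with the file size.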

@AgentRX
Author

AgentRX commented Apr 11, 2022

But why does your server allow anyone to change keys of other parties without verification?

Our server is like a task manager within a higher-level architecture. We simplified it just to show the problem.

@AgentRX
Author

AgentRX commented Apr 11, 2022

Unrelated to the current discussion of these attack scenarios:

We had a meeting today where we discussed the idea of hashed + encrypted Protected Files. Gramine developers are fine with adding such a feature, with the following (preliminary) implementation details:

  • Each such protected file must be mounted separately (in the new fs-mount scheme developed by Pawel), with a new field sha256 = "< hash >".

  • The hash is calculated over the first 4KB (the "metablock") of the protected file (i.e., the hash over the final, encrypted contents)

    • we do not want to hash over the plaintext file, because it lowers confidentiality guarantees on the file (knowing the hash, it would be easy to find the corresponding file, if it is "famous");
    • we do not want to hash over the whole encrypted file, because loading the whole encrypted file on each open() may lead to sub-optimal performance and won't give any additional security benefits, since the first 4KB ("metablock") already identifies the file uniquely.

Good news. Will your team implement it, or can we participate in the development?

@AgentRX
Author

AgentRX commented Apr 11, 2022

@dimakuv will Gramine support multiple keys for different files in the new fs-mount scheme developed by Pawel (#358)?

We ask because we re-encrypt the application and data with a session key, so that neither can steal the other's content while they can still work together.

For example, we have a Python neural-network speech recognition script in /run/index.py and speech samples in /data/speechOne, /data/speechTwo, etc. They are all encrypted with different keys, but our server re-encrypts them with a session key. With the trusted + protected files model that you suggested, we can't re-encrypt them and will need support for multiple PF keys. But how should we express that in the manifest?

Or maybe the new FS volumes will support different keys without changing the manifest?

@boryspoplawski
Contributor

But why does your server allow anyone to change keys of other parties without verification?

Our server is like a task manager within a higher-level architecture. We simplified it just to show the problem.

But either it accepts such requests or it does not, and M having the enclave + signed manifest of A doesn't change anything. Besides, the manifest is assumed to be known to everyone (public), at least in our security model.

will Gramine support multikeys for different files in the new fs-mount scheme developed by Pawel

Support for this is already pending in #504. Afaik it allows setting a key per mount (note that you can just mount each file individually).

@AgentRX
Author

AgentRX commented Apr 11, 2022

But why does your server allow anyone to change keys of other parties without verification?
Our server is like a task manager within a higher-level architecture. We simplified it just to show the problem.
But either it accepts such requests or it does not, and M having the enclave + signed manifest of A doesn't change anything. Besides, the manifest is assumed to be known to everyone (public), at least in our security model.

Yes, the manifest is public, and M can be run separately from the server (with PFKeyA), for example to mount a spoofing attack (it's not a replay attack).

will Gramine support multikeys for different files in the new fs-mount scheme developed by Pawel
Support for this is already pending in #504. Afaik it allows setting a key per mount (note that you can just mount each file individually).

Thank you

@dimakuv
Contributor

dimakuv commented Apr 12, 2022

Good news. Will your team implement it, or can we participate in the development?

Our team will implement it. Currently Pawel is working on the Protected Files rewrite (the new scheme that you mentioned), and he will add this feature as well.

Our hope is that this feature will be useful for your scenarios. So if some design/implementation decisions are bad for your scenarios, please don't hesitate to ask. For example, please note that our current design is to hash the first 4KB of the encrypted file, and this hash is what is put in the manifest file. We hope this works for you.

Or mb new FS volumes will support different keys without changing manifest?

Yes, as Borys mentioned, you can check #504. Each FS volume specifies a key name (e.g. my_custom_key), and the actual key value can always be modified by writing into a pseudo-file /dev/attestation/keys/my_custom_key. We will update our RA-TLS and Secret Prov examples similarly.
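
As a rough illustration of the pseudo-file approach, the sketch below writes a key under /dev/attestation/keys/<key_name>. The helper name is hypothetical, and the assumption that the key is written as 16 raw bytes should be checked against the Gramine documentation for the version in use. Outside an enclave the pseudo-files do not exist, so the demo targets a temporary directory instead.

```python
import os
import tempfile

KEYS_DIR = "/dev/attestation/keys"  # pseudo-directory named in the comment above
KEY_SIZE = 16  # assumption: a raw 128-bit key


def install_key(key: bytes, key_name: str, keys_dir: str = KEYS_DIR) -> None:
    """Write a key to keys_dir/key_name so that mounts referencing key_name use it."""
    if len(key) != KEY_SIZE:
        raise ValueError(f"expected a {KEY_SIZE}-byte key, got {len(key)} bytes")
    with open(os.path.join(keys_dir, key_name), "wb") as f:
        f.write(key)


# Demo against a temporary directory, since the pseudo-files only exist inside Gramine.
with tempfile.TemporaryDirectory() as demo_dir:
    install_key(bytes(range(16)), "my_custom_key", keys_dir=demo_dir)
    with open(os.path.join(demo_dir, "my_custom_key"), "rb") as f:
        written = f.read()
print(written.hex())
```

Inside an enclave, the default keys_dir would be used and the key would typically come from the Secret Provisioning server rather than a constant.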

@AgentRX
Author

AgentRX commented Apr 16, 2022

@dimakuv sorry for the late reply

Yes, as Borys mentioned, you can check #504. Each FS volume specifies a key name (e.g. my_custom_key), and the actual key value can always be modified by writing into a pseudo-file /dev/attestation/keys/my_custom_key. We will update our RA-TLS and Secret Prov examples similarly.

Of course, it's a good feature, but it won't update the secret provisioning mechanism that we use, so it will take time to implement this feature for secret provisioning (designing and implementing the protocol, etc.).

Our hope is that this feature will be useful for your scenarios. So if some design/implementation decisions are bad for your scenarios, please don't hesitate to ask. For example, please note that our current design is to hash the first 4KB of the encrypted file, and this hash is what is put in the manifest file. We hope this works for you.

It could be great to allow use hash for unprotected files. Let me describe our logic:

  1. Our TEE server (the trusted enclave server) is currently just a task manager for different confidential tasks. Every task has an unprotected base image plus an application and data from different providers, which are protected with different keys.
  2. That's why we use a session protection key, generated inside our trusted server, for every task. The trusted server re-encrypts the application and data with the session key and combines them with the base image to run. The server also collects task results from an encrypted output folder.

If you allow using a hash of the unprotected application, it will fit our solution ideally. Otherwise, in the current situation, we will need to regenerate the manifest file with a session MRSIGNER inside the trusted server. Of course, once you support multiple protection keys in the secret provisioning mechanism, we'll simplify the server to just return the necessary keys, and it will work correctly with your current design.

@dimakuv
Contributor

dimakuv commented Apr 20, 2022

It could be great to allow use hash for unprotected files.

The main problem with this approach is that hash(plaintext-file) reveals information about the file: an attacker may calculate hashes of a set of well-known files (e.g., Python scripts) and compare them with your hash. If you're using some well-known Python script, then the attacker will easily figure out which script is used. This leaks information about the code you want to run in the enclave, and this is a significant security risk.

Of course when you support multiple protection keys with secret provisioning mechanism we'll simplify server solution just to return necessary keys and it will work correctly with your current decision.

This sounds like a reasonable solution. If I understand you correctly, your current implementation of the TEE server generates a new key and re-encrypts all Protected Files with this key, and then sends this key to the SGX enclave. But your future implementation of the TEE server will not generate the key but instead will get the key from the code/data provider and forward this key to the SGX enclave. Thus, there won't be any need to re-generate the manifest file, because hash(encrypted-file) will be correctly calculated by Gramine in the final SGX enclave (as Gramine will get the same key as the code/data provider encrypted all Protected Files with in the first place).

@boryspoplawski
Contributor

boryspoplawski commented Apr 20, 2022

The main problem with this approach is that hash(plaintext-file) reveals the information about the file

Also this gives an oracle - given a plaintext (file) you can tell whether it matches the ciphertext. When hashing encrypted part, you wouldn't be able to tell if plaintext and ciphertext match.
I guess it's the same, just restated to have more severe implications.
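
The oracle can be made concrete with a few lines. This is a sketch with made-up file contents and a hypothetical helper name: anyone who can read the published plaintext hash can confirm or reject a guess about the file without ever seeing the key.

```python
import hashlib


def confirms_guess(candidate: bytes, published_sha256: str) -> bool:
    """Return True if the candidate plaintext matches the published plaintext hash."""
    return hashlib.sha256(candidate).hexdigest() == published_sha256


# The file owner publishes the hash of the plaintext (made-up content for illustration).
real_script = b"print('speech recognition v1')\n"
published = hashlib.sha256(real_script).hexdigest()

# The attacker tests guesses against the public hash; no decryption key is needed.
guess_right = confirms_guess(real_script, published)
guess_wrong = confirms_guess(b"print('something else')\n", published)
print(guess_right, guess_wrong)
```

With a hash over the ciphertext, the same check would require the attacker to encrypt the guess first, which needs the key.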

@AgentRX
Author

AgentRX commented Apr 21, 2022

@dimakuv @boryspoplawski I got your point. So we'll wait for you to implement the new mechanisms and then integrate them into our solution.

@AgentRX
Author

AgentRX commented Apr 22, 2022

BTW thank you very much. We added some architectural features to strengthen the protection of the whole solution.

@Villain88

Unrelated to the current discussion of these attack scenarios:

We had a meeting today where we discussed the idea of hashed + encrypted Protected Files. Gramine developers are fine with adding such a feature, with the following (preliminary) implementation details:

  • Each such protected file must be mounted separately (in the new fs-mount scheme developed by Pawel), with a new field sha256 = "< hash >".

  • The hash is calculated over the first 4KB (the "metablock") of the protected file (i.e., the hash over the final, encrypted contents)

    • we do not want to hash over the plaintext file, because it lowers confidentiality guarantees on the file (knowing the hash, it would be easy to find the corresponding file, if it is "famous");
    • we do not want to hash over the whole encrypted file, because loading the whole encrypted file on each open() may lead to sub-optimal performance and won't give any additional security benefits, since the first 4KB ("metablock") already identifies the file uniquely.

Hello. Is this feature still not implemented? I couldn't find anything similar in the documentation or commits.

@mkow
Member

mkow commented Jun 17, 2022

@Villain88: Nope, it's not implemented yet (you can see that this issue is still open ;) ).

@dimakuv
Contributor

dimakuv commented Jun 7, 2023

Looking at this old thread again.

I still don't think the scenario described by @AgentRX can only be solved by the encrypted-and-hashed files (i.e., the combination of Encrypted Files + Trusted Files, as we call them in Gramine). I'm missing the details of the proposed deployment and why e.g. hardening of the initial communication with the Secret Provisioning server is not enough.

Nevertheless, I'd like to put here some of my notes. I still don't know if this is a good idea, and if this is actually helpful to anyone (possibly? the idea doesn't sound terrible to me)...

It could be great to allow use hash for unprotected files.

(What was meant by this is "calculate the hash over the decrypted file contents".)

I think this part is actually trivial to implement:

  1. Add smth like fs.mounts = [ { type = "encrypted", ..., sha256 = "abcd1234" } ], and enforce that such FS mounts take as input only the files, not the directories.
  2. Upon opening such an encrypted file, we read the whole file in the in-enclave-memory buffer and take a hash over it, and compare with the one specified in the manifest. This is probably slow, but this also should affect only these special files (so should be rare).
  3. We can put a note in the documentation that this construction leads to a possibility of the known-plaintext attack (though the AES algorithm used in our Encrypted Files is known to be resistant to such attacks).
    • But note this: knowing the hash, it would be easy to find the corresponding file, if it is "famous". So this construction cannot be used to "hide" the contents of the code (unless it's truly unique).

The main problem with this approach is that hash(plaintext-file) reveals the information about the file

Also this gives an oracle - given a plaintext (file) you can tell whether it matches the ciphertext. When hashing encrypted part, you wouldn't be able to tell if plaintext and ciphertext match.
I guess it's the same, just restated to have more severe implications.

I don't think it has any severe implications? We use AES in Encrypted/Protected Files in Gramine, and it's resistant to known-plaintext attacks: https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks/

@Villain88

Villain88 commented Jun 7, 2023

I still don't think the scenario described by @AgentRX can only be solved by the encrypted-and-hashed files (i.e., the combination of Encrypted Files + Trusted Files, as we call them in Gramine). I'm missing the details of the proposed deployment and why e.g. hardening of the initial communication with the Secret Provisioning server is not enough.

Hardening the initial communication with the Secret Provisioning server does not protect against a TOCTOU attack.
An attacker can replace the file after it has been checked. How they would do it is another question, but such a mechanism looks unsafe.
"Encrypted-and-hashed" files provide reliable protection throughout the operation of the enclave.

@AgentRX
Author

AgentRX commented Jun 7, 2023

I still don't think the scenario described by @AgentRX can only be solved by the encrypted-and-hashed files (i.e., the combination of Encrypted Files + Trusted Files, as we call them in Gramine). I'm missing the details of the proposed deployment and why e.g. hardening of the initial communication with the Secret Provisioning server is not enough.

The main idea of our suggestion is to include critical protected files in the integrity check. We thought a lot about possible vulnerabilities in our solution but haven't found any. Maybe your team could analyse it better.

  • But note this: knowing the hash, it would be easy to find the corresponding file, if it is "famous". So this construction cannot be used to "hide" the contents of the code (unless it's truly unique).

Sha256 is a cryptographic hash and it's too hard to create dictionaries of "famous" files for it. I think the complexity of the task is not less than enumeration complexity but I could be wrong.

@mkow
Member

mkow commented Jun 9, 2023

Attacker can replace the file after checking it. How he will do it is another question, but such a mechanism looks unsafe.
"Encrypted-and-hashed" files provide reliable protection throughout the operation of the enclave.

But in that case the attacker won't know the key which the enclave uses, because it would only use the keys provisioned by a verified server? So, it won't be able to modify the encrypted file without being detected.

Sha256 is a cryptographic hash and it's too hard to create dictionaries of "famous" files for it.

Being cryptographic or not is completely irrelevant here. You can create such dictionaries as long as the function is deterministic and the output is big and +/- uniformly distributed, and the computation time is bearable.

I think the complexity of the task is not less then enumeration complexity but I could be wrong.

In practice this whole "complexity" is just entering the hash into Google, which is exactly an example of such a dictionary.

@dimakuv
Contributor

dimakuv commented Jun 9, 2023

Sha256 is a cryptographic hash and it's too hard to create dictionaries of "famous" files for it.

Being cryptographic or not is completely irrelevant here. You can create such dictionaries as long as the function is deterministic and the output is big and +/- uniformly distributed, and the computation time is bearable.

I think the complexity of the task is not less then enumeration complexity but I could be wrong.

In practice this whole "complexity" is just entering the hash into Google, which is exactly an example of such a dictionary.

Quick note here: if we ask the user to add a unique salt, then we protect at least against rainbow tables.

I envision it like this: fs.mounts = [ { type = "encrypted", ..., salt = "dcba4321", sha256 = "abcd1234" } ].

To uniquify the file, the user will do smth like the Python script:

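import base64, hashlib, os

# Read the plaintext file whose hash should go into the manifest.
with open('my_file', 'rb') as f:
    data = f.read()

# Random per-file salt (base64-encoded bytes), prepended to the data before hashing.
salt = base64.b64encode(os.urandom(32))
digest = hashlib.sha256(salt + data).hexdigest()
print(f"salt: {salt.decode()}, hash: {digest}")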

@mkow
Member

mkow commented Jun 9, 2023

No, please don't, this is just making the attack a bit slower/harder but doesn't really change anything. We just shouldn't hash the plaintext, it's a risky idea in general and I don't see a justification to implement it.
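
A short sketch of why a salt stored next to the hash does not remove the leak, assuming the sha256(salt + data) construction from the script above and made-up candidate files. The salt defeats precomputed tables, but an attacker who reads salt and hash from the public manifest can still test any candidate file directly.

```python
import base64
import hashlib
import os


def salted_hash(salt: bytes, data: bytes) -> str:
    """Same construction as in the script above: sha256(salt + data)."""
    return hashlib.sha256(salt + data).hexdigest()


# What the public manifest would contain under the proposed scheme.
secret_file = b"import http.server\n"
manifest_salt = base64.b64encode(os.urandom(32))
manifest_hash = salted_hash(manifest_salt, secret_file)

# The attacker knows both manifest values and simply re-hashes candidate files.
candidates = {
    "hello.py": b"print('hello')\n",
    "server.py": b"import http.server\n",
}
matches = [name for name, data in candidates.items()
           if salted_hash(manifest_salt, data) == manifest_hash]
print(matches)
```

The extra cost for the attacker is one hash per candidate per manifest, which is the "a bit slower/harder" effect described above.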

@dimakuv
Contributor

dimakuv commented Jun 12, 2023

No, please don't, this is just making the attack a bit slower/harder but doesn't really change anything. We just shouldn't hash the plaintext, it's a risky idea in general and I don't see a justification to implement it.

Yes, after some additional thoughts and googling, I'm also leaning towards the view that this is simply unsafe and a broken crypto structure. I'm currently collecting opinions from other folks who are better at crypto than myself, but I'm also starting to believe that hash-then-encrypt is plain wrong.

The encrypt-then-hash structure seems fine though, and surely simpler to implement.

@lejunzhu
Contributor

The encrypt-then-hash structure seems fine though, and surely simpler to implement.

Encrypt-then-hash means that changing the key will require a new manifest, right?

If that's the case, the original problem may have an easier solution. Just use a "fingerprint" function and calculate fpA = fingerprint(pfkeyA). Write fpA to a text file and make it a trusted file (so that its hash gets into the manifest and thus into MRENCLAVE). Then, after reading pfkeyA from the server, the common code always checks fpA == fingerprint(pfkeyA). Then the attacker M won't be able to run his malicious script, because he doesn't have pfkeyA in the first place and the common code will refuse to decrypt his script.

I'm not a crypto expert, so I'm not sure which fingerprint function is secure, but there are some options like sha512(key), a KDF, etc.
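
A minimal sketch of the fingerprint check, using sha512(key) only because it is one of the options named above; whether that choice is adequate is left open here, as in the comment. The function names and key values are hypothetical. The expected fingerprint is assumed to come from a file that is measured together with the manifest.

```python
import hashlib
import hmac


def fingerprint(key: bytes) -> str:
    """Fingerprint of a PF key; sha512 is one of the options named above."""
    return hashlib.sha512(key).hexdigest()


def key_is_expected(provisioned_key: bytes, expected_fp: str) -> bool:
    """Common startup code: accept the provisioned key only if its fingerprint matches."""
    return hmac.compare_digest(fingerprint(provisioned_key), expected_fp)


pfkey_a = bytes.fromhex("00112233445566778899aabbccddeeff")  # Alice's key (made up)
pfkey_m = bytes.fromhex("ffeeddccbbaa99887766554433221100")  # Mallory's key (made up)
fp_a = fingerprint(pfkey_a)  # value shipped with, and measured as part of, the enclave

alice_ok = key_is_expected(pfkey_a, fp_a)
mallory_ok = key_is_expected(pfkey_m, fp_a)
print(alice_ok, mallory_ok)
```

With Alice's manifest, only pfkeyA passes the check, so an enclave that received pfkeyM would stop before decrypting anything.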

@AgentRX
Author

AgentRX commented Jun 13, 2023

No, please don't, this is just making the attack a bit slower/harder but doesn't really change anything. We just shouldn't hash the plaintext, it's a risky idea in general and I don't see a justification to implement it.

Can you really reverse SHA-256? Is trying to reverse a cryptographic hash really a risk?

P.S. Just a reminder: we are talking about big files to run with different data.

@mkow
Member

mkow commented Jun 13, 2023

P.S. Just a reminder: we are talking about big files to run with different data.

In your specific case. But you want us to add this as a generic feature to Gramine, which various users will use differently on their data.

Can you really reverse SHA-256? Is trying to reverse a cryptographic hash really a risk?

? See the discussion above; I think it already explains exactly what you can do when you know the hash.

@AgentRX
Author

AgentRX commented Jun 13, 2023

In your specific case. But you want us to add this as a generic feature to Gramine, which various users will use differently on their data.

I agree, but encrypt-then-hash is not a good solution either, because you have to keep your data encrypted with the same key all the time or regenerate the manifest. Also data couldn't be packed in this case.

? See the discussion above; I think it already explains exactly what you can do when you know the hash.

I don't agree that SHA-256 rainbow tables or brute-forcing are a big danger, but I have an idea to solve this problem:

Can you use a structure like this?
fs.mounts = [ { type = "encrypted", ..., encryptedSalt = "cipherSalt", sha256 = "abcd1234" } ]

So in this case you can use encryptedSalt and the hash of plaintext XOR salt.

@mkow
Member

mkow commented Jun 13, 2023

Also data couldn't be packed in this case.

What do you mean?

Can you use a structure like this? fs.mounts = [ { type = "encrypted", ..., encryptedSalt = "cipherSalt", sha256 = "abcd1234" } ]

This doesn't change the amount of information leaked about the original file at all. Only makes the attack slower.

@AgentRX
Author

AgentRX commented Jun 13, 2023

Also data couldn't be packed in this case.

What do you mean?

You can't use compression on encrypted files. For example, if you create a tar.gz for a huge Python solution, compressing it could be very useful.

If you use protected files from the start they won't be compressed.

Can you use a structure like this? fs.mounts = [ { type = "encrypted", ..., encryptedSalt = "cipherSalt", sha256 = "abcd1234" } ]

This doesn't change the amount of information leaked about the original file at all. Only makes the attack slower.

This also doesn't fit our solution, because we generate session keys to re-encrypt files and would lose the salt in this case.

Maybe you can use SHA-512, but I think rainbow tables are not relevant to big files, and it doesn't help in these cases.

@mkow
Member

mkow commented Jun 13, 2023

Also data couldn't be packed in this case.

What do you mean?

You can't use compression on encrypted files. For example, if you create a tar.gz for a huge Python solution, compressing it could be very useful.

If you use protected files from the start they won't be compressed.

You always have to first compress and then encrypt, but how is this related to the discussion?

Maybe you can use SHA-512, but I think rainbow tables are not relevant to big files, and it doesn't help in these cases.

As said previously, rainbow tables are just a speed optimization to reduce the cost of repeated attacks.
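
A minimal sketch of why a published plaintext hash is a problem when the file has little entropy, and why precomputation is only a speed-up. The config template and the 4-digit PIN are hypothetical; the dictionary below is a plain precomputed lookup table (a simpler relative of a rainbow table), used here only to show the idea.

```python
import hashlib
import itertools
import string

def guess_plaintext(target_hash, candidates):
    """Return the candidate whose SHA-256 equals target_hash, or None."""
    for candidate in candidates:
        if hashlib.sha256(candidate).hexdigest() == target_hash:
            return candidate
    return None

def all_candidates(template):
    """Enumerate every file that fits the known template (10,000 PINs)."""
    for digits in itertools.product(string.digits, repeat=4):
        yield template.format(pin="".join(digits)).encode()

# Hypothetical small config file: the layout is known, only the PIN is secret.
template = '{{"user": "admin", "pin": "{pin}"}}'
secret_file = template.format(pin="4821").encode()
published_hash = hashlib.sha256(secret_file).hexdigest()

# Direct attack: hash every candidate and compare with the published hash.
recovered = guess_plaintext(published_hash, all_candidates(template))

# Precomputation: the same work done once, reused for every later target.
table = {hashlib.sha256(c).hexdigest(): c for c in all_candidates(template)}
recovered_from_table = table.get(published_hash)
```

Both paths recover the same file; the table changes only how much work each additional attack costs.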

@AgentRX
Author

AgentRX commented Jun 13, 2023

You always have to first compress and then encrypt, but how is this related to the discussion?

Of course, you can first compress and then encrypt, but does the protected files mechanism support working from compressed files? If not, then it turns out that users will also need to come up with a solution to decompress files, am I right? That's why this is relevant to the topic under discussion.

Maybe you can use SHA-512, but I think rainbow tables are not relevant to big files, and it doesn't help in these cases.

As said previously, rainbow tables are just a speed optimization to reduce the cost of repeated attacks.

Could you describe the risk of attacking a configuration stored in a 256-byte JSON file, for example? And how would rainbow tables help in this case?

@mkow
Member

mkow commented Jun 13, 2023

Of course, you can first compress and then encrypt

"can" -> "have to". The other way just doesn't do anything.

but does the protected files mechanism support working from compressed files?

No, why would it? It's a mechanism to encrypt application files, why would it compress them?

If not, then it turns out that users will also need to come up with a solution to decompress files, am I right? That's why this is relevant to the topic under discussion.

If you want your app to use compressed files, then just compress them and decompress them in the app. If you want to propose a feature where Gramine compresses the mounted files and decompresses them on the fly, then I'll be against it: it's the app's role to do this, because it's the app which knows the file formats and how to compress them (e.g. if you work on images, it's your app's role to pick the right image format and use it; making Gramine compress your BMPs into PNGs makes no sense).
Whichever of these two you meant, this is irrelevant to protected files.
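
A minimal sketch of the ordering point, with the app doing its own decompression. It assumes a toy stream cipher (`keystream_xor`, SHA-256 in counter mode) as a stand-in for real encryption; it is not Gramine's protected-files format and is not secure. The sample data and key are hypothetical.

```python
import hashlib
import zlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only (hypothetical helper, not secure)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        # One 32-byte pad per block, derived from the key and a block counter.
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

data = b"model weights and python sources " * 2000  # highly repetitive input
key = b"K" * 16

# Compress first, then encrypt: the compressor still sees the redundancy.
compress_then_encrypt = keystream_xor(key, zlib.compress(data))

# Encrypt first, then compress: ciphertext looks random, so nothing is saved.
encrypt_then_compress = zlib.compress(keystream_xor(key, data))

# Inside the app: decrypt, then decompress, with no help from the runtime.
restored = zlib.decompress(keystream_xor(key, compress_then_encrypt))

print(len(data), len(compress_then_encrypt), len(encrypt_then_compress))
```

The first ordering yields a much smaller file, and the app restores the original bytes on its own.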

@dimakuv
Contributor

dimakuv commented Jun 14, 2023

I created a PR that explores one way of encrypting-and-hashing files: #1390

@AgentRX
Author

AgentRX commented Jun 14, 2023

Can you implement hashing of source files as a separate option, or think about how to bind the hash to the source file without tying it to encryption? I understand that it is not safe for very small files, but I have the following arguments:

  1. In our solution we store the entire structure of the files, first putting them in a tar.gz, then encrypting them with aes-256-gcm. Just before launching in Gramine, we decompress the files and re-encrypt them using the protected files mechanism.

  2. If we use the protected files mechanism from the start, this will impose restrictions on compression: users will have to work with compressed files, for example gz / zip, which is very inconvenient for them. Moreover, uncompressed data is expensive to store and transfer, because it can contain trained models 1-2 GB in size.

  3. If users prepare and protect small files (for example, smaller than 256 bytes), our CLI will warn about it.

@dimakuv
Contributor

dimakuv commented Jul 21, 2023

@lejunzhu Could you expand on your comment:

If that's the case, the original problem may have an easier solution. Just use a "fingerprint" function and calculate fpA = fingerprint(pfkeyA). Write fpA to a text file and make it a protected file (which gets into MRENCLAVE). Then, after reading pfkeyA from the server, the common code always checks fpA == fingerprint(pfkeyA). Then the attacker M won't be able to run his malicious script, because he doesn't have pfkeyA in the first place and the common code will refuse to decrypt his script.

In particular, what does "Write fpA to a text file and make it a protected file (which gets into MRENCLAVE)" mean? Protected files do not get into MRENCLAVE. So what do you mean by this?

Also, if fpA got into MRENCLAVE, we would be tied to this particular key (pfkeyA). So we're back to the original problem: the folks do not want to be tied to a particular key at all.

@lejunzhu
Contributor

If that's the case, the original problem may have an easier solution. Just use a "fingerprint" function and calculate fpA = fingerprint(pfkeyA). Write fpA to a text file and make it a protected file (which gets into MRENCLAVE). Then, after reading pfkeyA from the server, the common code always checks fpA == fingerprint(pfkeyA). Then the attacker M won't be able to run his malicious script, because he doesn't have pfkeyA in the first place and the common code will refuse to decrypt his script.

In particular, what does "Write fpA to a text file and make it a protected file (which gets into MRENCLAVE)" mean? Protected files do not get into MRENCLAVE. So what do you mean by this?

Sorry, it should be "a trusted file" instead of protected.

Also, if fpA got into MRENCLAVE, we would be tied to this particular key (pfkeyA). So we're back to the original problem: the folks do not want to be tied to a particular key at all.

Then my idea won't work.

P.S. Later in this thread, the OP seems to want to decompress a tarball and use its contents as trusted files. IMO this won't work, because trusted files are read-only.

@dimakuv dimakuv added enhancement New feature or request P: 3 labels Sep 25, 2024