Middleware for Crypt4GH Support #170

Open
athith-g opened this issue May 9, 2024 · 1 comment

athith-g commented May 9, 2024

Problem
Crypt4GH is a file format developed by GA4GH that allows sensitive genomic data to remain encrypted at rest and in transit. Currently, TES implementations do not support the use of crypt4GH files as inputs.

Solution
I will be developing middleware for proTES that enables the use of crypt4GH files without the user having to alter the initial TES request. The middleware should detect the presence of a crypt4GH file and alter the initial request such that a decryption step is included in the task.
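A minimal sketch of what the detection and request rewriting could look like, assuming the task is handled as a plain dict following the TES schema (`inputs`, `executors`, `volumes`). The `.c4gh` suffix check, the image name, the volume path and the `decrypt-inputs` command are all hypothetical placeholders, not part of proTES or TES. The only format detail relied on is that a Crypt4GH file starts with the 8 magic bytes `crypt4gh`.

```python
import copy

C4GH_MAGIC = b"crypt4gh"  # first 8 bytes of a Crypt4GH file
C4GH_SUFFIX = ".c4gh"     # conventional file extension (assumption)

# Hypothetical values for illustration only.
DECRYPT_IMAGE = "example/crypt4gh-decrypt:latest"
DECRYPT_VOLUME = "/vol/decrypted"


def looks_like_crypt4gh(first_bytes: bytes) -> bool:
    """Check the magic bytes at the start of a file's content."""
    return first_bytes[:8] == C4GH_MAGIC


def add_decryption_executor(task: dict) -> dict:
    """Return a copy of a TES task with a decryption executor prepended.

    The task is returned unchanged if no input path ends in ".c4gh".
    """
    encrypted = [
        inp["path"]
        for inp in task.get("inputs", [])
        if inp.get("path", "").endswith(C4GH_SUFFIX)
    ]
    if not encrypted:
        return task

    new_task = copy.deepcopy(task)
    # Volume shared between executors, used to hold the decrypted files.
    volumes = new_task.setdefault("volumes", [])
    if DECRYPT_VOLUME not in volumes:
        volumes.append(DECRYPT_VOLUME)
    decrypt_executor = {
        "image": DECRYPT_IMAGE,
        # Hypothetical entrypoint: target directory, then files to decrypt.
        "command": ["decrypt-inputs", "--out-dir", DECRYPT_VOLUME, *encrypted],
    }
    # The decryption step has to run before the user's own executors.
    new_task.setdefault("executors", []).insert(0, decrypt_executor)
    return new_task
```

A real middleware would probably also need to point the user's executors at the decrypted paths. The sketch leaves that out.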

The decryption step is the addition of an executor that decrypts the crypt4GH file and temporarily places the decrypted file in a volume. Essentially, the executor:

  1. Generates an ephemeral key pair
  2. Fetches the crypt4GH file with the ephemeral public key
  3. Decrypts the file with the ephemeral private key
  4. Places the decrypted file in a volume.

[Images: two screenshots of workflow diagrams, shown side by side]

The diagram on the left describes a workflow without a crypt4GH file as input. The diagram on the right describes a workflow with a crypt4GH file as input.
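The four executor steps listed earlier could be sketched roughly as follows. This assumes the `crypt4gh` and `crypt4gh-keygen` command-line tools from the GA4GH reference Python package are installed in the executor image; the flags used should be checked against the installed version. The fetch step is only a placeholder, because it depends on which service can deliver the file encrypted for the ephemeral public key. All function names and file names here are hypothetical.

```python
import subprocess
from pathlib import Path


def keygen_command(sk: Path, pk: Path) -> list[str]:
    """Step 1: command for an ephemeral key pair with an unencrypted private key."""
    return ["crypt4gh-keygen", "--sk", str(sk), "--pk", str(pk), "--nocrypt"]


def decrypt_command(sk: Path) -> list[str]:
    """Step 3: crypt4gh reads ciphertext on stdin and writes plaintext to stdout."""
    return ["crypt4gh", "decrypt", "--sk", str(sk)]


def fetch_for_key(url: str, pk: Path, dest: Path) -> None:
    """Step 2 (placeholder): fetch the file encrypted for the public key `pk`."""
    raise NotImplementedError("depends on the storage or re-encryption service")


def run_decryption(url: str, key_dir: Path, volume: Path, fetch=fetch_for_key) -> Path:
    """Run steps 1 to 4 and return the path of the decrypted file."""
    sk, pk = key_dir / "ephemeral.sec", key_dir / "ephemeral.pub"
    subprocess.run(keygen_command(sk, pk), check=True)          # step 1
    encrypted = key_dir / "input.c4gh"
    fetch(url, pk, encrypted)                                   # step 2
    decrypted = volume / "input"                                # step 4: lives in the volume
    with encrypted.open("rb") as src, decrypted.open("wb") as dst:
        subprocess.run(decrypt_command(sk), stdin=src, stdout=dst, check=True)  # step 3
    sk.unlink()  # the private key is not needed after decryption
    return decrypted
```

Passing the fetch function as a parameter keeps the open question (how the file gets re-encrypted for the ephemeral key) separate from the parts that are already fixed by the Crypt4GH tooling.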

Possible Alternative Approach
Rather than adding a decryption executor to every TES instance that uses a crypt4GH file, the proTES middleware could decrypt the files itself and store them temporarily in a repository. The TES instances would then read the files from this repository without needing a decryption executor. This approach would use less compute (each file is decrypted once in total rather than once per TES instance) and may be simpler to implement. However, the decrypted data would be centralized and would travel unencrypted whenever an instance fetches it from the repository, making this approach less secure.


uniqueg commented May 18, 2024

This is brilliant, thanks a lot @athith-g.

The alternative approach would be useful only in a situation where proTES and all TES endpoints are part of the same organization with the same security architecture. I guess it really is too limiting. Security is expensive 🤷‍♂️

Projects
Status: In Progress