Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Pal/Linux-SGX] tools: Fix resource leak in Secret Prov lib #647

Closed
wants to merge 5 commits into from

Conversation

dimakuv
Copy link
Contributor

@dimakuv dimakuv commented Jun 10, 2022

Description of the changes

The resources allocated for mbedTLS functioning on the client side of Secret Prov lib (the one which attests itself to the remote party and runs inside the Gramine-SGX enclave) were not freed if the user of the Secret Prov lib used secret_provision_start(..., &ctx) API. This led to memory and resource leaks. In particular, the TLS connection to the remote party was shutdown but the underlying FD was not closed.

Kudos to Liang Fang [email protected] who reported this leak.

Closes #646.

How to test this PR?

One can manually try to hit a resource limit using the secret_prov_client example:

diff --git a/CI-Examples/ra-tls-secret-prov/src/secret_prov_client.c

 #define CA_CRT_PATH "ssl/ca.crt"

-int main(int argc, char** argv) {
+int loop(void) {
     int ret;
     int bytes;

@@ -83,3 +83,14 @@ out:
     secret_provision_close(&ctx);
     return ret;
 }
+
+int main(int argc, char** argv) {
+    setbuf(stdout, NULL);
+    for (int i = 0; i < 1024 * 1024; i++) {
+        printf("[%06d] Receiving secret\n", i);
+        int ret = loop();
+        if (ret < 0)
+            return ret;
+    }
+    printf("DONE");
+    return 0;
+}

If you want to try out this patch, be careful -- this will try to generate 1,000,000 requests to the IAS web service (if you use EPID), which will throttle your connections and will be slow and painful. Hopefully Intel won't ban me for this stress test :)


This change is Reviewable

The resources allocated for mbedTLS functioning on the client side of
Secret Prov lib (the one which attests itself to the remote party and
runs inside the Gramine-SGX enclave) were not freed if the user of the
Secret Prov lib used `secret_provision_start(..., &ctx)` API. This led
to memory and resource leaks. In particular, the TLS connection to the
remote party was shutdown but the underlying FD was not closed.

Kudos to Liang Fang <[email protected]> who reported this leak.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
/* destroy a previously provisioned secret, if any */
secret_provision_destroy();

g_provisioned_secret_size = received_secret_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems this line should be moved to line 271 where malloc have succeeded.

free(provisioned_secret);
provisioned_secret = NULL;
provisioned_secret_size = 0;
free(g_provisioned_secret);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems need to add"if (g_provisioned_secret)", otherwise free() may lead to exception when calling secret_provision_destroy() twice

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @liangintel)

a discussion (no related file):
UPDATE: In my experiments, I went to [001026] Receiving secret in the modified secret_prov_client, and I stopped there. Seems that no resource leaks are there (I examined opened FDs of the process, looked good).



Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 77 at r1 (raw file):

Previously, liangintel (Liang Fang) wrote…

seems need to add"if (g_provisioned_secret)", otherwise free() may lead to exception when calling secret_provision_destroy() twice

No, free(NULL) does nothing bad. So there is no need for an IF statement.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 265 at r1 (raw file):

Previously, liangintel (Liang Fang) wrote…

seems this line should be moved to line 271 where malloc have succeeded.

Done. Good catch.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @liangintel)

a discussion (no related file):
@liangintel please use reviewable for commenting / review



Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):

    if (ret < 0) {
        secret_provision_destroy();
        secret_provision_free_resources();

What about case of !out_ctx which was handled before?
Update: dont we just miss secret_provision_close here in some cases?
This API is confusing with close, destroy and free...


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_common.c line 86 at r2 (raw file):

            /* use well-known error code for a typical case when remote party closes connection */
            if (ret == MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY)
                ret = -ECONNRESET;

ughh, we mix mbedtls error codes with unix ones?

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 4 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @liangintel)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):
Yes, I agree that the naming got confusing. A quick explanation:

  • close terminates the TLS connection gracefully and frees the used resources, if any (by calling free_resources).
  • destroy deallocates the secret blob (nothing more than that).
  • free_resources is a helper func to free the resources used by the TLS connection.

close and destroy are user-facing APIs of the Secret Prov library, and so we cannot modify them. We can modify free_resources though, if you wish so -- we can change its name to something better.

Update: dont we just miss secret_provision_close here in some cases?

Not sure what you mean. If we are in this error handling, then we don't care about graceful termination of the TLS session, so no need to call close(). It is enough to call free_resources() to kill the connection.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_common.c line 86 at r2 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

ughh, we mix mbedtls error codes with unix ones?

Done, you're right. Hopefully this is good enough for this case.

But I looked at other functions in the Secret Prov library, we have this mixing of mbedTLS vs UNIX error codes in other funcs. Do we want to fix it (not in this PR)? If yes, then how -- convert all mbedTLS error codes to "close enough" UNIX error codes?

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @liangintel)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):

are user-facing APIs of the Secret Prov library, and so we cannot modify them.

Why we cannot modify them?

Not sure what you mean. If we are in this error handling, then we don't care about graceful termination of the TLS session, so no need to call close(). It is enough to call free_resources() to kill the connection.

The old version called free if !out_ctx, the new version does not


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_common.c line 86 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done, you're right. Hopefully this is good enough for this case.

But I looked at other functions in the Secret Prov library, we have this mixing of mbedTLS vs UNIX error codes in other funcs. Do we want to fix it (not in this PR)? If yes, then how -- convert all mbedTLS error codes to "close enough" UNIX error codes?

Maybe mbedtls has some option to embed unix errno inside their errors? Like unix_to_mbedtls(x) function.

Copy link
Contributor

@liangintel liangintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran my workload(https://github.com/intel/confidential-computing-zoo/tree/branch-dev/cross_lang_framework/cczoo/cross_lang_framework), and it's fine now.

Reviewed 2 of 4 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @liangintel)

Copy link
Contributor

@liangintel liangintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

are user-facing APIs of the Secret Prov library, and so we cannot modify them.

Why we cannot modify them?

Not sure what you mean. If we are in this error handling, then we don't care about graceful termination of the TLS session, so no need to call close(). It is enough to call free_resources() to kill the connection.

The old version called free if !out_ctx, the new version does not

from my point of view, the function secret_provision_destroy() did too few things, and destroy should be the last step (open->close->destroy). Meanwhile, because no bug has been reported previously, I guess this API is not widely used. So we may can change this API here.

Copy link
Contributor

@liangintel liangintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):

Previously, liangintel (Liang Fang) wrote…

from my point of view, the function secret_provision_destroy() did too few things, and destroy should be the last step (open->close->destroy). Meanwhile, because no bug has been reported previously, I guess this API is not widely used. So we may can change this API here.

I think we can let the bug fix go first (a customer is evaluating my cross-language framework, and the bug fix is urgent), and code refine can be in another PR

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):

The old version called free if !out_ctx, the new version does not

This is because when !out_ctx, then we do secret_provision_close(&ctx); see my other comment for the exact line of code.

And in this PR, I add the call to secret_provision_free_resources() in this function secret_provision_close(). So I shifted the old "resource freeing" code there.

are user-facing APIs of the Secret Prov library, and so we cannot modify them.

Why we cannot modify them?

Because this is a Secret Provisioning library that is used by users -- I know at least a couple users (including production systems) that use it. We will need some kind of a versioning/deprecation strategy. But this goes out of scope of this PR, I think. We can surely discuss this.

That said, I regret I haven't introduced a proper opaque object as an argument to all these functions (like the context). And the names could be better.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 280 at r3 (raw file):

        out_ctx->ssl = ctx.ssl;
    } else {
        secret_provision_close(&ctx);

FYI: @boryspoplawski , here we handle the case !out_ctx.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_common.c line 86 at r2 (raw file):

Maybe mbedtls has some option to embed unix errno inside their errors?

No, I looked into this. Unfortunately, no such thing. We'll have to introduce such a function ourselves.

I'm actually deeply in this RA-TLS and Secret Provisioning code now, and I see that we mix error codes in many places there. This will need a proper PR to untangle, and a discussion beforehand -- do we want to always return mbedTLS error codes, or UNIX error codes? Can we mix them sometimes (but looks like not -- mbedTLS codes can overlap with UNIX ones, and the whole mbedTLS codebase is very accurate to use only mbedTLS codes)?

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):

That said, I regret I haven't introduced a proper opaque object as an argument to all these functions (like the context). And the names could be better.

Wait, sorry, this was my mental lapse. We actually have the opaque struct ra_tls_ctx* context in every SecretProv function. So I could stash those global variables there, if needed. (This is actually how @liangintel did it in his original PR.)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @dimakuv)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 37 at r3 (raw file):

/* these are globals because the user may continue using the SSL session even after invoking
 * secret_provision_start() (in the user-supplied callback) */

Why can't they reside in the context? Can the user use them after the context was destroyed? If so, then this PR will break this case?
I mean, this comment seem to stand in the opposition of what this PR does. Am I missing something?

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @mkow)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 37 at r3 (raw file):

Why can't they reside in the context?

They could, I just didn't implement it at the time (and I didn't implement it in this PR as well, for the sake of minimality of changes).

Can the user use them after the context was destroyed?

No, the user can't use them after he calls secret_provision_close() -- at this point the context is destroyed (via calls like mbedtls_x509_crt_free(&g_my_ratls_cert)).

I mean, this comment seem to stand in the opposition of what this PR does.

No, why. The user calls secret_provision_start(), then does some secret_provision_read/secret_provision_write, and then -- when the user is done with secret provisioning -- closes the secret-provisioning session with secret_provision_close(). See the example: https://github.com/gramineproject/gramine/blob/master/CI-Examples/ra-tls-secret-prov/src/secret_prov_client.c

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @dimakuv)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 37 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why can't they reside in the context?

They could, I just didn't implement it at the time (and I didn't implement it in this PR as well, for the sake of minimality of changes).

Can the user use them after the context was destroyed?

No, the user can't use them after he calls secret_provision_close() -- at this point the context is destroyed (via calls like mbedtls_x509_crt_free(&g_my_ratls_cert)).

I mean, this comment seem to stand in the opposition of what this PR does.

No, why. The user calls secret_provision_start(), then does some secret_provision_read/secret_provision_write, and then -- when the user is done with secret provisioning -- closes the secret-provisioning session with secret_provision_close(). See the example: https://github.com/gramineproject/gramine/blob/master/CI-Examples/ra-tls-secret-prov/src/secret_prov_client.c

Ok, then I think we really should move them to the context if they die together with it. Right now this library is a weird singleton object with some of its fields taken out as globals, but still destroying them in its destructor.
It should also make this PR more logical, IMO (and the whole library).

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @liangintel)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

That said, I regret I haven't introduced a proper opaque object as an argument to all these functions (like the context). And the names could be better.

Wait, sorry, this was my mental lapse. We actually have the opaque struct ra_tls_ctx* context in every SecretProv function. So I could stash those global variables there, if needed. (This is actually how @liangintel did it in his original PR.)

Okay, but then why don't we call close on errors here?
Sorry, I still cannot wrap my head around this.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_common.c line 86 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Maybe mbedtls has some option to embed unix errno inside their errors?

No, I looked into this. Unfortunately, no such thing. We'll have to introduce such a function ourselves.

I'm actually deeply in this RA-TLS and Secret Provisioning code now, and I see that we mix error codes in many places there. This will need a proper PR to untangle, and a discussion beforehand -- do we want to always return mbedTLS error codes, or UNIX error codes? Can we mix them sometimes (but looks like not -- mbedTLS codes can overlap with UNIX ones, and the whole mbedTLS codebase is very accurate to use only mbedTLS codes)?

If they overlap then we cannot. Please add it to your TODO list or create an issue :)

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @liangintel, and @mkow)

a discussion (no related file):
FYI: I updated the PR significantly; it's best to compare against the base revision, not the last one.



Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov.h line 133 at r4 (raw file):

__attribute__ ((visibility("default")))
int secret_provision_start(const char* in_servers, const char* in_ca_chain_path,
                           struct ra_tls_ctx* out_ctx);

FYI: I came up with this wrong function signature. It should have been struct ra_tls_ctx** out_ctx, so that this function would allocate the context instead of stuffing the user-provided context with values.

Because of this blunder, I can't make struct ra_tls_ctx truly opaque... And I don't want to change this function's signature because it is public API.

Well, on the other hand, not a big deal, this Secret Prov library is anyhow just a reference implementation. Power users are supposed to write their own, ad-hoc Secret Prov flows, not blindly use this library.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 287 at r2 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Okay, but then why don't we call close on errors here?
Sorry, I still cannot wrap my head around this.

Done. The new logic should be more straight-forward.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 37 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Ok, then I think we really should move them to the context if they die together with it. Right now this library is a weird singleton object with some of its fields taken out as globals, but still destroying them in its destructor.
It should also make this PR more logical, IMO (and the whole library).

Done. Now the global variables are replaced with these two sets of variables:

  • Local stack-allocated vars (e.g. ctr_drbg, entropy): they are used only inside the secret_provision_start() function and do not constitute the context of the TLS session after it is established;
  • Long-lived heap-allocated vars (ssl, net, conf): they can be used outside of the secret_provision_start() function and constitute the context of the TLS session.

Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 37 at r4 (raw file):

static uint8_t* g_provisioned_secret = NULL;
static size_t g_provisioned_secret_size = 0;

FYI: I'm a bit unhappy that the "provisioned secret" object is a global variable, and not part of the context of the TLS session. But we already have the public API secret_provision_get() and secret_provision_destroy() that don't have a context arg, so this has to stay global.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 62 at r4 (raw file):

extern int common_write(mbedtls_ssl_context* ssl, const uint8_t* buf, size_t size);
extern int common_read(mbedtls_ssl_context* ssl, uint8_t* buf, size_t size);
extern int common_close(mbedtls_ssl_context* ssl);

FYI: These externs are a bit ugly, but I didn't want to introduce a new header secret_prov_internal.h just for these three funcs. If anyone insists, I can put these declarations in the internal header.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_common.c line 86 at r2 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

If they overlap then we cannot. Please add it to your TODO list or create an issue :)

Done. See item 3 in #654.

@dimakuv dimakuv force-pushed the dimakuv/secret-prov-fix-leak branch from f000e84 to 3649d7a Compare June 15, 2022 09:32
Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @liangintel, and @mkow)

a discussion (no related file):
I think it's the time we tackle the problem of backward compatibility here - we need some way of updating the code with braking changes. We probably can:
A) Not be compatible - the users will need to adjust
B) Be fully compatible (I believe this is the option atm) - I don't think we can keep up with this, at some point (even if not now) we will have to change stuff
C) Create another version of library? This would probably mean having two copies of the same code (because I would expect that some/most of the code would be the same). The older version would have some deprecation time (like 1-2 releases, 3 months, whatever)
or something else I didn't come up with.
I would like to decide it now (before we review this PR), because this might change how we proceed (e.g. option C would most likely mean introducing small bugfix in the old version and moving the rest of the changes to the new version)


Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @liangintel, and @mkow)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

I think it's the time we tackle the problem of backward compatibility here - we need some way of updating the code with braking changes. We probably can:
A) Not be compatible - the users will need to adjust
B) Be fully compatible (I believe this is the option atm) - I don't think we can keep up with this, at some point (even if not now) we will have to change stuff
C) Create another version of library? This would probably mean having two copies of the same code (because I would expect that some/most of the code would be the same). The older version would have some deprecation time (like 1-2 releases, 3 months, whatever)
or something else I didn't come up with.
I would like to decide it now (before we review this PR), because this might change how we proceed (e.g. option C would most likely mean introducing small bugfix in the old version and moving the rest of the changes to the new version)

I think we're still in a young and unstable phase of the project and we should go with A in cases like this one. To mitigate the problems we could help users adjust to the changes (by describing nicely all the needed changes in release notes under "breaking changes" and by directly helping them to adjust their codebase).


Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @liangintel, and @mkow)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

I think we're still in a young and unstable phase of the project and we should go with A in cases like this one. To mitigate the problems we could help users adjust to the changes (by describing nicely all the needed changes in release notes under "breaking changes" and by directly helping them to adjust their codebase).

We had an offline discussion, and the agreement is like this:

  1. Use option A -- we don't have many users, so they can adjust,
  2. Mention this breaking change in the public API of the affected funcs in this PR description and in the commit message (for history).
  3. Describe this breaking change in the Release Notes of the next stable version of Gramine.

mbedtls_net_context* verifier_fd = malloc(sizeof(*verifier_fd));
mbedtls_ssl_config* conf = malloc(sizeof(*conf));
mbedtls_ssl_context* ssl = malloc(sizeof(*ssl));
if (!verifier_fd || !conf || !ssl) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if verifier_fd was allocated successfully, but ssl failed to be allocated, then return -ENOMEM here will lead to memory leak.


static uint8_t* provisioned_secret = NULL;
static size_t provisioned_secret_size = 0;
static uint8_t* g_provisioned_secret = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global variables means cannot be used in multi-threads environment.

}

int secret_provision_close(struct ra_tls_ctx* ctx) {
mbedtls_ssl_context* ssl = (mbedtls_ssl_context*)ctx->ssl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secret_provision_close may be called twice by user. Consider to add parameter check here? e,g,
if(!ctx)
if(!ssl)

free(ssl);
free(conf);
free(net);
return ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider to zero the variables before return.
ctx->net = 0;
ctx->ssl = 0;
...

@@ -16,17 +16,19 @@

#include "secret_prov.h"

int secret_provision_write(struct ra_tls_ctx* ctx, const uint8_t* buf, size_t size) {
int common_write(mbedtls_ssl_context* ssl, const uint8_t* buf, size_t size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's too common to use prefix common_ , may harm other function definition in Gramine modules
actually it's tls_

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @liangintel, and @mkow)


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 36 at r5 (raw file):

Previously, liangintel (Liang Fang) wrote…

global variables means cannot be used in multi-threads environment.

Yes, that's a problem.

And that's why we're discussing to modify the public API of this library to use the context. In particular, for secrets we'll have:

int secret_provision_get(struct ra_tls_ctx* ctx, uint8_t** out_secret, size_t* out_secret_size);
void secret_provision_destroy(struct ra_tls_ctx* ctx);

And the secret variable will be kept inside the context ctx. And it will be thread-safe then.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 75 at r5 (raw file):

Previously, liangintel (Liang Fang) wrote…

secret_provision_close may be called twice by user. Consider to add parameter check here? e,g,
if(!ctx)
if(!ssl)

Done.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 87 at r5 (raw file):

Previously, liangintel (Liang Fang) wrote…

consider to zero the variables before return.
ctx->net = 0;
ctx->ssl = 0;
...

Done.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_attest.c line 109 at r5 (raw file):

Previously, liangintel (Liang Fang) wrote…

if verifier_fd was allocated successfully, but ssl failed to be allocated, then return -ENOMEM here will lead to memory leak.

Done. Good catch.


Pal/src/host/Linux-SGX/tools/ra-tls/secret_prov_common.c line 19 at r5 (raw file):

Previously, liangintel (Liang Fang) wrote…

it's too common to use prefix common_ , may harm other function definition in Gramine modules
actually it's tls_

Done.

Copy link
Contributor

@liangintel liangintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r4, 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @liangintel, and @mkow)

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @liangintel, and @mkow)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We had an offline discussion, and the agreement is like this:

  1. Use option A -- we don't have many users, so they can adjust,
  2. Mention this breaking change in the public API of the affected funcs in this PR description and in the commit message (for history).
  3. Describe this breaking change in the Release Notes of the next stable version of Gramine.

One of the users who is using the Secret Prov library told me this (paraphrasing): "Our code does not use the public functions of the SecretProv library. We have written our code based on the SecretProv code, but we didn't use this library directly."

So we are good to use option A and modify the public API of SecretProv as we see fit.

I will submit a new fixup commit that tweaks the public-function signatures for SecretProv tomorrow, and we can continue this review.


Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @liangintel, and @mkow)

a discussion (no related file):
A reminder to myself: I need to revive this PR as soon as we merge #796 (which should happen today/tomorrow).


@dimakuv
Copy link
Contributor Author

dimakuv commented Aug 2, 2022

I created another PR #813 that supersedes this one. I tried to address all outstanding comments in that new PR. Closing this one.

@dimakuv dimakuv closed this Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants