Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for mTLS #470

Merged
merged 1 commit into from
Jan 17, 2024

Conversation

chrisstaite-menlo
Copy link
Collaborator

@chrisstaite-menlo chrisstaite-menlo commented Dec 12, 2023

Description

There is a need to secure the Action Cache for production builds which is not possible with out authentication of the store. This can be provided by mTLS and the ability to make the action cache read only for some servers.

Fixes #467

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Have a test system made of a storage, scheduler, build workers and Reclient consumers all utilising mTLS.

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

Copy link
Contributor

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 19 files reviewed, 1 unresolved discussion (waiting on @chrisstaite-menlo)

a discussion (no related file):
@chrisstaite-menlo what is the dependency here between tls and readonly store? I think it would be better if we teased apart this into two different pull requests. Thoughts @aaronmondal @allada


Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 19 files at r1.
Reviewable status: 3 of 19 files reviewed, 1 unresolved discussion (waiting on @aaronmondal, @adam-singer, and @allada)

a discussion (no related file):

Previously, adam-singer (Adam Singer) wrote…

@chrisstaite-menlo what is the dependency here between tls and readonly store? I think it would be better if we teased apart this into two different pull requests. Thoughts @aaronmondal @allada

It's all for the same purpose, to protect the action cache from being polluted from outside of the build network. Without it a user can simply inject results into the action cache.


Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 19 files reviewed, 1 unresolved discussion (waiting on @aaronmondal, @adam-singer, and @allada)

a discussion (no related file):

Previously, chrisstaite-menlo (Chris Staite) wrote…

It's all for the same purpose, to protect the action cache from being polluted from outside of the build network. Without it a user can simply inject results into the action cache.

I do agree we should do this in two PRs, but this PR is not all that big, so I'm not too strict on it in this particular case case.

I'll leave it to @adam-singer for the final call though.


Copy link
Contributor

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 19 files reviewed, 1 unresolved discussion (waiting on @aaronmondal and @allada)

a discussion (no related file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

I do agree we should do this in two PRs, but this PR is not all that big, so I'm not too strict on it in this particular case case.

I'll leave it to @adam-singer for the final call though.

sg, we can continue to review on this one


Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the hold up on this Chris. We were in the middle of a sprint to release push and didn't want to land new features. Now that we have it all out, internally we are in stabilization mode for a little bit. I think it's completely fine for external contribution features to be excluded from our "stabilization push".

It turns out we also need this for some other work we are doing internally, so we really appreciate this work 😄

Reviewed 12 of 19 files at r1, all commit messages.
Dismissed @adam-singer from a discussion.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-20.04, Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), pre-commit-checks, ubuntu-20.04, ubuntu-22.04, windows-2022


nativelink-config/src/stores.rs line 483 at r1 (raw file):

}

impl TryFrom<&ClientTlsConfig> for tonic::transport::ClientTlsConfig {

nit: I'd rather keep everything in nativelink-config pure and near zero code (with the exception of macros).

Can we make a function that does the conversion at the place it's used instead? I suspect this function will only ever be called once in a single place, so putting it where it's called seems logical to me.


nativelink-scheduler/src/grpc_scheduler.rs line 80 at r1 (raw file):

        let endpoint = Uri::try_from(&config.endpoint)
            .map_err(|e| make_err!(Code::Internal, "Unable to parse endpoint {}: {:?}", config.endpoint, e))?;

nit: We can now use the syntax:

.map_err(|e| make_err!(Code::Internal, "Unable to parse endpoint {config.endpoint}: {e:?}"))?;

(same below)


nativelink-store/src/grpc_store.rs line 104 at r1 (raw file):

                .map_err(|e| make_input_err!("Invalid URI for worker endpoint : {:?}", e))?;

            let endpoint_transport = if let Some(tls_config) = &tls_config {

nit: Lets abstract this into a helper file, since there's a few duplicates.


nativelink-worker/src/local_worker.rs line 399 at r1 (raw file):

                    .connect_timeout(timeout_duration)
                    .timeout(timeout_duration);
                let endpoint = if let Some(tls_config) = tls_config {

nit: ditto.


src/bin/cas.rs line 52 at r1 (raw file):

use tokio::net::TcpListener;
use tokio::task::spawn_blocking;
use tokio_rustls::rustls::pki_types::PrivatePkcs1KeyDer;

nit: Combine this and one below.


src/bin/cas.rs line 54 at r1 (raw file):

use tokio_rustls::rustls::pki_types::PrivatePkcs1KeyDer;
use tokio_rustls::rustls::pki_types::PrivateSec1KeyDer;
use tokio_rustls::rustls::RootCertStore;

nit: Add to below block.


src/bin/cas.rs line 446 at r1 (raw file):

        // Configure our TLS acceptor if we have TLS configured.
        let maybe_tls_acceptor = server_cfg.tls.map_or(Ok(None), |tls_config| {

nit: I think it might now be worth moving all this into the same helper file (referenced above).

This main file is getting a bit bloated, so I think from now on we need to start moving data to helpers.

I think there's enough TLS functions to make a nice useful TLS helper file.


src/bin/cas.rs line 489 at r1 (raw file):

                    client_auth_roots
                        .add(cert)
                        .map_err(|e| make_err!(Code::Internal, "Could not read client CA: {:?}", e))?;

nit: ditto


src/bin/cas.rs line 494 at r1 (raw file):

                    let mut crl_reader = std::io::BufReader::new(
                        std::fs::File::open(client_crl_file)
                            .err_tip(|| format!("Could not open CRL file {}", client_crl_file))?,

nit: ditto


src/bin/cas.rs line 497 at r1 (raw file):

                    );
                    crls(&mut crl_reader)
                        .err_tip(|| format!("Could not extract CRLs from file {}", client_crl_file))?

nit: ditto


src/bin/cas.rs line 507 at r1 (raw file):

                    .with_crls(crls)
                    .build()
                    .map_err(|e| make_err!(Code::Internal, "Could not create WebPkiClientVerifier: {:?}", e))?

nit: ditto

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Chris. Would you like one of us to pickup this PR?

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-20.04, Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), pre-commit-checks, ubuntu-20.04, ubuntu-22.04, windows-2022

@chrisstaite-menlo
Copy link
Collaborator Author

I'm happy to do it, I was just waiting for the go-ahead that you'd finished the major rework in the stabilisation. I assume that's now the case and I'll update this.

Copy link

vercel bot commented Jan 15, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
nativelink-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 17, 2024 9:03am

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 19 files at r1.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Bazel Dev / ubuntu-22.04, Local / ubuntu-22.04, Vercel, docker-compose-compiles-nativelink (20.04), pre-commit-checks, publish-image, ubuntu-20.04, ubuntu-20.04 / stable

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 23 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, Vercel, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), pre-commit-checks, publish-image, ubuntu-20.04, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable

Copy link
Contributor

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained

Copy link
Contributor

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r5, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, Vercel, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), pre-commit-checks, publish-image, ubuntu-20.04, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, Great job!
:lgtm:

Reviewed 1 of 19 files at r1, 14 of 23 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 4 of 4 files at r5, all commit messages.
Reviewable status: 2 of 1 LGTMs obtained


nativelink-worker/src/local_worker.rs line 384 at r5 (raw file):

                let timeout_duration = Duration::from_secs_f32(timeout);

                let tls_config = tls_utils::load_client_config(&config.worker_api_endpoint.tls_config)?;

nit: Lets .err_tip() this.

There is a need to secure the Action Cache for production builds which is
not possible with out authentication of the store.  This can be provided
by mTLS and the ability to make the action cache read only for some servers.
Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam-singer just waiting on you to resolve so we can merge.

Reviewed 3 of 23 files at r2.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Local / ubuntu-22.04, Vercel, asan / ubuntu-22.04, pre-commit-checks, publish-image, ubuntu-20.04, ubuntu-20.04 / stable

Copy link
Contributor

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), publish-image, ubuntu-20.04, windows-2022 / stable

@chrisstaite-menlo chrisstaite-menlo merged commit 6a379b3 into TraceMachina:main Jan 17, 2024
22 checks passed
@chrisstaite-menlo chrisstaite-menlo deleted the mtls_support branch January 17, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for mTLS
3 participants