Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gcp_auth for GCP authentication #5939

Merged
merged 25 commits into from
Dec 16, 2022

Conversation

Corwinpro
Copy link
Contributor

@Corwinpro Corwinpro commented Sep 21, 2022

Motivation for features / changes

See #5934

At the moment, there are 2 types of Credentials:

  • Anonymous (default),
  • RefreshToken (effectively, loaded from a file with credentials).

In GKE service managed account, none of those would work if we want to access non-public data. Accessing public data works via the Anonymous access. We cannot use file-based credentials management for security reasons.

This strongly limits the usage of the tensorboard --load-fast=true application on GKE / k8s deployed instances. Enabling the rustboard backend to be compatible with GKE service managed accounts will positively affect the adoption and operations of the tensorboard tool.

Technical description of changes

In this PR, the existing code for GCP authentication (via Credentials) is replaced with gcp_auth. This requires a new dependency: gcp_auth. That cargo manages pretty much everything related to GCP authentication

Screenshots of UI changes

None

Detailed steps to verify changes work correctly (as executed by you)

We built the rustboard from the last commit. We replaced the existing executable for rustboard that comes as part of the tensorboard package with the custom build. We executed the tensorboard command from the GKE machine and confirmed that the access is granted as expected (service account authentication was successful).

Alternate designs / implementations considered

Happy to hear about that.

@bmd3k bmd3k requested review from arcra and rileyajones September 26, 2022 11:22
@bmd3k bmd3k added core:rustboard //tensorboard/data/server/... rust Pull requests that update Rust code labels Sep 26, 2022
@Corwinpro
Copy link
Contributor Author

Sorry for the delay. I will provide more context and questions in the next few days.

@Corwinpro Corwinpro changed the title [WIP] GKE auth for service account [RFC] GKE auth for service account Sep 28, 2022
@Corwinpro Corwinpro marked this pull request as ready for review September 28, 2022 13:45
@bmd3k bmd3k requested review from groszewn and removed request for arcra September 29, 2022 14:44
@rileyajones
Copy link
Contributor

You mention in your PR description that you think the existing auth implementation could be replaced by gcp_auth, I'm unfamiliar with it, how much work do you think that would be?

@Corwinpro
Copy link
Contributor Author

You mention in your PR description that you think the existing auth implementation could be replaced by gcp_auth, I'm unfamiliar with it, how much work do you think that would be?

Hi, I would estimate it to several days / a week. A day to understand what gcp_auth can do, a couple of days (maybe less for a person more experienced with rustboard) to document what can / cannot be replaced, a couple of days to actually migrate to gcp_auth.

@groszewn
Copy link
Contributor

groszewn commented Oct 3, 2022

Hey @Corwinpro! Really appreciate your contribution here.

A day to understand what gcp_auth can do, a couple of days (maybe less for a person more experienced with rustboard) to document what can / cannot be replaced,

If we're pulling in the gcp_auth crate as a dependency, we're already going to need to do this during our review.

From a cursory glance at the gcp_auth docs it seems like it supports reading credentials from disk, so our custom implementation could be made obsolete. Would you be willing to try removing the current custom auth mechanism and replacing it with gcp_auth?

@Corwinpro
Copy link
Contributor Author

Corwinpro commented Oct 3, 2022

Hi @groszewn,

Thank you! I noticed that apparently gcp_auth does not support Windows. It thus appears to me that we won't be able to just replace the existing implementation. For the GKE use case however it still seems to be a valueable update.

@rileyajones
Copy link
Contributor

@Corwinpro Rustboard does not support Windows so that shouldn't be an issue.

Copy link
Contributor

@groszewn groszewn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate the work done on this already!

Left a few notes. Additionally, ensure you're following steps 5 and on from Adding or updating third-party dependencies to ensure the third_party folder is updated as well.

tensorboard/data/server/Cargo.toml Show resolved Hide resolved
tensorboard/data/server/Cargo.toml Outdated Show resolved Hide resolved
tensorboard/data/server/cargo/BUILD.bazel Show resolved Hide resolved
tensorboard/data/server/Cargo.lock Show resolved Hide resolved
tensorboard/data/server/gcs/auth.rs Outdated Show resolved Hide resolved
tensorboard/data/server/gcs/auth.rs Outdated Show resolved Hide resolved
tensorboard/data/server/cargo/BUILD.bazel Show resolved Hide resolved
tensorboard/data/server/Cargo.lock Show resolved Hide resolved
@groszewn
Copy link
Contributor

Hey @Corwinpro! Thanks again for working with us through this.

We do believe the GOOGLE_APPLICATION_CREDENTIALS implementation is a regression, but it should be tolerable. For reference, gcloud credentials are called out as supported for this environment variable here.

Since usage of gcloud credentials is still covered by the final auth path in gcp_auth (i.e. creds from default file location), could you just document the workaround in your PR. That is, something like:

gcp_auth's implementation of authentication when specifying GOOGLE_APPLICATION_CREDENTIALS only supports service account based authentication. If you are attempting to provide OAuth refresh tokens or other gcloud credential based authentication mechanisms and running into issues (e.g. unable to initialize authentication manager: CustomServiceAccountCredentials(Error("missing field private_key"))), you should unset the GOOGLE_APPLICATION_CREDENTIALS environment variable, as the final authentication mechanism in gcp_auth will look for credentials in the standard file location (i.e. ~/.config/gcloud/application_default_credentials.json) and properly handle authentication.

@Corwinpro
Copy link
Contributor Author

Corwinpro commented Nov 28, 2022

Thank you very much @groszewn , I really appreciate it.

I pushed the change to add a comment about the regression.

With regards to the cargo raze,

You'll need to check in the updates to third_party/rust/... from cargo raze (Step 6).

I have not worked with this build system before (and cargo raze) so apologies if I misunderstood something. I have ~200 (!!!) changes in the third_party/rust and I am not sure if that's expected. I pushed those changes as one commit (d8a525d). Please let me know if it does not look right.

Thank you!

@Corwinpro Corwinpro changed the title [RFC] GKE auth for service account Use gcp_auth for GCP authentication Dec 1, 2022
@groszewn
Copy link
Contributor

groszewn commented Dec 5, 2022

Hey @Corwinpro, apologies for the delay. Thanks for the updates!

Yes, the number of files for cargo raze is as expected, thanks for splitting those out into a separate commit!

@groszewn
Copy link
Contributor

groszewn commented Dec 5, 2022

I've opened #6089 to deal with the current data server build failures in the CI workflow that are showing up.

@groszewn
Copy link
Contributor

groszewn commented Dec 5, 2022

You'll want to pull in the changes from #6089 to try and get your CI build passing now that it's merged.

@Corwinpro
Copy link
Contributor Author

Should this be updated to fix CI?

rust_version: ['1.58.1']

@groszewn
Copy link
Contributor

groszewn commented Dec 5, 2022

Sorry about that, created #6091 to resolve that.

@groszewn
Copy link
Contributor

groszewn commented Dec 6, 2022

You can go ahead and pull the latest changes into your PR branch again, the rust version for CI lint has been updated.

@groszewn
Copy link
Contributor

groszewn commented Dec 6, 2022

Sorry for the churn here, go/tbpr/6093 updates to use the latest cargo_raze crate for CI.

@rileyajones
Copy link
Contributor

It looks like this is going to require a rebase.

@Corwinpro
Copy link
Contributor Author

It is very confusing but I get an error from the current master when I run bazel build //tensorboard/data/server/cargo:rand:

ERROR: /private/var/tmp/_bazel_petr/9a5ec17de0507b3382077672a1f36f39/external/raze__getrandom__0_1_15/BUILD.bazel:38:13: Compiling Rust rlib getrandom v0.1.15 (25 files) failed: (Exit 1): process_wrapper failed: error executing command bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/rules_rust/util/process_wrapper/process_wrapper --subst 'pwd=${pwd}' -- external/rust_darwin_aarch64/bin/rustc ... (remaining 21 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
error[E0432]: unresolved import `libc`
  --> external/raze__getrandom__0_1_15/src/util_libc.rs:22:13
   |
22 |         use libc::__error as errno_location;
   |             ^^^^ use of undeclared crate or module `libc`

(and more similar to that). Same on this branch.

@Corwinpro
Copy link
Contributor Author

Also, I think we need to update this as well:

rust_repositories(version = "1.58.1")

@groszewn
Copy link
Contributor

@Corwinpro thanks for catching that, I've created #6106 to update rust.bzl.

I'm not seeing any issues on my end when running bazel build //tensorboard/data/server/cargo:rand (at HEAD, from your PR HEAD, on #6106 HEAD, or adding #6106 on top of your current PR), but that's on a linux machine. Looks like your build is targeting Darwin, but the macos-11 workflow is passing. I'll see if I can dig around some more at that.

@Corwinpro
Copy link
Contributor Author

I'll see if I can dig around some more at that.

Thanks, I think it might be a problem on my side. Do you think there is more that we need to address in this PR?

@groszewn
Copy link
Contributor

@Corwinpro No, I think this looks great! Really appreciate you working with us to land this PR, thank you!

Copy link
Contributor

@groszewn groszewn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@groszewn groszewn merged commit b6b4ecd into tensorflow:master Dec 16, 2022
qihach64 pushed a commit to qihach64/tensorboard that referenced this pull request Dec 19, 2022
## Motivation for features / changes

See tensorflow#5934 

At the moment, there are 2 types of `Credentials`:

- `Anonymous` (default),
- `RefreshToken` (effectively, loaded from a file with credentials).

In GKE service managed account, none of those would work if we want to
access non-public data. Accessing public data works via the `Anonymous`
access. We cannot use file-based credentials management for security
reasons.

This strongly limits the usage of the `tensorboard --load-fast=true`
application on GKE / k8s deployed instances. Enabling the `rustboard`
backend to be compatible with GKE service managed accounts will
positively affect the adoption and operations of the `tensorboard` tool.

## Technical description of changes

In this PR, the existing code for GCP authentication (via `Credentials`)
is replaced with `gcp_auth`. This requires a new dependency:
[`gcp_auth`](https://docs.rs/gcp_auth/0.7.3/gcp_auth/). That cargo
manages pretty much everything related to GCP authentication

##  Screenshots of UI changes

None

## Detailed steps to verify changes work correctly (as executed by you)

We built the `rustboard` from the last commit. We replaced the existing
executable for `rustboard` that comes as part of the `tensorboard`
package with the custom build. We executed the `tensorboard` command
from the GKE machine and confirmed that the access is granted as
expected (service account authentication was successful).

## Alternate designs / implementations considered

Happy to hear about that.
yatbear pushed a commit to yatbear/tensorboard that referenced this pull request Mar 27, 2023
## Motivation for features / changes

See tensorflow#5934 

At the moment, there are 2 types of `Credentials`:

- `Anonymous` (default),
- `RefreshToken` (effectively, loaded from a file with credentials).

In GKE service managed account, none of those would work if we want to
access non-public data. Accessing public data works via the `Anonymous`
access. We cannot use file-based credentials management for security
reasons.

This strongly limits the usage of the `tensorboard --load-fast=true`
application on GKE / k8s deployed instances. Enabling the `rustboard`
backend to be compatible with GKE service managed accounts will
positively affect the adoption and operations of the `tensorboard` tool.

## Technical description of changes

In this PR, the existing code for GCP authentication (via `Credentials`)
is replaced with `gcp_auth`. This requires a new dependency:
[`gcp_auth`](https://docs.rs/gcp_auth/0.7.3/gcp_auth/). That cargo
manages pretty much everything related to GCP authentication

##  Screenshots of UI changes

None

## Detailed steps to verify changes work correctly (as executed by you)

We built the `rustboard` from the last commit. We replaced the existing
executable for `rustboard` that comes as part of the `tensorboard`
package with the custom build. We executed the `tensorboard` command
from the GKE machine and confirmed that the access is granted as
expected (service account authentication was successful).

## Alternate designs / implementations considered

Happy to hear about that.
dna2github pushed a commit to dna2fork/tensorboard that referenced this pull request May 1, 2023
## Motivation for features / changes

See tensorflow#5934 

At the moment, there are 2 types of `Credentials`:

- `Anonymous` (default),
- `RefreshToken` (effectively, loaded from a file with credentials).

In GKE service managed account, none of those would work if we want to
access non-public data. Accessing public data works via the `Anonymous`
access. We cannot use file-based credentials management for security
reasons.

This strongly limits the usage of the `tensorboard --load-fast=true`
application on GKE / k8s deployed instances. Enabling the `rustboard`
backend to be compatible with GKE service managed accounts will
positively affect the adoption and operations of the `tensorboard` tool.

## Technical description of changes

In this PR, the existing code for GCP authentication (via `Credentials`)
is replaced with `gcp_auth`. This requires a new dependency:
[`gcp_auth`](https://docs.rs/gcp_auth/0.7.3/gcp_auth/). That cargo
manages pretty much everything related to GCP authentication

##  Screenshots of UI changes

None

## Detailed steps to verify changes work correctly (as executed by you)

We built the `rustboard` from the last commit. We replaced the existing
executable for `rustboard` that comes as part of the `tensorboard`
package with the custom build. We executed the `tensorboard` command
from the GKE machine and confirmed that the access is granted as
expected (service account authentication was successful).

## Alternate designs / implementations considered

Happy to hear about that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core:rustboard //tensorboard/data/server/... rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants