Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate k8s-triage, k8s-metrics to terraform, setup k8s-triage dataset #2461

Merged
merged 6 commits into from
Aug 12, 2021

Conversation

spiffxp
Copy link
Member

@spiffxp spiffxp commented Aug 4, 2021

Related:

Since converting go.k8s.io/triage to triage.k8s.io is going to be more involved than I thought, setup to migrate gs://k8s-gubernator/triage to gs://k8s-triage. The bucket is currently populated with whatever it contained a while ago (which is basically as fresh as it's going to get since kettle has been down this whole time)

The dependency test-infra PR sets up a job that runs in k8s-infra-prow-build-trusted to update gs://k8s-triage

While I initially did all the provisioning for this via ensure-main-project.sh, I have since updated the PR to do all of this via terraform. I migrated all of the setup code for gs://k8s-metrics while doing so.

This also adds a bigquery dataset kubernetes-public:k8s-triage for the ci-test-infra-triage-canary job to be able to write to. A future test-infra PR will configure the job to write to it instead of currently failing to write to k8s-gubernator:temp.triage

In setting up the dataset, I'm trying out use of IAM roles instead of basic access (ref: https://cloud.google.com/bigquery/docs/access-control-basic-roles)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 4, 2021
@k8s-ci-robot k8s-ci-robot requested review from aojea and thockin August 4, 2021 18:21
@k8s-ci-robot k8s-ci-robot added area/bash Bash scripts, testing them, writing less of them, code in infra/gcp/ approved Indicates a PR has been approved by an approver from all required OWNERS files. area/infra Infrastructure management, infrastructure design, code in infra/ cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. wg/k8s-infra size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 4, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Aug 4, 2021

Ran ./infra/gcp/ensure-main-project.sh and manually removed gs://k8s-project-triage

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/terraform Terraform modules, testing them, writing more of them, code in infra/gcp/clusters/ and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 12, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@spiffxp spiffxp changed the title [wip] migrate go.k8s.io/triage to gs://k8s-triage Migrate k8s-triage, k8s-metrics to terraform, setup k8s-triage dataset Aug 12, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 12, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Aug 12, 2021

/assign @ameukam
/hold
I'm going to deploy this now but anticipate I may need to edit the terraform state slightly, so don't want to merge until I've confirmed everything is as expected.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2021
Comment on lines +2 to +5
This file defines:
- bigquery dataset for triage to store temp results
- GCS bucket to serve go.k8s.io/triage results
- IAM bindings
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-pasta comments need updating

I also think we need to put boilerplate license in our tf files...

@spiffxp
Copy link
Member Author

spiffxp commented Aug 12, 2021

Out of excessive paranoia I'm backing up the buckets first:

gsutil mb -p kubernetes-public gs://k8s-triage-bak
gsutil mb -p kubernetes-public gs://k8s-metrics-bak
gsutil -m rsync -r gs://k8s-triage gs://k8s-triage-bak # Operation completed over 1.5k objects/7.6 GiB.
gsutil -m rsync -r gs://k8s-metrics gs://k8s-metrics-bak # Operation completed over 7.7k objects/5.1 GiB.

@ameukam
Copy link
Member

ameukam commented Aug 12, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 12, 2021
The buckets were created as STANDARD storage class in US, so let's
not delete them unless we really need to.

The bigquery dataset can't have `-` in its title

The bigquery iam policy binding needs to include project in its fields,
apparently it won't default to the project the dataset is in
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 12, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Aug 12, 2021

In terms of manual stuff I had to do to get this to work, this was basically it...

terraform import google_storage_bucket.triage_bucket kubernetes-public/k8s-triage
terraform import google_storage_bucket.metrics_bucket kubernetes-public/k8s-metrics

The rest was terraform apply, read error, do what it said, repeat...

It's maybe worth noting that after my first successful terraform apply, I ran again and saw the following

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # google_bigquery_dataset.triage_dataset has been changed
  ~ resource "google_bigquery_dataset" "triage_dataset" {
      ~ etag                            = "ObAE6ASzv6jJuAl9fPwfqg==" -> "cyljn5Sy1j+fDr/MKBXOag=="
        id                              = "projects/kubernetes-public/datasets/k8s_triage"
      ~ last_modified_time              = 1628805757044 -> 1628806369623
        # (10 unchanged attributes hidden)

      - access {
          - role          = "OWNER" -> null
          - user_by_email = "[email protected]" -> null
        }
      - access {
          - role          = "OWNER" -> null
          - special_group = "projectOwners" -> null
        }
      - access {
          - role          = "READER" -> null
          - special_group = "projectReaders" -> null
        }
      + access {
          + role          = "WRITER"
          + user_by_email = "[email protected]"
        }
      - access {
          - role          = "WRITER" -> null
          - special_group = "projectWriters" -> null
        }
      + access {
          + group_by_email = "[email protected]"
          + role           = "OWNER"
        }
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these changes.

─────────────────────────────────────────────────────────────

No changes. Your infrastructure matches the configuration.

@spiffxp
Copy link
Member Author

spiffxp commented Aug 12, 2021

/hold cancel
Changes in here have been deployed

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Aug 12, 2021

Some manual cleanup...

gsutil rm -r gs://k8s-project-triage # what existed prior to this PR

@ameukam
Copy link
Member

ameukam commented Aug 12, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 12, 2021
@k8s-ci-robot k8s-ci-robot merged commit 4ea13f5 into kubernetes:main Aug 12, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 12, 2021
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spotted 2 typos. Nothing serious!

resource "google_bigquery_dataset" "triage_dataset" {
dataset_id = "k8s_triage"
project = data.google_project.project.project_id
description = "Dataset for kubernetes/test-infra/triage to store temprorary results"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post-merge nit:

Suggested change
description = "Dataset for kubernetes/test-infra/triage to store temprorary results"
description = "Dataset for kubernetes/test-infra/triage to store temporary results"

]
role = "roles/storage.admin"
}
// Preserve legacy storage bindings, give storage.admim members legacy bucket owner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post-merge nit:

Suggested change
// Preserve legacy storage bindings, give storage.admim members legacy bucket owner
// Preserve legacy storage bindings, give storage.admin members legacy bucket owner

@spiffxp
Copy link
Member Author

spiffxp commented Aug 16, 2021

This broke https://testgrid.k8s.io/wg-k8s-infra-prow#metrics-bigquery

#2548 should address

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apps Application management, code in apps/ area/bash Bash scripts, testing them, writing less of them, code in infra/gcp/ area/infra Infrastructure management, infrastructure design, code in infra/ area/terraform Terraform modules, testing them, writing more of them, code in infra/gcp/clusters/ cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/contributor-experience Categorizes an issue or PR as relevant to SIG Contributor Experience. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants