Skip to content

Add initial terraform manifests for monitoring#1877

Closed
rikatz wants to merge 2 commits intokubernetes:mainfrom
rikatz:monitoring
Closed

Add initial terraform manifests for monitoring#1877
rikatz wants to merge 2 commits intokubernetes:mainfrom
rikatz:monitoring

Conversation

@rikatz
Copy link
Member

@rikatz rikatz commented Apr 6, 2021

Add initial manifest for gitops monitoring.

Some points of attention for this PR:

  • This is not repeatable. Although I think adding too much modules for terraform can turn things more complicated, maybe we can define how and what we monitor in URLs. Assuming we always use SSL/TLS we could create a module and have this simpler in a future, like:
module "thockin_test1_monitoring" {
  source = "../../../modules/monitoring"
  project_name       = "kubernetes-public"
  url       = "thockin-test1.k8s.io"
  cert_expiration_threshold = 15
  // Channel needs to be manually created in GCP interface with the following display name
  channel_notification = "Kubernetes.io Cert Alert"
}
  • We should probably want to define the structure for this. I'm proposing inside infra/monitoring just because got no better idea :D
  • We can add more parameters, and more notification channel/options (like, we can add a list of emails that can turn into Email Notification Channel on GCP)
  • I'm using all the GCP locations, but getting only metrics from usa-iowa as valid to cert_expiration. This can be changed later as well, depending on what we want to do (like, properly monitoring and alerting in case of some downtime/increased latency)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 6, 2021
@k8s-ci-robot k8s-ci-robot requested review from dims and nikhita April 6, 2021 19:32
@k8s-ci-robot k8s-ci-robot added wg/k8s-infra size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 6, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rikatz
To complete the pull request process, please assign spiffxp after the PR has been reviewed.
You can assign the PR to them by writing /assign @spiffxp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

*/

terraform {
required_version = "~> 0.14.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still use the 0.13 version

// Manual step: Create a StackDriver alert channel pointing to a channel in Slack
// It will select the channel here by its display name
data "google_monitoring_notification_channel" "alertchannel" {
display_name = "Kubernetes.io Cert Alert"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the existing channel : #k8s-infra-alerts. Can you mention the type ?

@rikatz
Copy link
Member Author

rikatz commented Apr 7, 2021

@ameukam thanks!

I've changed the channel searcher to use the channel name (label) and the type slack, and did changed the terraform to >= 0.13.0 :)

@rikatz
Copy link
Member Author

rikatz commented May 12, 2021

@ameukam anything else on this PR? I wanted to check if monitoring is working :)

terraform {

backend "gcs" {
bucket = "k8s-infra-clusters-terraform"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rikatz the terraform states are now in different buckets : #1952
Feel free to add a new one.

required_providers {
google = {
source = "hashicorp/google"
version = "~> 3.63.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
version = "~> 3.63.0"
version = "~> 3.66.0"

*/

terraform {
required_version = ">= 0.13.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
required_version = ">= 0.13.0"
required_version = ">= 0.14.0"

See: #2019
Sorry for make you revert it. :-D

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a rebase / update I think now, should be:

Suggested change
required_version = ">= 0.13.0"
required_version = "~> 1.0.0"

@ameukam
Copy link
Member

ameukam commented May 12, 2021

@rikatz I dropped some comments.

@spiffxp
Copy link
Contributor

spiffxp commented May 19, 2021

I have a meta-level nit which is that right now I think @ameukam has preferred to keep all of our terraform stuff in infra/gcp/clusters

I feel like we need to chat a bit about what we want organization of files to look like, but until then I think having them all homed under one directory (even if its name is misleading) makes more sense than allowing sprawl

So move to something like infra/gcp/clusters/projects/kubernetes-public/monitoring

@spiffxp
Copy link
Contributor

spiffxp commented May 19, 2021

I'm ok with merge and iterate if we immediately follow up with that move

@ameukam
Copy link
Member

ameukam commented Jun 15, 2021

@rikatz What do you think about closing this PR now the domains are gone : #1838 ?

@rikatz
Copy link
Member Author

rikatz commented Jun 15, 2021

Let's do this :)
/close

@k8s-ci-robot
Copy link
Contributor

@rikatz: Closed this PR.

Details

In response to this:

Let's do this :)
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants