
@wking wking commented May 31, 2024

The PromQL is:

  • If we know IPsec is enabled, we're exposed, or...
  • If we know IPsec is not enabled, we're not exposed, or...
  • If we are on OVN, but aren't sure if IPsec is enabled (e.g. ovnkube_master_ipsec_enabled scraping is failing), we might be exposed. Returning -1 will cause an evaluation failure [1]. Or...
  • If we know we are not OVN, we are not exposed, or...
  • If we aren't sure we're on OVN (e.g. apiserver_storage_objects scraping is failing), we might be exposed. Returning no time series will cause an evaluation failure.
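A minimal Python sketch of the decision chain above, for illustration only. It is not the query from this PR: the `exposure` function, its arguments, and using `None` to stand for "no time series" are hypothetical conventions. It also assumes each `or` operand has been reduced to a single label-free series, which is the case where PromQL's `or` behaves like "take the left side if it has any series, otherwise the right side".

```python
from typing import Optional


def exposure(ipsec_enabled: Optional[bool], is_ovn: Optional[bool]) -> Optional[int]:
    """Hypothetical model of the blocked-edge PromQL's `or` chain.

    ipsec_enabled: True/False when the IPsec metric was scraped, None if unknown.
    is_ovn: True/False when the network type is known, None if unknown.
    Returns 1 (exposed), 0 (not exposed), -1 (OVN but IPsec unknown, which
    forces an evaluation failure), or None (no time series, also a failure).
    """
    # We know whether IPsec is enabled: that answer wins.
    if ipsec_enabled is not None:
        return 1 if ipsec_enabled else 0
    # We are on OVN, but the IPsec metric is missing: we might be exposed.
    if is_ovn:
        return -1
    # We know we are not on OVN: not exposed.
    if is_ovn is False:
        return 0
    # We are not even sure about the network type: no series at all.
    return None


print(exposure(None, True))   # -1
print(exposure(None, False))  # 0
```

Keeping "unknown" distinct from "not exposed" is the point of the -1 and no-series branches: both make evaluation fail instead of silently reporting the cluster as safe.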

The label_replace makes it easier to tell when we're in the "we know IPsec is not enabled" case.
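For readers less familiar with `label_replace`, here is a rough Python model of what it does to one series' labels: match an anchored regex against a source label (a missing label reads as ""), and on a match set the destination label to the replacement with `$1`-style group references expanded. The destination label `ipsec` and value `disabled` in the usage line are hypothetical; this description does not show which label the actual query attaches.

```python
import re
from typing import Dict

Labels = Dict[str, str]


def label_replace(labels: Labels, dst: str, replacement: str,
                  src: str, regex: str) -> Labels:
    """Rough model of PromQL label_replace() applied to one series' labels."""
    # PromQL regexes are fully anchored, and a missing label reads as "".
    m = re.fullmatch(regex, labels.get(src, ""))
    if m is None:
        # No match: the series comes back unchanged.
        return dict(labels)

    def expand(ref: "re.Match[str]") -> str:
        # Expand $1, $2, ... to capture groups; unknown groups become "".
        idx = int(ref.group(1))
        return (m.group(idx) or "") if idx <= m.re.groups else ""

    value = re.sub(r"\$(\d+)", expand, replacement)
    out = dict(labels)
    if value:
        out[dst] = value
    else:
        # An empty result removes the destination label.
        out.pop(dst, None)
    return out


# Hypothetical usage: tag a series with a constant ipsec="disabled" label.
print(label_replace({"job": "example"}, "ipsec", "disabled", "", ""))
# {'job': 'example', 'ipsec': 'disabled'}
```

Using an empty source label with an empty regex always matches, which is a common way to attach a constant, human-readable label to one branch of an `or` chain.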

The max_over_time avoids hiccuping if metrics are interrupted; see 5e4480f (blocked-edges/4.14.*-AzureRegistryImagePreservation: Look for active registry use, 2024-02-09, #4763).
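A rough Python model of why `max_over_time` smooths over scrape gaps: as long as at least one sample falls inside the lookback window, the series still has a value, even if the most recent scrapes failed. The window lengths and sample timestamps below are hypothetical (the window the query actually uses is not stated here), and treating the window as (t - window, t] is a simplification of Prometheus range-selector semantics.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def max_over_time(samples: List[Sample], t: float, window: float) -> Optional[float]:
    """Rough model of max_over_time(metric[window]) evaluated at time t.

    Considers samples with timestamps in (t - window, t] and returns None,
    standing in for "no series", when the window holds no samples.
    """
    in_window = [v for ts, v in samples if t - window < ts <= t]
    return max(in_window) if in_window else None


# Scrapes at 0s and 60s, then a gap until 600s.
samples = [(0, 1.0), (60, 1.0), (600, 1.0)]
print(max_over_time(samples, 400, 3600))  # 1.0: the gap is covered
print(max_over_time(samples, 400, 60))    # None: the gap drops the series
```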

I'm also adding _id="" to the queries as a pattern to support HyperShift and other systems that could query the cluster's data out of a PromQL engine that stored data for multiple clusters. More context in 5cb2e93 (blocked-edges/4.11.*-KeepalivedMulticastSkew: Explicit _id="", 2023-05-09, #3591).
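A small Python model of what the `_id=""` matcher selects on an ordinary single-cluster Prometheus. In PromQL, an equality matcher against the empty string also selects series that lack the label entirely, so the cluster's own series (which carry no `_id`) still match. The cluster value `cluster-a` is hypothetical, and how a multi-cluster system fills in or rewrites the matcher is outside this sketch.

```python
from typing import Dict, List

Series = Dict[str, str]


def match_empty_id(series: List[Series]) -> List[Series]:
    """Model of the PromQL matcher _id="".

    A series matches when its _id label is empty or absent.
    """
    return [s for s in series if s.get("_id", "") == ""]


data = [
    {"__name__": "ovnkube_master_ipsec_enabled"},
    {"__name__": "ovnkube_master_ipsec_enabled", "_id": "cluster-a"},
    {"__name__": "ovnkube_master_ipsec_enabled", "_id": ""},
]
print(match_empty_id(data))  # keeps the first and third series
```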

Generated by creating the 4.14.0 file by hand and copying it out to the other 4.14 releases with:

$ curl -s 'https://api.openshift.com/api/upgrades_info/graph?channel=candidate-4.14&arch=amd64' |
    jq -r '.nodes[] | .version' |
    grep '^4[.]14[.]' |
    grep -v '^4[.]14[.]0$' |
    while read VERSION; do
      sed "s/4.14.0/${VERSION}/" blocked-edges/4.14.0-OVNInterConnectTransitionIPsec.yaml > "blocked-edges/${VERSION}-OVNInterConnectTransitionIPsec.yaml"
    done
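A pure-Python sketch of the same fan-out, for anyone who wants to check the filtering and substitution without touching the network or the filesystem. The function name `render_blocked_edges` is hypothetical. The JSON shape is assumed only as far as the jq filter `.nodes[] | .version` implies, and the sample template in the usage lines is made up rather than the real blocked-edge YAML.

```python
import json
import re
from typing import Dict


def render_blocked_edges(graph_json: str, template: str,
                         risk: str = "OVNInterConnectTransitionIPsec") -> Dict[str, str]:
    """Hypothetical pure-Python version of the curl | jq | grep | sed loop.

    graph_json: body returned by the upgrades_info graph endpoint.
    template:   contents of blocked-edges/4.14.0-<risk>.yaml.
    Returns {path: contents} for every other 4.14.z node in the graph.
    """
    versions = [node["version"] for node in json.loads(graph_json)["nodes"]]
    out: Dict[str, str] = {}
    for version in versions:
        # grep '^4[.]14[.]' | grep -v '^4[.]14[.]0$'
        if not re.match(r"4[.]14[.]", version) or version == "4.14.0":
            continue
        # sed "s/4.14.0/${VERSION}/": first match on each line; the unescaped
        # dots are regex wildcards, just as in the sed expression.
        body = "\n".join(
            re.sub(r"4.14.0", lambda _m: version, line, count=1)
            for line in template.split("\n")
        )
        out[f"blocked-edges/{version}-{risk}.yaml"] = body
    return out


graph = json.dumps({"nodes": [{"version": "4.14.0"}, {"version": "4.14.1"}, {"version": "4.13.9"}]})
print(render_blocked_edges(graph, "to: 4.14.0\n"))
```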

[1]: https://github.com/openshift/enhancements/blob/4668a0825c59739dfafd2ae661c16cf30f540946/enhancements/update/targeted-update-edge-blocking.md?plain=1#L119
@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) May 31, 2024
wking commented May 31, 2024

I dunno if we run IPsec-enabled CI, but looking at 4.13.0-0.nightly-2024-05-31-112506 runs in PromeCIeus, e2e-aws-sdn correctly shows the no-label "not OVN":

[Screenshot 2024-05-31 11:31:03]

while e2e-aws-ovn has a brief moment during install when it isn't OVN yet, and then it transitions to knowing IPsec is disabled:

[Screenshot 2024-05-31 11:31:09]

@wking wking changed the title blocked-edges/4.14.*-OVNInterConnectTransitionIPsec: Declare risk SDN-4871: blocked-edges/4.14.*-OVNInterConnectTransitionIPsec: Declare risk May 31, 2024
@openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) May 31, 2024
openshift-ci-robot commented May 31, 2024

@wking: This pull request references SDN-4871 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.17.0" version, but no target version was set.


@PratikMahajan PratikMahajan left a comment

/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) May 31, 2024
openshift-ci bot commented May 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: PratikMahajan, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [PratikMahajan,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit efbb45d into openshift:master May 31, 2024
openshift-ci bot commented May 31, 2024

@wking: all tests passed!


@wking wking deleted the OVNInterConnectTransitionIPsec branch May 31, 2024 19:33
wking added a commit to wking/cincinnati-graph-data that referenced this pull request Nov 12, 2024
Checking 4.13 OVN CI [1] in [2]:

  group by (__name__) ({__name__=~".*ipsec.*"})

only turns up ovnkube_master_ipsec_enabled, which we'd used
previously, e.g. in 2797989
(blocked-edges/4.14.*-OVNInterConnectTransitionIPsec: Declare risk,
2024-05-31, openshift#5334).  But checking 4.14 OVN CI [3], that same __name__
search turns up:

* openshift:openshift_network_operator_ipsec_state:info,
* openshift_network_operator_ipsec_state, and
* ovnkube_controller_ipsec_enabled,

but not 4.13's ovnkube_master_ipsec_enabled.  The PromQL I'm adding
here looks for the 4.14 ovnkube_controller_ipsec_enabled; if it can't
find that, it falls back to the 4.13 ovnkube_master_ipsec_enabled;
and if it can't find that either, it falls back to the Kube-API
standard apiserver_storage_objects we'd been using before for "am I
OVN or not?".

[1]: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.13-e2e-azure-ovn-upgrade/1851915878388469760
[2]: https://promecieus.dptools.openshift.org/
[3]: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-serial/1851940515621113856
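A Python sketch of the metric-name search and the fallback order described in this commit message. The `Store` dict is a hypothetical in-memory stand-in for a Prometheus instance, and `ipsec_metric_names` / `first_available` are illustrative names, not part of any real API. As with the earlier sketch, `first_available` assumes label-free `or` operands, so `a or b or c` reduces to "the first metric that has any series".

```python
import re
from typing import Dict, List, Optional, Sequence

Series = Dict[str, str]
# Hypothetical stand-in for Prometheus: metric name -> its current series.
Store = Dict[str, List[Series]]

FALLBACK_ORDER = (
    "ovnkube_controller_ipsec_enabled",  # 4.14 name
    "ovnkube_master_ipsec_enabled",      # 4.13 name
    "apiserver_storage_objects",         # Kube-API "am I OVN or not?" fallback
)


def ipsec_metric_names(store: Store) -> List[str]:
    """Like group by (__name__) ({__name__=~".*ipsec.*"}): matching names with data."""
    # PromQL regex matchers are fully anchored, hence fullmatch.
    return sorted(n for n, s in store.items() if s and re.fullmatch(r".*ipsec.*", n))


def first_available(store: Store, names: Sequence[str] = FALLBACK_ORDER) -> Optional[str]:
    """Return the first name with any series, like `a or b or c` on label-free operands."""
    for name in names:
        if store.get(name):
            return name
    return None


ci_4_14: Store = {
    "openshift:openshift_network_operator_ipsec_state:info": [{}],
    "openshift_network_operator_ipsec_state": [{}],
    "ovnkube_controller_ipsec_enabled": [{}],
    "apiserver_storage_objects": [{}],
}
print(ipsec_metric_names(ci_4_14))
print(first_available(ci_4_14))  # ovnkube_controller_ipsec_enabled
```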