Skip to content

Conversation

@sadasu
Copy link
Contributor

@sadasu sadasu commented Mar 31, 2023

This makes sure that the AWS API call to GetHostedZone would not hang indefinitely.

Co-authored-by: Rafael F. [email protected]

@sadasu sadasu changed the title Make AWS API call to GetHostedZone with a context OCPBUGS-10767: Make AWS API call to GetHostedZone with a context Mar 31, 2023
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 31, 2023
@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-10767, which is invalid:

  • expected the bug to target the "4.14.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This makes sure that the AWS API call to GetHostedZone would hang indefinitely.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from mtulio and patrickdillon March 31, 2023 17:11
@sadasu
Copy link
Contributor Author

sadasu commented Mar 31, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 31, 2023
@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-10767, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

@r4f4 r4f4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you change client API interfaces, make sure you also update the mock generated code. The unit tests also need to be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an empty context that never expires [1] so this is not doing anything for the API call. We want a context.WithTimeout() where we specify a reasonable timeout for the call to execute and return

[1] https://pkg.go.dev/context#TODO

@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-10767, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

Details

In response to this:

This makes sure that the AWS API call to GetHostedZone would not hang indefinitely.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu sadasu force-pushed the fix-aws-hostedZone branch 4 times, most recently from 6852c1c to 5a07157 Compare April 4, 2023 18:11
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 4, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from sadasu. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sadasu
Copy link
Contributor Author

sadasu commented Apr 4, 2023

@mtulio could you make sure this fix doesn't break anything for AWS edge pools?

@sadasu sadasu force-pushed the fix-aws-hostedZone branch from 5a07157 to 770cc19 Compare April 4, 2023 19:04
@openshift-ci-robot
Copy link
Contributor

@sadasu: This pull request references Jira Issue OCPBUGS-10767, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

Details

In response to this:

This makes sure that the AWS API call to GetHostedZone would not hang indefinitely.

Co-authored-by: Rafael F. [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mtulio
Copy link
Contributor

mtulio commented Apr 4, 2023

Validating feature: edge subnets / installing in existing subnets with Local Zones (#6371)

  • Flow create manifests -> create cluster: OK
$ grep 'VPC(' pkg/asset/installconfig/aws/metadata.go -A 5
func (m *Metadata) VPC(ctx context.Context) (string, error) {
	if m.vpc == "" {
		if len(m.Subnets) == 0 {
			return "", errors.New("cannot calculate VPC without configured subnets")
		}
$ ./openshift-install version
./openshift-install unreleased-master-7973-g5a071576b25c947686d2569938db79168dd935ef
built from commit 5a071576b25c947686d2569938db79168dd935ef
release image registry.ci.openshift.org/origin/release:4.13
release architecture amd64

$ cat ${INSTALL_DIR}/install-config.yaml |grep subnets -A 7
    subnets:
    - subnet-0f4a8fe9bc90d98b6
    - subnet-09f6b127d615c5609
    - subnet-06efea54043e1c84e
    - subnet-070961c54703a466a
    - subnet-0d4491d7a6d8931ad
    - subnet-03d1e337796fd51c7
    - subnet-088a8ce5bf940f109

$ OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="$RELEASE" \
>   $INSTALLER create manifests --dir $INSTALL_DIR
INFO Credentials loaded from the "openshift-dev" profile in file "/home/mtulio/.aws/credentials" 
INFO Consuming Install Config from target directory 
INFO Manifests created in: ocp-lz10-10/manifests and ocp-lz10-10/openshift 

 $ grep us-east- ocp-lz10-10/openshift/99_openshift-cluster-api_worker-machineset-3.yaml 
  name: ocp-lz10-srtgg-edge-us-east-1-iah-1a
      machine.openshift.io/cluster-api-machineset: ocp-lz10-srtgg-edge-us-east-1-iah-1a
        machine.openshift.io/cluster-api-machineset: ocp-lz10-srtgg-edge-us-east-1-iah-1a
          machine.openshift.io/zone-group: us-east-1-iah-1
            availabilityZone: us-east-1-iah-1a
            region: us-east-1

$ OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="$RELEASE"   $INSTALLER create cluster --dir $INSTALL_DIR
WARNING Found override for release image (quay.io/openshift-release-dev/ocp-release:4.13.0-rc.0-x86_64). Please be warned, this is not advised 
INFO Consuming OpenShift Install (Manifests) from target directory 
(...)
INFO Waiting up to 20m0s (until 4:50PM) for the Kubernetes API at https://api.ocp-lz10.devcluster.openshift.com:6443... 
  • Flow create cluster: OK
$ grep subnets -A7 ${INSTALL_DIR}/install-config.yaml
    subnets:
    - subnet-086577468fec3c450
    - subnet-0dd4798fcd7c9692a
    - subnet-046cb9c1281f77e4e
    - subnet-067771e48f0c8ba49
    - subnet-056743401e0642f40
    - subnet-0fee2d3ac858ba8ff
    - subnet-0e027a92e2706750b

$ OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="$RELEASE" $INSTALLER create cluster --dir $INSTALL_DIR
INFO Credentials loaded from the "openshift-dev" profile in file "/home/mtulio/.aws/credentials" 
WARNING Found override for release image (quay.io/openshift-release-dev/ocp-release:4.13.0-rc.0-x86_64). Please be warned, this is not advised 
INFO Consuming Install Config from target directory 
INFO Creating infrastructure resources...         
(...)
INFO Waiting up to 20m0s (until 4:59PM) for the Kubernetes API at https://api.ocp-lz11.devcluster.openshift.com:6443... 
INFO API v1.26.2+06e8c46 up                       
INFO Waiting up to 30m0s (until 5:11PM) for bootstrapping to complete...

$ KUBECONFIG=$PWD/ocp-lz11-10/auth/kubeconfig oc get machinesets -n openshift-machine-api  ocp-lz11-lgkts-edge-us-east-1-bue-1a -o yaml |grep us-east
  name: ocp-lz11-lgkts-edge-us-east-1-bue-1a
      machine.openshift.io/cluster-api-machineset: ocp-lz11-lgkts-edge-us-east-1-bue-1a
        machine.openshift.io/cluster-api-machineset: ocp-lz11-lgkts-edge-us-east-1-bue-1a
          machine.openshift.io/zone-group: us-east-1-bue-1
            availabilityZone: us-east-1-bue-1a
            region: us-east-1

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 4, 2023
@r4f4
Copy link
Contributor

r4f4 commented Apr 4, 2023

@sadasu it would be good for posterity to document in the commit description what the mutex problem is and how we're fixing it.

@zaneb
Copy link
Member

zaneb commented Apr 5, 2023

/retest

@sadasu
Copy link
Contributor Author

sadasu commented Apr 5, 2023

/retest-required

Current failure in e2e-aws-ovn is not related to this fix.

@sadasu sadasu changed the title OCPBUGS-10767: Make AWS API call to GetHostedZone with a context WIP: Make AWS API call to GetHostedZone with a context Apr 5, 2023
@openshift-ci-robot openshift-ci-robot removed the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 5, 2023
@openshift-ci-robot openshift-ci-robot removed the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Apr 5, 2023
@openshift-ci-robot
Copy link
Contributor

@sadasu: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

Details

In response to this:

This makes sure that the AWS API call to GetHostedZone would not hang indefinitely.

Co-authored-by: Rafael F. [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Apr 5, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 5, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 5, 2023

@sadasu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-workers-rhel8 770cc19 link false /test e2e-aws-ovn-workers-rhel8
ci/prow/e2e-aws-ovn 770cc19 link true /test e2e-aws-ovn
ci/prow/e2e-aws-ovn-single-node 770cc19 link false /test e2e-aws-ovn-single-node

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@yunjiang29
Copy link
Contributor

Hello @sadasu @mtulio, FYI, I did pre-merge test with following configuration:

  • byo PHZ, create cluster directly. PASS
  • byo PHZ, create manifests/ignition/cluster. PASS
  • Local Zone + byo PHZ, create cluster directly. PASS
  • Local Zone + byo PHZ, create manifests/ignition/cluster. PASS

@r4f4
Copy link
Contributor

r4f4 commented Apr 6, 2023

Hello @sadasu @mtulio, FYI, I did pre-merge test with following configuration:

* byo PHZ, create cluster directly. PASS

* byo PHZ, create manifests/ignition/cluster. PASS

* Local Zone + byo PHZ, create cluster directly. PASS

* Local Zone + byo PHZ, create manifests/ignition/cluster. PASS

@yunjiang29 that's great testing. I appreciate the comprehensive create cluster and create manifests + cluster checks. @sadasu is working on an alternative solution at #7070 which is functionally similar to this PR but with code cleanups to avoid similar bugs in the future.

@yunjiang29
Copy link
Contributor

an alternative solution at #7070 which is functionally similar to this PR but with code cleanups to avoid similar bugs in the future.

@r4f4, so #7070 will be the final solution, and this PR won't be merged, is it right?

@r4f4
Copy link
Contributor

r4f4 commented Apr 6, 2023

an alternative solution at #7070 which is functionally similar to this PR but with code cleanups to avoid similar bugs in the future.

@r4f4, so #7070 will be the final solution, and this PR won't be merged, is it right?

I think that's the idea, but I'll leave it to @sadasu to confirm or deny.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 20, 2023
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu
Copy link
Contributor Author

sadasu commented Apr 25, 2023

Issue fixed by #7070.

@sadasu sadasu closed this Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lgtm Indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants