Skip to content

Conversation

@rgolangh
Copy link
Contributor

@rgolangh rgolangh commented Dec 8, 2019

oVirt worker will most of the time failed to join the cluster because they don't have a nameserver
set to resolve the api or api-int.
The problem is that a /etc/dhclient/dhclient.conf is written during the network manager pre-up hook, using the file d3d4e5f#diff-35faebd7db8179e62b9eb7ca9604ce72
and because of a race, dhclient doesn't recognize the new config.

  • What I did
    This change will write /etc/dhcp/dhclient.conf using ignition, with the DNS VIP prepended
    as the first nameserver. The DNSVIP is interpolated by the server so it is written in its final form
    in the ignition phase (just like the masters). This eliminates the race I'm seeing.

Related-to: #1312

  • How to verify it

Run IPI installation of oVirt, 3 master (all of them use those static pods) and 2 workers which are added using cluster-api-provider-ovirt, and see them join and go into Ready state.

  • Description for the changelog
    Set the DNSVIP as the first nameserver for worker by writing /etc/dhcp/dhclient.conf using ignition.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 8, 2019
@ericavonb
Copy link

💯 to removing templating where possible. I only did a quick look over and don't see anything jump out at me. Lemme go through it more thoughtfully to be sure of the underlying logic. lmk if you need this merged more urgently tho.

@rgolangh
Copy link
Contributor Author

@ericavonb do note that the first commit here is taken from a PR #1298
It is not depended on it but I wanted to see them both running and to verify the behaviour.

@rgolangh
Copy link
Contributor Author

/retest

1 similar comment
@rgolangh
Copy link
Contributor Author

/retest

@yboaron
Copy link
Contributor

yboaron commented Dec 12, 2019

@rgolangh Any reason for including the static pods resources customizations to this PR?
I would suggest splitting this PR

@rgolangh
Copy link
Contributor Author

rgolangh commented Dec 12, 2019 via email

@rgolangh
Copy link
Contributor Author

@ericavonb I can't seem to get rid of the aws errors. Is that a known issue?

@ericavonb
Copy link

/retest

@ericavonb
Copy link

@ericavonb I can't seem to get rid of the aws errors. Is that a known issue?

Looks like AWS has not doing well yesterday: https://prow.svc.ci.openshift.org/job-history/origin-ci-test/pr-logs/directory/pull-ci-openshift-machine-config-operator-master-e2e-aws?buildId=

I'll look at this PR's failures to make sure they are failing for the same reason the others were.

@ericavonb
Copy link

Looks like e2e-aws passed 🎉 ! Leaving the lgmt to someone from the ovirt side.

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 12, 2019
The way we provision the non-virtual IP nameserver is not working in a
reliable way for workers. Instead use the known to work way of using the
DNSVIP in dhclient.conf which is written directly by ignition to disk,
so NM picks is up with no known issues when the machine starts.

Signed-off-by: Roy Golan <rgolan@redhat.com>
@rgolangh rgolangh force-pushed the use_dnsvip_for_workers branch from ed65df3 to a4a4e92 Compare December 18, 2019 12:38
@rgolangh
Copy link
Contributor Author

/lgtm

@openshift-ci-robot
Copy link
Contributor

@rgolangh: you cannot LGTM your own PR.

Details

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rgolangh
Copy link
Contributor Author

/retest

1 similar comment
@rgolangh
Copy link
Contributor Author

/retest

@cuppett
Copy link
Member

cuppett commented Jan 6, 2020

@ericavonb looks we are ready now. Can this be approved by somebody else on the team?

@rgolangh
Copy link
Contributor Author

rgolangh commented Jan 9, 2020

@yboaron @celebdor Can one of you lgtm this?

@celebdor
Copy link
Contributor

celebdor commented Jan 9, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 9, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: celebdor, ericavonb, rgolangh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants