Respect driver.FlagDefaults even if --extra-config is set #7509

Merged
medyagh merged 7 commits into kubernetes:master on Apr 8, 2020

@medyagh medyagh commented Apr 8, 2020

This one-line change is the result of a full day of deep investigation, and yes, it fixes it!
Here is what was happening.

Before this PR:

minikube start -p p2 --memory=2200 --alsologtostderr -v=3 --wait=true --container-runtime=containerd --driver=docker  --kubernetes-version=v1.15.7 --extra-config=kubeadm.ignore-preflight-errors=SystemVerification

was failing during the wait because the CoreDNS container was stuck in ContainerCreating:

kube-system   coredns-5d4dd4b4db-qhpdp   0/1     ContainerCreating

because

Failed create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bf4e32817a8917e27d39b549b18f426b7db09cbd2204d0429aded681db77d2c4": failed to set bridge addr: could not add IP address to "cni0": permission denied

Since we added a new --wait component, "apps_running", the TestStartStop tests for CRI-O and containerd revealed a hidden failure that we had not seen but that had always existed (it is present in our previous stable releases).

In fact, users had reported this issue in #7354, but it was hidden from us before the new --wait flag.

What was happening?

For CRI-O and containerd on the docker driver, we need a CNI to make it work,
and we automatically set the extra options for the pod CIDR so that it is routed through the kic CNI (overlay).

However, if someone added an extra option through the --extra-config flag, that messed up the automatic setting of the CNI extra option!
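
As a rough illustration of the behavior described above, here is a minimal Go sketch. It is not the code from this PR: ExtraOption, mergeBuggy, mergeFixed, and the pod-network-cidr value are hypothetical names and values chosen for the example, and the rule that an explicitly set user key wins over a driver default is an assumption about the desired behavior.

```go
package main

import "fmt"

// ExtraOption is a hypothetical stand-in for one --extra-config entry,
// for example "kubeadm.ignore-preflight-errors=SystemVerification".
type ExtraOption struct {
	Component, Key, Value string
}

// id identifies an option by component and key, ignoring its value.
func (e ExtraOption) id() string { return e.Component + "." + e.Key }

// mergeBuggy mimics the problem: driver defaults are applied only when
// the user passed no extra options at all.
func mergeBuggy(user, defaults []ExtraOption) []ExtraOption {
	if len(user) > 0 {
		return user
	}
	return defaults
}

// mergeFixed always applies the driver defaults, but (by assumption)
// never overrides a key that the user set explicitly.
func mergeFixed(user, defaults []ExtraOption) []ExtraOption {
	seen := make(map[string]bool, len(user))
	out := make([]ExtraOption, 0, len(user)+len(defaults))
	for _, o := range user {
		seen[o.id()] = true
		out = append(out, o)
	}
	for _, d := range defaults {
		if !seen[d.id()] {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	user := []ExtraOption{{"kubeadm", "ignore-preflight-errors", "SystemVerification"}}
	// Hypothetical driver default carrying the pod CIDR needed by the CNI.
	defaults := []ExtraOption{{"kubeadm", "pod-network-cidr", "10.244.0.0/16"}}

	fmt.Println("buggy:", mergeBuggy(user, defaults))
	fmt.Println("fixed:", mergeFixed(user, defaults))
}
```

With the buggy merge, the single user option silently drops the CNI default; with the fixed merge, both options survive.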

(thank you so much anyone who wrote that Integration Tests to start with extra options, you did a great job)

This one-line change was found thanks to the new --wait=true, which waits for more components and revealed all the hidden bugs we had buried under a lot of dust.

(This PR also tweaks the logs and the retry logic to be more efficient and produce less noise.)

Cheers !

After this PR:

Works like a charm!

This PR fixes this issue for the Docker and Podman drivers.

I still hope that one day we unify the CNI solution across VM, container, and multinode: #7428

Not fixed in this PR

This PR still does not fix the CRI-O problem with CNI on the docker driver; that would be the subject of another investigation. See more in #7522.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 8, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2020
@medyagh medyagh changed the title from "fix wait retry logic" to "Fix CNI" Apr 8, 2020
@medyagh medyagh changed the title from "Fix CNI" to "Fix CoreDNS failing on Containerd and CRIO on KIC drivers." Apr 8, 2020
codecov-io commented Apr 8, 2020

Codecov Report

Merging #7509 into master will not change coverage.
The diff coverage is 0.00%.

@@           Coverage Diff           @@
##           master    #7509   +/-   ##
=======================================
  Coverage   36.69%   36.69%           
=======================================
  Files         146      146           
  Lines        8976     8976           
=======================================
  Hits         3294     3294           
  Misses       5290     5290           
  Partials      392      392           

| Impacted Files            | Coverage Δ         |
| cmd/minikube/cmd/start.go | 17.24% <0.00%> (ø) |

@medyagh medyagh changed the title from "Fix CoreDNS failing on Containerd and CRIO on KIC drivers." to "Fix CoreDNS failing on Containerd & CRIO on KIC drivers." Apr 8, 2020
medyagh commented Apr 8, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 8, 2020
@medyagh medyagh requested a review from priyawadhwa April 8, 2020 07:47
@minikube-pr-bot

Error: running mkcmp: exit status 1

@minikube-pr-bot

All Times minikube: [ 67.486090 63.650628 65.347021]
All Times Minikube (PR 7509): [ 65.258261 73.590560 65.250705]

Average minikube: 65.494580
Average Minikube (PR 7509): 68.033175

Averages Time Per Log

+----------------------+-----------+--------------------+
|         LOG          | MINIKUBE  | MINIKUBE (PR 7509) |
+----------------------+-----------+--------------------+
| minikube v           |  0.171908 |           0.156350 |
| Creating kvm2        | 40.677702 |          41.817563 |
| Preparing Kubernetes | 22.487182 |          23.590596 |
| Pulling images       |           |                    |
| Launching Kubernetes |           |                    |
| Waiting for cluster  |           |                    |
+----------------------+-----------+--------------------+

@medyagh medyagh changed the title from "Fix CoreDNS failing on Containerd & CRIO on KIC drivers." to "Fix CoreDNS failing on Containerd on KIC drivers." Apr 8, 2020
if err := ExpectAppsRunning(cs, expected); err != nil {
	if time.Since(start) > minLogCheckTime {
		glog.Infof("error waiting for apps to be running: %v", err)
		time.Sleep(kconst.APICallRetryInterval * 5)
Contributor

Why does this multiply the standard retry interval by 5?

Member Author

That was something I borrowed from the similar function for system pods.

Contributor

They do it to avoid pegging the system because they are running:

announceProblems(r, bs, cfg, cr)

You may want to call the same function here.

Member Author

We don't have the parameters that announceProblems needs in this function (r cruntime.Manager, bs bootstrapper.Bootstrapper).

@tstromberg
Contributor

Can you give this PR a title that reflects the change?

It doesn't seem to be specific to containerd.

@medyagh medyagh changed the title from "Fix CoreDNS failing on Containerd on KIC drivers." to "Fix extra-option overwriting" Apr 8, 2020
@medyagh medyagh changed the title from "Fix extra-option overwriting" to "Fix setting --extra-config should not break docker/podman driver CoreDNS" Apr 8, 2020
@medyagh medyagh changed the title from "Fix setting --extra-config should not break docker/podman driver CoreDNS" to "Respect driver.FlagDefaults even if --extra-config is set" Apr 8, 2020
@minikube-pr-bot

All Times Minikube (PR 7509): [ 65.755934 65.131846 64.926113]
All Times minikube: [ 64.794872 61.864391 68.093661]

Average minikube: 64.917641
Average Minikube (PR 7509): 65.271297

Averages Time Per Log

+----------------------+-----------+--------------------+
|         LOG          | MINIKUBE  | MINIKUBE (PR 7509) |
+----------------------+-----------+--------------------+
| minikube v           |  0.157385 |           0.150700 |
| Creating kvm2        | 40.417326 |          40.431551 |
| Preparing Kubernetes | 22.280751 |          22.691045 |
| Pulling images       |           |                    |
| Launching Kubernetes |           |                    |
| Waiting for cluster  |           |                    |
+----------------------+-----------+--------------------+

medyagh commented Apr 8, 2020

The TestContainerd failure is related to BindAddress: #7505, #7521

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 8, 2020
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 8, 2020
@minikube-pr-bot

All Times minikube: [ 69.184964 65.362801 67.889831]
All Times Minikube (PR 7509): [ 67.036276 62.159522 63.519268]

Average minikube: 67.479199
Average Minikube (PR 7509): 64.238355

Averages Time Per Log

+----------------------+-----------+--------------------+
|         LOG          | MINIKUBE  | MINIKUBE (PR 7509) |
+----------------------+-----------+--------------------+
| minikube v           |  0.164748 |           0.151520 |
| Creating kvm2        | 41.355013 |          40.743942 |
| Preparing Kubernetes | 23.821626 |          21.102037 |
| Pulling images       |           |                    |
| Launching Kubernetes |           |                    |
| Waiting for cluster  |           |                    |
+----------------------+-----------+--------------------+

@medyagh medyagh merged commit 689eca4 into kubernetes:master Apr 8, 2020
@medyagh medyagh deleted the extra_opt_overwrite branch May 2, 2020 22:33
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

5 participants