 We expect no non-infra related flakes in the last month as a GA graduation criteria.
 -->

-A new node_e2e test with `serialize-image-pulls==false` will be added to make sure that when maxParallelImagePulls is reached, all further image pulls will be blocked.
+A new node_e2e test with `serialize-image-pulls==false` will be added to test parallel image pull limits:
+1. When maxParallelImagePulls is reached, all further image pulls will be blocked.
+2. Verify the behavior when the same image is pulled in parallel, which will happen when image pull policy is `Always`.

 - <test>: <linktotestcoverage>

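The first test case above (further pulls block once the limit is reached) can be illustrated with a minimal Go sketch. It is not the kubelet implementation: `pullLimiter`, `newPullLimiter`, and `maxObservedParallelism` are hypothetical names, and each pull is simulated with a short sleep. A buffered channel stands in for the in-memory parallel pull counter.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// pullLimiter is a hypothetical stand-in for an in-memory parallel pull
// counter. It is NOT the kubelet's actual type.
type pullLimiter struct {
	slots chan struct{}
}

func newPullLimiter(maxParallel int) *pullLimiter {
	return &pullLimiter{slots: make(chan struct{}, maxParallel)}
}

// pull blocks until a slot is free, runs fn, then releases the slot.
func (l *pullLimiter) pull(fn func()) {
	l.slots <- struct{}{}
	defer func() { <-l.slots }()
	fn()
}

// maxObservedParallelism starts n simulated pulls behind a limiter and
// returns the highest number of pulls that were in flight at once.
func maxObservedParallelism(n, maxParallel int) int32 {
	l := newPullLimiter(maxParallel)
	var inFlight, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.pull(func() {
				cur := atomic.AddInt32(&inFlight, 1)
				// Record a new peak if this pull raised it.
				for {
					old := atomic.LoadInt32(&peak)
					if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
						break
					}
				}
				time.Sleep(10 * time.Millisecond) // simulated image pull
				atomic.AddInt32(&inFlight, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("peak parallel pulls:", maxObservedParallelism(6, 2))
}
```

The sketch only checks that the peak never exceeds the limit. The second test case (the same image pulled in parallel) is not modeled here.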
@@ -385,6 +394,7 @@ A new node_e2e test with `serialize-image-pulls==false` will be added to make s

 #### Beta
 - Gather feedback from developers and surveys
+- Add e2e test to cover the parallel image pull case

 #### GA
 - Gather feedback from real-world usage from kubernetes vendors.
 This section must be completed when targeting beta to a release.
 -->

+
 ###### How can a rollout or rollback fail? Can it impact already running workloads?

 <!--
@@ -597,13 +608,22 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->

+This is an opt-in feature, and it does not change any default behavior. If there is any bug in this feature, image pulls might fail.
+No running workloads will be impacted.
+
+Note that changing MaxParallelImagePulls requires a kubelet restart. Since the parallel image pull counter
+is maintained in memory, restarting kubelet will reset the counter and potentially allow more image pulls than the limit.
+
 ###### What specific metrics should inform a rollback?

 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->

+In the worst case, image pulls might fail. Users can monitor image pull k8s events and the `runtime_operations_errors_total` metric to see if there is an increase
+in image pull failures.
+
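As a rough illustration of watching that metric, the Go sketch below sums image pull errors from two scrapes of a kubelet metrics endpoint and reports the increase. It rests on assumptions not stated in the text above: that the metric is exposed in Prometheus text format as `kubelet_runtime_operations_errors_total` and that pull failures carry the label `operation_type="pull_image"`. The scrape contents and the `sumPullErrors` helper are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const (
	// Assumed metric name and label; verify against your kubelet version.
	metricName = "kubelet_runtime_operations_errors_total"
	pullLabel  = `operation_type="pull_image"`
)

// Hypothetical scrape contents, for illustration only.
const sampleBefore = `# TYPE kubelet_runtime_operations_errors_total counter
kubelet_runtime_operations_errors_total{operation_type="pull_image"} 3
kubelet_runtime_operations_errors_total{operation_type="create_container"} 1
`

const sampleAfter = `# TYPE kubelet_runtime_operations_errors_total counter
kubelet_runtime_operations_errors_total{operation_type="pull_image"} 7
kubelet_runtime_operations_errors_total{operation_type="create_container"} 1
`

// sumPullErrors adds up the values of all samples of metricName whose
// label set contains pullLabel in a Prometheus text-format scrape.
func sumPullErrors(scrape string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(scrape))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		if !strings.HasPrefix(line, metricName+"{") || !strings.Contains(line, pullLabel) {
			continue
		}
		end := strings.LastIndex(line, "}")
		rest := strings.Fields(line[end+1:])
		if len(rest) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(rest[0], 64) // first field after labels is the value
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	before, _ := sumPullErrors(sampleBefore)
	after, _ := sumPullErrors(sampleAfter)
	fmt.Printf("image pull errors increased by %.0f\n", after-before)
}
```

In practice an operator would more likely express this as an alerting rule on the counter's rate; the sketch only shows which samples are relevant.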
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

 <!--
@@ -612,12 +632,33 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->

+This is an opt-in feature, and it does not change any default behavior. We manually tested enabling and disabling this feature by changing the kubelet config and
+restarting kubelet.
+
+The manual test steps are as follows:
+
+1. Create a one-node 1.27 k8s cluster, which has MaxParallelImagePulls support but the value is nil (no limit) by default.
+2. Manually change the MaxParallelImagePulls setting by SSH-ing to the node and adding the following to the kubelet config:
+```
+serializeImagePulls: false
+maxParallelImagePulls: 2
+```
+3. Deploy three pods, each with a different container image, to the one-node cluster. All three images are 5GB. The relatively large size ensures there is enough time between image pull events and makes the behavior easier to observe.
+4. Observe the k8s events by running `kubectl get events`, and confirm that exactly two images finish pulling first, and then the remaining image finishes.
+5. Manually change the kubelet config again by SSH-ing to the node and removing the `serializeImagePulls` and `maxParallelImagePulls` entries.
+6. Deploy two pods, each with a different container image, to the cluster. Both images are 5GB, and they differ from the three images deployed in step 3.
+7. Observe the k8s events by running `kubectl get events`, and confirm that exactly one image finishes pulling first, and then the remaining image finishes.
+
+
+
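The completion order expected in steps 4 and 7 can be reproduced with a small deterministic model. This is a sketch under stated assumptions, not kubelet code: `finishTimes` is a hypothetical helper, all pulls are queued at time 0, durations are in arbitrary time units, and serialized pulling (after the two entries are removed in step 5) is modeled as a limit of 1.

```go
package main

import "fmt"

// finishTimes returns the completion time of each pull, in queue order,
// when at most limit pulls run at once and each queued pull starts as
// soon as a slot frees up. limit must be at least 1.
func finishTimes(durations []int, limit int) []int {
	slotFree := make([]int, limit) // time at which each slot becomes free
	out := make([]int, len(durations))
	for i, d := range durations {
		// Pick the slot that frees up earliest.
		best := 0
		for s := 1; s < limit; s++ {
			if slotFree[s] < slotFree[best] {
				best = s
			}
		}
		slotFree[best] += d
		out[i] = slotFree[best]
	}
	return out
}

func main() {
	// Steps 3-4: three equally sized images, maxParallelImagePulls: 2.
	fmt.Println(finishTimes([]int{10, 10, 10}, 2)) // [10 10 20]
	// Steps 6-7: two equally sized images, pulls modeled as serialized.
	fmt.Println(finishTimes([]int{10, 10}, 1)) // [10 20]
}
```

With a limit of 2, two pulls complete together and the third completes one pull duration later. With a limit of 1, the pulls complete one after another, matching the events described in the steps.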
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->

+No.
+
 ### Monitoring Requirements

 <!--
@@ -634,6 +675,10 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
 checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
+Image pulling is managed by kubelet, and does not affect how workloads run. That said, when parallel image pulling is enabled (SerializeImagePulls is set to false), an operator will observe that
+a pod could start while kubelet is still pulling images for another pod.
+
+To observe the effect of different `MaxParallelImagePulls` settings, please refer to the next section.

 ###### How can someone using this feature know that it is working for their instance?

@@ -646,13 +691,11 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->

--[ ] Events
-- Event Reason:
--[ ] API .status
-- Condition name:
-- Other field:
--[ ] Other (treat as last resort)
-- Details:
+-[X] Events
+- Event Reason: Pulling
+
+Assuming `MaxParallelImagePulls` is set to _X_, an operator can look at the container runtime log and see _X_ PullImageRequests sent to the container runtime at the same time.
+If the image pulls take roughly the same amount of time, an operator can look at the k8s events and see _X_ images finish pulling at roughly the same time.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -677,15 +720,19 @@ question.
 Pick one more of these and delete the rest.
 -->

--[ ] Metrics
-- Metric name:
--[Optional] Aggregation method:
+We can rely on the existing image pull metrics to determine if this feature has any impact on image pulling.