@@ -185,7 +185,8 @@ overage.

 - **What happens if we reenable the feature if it was previously rolled back?**

-It should continue to work as expected.
+The apiserver will accept new objects with expanded DNS configuration again,
+and the kubelet will create new Pods with expanded configuration.

 - **Are there any tests for feature enablement/disablement?**

@@ -195,39 +196,42 @@ We will add unit tests.

 - **How can a rollout fail? Can it impact already running workloads?**

-N/A
+If a kubelet starts with an invalid `resolvConf`, new workloads on that node
+will fail DNS lookups.

 - **What specific metrics should inform a rollback?**

-N/A
+If new workloads start to fail DNS lookups because of a corrupted resolv.conf
+or older resolver libraries, that is an indication to roll back the
+enablement.

 - **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

-N/A
+We will test them.

 - **Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?**

-N/A
+No

 ### Monitoring Requirements

 - **How can an operator determine if the feature is in use by workloads?**

-N/A
+There is no metric that indicates whether the feature is in use. An operator
+has to check whether any objects or DNS resolver configuration files contain
+expanded DNS configuration.

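One way to check a resolver file for expanded configuration is to compare its search list against the pre-expansion limits. The Python sketch below assumes those legacy limits were 6 search domains and 256 characters; both constants are assumptions for illustration, and `search_domains`, `exceeds_legacy_limits`, and `uses_expanded_dns_config` are hypothetical helpers, not part of Kubernetes.

```python
# Assumed pre-expansion limits, used only for this illustration.
LEGACY_MAX_SEARCH_PATHS = 6
LEGACY_MAX_SEARCH_CHARS = 256


def search_domains(resolv_conf: str) -> list[str]:
    """Return the domains on the last `search` line of resolv.conf content.

    Later `search` lines override earlier ones. This sketch ignores the
    `domain` keyword, which can also override the search list.
    """
    domains: list[str] = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            domains = fields[1:]
    return domains


def exceeds_legacy_limits(domains: list[str]) -> bool:
    """True if a search list is longer than the assumed legacy limits allow."""
    too_many = len(domains) > LEGACY_MAX_SEARCH_PATHS
    # Count the separating spaces, as they appear on the joined search line.
    too_long = len(" ".join(domains)) > LEGACY_MAX_SEARCH_CHARS
    return too_many or too_long


def uses_expanded_dns_config(resolv_conf: str) -> bool:
    """True if the resolv.conf search list goes beyond the legacy limits."""
    return exceeds_legacy_limits(search_domains(resolv_conf))
```

`exceeds_legacy_limits` can also be applied directly to a Pod's `spec.dnsConfig.searches` list, for example as read from `kubectl get pod -o json`. A file with eight `search` domains is reported as expanded, while a typical in-cluster file with three domains is not.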
 - **What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**
   - [ ] Metrics
     - Metric name:
     - [Optional] Aggregation method:
     - Components exposing the metric:
-  - [ ] Other (treat as last resort)
-    - Details:
-
-N/A
+  - [x] Other (treat as last resort)
+    - Success of DNS lookups

 - **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

-N/A
+DNS lookups should not fail more often than they did before the feature was enabled.

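The DNS lookup success SLI could be approximated with a probe that resolves a fixed set of names and reports the fraction that succeed; comparing that ratio before and after enablement checks the SLO. This is a minimal sketch: `lookup_success_ratio` is a hypothetical helper, it uses the standard library's `socket.getaddrinfo` by default, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable


def lookup_success_ratio(
    names: Iterable[str],
    resolve: Callable[..., object] = socket.getaddrinfo,
) -> float:
    """Fraction of names that resolve successfully; 1.0 when no names are given."""
    attempted = 0
    succeeded = 0
    for name in names:
        attempted += 1
        try:
            resolve(name, None)  # getaddrinfo(host, port) with no port
            succeeded += 1
        except OSError:
            # socket.gaierror, raised when a lookup fails, subclasses OSError.
            pass
    return succeeded / attempted if attempted else 1.0
```

Which names to probe (in-cluster Service names, external hosts the workload normally uses) is left to the operator.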
 - **Are there any missing metrics that would be useful to have to improve observability of this feature?**

@@ -237,42 +241,50 @@ N/A

 - **Does this feature depend on any specific services running in the cluster?**

-N/A
+No

 ### Scalability

 - **Will enabling / using this feature result in any new API calls?**

-N/A
+No

 - **Will enabling / using this feature result in introducing new API types?**

-N/A
+No

 - **Will enabling / using this feature result in any new calls to the cloud provider?**

-N/A
+No

 - **Will enabling / using this feature result in increasing size or count of the existing API objects?**

-N/A
+The combined length of the search domains in `PodSpec.DNSConfig.Searches` can increase up to 2048 characters.

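A size check of this kind could be sketched as below. Only the 2048-character limit comes from the text above; the 32-path count limit and the choice to count the separating spaces are assumptions made for this sketch, and `validate_searches` is a hypothetical function, not the apiserver's validation code.

```python
# 2048 comes from the KEP text; 32 is an assumed count limit for this sketch.
MAX_SEARCH_PATHS_EXPANDED = 32
MAX_SEARCH_CHARS_EXPANDED = 2048


def validate_searches(searches: list[str]) -> list[str]:
    """Return error messages for a dnsConfig.searches list (empty if within limits).

    Only size limits are checked; per-label DNS syntax is out of scope here.
    """
    errors = []
    if len(searches) > MAX_SEARCH_PATHS_EXPANDED:
        errors.append(f"must not have more than {MAX_SEARCH_PATHS_EXPANDED} search paths")
    # Assumption: the spaces between entries count toward the character limit.
    total = len(" ".join(searches))
    if total > MAX_SEARCH_CHARS_EXPANDED:
        errors.append(
            f"search list is {total} characters, more than {MAX_SEARCH_CHARS_EXPANDED}"
        )
    return errors
```

Under these assumptions, 32 entries of 63 characters join to 2047 characters and pass, while 32 entries of 64 characters join to 2079 and are rejected.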
 - **Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?**

-N/A
+DNS lookup time could increase, but the increase is expected to be negligible.

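Extra lookup time, where it occurs, comes from the resolver trying a short name under each search domain in turn. The sketch below lists candidate queries in the order a glibc-style stub resolver would roughly try them, following the `ndots` rule described in resolv.conf(5); it ignores retries, timeouts, and separate A/AAAA queries, and `candidate_queries` is a hypothetical helper.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Query names a stub resolver may try, in order, until one answers."""
    if name.endswith("."):
        return [name]  # fully qualified: the search list is not applied
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [absolute]
```

Because the resolver stops at the first answer, a name that resolves under the first search domain still needs one query; each additional search domain adds at most one candidate for names with fewer dots than `ndots`.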
 - **Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?**

-N/A
+No

 ### Troubleshooting

 - **How does this feature react if the API server and/or etcd is unavailable?**

+N/A
+
 - **What are other known failure modes?**

+N/A
+
 - **What steps should be taken if SLOs are not being met to determine the problem?**

+If DNS lookups fail, check the error messages first. Then check whether the
+kubelet's `resolvConf` file is corrupted, and upgrade the DNS resolver libraries
+in the workload if they are too old.
+
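When checking whether the kubelet's `resolvConf` file is corrupted, a few structural checks catch common damage. The sketch below treats lines starting with `#` or `;` as comments and flags unknown keywords (the recognized set, `nameserver`, `domain`, `search`, `sortlist`, `options`, is taken from resolv.conf(5)), unparsable `nameserver` addresses, keywords without a value, and a missing `nameserver` line. These are heuristics chosen for illustration, not a reimplementation of any resolver's parser, and `resolv_conf_problems` is a hypothetical helper.

```python
import ipaddress

KNOWN_KEYWORDS = {"nameserver", "domain", "search", "sortlist", "options"}


def resolv_conf_problems(text: str) -> list[str]:
    """Return problems found in resolv.conf content (empty list if none)."""
    problems = []
    nameservers = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # blank lines and comments are skipped
        keyword, *values = stripped.split()
        if keyword not in KNOWN_KEYWORDS:
            problems.append(f"line {lineno}: unknown keyword {keyword!r}")
        elif keyword == "nameserver":
            nameservers += 1
            try:
                # Raises ValueError for anything that is not an IPv4/IPv6 address.
                ipaddress.ip_address(values[0] if values else "")
            except ValueError:
                problems.append(f"line {lineno}: invalid nameserver address")
        elif not values:
            problems.append(f"line {lineno}: {keyword!r} has no value")
    if nameservers == 0:
        problems.append("no nameserver entries")
    return problems
```

Pass in the contents of the file the kubelet's `resolvConf` points to; an empty list means none of these checks fired.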
 ## Implementation History

 - 2021-03-26: [Initial