@@ -190,8 +190,10 @@ incorrectly or objects being garbage collected mistakenly.
190190
191191## Proposal
192192
193- API change: To the apiservices API, add an "alternates" clause, a list of
194- apiservers which believe they can serve the group-version.
193+ API changes:
194+ * To the apiservices API, add an "alternates" clause, a list of
195+ apiservers which believe they can serve the group-version.
196+ * To ??? API, add ability to tell which apiservers can serve a resource.
195197
196198API server change:
197199* A controller adds the apiserver to the list of alternates for its built-in
@@ -202,22 +204,34 @@ API server change:
202204 - If the request is for a group/version the apiserver doesn't have locally, it
203205 will proxy the request to one of the alternates instead.
204206
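The routing rule in the last bullet can be sketched as below. This is a minimal illustration, not the proposed implementation: the `router` type, its fields, and the `action` values are hypothetical names. Returning 404 when no alternate is listed at all is an assumption; the proposal only states that clients get a 503 when an alternate exists but cannot be reached.

```go
package main

import "fmt"

// action is a hypothetical label for what the handler does with a request.
type action int

const (
	serveLocally action = iota
	proxyToAlternate
	respondNotFound    // assumption: no apiserver claims the group-version
	respondUnavailable // alternates are listed but none can be reached (503)
)

// router is a hypothetical stand-in for the state an apiserver would consult.
type router struct {
	local      map[string]bool          // group-versions served by this apiserver
	alternates map[string][]string      // group-version -> apiservers listed as alternates
	reachable  func(server string) bool // connectivity check, injected for the sketch
}

// route decides how to handle a request for groupVersion and, when the
// request is proxied, which alternate receives it.
func (r router) route(groupVersion string) (action, string) {
	if r.local[groupVersion] {
		return serveLocally, ""
	}
	alts := r.alternates[groupVersion]
	if len(alts) == 0 {
		return respondNotFound, ""
	}
	for _, alt := range alts {
		if r.reachable(alt) {
			return proxyToAlternate, alt
		}
	}
	return respondUnavailable, ""
}

func main() {
	r := router{
		local:      map[string]bool{"apps/v1": true},
		alternates: map[string][]string{"example.io/v2": {"apiserver-b", "apiserver-c"}},
		reachable:  func(server string) bool { return server == "apiserver-c" },
	}
	a, target := r.route("example.io/v2")
	fmt.Println(a == proxyToAlternate, target)
}
```

Picking the first reachable alternate keeps the sketch short; a real handler might prefer random selection to spread load.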
205- Unsolved problem: to be completely accurate and achive the goals in this KEP, we
206- will need to track what resources apiservers can serve, not just what
207- group-versions.
208-
209207### User Stories (Optional)
210208
211- <!--
212- Detail the things that people will be able to do if this KEP is implemented.
213- Include as much detail as possible so that people can understand the "how" of
214- the system. The goal here is to make this feel real for users without getting
215- bogged down.
216- -->
209+ #### Garbage Collector
210+
211+ The garbage collector makes decisions about deleting objects when all
212+ referencing objects are deleted. A discovery gap / apiserver mismatch, as
213+ described above, could result in GC seeing a 404 and assuming an object has been
214+ deleted; this could result in it deleting a subsequent object that it should
215+ not.
217216
218- #### Story 1
217+ This proposal will cause the GC to see either the correct object or get a 503
218+ (which it handles safely).
219219
220- #### Story 2
220+ #### Namespace Lifecycle Controller
221+
222+ This controller seeks to empty all objects from a namespace when it is deleted.
223+ Discovery failures cause NLC to be unable to tell if objects of a given resource
224+ are present in a namespace. It fails safe, meaning it refuses to delete the
225+ namespace until it can verify it is empty: this causes slowness deleting
226+ namespaces that is a common source of complaint.
227+
228+ Additionally, if the NLC knows about a resource that the apiserver it is talking
229+ to does not, it may incorrectly get a 404, assume a collection is empty, and
230+ delete the namespace too early, leaving garbage behind in etcd. This is a
231+ correctness problem: the garbage will reappear if a namespace of the same name
232+ is recreated.
233+
234+ This proposal addresses both problems.
221235
222236### Notes/Constraints/Caveats (Optional)
223237
@@ -230,26 +244,32 @@ This might be a good place to talk about core concepts and how they relate.
230244
231245### Risks and Mitigations
232246
233- <!--
234- What are the risks of this proposal, and how do we mitigate? Think broadly.
235- For example, consider both security and how this will impact the larger
236- Kubernetes ecosystem .
247+ Cluster admins might not read the release notes and realize they should enable
248+ network/firewall connectivity between apiservers. In this case clients will
249+ receive 503s instead of transparently being proxied. 503 is still safer than
250+ today's behavior.
237251
238- How will security be reviewed, and by whom?
252+ Requests will consume egress bandwidth for 2 apiservers when proxied. We can cap
253+ the number if needed, but upgrades aren't that frequent and few resources are
254+ changed on releases, so these requests should not be common. We will count them
255+ with a metric.
239256
240- How will UX be reviewed, and by whom?
241-
242- Consider including folks who also work outside the SIG or subproject.
243- -->
257+ TODO: security / cert stuff.
244258
245259## Design Details
246260
247- <!--
248- This section should contain enough information that the specifics of your
249- change are understandable. This may include API specs (though not always
250- required) or even code snippets. If there's any ambiguity about HOW your
251- proposal will be implemented, this is the place to discuss them.
252- -->
261+ TODO: specific API change (x2)
262+
263+ TODO: explanation of how the handler will determine a request is for a resource
264+ that should be proxied.
265+
266+ TODO: explanation of how the security handshake between apiservers works.
267+ * What we need to fix: random processes / external users / etc should not be
268+ able to proxy requests, so the receiving apiserver needs to be able to verify
269+ the source apiserver.
270+ * generate self-signed cert on startup, put pubkey in apiserver identity lease
271+ object?
272+
253273
254274### Test Plan
255275