doc(bundles): propose strategy for handling big bundles#40
Conversation
#### Partitioning and Storage
Instead of extracting bundle content directly to a `ConfigMap` -- as outlined in the [previous section](#podspec-pull-requirement) -- the unpack `Job` will first encode it as JSON and compress the result. The compressed JSON will then be split into chunks of a [fixed size](#determining-chunk-size), each wrapped with a header to keep track of its position in the chunk sequence. Each chunk is then encoded as JSON and stored in its own `ConfigMap`. Each of these `ConfigMaps` is given a `porcelain.operators.coreos.com/bundle: <bundle-name>` label to facilitate later retrieval, along with a `porcelain.operators.coreos.com/signature: <signature>` annotation used to validate the chunk's integrity. When the content for a particular bundle is retrieved, the `ConfigMaps` with the matching bundle name label are collected, sorted by the position tracked in each header, reconstituted into a single artifact, and decompressed before being returned for a `/content` request. If signature verification fails for any chunk, a status condition indicating the failure will be added to the associated bundle, which will trigger a new unpack `Job` to run in a bid to rectify the issue.
my guess is that the majority of individual manifests (I have no data, just guessing) will fit within a single configmap. could we reserve the chunking for manifests that don't fit?
Then unpacking a large bundle could look like:
ConfigMap1
- service
- serviceaccount
- rbac
ConfigMap2
- csv
- CRD A
ConfigMap3
- CRD B Part 1
ConfigMap4
- CRD B Part 2
This seems like it will increase the debuggability of bundle unpacking for the common case
If the majority of resources will fit into a single ConfigMap, wouldn't we get something fairly reasonable anyway (with better space utilization)?
e.g.
ConfigMap1:
- service
- serviceaccount
- rbac
- CRD A
- CSV part 1
ConfigMap2:
- CSV part 2
...
That looks good to me - I'm not sure I'm seeing the distinction as writ though.
will the configmap look like:

data:
  {service}{serviceaccount}{rbac}{CRDA}{CSVpart1}
---
data:
  {CSVPart2}

or like:

data:
- {service}
- {serviceaccount}
- {rbac}
- {CRDA}
- {CSVpart1}
---
data:
- {CSVPart2}
Ah, I see now. I had envisioned the former, but the latter may be more clear. I do think this will be opaque to most end users, but I see your point.
The etcd documentation (https://etcd.io/docs/v3.4.0/dev-guide/limit/) shows a request limit of 1.5 MB, with a `--max-request-bytes` flag to allow for a larger request limit. I was wondering about your thoughts on just leveraging this etcd feature, or if that is not possible?
@jmccormick2001 that's something we've considered in earlier discussions. AFAIK, changing the storage limit for a cluster's etcd is a highly privileged (and possibly disruptive?) task -- further, we would need to bump this value whenever we unpack a bundle larger than the current limit. So, while I do think that excessively large bundles will be a rarity, I would prefer a solution that avoids perturbing etcd directly.
#### Determining Chunk Size
Chunk size will be determined by an option on the `catalog-operator` command which, if unset, directs the operator to probe the cluster for the resource storage limit. One potential probing strategy could be to create `ConfigMaps` of various sizes, akin to a "binary search" for the limit.
I would leave the size probing as future work -- there doesn't seem to be an immediate need, and it may be worth pursuing more direct options (i.e. should this value be exposed as a property of the cluster that anyone can query from kube-apiserver?)
Remove descriptions from unstored versions of CRDs to keep the bundle size < 1MiB. Until operator-framework/enhancements#40 got implemented, a bundle should not exceed 1MiB because its internally (in the opm tool) managed with a configmap. Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
  message: "no manifests detected in bundle"
```

Most importantly, a `Bundle` will expose a `/content` subresource which -- upon a successful unpackaging -- will return a [JSON encoded representation]() of that bundle.
Would it make sense to have the `/content` subresource accept a filename parameter and return only that specific manifest? If I understand correctly, the catalog-operator consumes manifests step by step, so this could avoid some wasteful traffic between the catalog-operator and the aggregated API server serving that subresource.
Address operator-framework/operator-lifecycle-manager#1523
To ensure Operator Bundles of arbitrary size can be unpacked and used in a Kubernetes cluster by OLM, we can ...