Skip to content

Conversation

@joelanford
Copy link
Member

@joelanford joelanford commented Aug 1, 2022

Signed-off-by: Joe Lanford [email protected]

Description of the change:
This PR introduces an optional persistent caching layer to the opm serve command to reduce the cost of initializing the GRPC server for file-based catalogs -- and thus improve the startup time.

It also update the Dockerfile generated by opm generate dockerfile to build a cache at image build time and use it at runtime. This does have the side affect of essentially doubling the image size when compared to an image that does not pre-generate the cache.

It adds --cache-dir and --cache-only flags to the opm serve command to enable interactions with the cache.

  1. The --cache-dir flag specifies a persistent cache directory to use. If --cache-dir is unset, the existing behavior (which builds and uses an ephemeral cache directory) remains.
  2. The --cache-only flag causes opm serve to sync the cache and then exit without actually serving the GRPC API. This is useful when building catalog images -- if the catalog image contains a pre-synced cache, the container startup time is significantly improved.

Motivation for the change:
The opm serve command's startup time is proportional to the size of the FBC. This is a problem because OLM creates catalog pods with somewhat aggressive health/liveness probes that cause affected FBC catalogs to never become healthy.

This problem has been partially addressed in current/future OLM versions via the introduction of a startupProbe. However, that is mainly a workaround for the root problem. This PR attempts to improve the startup time itself.

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 1, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 1, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joelanford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2022
@codecov
Copy link

codecov bot commented Aug 1, 2022

Codecov Report

Merging #1005 (49ce3e6) into master (8a159e8) will increase coverage by 0.35%.
The diff coverage is 63.15%.

@@            Coverage Diff             @@
##           master    #1005      +/-   ##
==========================================
+ Coverage   52.62%   52.97%   +0.35%     
==========================================
  Files         104      105       +1     
  Lines        9422     9606     +184     
==========================================
+ Hits         4958     5089     +131     
- Misses       3535     3562      +27     
- Partials      929      955      +26     
Impacted Files Coverage Δ
alpha/action/generate_dockerfile.go 81.81% <ø> (ø)
pkg/registry/tar.go 61.11% <61.11%> (ø)
pkg/registry/query.go 61.22% <63.63%> (-0.76%) ⬇️
alpha/action/list.go 73.25% <0.00%> (+0.31%) ⬆️
alpha/veneer/semver/semver.go 61.85% <0.00%> (+3.61%) ⬆️
alpha/declcfg/write.go 82.29% <0.00%> (+9.27%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@joelanford joelanford marked this pull request as ready for review August 16, 2022 03:49
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 16, 2022
@openshift-ci openshift-ci bot requested review from benluddy and kevinrizza August 16, 2022 03:49
@jdockter
Copy link

@joelanford was able to test your branch by building opm binary local and pulling based image quay.io/joelanford/opm:opm-serve-cache-personal in FBC build of the IBM Operator catalog. Tested internally on OCP 4.11 x, p, z cluster as well as OCP 4.6 x and p with no issue. Saw startup times of catalog pod around 15 seconds using updated dockerfile for catalog build like the following:

FROM --platform=linux/amd64 quay.io/joelanford/opm:opm-serve-cache-personal AS builder
FROM --platform=linux/amd64 registry.redhat.io/ubi8/ubi-minimal
COPY --from=builder /bin/opm/ /bin/opm
COPY --from=builder /bin/grpc_health_probe /bin/grpc_health_probe
### add licenses to this directory
RUN mkdir /licenses
COPY licenses/LICENSE /licenses
WORKDIR /tmp
ARG user=1001
RUN microdnf clean all && chmod -vR g=u /etc/passwd
USER ${user}
ENTRYPOINT ["/bin/opm"]
CMD ["serve", "/configs", "--cache-dir=/tmp/cache"]
COPY amd64 /configs
RUN ["/bin/opm", "serve", "/configs", "--cache-dir=/tmp/cache", "--cache-only"]
LABEL operators.operatorframework.io.index.configs.v1=/configs

@grokspawn
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 19, 2022
@openshift-merge-robot openshift-merge-robot merged commit 494b68e into operator-framework:master Aug 19, 2022
@joelanford joelanford deleted the opm-serve-cache branch August 24, 2022 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants