@soltysh
Contributor

@soltysh soltysh commented May 20, 2021

This shows openshift/library-go#1084 applied to oc image info

/assign @smarterclayton

/hold
for real bump

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels May 20, 2021
@openshift-ci
Contributor

openshift-ci bot commented May 20, 2021

@soltysh: This pull request references Bugzilla bug 1823143, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @zhouying7780

Details

In response to this:

Bug 1823143: wire ICSP lookups to oc image info

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from zhouying7780 May 20, 2021 19:21
@openshift-ci openshift-ci bot requested review from mfojtik and smarterclayton May 20, 2021 19:22
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2021
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 23, 2021
Comment on lines 118 to 127
uniqueMirrors := make([]reference.DockerImageReference, 0, len(imageSources))
uniqueMap := make(map[reference.DockerImageReference]bool)
for _, imageSourceMirror := range imageSources {
	if _, ok := uniqueMap[imageSourceMirror]; !ok {
		uniqueMap[imageSourceMirror] = true
		uniqueMirrors = append(uniqueMirrors, imageSourceMirror)
	}
}
Contributor

This does not follow https://github.com/openshift/api/blob/a99ffa1cac6709edf8f502b16890b16f9a557e00/operator/v1alpha1/types_image_content_source_policy.go#L34-L39 .

If at all possible, this should share the implementation in https://github.com/openshift/runtime-utils/blob/master/pkg/registries/registries.go (used e.g. in machine-config-* to generate CRI-O configuration from ICSP), and perhaps even the consumer from https://github.com/containers/image/blob/97b3ffa7bb92a7778238b58010454031b0c2cbee/pkg/sysregistriesv2/system_registries_v2.go#L815 + https://github.com/containers/image/blob/97b3ffa7bb92a7778238b58010454031b0c2cbee/pkg/sysregistriesv2/system_registries_v2.go#L130 (OTOH that would require API additions to make it usable with in-memory configs), instead of maintaining an independent implementation. There are RFEs to add hostname wildcards and the like to the configuration; of course it’s possible to maintain independent implementations, but there are enough corner cases that sharing an implementation is the easiest way to remain consistent.

(I haven’t read the code in detail, but the behavior of ICSP is not “fallback alternatives”; the mirrors, if any, are tried before the primary location, e.g. so that in a disconnected environment there is no firewall/DNS noise, and no delays, from attempts to reach the global internet for images that are available on mirrors.)

Contributor

This is intentional: oc is not a cluster. For CLI commands not tied to the behavior of a cluster, the appropriate behavior is fallback, not least because ICSP defines alternatives, and the tradeoffs different clients make yield different benefits. A CLI is low latency on the happy path.
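
For readers following the ordering debate, here is a minimal sketch of the two behaviors under discussion: mirrors first (the cluster pull behavior described above) versus primary first with mirrors as fallback (the CLI behavior argued for here). The function name `orderCandidates` and the plain-string repository names are illustrative assumptions, not code from this PR or from library-go.

```go
package main

import "fmt"

// orderCandidates returns the locations to try for an image, in order.
// mirrorsFirst=true models the "mirrors before primary" behavior;
// mirrorsFirst=false models the "primary, then fall back to mirrors" behavior.
// Duplicates are dropped while keeping first-seen order.
// Hypothetical helper: names and types are assumptions for illustration.
func orderCandidates(primary string, mirrors []string, mirrorsFirst bool) []string {
	ordered := make([]string, 0, len(mirrors)+1)
	if mirrorsFirst {
		ordered = append(ordered, mirrors...)
		ordered = append(ordered, primary)
	} else {
		ordered = append(ordered, primary)
		ordered = append(ordered, mirrors...)
	}

	seen := make(map[string]bool, len(ordered))
	unique := make([]string, 0, len(ordered))
	for _, candidate := range ordered {
		if !seen[candidate] {
			seen[candidate] = true
			unique = append(unique, candidate)
		}
	}
	return unique
}

func main() {
	mirrors := []string{"example.local/mirror-a", "example.local/mirror-b"}
	fmt.Println(orderCandidates("quay.io/foo", mirrors, true))
	fmt.Println(orderCandidates("quay.io/foo", mirrors, false))
}
```

The only difference between the two modes is where the primary location lands, which is why the ordering question can be separated from the question of sharing a matching implementation.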

Contributor

(To be explicit, my primary concern is about sharing the implementation: the primary vs. mirrors ordering is far less important to me.)

If the user wants to hit the primary location directly, it’s trivial not to submit an ICSP to oc; all of this is only invoked on explicit user action to involve ICSP.

The linked bug also motivates this with a situation where the primary location is not available, and even suggests that oc extract is used during cluster installation. Both of these would benefit from using the mirrors first.

The --cluster-icsp option documentation also says:

honor the ordering of those sources

Contributor

I expect a user to provide --use-cluster-icsp in scripting, and for it not to fail if one exists, and for us to "do the right thing for the user". Since ICSP lookup on some commands (anything not doing bulk retrieval) is better suited for latency, I think we're reserving the right to "do the right thing" by underspecifying.

The clause honor the ordering should be removed from the flag.

Contributor Author

The clause honor the ordering should be removed from the flag.

Removed

if err != nil {
	return nil, err
}
if imageRef.AsRepository() != rdmSourceRef.AsRepository() {
Contributor

In the way ICSP actually works when pulling, configuring {Source: quay.io/foo, Mirrors: example.local/mirror-of-quay-foo} will also match quay.io/foo/bar and rewrite it to example.local/mirror-of-quay-foo/bar.
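
A minimal sketch of the prefix matching described in this comment, as opposed to the exact repository comparison in the diff. It works on plain repository strings without tags or digests; `rewriteToMirror` is a hypothetical helper, not a function from library-go or containers/image.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteToMirror maps image onto mirror when image is exactly source or
// lives in a namespace below source. ok is false when source does not match.
// Hypothetical helper for illustration; inputs are repository names only.
func rewriteToMirror(image, source, mirror string) (rewritten string, ok bool) {
	if image == source {
		return mirror, true
	}
	// Require a "/" boundary so quay.io/foo does not match quay.io/foobar.
	if strings.HasPrefix(image, source+"/") {
		return mirror + strings.TrimPrefix(image, source), true
	}
	return "", false
}

func main() {
	rewritten, ok := rewriteToMirror(
		"quay.io/foo/bar",
		"quay.io/foo",
		"example.local/mirror-of-quay-foo",
	)
	fmt.Println(rewritten, ok)
}
```

The path-boundary check is one of the corner cases that makes a shared implementation attractive: a naive `strings.HasPrefix(image, source)` would wrongly treat quay.io/foobar as being under quay.io/foo.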

Contributor Author

I've added a TODO there, we'll tackle this in the next revisions.

@djzager
Member

djzager commented May 26, 2021

Just providing the link. This PR is blocked by openshift/library-go#1084

o.SecurityOptions.Bind(flags)
flags.StringVarP(&o.Output, "output", "o", o.Output, "Print the image in an alternative format: json")
flags.StringVar(&o.FileDir, "dir", o.FileDir, "The directory on disk that file:// images will be read from.")
flags.BoolVar(&o.ClusterICSP, "cluster-icsp", o.ClusterICSP, "When set to true, look for alternative image sources from ImageContentSourcePolicy objects in cluster, honor the ordering of those sources, and fail if an ImageContentSourcePolicy is not found in cluster.")
Contributor

I don't like that this fails if there is no ICSP. As a user, if I want ICSP to be used, I'll specify the flag in scripts and tolerate when a cluster doesn't have an ICSP. The flag should be use-cluster-icsp. There's no real disadvantage for a user using an ICSP, or there not being one on some clusters. If we had an explicit ICSP locator flag, then I would agree we would fail.

Contributor Author

@soltysh soltysh Jun 2, 2021

Roger that, actually the code reading that from cluster didn't fail, it was only the comment that says so. I've dropped the latter part of this flag's description.

Contributor Author

Although we do fail when you specify a file with --icsp-file and we have issues reading the file.
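
The behavior settled on in this thread (tolerate a cluster without an ICSP, but fail when an explicitly named file cannot be read) could look roughly like the sketch below. `loadICSPSources`, `errNoClusterICSP`, and the callback signature are all assumptions made for illustration; the real code uses a typed client and parses the objects rather than returning raw bytes.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errNoClusterICSP is a hypothetical sentinel meaning "the cluster has no ICSP".
var errNoClusterICSP = errors.New("no ImageContentSourcePolicy in cluster")

// loadICSPSources returns raw ICSP data, or nil when none is available.
// An explicit file that cannot be read is an error; a cluster without an
// ICSP is not. Hypothetical helper, not the PR's implementation.
func loadICSPSources(icspFile string, fromCluster func() ([]byte, error)) ([]byte, error) {
	if icspFile != "" {
		data, err := os.ReadFile(icspFile)
		if err != nil {
			return nil, fmt.Errorf("reading ICSP file %q: %w", icspFile, err)
		}
		return data, nil
	}
	if fromCluster == nil {
		return nil, nil
	}
	data, err := fromCluster()
	if errors.Is(err, errNoClusterICSP) {
		// Scripts may pass the cluster flag unconditionally; carry on without mirrors.
		return nil, nil
	}
	return data, err
}

func main() {
	data, err := loadICSPSources("", func() ([]byte, error) { return nil, errNoClusterICSP })
	fmt.Println(data == nil, err)

	_, err = loadICSPSources("/nonexistent/icsp.yaml", nil)
	fmt.Println(err != nil)
}
```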

flags.StringVarP(&o.Output, "output", "o", o.Output, "Print the image in an alternative format: json")
flags.StringVar(&o.FileDir, "dir", o.FileDir, "The directory on disk that file:// images will be read from.")
flags.BoolVar(&o.ClusterICSP, "cluster-icsp", o.ClusterICSP, "When set to true, look for alternative image sources from ImageContentSourcePolicy objects in cluster, honor the ordering of those sources, and fail if an ImageContentSourcePolicy is not found in cluster.")
flags.StringVar(&o.ICSPFile, "icsp-file", o.ICSPFile, "Path to an ImageContentSourcePolicy file. If set, data from this file will be used to set alternative image sources.")
Contributor

data from this file will be used to find alternative locations for images.

Contributor Author

Updated.

}
if len(o.ICSPFile) > 0 || o.ClusterICSP {
	registryContext = registryContext.WithAlternateBlobSourceStrategy(
		strategy.NewSimpleLookupICSPStrategy(o.ICSPFile, o.operatorClient.ImageContentSourcePolicies()))
Contributor

I'd prefer to have two strategies - one for file (NewExplicitICSPStrategy) and one for cluster lookup (NewICSPFromClusterOnError).

Contributor Author

Should both of them behave similarly, or will they behave differently, as in:

  • NewExplicitICSPStrategy always reads from the alternatives first (just like Miroslav is saying)
  • NewICSPFromClusterOnError is similar to the current SimpleLookup in that it tries the original location first and tries the alternates only after a failure?

@smarterclayton
Contributor

Some things we missed (and need to fix before this is correct):

  1. If PING fails against the registry
I0602 11:37:32.225230   24632 round_trippers.go:432] GET https://registry-1.docker.io/v2/
I0602 11:37:32.225235   24632 round_trippers.go:438] Request Headers:
I0602 11:37:47.226765   24632 round_trippers.go:457] Response Status:  in 15001 milliseconds
I0602 11:37:47.226796   24632 round_trippers.go:460] Response Headers:
I0602 11:37:47.226862   24632 workqueue.go:143] about to send work queue error: unable to read image docker.io/openshift/origin-cli@sha256:d0795436a770d7ad2b7b01c92f551f01b378d472f6034d2f3a97ed957125ef3d: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

we don't fail over correctly (we just exit). This is a bug in the Repository ping behavior and is going to require changes to library-go.

  2. If Auth fails against the repository (getting the token) because the current user doesn't have access
I0602 11:31:41.622119   24437 round_trippers.go:432] GET https://registry-1.docker.io/v2/foo/bar/manifests/sha256:d0795436a770d7ad2b7b01c92f551f01b378d472f6034d2f3a97ed957125ef3d
I0602 11:31:41.622151   24437 round_trippers.go:438] Request Headers:
I0602 11:31:41.622164   24437 round_trippers.go:442]     Accept: application/vnd.docker.distribution.manifest.list.v2+json
I0602 11:31:41.622175   24437 round_trippers.go:442]     Accept: application/vnd.docker.distribution.manifest.v2+json
I0602 11:31:41.622186   24437 round_trippers.go:442]     Accept: application/vnd.oci.image.manifest.v1+json
I0602 11:31:41.622198   24437 round_trippers.go:442]     Authorization: Bearer <masked>
I0602 11:31:41.972523   24437 round_trippers.go:457] Response Status: 401 Unauthorized in 350 milliseconds

we don't fail over.

For both of these we need to move "ping" into the actual repository loader lazily, which I believe was an early comment on the PR and I forgot to bring it back up. I think it's potentially ok if our test impl doesn't have it, but it will absolutely block 'oc' being used with ICSP to solve the original bugs here (where you are running oc in a disabled context). I'll suggest a fix later.

@smarterclayton
Contributor

Tested this locally, didn't seem to work:

$ oc image info docker.io/openshift/origin-cli@sha256:d0795436a770d7ad2b7b01c92f551f01b378d472f6034d2f3a97ed957125ef3d -v=8 --icsp-file=contrib/icsp.yaml

I0602 11:42:11.471694   24967 client_mirrored.go:372] get manifest for sha256:d0795436a770d7ad2b7b01c92f551f01b378d472f6034d2f3a97ed957125ef3d served from registryclient.retryManifest{ManifestService:registryclient.manifestServiceVerifier{ManifestService:(*client.manifests)(0xc000e04c30)}, repo:(*registryclient.retryRepository)(0xc000e6bf00)}: manifest unknown: manifest unknown

Did not try the alternate at all:

kind: ImageContentSourcePolicy
apiVersion: operator.openshift.io/v1alpha1
spec:
  repositoryDigestMirrors:
  - source: docker.io/openshift/origin-cli
    mirrors:
    - quay.io/openshift/origin-cli

Also, did this PR add extra logging to the client stack? Seeing this:

○ oc explain imagecontentsourcepolicy.spec.repositoryDigestMirrors
E0602 11:44:33.296923   25032 request.go:1027] Unexpected error when reading response body: context deadline exceeded (Client.Timeout or context cancellation while reading body)
error: unexpected error when reading response body. Please retry. Original error: context deadline exceeded (Client.Timeout or context cancellation while reading body)

We should not be printing that...

@smarterclayton
Contributor

Not sure why docker.io wasn't selected as an alternate, but it could be because of the legacy lookup behavior (in which case that's a workaround that should be solved when the ICSP is loaded).

@smarterclayton
Contributor

Failover did work, but didn't prefer the calculated ordering on subsequent calls.

I0602 11:48:42.299108   25197 client_mirrored.go:372] get manifest for sha256:d0795436a770d7ad2b7b01c92f551f01b378d472f6034d2f3a97ed957125ef3d served from registryclient.retryManifest{ManifestService:registryclient.manifestServiceVerifier{ManifestService:(*client.manifests)(0xc000ae7350)}, repo:(*registryclient.retryRepository)(0xc000e62600)}: name unknown: repository name not known to registry
I0602 11:48:42.299135   25197 client_mirrored.go:166] Attempting to connect to quay.io/openshift/origin-cli
...
I0602 11:48:43.470154   25197 client_mirrored.go:372] get manifest for sha256:d0795436a770d7ad2b7b01c92f551f01b378d472f6034d2f3a97ed957125ef3d served from registryclient.retryManifest{ManifestService:registryclient.manifestServiceVerifier{ManifestService:(*client.manifests)(0xc0002450e0)}, repo:(*registryclient.retryRepository)(0xc000e62c00)}: <nil>
...
now tries to load the blob:
...
I0602 11:48:43.470236   25197 client_mirrored.go:166] Attempting to connect to quay.io/openshift/origin-test
^ the blob lookup should have been shared with the manifest lookup, didn't expect to go back to the source

It's possible the error lookup strategy should be returning an order that prefers the alternate first (if you get an error, on subsequent calls change the order). But need to think about it a bit.
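
The idea floated here (after an error, change the order on subsequent calls) might be sketched as below: remember which repository last served a locator and try it first next time, so the blob lookup does not go back to the source that just failed for the manifest. `stickyOrder` and its methods are hypothetical names, not part of the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// stickyOrder remembers, per locator, the repository that last succeeded.
// Hypothetical type for illustration only.
type stickyOrder struct {
	mu        sync.Mutex
	preferred map[string]string
}

func newStickyOrder() *stickyOrder {
	return &stickyOrder{preferred: make(map[string]string)}
}

// MarkSuccess records that repo served content for locator.
func (s *stickyOrder) MarkSuccess(locator, repo string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.preferred[locator] = repo
}

// Order returns locator followed by its alternates, except that a repository
// that previously succeeded for this locator is moved to the front.
func (s *stickyOrder) Order(locator string, alternates []string) []string {
	s.mu.Lock()
	winner, ok := s.preferred[locator]
	s.mu.Unlock()

	base := append([]string{locator}, alternates...)
	if !ok {
		return base
	}
	ordered := []string{winner}
	for _, candidate := range base {
		if candidate != winner {
			ordered = append(ordered, candidate)
		}
	}
	return ordered
}

func main() {
	s := newStickyOrder()
	source := "docker.io/openshift/origin-cli"
	alternates := []string{"quay.io/openshift/origin-cli"}

	fmt.Println(s.Order(source, alternates)) // manifest lookup: source first
	s.MarkSuccess(source, alternates[0])     // the mirror served the manifest
	fmt.Println(s.Order(source, alternates)) // blob lookup: mirror first
}
```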

if err := s.resolve(ctx, locator); err != nil {
	return nil, err
}
if len(s.alternates) == 0 {
Contributor

Having the resolve method set a global is weird (just return the list). Also, why is this a list? If two calls are made to the same strategy with different locators, then the strategy has to get a new alternates list. So if you want to cache, you have to cache by locator, and you probably need to maintain the size of the cache (at least bound it to something high like 1024 entries).

In general, for this strategy you want to add the locator as the last option, not the first (also if it's not duplicated) when you're handling the "on error" strategy.
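
A minimal sketch of the two suggestions in this comment: `resolve` returns the list instead of setting shared state, results are cached per locator with a bound on the cache size, and the locator itself is appended last only if it is not already present. All names here are hypothetical, and the FIFO eviction is just the simplest bounded policy, not necessarily what the PR ended up using.

```go
package main

import (
	"fmt"
	"sync"
)

// withLocatorLast de-duplicates mirrors and appends locator at the end
// unless it already appears among them.
func withLocatorLast(locator string, mirrors []string) []string {
	seen := make(map[string]bool, len(mirrors)+1)
	out := make([]string, 0, len(mirrors)+1)
	for _, m := range append(append([]string{}, mirrors...), locator) {
		if !seen[m] {
			seen[m] = true
			out = append(out, m)
		}
	}
	return out
}

// alternatesCache caches alternates per locator, evicting the oldest entry
// once limit entries are stored (FIFO). A limit <= 0 means unbounded.
type alternatesCache struct {
	mu      sync.Mutex
	limit   int
	entries map[string][]string
	order   []string // insertion order, oldest first
}

func newAlternatesCache(limit int) *alternatesCache {
	return &alternatesCache{limit: limit, entries: make(map[string][]string)}
}

// get returns the cached alternates for locator, calling resolve on a miss.
func (c *alternatesCache) get(locator string, resolve func(string) ([]string, error)) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cached, ok := c.entries[locator]; ok {
		return cached, nil
	}
	mirrors, err := resolve(locator)
	if err != nil {
		return nil, err
	}
	alternates := withLocatorLast(locator, mirrors)
	if c.limit > 0 && len(c.order) >= c.limit {
		oldest := c.order[0]
		c.order = c.order[1:]
		delete(c.entries, oldest)
	}
	c.entries[locator] = alternates
	c.order = append(c.order, locator)
	return alternates, nil
}

func main() {
	cache := newAlternatesCache(1024)
	resolve := func(locator string) ([]string, error) {
		return []string{"example.local/mirror", locator}, nil
	}
	alternates, err := cache.get("quay.io/foo", resolve)
	fmt.Println(alternates, err)
}
```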

Contributor Author

Fixed.

In general, for this strategy you want to add the locator as the last option, not the first (also if it's not duplicated) when you're handling the "on error" strategy.

I'm not quite sure I understand what you mean by that?

@soltysh
Contributor Author

soltysh commented Jun 2, 2021

Also, did this PR add extra logging to the client stack? Seeing this:

This might have been introduced in kubernetes/kubernetes#102217

If PING fails against the registry

Yeah, I've noticed that when testing against missing registry.

If Auth fails against the repository (getting the token) because the current user doesn't have access

I guess we want a similar behavior as above.

Tested this locally, didn't seem to work:

It will test alternates only if it fails, not by default, although it should read the ICSP in.

@soltysh
Contributor Author

soltysh commented Jun 7, 2021

Not sure why docker.io wasn't selected as an alternate, but it could be because of the legacy lookup behavior (in which case that's a workaround that should be solved when the ICSP is loaded).

Yeah, docker.io needs to be translated to registry-1.docker.io; we have AsV2 in the reference for that.
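
To illustrate why the lookup missed: if the ICSP source is keyed as docker.io/... but the client compares against registry-1.docker.io/..., the two never match unless both sides are normalized the same way. The sketch below is a stand-in for that normalization on plain strings; `normalizeRepository` is a hypothetical helper and only covers the docker.io case mentioned in this comment, not everything AsV2 may handle.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRepository rewrites the registry host docker.io to
// registry-1.docker.io so that ICSP sources and requested images compare
// equal. Hypothetical string-based stand-in for the reference's AsV2.
func normalizeRepository(repo string) string {
	parts := strings.SplitN(repo, "/", 2)
	if parts[0] == "docker.io" {
		parts[0] = "registry-1.docker.io"
	}
	return strings.Join(parts, "/")
}

func main() {
	fmt.Println(normalizeRepository("docker.io/openshift/origin-cli"))
	fmt.Println(normalizeRepository("quay.io/openshift/origin-cli"))
}
```

Applying the same normalization when the ICSP is loaded and when an image is looked up keeps the comparison symmetric.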

@soltysh
Contributor Author

soltysh commented Jun 7, 2021

We need openshift/library-go#1096 too to land this

@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2021
@soltysh soltysh force-pushed the bug1823143 branch 2 times, most recently from a6fcfbb to 3cc524f Compare June 9, 2021 11:18
@soltysh
Contributor Author

soltysh commented Jun 9, 2021

@smarterclayton split into 2 separate strategies:

  • NewICSPOnErrorStrategy, which falls back to ICSP only after getting an error; that's what's being used in oc image info
  • NewICSPExplicitStrategy, which will be for oc adm release commands

I've added a TODO from @mtrmac about supporting partial matches.
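
As a rough picture of how the two strategies differ, the sketch below models each as a pair of hooks: what to try on the first request and what to add after a failure. The interface, the type names, and the assumption that the explicit strategy tries mirrors first (as floated earlier in the thread) are all illustrative; they are not the actual library-go types or constructors.

```go
package main

import "fmt"

// lookupStrategy is a hypothetical, string-based model of an alternate
// source strategy: FirstRequest lists what to try initially, OnFailure
// lists what to try once those have failed.
type lookupStrategy interface {
	FirstRequest(locator string) []string
	OnFailure(locator string) []string
}

// onErrorStrategy tries the original location and consults mirrors only
// after an error.
type onErrorStrategy struct{ mirrors map[string][]string }

func (s onErrorStrategy) FirstRequest(locator string) []string { return []string{locator} }
func (s onErrorStrategy) OnFailure(locator string) []string    { return s.mirrors[locator] }

// explicitStrategy (assumed behavior) tries mirrors up front, then the
// original location, and has nothing further to add on failure.
type explicitStrategy struct{ mirrors map[string][]string }

func (s explicitStrategy) FirstRequest(locator string) []string {
	return append(append([]string{}, s.mirrors[locator]...), locator)
}
func (s explicitStrategy) OnFailure(string) []string { return nil }

// attempts lists every location a client would try, in order, if all fail.
func attempts(s lookupStrategy, locator string) []string {
	out := append([]string{}, s.FirstRequest(locator)...)
	return append(out, s.OnFailure(locator)...)
}

func main() {
	mirrors := map[string][]string{
		"quay.io/foo": {"example.local/mirror-of-quay-foo"},
	}
	fmt.Println(attempts(onErrorStrategy{mirrors}, "quay.io/foo"))
	fmt.Println(attempts(explicitStrategy{mirrors}, "quay.io/foo"))
}
```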

This should be ready for final review.

@openshift-ci
Contributor

openshift-ci bot commented Apr 5, 2022

@soltysh: This pull request references Bugzilla bug 1823143, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.11.0) matches configured target release for branch (4.11.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @zhouying7780

Details

In response to this:

/bugzilla refresh


@mtrmac
Contributor

mtrmac commented Apr 5, 2022

(meanwhile, openshift/api#874 has added two new CRDs representing mirrors — see openshift/runtime-utils#15 and openshift/machine-config-operator#3037 )

@soltysh
Contributor Author

soltysh commented Apr 6, 2022

(meanwhile, openshift/api#874 has added two new CRDs representing mirrors — see openshift/runtime-utils#15 and openshift/machine-config-operator#3037 )

I'll update that in a followup, since we will want to support both.
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 6, 2022
@soltysh
Contributor Author

soltysh commented Apr 6, 2022

/test e2e-aws-upgrade

@rosspeoples

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 6, 2022
@openshift-ci
Contributor

openshift-ci bot commented Apr 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deejross, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Contributor

openshift-ci bot commented Apr 7, 2022

@soltysh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6 daf7cd6 link false /test e2e-metal-ipi-ovn-ipv6

Full PR test history. Your PR dashboard.

Details


@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 16d88bf into openshift:master Apr 7, 2022
@openshift-ci
Contributor

openshift-ci bot commented Apr 7, 2022

@soltysh: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 1823143 has not been moved to the MODIFIED state.

Details

In response to this:

Bug 1823143: wire ICSP lookups to oc image info


@soltysh soltysh deleted the bug1823143 branch April 7, 2022 09:00
eranco74 added a commit to eranco74/assisted-service that referenced this pull request Apr 12, 2022
The oc in this link contains the ICSP lookup feature
openshift/oc#829
The lack of ICSP support is the reason the assisted-service didn't
upstream oc since commit 317f122
openshift-merge-robot pushed a commit to openshift/assisted-service that referenced this pull request Apr 13, 2022
The oc in this link contains the ICSP lookup feature
openshift/oc#829
The lack of ICSP support is the reason the assisted-service didn't
upstream oc since commit 317f122
zaneb added a commit to zaneb/oc that referenced this pull request Jun 13, 2022
Allow the user to specify an ImageContentSourcePolicy file to fetch from
a mirror. This uses the implementation added in openshift#829 for the 'oc image
info' command.
soltysh pushed a commit to soltysh/oc that referenced this pull request Jun 14, 2022
Allow the user to specify an ImageContentSourcePolicy file to fetch from
a mirror. This uses the implementation added in openshift#829 for the 'oc image
info' command.