Skip to content

Comments

Bug 1874106: Split work of oc image mirror to avoid AuthHeaderTooLong error from registry#761

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
sallyom:new-retry-bz1874106
Mar 20, 2021
Merged

Bug 1874106: Split work of oc image mirror to avoid AuthHeaderTooLong error from registry#761
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
sallyom:new-retry-bz1874106

Conversation

@sallyom
Copy link
Contributor

@sallyom sallyom commented Mar 11, 2021

While mirroring multiple images, quay.io sends http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=ENHANCE_YOUR_CALM, debug="""" , Request Header Or Cookie Too Large</center>\r\n<hr><center>nginx/1.14.1
To avoid this, split the mappings into groups - []Mappings length 10

This PR sets a hard-coded length of 10 for []Mapping from which the imageTrees and plan are built. I've seen from local testing that oc image mirror fails when given > ~20 SRC=DST mappings.

When sending more that ~20 mappings with oc image mirror -f file-w-more-than-20 and/or w/ oc adm catalog mirror the Request Header Or Cookie Too Large</center>\r\n<hr><center>nginx/1.14.1 or the http2 ErrCode=ENHANCE_YOUR_CALM error is encountered. This PR splits large groups of mappings into smaller groups.

From discussions, this probably is not going to be fixed in quay, ie quay would rather not disable header size checks as this is considered insecure.

@openshift-ci-robot
Copy link

@sallyom: This pull request references Bugzilla bug 1874106, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1874106: Split work of oc image mirror to avoid AuthHeaderTooLong error from registry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Mar 11, 2021
@openshift-ci-robot
Copy link

@sallyom: This pull request references Bugzilla bug 1874106, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1874106: Split work of oc image mirror to avoid AuthHeaderTooLong error from registry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sallyom
Copy link
Contributor Author

sallyom commented Mar 11, 2021

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Mar 11, 2021
@openshift-ci-robot
Copy link

@sallyom: This pull request references Bugzilla bug 1874106, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @zhouying7780

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Mar 11, 2021
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
a few nits

func buildTargetTrees(mappings []Mapping) []targetTree {
var trees []targetTree
// split targetTrees into groups of 10
splitMappings := split(mappings, 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make that 10 a constant so it's easier to change, if needed and it's documented for free this way 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

m := mappings[i:min(i+size, len(mappings))]
splitMappings = append(splitMappings, m)
}
klog.Infof("LEN SPLIT MAPPING: %d", len(splitMappings))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug print - remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

return fmt.Errorf("SRC and DST may not be the same")
}
}
klog.Infof("LENGTH OF MAPPINGS: %v", len(o.Mappings))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop debug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

if err == nil {
// blob exists, skip
klog.V(5).Infof("Server reports blob exists %#v", blob)
klog.V(0).Infof("Server reports blob exists %#v", blob)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug leftover.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 15, 2021
@soltysh
Copy link
Contributor

soltysh commented Mar 18, 2021

/cherry-pick release-4.7

/cherry-pick release-4.6

@openshift-cherrypick-robot

@soltysh: once the present PR merges, I will cherry-pick it on top of release-4.7 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-4.7

/cherry-pick release-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@soltysh
Copy link
Contributor

soltysh commented Mar 18, 2021

/cherry-pick release-4.6

@openshift-cherrypick-robot

@soltysh: once the present PR merges, I will cherry-pick it on top of release-4.6 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sallyom sallyom force-pushed the new-retry-bz1874106 branch 2 times, most recently from a200066 to 0d3108f Compare March 18, 2021 17:41
@vikaslaad
Copy link

/retest

start := time.Now()
p, err := o.plan()
plans, err := o.plan()
//p, err := o.plan()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: probably drop this commented-out line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, updated!

…s from registry server

While mirroring multiple images, quay.io sends "http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=ENHANCE_YOUR_CALM, debug=""""
To avoid this, split the mappings into small chunks that the registry server can easily handle.
@sallyom sallyom force-pushed the new-retry-bz1874106 branch from 0d3108f to 6886461 Compare March 19, 2021 03:27
@tsmetana
Copy link
Member

/retest

1 similar comment
@tsmetana
Copy link
Member

/retest

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm


// As length of mappings increases, so does Authorization Header size, and this causes upload failures with
// registries that have a header size limit (quay.io)
var maxLenMappings = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be just plain old const 😉

@soltysh
Copy link
Contributor

soltysh commented Mar 19, 2021

/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 19, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sallyom, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tsmetana
Copy link
Member

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@vikaslaad
Copy link

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 4928ad0 into openshift:master Mar 20, 2021
@openshift-ci-robot
Copy link

@sallyom: All pull requests linked via external trackers have merged:

Bugzilla bug 1874106 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1874106: Split work of oc image mirror to avoid AuthHeaderTooLong error from registry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@soltysh: #761 failed to apply on top of branch "release-4.6":

Applying: Bug 1874106: Split work of oc image mirror into chunks to avoid errors from registry server
Using index info to reconstruct a base tree...
M	pkg/cli/image/mirror/mirror.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/cli/image/mirror/mirror.go
CONFLICT (content): Merge conflict in pkg/cli/image/mirror/mirror.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Bug 1874106: Split work of oc image mirror into chunks to avoid errors from registry server
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherry-pick release-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@smarterclayton
Copy link
Contributor

smarterclayton commented Mar 22, 2021

I'm a bit concerned about this change. I don't see how splitting the plan up has anything to do with a quay level issue. Are you hitting a rate limit? If so, rate limiting the speed of uploads or parallelism is the approach. Can you explain why splitting up the tree has

The whole point of a plan is that it's doing minimal work. I don't think splitting up the target tree is the right approach. You should instead be rate limiting how many chunks get executed.

@smarterclayton
Copy link
Contributor

I'm super concerned that this doesn't have anything to do with the actual problem - if we're executing requests that have too much data in query parameters, that's different than splitting up the plan.

@smarterclayton
Copy link
Contributor

smarterclayton commented Mar 22, 2021

Splitting up auth header too long (the actual error referenced in the bug) is about how many tokens we send. We had an earlier workaround for that that had to do with the fact that a single token can't authenticate more than once.

--skip-multiple-scopes=false

This PR really should be reverted, and instead if you see the error while we're getting authorization we should recommend to the user they use --skip-multiple-scopes OR we should automatically default this flag to true when we see this error, OR we should try to create scopes in batches (not all at once).

@soltysh
Copy link
Contributor

soltysh commented Mar 23, 2021

This PR really should be reverted, and instead if you see the error while we're getting authorization we should recommend to the user they use --skip-multiple-scopes OR we should automatically default this flag to true when we see this error, OR we should try to create scopes in batches (not all at once).

@sallyom can you revert that and we need to sync with @smarterclayton about batching, the temporary workaround is to use --skip-multiple-scopes I'm not quite sure where the batching would live in. Do you want to do that in oc or rather in the registryclient?

@smarterclayton
Copy link
Contributor

we should be able to select the appropriate transport based on which tokens we need already, so i think this is just about avoiding excessively long sets of scopes on large mirrors (if the theory is correct, that would be library-go)

@sallyom
Copy link
Contributor Author

sallyom commented Mar 24, 2021

--skip-multiple-skopes resolves https://bugzilla.redhat.com/show_bug.cgi?id=1874106, I'll update the bug and we can revert
This flag is another one that is not well understood, as nobody in the travels of diagnosing this bz knew about it :( we can add better description, ie, this:
--skip-multiple-scopes=false: Some registries do not support multiple scopes passed to the registry login. does not , to an average human, equate to
if you see this error: http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=ENHANCE_YOUR_CALM or anything about an authorization header that is too long while trying to mirror multiple images using a mapping file, then --skip-multiple-scopes might be what you need.

@smarterclayton
Copy link
Contributor

I think the probably fix is to identify when the scope size triggers the issue (i.e. there is some length of the header that will break quay). we should identify what that limit is, and if it's reasonable we should investigate whether we can split the tokens up so that when you access repo A we find the connection that has the right perm on repo A instead of just assuming any connection to the api. I think that's tractable, although the nuance is that for cross mounting we need push on the target repo and pull on the source repo on the same target. So we could instead just look at how many scopes we will need and if > limit we'd simply default to skip-multiple-scopes true.

@sallyom
Copy link
Contributor Author

sallyom commented Mar 24, 2021

opened #780 to revert this change

@soltysh
Copy link
Contributor

soltysh commented Mar 25, 2021

I checked there's no mention of size limit for that header, but it was mentioned that most setups limit that at 4k/8k. We can probably make a safe bet with 4k limit and switch to skip-multiple-scopes at that, wdyt @smarterclayton

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants