OCPBUGS-861: Rebase openshift/etcd 4.12 onto v3.5.5 #144

Merged



@tjungblu tjungblu commented Sep 6, 2022

This was rebased with:

git rebase --rebase-merges --fork-point v3.5.3 v3.5.5

This seems to effectively pick all commits between the two tags and apply them onto the branch HEAD, including the merge commits. Thanks @hasbro17

liggitt and others added 30 commits April 15, 2022 15:33
When clients have no permission to perform an operation, applying
it may fail. We should also move consistent_index forward in this
case; otherwise the consistent_index may be smaller than the
snapshot index.
…_353

[3.5] Update consistent_index when applying fails
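
The idea in the commit message above can be illustrated with a minimal sketch. This is not etcd's code: `entry`, `store`, `applyOp`, and the "forbidden" operation are hypothetical names invented for the illustration. It only shows an apply step that advances its index even when the operation itself is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical stand-in for a committed log entry.
type entry struct {
	Index uint64
	Op    string
}

// store is a hypothetical state machine that remembers the index of the
// last entry it has processed (its "consistent index").
type store struct {
	consistentIndex uint64
	data            map[string]bool
}

var errPermissionDenied = errors.New("permission denied")

// applyOp pretends to execute an operation and rejects "forbidden" ones.
func (s *store) applyOp(op string) error {
	if op == "forbidden" {
		return errPermissionDenied
	}
	s.data[op] = true
	return nil
}

// apply processes one entry. The index moves forward whether or not the
// operation succeeded, because the entry has been consumed either way.
func (s *store) apply(e entry) error {
	err := s.applyOp(e.Op)
	if e.Index > s.consistentIndex {
		s.consistentIndex = e.Index
	}
	return err
}

func main() {
	s := &store{data: map[string]bool{}}
	for _, e := range []entry{{1, "put-a"}, {2, "forbidden"}, {3, "put-b"}} {
		if err := s.apply(e); err != nil {
			fmt.Printf("entry %d failed: %v\n", e.Index, err)
		}
	}
	fmt.Println("consistent index:", s.consistentIndex)
}
```

If the index only advanced on success, a rejected entry at the end of the log would leave the stored index behind the entries that were actually consumed.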
Update crypto to address CVE-2022-27191.

The CVE fix is added in 0.0.0-20220315160706-3147a52a75dd but this
change updates to latest.
We found a lease leak issue:
if a new member (added via member add) is recovered from a snapshot
and then becomes leader, the lease will never expire afterwards. The
leader logs the revoke failure caused by "invalid auth token", because
the token provider is not functional and drops every token generated
by the upper layer, which in this case is the lease-revoking
routine.
[backport 3.5]: server/auth: enable tokenProvider if recovered store enables auth
This PR removes an additional clone when building artifacts.

When releasing v3.5.4, this clone was the main cause of issues and
confusion about what the release script is doing.

The release.sh script already clones the repo into the /tmp/ directory,
so cloning before the build is not needed. As a precaution against a bug
in the script leaving the /tmp/ clone in a bad state, I moved "Verify the
latest commit has the version tag" and added "Verify the clean working
tree" so that both always run before the build.
[release-3.5] scripts: Avoid additional repo clone
The first bug fix resolves the race condition between a goroutine
and a channel operating on the same leases to be revoked. It's a
classic mistake when combining Go channels and goroutines. Please refer to
https://go.dev/doc/effective_go#channels

The second bug fix resolves the issue that the etcd lessor may
continue to schedule checkpoints after stepping down as leader.
[3.5] Backport two lease related bug fixes to 3.5
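
The commit message does not spell out the exact race, so the following is only a generic sketch of one defensive pattern for this class of mistake, not the fix that was backported. A batch is received from a channel and each goroutine gets its own copy of the item it works on, passed as an argument instead of captured from the enclosing scope. The names `revokeAll` and `batches` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// revokeAll starts one goroutine per lease ID in the batch and returns the
// IDs that were processed, sorted for stable output.
func revokeAll(batch []int64) []int64 {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		revoked []int64
	)
	for _, id := range batch {
		wg.Add(1)
		// Passing id as an argument gives every goroutine its own copy,
		// independent of how the loop variable is scoped.
		go func(id int64) {
			defer wg.Done()
			mu.Lock()
			revoked = append(revoked, id)
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	sort.Slice(revoked, func(i, j int) bool { return revoked[i] < revoked[j] })
	return revoked
}

func main() {
	// A buffered channel stands in for the stream of expired-lease batches.
	batches := make(chan []int64, 1)
	batches <- []int64{3, 1, 2}
	close(batches)

	for b := range batches {
		fmt.Println(revokeAll(b))
	}
}
```

The mutex protects the shared slice, and the wait group ensures the function returns only after every goroutine has finished with its item.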
The FileReader interface wraps io.Reader and additionally provides
the fs.FileInfo. The FileBufReader struct wraps bufio.Reader and
also provides fs.FileInfo.

Signed-off-by: Benjamin Wang <[email protected]>
…file

Currently the max size of each WAL entry is hard-coded as 10MB. If users
set a value > 10MB for the flag --max-request-bytes, then etcd may run
into a situation where it successfully processes a big request but fails
to decode it when replaying the WAL file on startup.

On the other hand, we can't just remove the limit, because if a
WAL entry is somehow corrupted and its recByte is a huge value, then
etcd may run out of memory. So the solution is to restrict the max size
of each WAL entry to a dynamic value: the remaining size of
the WAL file.

Signed-off-by: Benjamin Wang <[email protected]>
[3.5] Restrict the max size of each WAL entry to the remaining size of the WAL file
Cherry pick the PR etcd-io#12992
to 3.5, so please refer to the original PR for more detailed info.

Signed-off-by: Benjamin Wang <[email protected]>
[3.5] client/v3: do not overwrite authTokenBundle on dial
Make sure that WithPrefix correctly sets the flag, and add a test.
Also, add a test for WithFromKey.

fixes etcd-io#14056

Signed-off-by: Sahdev Zala <[email protected]>
…-#14182-upstream-release-3.5

Automated cherry pick of etcd-io#14182
The Go built-in package `flag` doesn't support the `uint32` data
type, so we need to support it via `flag.Var`.

Signed-off-by: Benjamin Wang <[email protected]>
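
As an illustration of the `flag.Var` approach, here is a minimal sketch. The `uint32Value` type and the flag name used in `main` are hypothetical and chosen for the example; only `flag.Value`, `flag.NewFlagSet`, and `strconv.ParseUint` are standard library APIs.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// uint32Value is a hypothetical type that satisfies flag.Value so that a
// uint32 can be registered with flag.Var.
type uint32Value uint32

// String renders the current value in base 10.
func (v *uint32Value) String() string {
	return strconv.FormatUint(uint64(*v), 10)
}

// Set parses s as an unsigned 32-bit integer. Out-of-range or negative
// input is rejected and leaves the previous value untouched.
func (v *uint32Value) Set(s string) error {
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return err
	}
	*v = uint32Value(n)
	return nil
}

func main() {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)

	var streams uint32Value
	// The flag name is illustrative only.
	fs.Var(&streams, "max-concurrent-streams", "maximum concurrent streams per client")

	if err := fs.Parse([]string{"-max-concurrent-streams=250"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("parsed value:", uint32(streams))
}
```

Passing a bit size of 32 to `strconv.ParseUint` makes the range check part of parsing, so no separate overflow test is needed.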
…each client can open at a time

Also refer to etcd-io#14169 (comment)

Signed-off-by: Benjamin Wang <[email protected]>
[3.5] Support configuring `MaxConcurrentStreams` for http2
@hasbro17 hasbro17 added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Oct 5, 2022

hasbro17 commented Oct 5, 2022

/hold
@tjungblu feel free to unhold once we're clear on the nightly payload CI run.
Although we're already just waiting for aws-serial to clear, which is the same.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 5, 2022

tjungblu commented Oct 6, 2022

/retest-required

Elbehery commented Oct 6, 2022

seems there was some networking trouble between peers for some time:

image

the errors are fine:

image

I'm a bit afraid this health event is too spammy in CEO, I've seen something similar here: https://issues.redhat.com/browse/OCPBUGS-1128?focusedCommentId=20991825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-20991825

The events are very spaced out though, they were at 11:44, then 11:49/12:01 then 12:58, 13:28. Can't tell from the etcd log what made it return "unhealthy" however.

How did you get this Prometheus view? I'm just curious, I'd like to use the same approach to debug my failed CI :)


tjungblu commented Oct 6, 2022

@Elbehery https://promecieus.dptools.openshift.org/?search= just paste the prow url


tjungblu commented Oct 6, 2022

/retest-required

tjungblu commented Oct 6, 2022

Guess everybody is merging today, on the last day of the release. I'll retry, maybe it goes through overnight :)

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 6, 2022
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 31b6b2d and 2 for PR HEAD 19002cf in total


tjungblu commented Oct 6, 2022

/retest-required


openshift-ci bot commented Oct 7, 2022

@tjungblu: all tests passed!


@openshift-merge-robot openshift-merge-robot merged commit 579ed1c into openshift:openshift-4.12 Oct 7, 2022
@openshift-ci-robot

@tjungblu: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-861 has been moved to the MODIFIED state.

@openshift-cherrypick-robot

@tjungblu: new pull request created: #152

In response to this:

/cherry-pick openshift-4.11


@geliu2016

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Oct 8, 2022
@tjungblu tjungblu deleted the rebase-3.5.4-forkpoint branch October 10, 2022 08:42
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.