Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5 #152

Conversation

openshift-cherrypick-robot

This is an automated cherry-pick of #144

/assign tjungblu

liggitt and others added 30 commits October 7, 2022 15:36
When clients have no permission to perform whatever operation, then
the applying may fail. We should also move consistent_index forward
in this case, otherwise the consitent_index may smaller than the
snapshot index.
Update crypto to address CVE-2022-27191.

The CVE fix is added in 0.0.0-20220315160706-3147a52a75dd but this
change updates to latest.
we found a lease leak issue:
if a new member(by member add) is recovered by snapshot, and then
become leader, the lease will never expire afterwards. leader will
log the revoke failure caused by "invalid auth token", since the
token provider is not functional, and drops all generated token
from upper layer, which in this case, is the lease revoking
routine.
This PR removes additional clone when building artifacts.

When releasing v3.5.4 this clone was main cause of issues and
confusion about what release script is doing.

release.sh script already clones repo in /tmp/ directory, so clonning
before build is not needed. As precautions for bug in script leaving
/tmp/ clone in bad state  I moved "Verify the latest commit has the
version tag" and added "Verify the clean working tree" to be always run
before build.
The first bug fix is to resolve the race condition between goroutine
and channel on the same leases to be revoked. It's a classic mistake
in using Golang channel + goroutine. Please refer to
https://go.dev/doc/effective_go#channels

The second bug fix is to resolve the issue that etcd lessor may
continue to schedule checkpoint after stepping down the leader role.
The FileReader interface is the wrapper of io.Reader. It provides
the fs.FileInfo as well. The FileBufReader struct is the wrapper of
bufio.Reader, it also provides fs.FileInfo.

Signed-off-by: Benjamin Wang <[email protected]>
…file

Currently the max size of each WAL entry is hard coded as 10MB. If users
set a value > 10MB for the flag --max-request-bytes, then etcd may run
into a situation that it successfully processes a big request, but fails
to decode it when replaying the WAL file on startup.

On the other hand, we can't just remove the limitation, because if a
WAL entry is somehow corrupted, and its recByte is a huge value, then
etcd may run out of memory. So the solution is to restrict the max size
of each WAL entry as a dynamic value, which is the remaining size of
the WAL file.

Signed-off-by: Benjamin Wang <[email protected]>
Cherry pick the PR etcd-io#12992
to 3.5, so please refer to the original PR for more detailed info.

Signed-off-by: Benjamin Wang <[email protected]>
Make sure that WithPrefix correctly set the flag, and add test.
Also, add test for WithFromKey.

fixes etcd-io#14056

Signed-off-by: Sahdev Zala <[email protected]>
The golang buildin package `flag` doesn't support `uint32` data
type, so we need to support it via the `flag.Var`.

Signed-off-by: Benjamin Wang <[email protected]>
…each client can open at a time

Also refer to etcd-io#14169 (comment)

Signed-off-by: Benjamin Wang <[email protected]>
This changes the default parent-based trace sampling rate from
100% to 0%. Due to the high QPS etcd can handle, having 100% trace
sampling leads to very high resource usage. Defaulting to 0% means
that only already-sampled traces will be sampled in etcd.

Fixes etcd-io#14310

Signed-off-by: Mike Dame <[email protected]>
Upgrade grpc to 1.41.0;
Run ./script/fix.sh to fix all related issue.

Signed-off-by: Benjamin Wang <[email protected]>
Streams are now closed after being used in the lessor `keepAliveOnce` method.
This prevents the "failed to receive lease keepalive request from gRPC stream"
message from being logged by the server after the context is cancelled by the
client.

Signed-off-by: Justin Kolberg <[email protected]>
Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod`
by default, so if you want to warp multiple layers of net.Listener,
the `keepaliveListener` should be the one which is closest to the
original `net.Listener` implementation, namely `TCPListener`.

Also refer to etcd-io#14356

Signed-off-by: Benjamin Wang <[email protected]>
Signed-off-by: Vitalii Levitskii <[email protected]>
Problem: We pass grpc context down to applier in readonly serializable txn.
This context can be cancelled for example due to timeout.
This will trigger panic inside applyTxn

Solution: Only panic for transactions with write operations

fixes etcd-io#14110
main PR etcd-io#14149

Signed-off-by: Bogdan Kanivets <[email protected]>
serathius and others added 11 commits October 7, 2022 15:36
Signed-off-by: Marek Siarkowicz <[email protected]>
This has been additionally verified by running the tests locally as a
basic smoke test. GitHub Actions doesn't provide MacOS M1 (arm64) yet,
so there's no good way to automate testing.

Ran `TMPDIR=/tmp make test` locally. The `TMPDIR` bit is needed so
there's no really long path used that breaks Unix socket setup in one of
the tests.

Signed-off-by: Marek Siarkowicz <[email protected]>
- permissions were incorrectly loaded on restarts.
- etcd-io#14355
- Backport of etcd-io#14358

Signed-off-by: vivekpatani <[email protected]>
A WAL object was closed by defer, however the WAL was rewritten afterwards,
so defer closed already closed WAL but not the new one. It caused a data
race between writing file and cleaning up a temporary test directory,
which led to a non-deterministic bug.

Fixes etcd-io#14332

Signed-off-by: Vladimir Sokolov <[email protected]>
Due to a duplicate call of clientConfigFromCmd, the move-leader command
would fail with "conflicting environment variable is shadowed by corresponding command-line flag".
Also in scenarios where no command-line flag was supplied.

Signed-off-by: Thomas Jungblut <[email protected]>
@openshift-ci-robot
Copy link

@openshift-cherrypick-robot: Jira Issue OCPBUGS-861 has been cloned as Jira Issue OCPBUGS-2111. Retitling PR to link against new bug.
/retitle [openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5

In response to this:

This is an automated cherry-pick of #144

/assign tjungblu

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot changed the title [openshift-4.11] OCPBUGS-861: Rebase openshift/etcd 4.12 onto v3.5.5 [openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5 Oct 7, 2022
@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 7, 2022
@openshift-ci-robot
Copy link

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-2111, which is invalid:

  • expected dependent Jira Issue OCPBUGS-861 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is an automated cherry-pick of #144

/assign tjungblu

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link

openshift-ci bot commented Oct 7, 2022

@openshift-cherrypick-robot: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

[openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from Elbehery and wking October 7, 2022 15:37
@openshift-ci
Copy link

openshift-ci bot commented Oct 7, 2022

@openshift-cherrypick-robot: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Elbehery
Copy link

Elbehery commented Oct 7, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2022
@openshift-ci
Copy link

openshift-ci bot commented Oct 7, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Elbehery, openshift-cherrypick-robot
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval by writing /assign @deads2k in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hasbro17
Copy link

hasbro17 commented Oct 7, 2022

Why is this missing the merge pull request commits from the 4.12 PR? See: https://github.com/openshift/etcd/pull/144/commits
Not sure why the cherry-picked PR is ignoring those but we may need to do a manual rebase with --rebase-merges to 3.5.5 for this one to preserve those as well.

@tjungblu
Copy link

let's continue here: #155

/close

@openshift-ci openshift-ci bot closed this Oct 10, 2022
@openshift-ci
Copy link

openshift-ci bot commented Oct 10, 2022

@tjungblu: Closed this PR.

In response to this:

let's continue here: #155

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-2111. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

This is an automated cherry-pick of #144

/assign tjungblu

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.