[openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5 #152

openshift-cherrypick-robot · 2022-10-07T15:36:07Z

This is an automated cherry-pick of #144

/assign tjungblu

…ookup" This reverts commit 4f51cc1.

When a client lacks permission to perform an operation, applying the entry may fail. We should still move consistent_index forward in this case; otherwise consistent_index may end up smaller than the snapshot index.

Update crypto to address CVE-2022-27191. The CVE fix is added in 0.0.0-20220315160706-3147a52a75dd but this change updates to latest.

We found a lease leak issue: if a new member (added via member add) is recovered from a snapshot and then becomes leader, its leases never expire afterwards. The leader logs the revoke failure caused by "invalid auth token", because the token provider is not functional and drops every token generated by the upper layer, which in this case is the lease revoking routine.

This PR removes the additional clone when building artifacts. When releasing v3.5.4, this clone was the main cause of issues and confusion about what the release script was doing. The release.sh script already clones the repo into the /tmp/ directory, so cloning again before the build is not needed. As a precaution against a script bug leaving the /tmp/ clone in a bad state, I moved "Verify the latest commit has the version tag" and added "Verify the clean working tree" so that both always run before the build.

The first bug fix resolves a race condition between a goroutine and a channel operating on the same leases to be revoked. It's a classic mistake in using Go channels with goroutines; please refer to https://go.dev/doc/effective_go#channels. The second bug fix resolves the issue that the etcd lessor may continue to schedule checkpoints after stepping down from the leader role.

The FileReader interface is the wrapper of io.Reader. It provides the fs.FileInfo as well. The FileBufReader struct is the wrapper of bufio.Reader, it also provides fs.FileInfo. Signed-off-by: Benjamin Wang <[email protected]>

…file Currently the max size of each WAL entry is hard-coded as 10MB. If users set a value > 10MB for the flag --max-request-bytes, then etcd may run into a situation where it successfully processes a big request but fails to decode it when replaying the WAL file on startup. On the other hand, we can't just remove the limitation, because if a WAL entry is somehow corrupted and its recByte is a huge value, then etcd may run out of memory. So the solution is to restrict the max size of each WAL entry to a dynamic value, which is the remaining size of the WAL file. Signed-off-by: Benjamin Wang <[email protected]>
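The dynamic limit described above can be illustrated with a small sketch. It assumes a simplified framing (an 8-byte little-endian length prefix followed by the payload), which is not etcd's actual WAL record format; `readRecord`, `encodeRecord`, `decodeFile`, and `errRecordTooLarge` are hypothetical names introduced only for this example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// errRecordTooLarge is a hypothetical error for a length prefix that cannot
// possibly fit in what is left of the file.
var errRecordTooLarge = errors.New("record length exceeds remaining file size")

// readRecord reads one length-prefixed record from r. fileSize is the total
// size of the file and offset is the position of the record's length prefix.
func readRecord(r io.Reader, fileSize, offset int64) ([]byte, error) {
	var recBytes int64
	if err := binary.Read(r, binary.LittleEndian, &recBytes); err != nil {
		return nil, err
	}
	// The limit is dynamic: whatever remains in the file after the prefix.
	maxEntry := fileSize - offset - 8
	if recBytes < 0 || recBytes > maxEntry {
		// Refuse before allocating, so a corrupted prefix cannot exhaust memory.
		return nil, errRecordTooLarge
	}
	buf := make([]byte, recBytes)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// encodeRecord builds a file image whose prefix claims the given length.
func encodeRecord(payload []byte, claimed int64) []byte {
	var b bytes.Buffer
	_ = binary.Write(&b, binary.LittleEndian, claimed)
	b.Write(payload)
	return b.Bytes()
}

// decodeFile decodes the first record of an in-memory file image.
func decodeFile(file []byte) ([]byte, error) {
	return readRecord(bytes.NewReader(file), int64(len(file)), 0)
}

func main() {
	rec, err := decodeFile(encodeRecord([]byte("hello"), 5))
	fmt.Println(string(rec), err)

	// A corrupted prefix claiming 1 TiB is rejected without allocating.
	_, err = decodeFile(encodeRecord([]byte("hello"), 1<<40))
	fmt.Println(err)
}
```

With this shape, a legitimately large entry is accepted as long as it fits in the file, while a corrupted length is bounded by the file size rather than by a fixed constant.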

Cherry pick the PR etcd-io#12992 to 3.5, so please refer to the original PR for more detailed info. Signed-off-by: Benjamin Wang <[email protected]>

Make sure that WithPrefix correctly set the flag, and add test. Also, add test for WithFromKey. fixes etcd-io#14056 Signed-off-by: Sahdev Zala <[email protected]>

The Go built-in package `flag` doesn't support the `uint32` data type, so we need to support it via `flag.Var`. Signed-off-by: Benjamin Wang <[email protected]>
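For readers unfamiliar with `flag.Var`, here is a minimal sketch of how a `uint32` flag can be supported through the standard `flag.Value` interface. It is not the code from this commit; the type body, the `parseStreams` helper, and the flag usage string are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strconv"
)

// uint32Value is an illustrative flag.Value implementation for uint32 flags.
type uint32Value uint32

// String formats the current value in base 10.
func (v *uint32Value) String() string {
	return strconv.FormatUint(uint64(*v), 10)
}

// Set parses s as an unsigned 32-bit integer and rejects out-of-range input.
func (v *uint32Value) Set(s string) error {
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return err
	}
	*v = uint32Value(n)
	return nil
}

// parseStreams is a hypothetical helper that registers the flag on a private
// flag set and parses args with it.
func parseStreams(args []string) (uint32, error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output quiet in this example
	streams := uint32Value(0)
	fs.Var(&streams, "max-concurrent-streams", "illustrative uint32 flag")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return uint32(streams), nil
}

func main() {
	n, err := parseStreams([]string{"--max-concurrent-streams=250"})
	fmt.Println(n, err)

	// 4294967296 does not fit in 32 bits, so Set returns a range error.
	_, err = parseStreams([]string{"--max-concurrent-streams=4294967296"})
	fmt.Println(err != nil)
}
```

Parsing with a bit size of 32 in `strconv.ParseUint` is what enforces the range; the conversion to `uint32Value` afterwards is then always safe.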

…each client can open at a time Also refer to etcd-io#14169 (comment) Signed-off-by: Benjamin Wang <[email protected]>

Signed-off-by: Benjamin Wang <[email protected]>

Signed-off-by: Jille Timmermans <[email protected]>

Signed-off-by: Hitoshi Mitake <[email protected]>

This changes the default parent-based trace sampling rate from 100% to 0%. Due to the high QPS etcd can handle, having 100% trace sampling leads to very high resource usage. Defaulting to 0% means that only already-sampled traces will be sampled in etcd. Fixes etcd-io#14310 Signed-off-by: Mike Dame <[email protected]>

Signed-off-by: Benjamin Wang <[email protected]>

Upgrade grpc to 1.41.0; Run ./script/fix.sh to fix all related issue. Signed-off-by: Benjamin Wang <[email protected]>

Refer to etcd-io@a0bdfc4 Signed-off-by: Benjamin Wang <[email protected]>

Refer to etcd-io#14318 Signed-off-by: Benjamin Wang <[email protected]>

Streams are now closed after being used in the lessor `keepAliveOnce` method. This prevents the "failed to receive lease keepalive request from gRPC stream" message from being logged by the server after the context is cancelled by the client. Signed-off-by: Justin Kolberg <[email protected]>

Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod` by default, so if you want to wrap multiple layers of `net.Listener`, the `keepaliveListener` should be the one closest to the original `net.Listener` implementation, namely `TCPListener`. Also refer to etcd-io#14356 Signed-off-by: Benjamin Wang <[email protected]>
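The ordering constraint above can be seen without any networking. The sketch below is not etcd's `keepaliveListener`; `fakeTCPConn`, `wrappedConn`, `enableKeepAlive`, and `demo` are hypothetical stand-ins. It shows that once a connection is wrapped by a layer that only exposes `net.Conn`, the keepalive methods are no longer reachable through a type assertion.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// keepAliver is the keepalive method set that *net.TCPConn provides.
type keepAliver interface {
	SetKeepAlive(bool) error
	SetKeepAlivePeriod(time.Duration) error
}

// enableKeepAlive turns keepalive on when the connection supports it and
// reports whether it succeeded.
func enableKeepAlive(c net.Conn, period time.Duration) bool {
	ka, ok := c.(keepAliver)
	if !ok {
		return false
	}
	if err := ka.SetKeepAlive(true); err != nil {
		return false
	}
	return ka.SetKeepAlivePeriod(period) == nil
}

// fakeTCPConn is a hypothetical stand-in for *net.TCPConn so that the example
// needs no real socket. The embedded net.Conn is left nil and never called.
type fakeTCPConn struct {
	net.Conn
	keepAlive bool
	period    time.Duration
}

func (c *fakeTCPConn) SetKeepAlive(on bool) error { c.keepAlive = on; return nil }

func (c *fakeTCPConn) SetKeepAlivePeriod(d time.Duration) error { c.period = d; return nil }

// wrappedConn mimics an outer layer that only exposes the net.Conn methods.
type wrappedConn struct{ net.Conn }

// demo reports whether keepalive could be enabled directly and through a wrapper.
func demo() (direct, throughWrapper bool) {
	direct = enableKeepAlive(&fakeTCPConn{}, 30*time.Second)
	throughWrapper = enableKeepAlive(wrappedConn{Conn: &fakeTCPConn{}}, 30*time.Second)
	return direct, throughWrapper
}

func main() {
	direct, throughWrapper := demo()
	fmt.Println(direct, throughWrapper)
}
```

This is why the keepalive layer has to sit directly on top of the TCP listener: any layer placed between them hides the methods it relies on.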

Signed-off-by: Vitalii Levitskii <[email protected]>

Problem: We pass the grpc context down to the applier in a read-only serializable txn. This context can be cancelled, for example due to a timeout, which triggers a panic inside applyTxn. Solution: Only panic for transactions with write operations. Fixes etcd-io#14110; main PR etcd-io#14149. Signed-off-by: Bogdan Kanivets <[email protected]>
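A rough sketch of the idea, under loud assumptions: the `op` type, `isWriteTxn`, this `applyTxn` signature, and `runCancelled` are all hypothetical and do not mirror etcd's applier. The sketch only shows the control flow of returning an error for a cancelled read-only txn while still panicking when the txn contains writes.

```go
package main

import (
	"context"
	"fmt"
)

// op is a hypothetical operation inside a transaction.
type op struct {
	write bool
}

// isWriteTxn reports whether any operation in the txn is a write.
func isWriteTxn(ops []op) bool {
	for _, o := range ops {
		if o.write {
			return true
		}
	}
	return false
}

// applyTxn walks the operations. If the context is cancelled, a read-only txn
// simply returns the error, while a txn with writes panics (the assumption
// here being that an aborted write txn is not safe to continue from).
func applyTxn(ctx context.Context, ops []op) error {
	write := isWriteTxn(ops)
	for range ops {
		if err := ctx.Err(); err != nil {
			if write {
				panic("unexpected error during txn with writes: " + err.Error())
			}
			return err
		}
		// A real applier would execute the operation here.
	}
	return nil
}

// runCancelled applies ops with an already-cancelled context and reports
// whether applyTxn panicked, plus the error it returned otherwise.
func runCancelled(ops []op) (panicked bool, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	err = applyTxn(ctx, ops)
	return false, err
}

func main() {
	fmt.Println(runCancelled([]op{{write: false}}))
	fmt.Println(runCancelled([]op{{write: false}, {write: true}}))
}
```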

Signed-off-by: Marek Siarkowicz <[email protected]>

This has been additionally verified by running the tests locally as a basic smoke test. GitHub Actions doesn't provide MacOS M1 (arm64) yet, so there's no good way to automate testing. Ran `TMPDIR=/tmp make test` locally. The `TMPDIR` bit is needed so there's no really long path used that breaks Unix socket setup in one of the tests. Signed-off-by: Marek Siarkowicz <[email protected]>

- permissions were incorrectly loaded on restarts. - etcd-io#14355 - Backport of etcd-io#14358 Signed-off-by: vivekpatani <[email protected]>

A WAL object was closed by defer; however, the WAL variable was reassigned afterwards, so the defer closed the already-closed WAL but not the new one. This caused a data race between writing the file and cleaning up a temporary test directory, which led to a non-deterministic bug. Fixes etcd-io#14332 Signed-off-by: Vladimir Sokolov <[email protected]>
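The underlying Go behaviour is that the receiver of a deferred method call is evaluated when the `defer` statement executes, not when the function returns. A minimal sketch, using a hypothetical `resource` type in place of the WAL:

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a WAL handle.
type resource struct {
	name   string
	closed bool
}

// Close marks the resource as closed.
func (r *resource) Close() { r.closed = true }

// buggy defers Close on the original handle. The receiver is captured at the
// defer statement, so reassigning w later does not change what gets closed.
func buggy() (first, second *resource) {
	w := &resource{name: "first"}
	first = w
	defer w.Close()
	w.Close() // explicit close before reopening
	w = &resource{name: "second"}
	second = w
	return first, second
}

// fixed wraps the call in a closure, which reads w when the function returns
// and therefore closes the handle that is current at that point.
func fixed() (first, second *resource) {
	w := &resource{name: "first"}
	first = w
	defer func() { w.Close() }()
	w.Close()
	w = &resource{name: "second"}
	second = w
	return first, second
}

func main() {
	_, s := buggy()
	fmt.Println("buggy: second closed =", s.closed)
	_, s = fixed()
	fmt.Println("fixed: second closed =", s.closed)
}
```

In `buggy`, the second handle is left open, which matches the symptom described above where only the already-closed WAL was closed again.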

Due to a duplicate call of clientConfigFromCmd, the move-leader command would fail with "conflicting environment variable is shadowed by corresponding command-line flag", even in scenarios where no command-line flag was supplied. Signed-off-by: Thomas Jungblut <[email protected]>

Signed-off-by: Benjamin Wang <[email protected]>

openshift-ci-robot · 2022-10-07T15:36:17Z

@openshift-cherrypick-robot: Jira Issue OCPBUGS-861 has been cloned as Jira Issue OCPBUGS-2111. Retitling PR to link against new bug.
/retitle [openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5

In response to this:

This is an automated cherry-pick of #144

/assign tjungblu

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot · 2022-10-07T15:36:39Z

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-2111, which is invalid:

expected dependent Jira Issue OCPBUGS-861 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is an automated cherry-pick of #144

/assign tjungblu


openshift-ci · 2022-10-07T15:36:43Z

@openshift-cherrypick-robot: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

[openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5


openshift-ci · 2022-10-07T19:30:02Z

@openshift-cherrypick-robot: all tests passed!


Elbehery · 2022-10-07T19:41:05Z

/lgtm

openshift-ci · 2022-10-07T19:41:37Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Elbehery, openshift-cherrypick-robot
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval by writing /assign @deads2k in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hasbro17 · 2022-10-07T20:39:31Z

Why is this missing the merge pull request commits from the 4.12 PR? See: https://github.com/openshift/etcd/pull/144/commits
Not sure why the cherry-picked PR is ignoring those but we may need to do a manual rebase with --rebase-merges to 3.5.5 for this one to preserve those as well.

tjungblu · 2022-10-10T09:27:13Z

let's continue here: #155

/close

openshift-ci · 2022-10-10T09:38:03Z

@tjungblu: Closed this PR.

In response to this:

let's continue here: #155

/close


openshift-ci-robot · 2022-10-10T09:38:04Z

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-2111. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

This is an automated cherry-pick of #144

/assign tjungblu


liggitt and others added 30 commits October 7, 2022 15:36

Revert "trim the suffix dot from the srv.Target for etcd-client DNS l…

c9915c8

…ookup" This reverts commit 4f51cc1.

Add unit test for canonical SRV records

9040832

Update conssitent_index when applying fails

a06a192

When a client lacks permission to perform an operation, applying the entry may fail. We should still move consistent_index forward in this case; otherwise consistent_index may end up smaller than the snapshot index.

version: bump up to 3.5.4

d8df590

Update golang.org/x/crypto to latest

73b24cd

Update crypto to address CVE-2022-27191. The CVE fix is added in 0.0.0-20220315160706-3147a52a75dd but this change updates to latest.

scripts: Add tests for release scripts

0774450

Make DRY_RUN explicit

96cb4c4

scripts: Detect staged files before building release

b0bc484

Add FileReader and FileBufReader utilities

ecb2ca9

The FileReader interface is the wrapper of io.Reader. It provides the fs.FileInfo as well. The FileBufReader struct is the wrapper of bufio.Reader, it also provides fs.FileInfo. Signed-off-by: Benjamin Wang <[email protected]>

client/v3: do not overwrite authTokenBundle on dial

2e24e6b

Cherry pick the PR etcd-io#12992 to 3.5, so please refer to the original PR for more detailed info. Signed-off-by: Benjamin Wang <[email protected]>

Client: fix check for WithPrefix op

0bf5289

Make sure that WithPrefix correctly set the flag, and add test. Also, add test for WithFromKey. fixes etcd-io#14056 Signed-off-by: Sahdev Zala <[email protected]>

add the uint32Value data type

5833a0f

The Go built-in package `flag` doesn't support the `uint32` data type, so we need to support it via `flag.Var`. Signed-off-by: Benjamin Wang <[email protected]>

Add flag --max-concurrent-streams to set the max concurrent stream …

ec0690d

…each client can open at a time Also refer to etcd-io#14169 (comment) Signed-off-by: Benjamin Wang <[email protected]>

add e2e test cases to cover the maxConcurrentStreams

32c34ea

Signed-off-by: Benjamin Wang <[email protected]>

Improve error message for incorrect values of ETCD_CLIENT_DEBUG

5940a4b

Signed-off-by: Jille Timmermans <[email protected]>

server/auth: protect rangePermCache with a RW lock

746b884

Signed-off-by: Hitoshi Mitake <[email protected]>

etcdserver: bump OpenTelemetry to 1.0.1

d2ff98b

Signed-off-by: Benjamin Wang <[email protected]>

move setupTracing into a separate file config_tracing.go

d0bec57

Signed-off-by: Benjamin Wang <[email protected]>

update all related dependencies

d62d3c5

Upgrade grpc to 1.41.0; Run ./script/fix.sh to fix all related issue. Signed-off-by: Benjamin Wang <[email protected]>

Fix the failure in TestEndpointSwitchResolvesViolation

603eac4

Refer to etcd-io@a0bdfc4 Signed-off-by: Benjamin Wang <[email protected]>

Change default sampling rate from 100% to 0%

c6575ee

Refer to etcd-io#14318 Signed-off-by: Benjamin Wang <[email protected]>

Backport of pull/14354 to 3.5.5

d093090

Signed-off-by: Vitalii Levitskii <[email protected]>

serathius and others added 11 commits October 7, 2022 15:36

server: Refactor compaction checker

09a1abb

Signed-off-by: Marek Siarkowicz <[email protected]>

tests: Cover periodic check in tests

cbe43f5

Signed-off-by: Marek Siarkowicz <[email protected]>

server: Implement compaction hash checking

dd24c4e

Signed-off-by: Marek Siarkowicz <[email protected]>

server: Make corrtuption check optional and period configurable

d492ad3

Signed-off-by: Marek Siarkowicz <[email protected]>

tests: Fix member id in CORRUPT alarm

39e040e

Signed-off-by: Marek Siarkowicz <[email protected]>

server,test: refresh cache on each NewAuthStore

130ebcf

- permissions were incorrectly loaded on restarts. - etcd-io#14355 - Backport of etcd-io#14358 Signed-off-by: vivekpatani <[email protected]>

fix the flaky test fix_TestV3AuthRestartMember_20220913 for 3.5

c33ea4c

Signed-off-by: Benjamin Wang <[email protected]>

version: bump up to 3.5.5

c7f0098

openshift-cherrypick-robot mentioned this pull request Oct 7, 2022

OCPBUGS-861: Rebase openshift/etcd 4.12 onto v3.5.5 #144

Merged

openshift-cherrypick-robot assigned tjungblu Oct 7, 2022

openshift-ci bot changed the title ~~[openshift-4.11] OCPBUGS-861: Rebase openshift/etcd 4.12 onto v3.5.5~~ [openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5 Oct 7, 2022

openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 7, 2022

openshift-ci bot requested review from Elbehery and wking October 7, 2022 15:37

openshift-ci bot assigned Elbehery Oct 7, 2022

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2022

openshift-ci bot closed this Oct 10, 2022
