[openshift-4.11] OCPBUGS-2111: Rebase openshift/etcd 4.12 onto v3.5.5 #152
…ookup" This reverts commit 4f51cc1.
When clients have no permission to perform an operation, applying it may fail. We should also move consistent_index forward in this case; otherwise consistent_index may be smaller than the snapshot index.
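A minimal sketch of the idea in the commit message above, not etcd's actual apply path: the index is advanced for every entry, whether or not applying it succeeded. The `entry`, `state`, `applyAll`, and `errPermissionDenied` names are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errPermissionDenied is a hypothetical error standing in for an auth failure.
var errPermissionDenied = errors.New("permission denied")

// entry is a hypothetical log entry: an index plus the operation to apply.
type entry struct {
	index uint64
	apply func() error
}

// state is a hypothetical holder for the last applied index.
type state struct {
	consistentIndex uint64
}

// applyAll applies entries in order and returns how many failed.
// The index moves forward even when an entry fails, so it can never
// fall behind an index that was already covered by a snapshot.
func (s *state) applyAll(entries []entry) (failed int) {
	for _, e := range entries {
		if err := e.apply(); err != nil {
			failed++
		}
		s.consistentIndex = e.index // advance regardless of the outcome
	}
	return failed
}

func main() {
	s := &state{}
	failed := s.applyAll([]entry{
		{index: 1, apply: func() error { return nil }},
		{index: 2, apply: func() error { return errPermissionDenied }},
	})
	fmt.Println(failed, s.consistentIndex)
}
```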
Update crypto to address CVE-2022-27191. The CVE fix is added in 0.0.0-20220315160706-3147a52a75dd but this change updates to latest.
We found a lease leak issue: if a new member (added via member add) is recovered from a snapshot and then becomes leader, its leases never expire afterwards. The leader logs a revoke failure caused by "invalid auth token", because the token provider is not functional and drops every token generated by the upper layer, which in this case is the lease revoking routine.
This PR removes the additional clone when building artifacts. When releasing v3.5.4, this clone was the main cause of issues and confusion about what the release script was doing. The release.sh script already clones the repo into the /tmp/ directory, so cloning before the build is not needed. As a precaution against a bug in the script leaving the /tmp/ clone in a bad state, I moved "Verify the latest commit has the version tag" and added "Verify the clean working tree" so that both always run before the build.
The first bug fix resolves a race condition between a goroutine and a channel over the same leases to be revoked. It's a classic mistake when combining Go channels and goroutines; please refer to https://go.dev/doc/effective_go#channels. The second bug fix resolves the issue that the etcd lessor may continue to schedule checkpoints after stepping down from the leader role.
The FileReader interface is the wrapper of io.Reader. It provides the fs.FileInfo as well. The FileBufReader struct is the wrapper of bufio.Reader, it also provides fs.FileInfo. Signed-off-by: Benjamin Wang <[email protected]>
…file Currently the max size of each WAL entry is hard coded as 10MB. If users set a value > 10MB for the flag --max-request-bytes, then etcd may run into a situation where it successfully processes a big request, but fails to decode it when replaying the WAL file on startup. On the other hand, we can't just remove the limitation, because if a WAL entry is somehow corrupted and its recByte is a huge value, then etcd may run out of memory. So the solution is to restrict the max size of each WAL entry to a dynamic value, which is the remaining size of the WAL file. Signed-off-by: Benjamin Wang <[email protected]>
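The bound described above can be sketched as follows. This uses a simplified framing (an 8-byte little-endian length followed by the payload), which is an assumption and not the real WAL record layout; `encodeRecord`, `readRecord`, `decodeFirst`, and `errRecordTooLarge` are hypothetical names. The point is only that the declared length is checked against the bytes left in the file before any buffer is allocated.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// errRecordTooLarge is a hypothetical error for a length that cannot fit in the file.
var errRecordTooLarge = errors.New("record length exceeds remaining file size")

// encodeRecord frames a payload as an 8-byte little-endian length plus the payload.
func encodeRecord(payload []byte) []byte {
	buf := make([]byte, 8+len(payload))
	binary.LittleEndian.PutUint64(buf, uint64(len(payload)))
	copy(buf[8:], payload)
	return buf
}

// readRecord reads one record from r. remaining is the number of bytes left
// in the file at the current offset; it acts as the dynamic upper bound.
func readRecord(r io.Reader, remaining int64) ([]byte, error) {
	var hdr [8]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	n := binary.LittleEndian.Uint64(hdr[:])
	remaining -= int64(len(hdr))
	// Reject before allocating: a corrupted length must not trigger a huge make().
	if remaining < 0 || n > uint64(remaining) {
		return nil, errRecordTooLarge
	}
	payload := make([]byte, n)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// decodeFirst treats data as a whole file and decodes its first record.
func decodeFirst(data []byte) ([]byte, error) {
	return readRecord(bytes.NewReader(data), int64(len(data)))
}

func main() {
	p, err := decodeFirst(encodeRecord([]byte("hello")))
	fmt.Printf("%q %v\n", p, err)

	bad := encodeRecord([]byte("hello"))
	bad[7] = 0x7f // corrupt the most significant length byte
	_, err = decodeFirst(bad)
	fmt.Println(err)
}
```

With this shape, a legitimately large entry is accepted as long as it fits in the file, while a corrupted length field is rejected without a large allocation.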
Cherry pick the PR etcd-io#12992 to 3.5, so please refer to the original PR for more detailed info. Signed-off-by: Benjamin Wang <[email protected]>
Make sure that WithPrefix correctly set the flag, and add test. Also, add test for WithFromKey. fixes etcd-io#14056 Signed-off-by: Sahdev Zala <[email protected]>
The Go built-in package `flag` doesn't support the `uint32` data type, so we need to support it via `flag.Var`. Signed-off-by: Benjamin Wang <[email protected]>
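For readers unfamiliar with `flag.Var`, here is a minimal sketch of a `uint32` flag built on the standard `flag.Value` interface. The `uint32Value` type, the `parseLimit` helper, and the `-limit` flag name are hypothetical; the helper added by the commit may differ in name and detail.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strconv"
)

// uint32Value is a hypothetical flag.Value implementation backed by a uint32.
type uint32Value uint32

// String implements flag.Value.
func (v *uint32Value) String() string {
	return strconv.FormatUint(uint64(*v), 10)
}

// Set implements flag.Value. ParseUint with bitSize 32 rejects negative
// numbers and anything larger than the maximum uint32.
func (v *uint32Value) Set(s string) error {
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return err
	}
	*v = uint32Value(n)
	return nil
}

// parseLimit parses a hypothetical -limit flag from args.
func parseLimit(args []string) (uint32, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	var limit uint32Value
	fs.Var(&limit, "limit", "hypothetical uint32 flag")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return uint32(limit), nil
}

func main() {
	n, err := parseLimit([]string{"-limit=4294967295"})
	fmt.Println(n, err)
}
```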
…each client can open at a time Also refer to etcd-io#14169 (comment) Signed-off-by: Benjamin Wang <[email protected]>
Signed-off-by: Benjamin Wang <[email protected]>
Signed-off-by: Jille Timmermans <[email protected]>
Signed-off-by: Hitoshi Mitake <[email protected]>
This changes the default parent-based trace sampling rate from 100% to 0%. Due to the high QPS etcd can handle, having 100% trace sampling leads to very high resource usage. Defaulting to 0% means that only already-sampled traces will be sampled in etcd. Fixes etcd-io#14310 Signed-off-by: Mike Dame <[email protected]>
Signed-off-by: Benjamin Wang <[email protected]>
Signed-off-by: Benjamin Wang <[email protected]>
Upgrade grpc to 1.41.0; run ./script/fix.sh to fix all related issues. Signed-off-by: Benjamin Wang <[email protected]>
Refer to etcd-io@a0bdfc4 Signed-off-by: Benjamin Wang <[email protected]>
Refer to etcd-io#14318 Signed-off-by: Benjamin Wang <[email protected]>
Streams are now closed after being used in the lessor `keepAliveOnce` method. This prevents the "failed to receive lease keepalive request from gRPC stream" message from being logged by the server after the context is cancelled by the client. Signed-off-by: Justin Kolberg <[email protected]>
Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod` by default, so if you want to wrap multiple layers of `net.Listener`, the `keepaliveListener` should be the one closest to the original `net.Listener` implementation, namely `TCPListener`. Also refer to etcd-io#14356 Signed-off-by: Benjamin Wang <[email protected]>
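The layering constraint can be sketched like this. The `tcpKeepAliveListener` type and `acceptWithKeepAlive` helper are hypothetical and are not etcd's `keepaliveListener`. The wrapper only works when the listener directly beneath it hands back `*net.TCPConn` values, which is why it has to sit next to the TCP listener rather than on top of other wrappers.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// tcpKeepAliveListener is a hypothetical wrapper that enables TCP keepalive
// on every accepted connection.
type tcpKeepAliveListener struct {
	net.Listener
	period time.Duration
}

// Accept enables keepalive only when the inner listener returns *net.TCPConn.
// If another wrapper sat between this one and the TCP listener, the type
// assertion would fail and keepalive would silently not be configured.
func (l *tcpKeepAliveListener) Accept() (net.Conn, error) {
	c, err := l.Listener.Accept()
	if err != nil {
		return nil, err
	}
	if tc, ok := c.(*net.TCPConn); ok {
		if err := tc.SetKeepAlive(true); err != nil {
			c.Close()
			return nil, err
		}
		if err := tc.SetKeepAlivePeriod(l.period); err != nil {
			c.Close()
			return nil, err
		}
	}
	return c, nil
}

// acceptWithKeepAlive accepts one loopback connection through the wrapper and
// reports whether the accepted connection is a *net.TCPConn.
func acceptWithKeepAlive() (bool, error) {
	base, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer base.Close()
	ln := &tcpKeepAliveListener{Listener: base, period: 30 * time.Second}

	client, err := net.Dial("tcp", base.Addr().String())
	if err != nil {
		return false, err
	}
	defer client.Close()

	conn, err := ln.Accept()
	if err != nil {
		return false, err
	}
	defer conn.Close()
	_, isTCP := conn.(*net.TCPConn)
	return isTCP, nil
}

func main() {
	ok, err := acceptWithKeepAlive()
	fmt.Println(ok, err)
}
```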
Signed-off-by: Vitalii Levitskii <[email protected]>
Problem: We pass the gRPC context down to the applier in read-only serializable txns. This context can be cancelled, for example due to a timeout, which triggers a panic inside applyTxn. Solution: Only panic for transactions with write operations. Fixes etcd-io#14110. Main PR: etcd-io#14149. Signed-off-by: Bogdan Kanivets <[email protected]>
Signed-off-by: Marek Siarkowicz <[email protected]>
Signed-off-by: Marek Siarkowicz <[email protected]>
Signed-off-by: Marek Siarkowicz <[email protected]>
Signed-off-by: Marek Siarkowicz <[email protected]>
Signed-off-by: Marek Siarkowicz <[email protected]>
This has been additionally verified by running the tests locally as a basic smoke test. GitHub Actions doesn't provide MacOS M1 (arm64) yet, so there's no good way to automate testing. Ran `TMPDIR=/tmp make test` locally. The `TMPDIR` bit is needed so there's no really long path used that breaks Unix socket setup in one of the tests. Signed-off-by: Marek Siarkowicz <[email protected]>
- permissions were incorrectly loaded on restarts. - etcd-io#14355 - Backport of etcd-io#14358 Signed-off-by: vivekpatani <[email protected]>
A WAL object was closed by defer, but the WAL was replaced afterwards, so the defer closed the already-closed WAL rather than the new one. This caused a data race between writing the file and cleaning up a temporary test directory, which led to a non-deterministic bug. Fixes etcd-io#14332 Signed-off-by: Vladimir Sokolov <[email protected]>
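The pitfall behind this fix is a general Go one: `defer w.Close()` evaluates the receiver when the `defer` statement runs, so reassigning the variable later does not change which object gets closed. A minimal sketch with a hypothetical `resource` type standing in for the WAL:

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a WAL handle.
type resource struct {
	name   string
	closed bool
}

// Close marks the resource as closed.
func (r *resource) Close() { r.closed = true }

// buggy defers Close on the first resource, then swaps in a second one.
// The deferred call is still bound to the first, so the second stays open.
func buggy() (first, second *resource) {
	r := &resource{name: "first"}
	first = r
	defer r.Close() // receiver is evaluated here, at the defer statement
	r.Close()       // explicit close before replacing the handle
	r = &resource{name: "second"}
	second = r
	return first, second
}

// fixed defers a closure, which reads r when the function returns,
// so whichever resource is current at that point is the one closed.
func fixed() (first, second *resource) {
	r := &resource{name: "first"}
	first = r
	defer func() { r.Close() }()
	r.Close()
	r = &resource{name: "second"}
	second = r
	return first, second
}

func main() {
	_, b := buggy()
	_, f := fixed()
	fmt.Println(b.closed, f.closed)
}
```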
Due to a duplicate call of clientConfigFromCmd, the move-leader command would fail with "conflicting environment variable is shadowed by corresponding command-line flag", even in scenarios where no command-line flag was supplied. Signed-off-by: Thomas Jungblut <[email protected]>
Signed-off-by: Benjamin Wang <[email protected]>
@openshift-cherrypick-robot: Jira Issue OCPBUGS-861 has been cloned as Jira Issue OCPBUGS-2111. Retitling PR to link against new bug. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-2111, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
@openshift-cherrypick-robot: No Bugzilla bug is referenced in the title of this pull request. In response to this:
@openshift-cherrypick-robot: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Elbehery, openshift-cherrypick-robot The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Why is this missing the merge pull request commits from the 4.12 PR? See: https://github.com/openshift/etcd/pull/144/commits |
let's continue here: #155 /close |
@tjungblu: Closed this PR. In response to this:
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-2111. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
This is an automated cherry-pick of #144
/assign tjungblu