"took too long to execute" always in etcd logs #10860
The warning log looks like this: (log screenshot not captured in this export)
@boxuan666 Thanks for reporting!
Does this mean the P99 duration is 115.7ms? If so, it is too slow. I am not super familiar with
Hi bro
The "took too long to execute" warning is triggered whenever applying a raft entry to the backend store takes more than 100 ms. Backend commit duration is usually the most dominant part of the apply duration. From the metrics you just posted, it looks like the P99 is on the order of 32 ms. But anything in the 128 ms bucket and above will definitely trigger a warning, and in your case there are 20k or so of those. So it is expected that you will see a lot of these warnings in the server log. I think you should use a faster disk.
Install etcd 3.3.13 or later to solve the problem; I have verified it.
I want to know why.
How can I adjust this 100ms "took too long" warning threshold? I have an etcd cluster running on underpowered arm32 machines. I am quite happy with the speed, but now my logs are full of these "warnings", hiding real errors. I have already increased (Note: I'm running 3.3.10 - perhaps this is already improved in later versions?) |
Starting from 3.4, log level is configurable, which means you can choose to log only those levels higher than warning. |
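For example, with etcd 3.4+ the level can be set via the command line or the config file (assuming the standard `log-level` option; raising it to `error` suppresses these warnings entirely, so weigh that against losing other warnings):

```yaml
# etcd config file (3.4+): log only errors and above
log-level: error
```

The equivalent command-line flag is `--log-level=error`.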
Thanks for the workaround. To be clear: I think you're saying I need to disable all warnings because the 100ms warning threshold is hard-coded. The various examples above indicate that 100ms is not an outlier for some deployments, and generally it seems impossible to mandate a fixed response SLA on behalf of all users. I suggest we turn this issue into a feature request: "too-long warning threshold should be configurable" (or perhaps even just remove this warning and rely on metrics?) |
Did a PR to introduce the threshold limit. |
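For readers landing here later: in releases where a configurable threshold shipped (3.5+, as an experimental flag, if I recall correctly), it can be raised along these lines; check `etcd --help` on your version to confirm the flag exists:

```shell
# raise the "took too long" warning threshold from the 100ms default
etcd --experimental-warning-apply-duration=500ms
```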
I am having exactly the same issue with Rancher 2.4.3. Tried with an EBS-optimized instance and provisioned IOPS; not helping.
The problem still exists in 3.4.13; please refer to #17305.
Could you suggest how to apply that flag to an existing etcd cluster? I have etcdctl configured, but it doesn't have that flag. My etcd API version is 3.5. etcd is embedded in k3s.
OK, I found the config file.
Please read https://github.com/etcd-io/etcd/blob/master/Documentation/reporting_bugs.md.
My etcd version is 3.3.10
My backend_commit_duration_seconds metrics look like this: (metrics screenshot not captured in this export)
My wal_fsync_duration_seconds metrics look like this: (metrics screenshot not captured in this export)
Is the disk being too slow the cause of this warning?
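For reference, the P99 of these histograms can be computed in Prometheus with `histogram_quantile` over the per-bucket rate (assuming the standard etcd metric names):

```
histogram_quantile(0.99,
  sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (le))

histogram_quantile(0.99,
  sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (le))
```

If either P99 regularly exceeds roughly 25 ms (fsync) or 100 ms (backend commit), disk speed is the likely culprit behind these warnings.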