Lower throughput while having more and more watchers #19064

Open
4 tasks done
jokerwyt opened this issue Dec 14, 2024 · 2 comments
Labels
area/performance
priority/important-longterm (Important over the long term, but may not be staffed and/or may need multiple releases to complete.)

Comments

@jokerwyt

Bug report criteria

What happened?

I am running an etcd throughput benchmark and observed that throughput drops as the number of watchers increases.

How I conduct my benchmark

  • I use a fixed number of separate clients, each of which continuously sends a simple Txn that performs a Kubernetes-like optimistic creation.
  • At the same time, I run a fixed number of watchers that watch the prefix under which the keys are created.
  • I also compact every 10 seconds.
  • Key length is ~10 bytes; value length is 1301 bytes.

The full code can be found here: https://gist.github.com/jokerwyt/b29b5113d0a5f75f6d5621d05d627230
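
For readers unfamiliar with the pattern, "Kubernetes-like optimistic creation" is usually understood as a transaction that writes a key only if the key does not exist yet. The sketch below is an in-memory toy model of that guard, not the etcd client API and not the code from the gist. `ToyStore`, `create_if_absent`, and the key name are hypothetical names chosen for illustration; only the 1301-byte value length comes from the report.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """In-memory stand-in for a key-value store with per-key mod revisions."""

    revision: int = 0
    # key -> (value, mod_revision); a missing key counts as mod_revision 0
    data: dict = field(default_factory=dict)

    def create_if_absent(self, key: str, value: bytes) -> bool:
        """Mimic a txn of the form: IF mod_revision(key) == 0 THEN put(key, value)."""
        _, mod_revision = self.data.get(key, (b"", 0))
        if mod_revision != 0:
            return False  # key already exists, so the guarded put is skipped
        self.revision += 1
        self.data[key] = (value, self.revision)
        return True


store = ToyStore()
value = b"x" * 1301  # value length reported in the issue
first = store.create_if_absent("/bench/0001", value)   # hypothetical key name
second = store.create_if_absent("/bench/0001", value)  # same key again
print(first, second, store.revision)
```

The second call is rejected because the key already exists, which is the behavior a create-only transaction is meant to guarantee when several clients race on the same key.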

Here are my results. Rows are the number of watchers; columns are the number of concurrent clients (conc); each cell is the measured throughput.

watchers \ conc          60         80        100        120        140
0                  26765.07   27658.51   27951.77   27953.14    27954.7
1                   18788.5   18431.04   16221.12   11767.03    15444.2
2                  13639.76   14557.84   12761.36   12464.89   14349.55
3                  13157.18   13431.09   11564.61    12073.8   13138.41
4                  12520.72   10658.89   12019.56   10515.21    10127.3
5                  11439.27   10491.39   12060.64    10877.8   10575.94
6                  13070.41   10405.48    9658.23   11835.03   10982.19
7                  12127.91   12062.77   10176.37    9965.35   10284.55
8                  13128.63   11080.99   10346.09   10189.54   10012.19
9                   9548.81   10232.87    9440.67   11225.33    9655.85
10                  9449.93    9440.84    9908.77    9808.65    9530.57
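
To make the size of the drop easier to read, the figures can be expressed as a fraction of the zero-watcher row. The following is a minimal sketch that only re-uses the numbers from the table above; `CONCURRENCY`, `THROUGHPUT`, and `relative_to_baseline` are names introduced here for illustration.

```python
# Throughput figures copied from the table above.
# Keys are watcher counts; list positions follow CONCURRENCY.
CONCURRENCY = [60, 80, 100, 120, 140]
THROUGHPUT = {
    0: [26765.07, 27658.51, 27951.77, 27953.14, 27954.7],
    1: [18788.5, 18431.04, 16221.12, 11767.03, 15444.2],
    2: [13639.76, 14557.84, 12761.36, 12464.89, 14349.55],
    3: [13157.18, 13431.09, 11564.61, 12073.8, 13138.41],
    4: [12520.72, 10658.89, 12019.56, 10515.21, 10127.3],
    5: [11439.27, 10491.39, 12060.64, 10877.8, 10575.94],
    6: [13070.41, 10405.48, 9658.23, 11835.03, 10982.19],
    7: [12127.91, 12062.77, 10176.37, 9965.35, 10284.55],
    8: [13128.63, 11080.99, 10346.09, 10189.54, 10012.19],
    9: [9548.81, 10232.87, 9440.67, 11225.33, 9655.85],
    10: [9449.93, 9440.84, 9908.77, 9808.65, 9530.57],
}


def relative_to_baseline(watchers: int) -> list:
    """Return throughput at `watchers` as a fraction of the zero-watcher row."""
    baseline = THROUGHPUT[0]
    return [value / base for value, base in zip(THROUGHPUT[watchers], baseline)]


# Print one line per watcher count, one percentage per concurrency level.
print("watchers", " ".join(f"{c:>6}" for c in CONCURRENCY))
for watchers in sorted(THROUGHPUT):
    ratios = relative_to_baseline(watchers)
    print(f"{watchers:>8}", " ".join(f"{r:6.1%}" for r in ratios))
```

At 60 concurrent clients, for example, one watcher leaves roughly 70% of the baseline throughput, and ten watchers leave roughly 35%.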

What did you expect to happen?

I expected etcd to deliver the same throughput whether it has zero watchers or many.

How can we reproduce it (as minimally and precisely as possible)?

I have a test script; use it together with the Go benchmark code linked above.
You will need to set up an etcd instance yourself and make some small modifications to the script.

https://gist.github.com/jokerwyt/955a810bfe28b342f6ace11ba840e36c

Anything else we need to know?

No response

Etcd version (please run commands below)

3.5.10

Etcd configuration (command line flags or environment variables)

        quota-backend-bytes: "8589934592" # 8Gi
        auto-compaction-retention: "120m"
        auto-compaction-mode: "periodic"

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
ytwu@worker1:~$ 
    ETCDCTL_API=3 etcdctl \
        --cert /etc/kubernetes/pki/etcd/peer.crt \
        --key /etc/kubernetes/pki/etcd/peer.key \
        --cacert /etc/kubernetes/pki/etcd/ca.crt \
        --endpoints https://worker1:2379 member list -w table
+------------------+---------+---------+------------------------+------------------------+
|        ID        | STATUS  |  NAME   |       PEER ADDRS       |      CLIENT ADDRS      |
+------------------+---------+---------+------------------------+------------------------+
| 99b1b6bcd47e918c | started | worker1 | https://10.10.1.4:2380 | https://10.10.1.4:2379 |
+------------------+---------+---------+------------------------+------------------------+

$ etcdctl --endpoints=<member list> endpoint status -w table
ytwu@worker1:~$ 
    ETCDCTL_API=3 etcdctl \
        --cert /etc/kubernetes/pki/etcd/peer.crt \
        --key /etc/kubernetes/pki/etcd/peer.key \
        --cacert /etc/kubernetes/pki/etcd/ca.crt \
        --endpoints https://worker1:2379 endpoint status -w table
+----------------------+------------------+---------+---------+-----------+-----------+------------+
|       ENDPOINT       |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------+------------------+---------+---------+-----------+-----------+------------+
| https://worker1:2379 | 99b1b6bcd47e918c |  3.5.10 |  1.0 GB |      true |         2 |     500011 |
+----------------------+------------------+---------+---------+-----------+-----------+------------+

Relevant log output

No response

@ahrtr added the area/performance and priority/important-longterm labels and removed the type/bug label on Dec 23, 2024
@ahrtr
Member

ahrtr commented Dec 23, 2024

To clarify, a regression usually means a degradation in the current version compared to a previous version. I do not see such a comparison between versions in your description, so this isn't a regression. Please let me know if I misunderstood you.

That said, this isn't the first time we have seen such a complaint about watcher performance, so we will spend some effort on it in the next version (3.7) to try to improve the situation.

@jokerwyt
Author

> To clarify, a regression usually means a degradation in the current version compared to a previous version. I do not see such a comparison between versions in your description, so this isn't a regression. Please let me know if I misunderstood you.
>
> That said, this isn't the first time we have seen such a complaint about watcher performance, so we will spend some effort on it in the next version (3.7) to try to improve the situation.

Sorry for misusing the word! I am very happy to see that the community is motivated to address this.

@jokerwyt changed the title from "Throughtput regression while having more and more watchers" to "Lower throughtput while having more and more watchers" on Dec 23, 2024
@jokerwyt changed the title from "Lower throughtput while having more and more watchers" to "Lower throughput while having more and more watchers" on Dec 23, 2024