Major slowdowns writing to tile38 #697
Comments
Sorry, but I'm unable to help without some way to reproduce the issue.
Are there a large number of points at the same GPS location? I've found that if you reuse the same static point, or have tons of points at the same location, it slows things down substantially as the indexes get huge.
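One way to check for this before digging further is to count how many points share a location. The sketch below is only illustrative: it assumes the points can be exported (for example from the SQL database they are synced from) as `(id, lat, lon)` tuples, and the function name, the sample data, and the rounding precision are all hypothetical rather than anything from Tile38 itself.

```python
from collections import Counter


def duplicate_locations(points, precision=6, min_count=2):
    """Return ((lat, lon), count) pairs for locations shared by at least
    min_count points, most heavily shared first.

    points: iterable of (id, lat, lon) tuples (hypothetical export format).
    precision: decimal places used to decide that two coordinates match.
    """
    counts = Counter(
        (round(lat, precision), round(lon, precision))
        for _id, lat, lon in points
    )
    return [(loc, n) for loc, n in counts.most_common() if n >= min_count]


# Made-up sample data: three points stacked on one location, one elsewhere.
sample = [
    ("truck1", 33.5123, -112.2693),
    ("truck2", 33.5123, -112.2693),
    ("truck3", 33.5123, -112.2693),
    ("truck4", 40.7128, -74.0060),
]

hot = duplicate_locations(sample)
print(hot)
```

If a handful of locations account for a large share of the collection, that would be consistent with the stacked-point slowdown described above; an even spread would point elsewhere.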
We hit this again in production but still can't replicate it on demand; I know that's not especially helpful. All replicas are behaving the same way and the problem persists across a restart, so the AOF file may be enough to reproduce it elsewhere. Unfortunately it's over 20 GB and probably can't be shared, but we can test.
As in 2023, the problem resolved itself after a couple of days. Last time it seemed to correlate with removing followers, but that didn't help this time. Instead it correlated with what we call a reindex, where we synced all the points with their values in our SQL database. During the slow period we were seeing 6 to 10 inserts per second. Things we tried that did not help: reducing the size of other collections in the tile38
The collections we were inserting into:
Info from the primary (after failing over) during the slowdown:
SERVER EXT output from the previous day, during the incident, when we were still on the primary and had tested shutting down all the replicas:
Could this be related to #756?
Describe the bug
We are experiencing significant slowdowns in our write operations. For the past 3 days we've been seeing write speeds for points of approximately 10-30/s, down from the typical 10,000 or more per second. We've experienced these slowdowns a few times in the past. We suspect, but cannot yet confirm, that they are related to replication.
We are running 6 instances of Tile38, split between two geographically separated datacenters with a round-trip ping time of ~94 ms between them. One instance is the leader and all the others are replicas. Coordination is managed by Redis Sentinel.
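As a rough sanity check on the replication suspicion, the reported round-trip time can be turned into a throughput ceiling. This is a back-of-envelope sketch under a loud assumption: that a single writer sends writes one at a time and each write waits on a full cross-datacenter round trip. Nothing in this report establishes that writes actually block that way, and the function name is made up for illustration.

```python
def serialized_write_ceiling(rtt_ms, round_trips_per_write=1):
    """Upper bound on writes/s for one writer that issues writes strictly
    one after another, each waiting on `round_trips_per_write` round trips
    of `rtt_ms` milliseconds. ASSUMPTION: fully serialized, blocking writes.
    """
    return 1000.0 / (rtt_ms * round_trips_per_write)


# ~94 ms is the round-trip ping time reported between the two datacenters.
ceiling = serialized_write_ceiling(94)
print(f"{ceiling:.1f} writes/s")  # prints "10.6 writes/s"
```

Under that assumption the ceiling is about 10.6 writes/s per serialized writer, the same order of magnitude as the observed 10-30/s. That makes a latency-bound write path worth ruling out, but it is not evidence on its own.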
To Reproduce
This is a behaviour we've observed in production several times, but we do not have steps to reproduce outside of production.
Expected behavior
We expect consistent write speed that can keep up with our load of thousands of points per second.
Logs
Not Applicable
Operating System (please complete the following information):
Additional context
See attached
tile38_slowness_server_cmds.txt