SNS-2737: Making changes for a simple readthrough cache without queuing #5992
Conversation
Please provide a description of your changes in the PR, along with links to any relevant docs and context / impact of the changes.
Also, this isn't a sufficient change; you will also need to ensure that the query_id is being set correctly. When we send a query to Clickhouse, we also have to include a query_id. Clickhouse will reject queries with identical query_ids. Normally the query_id is a hash of the SQL string, and the readthrough cache ensures that duplicate query_ids won't get sent to Clickhouse. https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L307
However, with this change we might see duplicate query_ids in Clickhouse. So you will need to ensure that the query_id is overridden while still maintaining the cache keys.
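To make the concern concrete, here is a minimal, hypothetical sketch (not Snuba's actual code; the helper names are illustrative) of keeping a hash-based cache key while making each ClickHouse execution's query_id unique:

```python
import hashlib
import uuid


def hash_based_query_id(sql: str) -> str:
    # Stand-in for a get_query_cache_key-style helper: a stable hash of the SQL string.
    return hashlib.md5(sql.encode("utf-8")).hexdigest()


def unique_query_id(sql: str) -> str:
    # Keeps the hash (usable as the cache key) as a prefix while making each
    # execution's query_id unique, so concurrent misses on the same SQL are
    # not rejected by ClickHouse as duplicates.
    return f"{hash_based_query_id(sql)}-{uuid.uuid4().hex}"
```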
@evanh Regarding the comment about query_id: it is set in the db_query file. When concurrent queries with the same query_id miss the simple read-through cache, all of them are directed to ClickHouse. One of the queries is executed while the others are rejected by ClickHouse. This situation is handled in db_query, which assigns a random value to the query_id for the rejected queries.
Isn't this worse than randomizing the query ID before sending it to ClickHouse, though? It seems like there's a performance (and simplicity) penalty for two round-trips.
I agree with Oliver here. The correct flow should be: check cache, and if it's a miss, randomize the query ID and send it to Clickhouse.
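A rough sketch of that flow, with illustrative names (cache, execute) rather than Snuba's real interfaces:

```python
import uuid
from typing import Any, Callable, MutableMapping


def readthrough_without_queue(
    cache: MutableMapping[str, Any],
    cache_key: str,
    clickhouse_settings: dict,
    execute: Callable[[dict], Any],
) -> Any:
    # Cache hit: return immediately, no round-trip to ClickHouse.
    cached = cache.get(cache_key)
    if cached is not None:
        return cached

    # Cache miss: randomize the query_id before sending, so concurrent misses
    # for the same SQL are not rejected by ClickHouse, while the cache key
    # itself stays hash-based.
    clickhouse_settings["query_id"] = uuid.uuid4().hex
    result = execute(clickhouse_settings)
    cache[cache_key] = result
    return result
```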
@evanh @onewland Thanks for your review comments. The focus of the Jira task was on simplifying the read-through cache by removing the queuing mechanism and adopting a straightforward approach, acknowledging that this introduces a performance penalty due to additional round-trips. The performance issue is recognized; it only affects a small percentage of requests and can be addressed separately in a different PR. We will explore optimizations and monitor the system's performance after merging this PR.
You should have a test that tests this cache functionality at least one level higher, e.g. in test_db_query
Couple comments, otherwise looks good.
and handling exceptions.
incremental rollout. This flag should be set to a value between 0 and 1, where 0 means use the read-through cache with the queuing design and 1 means use the simple read-through cache (see the rollout sketch below).
one level higher in test_db_query. Additionally adding tests for function errors in test_cache.
When concurrent requests for the same query id are received, the query_id in the ClickHouse query settings will be randomized before directing the request to ClickHouse. In the read-through cache we will use the computed query id returned from get_query_cache_key as the cache key.
a synchronous call. Additionally checking that sample_rate is not set to None.
changing the stats with cache_hit_simple.
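As a hedged illustration of the rollout flag described above (the actual flag name and wiring live in Snuba's runtime config and are not shown here), a 0-to-1 value could gate the two designs like this:

```python
import random


def use_simple_readthrough(rollout_rate: float) -> bool:
    # 0.0 -> always use the queued read-through cache,
    # 1.0 -> always use the simple read-through cache,
    # values in between route that fraction of requests to the simple cache.
    return random.random() < rollout_rate
```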
Codecov Report
All modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found.

Additional details and impacted files:

@@            Coverage Diff            @@
##             master    #5992   +/-   ##
=========================================
  Coverage          ?   92.08%
=========================================
  Files             ?      896
  Lines             ?    42854
  Branches          ?        0
=========================================
  Hits              ?    39460
  Misses            ?     3394
  Partials          ?        0

☔ View full report in Codecov by Sentry.
Suspect Issues: This pull request was deployed and Sentry observed the following issues:
The read-through cache has complex logic to manage different states and conditions, queuing incoming queries to prevent duplicate executions. However, internal analysis shows that this queuing logic provides minimal benefit, because concurrent searches with the same query ID are relatively infrequent compared to the total volume. Therefore, the advantage of this complex logic does not justify the reduced maintainability of the code.
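For illustration only (the names are placeholders, not Snuba's actual API), the simplified design described above reduces the read-through path to a plain get / execute / set, with execution errors propagated to the caller rather than cached:

```python
from typing import Any, Callable, MutableMapping


def get_readthrough_simple(
    cache: MutableMapping[str, Any],
    key: str,
    function: Callable[[], Any],
) -> Any:
    value = cache.get(key)
    if value is not None:
        return value
    # No queue and no waiting on other clients: every concurrent miss runs the
    # function itself. Errors propagate to the caller and are never cached.
    value = function()
    cache[key] = value
    return value
```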