
SNS-2737: Making changes for a simple readthrough cache without queuing #5992

Merged: 15 commits merged into master from SNS-2737, Jun 10, 2024

Conversation

@nachivrn (Contributor) commented Jun 1, 2024

The read-through cache has complex logic to manage different states and conditions, queuing incoming queries to prevent duplicate executions. However, internal analysis shows that this queuing logic provides minimal benefit: concurrent searches with the same query ID are relatively infrequent compared to the total query volume. The advantage of the queuing logic therefore does not justify the reduced maintainability of the code.
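For context, here is a minimal sketch of the simplified design described above (the class and method names are illustrative, not snuba's actual cache interface): on a hit the cached value is returned; on a miss every concurrent caller executes the query itself instead of queuing behind the first caller's execution.

```python
from typing import Callable, Optional


class SimpleReadthroughCache:
    def __init__(self, client, ttl: int = 60) -> None:
        self.client = client  # e.g. a redis.Redis instance
        self.ttl = ttl

    def get_readthrough(self, key: str, function: Callable[[], bytes]) -> bytes:
        cached: Optional[bytes] = self.client.get(key)
        if cached is not None:
            return cached  # hit: no queuing, no waiters
        # Miss: every concurrent caller runs the query itself rather than
        # waiting on a queue for the first caller's result.
        value = function()
        self.client.set(key, value, ex=self.ttl)
        return value
```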

@nachivrn nachivrn marked this pull request as ready for review June 3, 2024 18:13
@nachivrn nachivrn requested a review from a team as a code owner June 3, 2024 18:13
@evanh (Member) left a comment:

Please provide a description of your changes in the PR, along with links to any relevant docs and the context/impact of the changes.

Also, this isn't a sufficient change on its own: you will also need to ensure that the query_id is being set correctly. When we send a query to ClickHouse, we have to include a query_id, and ClickHouse will reject queries with identical query_ids. Normally the query_id is a hash of the SQL string, and the readthrough cache ensures that duplicate query_ids won't get sent to ClickHouse. https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L307

However, with this change we might see duplicate query_ids in ClickHouse. So you will need to ensure that the query_id is overridden while still maintaining the cache keys.
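As an illustration of the collision described here (the hashing scheme below is an assumption; see db_query.py for the real implementation), a deterministic query_id derived from the SQL means two concurrent identical queries present the same id to ClickHouse:

```python
from hashlib import md5


def get_query_cache_key(formatted_sql: str) -> str:
    # Deterministic: identical SQL always yields the same key/query_id.
    return md5(formatted_sql.encode("utf-8")).hexdigest()


# Without the queuing readthrough cache serializing them, two concurrent
# requests for this SQL both reach ClickHouse with the same query_id, and
# ClickHouse rejects the second as a duplicate.
query_id = get_query_cache_key("SELECT count() FROM errors_local")
```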

Inline review comment: snuba/state/cache/redis/backend.py (outdated, resolved)
@nachivrn (Contributor, Author) commented Jun 3, 2024

@evanh Regarding the comment about query_id: it is set in db_query:
https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L387

When concurrent queries arrive with the same query_id and the simple read-through cache misses, all of them are directed to ClickHouse. One is executed while the others are rejected by ClickHouse. This is handled in db_query, which assigns a random query_id to the rejected queries and retries them:
https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L337
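A rough sketch of that fallback (the error type, the error matching, and the reader.execute signature are assumptions for illustration, not the exact db_query.py code):

```python
import uuid


class ClickhouseError(Exception):
    """Stand-in for snuba's real ClickHouse error type."""


def execute_with_fallback(reader, sql: str, query_id: str):
    try:
        # First attempt uses the deterministic query_id.
        return reader.execute(sql, settings={"query_id": query_id})
    except ClickhouseError as e:
        if "already running" not in str(e).lower():
            raise
        # A query with the same id is already running: retry with a random
        # id. This retry is the second round-trip discussed below.
        return reader.execute(sql, settings={"query_id": uuid.uuid4().hex})
```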

@onewland (Contributor) commented Jun 4, 2024

> @evanh Regarding the comment about query_id: it is set in db_query: https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L387
>
> When concurrent queries arrive with the same query_id and the simple read-through cache misses, all of them are directed to ClickHouse. One is executed while the others are rejected by ClickHouse. This is handled in db_query, which assigns a random query_id to the rejected queries and retries them: https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L337

Isn't this worse than randomizing the query ID before sending it to ClickHouse, though? It seems like there's a performance (and simplicity) penalty in the two round-trips.

@evanh (Member) commented Jun 4, 2024

> Isn't this worse than randomizing the query ID before sending it to ClickHouse, though? It seems like there's a performance (and simplicity) penalty in the two round-trips.

I agree with Oliver here. The correct flow should be: check the cache, and on a miss, randomize the query ID and send the query to ClickHouse.
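A sketch of that flow (helper names and the reader interface are illustrative, not the merged code): the cache key stays deterministic so identical queries share one cache entry, while the query_id sent to ClickHouse is randomized up front, avoiding the rejection-and-retry round-trip.

```python
import uuid
from hashlib import md5


def cached_query(cache, reader, sql: str) -> bytes:
    # Deterministic cache key: identical SQL shares one cache entry.
    key = md5(sql.encode("utf-8")).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit
    # On a miss, give ClickHouse a unique query_id so concurrent duplicates
    # are not rejected -- one round-trip instead of two.
    result = reader.execute(sql, settings={"query_id": uuid.uuid4().hex})
    cache.set(key, result)
    return result
```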

@nachivrn (Contributor, Author) commented Jun 5, 2024

@evanh @onewland Thanks for your review comments. The focus of the Jira task was on simplifying the read-through cache by removing the queuing mechanism and adopting a straightforward approach, acknowledging that this introduces a performance penalty due to additional round-trips. The performance issue is recognized; it only affects a small percentage of requests and can be addressed separately in a different PR. We will explore optimizations and monitor the system's performance after merging this PR.

@volokluev (Member) left a comment:

You should have a test that exercises this cache functionality at least one level higher, e.g. in test_db_query.

Inline review comments: snuba/state/cache/redis/backend.py (outdated, resolved); tests/state/test_cache.py (resolved)
@nachivrn nachivrn requested a review from a team June 6, 2024 22:20
@getsentry getsentry deleted a comment from codecov bot Jun 7, 2024
@nachivrn nachivrn requested review from volokluev and evanh June 7, 2024 00:29
@evanh (Member) left a comment:

Couple of comments, otherwise looks good.

Inline review comments (outdated, resolved): snuba/state/cache/redis/backend.py (×2), snuba/web/db_query.py, tests/web/test_db_query.py (×4)
Nachiappan Veerappan Nachiappan and others added 11 commits June 7, 2024 15:40. Excerpts from the commit messages (leading lines are truncated in this view):

- "… incremental rollout. This flag should be set to a value between 0 and 1, where 0 means use the read-through cache with the queuing design and 1 means use the simple read-through cache."
- "… one level higher, in test_db_query. Additionally adding tests for function errors in test_cache."
- "When a concurrent request for the same query id is received, the ClickHouse query settings query id will be randomized before directing the request to ClickHouse. In the read-through cache we will use the computed query id returned from get_query_cache_key as the cache key."
- "… a synchronous call. Additionally, also checking if sample_rate is not set to None."
- "… changing the stats with cache_hit_simple."
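The rollout flag from the first commit message could be sampled along these lines (a sketch; the function name and wiring are assumptions, not the merged code):

```python
import random


def use_simple_readthrough(rollout_fraction: float) -> bool:
    # 0.0 -> always the queuing design; 1.0 -> always the simple cache.
    return random.random() < rollout_fraction


# e.g. route 25% of requests through the simple read-through cache:
# cache = simple_cache if use_simple_readthrough(0.25) else queuing_cache
```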
codecov bot commented Jun 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (master@ada4804).

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #5992   +/-   ##
=========================================
  Coverage          ?   92.08%           
=========================================
  Files             ?      896           
  Lines             ?    42854           
  Branches          ?        0           
=========================================
  Hits              ?    39460           
  Misses            ?     3394           
  Partials          ?        0           


@nachivrn nachivrn merged commit c173695 into master Jun 10, 2024
30 checks passed
@nachivrn nachivrn deleted the SNS-2737 branch June 10, 2024 18:43
sentry-io bot commented Jun 11, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ ClickhouseError: DB::Exception: Missing columns: 'exception_main_thread' while processing query: 'SELECT identity(... snql_dataset_query_view__discover__api.auth-tok...
  • ‼️ ClickhouseError: DB::Exception: There's no column 'events._snuba_gen_8' in table 'events': While processing events... snql_dataset_query_view__events__api.organizati...
  • ‼️ ClickhouseError: DB::Exception: Not found column countIf(or(equals(op, 'cc.check'), equals(op, 'cc.load'))) in blo... snql_dataset_query_view__spans__api.trace-explo...
  • ‼️ ClickhouseError: DB::Exception: Received from snuba-transactions-tiger-mz-1-3:9000. DB::Exception: Too many simult... snql_dataset_query_view__discover__api.auth-tok...
  • ‼️ ClickhouseError: DB::Exception: Attempt to read after eof: while receiving packet from snuba-transactions-tiger-mz... snql_dataset_query_view__discover__api.trace-vi...

