fix: prevent cache key mismatch #35573

betodealmeida · 2025-10-08T17:16:35Z

SUMMARY

Fix mismatch in cache key generation, resulting in cache misses.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

korbit-ai · 2025-10-08T17:16:39Z

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

bito-code-review · 2025-10-08T17:16:55Z

Bito Automatic Review Skipped - Draft PR

Bito didn't auto-review because this pull request is in draft status.
No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.
You can change draft PR review settings here, or contact your Bito workspace admin at [email protected].

betodealmeida · 2025-10-08T17:17:51Z

/korbit-review
/review

korbit-ai

I've completed my review and didn't find any issues.

Files scanned

File Path	Reviewed
superset/common/query_object.py	✅
superset/models/helpers.py	✅

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Status
	Inefficient metric processing without caching

Files scanned

File Path	Reviewed
superset/common/query_object.py	✅
superset/models/helpers.py	✅

korbit-ai · 2025-10-20T15:21:55Z

superset/common/query_object.py

+        sanitized_metrics = []
+        for metric in self.metrics:
+            if not (is_adhoc_metric(metric) and isinstance(metric, dict)):
+                sanitized_metrics.append(metric)
+                continue
+            if sql_expr := metric.get("sqlExpression"):
+                try:
+                    processed = self.datasource._process_select_expression(
+                        expression=sql_expr,
+                        database_id=self.datasource.database_id,
+                        engine=self.datasource.database.backend,
+                        schema=self.datasource.schema,
+                        template_processor=None,
+                    )
+                    if processed and processed != sql_expr:
+                        # Create new dict to avoid mutating shared references
+                        sanitized_metrics.append({**metric, "sqlExpression": processed})
+                    else:
+                        sanitized_metrics.append(metric)
+                except Exception as ex:  # pylint: disable=broad-except
+                    # If processing fails, leave as-is and let execution handle it
+                    logger.debug("Failed to sanitize metric SQL expression: %s", ex)
+                    sanitized_metrics.append(metric)
+            else:
+                sanitized_metrics.append(metric)


Inefficient metric processing without caching

What is the issue?

The method creates a new list and appends items one by one instead of using list comprehension, and performs expensive SQL processing operations for each metric without any caching mechanism.

Why this matters

This approach is inefficient for large metric lists and may cause repeated expensive SQL processing operations. The lack of caching means identical SQL expressions will be processed multiple times across different query objects.

Suggested change ∙ Feature Preview

Use list comprehension where possible and implement caching for processed SQL expressions. Consider using functools.lru_cache or a class-level cache to store processed expressions:

from functools import lru_cache

@lru_cache(maxsize=128)
def _cached_process_select_expression(self, sql_expr, database_id, engine, schema):
    return self.datasource._process_select_expression(
        expression=sql_expr,
        database_id=database_id,
        engine=engine,
        schema=schema,
        template_processor=None,
    )
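One caveat with the suggestion as written: decorating a method with `lru_cache` makes `self` part of the cache key and keeps the instance referenced by the cache. A minimal sketch of an alternative, assuming the processing depends only on hashable arguments, is a module-level cached function. The names `process_select_expression` and `cached_process` are hypothetical stand-ins and not part of this PR; uppercasing is used only as a placeholder for the real processing.

```python
from functools import lru_cache
from typing import Optional

call_count = 0  # counts how often the expensive step actually runs


def process_select_expression(sql_expr: str, engine: str) -> str:
    """Hypothetical stand-in for the datasource's expression processing."""
    global call_count
    call_count += 1
    return sql_expr.upper()


@lru_cache(maxsize=128)
def cached_process(
    sql_expr: str, database_id: int, engine: str, schema: Optional[str]
) -> str:
    """Cache on hashable arguments only, so no instance ends up in the key."""
    return process_select_expression(sql_expr, engine)


first = cached_process("sum(sales)", 1, "postgresql", "public")
second = cached_process("sum(sales)", 1, "postgresql", "public")  # served from cache
```

Whether such a cache is worth it depends on how expensive the real processing is and how often identical expressions recur across query objects.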

eschutho · 2025-10-20T22:28:03Z

I had Claude Code review this PR and we have some follow-up questions:

Overall Assessment
The core approach looks sound - moving SQL expression sanitization from execution time to validation time is the right solution. The implementation shows good defensive programming with immutability, error handling, and comprehensive tests. I verified the call to _sanitize_sql_expressions() is properly integrated into the validate() method.

Follow-up Questions

Performance testing: Have you tested this with queries containing many adhoc metrics? Any noticeable performance impact from repeated datasource method calls? (Korbit AI flagged a potential concern about calling datasource.get_metrics_by_name() and processing methods N times without caching)

Idempotency: Is _sanitize_sql_expressions() idempotent? If validate() is called multiple times on the same QueryObject, will it cause issues?

Cache invalidation: Will this change invalidate existing cached queries? If so, should there be a migration note in UPDATING.md?

Edge cases: What happens with:
- Datasources that don't implement processing methods? (I see tests cover this - good!)
- SQL expressions containing Jinja templates? (Also covered in tests - good!)
- Nested adhoc metrics that reference other adhoc metrics?

Minor Observations

The type ignore comment change (# type: ignore → # type: ignore[misc,index]) - does this address a real mypy error or should the underlying typing be fixed?
Comprehensive test coverage looks great! Particularly like the mutation prevention tests.

sadpandajoe · 2025-10-22T17:28:28Z

@betodealmeida can you add a PR description?

rusackas · 2025-10-29T17:35:07Z

Just needs a PR description :D

betodealmeida · 2025-11-12T18:16:46Z

> Performance testing: Have you tested this with queries containing many adhoc metrics? Any noticeable performance impact from repeated datasource method calls? (Korbit AI flagged a potential concern about calling datasource.get_metrics_by_name() and processing methods N times without caching)

The processing was already happening during query execution; this PR just moves it earlier, to validation time.

> Idempotency: Is _sanitize_sql_expressions() idempotent? If validate() is called multiple times on the same QueryObject, will it cause issues?

It is idempotent.
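A minimal sketch of what idempotency means here, assuming a hypothetical `normalize` function that stands in for the real sanitizer (which is not shown in this thread): applying it to its own output should change nothing.

```python
def normalize(sql_expr: str) -> str:
    """Hypothetical sanitizer: collapses whitespace and uppercases the expression."""
    return " ".join(sql_expr.split()).upper()


once = normalize("  sum( sales )  ")
twice = normalize(once)  # a second validate() call would see already-processed input
is_idempotent = once == twice
```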

> Cache invalidation: Will this change invalidate existing cached queries? If so, should there be a migration note in UPDATING.md?

Yes, it will invalidate existing cached queries, but it's a one-time cache miss. Since this is a bug fix, I don't think we need to add a note to UPDATING.md.

> Edge cases: What happens with:
>
> Datasources that don't implement processing methods? (I see tests cover this - good!)
>
> SQL expressions containing Jinja templates? (Also covered in tests - good!)
>
> Nested adhoc metrics that reference other adhoc metrics?
>
> Minor Observations
>
> The type ignore comment change (# type: ignore → # type: ignore[misc,index]) - does this address a real mypy error or should the underlying typing be fixed? Comprehensive test coverage looks great! Particularly like the mutation prevention tests.

…validation

Root Cause: SQL expressions in adhoc metrics and orderby were being processed (uppercased via sanitize_clause()) during query execution, causing cache key mismatches in composite queries where:
1. Celery task processes and caches with processed expressions
2. Later requests compute cache keys from unprocessed expressions
3. Keys don't match → 422 error

The Fix: Process SQL expressions during QueryObject.validate() BEFORE cache key generation, ensuring both cache key computation and query execution use the same processed expressions.

Changes:
- superset/common/query_object.py:
  * Add _sanitize_sql_expressions() called in validate()
  * Process metrics and orderby SQL expressions before caching
- superset/models/helpers.py:
  * Pass processed=True to adhoc_metric_to_sqla() in get_sqla_query()
  * Skip re-processing since validate() already handled it
- tests/unit_tests/connectors/sqla/test_orderby_mutation.py:
  * Add regression test documenting the fix
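The mismatch described in this commit message can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not Superset's code: `sanitize`, `cache_key`, and `validate` are hypothetical helpers, the key is simply a SHA-256 of a canonical JSON dump, and uppercasing stands in for whatever the real processing does.

```python
import hashlib
import json


def sanitize(sql_expr: str) -> str:
    """Hypothetical stand-in for the real sanitizer: normalizes an expression."""
    return sql_expr.strip().upper()


def cache_key(query: dict) -> str:
    """Hash a canonical JSON dump of the query (stand-in for key generation)."""
    payload = json.dumps(query, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def validate(query: dict) -> dict:
    """Return a copy of the query with every adhoc SQL expression sanitized."""
    metrics = [
        {**m, "sqlExpression": sanitize(m["sqlExpression"])}
        if isinstance(m, dict) and m.get("sqlExpression")
        else m
        for m in query["metrics"]
    ]
    return {**query, "metrics": metrics}


raw = {"metrics": [{"expressionType": "SQL", "sqlExpression": "sum(sales)"}]}

# Before the fix: one side hashes the processed form, the other the raw form.
key_from_worker = cache_key(validate(raw))
key_from_request = cache_key(raw)
mismatch = key_from_worker != key_from_request

# After the fix: both sides validate first, so both hash the same processed form.
keys_agree = cache_key(validate(raw)) == cache_key(validate(raw))
```

The point is only that the key must be computed from the same representation on both code paths; which representation is chosen matters less than using it consistently.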

Address feedback on cache key stability fix:

1. **Fix in-place mutation during validation**
   - Changed _sanitize_metrics_expressions() to create new dicts instead of mutating
   - Changed _sanitize_orderby_expressions() to create new tuples/dicts
   - Prevents unexpected side effects when adhoc metrics are shared across queries

2. **Add comprehensive tests**
   - test_sql_expressions_processed_during_validation: Verifies SQL processing
   - test_validation_does_not_mutate_original_dicts: Ensures no mutation
   - test_validation_with_multiple_adhoc_metrics: Tests multiple metrics
   - test_validation_preserves_jinja_templates: Verifies Jinja preservation
   - test_validation_without_processing_methods: Tests graceful degradation
   - test_validation_serialization_stability: Tests JSON serialization stability

3. **Performance optimization**
   - Added early returns when no adhoc expressions to process
   - Reduces unnecessary function calls

This ensures that:
- Cache keys remain stable across validation and execution
- Original metric dicts are not mutated (preventing composite query issues)
- Jinja templates are preserved for runtime processing
- The fix works even when datasources lack processing methods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
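The copy-instead-of-mutate behaviour from item 1 can be sketched as follows. This mirrors the `{**metric, "sqlExpression": processed}` pattern visible in the diff earlier in this thread, but `sanitize_metrics` and its `process` parameter are hypothetical names chosen for illustration, and `str.upper` is only a placeholder for the real processing.

```python
from __future__ import annotations

from typing import Any, Callable


def sanitize_metrics(metrics: list[Any], process: Callable[[str], str]) -> list[Any]:
    """Return a new list; dicts whose expression changes are copied, not mutated."""
    sanitized = []
    for metric in metrics:
        sql_expr = metric.get("sqlExpression") if isinstance(metric, dict) else None
        if not sql_expr:
            sanitized.append(metric)  # saved metric name or non-SQL metric
            continue
        processed = process(sql_expr)
        if processed != sql_expr:
            # Build a new dict so callers holding the original are unaffected.
            sanitized.append({**metric, "sqlExpression": processed})
        else:
            sanitized.append(metric)
    return sanitized


shared = {"label": "total", "sqlExpression": "sum(sales)"}
result = sanitize_metrics([shared, "count"], str.upper)
```

Here `shared` keeps its original expression while `result[0]` is a separate dict carrying the processed one, which is what keeps a metric reused by another query from changing underneath it.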

When processing adhoc metrics in ORDER BY clauses during query execution, Jinja templates were not being rendered because `processed=True` was passed without providing template processing.

This commit:
1. Updates adhoc_metric_to_sqla() to apply template processing even when processed=True (meaning SQL is already sanitized)
2. Passes template_processor when converting orderby adhoc metrics
3. Removes obsolete test that expected error handling removed in commit add087c

The fix ensures that:
- During validation: SQL is sanitized but Jinja templates are preserved (template_processor=None)
- During execution: Jinja templates are rendered (template_processor provided, processed=True skips re-sanitization)

Fixes test: test_chart_data_table_chart_with_time_grain_filter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
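The two phases described in this commit message can be sketched with a single helper that takes an optional renderer and a `processed` flag. Everything here is an assumption made for illustration: `to_sql`, `sanitize`, and `render_stub` are hypothetical, the renderer is a plain string replacement rather than Jinja, and the order in which the real code renders and sanitizes is not stated in this thread.

```python
from typing import Callable, Optional


def sanitize(sql: str) -> str:
    """Hypothetical sanitizer; a real one would parse and validate the clause."""
    return sql.strip()


def to_sql(
    sql: str,
    render: Optional[Callable[[str], str]] = None,
    processed: bool = False,
) -> str:
    """Render templates when a renderer is given; sanitize unless already processed."""
    if render is not None:
        sql = render(sql)
    if not processed:
        sql = sanitize(sql)
    return sql


def render_stub(sql: str) -> str:
    """Stand-in for template rendering: substitutes one known placeholder."""
    return sql.replace("{{ col }}", "sales")


# Validation: sanitize, but leave the template untouched (no renderer passed).
validated = to_sql("  SUM({{ col }})  ")

# Execution: render the template and skip re-sanitization.
executed = to_sql(validated, render=render_stub, processed=True)
```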

betodealmeida · 2025-11-21T22:18:09Z

Better fix in #36225

pull-request-size bot added the size/L label Oct 8, 2025

betodealmeida changed the title ~~Fix query cache~~ fix: prevent cache key mismatch Oct 8, 2025

github-actions bot added the preset-io label Oct 8, 2025

korbit-ai bot approved these changes Oct 8, 2025

pull-request-size bot added size/XL and removed size/L labels Oct 8, 2025

betodealmeida force-pushed the fix-query-cache branch from 3a23e68 to dec5d13 Compare October 8, 2025 17:39

pull-request-size bot added size/L and removed size/XL labels Oct 8, 2025

sadpandajoe added the review:draft label Oct 14, 2025

betodealmeida marked this pull request as ready for review October 20, 2025 15:17

dosubot bot added the change:backend Requires changing the backend label Oct 20, 2025

korbit-ai bot suggested changes Oct 20, 2025

rusackas added need:more-info and removed review:draft labels Oct 29, 2025

pull-request-size bot added size/XL and removed size/L labels Nov 12, 2025

betodealmeida force-pushed the fix-query-cache branch from a85313b to 5c23094 Compare November 12, 2025 19:44

betodealmeida and others added 5 commits November 12, 2025 15:59

Fix style

47c5860

Lint

4bcbe47

Fix logic

29256a4

betodealmeida force-pushed the fix-query-cache branch from 5c23094 to 29256a4 Compare November 12, 2025 20:59

betodealmeida and others added 2 commits November 13, 2025 10:11

Raise exceptions

add087c

pull-request-size bot added size/L and removed size/XL labels Nov 13, 2025

eschutho approved these changes Nov 14, 2025

betodealmeida closed this Nov 21, 2025
