-
Notifications
You must be signed in to change notification settings - Fork 16.5k
fix: prevent cache key mismatch #35573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment
|
|
Bito Automatic Review Skipped - Draft PR |
|
/korbit-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've completed my review and didn't find any issues.
Files scanned
| File Path | Reviewed |
|---|---|
| superset/common/query_object.py | ✅ |
| superset/models/helpers.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
3a23e68 to
dec5d13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| Inefficient metric processing without caching ▹ view |
Files scanned
| File Path | Reviewed |
|---|---|
| superset/common/query_object.py | ✅ |
| superset/models/helpers.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
| sanitized_metrics = [] | ||
| for metric in self.metrics: | ||
| if not (is_adhoc_metric(metric) and isinstance(metric, dict)): | ||
| sanitized_metrics.append(metric) | ||
| continue | ||
| if sql_expr := metric.get("sqlExpression"): | ||
| try: | ||
| processed = self.datasource._process_select_expression( | ||
| expression=sql_expr, | ||
| database_id=self.datasource.database_id, | ||
| engine=self.datasource.database.backend, | ||
| schema=self.datasource.schema, | ||
| template_processor=None, | ||
| ) | ||
| if processed and processed != sql_expr: | ||
| # Create new dict to avoid mutating shared references | ||
| sanitized_metrics.append({**metric, "sqlExpression": processed}) | ||
| else: | ||
| sanitized_metrics.append(metric) | ||
| except Exception as ex: # pylint: disable=broad-except | ||
| # If processing fails, leave as-is and let execution handle it | ||
| logger.debug("Failed to sanitize metric SQL expression: %s", ex) | ||
| sanitized_metrics.append(metric) | ||
| else: | ||
| sanitized_metrics.append(metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inefficient metric processing without caching 
Tell me more
What is the issue?
The method creates a new list and appends items one by one instead of using list comprehension, and performs expensive SQL processing operations for each metric without any caching mechanism.
Why this matters
This approach is inefficient for large metric lists and may cause repeated expensive SQL processing operations. The lack of caching means identical SQL expressions will be processed multiple times across different query objects.
Suggested change ∙ Feature Preview
Use list comprehension where possible and implement caching for processed SQL expressions. Consider using functools.lru_cache or a class-level cache to store processed expressions:
from functools import lru_cache
@lru_cache(maxsize=128)
def _cached_process_select_expression(self, sql_expr, database_id, engine, schema):
return self.datasource._process_select_expression(
expression=sql_expr,
database_id=database_id,
engine=engine,
schema=schema,
template_processor=None,
)Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
|
I had Claude Code review this PR and we have some follow-up questions: Overall Assessment Follow-up Questions Performance testing: Have you tested this with queries containing many adhoc metrics? Any noticeable performance impact from repeated datasource method calls? (Korbit AI flagged a potential concern about calling datasource.get_metrics_by_name() and processing methods N times without caching) Idempotency: Is _sanitize_sql_expressions() idempotent? If validate() is called multiple times on the same QueryObject, will it cause issues? Cache invalidation: Will this change invalidate existing cached queries? If so, should there be a migration note in UPDATING.md? Edge cases: What happens with:
Minor Observations The type ignore comment change (# type: ignore → # type: ignore[misc,index]) - does this address a real mypy error or should the underlying typing be fixed? |
|
@betodealmeida can you add a PR description? |
|
Just needs a PR description :D |
The processing was already happening during query execution, this PR just moves is to earlier validation time.
It is idempotent.
Yeah, it will invalidate existing cached queries, it's a one time cache miss. Since this is a bug fix I don't think we need to add to
|
a85313b to
5c23094
Compare
…validation Root Cause: SQL expressions in adhoc metrics and orderby were being processed (uppercased via sanitize_clause()) during query execution, causing cache key mismatches in composite queries where: 1. Celery task processes and caches with processed expressions 2. Later requests compute cache keys from unprocessed expressions 3. Keys don't match → 422 error The Fix: Process SQL expressions during QueryObject.validate() BEFORE cache key generation, ensuring both cache key computation and query execution use the same processed expressions. Changes: - superset/common/query_object.py: * Add _sanitize_sql_expressions() called in validate() * Process metrics and orderby SQL expressions before caching - superset/models/helpers.py: * Pass processed=True to adhoc_metric_to_sqla() in get_sqla_query() * Skip re-processing since validate() already handled it - tests/unit_tests/connectors/sqla/test_orderby_mutation.py: * Add regression test documenting the fix
Address feedback on cache key stability fix: 1. **Fix in-place mutation during validation** - Changed _sanitize_metrics_expressions() to create new dicts instead of mutating - Changed _sanitize_orderby_expressions() to create new tuples/dicts - Prevents unexpected side effects when adhoc metrics are shared across queries 2. **Add comprehensive tests** - test_sql_expressions_processed_during_validation: Verifies SQL processing - test_validation_does_not_mutate_original_dicts: Ensures no mutation - test_validation_with_multiple_adhoc_metrics: Tests multiple metrics - test_validation_preserves_jinja_templates: Verifies Jinja preservation - test_validation_without_processing_methods: Tests graceful degradation - test_validation_serialization_stability: Tests JSON serialization stability 3. **Performance optimization** - Added early returns when no adhoc expressions to process - Reduces unnecessary function calls This ensures that: - Cache keys remain stable across validation and execution - Original metric dicts are not mutated (preventing composite query issues) - Jinja templates are preserved for runtime processing - The fix works even when datasources lack processing methods 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
5c23094 to
29256a4
Compare
When processing adhoc metrics in ORDER BY clauses during query execution, Jinja templates were not being rendered because `processed=True` was passed without providing template processing. This commit: 1. Updates adhoc_metric_to_sqla() to apply template processing even when processed=True (meaning SQL is already sanitized) 2. Passes template_processor when converting orderby adhoc metrics 3. Removes obsolete test that expected error handling removed in commit add087c The fix ensures that: - During validation: SQL is sanitized but Jinja templates are preserved (template_processor=None) - During execution: Jinja templates are rendered (template_processor provided, processed=True skips re-sanitization) Fixes test: test_chart_data_table_chart_with_time_grain_filter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
Better fix in #36225 |
SUMMARY
Fix mismatch in cache key generation, resulting in cache misses.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION