@betodealmeida
Member

@betodealmeida betodealmeida commented Oct 8, 2025

SUMMARY

Fix a mismatch in cache key generation that was causing cache misses.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@korbit-ai

korbit-ai bot commented Oct 8, 2025

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

@bito-code-review
Contributor

Bito Automatic Review Skipped - Draft PR

Bito didn't auto-review because this pull request is in draft status.
No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.

@betodealmeida betodealmeida changed the title from "Fix query cache" to "fix: prevent cache key mismatch" Oct 8, 2025
@betodealmeida
Member Author

/korbit-review
/review

@korbit-ai korbit-ai bot left a comment

I've completed my review and didn't find any issues.

Files scanned
File Path Reviewed
superset/common/query_object.py
superset/models/helpers.py

@pull-request-size pull-request-size bot added size/XL and removed size/L labels Oct 8, 2025
@pull-request-size pull-request-size bot added size/L and removed size/XL labels Oct 8, 2025
@betodealmeida betodealmeida marked this pull request as ready for review October 20, 2025 15:17
@dosubot dosubot bot added the change:backend Requires changing the backend label Oct 20, 2025

@korbit-ai korbit-ai bot left a comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Status
Performance Inefficient metric processing without caching ▹ view
Files scanned
File Path Reviewed
superset/common/query_object.py
superset/models/helpers.py

Comment on lines 395 to 419
sanitized_metrics = []
for metric in self.metrics:
    if not (is_adhoc_metric(metric) and isinstance(metric, dict)):
        sanitized_metrics.append(metric)
        continue
    if sql_expr := metric.get("sqlExpression"):
        try:
            processed = self.datasource._process_select_expression(
                expression=sql_expr,
                database_id=self.datasource.database_id,
                engine=self.datasource.database.backend,
                schema=self.datasource.schema,
                template_processor=None,
            )
            if processed and processed != sql_expr:
                # Create new dict to avoid mutating shared references
                sanitized_metrics.append({**metric, "sqlExpression": processed})
            else:
                sanitized_metrics.append(metric)
        except Exception as ex:  # pylint: disable=broad-except
            # If processing fails, leave as-is and let execution handle it
            logger.debug("Failed to sanitize metric SQL expression: %s", ex)
            sanitized_metrics.append(metric)
    else:
        sanitized_metrics.append(metric)

Inefficient metric processing without caching (category: Performance)

What is the issue?

The method creates a new list and appends items one by one instead of using list comprehension, and performs expensive SQL processing operations for each metric without any caching mechanism.

Why this matters

This approach is inefficient for large metric lists and may cause repeated expensive SQL processing operations. The lack of caching means identical SQL expressions will be processed multiple times across different query objects.

Suggested change ∙ Feature Preview

Use list comprehension where possible and implement caching for processed SQL expressions. Consider using functools.lru_cache or a class-level cache to store processed expressions:

from functools import lru_cache

@lru_cache(maxsize=128)
def _cached_process_select_expression(self, sql_expr, database_id, engine, schema):
    return self.datasource._process_select_expression(
        expression=sql_expr,
        database_id=database_id,
        engine=engine,
        schema=schema,
        template_processor=None,
    )
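
One caveat with decorating a method directly: functools.lru_cache includes self in the cache key, so the instance must be hashable and stays referenced by the cache. A minimal sketch of an alternative, assuming a module-level helper keyed only on plain hashable arguments. The names expensive_process and cached_process are hypothetical and are not Superset APIs:

```python
from functools import lru_cache

# Records every real (uncached) processing call, for illustration only.
CALLS: list[str] = []


def expensive_process(expr: str, engine: str) -> str:
    """Hypothetical stand-in for the datasource's expression processing."""
    CALLS.append(expr)
    return expr.upper()


@lru_cache(maxsize=128)
def cached_process(expr: str, engine: str) -> str:
    """Cache on (expr, engine) only, so no instance is part of the key."""
    return expensive_process(expr, engine)


# The repeated "sum(a)" is served from the cache on its second appearance.
results = [cached_process(e, "sqlite") for e in ["sum(a)", "sum(b)", "sum(a)"]]
```

Whether this is worth doing depends on how often identical expressions recur across query objects, which this thread does not measure.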

@eschutho
Member

eschutho commented Oct 20, 2025

I had Claude Code review this PR and we have some follow-up questions:

Overall Assessment
The core approach looks sound - moving SQL expression sanitization from execution time to validation time is the right solution. The implementation shows good defensive programming with immutability, error handling, and comprehensive tests. I verified the call to _sanitize_sql_expressions() is properly integrated into the validate() method.

Follow-up Questions

Performance testing: Have you tested this with queries containing many adhoc metrics? Any noticeable performance impact from repeated datasource method calls? (Korbit AI flagged a potential concern about calling datasource.get_metrics_by_name() and processing methods N times without caching)

Idempotency: Is _sanitize_sql_expressions() idempotent? If validate() is called multiple times on the same QueryObject, will it cause issues?

Cache invalidation: Will this change invalidate existing cached queries? If so, should there be a migration note in UPDATING.md?

Edge cases: What happens with:

  • Datasources that don't implement processing methods? (I see tests cover this - good!)
  • SQL expressions containing Jinja templates? (Also covered in tests - good!)
  • Nested adhoc metrics that reference other adhoc metrics?

Minor Observations

The type ignore comment change (# type: ignore → # type: ignore[misc,index]) - does this address a real mypy error or should the underlying typing be fixed?
Comprehensive test coverage looks great! Particularly like the mutation prevention tests.

@sadpandajoe
Member

@betodealmeida can you add a PR description?

@rusackas
Member

Just needs a PR description :D

@betodealmeida
Member Author

Performance testing: Have you tested this with queries containing many adhoc metrics? Any noticeable performance impact from repeated datasource method calls? (Korbit AI flagged a potential concern about calling datasource.get_metrics_by_name() and processing methods N times without caching)

The processing was already happening during query execution; this PR just moves it earlier, to validation time.

Idempotency: Is _sanitize_sql_expressions() idempotent? If validate() is called multiple times on the same QueryObject, will it cause issues?

It is idempotent.
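
A property like this can be checked with a small test. The following is a minimal sketch, assuming a toy sanitizer in place of the real datasource processing; sanitize and sanitize_metrics are hypothetical names. It mirrors the `processed != sql_expr` guard from the reviewed snippet, which is what makes a second pass a no-op:

```python
def sanitize(expr: str) -> str:
    """Toy stand-in for the real SQL processing: collapse whitespace, uppercase."""
    return " ".join(expr.split()).upper()


def sanitize_metrics(metrics: list) -> list:
    """Return a new list; copy a metric dict only when its expression changes."""
    result = []
    for metric in metrics:
        if not isinstance(metric, dict):
            result.append(metric)  # saved metric referenced by name
            continue
        expr = metric.get("sqlExpression")
        processed = sanitize(expr) if expr else None
        if processed and processed != expr:
            result.append({**metric, "sqlExpression": processed})
        else:
            result.append(metric)
    return result


raw = ["count", {"sqlExpression": "sum(  num )", "label": "total"}]
once = sanitize_metrics(raw)
twice = sanitize_metrics(once)

# A second pass changes nothing and reuses the already-sanitized dict.
assert once == twice
assert twice[1] is once[1]
```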

Cache invalidation: Will this change invalidate existing cached queries? If so, should there be a migration note in UPDATING.md?

Yeah, it will invalidate existing cached queries, but it's a one-time cache miss. Since this is a bug fix I don't think we need to add a note to UPDATING.md.

Edge cases: What happens with:

  • Datasources that don't implement processing methods? (I see tests cover this - good!)
  • SQL expressions containing Jinja templates? (Also covered in tests - good!)
  • Nested adhoc metrics that reference other adhoc metrics?

Minor Observations

The type ignore comment change (# type: ignore → # type: ignore[misc,index]) - does this address a real mypy error or should the underlying typing be fixed?
Comprehensive test coverage looks great! Particularly like the mutation prevention tests.

betodealmeida and others added 5 commits November 12, 2025 15:59
…validation

Root Cause:
SQL expressions in adhoc metrics and orderby were being processed
(uppercased via sanitize_clause()) during query execution, causing
cache key mismatches in composite queries where:
1. Celery task processes and caches with processed expressions
2. Later requests compute cache keys from unprocessed expressions
3. Keys don't match → 422 error

The Fix:
Process SQL expressions during QueryObject.validate() BEFORE cache key
generation, ensuring both cache key computation and query execution use
the same processed expressions.
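
The mismatch and the fix can be illustrated with a toy model. This is a sketch under stated assumptions, not Superset's actual cache key code: sanitize, cache_key, and validate are hypothetical helpers, and the key is simply a hash of the serialized metrics.

```python
import hashlib
import json


def sanitize(expr: str) -> str:
    """Toy stand-in for the real SQL processing: collapse whitespace, uppercase."""
    return " ".join(expr.split()).upper()


def cache_key(metrics: list[dict]) -> str:
    """Toy cache key: hash of a stable JSON serialization of the metrics."""
    payload = json.dumps(metrics, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def validate(metrics: list[dict]) -> list[dict]:
    """Sanitize SQL expressions up front, returning new dicts."""
    return [
        {**m, "sqlExpression": sanitize(m["sqlExpression"])}
        if m.get("sqlExpression")
        else m
        for m in metrics
    ]


raw = [{"expressionType": "SQL", "sqlExpression": "sum(num)", "label": "total"}]

# Before the fix: the writer keys on processed expressions, the reader on raw ones.
key_written = cache_key(validate(raw))
key_read_before_fix = cache_key(raw)

# After the fix: both sides sanitize during validation, before computing the key.
key_read_after_fix = cache_key(validate(raw))

assert key_written != key_read_before_fix
assert key_written == key_read_after_fix
```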

Changes:
- superset/common/query_object.py:
  * Add _sanitize_sql_expressions() called in validate()
  * Process metrics and orderby SQL expressions before caching

- superset/models/helpers.py:
  * Pass processed=True to adhoc_metric_to_sqla() in get_sqla_query()
  * Skip re-processing since validate() already handled it

- tests/unit_tests/connectors/sqla/test_orderby_mutation.py:
  * Add regression test documenting the fix
Address feedback on cache key stability fix:

1. **Fix in-place mutation during validation**
   - Changed _sanitize_metrics_expressions() to create new dicts instead of mutating
   - Changed _sanitize_orderby_expressions() to create new tuples/dicts
   - Prevents unexpected side effects when adhoc metrics are shared across queries

2. **Add comprehensive tests**
   - test_sql_expressions_processed_during_validation: Verifies SQL processing
   - test_validation_does_not_mutate_original_dicts: Ensures no mutation
   - test_validation_with_multiple_adhoc_metrics: Tests multiple metrics
   - test_validation_preserves_jinja_templates: Verifies Jinja preservation
   - test_validation_without_processing_methods: Tests graceful degradation
   - test_validation_serialization_stability: Tests JSON serialization stability

3. **Performance optimization**
   - Added early returns when no adhoc expressions to process
   - Reduces unnecessary function calls

This ensures that:
- Cache keys remain stable across validation and execution
- Original metric dicts are not mutated (preventing composite query issues)
- Jinja templates are preserved for runtime processing
- The fix works even when datasources lack processing methods
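
As an illustration of the copy-instead-of-mutate approach for orderby, here is a minimal sketch. It assumes orderby entries are (column, ascending) tuples and uses a toy sanitizer; sanitize and sanitize_orderby are hypothetical names rather than the actual methods.

```python
def sanitize(expr: str) -> str:
    """Toy stand-in for the real SQL processing: collapse whitespace, uppercase."""
    return " ".join(expr.split()).upper()


def sanitize_orderby(orderby: list[tuple]) -> list[tuple]:
    """Build new tuples and dicts; the input objects are never modified."""
    result = []
    for column, ascending in orderby:
        if isinstance(column, dict) and column.get("sqlExpression"):
            column = {**column, "sqlExpression": sanitize(column["sqlExpression"])}
        result.append((column, ascending))
    return result


# The same adhoc metric dict may be shared with another query object.
shared_metric = {"expressionType": "SQL", "sqlExpression": "sum(num)", "label": "total"}
orderby = [(shared_metric, False), ("name", True)]

cleaned = sanitize_orderby(orderby)

# The sanitized copy differs, while the shared original is untouched.
assert cleaned[0][0]["sqlExpression"] == "SUM(NUM)"
assert shared_metric["sqlExpression"] == "sum(num)"
```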

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
betodealmeida and others added 2 commits November 13, 2025 10:11
When processing adhoc metrics in ORDER BY clauses during query execution,
Jinja templates were not being rendered because `processed=True` was
passed without providing template processing.

This commit:
1. Updates adhoc_metric_to_sqla() to apply template processing even when
   processed=True (meaning SQL is already sanitized)
2. Passes template_processor when converting orderby adhoc metrics
3. Removes obsolete test that expected error handling removed in commit
   add087c

The fix ensures that:
- During validation: SQL is sanitized but Jinja templates are preserved
  (template_processor=None)
- During execution: Jinja templates are rendered (template_processor
  provided, processed=True skips re-sanitization)
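
The two phases can be sketched on plain strings. This is a simplified illustration, not the real adhoc_metric_to_sqla(): to_sql, sanitize, and make_renderer are hypothetical, and the "template processor" is a regex substitution rather than Jinja.

```python
import re
from typing import Callable, Optional


def sanitize(expr: str) -> str:
    """Toy sanitizer: trim whitespace and a trailing semicolon."""
    return expr.strip().rstrip(";").strip()


def make_renderer(context: dict) -> Callable[[str], str]:
    """Toy template processor: replaces {{ name }} with context[name]."""
    def render(expr: str) -> str:
        return re.sub(
            r"\{\{\s*(\w+)\s*\}\}", lambda m: str(context[m.group(1)]), expr
        )
    return render


def to_sql(
    expr: str,
    processed: bool = False,
    template_processor: Optional[Callable[[str], str]] = None,
) -> str:
    """Render templates whenever a processor is given; sanitize unless processed."""
    if template_processor is not None:
        expr = template_processor(expr)
    if not processed:
        expr = sanitize(expr)
    return expr


raw = " SUM({{ column }}); "

# Validation: sanitize, but leave the template in place.
validated = to_sql(raw, processed=False, template_processor=None)

# Execution: render the template, skip re-sanitization.
executed = to_sql(
    validated, processed=True, template_processor=make_renderer({"column": "num"})
)

assert validated == "SUM({{ column }})"
assert executed == "SUM(num)"
```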

Fixes test: test_chart_data_table_chart_with_time_grain_filter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@betodealmeida
Member Author

Better fix in #36225
