Skip to content

fix: Edge case with metric not getting quoted in sort by when normalize_columns is enabled#33337

Merged
mistercrunch merged 1 commit intomasterfrom
fix/quote-sorted-metric
May 3, 2025
Merged

fix: Edge case with metric not getting quoted in sort by when normalize_columns is enabled#33337
mistercrunch merged 1 commit intomasterfrom
fix/quote-sorted-metric

Conversation

@Vitor-Avila
Copy link
Contributor

SUMMARY

When using Snowflake, it's possible to enable the normalize_columns setting at the dataset level to have all columns as lowercase. With this setting enabled, in case you use a metric in the chart that has the same key as a column that exists in the dataset (but it's not used in the chart), you would get a SQL error as the metric is used on sorting by default, and the metric name won't be quoted. Superset would generate:

SELECT DATE_TRUNC('DAY', time) AS "time", COUNT(*) AS "count_lower" 
FROM public.new GROUP BY DATE_TRUNC('DAY', time) ORDER BY count_lower DESC
LIMIT 1000;

As opposed to:

SELECT DATE_TRUNC('DAY', time) AS "time", COUNT(*) AS "count_lower" 
FROM public.new GROUP BY DATE_TRUNC('DAY', time) ORDER BY "count_lower" DESC
LIMIT 1000;

This only happens with this setting enabled. If you disable it and have the columns in uppercase, the exact same chart configuration works.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

No UI changes.

TESTING INSTRUCTIONS

  1. Create a table in Snowflake.
  2. Create a dataset for it.
  3. Enable normalize_columns on the dataset.
  4. Sync columns so they revert to lowercase.
  5. Create a metric in the dataset with a key named after a column from the table.
  6. Create a chart using this metric, and another column.
  7. Validate the metric name is properly quoted in the ORDER BY statement.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@dosubot dosubot bot added change:backend Requires changing the backend data:connect:snowflake Related to Snowflake labels May 1, 2025
Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've completed my review and didn't find any issues.

Files scanned
File Path Reviewed
superset/models/helpers.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

@Vitor-Avila
Copy link
Contributor Author

this is mostly changing the "order of evaluation" (to check if the column is a metric first). No tests failed, but I'm not 100% sure if this order of eval has any importance. @betodealmeida @eschutho @mistercrunch @michael-s-molina @villebro any thoughts?

@Vitor-Avila
Copy link
Contributor Author

tagging @hughhhh as I believe you worked on this logic in the past

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though overall the this whole section in the code is rough. Had to do a fair amount of thinking/guessing to understand why the evaluation ordering of the elifs matter here...

Comment on lines +1596 to +1599
elif col in columns_by_name:
col = self.convert_tbl_column_to_sqla_col(
columns_by_name[col], template_processor=template_processor
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm basically just changing the order to first evaluate the column against metrics, and then across columns.

My understanding of the issue is that in the current if/elif ordering, we would first evaluate col across columns_by_name, which would be true (even tho it's a metric), and then convert_tbl_column_to_sqla_col() does col = sa.column(tbl_column.column_name, type_=type_) which seems to return quoted if the column is uppercase (hence the issue does not happen with normalize_columns disabled):

import sqlalchemy as sa
print(sa.column("test", sa.String))
print(sa.column("TEST", sa.String))
test
"TEST"

With normalize_columns, we send a lowercase column label to sa.column() which then does not quote it.

@mistercrunch mistercrunch merged commit 9f0ae77 into master May 3, 2025
58 of 59 checks passed
@mistercrunch mistercrunch deleted the fix/quote-sorted-metric branch May 3, 2025 01:20
@michael-s-molina michael-s-molina added the v5.0 Label added by the release manager to track PRs to be included in the 5.0 branch label May 12, 2025
michael-s-molina pushed a commit that referenced this pull request May 12, 2025
LevisNgigi pushed a commit to LevisNgigi/superset that referenced this pull request Jun 18, 2025
alexandrusoare pushed a commit to alexandrusoare/superset that referenced this pull request Jun 19, 2025
@mistercrunch mistercrunch added 🍒 5.0.0 Cherry-picked to 5.0.0 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Jul 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels change:backend Requires changing the backend data:connect:snowflake Related to Snowflake size/XS v5.0 Label added by the release manager to track PRs to be included in the 5.0 branch 🍒 5.0.0 Cherry-picked to 5.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants