feat: use sqlglot to set limit #33473

Merged
betodealmeida merged 2 commits into master from set-limit
May 27, 2025

@betodealmeida (Member) commented May 16, 2025

SUMMARY

Part of #26786, stacked on:

This PR changes the logic of setting the limit in a query to use sqlglot, adding the set_limit_value(limit: int) -> None method to the BaseSQLStatement class.

The method always sets the requested limit; it's up to the application to decide whether a bigger limit should be forced. Previously, the method itself decided whether to update the limit, depending on whether the existing limit was smaller than the desired one. The new behavior keeps the method simpler and decoupled from business logic.

One of the main advantages of using sqlglot here is that it abstracts all the different ways a query can be limited:

SELECT * FROM t LIMIT 10;  -- Postgres, ClickHouse, BigQuery, etc.
SELECT TOP 10 * FROM t;  -- Teradata, MS SQL, Vertica, etc.
SELECT * FROM t FETCH FIRST 10 ROWS ONLY;  -- Oracle, DB2, Apache Derby

These are all abstracted as a sqlglot.exp.Limit, and rendered correctly when the engine is specified (see unit tests).
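
To make the idea concrete, here is a toy sketch of one abstract limit rendered in the three syntaxes listed above. It is not sqlglot's implementation: sqlglot works on a real AST, while this only formats a string. The function name `render_with_limit`, the `LIMIT_STYLE` table, and the dialect keys are hypothetical and cover only the engines named in the SQL comments.

```python
# Toy model only: the query is reduced to a projection and a table name,
# and the dialect keys below are made up for this illustration.
LIMIT_STYLE = {
    "postgres": "limit",
    "clickhouse": "limit",
    "bigquery": "limit",
    "teradata": "top",
    "mssql": "top",
    "vertica": "top",
    "oracle": "fetch",
    "db2": "fetch",
    "derby": "fetch",
}


def render_with_limit(projection: str, table: str, limit: int, dialect: str) -> str:
    """Render one abstract limit in the syntax the given dialect expects."""
    style = LIMIT_STYLE[dialect]
    if style == "top":
        return f"SELECT TOP {limit} {projection} FROM {table}"
    if style == "fetch":
        return f"SELECT {projection} FROM {table} FETCH FIRST {limit} ROWS ONLY"
    return f"SELECT {projection} FROM {table} LIMIT {limit}"


for name in ("postgres", "mssql", "oracle"):
    print(name, "->", render_with_limit("*", "t", 10, name))
```

The caller only ever deals with the integer limit; the choice of `LIMIT`, `TOP`, or `FETCH FIRST` is deferred to rendering, which is the property the PR relies on.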

Previously there was additional logic for Teradata that accounted for the SAMPLE function:

SELECT * FROM t SAMPLE n  -- returns n rows from t

I removed this logic because SAMPLE is not a limit: it caps the number of rows fetched from the table, but it is applied before the projection. The projection could include a UDTF that generates millions of rows even if only 10 were read from the table. (The tests were also using invalid syntax, so I'm not even sure this feature ever worked!)
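
The difference can be simulated in plain Python. This is a minimal sketch, not Teradata semantics: `explode` is a hypothetical stand-in for a UDTF, and the first `n` rows are taken instead of a random sample so the result is deterministic.

```python
from itertools import islice
from typing import Iterable, Iterator


def explode(row: int, copies: int) -> Iterator[tuple[int, int]]:
    """Hypothetical UDTF: emits `copies` output rows per input row."""
    for i in range(copies):
        yield (row, i)


def sample_then_project(table: Iterable[int], n: int, copies: int) -> list[tuple[int, int]]:
    """SAMPLE-like: cap the rows read from the table, then apply the projection."""
    sampled = islice(table, n)  # assumption: first n rows stand in for a random sample
    return [out for row in sampled for out in explode(row, copies)]


def project_then_limit(table: Iterable[int], n: int, copies: int) -> list[tuple[int, int]]:
    """LIMIT-like: apply the projection, then cap the rows returned."""
    projected = (out for row in table for out in explode(row, copies))
    return list(islice(projected, n))


table = range(1_000)
print(len(sample_then_project(table, 10, 1_000)))  # 10 rows read, 10000 rows returned
print(len(project_then_limit(table, 10, 1_000)))   # 10 rows returned
```

Only the second function bounds the size of the result set, which is what a row limit is meant to guarantee.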

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Added tests, and moved old tests to the new pattern.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API



class KustoKqlEngineSpec(BaseEngineSpec):  # pylint: disable=abstract-method
    limit_method = LimitMethod.WRAP_SQL
@betodealmeida (Member, Author):

Kusto supports limit, no need to wrap the query.

class MssqlEngineSpec(BaseEngineSpec):
    engine = "mssql"
    engine_name = "Microsoft SQL Server"
    limit_method = LimitMethod.WRAP_SQL
@betodealmeida (Member, Author):

MS SQL supports limit, no need to wrap the query.


engine = "teradatasql"
engine_name = "Teradata"
limit_method = LimitMethod.WRAP_SQL
@betodealmeida (Member, Author):

Ditto here.

statement = script.statements[-1]
current_limit = statement.get_limit_value() or float("inf")

if limit < current_limit or force:
@betodealmeida (Member, Author):

Moved the limit application logic here.
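
A minimal sketch of how that caller-side decision can sit on top of an unconditional `set_limit_value`. The `FakeStatement` class and the `apply_limit` function are hypothetical stand-ins for illustration, not the code in this PR; only the two-line condition mirrors the snippet above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeStatement:
    """Hypothetical stand-in for a parsed statement; it only tracks the limit."""

    limit: Optional[int] = None

    def get_limit_value(self) -> Optional[int]:
        return self.limit

    def set_limit_value(self, limit: int) -> None:
        # Always sets the requested limit; no policy lives here.
        self.limit = limit


def apply_limit(statement: FakeStatement, limit: int, force: bool = False) -> None:
    """Caller-side policy: only lower the limit, unless `force` is set."""
    # A missing limit (None) is treated as unbounded. Note that a limit of 0
    # is also falsy, so this expression treats it the same way.
    current_limit = statement.get_limit_value() or float("inf")
    if limit < current_limit or force:
        statement.set_limit_value(limit)


stmt = FakeStatement(limit=100)
apply_limit(stmt, 1000)              # larger than the existing limit: unchanged
print(stmt.limit)                    # 100
apply_limit(stmt, 1000, force=True)  # forced: overwritten
print(stmt.limit)                    # 1000
```

Keeping the comparison in the caller means the statement class needs no knowledge of when a limit should win.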

@korbit-ai (bot) left a comment

I've completed my review and didn't find any issues.

Files scanned
File Path Reviewed
superset/db_engine_specs/teradata.py
superset/db_engine_specs/hana.py
superset/db_engine_specs/firebird.py
superset/db_engine_specs/oracle.py
superset/db_engine_specs/db2.py
superset/db_engine_specs/mssql.py
superset/db_engine_specs/kusto.py
superset/db_engine_specs/ocient.py
superset/db_engine_specs/lib.py
superset/sql_parse.py
superset/sql/parse.py
superset/models/core.py
superset/db_engine_specs/base.py


return self._fallback_formatting()

@deprecated(deprecated_in="4.0")
@betodealmeida (Member, Author):

This was supposed to be removed in 5.0 and we missed the window. Removing it doesn't break anything; the only effect is that queries without a sqlglot dialect can no longer be pretty-printed.

We need to remove this in order to modify the sqlglot AST in place.


return None

def set_limit_value(self, limit: int) -> None:
@betodealmeida (Member, Author), May 16, 2025:

It's been fun implementing the logic in KQL, but I'm considering writing a SIP to get rid of it, since it's not SQL and doubles the work.

Comment on lines -29 to -30
(100, "SEL SAMPLE 1000 * FROM My_table", "SEL SAMPLE 100 * FROM My_table"),
(10000, "SEL SAMPLE 1000 * FROM My_table", "SEL SAMPLE 1000 * FROM My_table"),
@betodealmeida (Member, Author), May 16, 2025:

This is not even valid syntax... SAMPLE goes at the end, not in the projection! The correct form would be:

SELECT * FROM My_table SAMPLE 1000

Comment on lines -260 to -263
("SEL TOP 1000 * FROM My_table", "SEL TOP 100 * FROM My_table", 100),
("SEL TOP 1000 * FROM My_table;", "SEL TOP 100 * FROM My_table", 100),
("SEL TOP 1000 * FROM My_table;", "SEL TOP 1000 * FROM My_table", 10000),
("SEL TOP 1000 * FROM My_table;", "SEL TOP 1000 * FROM My_table", 1000),
@betodealmeida (Member, Author):

This is invalid syntax... MS SQL doesn't support the SEL shortcut for SELECT!

Base automatically changed from is-select-query-refactor to master May 23, 2025 00:53
@Vitor-Avila (Contributor) left a comment

LGTM! Manually tested with SQLite and Snowflake too

@betodealmeida betodealmeida merged commit 8de58b9 into master May 27, 2025
49 checks passed
@betodealmeida betodealmeida deleted the set-limit branch May 27, 2025 19:20
LevisNgigi pushed a commit to LevisNgigi/superset that referenced this pull request Jun 18, 2025
@github-actions github-actions bot added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 6.0.0 First shipped in 6.0.0 labels Dec 18, 2025

Labels

🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset-io sip Superset Improvement Proposal size/XXL 🚢 6.0.0 First shipped in 6.0.0

2 participants