feat: use sqlglot to set limit#33473
Conversation
|
|
||
|
|
||
| class KustoKqlEngineSpec(BaseEngineSpec): # pylint: disable=abstract-method | ||
| limit_method = LimitMethod.WRAP_SQL |
There was a problem hiding this comment.
Kusto supports limit, no need to wrap the query.
| class MssqlEngineSpec(BaseEngineSpec): | ||
| engine = "mssql" | ||
| engine_name = "Microsoft SQL Server" | ||
| limit_method = LimitMethod.WRAP_SQL |
There was a problem hiding this comment.
MS SQL supports limit, no need to wrap the query.
|
|
||
| engine = "teradatasql" | ||
| engine_name = "Teradata" | ||
| limit_method = LimitMethod.WRAP_SQL |
| statement = script.statements[-1] | ||
| current_limit = statement.get_limit_value() or float("inf") | ||
|
|
||
| if limit < current_limit or force: |
There was a problem hiding this comment.
Moved the limit application logic here.
There was a problem hiding this comment.
I've completed my review and didn't find any issues.
Files scanned
| File Path | Reviewed |
|---|---|
| superset/db_engine_specs/teradata.py | ✅ |
| superset/db_engine_specs/hana.py | ✅ |
| superset/db_engine_specs/firebird.py | ✅ |
| superset/db_engine_specs/oracle.py | ✅ |
| superset/db_engine_specs/db2.py | ✅ |
| superset/db_engine_specs/mssql.py | ✅ |
| superset/db_engine_specs/kusto.py | ✅ |
| superset/db_engine_specs/ocient.py | ✅ |
| superset/db_engine_specs/lib.py | ✅ |
| superset/sql_parse.py | ✅ |
| superset/sql/parse.py | ✅ |
| superset/models/core.py | ✅ |
| superset/db_engine_specs/base.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
|
|
||
| return self._fallback_formatting() | ||
|
|
||
| @deprecated(deprecated_in="4.0") |
There was a problem hiding this comment.
This was supposed to be removed in 5.0 and we missed the window, but it doesn't break anything other than not being able to make queries pretty when they don't have a sqlglot dialect.
We need to remove this in order to modify the sqlglot AST inplace.
superset/sql/parse.py
Outdated
|
|
||
| return None | ||
|
|
||
| def set_limit_value(self, limit: int) -> None: |
There was a problem hiding this comment.
It's been fun implementing the logic in KQL, but I'm considering writing a SIP to get rid of it, since it's not SQL and doubles the work.
| (100, "SEL SAMPLE 1000 * FROM My_table", "SEL SAMPLE 100 * FROM My_table"), | ||
| (10000, "SEL SAMPLE 1000 * FROM My_table", "SEL SAMPLE 1000 * FROM My_table"), |
There was a problem hiding this comment.
This is not even valid syntax... SAMPLE goes at the end, not in the projection! The correct would be:
SELECT * FROM My_table SAMPLE 1000| ("SEL TOP 1000 * FROM My_table", "SEL TOP 100 * FROM My_table", 100), | ||
| ("SEL TOP 1000 * FROM My_table;", "SEL TOP 100 * FROM My_table", 100), | ||
| ("SEL TOP 1000 * FROM My_table;", "SEL TOP 1000 * FROM My_table", 10000), | ||
| ("SEL TOP 1000 * FROM My_table;", "SEL TOP 1000 * FROM My_table", 1000), |
There was a problem hiding this comment.
This is invalid syntax... MS SQL doesn't support the SEL shortcut for SELECT!
d936b6f to
3708d38
Compare
5cddcea to
6f904f2
Compare
Vitor-Avila
left a comment
There was a problem hiding this comment.
LGTM! Manually tested with SQLite and Snowflake too
SUMMARY
Part of #26786, stacked on:
This PR changes the logic of setting the limit in a query to use
sqlglot, adding theset_limit_value(limit: int) -> Nonemethod to theBaseSQLStatementclass.The method always sets the requested limit. It's up to the application to determine if a bigger limit should be forced or not. Before, it had some logic that determined if the limit should be updated or not depending on if the existing limit was smaller than the desired limit. This keeps the method simpler and decoupled from business logic.
One of the main advantages of using
sqlglothere is that it abstracts all the different ways a query can be limited:These are all abstracted as a
sqlglot.exp.Limit, and rendered correctly when the engine is specified (see unit tests).Previously there was additional logic for Teradata that accounted for the
SAMPLEfunction:I removed this logic because
SAMPLEis not a limit — it limits the number of rows fetched from the table, but before the projection is applied. The projection could have a UDTF that generates millions of rows even if only 10 were read from the table. Because of that, I removed the logic. (The tests were also using invalid syntax, so I'm not even sure if this feature ever worked!)BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Added tests, and moved old tests to the new pattern.
ADDITIONAL INFORMATION