Column lineage does not traverse through CTE containing uppercase letters #531
This is caused by capitalization: if the SQL is rewritten in all lowercase, the result is accurate.
Thanks @maoxingda for your helpful response! Would you consider it a bug that the SQL needs to be lowercase, or is transforming to lowercase a best practice that is (or should be) documented?
It's because most databases are case-insensitive, so the underlying layer of sqllineage converts all identifiers to lowercase. As a result, lineage analysis currently does not handle uppercase SQL correctly.
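To make the mechanism concrete, here is a minimal Python sketch of case-insensitive identifier resolution by lowercasing. It is not sqllineage's actual code: `normalize`, `CteRegistry`, `register`, and `resolve` are hypothetical names. It illustrates why the CTE definition and every reference to it must go through the same normalization: if only one side is lowercased, `CTE` and `cte` stop matching and lineage stops at the CTE, which is the symptom reported here. Which side actually goes wrong inside sqllineage is not established by this thread; the `normalize_lookups=False` mode is only a toy reproduction.

```python
from typing import Dict, Optional


def normalize(identifier: str) -> str:
    """Hypothetical helper: fold an unquoted SQL identifier to lowercase."""
    return identifier.lower()


class CteRegistry:
    """Toy registry mapping CTE names to the table each CTE selects from."""

    def __init__(self, normalize_lookups: bool = True) -> None:
        self._ctes: Dict[str, str] = {}
        # When False, mimic the bug: definitions are normalized, lookups are not.
        self._normalize_lookups = normalize_lookups

    def register(self, cte_name: str, source_table: str) -> None:
        # Definitions are always stored under their lowercased name.
        self._ctes[normalize(cte_name)] = normalize(source_table)

    def resolve(self, reference: str) -> Optional[str]:
        # Return the CTE's source table, or None if the reference is not a known CTE.
        key = normalize(reference) if self._normalize_lookups else reference
        return self._ctes.get(key)


fixed = CteRegistry()
fixed.register("CTE", "person")
print(fixed.resolve("cTe"))  # person

buggy = CteRegistry(normalize_lookups=False)
buggy.register("CTE", "person")
print(buggy.resolve("CTE"))  # None: lineage stops at the CTE
```

The point of the design is that normalization is a single function applied at every name-binding and name-lookup site, so no spelling of the identifier can slip past it.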
Thanks. I'll close this issue, since it sounds like a known bug that has already been considered.
Thanks @maoxingda for the background. Yes, we want to enforce case-insensitivity in lineage analysis in the short term. However, I'm afraid this is indeed a bug, introduced via fbad73a and released in v1.4.9. Converting all identifiers to lowercase is just an implementation detail; the real intention is that all of the following SQL statements generate the same lineage result:
WITH CTE AS (SELECT name FROM person) SELECT name FROM cte;
WITH cte AS (SELECT name FROM person) SELECT name FROM CTE;
WITH cte AS (SELECT name FROM person) SELECT name FROM cTe;
WITH ctE AS (SELECT name FROM person) SELECT name FROM cte;
WITH CtE AS (SELECT name FROM person) SELECT name FROM Cte;
In a case-insensitive SQL database, these statements are all equivalent.
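The invariant stated above (every capitalization variant yields the same lineage) can be written down as a check. The sketch below is a toy, not sqllineage: `toy_column_lineage` and `PATTERN` are hypothetical, and the regex only understands the single-CTE statement shape used in this thread. It resolves the outer `FROM` reference against the CTE name case-insensitively and then asserts that all five statements produce one identical result.

```python
import re
from typing import Tuple

# Matches only the single-CTE shape used in the examples; this is not a real SQL parser.
PATTERN = re.compile(
    r"WITH\s+(\w+)\s+AS\s*\(\s*SELECT\s+(\w+)\s+FROM\s+(\w+)\s*\)\s*"
    r"SELECT\s+(\w+)\s+FROM\s+(\w+)\s*;?",
    re.IGNORECASE,
)


def toy_column_lineage(sql: str) -> Tuple[str, str]:
    """Return (source_table, column) for the selected column, following the CTE if referenced."""
    match = PATTERN.fullmatch(sql.strip())
    if match is None:
        raise ValueError("unsupported statement shape")
    cte_name, cte_col, cte_source, out_col, reference = match.groups()
    if reference.lower() == cte_name.lower():
        # The outer query reads from the CTE, so trace through to the CTE's source table.
        return cte_source.lower(), cte_col.lower()
    return reference.lower(), out_col.lower()


STATEMENTS = [
    "WITH CTE AS (SELECT name FROM person) SELECT name FROM cte;",
    "WITH cte AS (SELECT name FROM person) SELECT name FROM CTE;",
    "WITH cte AS (SELECT name FROM person) SELECT name FROM cTe;",
    "WITH ctE AS (SELECT name FROM person) SELECT name FROM cte;",
    "WITH CtE AS (SELECT name FROM person) SELECT name FROM Cte;",
]

# Every variant should collapse to a single lineage result.
results = {toy_column_lineage(sql) for sql in STATEMENTS}
print(results)  # {('person', 'name')}
```

A regression test against the real library would follow the same pattern (run each variant, compare the column lineage), but the exact sqllineage calls to use should be taken from its documentation rather than from this sketch.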
For SQL code with a CTE, column lineage ends at the CTE instead of traversing through to the source tables or expressions referenced in the CTE.
For example, I have the following code in test.sql:
When I run the following command, I get this output:
output:
expected output:
Python version:
SQLLineage version:
Additional context
If there are multiple CTEs, with one referencing another, the lineage stops at the first CTE.
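For chained CTEs, the expected behavior is that a column keeps being traced through each CTE until it reaches a real table. The following is a minimal sketch of that traversal under an assumed data model: `Ctes` and `trace_column` are hypothetical and do not reflect sqllineage's internal graph, and the SQL in the comment (including the `full_name` alias) is an illustrative example, not the reporter's `test.sql`. Each CTE is modeled as a map from output column to its upstream (relation, column), with keys stored lowercase so mixed-case references still resolve.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: CTE name -> {output column -> (upstream relation, upstream column)}.
# All dictionary keys are stored lowercase so lookups are case-insensitive.
Ctes = Dict[str, Dict[str, Tuple[str, str]]]


def trace_column(relation: str, column: str, ctes: Ctes) -> Tuple[str, str]:
    """Follow a column through any chain of CTEs until it reaches a non-CTE relation."""
    relation, column = relation.lower(), column.lower()
    seen: Set[str] = set()
    while relation in ctes:
        if relation in seen:
            raise ValueError(f"cyclic CTE reference at {relation!r}")
        seen.add(relation)
        upstream_relation, upstream_column = ctes[relation][column]
        relation, column = upstream_relation.lower(), upstream_column.lower()
    return relation, column


# WITH A AS (SELECT name FROM person), B AS (SELECT name AS full_name FROM a)
# SELECT full_name FROM B;
ctes: Ctes = {
    "a": {"name": ("person", "name")},
    "b": {"full_name": ("A", "name")},
}
print(trace_column("B", "full_name", ctes))  # ('person', 'name')
```

Stopping only when the relation is no longer a known CTE is what prevents lineage from ending at the first CTE in the chain; lowercasing every hop keeps `A` and `a` pointing at the same definition.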