Inferring Column Lineage When SELECT * FROM CTE/SubQuery #303
Comments
The overall design for column lineage regarding But clearly in your case, we should be able to derive the columns that Right now there's no workaround. It's better you avoid writing
For your situation, you can split the SQL query into separate sections by separating the WITH clauses and the main SELECT statement. You can then parse each section individually using sqllineage. After parsing, you can manually add the relations you need using the WITH clause parsing results.
+@mberk06 who's interested in solving this issue. I'd like to first mention that the ambition of sqllineage v1.5.x is to introduce a catalog plugin mechanism, so we're capable of knowing what columns a table contains, even if the query is Under this background, it's better that we also build a catalog for subqueries (note that a CTE is modeled as the SubQuery class underneath sqllineage). Unlike a table, the columns of a subquery can be inferred during parsing, and from the catalog perspective the two are unified. That's the preferred way, aligned with our roadmap. But you're also welcome to solve this one with a point solution.
I was wondering why the following is the case:
returns:
.newtable.* <- B.* <- A.*
while
only returns:
.newtable.* <- B.
Although extra information is available in the query, the resulting column lineage is actually reduced.
In fact in the current setup you should have all the information to create the following lineage:
.newtable.col1 <- B.col1 <- A.col1 <- sourceTable.colOld
.newtable.col2 <- B.col2 <- A.col2 <- sourceTable.colOld2
Is there any way to resolve this issue and get all the columns?
It might be possible to split the SQL up into parts and get the lineage for each CTE. Afterwards you can deduce the full lineage from the actual source tables to the target. If this is not clear enough, I can provide an example in the near future.
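To make the splitting idea concrete, here is a minimal sketch in plain Python. It does not call sqllineage at all: the three dictionaries stand in for whatever per-section column lineage you would collect after parsing each CTE and the main SELECT separately, and their contents are assumed from the lineage chain listed above. The helper names `merge_sections` and `resolve_sources` are hypothetical.

```python
from typing import Dict, List, Set

# Maps a target column to the columns it reads from directly.
Lineage = Dict[str, List[str]]


def merge_sections(sections: List[Lineage]) -> Lineage:
    """Combine the direct lineage of every section into one lookup table."""
    merged: Lineage = {}
    for section in sections:
        for target, sources in section.items():
            merged.setdefault(target, []).extend(sources)
    return merged


def resolve_sources(column: str, direct: Lineage) -> Set[str]:
    """Follow direct edges from `column` until reaching columns that have
    no recorded parent; those are treated as the real source columns."""
    resolved: Set[str] = set()
    seen: Set[str] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles and repeated work
        seen.add(current)
        parents = direct.get(current)
        if parents:
            stack.extend(parents)
        elif current != column:
            resolved.add(current)
    return resolved


# Assumed per-section results (hypothetical, mirroring the chain above).
cte_a = {"A.col1": ["sourceTable.colOld"], "A.col2": ["sourceTable.colOld2"]}
cte_b = {"B.col1": ["A.col1"], "B.col2": ["A.col2"]}
main_select = {".newtable.col1": ["B.col1"], ".newtable.col2": ["B.col2"]}

direct = merge_sections([cte_a, cte_b, main_select])
end_to_end = {col: sorted(resolve_sources(col, direct)) for col in main_select}
print(end_to_end)
```

The intermediate CTE columns drop out because they all have recorded parents; only columns with no parent in any section are reported as sources. If you want the full chain rather than just the endpoints, you could record the path while walking instead of only the leaves.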
Kind regards