Conversation
|
A few things to be discussed is how can we expose the column lineage details
|
61bdf04 to
c7fb784
Compare
kasiafi
left a comment
There was a problem hiding this comment.
A few comments and questions.
Also, could you explain what is the advantage of this approach vs using originTable and originColumnName?
There was a problem hiding this comment.
Adjust the method name.
BTW, I can see no usage of the field referencedFields in Analysis. If you removed it, you could simplify creating originColumnDetails by avoiding one level of mapping by the source node.
core/trino-main/src/main/java/io/trino/sql/analyzer/ExpressionAnalyzer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Could you explain how it is possible to have more than 1 element in this list?
There was a problem hiding this comment.
Like if we have field for a expression like func(func2(col1, col2), col3) then we might need fetch the OriginColumnDetail for col1, col2 and col3
core/trino-main/src/main/java/io/trino/sql/analyzer/OriginColumnDetail.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
Outdated
Show resolved
Hide resolved
c7fb784 to
2ca4eb0
Compare
|
@kasiafi AC |
This allow us to track column lineage for each column.
2ca4eb0 to
828dbb8
Compare
828dbb8 to
ba5b4ef
Compare
kasiafi
left a comment
There was a problem hiding this comment.
Collecting dependent fields is supported for single-column select expressions.
So, we will get this info for a query like SELECT a, b, c FROM (SELECT ... ).
But we will not get the info for SELECT * FROM (SELECT ...), although they might be equivalent queries. Is this limitation intentional?
I am also concerned about replacing originTable and originColumnName with a list of dependent fields.
In the previous design, we knew that the field is a column reference. Now, we only know that it has a column reference. In some contexts, it might be an important difference.
| analyzer.analyze(expression, scope); | ||
|
|
||
| updateAnalysis(analysis, analyzer, session, accessControl); | ||
| analysis.addReferencedFields(expression, analyzer.getReferencedFields()); |
There was a problem hiding this comment.
Probably referencedFields could be now simplified to Set<Field>. I don't think the mapping by source node is used.
| } | ||
|
|
||
| public void addReferencedFields(Multimap<NodeRef<Node>, Field> references) | ||
| public void addReferencedFields(Expression expression, Multimap<NodeRef<Node>, Field> references) |
| return result.toString(); | ||
| } | ||
|
|
||
| public static class OriginColumnDetail |
There was a problem hiding this comment.
The details are distinct per expression, but if you want to reason about them further, e.g. collect all references from a query, this class will need equals().
Other approach is to maintain both |
This allows us to compute the source column for each field. This is still in a WIP state.