Record column lineage details #7465
Do you have a test for CREATE TABLE LIKE, where we should see that the columns have the expected source columns set?
In the case of LIKE we don't capture the data, right? We just reuse the name and type.
@kokosing, @skrzypo987 AC
@kasiafi Thanks for the review. AC
@kokosing Added tests.
kasiafi
left a comment
A couple of comments.
I went through the code. However, I could not verify that all the sites that should expose source columns are covered. The same goes for the test coverage.
I guess we could cover all the sites incrementally. It is somewhat experimental for now. WDYT?
@kasiafi AC
Looks good to me, provided that more consistent column reporting is added as a follow-up.
Sure thing.
- Remove unused method
@Praveen2112 How can we view the lineage? What is the sample output for nested views, i.e. viewA, which selects from viewB, which selects from viewC?
Same question as @rsaw4. How can we view the lineage details? @Praveen2112
The table-level lineage details can be accessed via
For this view we get a List of TableInfo for
For column-level lineage details, we would get them from
@Praveen2112 We are trying to leverage this lineage and found that when an expression is used, the source columns are not captured. For a query like:
create table amalakar.new_query_log_1 as
with queries as
(
select * from
hive.default.event_presto_query_logged p2
where ds='2021-12-08' and hr=3
)
SELECT
p1.occurred_at as occurred_at,
substr(p2.query_id, 1, 10) as new_query_id
FROM queries p1
inner join queries p2
ON p1.query_id=p2.query_id
limit 10
The lineage I see, after a light transformation of the lineage we get via QueryIOMetadata, is:
{
  "hive.amalakar.new_query_log_1.new_query_id": [],
  "hive.amalakar.new_query_log_1.occurred_at": [
    {
      "columnName": "hive.default.event_presto_query_logged.occurred_at"
    }
  ]
}
The transformation code looks like:
public Optional<String> getColumnLineage(QueryIOMetadata ioMetadata) {
    Map<String, List<UpstreamColumn>> lineage = new HashMap<>();
    if (ioMetadata.getOutput().isPresent()) {
        QueryOutputMetadata outputMetadata = ioMetadata.getOutput().get();
        if (outputMetadata.getColumns().isPresent()) {
            List<OutputColumnMetadata> outputColumns = outputMetadata.getColumns().get();
            for (OutputColumnMetadata outputColumn : outputColumns) {
                // Map each source column of this output column to its qualified name.
                List<UpstreamColumn> upstreamColumns =
                        outputColumn.getSourceColumns().stream()
                                .map(col -> new UpstreamColumn(getQualifiedColumnName(col)))
                                .collect(Collectors.toList());
                String outputColumnName =
                        String.format(
                                "%s.%s", getQualifiedTableName(outputMetadata), outputColumn.getColumnName());
                lineage.put(outputColumnName, upstreamColumns);
            }
        }
    ...
}
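For anyone trying to reproduce the flattening step outside a running cluster, here is a minimal, self-contained sketch of the same idea. It is only an illustration: `ColumnLineageSketch`, `SourceColumn`, `OutputColumn`, and `flatten` are hypothetical stand-ins defined here, not the Trino SPI classes, and the sample values are copied from the output reported above. An output column whose source list is empty (as with `new_query_id`) simply maps to an empty list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the event-listener types; NOT the Trino SPI classes.
public class ColumnLineageSketch {
    /** A fully qualified source column (assumed shape, for illustration only). */
    static final class SourceColumn {
        final String catalog;
        final String schema;
        final String table;
        final String column;

        SourceColumn(String catalog, String schema, String table, String column) {
            this.catalog = catalog;
            this.schema = schema;
            this.table = table;
            this.column = column;
        }

        String qualifiedName() {
            return String.join(".", catalog, schema, table, column);
        }
    }

    /** An output column together with the source columns it was derived from. */
    static final class OutputColumn {
        final String name;
        final List<SourceColumn> sources;

        OutputColumn(String name, List<SourceColumn> sources) {
            this.name = name;
            this.sources = sources;
        }
    }

    /** Maps "table.column" of each output column to the qualified names of its sources. */
    static Map<String, List<String>> flatten(String qualifiedTable, List<OutputColumn> columns) {
        Map<String, List<String>> lineage = new LinkedHashMap<>();
        for (OutputColumn column : columns) {
            List<String> upstream = column.sources.stream()
                    .map(SourceColumn::qualifiedName)
                    .sorted()
                    .collect(Collectors.toList());
            lineage.put(qualifiedTable + "." + column.name, upstream);
        }
        return lineage;
    }

    public static void main(String[] args) {
        SourceColumn occurredAt =
                new SourceColumn("hive", "default", "event_presto_query_logged", "occurred_at");
        List<OutputColumn> columns = Arrays.asList(
                new OutputColumn("occurred_at", Collections.singletonList(occurredAt)),
                // No sources recorded, mirroring the expression column in the report.
                new OutputColumn("new_query_id", Collections.<SourceColumn>emptyList()));
        System.out.println(flatten("hive.amalakar.new_query_log_1", columns));
    }
}
```

The sketch keeps insertion order with a `LinkedHashMap` so the printed lineage follows the order of the output columns; that is a design choice of the sketch, not something the PR specifies.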
Created: #10272 |
Overrides #7354