SubQuery with Same Alias Visualized as Same Node #481
Comments
When two subqueries in the set_expression share the same alias, the column lineage is also incorrect:
insert into
public.tgt_tbl1
(
id
)
select
sq.id
from
(
select
id
from
public.src_tbl1
) sq
union all
select
sq.id
from
(
select
id
from
public.src_tbl2
) sq
;

**sqllineage.core.models.SubQuery.__str__**

def __str__(self):
    # return self.alias
    return re.sub(r'\s+', ' ', self.query_raw).strip()
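To illustrate why the alias-based string causes the merge, the collision can be reproduced outside sqllineage with a toy model. This is only a sketch under stated assumptions: `ToySubQuery`, `build_nodes`, and the `key` argument are hypothetical names, not sqllineage's classes. It shows that keying nodes on the alias folds the two `sq` subqueries into one node, while keying on the whitespace-normalized query text (the workaround quoted above) keeps them apart.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySubQuery:
    """Hypothetical stand-in for a subquery node; not sqllineage's SubQuery."""

    alias: str
    query_raw: str

    def normalized(self) -> str:
        # Same normalization as the workaround: collapse whitespace runs.
        return re.sub(r"\s+", " ", self.query_raw).strip()


def build_nodes(subqueries, key):
    """Group subqueries by a node key, mimicking how equal nodes get merged."""
    nodes = {}
    for sq in subqueries:
        nodes.setdefault(key(sq), []).append(sq)
    return nodes


subqueries = [
    ToySubQuery("sq", "select\n  id\nfrom\n  public.src_tbl1"),
    ToySubQuery("sq", "select\n  id\nfrom\n  public.src_tbl2"),
]

by_alias = build_nodes(subqueries, key=lambda sq: sq.alias)
by_query = build_nodes(subqueries, key=ToySubQuery.normalized)

print(len(by_alias))  # 1: both subqueries collapse into the node "sq"
print(len(by_query))  # 2: one node per distinct query text
```

In this toy model, two textually identical subqueries would still share a node when keyed on query text; since they would read from the same sources, that is arguably harmless for lineage.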
The table-level lineage is incorrect, too:
$ sqllineage -f test.sql --dialect=non-validating
Statements(#): 1
Source Tables:
public.t1
public.t2
Target Tables:
public.t3
$ sqllineage -f test.sql --dialect=redshift
Statements(#): 1
Source Tables:
<default>.cte1
<default>.cte2
public.t1
public.t2
Target Tables:
public.t3
We have some bugs in handling a set expression together with CTEs: a CTE gets mis-identified as a normal table. This should be fixed first.
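The mis-identification amounts to CTE names leaking into the source-table set. Below is a minimal sketch of the intended behavior using plain sets rather than sqllineage internals; `resolve_source_tables` and its arguments are hypothetical and exist only for illustration.

```python
def resolve_source_tables(referenced, cte_names):
    """Drop names that resolve to CTEs; what remains are real source tables.

    `referenced` holds every name read in a FROM clause, including those
    inside CTE bodies; `cte_names` holds names defined in the WITH clause.
    """
    return sorted(name for name in referenced if name not in cte_names)


# Names taken from the redshift output above, with <default>.cte1 and
# <default>.cte2 written as the bare CTE names (an assumption of this sketch).
referenced = {"cte1", "cte2", "public.t1", "public.t2"}
cte_names = {"cte1", "cte2"}

result = resolve_source_tables(referenced, cte_names)
print(result)  # ['public.t1', 'public.t2']
```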
* fix: similar alias across statements
* fix: handling subqueries in a set expression
* refactor: re-use handle table and column logic for set
* refactor: make test case atomic
* style: black reformat test

---------

Co-authored-by: reata <[email protected]>
With #488 merged, now for this SQL:
insert into
public.t3
(
c1
)
with
cte1 as (
select t1.c1 from public.t1 t1
),
cte2 as (
select t2.c1 from public.t2 t2
)
select
sq1.c1
from
(
select cte1.c1 from cte1 union all select cte2.c1 from cte2
) sq1
;
non-validating and redshift generate the same result for both table lineage and column lineage:
$ python -m sqllineage.cli -f test.sql --dialect=non-validating
Statements(#): 1
Source Tables:
public.t1
public.t2
Target Tables:
public.t3
$ python -m sqllineage.cli -f test.sql --dialect=redshift
Statements(#): 1
Source Tables:
public.t1
public.t2
Target Tables:
public.t3
$ python -m sqllineage.cli -f test.sql --dialect=non-validating -l column
public.t3.c1 <- sq1.c1 <- cte1.c1 <- public.t1.c1
public.t3.c1 <- sq1.c1 <- cte2.c1 <- public.t2.c1
$ python -m sqllineage.cli -f test.sql --dialect=redshift -l column
public.t3.c1 <- sq1.c1 <- cte1.c1 <- public.t1.c1
public.t3.c1 <- sq1.c1 <- cte2.c1 <- public.t2.c1
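For anyone who wants to check dialect consistency mechanically rather than by eye, the column-lineage output above can be parsed and compared as sets. This is a stdlib-only sketch: `parse_column_lineage` is a hypothetical helper, and it assumes the `a <- b <- c` line format shown in the CLI output above.

```python
def parse_column_lineage(output):
    """Parse lines like 'a <- b <- c' into tuples ordered target-first."""
    paths = set()
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("$"):
            continue  # skip blank lines and echoed shell commands
        paths.add(tuple(part.strip() for part in line.split("<-")))
    return paths


# Outputs copied from the two CLI runs above.
non_validating = """\
public.t3.c1 <- sq1.c1 <- cte1.c1 <- public.t1.c1
public.t3.c1 <- sq1.c1 <- cte2.c1 <- public.t2.c1
"""
redshift = """\
public.t3.c1 <- sq1.c1 <- cte1.c1 <- public.t1.c1
public.t3.c1 <- sq1.c1 <- cte2.c1 <- public.t2.c1
"""

a = parse_column_lineage(non_validating)
b = parse_column_lineage(redshift)

print(a == b)  # True: both dialects report the same paths
print(sorted(path[-1] for path in a))  # ['public.t1.c1', 'public.t2.c1']
```

Comparing sets instead of raw text keeps the check independent of the order in which the paths are printed.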
Thank you very much, boss. ✅
Two subqueries with the same name in the set_expression is the only remaining buggy SQL in this issue; we will take care of it in #489.
The results of column lineage using non-validating and Redshift dialects are inconsistent.
SQL
To Reproduce
Note: here we refer to the SQL provided in the prior step as stored in a file named test.sql
Expected behavior
Python version (available via python --version)
SQLLineage version (available via sqllineage --version):
Additional context