Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

column lineage parse result error #530

Open
delphisharp opened this issue Jan 4, 2024 · 1 comment
Open

column lineage parse result error #530

delphisharp opened this issue Jan 4, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@delphisharp
Copy link
Contributor

Describe the bug
For example:

v_sql="""
INSERT OVERWRITE TABLE target 
SELECT NVL(PROV_CODE,'999') aaa,COUNT(DISTINCT MSISDN) bbb FROM (
SELECT NVL(PROV_CODE,'000') PROV_CODE,A.MSISDN FROM (
SELECT MSISDN FROM (
	SELECT MSISDN, BUSI_ID
	FROM source1 A
	UNION ALL
	SELECT  concat(A.MSISDN,'20230826') MSISDN,A.MUSIC_BUSI_CODE
	 FROM source2 A
	 union all
	 SELECT concat(A.MSISDN,'20230826') MSISDN, BUSI_code
	FROM source4 A
) C GROUP BY MSISDN
) A
LEFT JOIN source3 B
ON SUBSTRING(A.MSISDN,1,7)=B.MSISDN_NBR_PAR
) T GROUP BY PROV_CODE GROUPING SETS ((),PROV_CODE);
"""
from sqllineage.runner import LineageRunner
parse = LineageRunner(sql=v_sql,dialect='sparksql')
parse.print_column_lineage()
<default>.target.aaa <- t.prov_code <- prov_code
<default>.target.bbb <- t.msisdn <- a.msisdn <- c.msisdn <- <default>.a.msisdn
<default>.target.bbb <- t.msisdn <- a.msisdn <- c.msisdn <- <default>.source1.msisdn

but if sql is lower():

from sqllineage.runner import LineageRunner
parse = LineageRunner(sql=v_sql.lower(),dialect='sparksql')
parse.print_column_lineage()
<default>.target.aaa <- t.prov_code <- prov_code
<default>.target.bbb <- t.msisdn <- a.msisdn <- c.msisdn <- <default>.source1.msisdn
<default>.target.bbb <- t.msisdn <- a.msisdn <- c.msisdn <- <default>.source2.msisdn
<default>.target.bbb <- t.msisdn <- a.msisdn <- c.msisdn <- <default>.source4.msisdn

Expected behavior

  1. when sql is upper(), column lineage bbb is error
  2. column lineage aaa is error. should be is source3

Python version (available via python --version)

  • 3.10.13

SQLLineage version (available via sqllineage --version):

  • 1.4.9
@delphisharp delphisharp added the bug Something isn't working label Jan 4, 2024
@reata
Copy link
Owner

reata commented Jan 6, 2024

There're two issues we need to solve.

  1. when sql is upper(), column lineage bbb is error

Looks like we have some issue with upper case alias used together with UNION. A minimal example with same issue:

INSERT OVERWRITE TABLE TARGET
SELECT MSISDN, BUSI_ID
FROM SOURCE1
UNION ALL
SELECT  CONCAT(A.MSISDN,'20230826') MSISDN, A.MUSIC_BUSI_CODE
FROM SOURCE2 A

Change alias A to lower case a generate correct output.

  1. column lineage aaa is error. should be is source3

Right we we say we don't know if aaa is from subquery a or table source3. But actually we can be smarter, because a is subquery contains only one column named msisdn, which makes table source3 the only possibility. But this "smart logic" is not in our code yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants