Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't handle parenthesized from clause #278

Closed
mishbahr opened this issue Jul 28, 2022 · 7 comments · Fixed by #419
Closed

Can't handle parenthesized from clause #278

mishbahr opened this issue Jul 28, 2022 · 7 comments · Fixed by #419
Labels
bug Something isn't working parser something that requires a strict/validating SQL parser

Comments

@mishbahr
Copy link

Using sqllineage==1.3.5

Given auto generated query from a third party package.

query = """
    WITH tbl1 AS (
         SELECT table1.id AS user_id
           FROM db1.table1
        )
    SELECT tbl1.user_id,
    table1.foo,
    table2.bar,
    FROM ((tbl1
     LEFT JOIN db1.table2 USING (user_id))
     LEFT JOIN db1.table3 USING (user_id))
"""

Input

from sqllineage.runner import LineageRunner

[...]

result = LineageRunner(query)

for tbl in result.source_tables:
    print(tbl)

Output (incorrect)

db1.table1

However if I remove the extra brackets from the query:

query = """
    WITH tbl1 AS (
         SELECT table1.id AS user_id
           FROM db1.table1
        )
    SELECT tbl1.user_id,
    table1.foo,
    table2.bar,
    FROM tbl1
     LEFT JOIN db1.table2 USING (user_id)
     LEFT JOIN db1.table3 USING (user_id)
"""

Output:

db1.table1
db1.table2
db1.table3
@reata reata added bug Something isn't working parser something that requires a strict/validating SQL parser labels Aug 21, 2022
@reata
Copy link
Owner

reata commented Aug 21, 2022

Thanks for reporting it, issue acknowledged.

I already fixed a lot of edge cases regarding brackets, but this seems never ending. I need to rethink how should we handle this.

@cuong-pham
Copy link

cuong-pham commented Sep 1, 2022

I have another example where the parser just failed completely:

SELECT * FROM ( (SELECT * FROM A) A1 JOIN (SELECT * FROM B) B1 ON A1.id=B1.id )
sqllineage.exceptions.SQLLineageException: An Identifier is expected, got Token[value: ] instead.

If i removed the bracket

SELECT * FROM (SELECT * FROM A) A1 JOIN (SELECT * FROM B) B1 ON A1.id=B1.id
it works

EDIT: I think with the first query, there's a space between the 2 brackets, that caused the exception

@reata
Copy link
Owner

reata commented Sep 4, 2022

@cuong-pham Thanks for reporting this. Yes the space is one problem, which I can make it go away easily. However, the following SQL still fails to parse out any source tables.
SELECT * FROM ((SELECT * FROM A) A1 JOIN (SELECT * FROM B) B1 ON A1.id=B1.id)
This is the real problem. And indeed it's yet another bracket problem.

@JustinJanz
Copy link

adding one more example to the mix as this might be a frequent case - here i am only able to get the source and target table lineage and not the column lineage.

Query I was trying out is :

INSERT INTO db.schema.table1

( col1 , col2 , col3)
(SELECT column1 AS tab1.col1, tab1.column2 AS col2, tab3.col3
FROM
db2.schema2.table2 tab2 JOIN db2.schema2.table3 tab3 ON tab2.column1 = tab3.col1);

if I format the sql a bit like below by removing the brackets for the select statement i get the column lineage

INSERT INTO db.schema.table1
( col1 , col2 , col3)
SELECT column1 AS tab1.col1, tab1.column2 AS col2, tab3.col3
FROM
db2.schema2.table2 tab2 JOIN db2.schema2.table3 tab3 ON tab2.column1 = tab3.col1;

@reata
Copy link
Owner

reata commented Mar 11, 2023

@mishbahr @cuong-pham Can you let me know the SQL dialect you're using? I tried MySQL and it's valid syntax, not sure if you're using other database.

@JustinJanz Please refer to #244 for your case.

@cuong-pham
Copy link

@reata I was using Athena for that query.

@mishbahr
Copy link
Author

@reata I was using Postgres

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working parser something that requires a strict/validating SQL parser
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants