Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

throw exception when the statement missing the semicolon as splitter #159

Closed
CuteLemon opened this issue Jun 23, 2021 · 3 comments
Closed
Assignees
Labels
enhancement New feature or request parser something that requires a strict/validating SQL parser

Comments

@CuteLemon
Copy link

I'm confused if the result is expected:


--MSSQL
-- saved as test.sql
select *
into t_0
from t0

--notice that the there is no  semicolon between the two statement
select *
into t_1
from 
t1

-- also no semicolon 

drop table if exists t2;

select *
into t2
from t_1
union ALL
select * 
from t_0

sqllineage -g -f test.sql

image

If I add a semicolon between the first two statement, the graph output is normal.

@reata
Copy link
Owner

reata commented Jun 25, 2021

Sorry for the late reply. This is both expected and unexpected.

It's expected because as you can see from Text View in UI (or toggle -v for verbose option in command line), this SQL is recognized as 2 statements SQL. The first one clearly is syntactically wrong. But because we're using sqlparse as parser, which according to its homepage, is a "non-validating SQL parser". In other words, we don't validate against the SQL syntax, we just try our best to parse it. This approach has its advantage and disadvantage. The disadvantage is obvious, we had this issue coming. On the other side, talking about the good part, personally I never used MSSQL before, so I'm amazed to see SQL like select * into t_0 from t0 is valid and sqllineage even supports it.

And it's unexpected because as a user, I'd like the library to throw me an error, or at least give me some warnings in case like this, as opposed to just showing this confusing graph, which leads me nowhere for debugging. Of course it's user's responsibility to get the syntax right, whilst it's library's responsibility to at least warn the user about this. Although I have no clue at this point how we're going to detect potential syntax error so as to warn user, let's leave this ticket open for future reference.

@reata reata added enhancement New feature or request parser something that requires a strict/validating SQL parser labels Mar 2, 2022
@reata reata self-assigned this Mar 11, 2023
@reata
Copy link
Owner

reata commented Mar 11, 2023

After releasing v1.4.0 next month, we shall be able to use --dialect=tsql in command line, or in python LineageRunner(sql, dialect="tsql") to leverage our new parsing capability. sqllineage will throw a InvalidSyntaxException for most syntax error.

However, this missing semicolon issue is not well handled yet, because it will be parsed as Batches of SQL Statements (https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/batches-of-sql-statements?view=sql-server-ver16)
by mistake. We need to contact upstream sqlfluff for fix.

@reata reata changed the title parser get the error output when the statement missing the semicolon throw exception when the statement missing the semicolon as splitter Aug 13, 2023
@reata
Copy link
Owner

reata commented Aug 13, 2023

We will close this issue as we already support throwing exception generally when semicolon is missing.

With tsql, I'm surprised to understand that semicolon is actually not a must-have. Currently after closing #422 , we trigger a SyntaxWarning, letting people know that SQLLineage is not guaranteed to output correct result with tsql when semicolon is missing. We'll try supporting that feature in #384 .

@reata reata closed this as completed Aug 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request parser something that requires a strict/validating SQL parser
Projects
None yet
Development

No branches or pull requests

2 participants