Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AttributeError raised using parenthesized where clause #426

Closed
noxiaofan opened this issue Aug 9, 2023 · 4 comments
Closed

AttributeError raised using parenthesized where clause #426

noxiaofan opened this issue Aug 9, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@noxiaofan
Copy link

When parsing same sql, LineageRunner failed while command line could give right result.
Here is sql I used: (dialect: mysql)

insert into LM_risk_engine_request_increase
select sls.*
from(
    select id,
    user_id,
    sign,
    type,
    request_node,
    status,
    ctime,
    mtime
    from `in-loan-market-db`.market_risk_engine_request
    where (ctime >= '{sql_para_1}' and ctime < '{sql_para_2}')
)sls
left join(
    select id,status
    from LM_risk_engine_request
    where (ctime >= '{sql_para_1}' and ctime < '{sql_para_2}')
)egn on sls.id=egn.id
where egn.id is null or egn.status='0'
;

Result of command line:

Statements(#): 1
Source Tables:
    <default>.lm_risk_engine_request
    in-loan-market-db.market_risk_engine_request
Target Tables:
    <default>.lm_risk_engine_request_increase

Python script:

from sqllineage.runner import LineageRunner

sql_file = '/path/to/sql/test.sql'
with open(sql_file, 'r', ) as f:
    sql = f.read()
    result = LineageRunner(sql, dialect='mysql')
    print(result)

Result of python api:

Traceback (most recent call last):
  File "/path/to/script/script.py", line 7, in <module>
    print(result)
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/runner.py", line 22, in wrapper
    self._eval()
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/runner.py", line 177, in _eval
    self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/runner.py", line 177, in <listcomp>
    self._stmt_holders = [analyzer.analyze(stmt) for stmt in self._stmt]
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/analyzer.py", line 63, in analyze
    lineage_holder = extractor.extract(
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/dml_insert_extractor.py", line 90, in extract
    self._extract_select(holder, segment)
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/dml_insert_extractor.py", line 120, in _extract_select
    holder |= DmlSelectExtractor(self.dialect).extract(
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/dml_select_extractor.py", line 71, in extract
    holder |= self.extract(sq.query, AnalyzerContext(sq, holder.cte))
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/dml_select_extractor.py", line 54, in extract
    for sq in self.parse_subquery(segment):
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/lineage_holder_extractor.py", line 71, in parse_subquery
    result = cls._parse_subquery(get_subqueries(segment))
  File "/home/BI_anaconda/anaconda3/envs/BI_python3.9.7_202209/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/utils.py", line 85, in get_subqueries
    expression_segments = segment.get_child("expression").segments or []
AttributeError: 'NoneType' object has no attribute 'segments'

And I'm using sqllineage v1.4.6, both python 3.9 and python 3.10 got same result.

@noxiaofan
Copy link
Author

Strangely, if dialect was not assigned, python api could print result correctly except for a DeprecationWarning:
Python script:

from sqllineage.runner import LineageRunner

sql_file = '/path/to/sql/test.sql'
with open(sql_file, 'r', ) as f:
    sql = f.read()
    result = LineageRunner(sql, dialect='mysql')
    print(result)

Output:
DeprecationWarning: dialect non-validating is deprecated, use ansi or dialect of your SQL instead. non-validating will stop being the default dialect in v1.5.x release and be completely removed in v1.6.x
result = LineageRunner(sql)
Statements(#): 1
Source Tables:
.lm_risk_engine_request
in-loan-market-db.market_risk_engine_request
Target Tables:
.lm_risk_engine_request_increase

@reata
Copy link
Owner

reata commented Aug 12, 2023

Did you specify mysql as dialect in command line?

sqllineage -f test.sql --dialect=mysql

I believe this generate the exact same exception as python api. Can you kindly confirm?

@reata reata added the question Further information is requested label Aug 12, 2023
@noxiaofan
Copy link
Author

Yes, command line generated same error as python api if specify mysql as dialect.
So it turns out to be a problem of dialect? But this sql has been validated and I believe there is no syntax error.

@reata
Copy link
Owner

reata commented Aug 14, 2023

The problem is with parenthesized where clause. After removing the parenthesis, it's working fine with following SQL using mysql dialect:

insert into LM_risk_engine_request_increase
select sls.*
from(
    select id,
    user_id,
    sign,
    type,
    request_node,
    status,
    ctime,
    mtime
    from `in-loan-market-db`.market_risk_engine_request
    where ctime >= '{sql_para_1}' and ctime < '{sql_para_2}'
)sls
left join(
    select id,status
    from LM_risk_engine_request
    where ctime >= '{sql_para_1}' and ctime < '{sql_para_2}'
)egn on sls.id=egn.id
where egn.id is null or egn.status='0'
;

It's not limited to mysql only. All sqlfluff implementation has the same issue. A super simplified test case with ansi dialect also trigger the same exception:

SELECT * FROM dual
WHERE (1=1);
sqllineage -f test.sql --dialect=ansi

We need to get this fixed.

@reata reata added bug Something isn't working and removed question Further information is requested labels Aug 14, 2023
@reata reata changed the title AttributeError raised using api while successfully parsed using command line AttributeError raised using parenthesized where clause Aug 14, 2023
@reata reata closed this as completed Aug 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants