Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support json_tuple in SELECT clause in Hive #483

Closed
Light-Towers opened this issue Nov 30, 2023 · 6 comments
Closed

Support json_tuple in SELECT clause in Hive #483

Light-Towers opened this issue Nov 30, 2023 · 6 comments
Labels
enhancement New feature or request parser something that requires a strict/validating SQL parser

Comments

@Light-Towers
Copy link

Describe the bug

  • No dialect parameters are normal. Use Hive dialect will error.

SQL
Paste the SQL text here. For example:

INSERT OVERWRITE table dw.dim_dict partition (dt)
select get_json_object(replace(t.id, '$', ''), '$.oid') as id,
       t.industry_code,
       t.industry,
       t.sec_industry_code,
       t.sec_industry,
       t.type,
       t.version,
       current_date                                     as dt
from (select json_tuple(line, '_id', 'industryCode', 'industry', 'secIndustryCode', 'secIndustry', 'type', 'version')
                 as (id, industry_code, industry, sec_industry_code, sec_industry, type, version)
      from default.tmp_json) t;

To Reproduce

# -*- coding: utf-8 -*-
from sqllineage.runner import LineageRunner

def test_create_as():
    sql = """
INSERT OVERWRITE table dw.dim_dict partition (dt)
select get_json_object(replace(t.id, '$', ''), '$.oid') as id,
       t.industry_code,
       t.industry,
       t.sec_industry_code,
       t.sec_industry,
       t.type,
       t.version,
       current_date                                     as dt
from (select json_tuple(line, '_id', 'industryCode', 'industry', 'secIndustryCode', 'secIndustry', 'type', 'version')
                 as (id, industry_code, industry, sec_industry_code, sec_industry, type, version)
      from default.tmp_json) t;
    """
    # result = LineageRunner(sql)
    result = LineageRunner(sql, 'hive')
    result.print_column_lineage()
    print(result.source_tables)
    print(result.target_tables)

if __name__ == "__main__":
    test_create_as()
Line 11, Position 18: Found unparsable section: 'as (id, industry_code, industry, sec_ind...'

Python version (available via python --version)

  • 3.10.11

SQLLineage version (available via sqllineage --version):

  • 1.4.8
@Light-Towers Light-Towers added the bug Something isn't working label Nov 30, 2023
@reata
Copy link
Owner

reata commented Dec 3, 2023

Can you confirm this SQL is executable and share the Hive version you're using?

Based on Hive Doc, JSON_TUPLE must be used together with LATERAL VIEW.

On the other hand, SparkSQL Doc does explicitly say they support JSON_TUPLE used in SELECT clause. And this SQL is parsable if you specify dialect=sparksql.

@reata reata added the question Further information is requested label Dec 3, 2023
@Light-Towers
Copy link
Author

Thank you for your reply !

My hive version: 3.1.3
I use JSON_TUPLE without LATERAL VIEW successful 😂 , Maybe I should use official recommended grammar.
image

@reata
Copy link
Owner

reata commented Dec 4, 2023

Thanks for the info. Since it's executable, I'll follow up with sqlfluff community to get this syntax supported in Hive dialect.

@reata reata changed the title Hive syntax optimize Support json_tuple in SELECT clause in Hive Dec 4, 2023
@reata reata added enhancement New feature or request and removed bug Something isn't working question Further information is requested labels Dec 4, 2023
@reata
Copy link
Owner

reata commented Dec 24, 2023

This is the same issue as sqlfluff/sqlfluff#3160, I'll try create a PR for that.

@reata
Copy link
Owner

reata commented Dec 28, 2023

Upstream PR sqlfluff/sqlfluff#5495 is merged. We will wait for next release of sqlfluff, and upgrade the dependency version, this syntax will be supported then.

@reata reata added the parser something that requires a strict/validating SQL parser label Feb 4, 2024
@reata
Copy link
Owner

reata commented Apr 7, 2024

Close as complete of #593

@reata reata closed this as completed Apr 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request parser something that requires a strict/validating SQL parser
Projects
None yet
Development

No branches or pull requests

2 participants