
SparkSQL: reserved keyword ORDER as identifier name #566

Open

xiao321 opened this issue Jan 24, 2024 · 1 comment
Labels
bug Something isn't working parser something that requires a strict/validating SQL parser

Comments

@xiao321

xiao321 commented Jan 24, 2024

SparkSQL: a complicated WITH (CTE) query fails to parse.

Command:

sqllineage -f ./sql.sql --dialect=sparksql

SQL (./sql.sql):

WITH fenqun AS
         (  SELECT g,
                   user_id,
                   dt
            FROM (
                     SELECT g,
                            user_id,
                            dt,
                            collect_set(TYPE) AS TYPE
                     FROM (

                              SELECT
                                  user_id,
                                  dt
                              FROM dwd_ddcx.dwd_trf_ubr_active_user_detail_di m1

                              UNION ALL
                              SELECT
                                  user_id,
                                  dt
                              FROM dwd_ddcx.dwd_trf_ubr_active_user_detail_di m1

                          )m1
                     GROUP BY g,
                              user_id,
                              dt
                 )m2
         ),
     ORDER AS
         ( SELECT dt,driver_user_id as user_id
           FROM dwd_ddcx.dwd_trd_def_carpool_order_detail_di
         ),
     goods_payment AS
         ( SELECT to_date(pay_time) AS dt,
                  gp.user_id
           FROM dwd_ddcx.dwd_mkt_act_goods_order_di gp
 ),
     hexiao AS
         ( SELECT to_date(o.give_driver_remit_time) AS dt,
                  gp.user_id,
                  sum(o.cash_coupon_amt) AS coupon_credit
           FROM
               (SELECT m1.*,
                       coupon_id
                FROM dwd_ddcx.dwd_mkt_act_goods_order_di m1
                         INNER JOIN dim_ddcx.dim_mkt_act_goods_order_coupond_rel_di m2 ON m1.order_code=m2.order_code

               )gp
                   INNER JOIN dwd_ddcx.dwd_trd_def_carpool_order_detail_di o ON gp.coupon_id=o.driver_coupon_id
   ),
     FINAL AS
         (SELECT
              m1.type,
              m1.dt,
              g,
              count(DISTINCT m1.user_id) AS `人数`

          FROM fenqun m1
                   LEFT JOIN goods_payment m2 ON m1.user_id=m2.user_id
              AND m1.dt=m2.dt
                   LEFT JOIN
               ORDER m3 ON m1.user_id=m3.user_id
                   AND m1.dt=m3.dt
                   LEFT JOIN hexiao m4 ON m1.user_id=m4.user_id
              AND m1.dt=m4.dt
          GROUP BY m1.type,m1.dt,
                   g),
     FINAL1 AS
         (SELECT *
          FROM FINAL

         )
SELECT type,dt,
       g
FROM
    (select type,dt,g

     FROM
         (SELECT type,dt,g
          from
              (select
                   m1.type,m1.dt,m1.g
               FROM FINAL m1
                        LEFT JOIN FINAL1 m2 ON m1.dt=m2.dt and m1.type=m2.type)a1 )a2 )a3

order by dt desc,g;

Line 1, Position 1: Found unparsable section: 'WITH fenqun AS\n ( SELECT g,\n ...'

Environment:
Python 3.8
sqllineage 1.5.0

@xiao321 xiao321 added the bug Something isn't working label Jan 24, 2024
@reata
Owner

reata commented Jan 29, 2024

The problem is ORDER being used as an identifier:

ORDER AS
         ( SELECT dt,driver_user_id as user_id
           FROM dwd_ddcx.dwd_trd_def_carpool_order_detail_di
         )

Changing ORDER to another regular identifier fixes the problem.
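
For example, a minimal sketch of the workaround (order_cte is an illustrative name, not from the original query):

```sql
-- Rename the CTE so no reserved keyword is used as an identifier
WITH order_cte AS
    ( SELECT dt, driver_user_id AS user_id
      FROM dwd_ddcx.dwd_trd_def_carpool_order_detail_di
    )
SELECT dt, user_id
FROM order_cte;
```

Every reference to the CTE (e.g. `LEFT JOIN ORDER m3 ON ...`) would need the same rename.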

More background:

In almost every other SQL engine, ORDER is a reserved keyword that cannot be used as an identifier name. Spark is different in that, by default, it does not distinguish reserved from non-reserved keywords unless the user sets spark.sql.ansi.enabled=true and spark.sql.ansi.enforceReservedKeywords=true.
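
The difference can be seen in a spark-sql session (a sketch; the two flags are real Spark configs, the query is illustrative):

```sql
-- By default, Spark accepts ORDER as an alias/identifier:
SELECT 1 AS order;

-- With ANSI mode and reserved-keyword enforcement enabled,
-- the same statement is rejected with a parse error:
SET spark.sql.ansi.enabled = true;
SET spark.sql.ansi.enforceReservedKeywords = true;
SELECT 1 AS order;  -- now fails: ORDER is a reserved keyword
```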

Reference:

Unfortunately, the parser we're using (sqlfluff) assumes every dialect distinguishes reserved from non-reserved keywords, and its Spark parser is written this way to be ANSI-compatible. I'm afraid this is not an issue that can be easily fixed. See the upstream issue for reference:

@reata reata changed the title SparkSQL complicated 'with' sql not support parse SparkSQL: ORDER as identifier name Jan 29, 2024
@reata reata changed the title SparkSQL: ORDER as identifier name SparkSQL: reserved keyword ORDER as identifier name Jan 29, 2024
@reata reata added the parser something that requires a strict/validating SQL parser label Feb 4, 2024