
[BUG]: Assessment.assess_workflows task fails with <class 'astroid.exceptions.AstroidSyntaxError> #3457

Open
Labels
bug: Something isn't working
upstream: these issues are caused by upstream dependencies
wontfix: This will not be worked on

Comments

@bwmann89

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The parse_logs task is failing, with the following error message repeated in the stderr output:
return`: <class 'astroid.exceptions.AstroidSyntaxError'>: Parsing Python code failed:
invalid syntax (<unknown>, line 68). Report this as an issue on UCX GitHub.

Expected Behavior

This task should complete successfully without any errors.

Steps To Reproduce

Databricks E2 environment running UCX version 0.52. Run the UCX [assessment] job in an environment with many managed tables that live in a separate Glue catalog.

Cloud

AWS

Operating System

Linux

Version

latest via Databricks CLI

Relevant log output

/Users/[email protected]/_git_branch/IDRC-3728/tmsis/scripts/common/python/snowflake_gzip_load:73 [system-error] Failed to parse code `def fn_snowflake_updt_etl_ld_cntl(state, submitting_state, tmsrunid, stage_table):
  
  """
  
  Insert a row into the etl load control table with the Snowflake load status for the passed stage table.    
  
  Parameters:
    state:  character stage code
    submitting_state:  string with numeric 2 position submitting state
    tmsrunid:  bigint tms_run_id
    stage_table:  string stage table name
    
  Returns:
    none  
    
  """ 

  etl_ld_cntl_delta = f'{parm_pdlDB}.etl_ld_cntl_delta'
  vw_etl_ld_cntl_delta = f'{parm_pdlDB}.vw_etl_ld_cntl_delta'
  
  etl_stream = fn_util_get_tmsis_sub_stream(stage_table)
  
  idrcLogger.info(f"Inserting a row into {etl_ld_cntl_delta} for the Snowflake stage table copy of table {stage_table}")
  
  insert_sql = (f"""insert into {etl_ld_cntl_delta} 
                    select                     
                       pdl_stg_database
                      ,pdl_stg_table
                      ,pdl_stg_row_cnt
                      ,pdl_stg_load_ts
                      ,'{sf_stg_schema}' as snowflake_stg_schema     
                      ,'{stage_table}' as snowflake_stg_table      
                      ,pdl_stg_row_cnt as snowflake_stg_row_cnt    
                      ,current_timestamp as snowflake_stg_load_ts    
                      ,snowflake_tgt_schema     
                      ,snowflake_tgt_table      
                      ,snowflake_tgt_row_cnt    
                      ,snowflake_tgt_load_ts    
                      ,teradata_stg_copy_ts     
                      ,fil_procd_cd
                      ,'Snowflake stage table load complete' as fil_procd_desc
                      ,'{state}' as state
                      ,'{submitting_state}' as submitting_state
                      ,{tmsrunid} as tms_run_id
                      ,'{etl_stream}' as etl_stream
                      ,current_timestamp as idr_insrt_ts
                    from {vw_etl_ld_cntl_delta}
                    where state = '{state}'
                      and tms_run_id = {tmsrunid}
                      and etl_stream = '{etl_stream}'
                      and pdl_stg_table = '{stage_table}'
                """)

  try:
      
    spark.sql(insert_sql)
                    
  except:
      
    idrcLogger.debug(" ")
    idrcLogger.info(f'An insert into table {etl_ld_cntl_delta} has failed, inserting a row for stage table {stage_table} with the Snowflake stage copy stats.')
    idrcLogger.info(f'Notebook: /IDRC/tmsis/scripts/common/python/snowflake_gzip_load')
    idrcLogger.info( 'Location: fn_snowflake_updt_etl_ld_cntl')
    idrcLogger.info(f"Insert SQL cmd:")
    idrcLogger.info(insert_sql)
    idrcLogger.debug(" ")
    idrcLogger.info( 'Please see the spark log.  Also you could refer to the Databricks job to get an url of the executed notebook.')
    %tb
    idrcLogger.flush()
    sys.exit(4)
  
  return`: <class 'astroid.exceptions.AstroidSyntaxError'>: Parsing Python code failed:
invalid syntax (<unknown>, line 68). Report this as an issue on UCX GitHub.
/Users/[email protected]/_git_branch/IDRC-3728/tmsis/scripts/claim/python/tmsis_claim_etl_wf:586 [dependency-cannot-compute-value] Can't check dependency from dbutils.notebook.run(f'{parm_clm_python_folder}/clm_val_src_vld_err', 0, args) because the expression cannot be computed
/Users/[email protected]/_git_branch/IDRC-3728/tmsis/scripts/claim/python/tmsis_claim_etl_wf:756 [dependency-cannot-compute-value] Can't check dependency from dbutils.notebook.run(f'{parm_clm_python_folder}/claim_etl_final_step', 0, trnsfrm_args) because the expression cannot be computed
/Users/[email protected]/_git_branch/IDRC-3728/tmsis/scripts/claim/python/tmsis_claim_etl_wf:818 [dependency-cannot-compute-value] Can't check dependency from dbutils.notebook.run(f'{parm_clm_python_folder}/clm_val_trns', 0, args) because the expression cannot be computed
18:46:24  INFO [d.l.blueprint.parallel][linting_workflows_1] linting workflows 222/222, rps: 0.017/sec
18:46:24  WARN [d.l.blueprint.parallel] Some 'linting workflows' tasks failed: 99% results available (219/222). Took 3:43:14.305104
18:46:24  INFO [d.l.u.source_code.jobs] Saving 1721 linting problems...
18:47:04  WARN [d.l.u.source_code.jobs] Errors occurred during linting:
[Errno 2] No such file or directory: '/tmp/ucx-2raqwa_m/snowflake/connector/vendored/requests/utils.py'
[Errno 2] No such file or directory: '/tmp/ucx-2raqwa_m/snowflake/connector/log_configuration.py'
[Errno 2] No such file or directory: '/tmp/ucx-2raqwa_m/snowflake/connector/cursor.py'
Thu Dec 19 18:47:20 2024 Connection to spark from PID  53177
Thu Dec 19 18:47:20 2024 Initialized gateway on port 45399
Thu Dec 19 18:47:20 2024 Connected to spark.
@bwmann89 bwmann89 added the bug and needs-triage labels Dec 19, 2024
@JCZuurmond JCZuurmond changed the title [BUG]: parse_logs task failing with return`: <class 'astroid.exceptions.AstroidSyntaxError'>: Parsing Python code failed: invalid syntax [BUG]: Assessment.assess_workflows task fails with <class 'astroid.exceptions.AstroidSyntaxError> Dec 20, 2024
@JCZuurmond
Member

@bwmann89 : Thanks for reporting. The parse_logs task finds logged failures in other tasks. Could you share the debug logs from the assess_workflows task? Specifically, I am looking for the stack trace that leads to this error.

@bwmann89
Author

bwmann89 commented Jan 2, 2025

Sure @JCZuurmond I've attached the cluster logs for stdout, stderr, and log4jactive, as well as the task output, which is under parse_logs_run_output.txt.
ucx_out_1219.zip

@JCZuurmond
Member

Hi @bwmann89, could you share the output of the assess_workflows task instead of the parse_logs task?

@bwmann89
Author

bwmann89 commented Jan 6, 2025

Hi @JCZuurmond, my apologies. I've just attached the assess_workflows task output. This is for UCX 0.53.1, but I encountered the same issue there.
assess_workflows.txt

@JCZuurmond
Member

Could you share the logs by downloading the file from the logs folder? These logs are truncated: *** WARNING: max output size exceeded, skipping output. ***

@bwmann89
Author

bwmann89 commented Jan 7, 2025

Yeah, just attached that.
assess_workflows.log

@JCZuurmond JCZuurmond self-assigned this Jan 8, 2025
@JCZuurmond JCZuurmond added this to UCX Jan 8, 2025
@JCZuurmond JCZuurmond moved this to Ready for Review in UCX Jan 8, 2025
@JCZuurmond JCZuurmond added the wontfix and upstream labels and removed the needs-triage label Jan 8, 2025
@JCZuurmond
Member

@bwmann89 : Thank you. The code being linted contains Jupyter magic, which is a syntax error in plain Python. Currently we only support valid Python code, excluding Jupyter syntax, hence I marked this issue as a "won't fix". See the linked PR for more details; we expect you to resolve this manually.
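For context, the failure can be reproduced outside UCX. Here is a minimal sketch (not from the thread) using the stdlib ast module: astroid delegates to Python's standard parser, so a Jupyter line magic such as the `%tb` in the notebook above is rejected with a SyntaxError, which astroid re-raises as AstroidSyntaxError.

```python
# Minimal sketch (not part of this thread): Jupyter line magics such as
# `%tb` are not valid Python, so the standard parser -- and therefore
# astroid, which wraps it -- rejects the source with a syntax error.
import ast

source = """
def handler():
    try:
        do_work()
    except Exception:
        %tb
    return
"""

try:
    ast.parse(source)
    print("parsed OK")
except SyntaxError as exc:
    # astroid surfaces this as astroid.exceptions.AstroidSyntaxError
    print(f"invalid syntax (line {exc.lineno})")
```

The manual fix is to remove the magic from the notebook source before linting; one plain-Python replacement (an assumption here, not a suggestion from the maintainers) would be `traceback.print_exc()`, after which the file parses normally.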
