Too large queries produce MaxRetryError #413

@rth

Description

Previously, when a query that was too big was made (#383), we got 0 rows as output (as discussed in that issue). With the changes in #405, it now produces a MaxRetryError for me, which is better, but the error message is misleading (and retrying so many times is slow).

The minimal code I'm using is:

    import os

    from databricks import sql as databricks_sql

    db = databricks_sql.connect(
        server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
        http_path=os.getenv("DATABRICKS_HTTP_PATH"),
        access_token=os.getenv("DATABRICKS_TOKEN"),
        _tls_no_verify=True
    )
    cursor = db.cursor()
    cursor.execute("<my-query>")
    data = cursor.fetchall()

If the query is small, it works with no warnings.

If the query is too big, it produces the following MaxRetryError with a nested SSLError. Is there no way to detect a too-big query from the HTTP response status without retrying N times and reaching MaxRetryError? I also have the impression that _tls_no_verify is not passed through somewhere in this case, which produces those SSLErrors. cc @kravets-levko

urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  File "python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  File "python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  [Previous line repeated 2 more times]
  File "site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='xxxx.blob.core.windows.net', port=443): Max retries exceeded with url: /jobs/999999/sql/2024-07-11/14/results_2024-07-11T14:44:24Z_ef4c56f4-6fc6-43ca-b8dd-009eeb472cd4?sig=xxxx&se=2024-07-11T14%3A59%3A26Z&sv=2019-02-02&spr=https&sp=r&sr=b (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "fetch-databricks.py", line 13, in main
    cursor.execute("select * from sgg.site_cbs_coatify_view")
  File "databricks/sql/client.py", line 768, in execute
    execute_response = self.thrift_backend.execute_command(
  File "databricks/sql/thrift_backend.py", line 869, in execute_command
    return self._handle_execute_response(resp, cursor)
  File "databricks/sql/thrift_backend.py", line 966, in _handle_execute_response
    return self._results_message_to_execute_response(resp, final_operation_state)
  File "databricks/sql/thrift_backend.py", line 770, in _results_message_to_execute_response
    arrow_queue_opt = ResultSetQueueFactory.build_queue(
  File "databricks/sql/utils.py", line 84, in build_queue
    return CloudFetchQueue(
  File "databricks/sql/utils.py", line 175, in __init__
    self.table = self._create_next_table()
  File "databricks/sql/utils.py", line 238, in _create_next_table
    downloaded_file = self.download_manager.get_next_downloaded_file(
  File "databricks/sql/cloudfetch/download_manager.py", line 68, in get_next_downloaded_file
    file = task.result()
  File "python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "databricks/sql/cloudfetch/downloader.py", line 95, in run
    response = session.get(
  File "python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "python3.10/site-packages/requests/adapters.py", line 698, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='xxxxxx.blob.core.windows.net', port=443): Max retries exceeded with url: /jobs/123445/sql/2024-07-11/14/results_2024-07-11T14:44:24Z_ef4c56f4-6fc6-43ca-b8dd-009eeb472cd4?sig=xxxxxxxse=2024-07-11T14%3A59%3A26Z&sv=2019-02-02&spr=https&sp=r&sr=b (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))
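
The traceback above is a chain of wrapped exceptions, and the certificate failure only shows up at the bottom of it. As a caller-side workaround, a minimal sketch like the following could walk the chain and report whether the real problem is certificate verification. The helper names (`iter_chain`, `root_cause`, `is_cert_verification_failure`) are hypothetical and not part of databricks-sql-python; the sketch relies only on Python's standard `__cause__` / `__context__` attributes.

```python
import ssl
from typing import Iterator, Optional


def iter_chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc, then every exception it was raised from or while handling."""
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        yield current
        # Prefer the explicit cause ("raise ... from ..."); fall back to the
        # implicit context ("During handling of the above exception ...").
        if current.__cause__ is not None:
            current = current.__cause__
        else:
            current = current.__context__


def root_cause(exc: BaseException) -> BaseException:
    """Return the innermost exception in the chain."""
    return list(iter_chain(exc))[-1]


def is_cert_verification_failure(exc: BaseException) -> bool:
    """True if any link in the chain looks like a TLS certificate failure."""
    for link in iter_chain(exc):
        if isinstance(link, ssl.SSLCertVerificationError):
            return True
        # Wrapping libraries may only keep the message text, so check that too.
        if "CERTIFICATE_VERIFY_FAILED" in str(link):
            return True
    return False
```

Wrapping `cursor.execute(...)` in a `try` / `except Exception as e` and calling `is_cert_verification_failure(e)` would at least let the caller report a certificate problem instead of a generic "Max retries exceeded". It does not avoid the retries themselves.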

Versions

requests                      2.32.3
urllib3                       2.2.2
databricks-sql-python  main
