Add batched iteration for INSERT INTO queries in StatementExecutionBackend with default max_records_per_batch=1000 #237

Merged
nfx merged 1 commit into main from fix/226 on Sep 20, 2023

@nfx (Collaborator) commented Sep 20, 2023

By default, inserts are executed in batches of 1000 records. The batch size is tunable via `max_records_per_batch`.

Fixes #226
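
To illustrate the idea, here is a minimal sketch of batched `INSERT INTO` generation. It is not the code from this PR: the helper names `batched_rows`, `sql_literal`, and `insert_statements`, as well as the table name `demo.items`, are hypothetical, and the literal quoting is deliberately simplistic. Only the `max_records_per_batch=1000` default comes from the description above.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence


def batched_rows(rows: Iterable[Any], max_records_per_batch: int = 1000) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most max_records_per_batch rows."""
    if max_records_per_batch < 1:
        raise ValueError("max_records_per_batch must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, max_records_per_batch))
        if not batch:
            return
        yield batch


def sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (illustrative only, not injection-safe)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int, since bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: doubling single quotes is sufficient for this sketch.
    escaped = str(value).replace("'", "''")
    return f"'{escaped}'"


def insert_statements(
    table: str,
    columns: Sequence[str],
    rows: Iterable[Sequence[Any]],
    max_records_per_batch: int = 1000,
) -> Iterator[str]:
    """Yield one INSERT INTO statement per batch instead of one giant statement."""
    column_list = ", ".join(columns)
    for batch in batched_rows(rows, max_records_per_batch):
        values = ", ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} ({column_list}) VALUES {values}"


if __name__ == "__main__":
    rows = [(i, f"name{i}") for i in range(2500)]
    for statement in insert_statements("demo.items", ["id", "name"], rows):
        print(statement[:60], "...")
```

With 2500 rows and the default batch size, this yields three statements covering 1000, 1000, and 500 rows. A smaller `max_records_per_batch` trades more round trips for smaller statements.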

@nfx requested a review from larsgeorge-db as a code owner September 20, 2023 13:19
@nfx changed the title from "Add batched iteration for StatementExecutionBackend" to "Add batched iteration for INSERT INTO queries in StatementExecutionBackend with default max_records_per_batch=1000" Sep 20, 2023
@nfx merged commit 9ef7ffe into main Sep 20, 2023
@nfx deleted the fix/226 branch September 20, 2023 13:21
FastLee pushed a commit that referenced this pull request Sep 20, 2023
…nBackend` with default `max_records_per_batch=1000` (#237)

@nfx mentioned this pull request Sep 21, 2023
nfx added a commit that referenced this pull request Sep 21, 2023
* Added batched iteration for `INSERT INTO` queries in
`StatementExecutionBackend` with default `max_records_per_batch=1000`
([#237](#237)).
* Added crawler for mount points
([#209](#209)).
* Added crawlers for compatibility of jobs and clusters, along with
basic recommendations for external locations
([#244](#244)).
* Added safe return on grants
([#246](#246)).
* Added ability to specify empty group filter in the installer script
([#216](#216))
([#217](#217)).
* Added ability to install application by multiple different users on
the same workspace ([#235](#235)).
* Added dashboard creation on installation and a requirement for
`warehouse_id` in config, so that the assessment dashboards are
refreshed automatically after job runs
([#214](#214)).
* Added reliance on rate limiting from Databricks SDK for listing
workspace ([#258](#258)).
* Fixed errors in corner cases where Azure Service Principal Credentials
were not available in Spark context
([#254](#254)).
* Fixed `DESCRIBE TABLE` throwing errors when listing Legacy Table ACLs
([#238](#238)).
* Fixed `file already exists` error in the installer script
([#219](#219))
([#222](#222)).
* Fixed `guess_external_locations` failure with `AttributeError:
as_dict` and added an integration test
([#259](#259)).
* Fixed error handling edge cases in `crawl_tables` task
([#243](#243))
([#251](#251)).
* Fixed `crawl_permissions` task failure on folder names containing a
forward slash ([#234](#234)).
* Improved `README` notebook documentation
([#260](#260),
[#228](#228),
[#252](#252),
[#223](#223),
[#225](#225)).
* Removed redundant `.python-version` file
([#221](#221)).
* Removed discovery of account groups from `crawl_permissions` task
([#240](#240)).
* Updated databricks-sdk requirement from ~=0.8.0 to ~=0.9.0
([#245](#245)).
larsgeorge-db pushed a commit that referenced this pull request Sep 23, 2023
Development

Successfully merging this pull request may close these issues.

Make CrawlerBase more resilient