Add batched iteration for `INSERT INTO` queries in `StatementExecutionBackend` with default `max_records_per_batch=1000` #237

nfx · 2023-09-20T13:19:49Z

By default, we execute inserts in batches of 1000 records. The batch size is tunable via `max_records_per_batch`.

Fixes #226
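The batching described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `batches`, `sql_literal`, and `insert_statements` are hypothetical, the value quoting is deliberately simplified, and only the `max_records_per_batch` parameter and its default of 1000 come from the description.

```python
from typing import Any, Iterator, Sequence


def batches(rows: Sequence[tuple], max_records_per_batch: int = 1000) -> Iterator[Sequence[tuple]]:
    """Yield consecutive slices of at most `max_records_per_batch` rows."""
    if max_records_per_batch < 1:
        raise ValueError("max_records_per_batch must be positive")
    for start in range(0, len(rows), max_records_per_batch):
        yield rows[start:start + max_records_per_batch]


def sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (simplified, illustration only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: doubling single quotes is enough for this sketch;
    # a real backend needs the escaping rules of its SQL dialect.
    escaped = str(value).replace("'", "''")
    return f"'{escaped}'"


def insert_statements(
    table: str,
    columns: Sequence[str],
    rows: Sequence[tuple],
    max_records_per_batch: int = 1000,
) -> Iterator[str]:
    """Yield one `INSERT INTO` statement per batch of rows."""
    column_list = ", ".join(columns)
    for batch in batches(rows, max_records_per_batch):
        values = ", ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} ({column_list}) VALUES {values}"
```

With the default setting, 2500 rows would produce three statements (1000, 1000, and 500 records); each statement would then be submitted to the warehouse separately, which keeps any single statement from growing without bound.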

* Added batched iteration for `INSERT INTO` queries in `StatementExecutionBackend` with default `max_records_per_batch=1000` ([#237](#237)).
* Added crawler for mount points ([#209](#209)).
* Added crawlers for compatibility of jobs and clusters, along with basic recommendations for external locations ([#244](#244)).
* Added safe return on grants ([#246](#246)).
* Added ability to specify empty group filter in the installer script ([#216](#216)) ([#217](#217)).
* Added ability to install application by multiple different users on the same workspace ([#235](#235)).
* Added dashboard creation on installation and a requirement for `warehouse_id` in config, so that the assessment dashboards are refreshed automatically after job runs ([#214](#214)).
* Added reliance on rate limiting from Databricks SDK for listing workspace ([#258](#258)).
* Fixed errors in corner cases where Azure Service Principal Credentials were not available in Spark context ([#254](#254)).
* Fixed `DESCRIBE TABLE` throwing errors when listing Legacy Table ACLs ([#238](#238)).
* Fixed `file already exists` error in the installer script ([#219](#219)) ([#222](#222)).
* Fixed `guess_external_locations` failure with `AttributeError: as_dict` and added an integration test ([#259](#259)).
* Fixed error handling edge cases in `crawl_tables` task ([#243](#243)) ([#251](#251)).
* Fixed `crawl_permissions` task failure on folder names containing a forward slash ([#234](#234)).
* Improved `README` notebook documentation ([#260](#260), [#228](#228), [#252](#252), [#223](#223), [#225](#225)).
* Removed redundant `.python-version` file ([#221](#221)).
* Removed discovery of account groups from `crawl_permissions` task ([#240](#240)).
* Updated databricks-sdk requirement from ~=0.8.0 to ~=0.9.0 ([#245](#245)).

Add batched iteration for StatementExecutionBackend

86bea9c

nfx requested a review from larsgeorge-db as a code owner September 20, 2023 13:19

nfx changed the title ~~Add batched iteration for StatementExecutionBackend~~ Add batched iteration for `INSERT INTO` queries in `StatementExecutionBackend` with default `max_records_per_batch=1000` Sep 20, 2023

nfx merged commit 9ef7ffe into main Sep 20, 2023

nfx deleted the fix/226 branch September 20, 2023 13:21

nfx mentioned this pull request Sep 21, 2023

Release v0.1.1 #261

Merged
