[SPARK-53917][CONNECT] Support large local relations - follow-ups #52973
@@ -6117,7 +6117,9 @@ object SQLConf {
      .doc("The chunk size in bytes when splitting ChunkedCachedLocalRelation.data " +
        "into batches. A new chunk is created when either " +
        "spark.sql.session.localRelationChunkSizeBytes " +
-        "or spark.sql.session.localRelationChunkSizeRows is reached.")
+        "or spark.sql.session.localRelationChunkSizeRows is reached. " +
+        "Limited by the spark.sql.session.localRelationBatchOfChunksSizeBytes, " +
+        "a minimum of the two confs is used to determine the chunk size.")
      .version("4.1.0")
      .longConf
      .checkValue(_ > 0, "The chunk size in bytes must be positive")
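The updated doc string says the effective chunk size is the minimum of spark.sql.session.localRelationChunkSizeBytes and spark.sql.session.localRelationBatchOfChunksSizeBytes. A minimal sketch of that rule, with a hypothetical helper that takes plain parameters rather than the real SQLConf accessors:

```scala
// Hedged sketch only: the actual Spark Connect code reads these values from
// SQLConf; here they are passed in directly to show the min() rule.
def effectiveChunkSizeBytes(
    localRelationChunkSizeBytes: Long,         // spark.sql.session.localRelationChunkSizeBytes
    localRelationBatchOfChunksSizeBytes: Long  // spark.sql.session.localRelationBatchOfChunksSizeBytes
  ): Long = {
  math.min(localRelationChunkSizeBytes, localRelationBatchOfChunksSizeBytes)
}
```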
@@ -6141,6 +6143,21 @@ object SQLConf {
      .bytesConf(ByteUnit.BYTE)
      .createWithDefaultString("3GB")

+  val LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES =
+    buildConf(SqlApiConfHelper.LOCAL_RELATION_BATCH_OF_CHUNKS_SIZE_BYTES_KEY)
+      .internal()
+      .doc("Limit on how much memory the client can use when uploading a local relation to the " +
+        "server. The client collects multiple local relation chunks into a single batch in " +
+        "memory until the limit is reached, then uploads the batch to the server. " +
+        "This helps reduce memory pressure on the client when dealing with very large local " +
+        "relations because the client does not have to materialize all chunks in memory. " +
+        "Limits the spark.sql.session.localRelationChunkSizeBytes, " +
+        "a minimum of the two confs is used to determine the chunk size.")
+      .version("4.1.0")
+      .longConf
+      .checkValue(_ > 0, "The batch size in bytes must be positive")
Contributor
We should check if this value is greater than the chunk size value as an initial step.

Contributor
In the case of conflicts, this conf should be respected and the operation should error out, as we wouldn't want to bypass an explicitly set max materialisation size (to avoid system failures).
+      .createWithDefault(1 * 1024 * 1024 * 1024L)
+
  val DECORRELATE_JOIN_PREDICATE_ENABLED =
    buildConf("spark.sql.optimizer.decorrelateJoinPredicate.enabled")
      .internal()
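The check the reviewers ask for above could be expressed as a fail-fast validation: if an explicitly set batch-of-chunks limit is smaller than the configured chunk size, error out instead of silently shrinking the chunks. This is a sketch under that assumption; the function name and error message are hypothetical and not part of the PR.

```scala
// Hypothetical fail-fast check suggested in the review comments: the batch
// limit bounds client-side materialization, so a conflicting chunk size is an
// error rather than something to silently override.
def validateLocalRelationConfs(
    chunkSizeBytes: Long,
    batchOfChunksSizeBytes: Long): Unit = {
  require(
    chunkSizeBytes <= batchOfChunksSizeBytes,
    s"spark.sql.session.localRelationChunkSizeBytes ($chunkSizeBytes) must not exceed " +
      s"spark.sql.session.localRelationBatchOfChunksSizeBytes ($batchOfChunksSizeBytes), " +
      "the maximum number of bytes the client may materialize in memory for a local relation.")
}
```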
We could update the name here to be a bit more explicit, in the sense that this pertains to the maximum number of bytes that we will materialise in memory for the specific local relation.

(Specific because multi-threading can result in multiple artifacts being materialised at once.)
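To make the behaviour described in the new conf's doc string concrete, here is an illustrative sketch of grouping chunks into byte-bounded batches on the client and uploading each batch as it fills. The Chunk type, the uploadBatch callback, and the iteration shape are assumptions for illustration only, not the actual Spark Connect upload path.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative only: accumulate chunks until adding the next one would exceed
// the batch-of-chunks byte budget, upload the batch, and start a new one, so
// the client holds roughly one batch of chunks in memory at a time.
final case class Chunk(bytes: Array[Byte])

def uploadInBatches(
    chunks: Iterator[Chunk],
    batchOfChunksSizeBytes: Long)(
    uploadBatch: Seq[Chunk] => Unit): Unit = {
  val current = ArrayBuffer.empty[Chunk]
  var currentBytes = 0L
  for (chunk <- chunks) {
    if (current.nonEmpty && currentBytes + chunk.bytes.length > batchOfChunksSizeBytes) {
      uploadBatch(current.toList)  // flush the full batch to the server
      current.clear()
      currentBytes = 0L
    }
    current += chunk
    currentBytes += chunk.bytes.length
  }
  if (current.nonEmpty) uploadBatch(current.toList)  // flush the final partial batch
}
```

In this sketch a single oversized chunk would still be uploaded on its own, which is why the doc strings above also cap the chunk size itself at the minimum of the two confs.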