[SPARK-26700][CORE] enable fetch-big-block-to-disk by default #23625
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -626,19 +626,6 @@ Apart from these, the following properties are also available, and may be useful | |
| You can mitigate this issue by setting it to a lower value. | ||
| </td> | ||
| </tr> | ||
| <tr> | ||
| <td><code>spark.maxRemoteBlockSizeFetchToMem</code></td> | ||
| <td>Int.MaxValue - 512</td> | ||
|
Contributor
Just to clarify: you intentionally moved this from the shuffle section to the network section because it affects both shuffle fetches and block manager fetches?
Contributor
Author
Yes. |
||
| <td> | ||
| The remote block will be fetched to disk when size of the block is above this threshold in bytes. | ||
| This is to avoid a giant request that takes too much memory. By default, this is only enabled | ||
| for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are | ||
| available. But it can be turned down to a much lower value (e.g. 200m) to avoid using too much | ||
| memory on smaller blocks as well. Note this configuration will affect both shuffle fetch | ||
| and block manager remote block fetch. For users who enabled external shuffle service, | ||
| this feature can only be used when external shuffle service is newer than Spark 2.2. | ||
| </td> | ||
| </tr> | ||
| <tr> | ||
| <td><code>spark.shuffle.compress</code></td> | ||
| <td>true</td> | ||
|
|
@@ -1519,6 +1506,17 @@ Apart from these, the following properties are also available, and may be useful | |
| you can set larger value. | ||
| </td> | ||
| </tr> | ||
| <tr> | ||
| <td><code>spark.maxRemoteBlockSizeFetchToMem</code></td> | ||
|
srowen marked this conversation as resolved.
|
||
| <td>200m</td> | ||
| <td> | ||
| The remote block will be fetched to disk when the size of the block is above this threshold | ||
| in bytes. This is to avoid a giant request that takes too much memory. Note that this | ||
| configuration affects both shuffle fetch and block manager remote block fetch. | ||
| For users who have enabled the external shuffle service, this feature only works when | ||
| the external shuffle service is at least version 2.3.0. | ||
| </td> | ||
| </tr> | ||
| </table> | ||
|
|
||
| ### Scheduling | ||
|
|
||
nit.
less than or equal to?
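
For readers following the `spark.maxRemoteBlockSizeFetchToMem` change in the diff above, here is a minimal sketch of what moving the default from `Int.MaxValue - 512` to `200m` means for a single block. It is not Spark's implementation: `parse_size` and `fetch_destination` are hypothetical helpers written for illustration. The strict "greater than" comparison is an assumption taken from the phrase "above this threshold" in the documentation text. Size suffixes are treated as binary units (`1m` = 1 MiB), as Spark's configuration documentation describes them.

```python
import re

# Hypothetical helpers for illustration only; not Spark's actual code.
# Binary multipliers: Spark documents 1k = 1024 bytes, 1m = 1024 k, and so on.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

OLD_DEFAULT = 2**31 - 1 - 512  # Int.MaxValue - 512, the default before this PR
NEW_DEFAULT = "200m"           # the default introduced by this PR


def parse_size(value):
    """Convert a size such as '200m', '200mb', or a plain int into bytes."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"(\d+)\s*([kmgt]?)b?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def fetch_destination(block_size_bytes, threshold=NEW_DEFAULT):
    """Return 'disk' if the block is strictly above the threshold, else 'memory'.

    Assumption: 'above this threshold' means a strict comparison.
    """
    return "disk" if block_size_bytes > parse_size(threshold) else "memory"


if __name__ == "__main__":
    block = 300 * 1024**2  # a 300 MiB remote block
    print(fetch_destination(block, OLD_DEFAULT))  # old default
    print(fetch_destination(block, NEW_DEFAULT))  # new default
```

Under these assumptions, a 300 MiB block stays in memory with the old default and goes to disk with the new one. In a real job the threshold would be set through normal Spark configuration, for example `--conf spark.maxRemoteBlockSizeFetchToMem=200m` on `spark-submit`.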