@@ -429,7 +429,11 @@ package object config {
"external shuffle service, this feature can only be worked when external shuffle" +
"service is newer than Spark 2.2.")
.bytesConf(ByteUnit.BYTE)
- .createWithDefault(Long.MaxValue)
Contributor:
I think the original purpose of setting this to Long.MaxValue was to avoid using this configuration by default; users should set a proper size to enable this feature. But anyway, I think the current change is also fine.

+ // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
+ // as well use fetch-to-disk in that case. The message includes some metadata in addition
+ // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
+ // extra room.
+ .createWithDefault(Int.MaxValue - 512)

private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")
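
A quick sketch of the arithmetic behind the new default, following the comment in the added lines above (the constant names here are illustrative, not from the patch):

```scala
// A fetch-to-memory message is framed in a single buffer whose length is an
// Int, so it is capped near 2 GB (Int.MaxValue bytes); larger fetches fail.
val maxFrameBytes: Long = Int.MaxValue.toLong     // 2147483647, the ~2 GB ceiling
// The message wraps metadata around the block bytes (UploadBlock in
// particular), so the default leaves a little headroom below the ceiling.
val metadataHeadroom: Long = 512L                 // assumed slack for metadata
val defaultThreshold: Long = maxFrameBytes - metadataHeadroom
assert(defaultThreshold == 2147483135L)           // Int.MaxValue - 512
```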
docs/configuration.md — 10 changes: 6 additions & 4 deletions
@@ -580,13 +580,15 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
- <td>Long.MaxValue</td>
+ <td>Int.MaxValue - 512</td>
<td>
The remote block will be fetched to disk when size of the block is above this threshold in bytes.
- This is to avoid a giant request takes too much memory. We can enable this config by setting
- a specific value(e.g. 200m). Note this configuration will affect both shuffle fetch
+ This is to avoid a giant request that takes too much memory. By default, this is only enabled
+ for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are
+ available. But it can be turned down to a much lower value (eg. 200m) to avoid using too much
+ memory on smaller blocks as well. Note this configuration will affect both shuffle fetch
and block manager remote block fetch. For users who enabled external shuffle service,
- this feature can only be worked when external shuffle service is newer than Spark 2.2.
+ this feature can only be used when external shuffle service is newer than Spark 2.2.
</td>
</tr>
<tr>
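
To make the fetch-to-disk path kick in for smaller blocks, as the updated doc text suggests, a user can set the threshold explicitly. A minimal sketch, assuming a plain SparkConf setup (the app name and the 200m value are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Fetch any remote block larger than 200 MB to disk instead of memory,
// rather than relying on the ~2 GB (Int.MaxValue - 512) default.
val conf = new SparkConf()
  .setAppName("fetch-to-disk-example")               // illustrative name
  .set("spark.maxRemoteBlockSizeFetchToMem", "200m") // size suffixes are accepted
val sc = new SparkContext(conf)
```

As the doc note says, with an external shuffle service this only takes effect when the service is newer than Spark 2.2.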