@@ -419,7 +419,7 @@ package object config {

private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
ConfigBuilder("spark.shuffle.file.buffer")
-.doc("Size of the in-memory buffer for each shuffle file output stream. " +
+.doc("Size (in KiB) of the in-memory buffer for each shuffle file output stream. " +
Member

Really, "in KiB unless otherwise specified"?

Same for the next property below. These two are the only two that aren't in bytes by default, and have a description already. It would be handy to add a blurb about this to all of the "MiB" default properties above this too, for consistency.

"These buffers reduce the number of disk seeks and system calls made " +
"in creating intermediate shuffle files.")
.bytesConf(ByteUnit.KiB)
8 changes: 7 additions & 1 deletion docs/configuration.md
@@ -58,6 +58,8 @@ The following format is accepted:
1t or 1tb (tebibytes = 1024 gibibytes)
1p or 1pb (pebibytes = 1024 tebibytes)

+Without specification the unit depends on the configuration entry where KiB are typically assumed.
Member

Just looking at the properties that use bytesConf(), there are as many in MiB as in KiB. And really, the default is just bytes unless otherwise specified. If you say anything here, maybe just:

"While numbers without units are generally interpreted as bytes, a few are interpreted as KiB or MiB when no units are specified, for historical reasons. See documentation of individual configuration properties. Specifying units is desirable where possible."

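To make the suggested wording concrete, here is a minimal sketch of how a per-property default unit changes the meaning of a unitless value. `ByteUnitSketch` and `toBytes` are hypothetical names used only for illustration; this is not Spark's parser, and it assumes only the suffix rules listed in docs/configuration.md (binary multiples, case-insensitive suffixes).

```scala
// Hypothetical helper for illustration only; this is not Spark's implementation.
object ByteUnitSketch {
  private val KiB = 1024L

  // Bytes per unit for the suffixes listed in docs/configuration.md.
  private val bytesPerUnit: Map[String, Long] = Map(
    "b" -> 1L,
    "k" -> KiB, "kb" -> KiB,
    "m" -> KiB * KiB, "mb" -> KiB * KiB,
    "g" -> KiB * KiB * KiB, "gb" -> KiB * KiB * KiB,
    "t" -> KiB * KiB * KiB * KiB, "tb" -> KiB * KiB * KiB * KiB,
    "p" -> KiB * KiB * KiB * KiB * KiB, "pb" -> KiB * KiB * KiB * KiB * KiB
  )

  // A run of digits followed by an optional alphabetic suffix.
  private val Size = "([0-9]+)([a-z]*)".r

  /** Size in bytes; `defaultUnit` applies only when `value` carries no suffix. */
  def toBytes(value: String, defaultUnit: String): Long =
    value.trim.toLowerCase match {
      case Size(num, "") =>
        num.toLong * bytesPerUnit(defaultUnit)
      case Size(num, unit) if bytesPerUnit.contains(unit) =>
        num.toLong * bytesPerUnit(unit)
      case _ =>
        throw new IllegalArgumentException(s"Invalid size string: $value")
    }

  def main(args: Array[String]): Unit = {
    println(toBytes("32", "k"))  // unitless, KiB default: 32768
    println(toBytes("32k", "m")) // explicit suffix wins over the default: 32768
    println(toBytes("200", "b")) // unitless, bytes default: 200
    println(toBytes("1g", "m"))  // explicit suffix: 1073741824
  }
}
```

The same string `"32"` means 32 bytes, 32 KiB, or 32 MiB depending on the property's default, which is why specifying the unit explicitly is the safer habit.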

## Dynamically Loading Spark Properties

In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
@@ -150,6 +152,7 @@ of the most common options to set are:
<td>
Amount of memory to use for the driver process, i.e. where SparkContext is initialized.
(e.g. <code>1g</code>, <code>2g</code>).
+Default unit: MiB
Member

Everywhere the default isn't bytes, a clause like ", in MiB unless otherwise specified" seems cleanest. There are 9 such properties as far as I can tell.

Although it would be complete to say "in bytes" for all other properties, probably not necessary.


<br /><em>Note:</em> In client mode, this config must not be set through the <code>SparkConf</code>
directly in your application, because the driver JVM has already started at that point.
@@ -572,9 +575,10 @@ Apart from these, the following properties are also available, and may be useful
<td>
The remote block will be fetched to disk when size of the block is above this threshold.
This is to avoid a giant request takes too much memory. We can enable this config by setting
-a specific value(e.g. 200m). Note this configuration will affect both shuffle fetch
+a specific value(e.g. 200m). Note this configuration will affect both shuffle fetch
and block manager remote block fetch. For users who enabled external shuffle service,
this feature can only be worked when external shuffle service is newer than Spark 2.2.
+Default unit: Bytes.
</td>
</tr>
<tr>
@@ -591,6 +595,7 @@ Apart from these, the following properties are also available, and may be useful
<td>
Size of the in-memory buffer for each shuffle file output stream. These buffers
reduce the number of disk seeks and system calls made in creating intermediate shuffle files.
+Default unit: KiB
</td>
</tr>
<tr>
@@ -688,6 +693,7 @@ Apart from these, the following properties are also available, and may be useful
When we compress the size of shuffle blocks in HighlyCompressedMapStatus, we will record the
size accurately if it's above this config. This helps to prevent OOM by avoiding
underestimating shuffle block size when fetch shuffle blocks.
+Default unit: Bytes
</td>
</tr>
<tr>