[SPARK-23238][SQL] Externalize SQLConf configurations exposed in documentation #20403
Changes from 3 commits: 4899e33 · 1f4d288 · 0c05526 · fd7f5c0
```diff
@@ -123,14 +123,12 @@ object SQLConf {
     .createWithDefault(10)

   val COMPRESS_CACHED = buildConf("spark.sql.inMemoryColumnarStorage.compressed")
-    .internal()
     .doc("When set to true Spark SQL will automatically select a compression codec for each " +
       "column based on statistics of the data.")
     .booleanConf
     .createWithDefault(true)

   val COLUMN_BATCH_SIZE = buildConf("spark.sql.inMemoryColumnarStorage.batchSize")
-    .internal()
     .doc("Controls the size of batches for columnar caching. Larger batch sizes can improve " +
       "memory utilization and compression, but risk OOMs when caching data.")
     .intConf
```
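For context, once `.internal()` is dropped, these two cache options behave like any other documented SQL config and can be set at runtime. A minimal sketch (the session setup is illustrative, not part of this PR):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; any existing SparkSession works the same way.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("columnar-cache-conf-sketch")
  .getOrCreate()

// Runtime API: after this PR both entries are ordinary public SQL configs.
spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", true)
spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", 10000)

// Equivalent SQL form; SET writes to the same SQLConf.
spark.sql("SET spark.sql.inMemoryColumnarStorage.batchSize=20000")
```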
```diff
@@ -1043,17 +1041,16 @@ object SQLConf {

   val ARROW_EXECUTION_ENABLE =
     buildConf("spark.sql.execution.arrow.enabled")
-      .internal()
-      .doc("Make use of Apache Arrow for columnar data transfers. Currently available " +
-        "for use with pyspark.sql.DataFrame.toPandas with the following data types: " +
-        "StringType, BinaryType, BooleanType, DoubleType, FloatType, ByteType, IntegerType, " +
-        "LongType, ShortType")
+      .doc("When true, make use of Apache Arrow for columnar data transfers. Currently available " +
+        "for use with pyspark.sql.DataFrame.toPandas, and " +
+        "pyspark.sql.SparkSession.createDataFrame when its input is a Pandas DataFrame. " +
+        "The following data types are unsupported: " +
+        "MapType, ArrayType of TimestampType, and nested StructType.")
       .booleanConf
       .createWithDefault(false)

   val ARROW_EXECUTION_MAX_RECORDS_PER_BATCH =
     buildConf("spark.sql.execution.arrow.maxRecordsPerBatch")
-      .internal()
```
Member:
I am not sure about this conf: https://github.com/apache/spark/pull/19575/files#r164252424. If we want to merge this PR now, maybe revert this change?

Member:
We can externalize this conf in that PR, #19575, if we believe this conf is the one we will use in the long term.

Member (Author):
Makes sense. Let me take this out. Is this the only one you are concerned about for now?
```diff
       .doc("When using Apache Arrow, limit the maximum number of records that can be written " +
         "to a single ArrowRecordBatch in memory. If set to zero or negative there is no limit.")
       .intConf
```
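A small usage sketch for the two Arrow options above, assuming the `spark` session from the earlier sketch (the batch value is illustrative):

```scala
// Opt in to Arrow-backed columnar transfers (default is false per the diff).
spark.conf.set("spark.sql.execution.arrow.enabled", true)

// Cap each ArrowRecordBatch at 10,000 records; zero or negative means
// no limit, per the doc string above.
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 10000)
```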
Member:
`spark.sql.execution.arrow.maxRecordsPerBatch` is also mentioned in the doc change at #19575. Shall we also externalize it?

Author:
Yup. Let me update `spark.sql.inMemoryColumnarStorage.compressed` and `spark.sql.inMemoryColumnarStorage.batchSize` too. These are also exposed but internal.
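As a rough illustration of what "exposed but internal" means here: `SET -v` lists only non-internal SQLConf entries together with their doc strings, so dropping `.internal()` is what makes a config appear in that output. A hedged sketch, assuming the `spark` session from the first sketch:

```scala
import spark.implicits._

// `SET -v` returns rows of (key, value, meaning) covering only
// non-internal SQLConf entries, so configs externalized by this PR
// show up here with the .doc() text as their "meaning".
spark.sql("SET -v")
  .filter($"key".startsWith("spark.sql.inMemoryColumnarStorage"))
  .show(truncate = false)
```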