Merged. Changes from all 24 commits:
bd9a4b9  [SPARK-46528][BUILD] Upgrade `zstd-jni` to 1.5.5-11  (dongjoon-hyun, Dec 28, 2023)
d2e87c8  [SPARK-46521][PYTHON][DOCS] Refine docstring of `array_remove/array_d…  (LuciferYang, Dec 28, 2023)
d0245d3  [SPARK-46514][TESTS] Fix HiveMetastoreLazyInitializationSuite  (yaooqinn, Dec 28, 2023)
229a4ea  [SPARK-45917][PYTHON][SQL] Automatic registration of Python Data Sour…  (HyukjinKwon, Dec 28, 2023)
ff881da  [SPARK-46517][PS][TESTS][FOLLOWUPS] Reorganize `IndexingTest`: factor…  (zhengruifeng, Dec 28, 2023)
a3e7bad  [SPARK-46529][TESTS] Upgrade guava from 18.0 to 19.0 for `docker-inte…  (yaooqinn, Dec 28, 2023)
93a2526  [SPARK-46366][SQL] Use WITH expression in BETWEEN to avoid duplicate …  (dbatomic, Dec 28, 2023)
bb497eb  [SPARK-46519][SQL] Clear unused error classes from `error-classes.jso…  (panbingkun, Dec 28, 2023)
6fcc268  [SPARK-46532][CONNECT] Pass message parameters in metadata of `ErrorI…  (MaxGekk, Dec 28, 2023)
0de70c4  [SPARK-46517][PS][TESTS][FOLLOWUPS] Reorganize `IndexingTest`: Move t…  (zhengruifeng, Dec 28, 2023)
5db6824  [SPARK-46537][SQL] Convert NPE and asserts from commands to internal …  (MaxGekk, Dec 28, 2023)
af8228c  [SPARK-46535][SQL] Fix NPE when describe extended a column without co…  (Zouxxyy, Dec 28, 2023)
4ec63be  [SPARK-46382][SQL] XML: Capture values interspersed between elements  (shujingyang-db, Dec 29, 2023)
826f8d9  [SPARK-46484][SQL][CONNECT] Make `resolveOperators*` helper functions…  (zhengruifeng, Dec 29, 2023)
b249cb8  [SPARK-46538][ML] Fix the ambiguous column reference issue in `ALSMod…  (zhengruifeng, Dec 29, 2023)
c7c43cf  [SPARK-46533][PYTHON][DOCS] Refine docstring of `array_min/array_max/…  (LuciferYang, Dec 29, 2023)
e9c4e7e  [SPARK-46397][PYTHON][CONNECT] Function `sha2` should raise `PySparkV…  (zhengruifeng, Dec 29, 2023)
f99b86a  [SPARK-45914][PYTHON] Support commit and abort API for Python data so…  (allisonwang-db, Dec 29, 2023)
2e918e8  [SPARK-46532][CONNECT][PYTHON][FOLLOW-UP] Pass message parameters in …  (HyukjinKwon, Dec 29, 2023)
d6334a3  [SPARK-46530][PYTHON][SQL] Check Python executable when looking up av…  (HyukjinKwon, Dec 30, 2023)
9671abb  [SPARK-46509][CORE][SS] Replace `.reverse.find` with `.findLast`  (LuciferYang, Dec 30, 2023)
0eeb60b  [SPARK-46542][SQL] Remove the check for `c>=0` from `ExternalCatalogU…  (LuciferYang, Dec 30, 2023)
04b5128  [SPARK-46545][INFRA] Pin `lxml==4.9.4`  (zhengruifeng, Dec 30, 2023)
af3a225  [SPARK-46531][BUILD] Move the dependency management of `datasketches-…  (LuciferYang, Dec 30, 2023)
2 changes: 1 addition & 1 deletion .github/workflows/build_and_test.yml
@@ -253,7 +253,7 @@ jobs:
     - name: Install Python packages (Python 3.9)
       if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) || contains(matrix.modules, 'connect')
       run: |
-        python3.9 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1'
+        python3.9 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.59.3' 'grpcio-status==1.59.3' 'protobuf==4.25.1'
         python3.9 -m pip list
     # Run the tests.
     - name: Run tests
32 changes: 0 additions & 32 deletions common/utils/src/main/resources/error/error-classes.json
@@ -875,12 +875,6 @@
     ],
     "sqlState" : "42K01"
   },
-  "DATA_SOURCE_ALREADY_EXISTS" : {
-    "message" : [
-      "Data source '<provider>' already exists in the registry. Please use a different name for the new data source."
-    ],
-    "sqlState" : "42710"
-  },
   "DATA_SOURCE_NOT_EXIST" : {
     "message" : [
       "Data source '<provider>' not found. Please make sure the data source is registered."
@@ -1480,12 +1474,6 @@
     },
     "sqlState" : "42K0B"
   },
-  "INCORRECT_END_OFFSET" : {
-    "message" : [
-      "Max offset with <rowsPerSecond> rowsPerSecond is <maxSeconds>, but it's <endSeconds> now."
-    ],
-    "sqlState" : "22003"
-  },
   "INCORRECT_RAMP_UP_RATE" : {
     "message" : [
       "Max offset with <rowsPerSecond> rowsPerSecond is <maxSeconds>, but 'rampUpTimeSeconds' is <rampUpTimeSeconds>."
@@ -1906,11 +1894,6 @@
         "Operation not found."
       ]
     },
-    "SESSION_ALREADY_EXISTS" : {
-      "message" : [
-        "Session already exists."
-      ]
-    },
     "SESSION_CLOSED" : {
       "message" : [
         "Session was closed."
@@ -6065,11 +6048,6 @@
       "<walkedTypePath>."
     ]
   },
-  "_LEGACY_ERROR_TEMP_2142" : {
-    "message" : [
-      "Attributes for type <schema> is not supported."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_2144" : {
     "message" : [
       "Unable to find constructor for <tpe>. This could happen if <tpe> is an interface, or a trait without companion object constructor."
@@ -6920,11 +6898,6 @@
      "<clazz>: <msg>"
    ]
  },
-  "_LEGACY_ERROR_TEMP_3066" : {
-    "message" : [
-      "<msg>"
-    ]
-  },
   "_LEGACY_ERROR_TEMP_3067" : {
     "message" : [
       "Streaming aggregation doesn't support group aggregate pandas UDF"
@@ -6980,11 +6953,6 @@
      "More than one event time columns are available. Please ensure there is at most one event time column per stream. event time columns: <eventTimeCols>"
    ]
  },
-  "_LEGACY_ERROR_TEMP_3078" : {
-    "message" : [
-      "Can not match ParquetTable in the query."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_3079" : {
     "message" : [
       "Dynamic partition cannot be the parent of a static partition."
@@ -85,7 +85,13 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with PrivateM
             |""".stripMargin)
           .collect()
       }
-      assert(ex.getErrorClass != null)
+      assert(
+        ex.getErrorClass ===
+          "INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER")
+      assert(
+        ex.getMessageParameters.asScala == Map(
+          "datetime" -> "'02-29'",
+          "config" -> "\"spark.sql.legacy.timeParserPolicy\""))
       if (enrichErrorEnabled) {
         assert(ex.getCause.isInstanceOf[DateTimeException])
       } else {
@@ -516,6 +516,10 @@ class PlanGenerationTestSuite
     simple.where("a + id < 1000")
   }

+  test("between expr") {
+    simple.selectExpr("rand(123) BETWEEN 0.1 AND 0.2")
+  }
+
   test("unpivot values") {
     simple.unpivot(
       ids = Array(fn.col("id"), fn.col("a")),
@@ -372,10 +372,14 @@ private[client] object GrpcExceptionConverter {
       .addAllErrorTypeHierarchy(classes.toImmutableArraySeq.asJava)

     if (errorClass != null) {
+      val messageParameters = JsonMethods
+        .parse(info.getMetadataOrDefault("messageParameters", "{}"))
+        .extract[Map[String, String]]
       builder.setSparkThrowable(
         FetchErrorDetailsResponse.SparkThrowable
           .newBuilder()
           .setErrorClass(errorClass)
+          .putAllMessageParameters(messageParameters.asJava)
           .build())
     }

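The client side of SPARK-46532 shown above uses json4s: the server publishes the message parameters as a JSON object string in the `messageParameters` metadata field, and the client parses it back into a `Map[String, String]`. A minimal standalone sketch of that decoding step (the JSON literal here is a made-up example, not taken from the PR):

    import org.json4s.{DefaultFormats, Formats}
    import org.json4s.jackson.JsonMethods

    implicit val formats: Formats = DefaultFormats

    // Decode a metadata value of the shape the server serializes.
    val params: Map[String, String] = JsonMethods
      .parse("""{"datetime": "'02-29'"}""")
      .extract[Map[String, String]]
    // params == Map("datetime" -> "'02-29'")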
@@ -0,0 +1,3 @@
Project [((_common_expr_0#0 >= cast(0.1 as double)) AND (_common_expr_0#0 <= cast(0.2 as double))) AS between(rand(123), 0.1, 0.2)#0]
+- Project [id#0L, a#0, b#0, rand(123) AS _common_expr_0#0]
+- LocalRelation <empty>, [id#0L, a#0, b#0]
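This golden file shows the effect of SPARK-46366: the non-deterministic `rand(123)` is hoisted into `_common_expr_0` via a WITH (common) expression, so BETWEEN evaluates its operand once instead of duplicating it into both comparisons. A minimal sketch to reproduce a plan like the one above locally (assumes a plain local SparkSession; not part of the PR):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    // Prints a plan with a single rand(123), aliased as a common expression.
    spark.range(10).selectExpr("rand(123) BETWEEN 0.1 AND 0.2").explain(true)
    spark.stop()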
@@ -0,0 +1,20 @@
{
"common": {
"planId": "1"
},
"project": {
"input": {
"common": {
"planId": "0"
},
"localRelation": {
"schema": "struct\u003cid:bigint,a:int,b:double\u003e"
}
},
"expressions": [{
"expressionString": {
"expression": "rand(123) BETWEEN 0.1 AND 0.2"
}
}]
}
}
Binary file not shown.
@@ -256,4 +256,13 @@ object Connect {
       .version("4.0.0")
       .booleanConf
       .createWithDefault(true)
+
+  val CONNECT_GRPC_MAX_METADATA_SIZE =
+    buildStaticConf("spark.connect.grpc.maxMetadataSize")
+      .doc(
+        "Sets the maximum size of metadata fields. For instance, it restricts metadata fields " +
+          "in `ErrorInfo`.")
+      .version("4.0.0")
+      .bytesConf(ByteUnit.BYTE)
+      .createWithDefault(1024)
 }
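`CONNECT_GRPC_MAX_METADATA_SIZE` is a static conf, so it has to be set before the Connect server starts. A small sketch of setting and reading it (the 8192 value is a hypothetical example, not a recommendation from the PR):

    import org.apache.spark.SparkConf

    // Supplied at server startup, e.g. --conf spark.connect.grpc.maxMetadataSize=8192
    val conf = new SparkConf().set("spark.connect.grpc.maxMetadataSize", "8192")
    // Server code then reads it back the way ErrorUtils does below:
    //   SparkEnv.get.conf.get(Connect.CONNECT_GRPC_MAX_METADATA_SIZE)  // => 8192L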
@@ -172,6 +172,7 @@ private[connect] object ErrorUtils extends Logging {
         "classes",
         JsonMethods.compact(JsonMethods.render(allClasses(st.getClass).map(_.getName))))

+    val maxMetadataSize = SparkEnv.get.conf.get(Connect.CONNECT_GRPC_MAX_METADATA_SIZE)
     // Add the SQL State and Error Class to the response metadata of the ErrorInfoObject.
     st match {
       case e: SparkThrowable =>
@@ -181,7 +182,12 @@
         }
         val errorClass = e.getErrorClass
         if (errorClass != null && errorClass.nonEmpty) {
-          errorInfo.putMetadata("errorClass", errorClass)
+          val messageParameters = JsonMethods.compact(
+            JsonMethods.render(map2jvalue(e.getMessageParameters.asScala.toMap)))
+          if (messageParameters.length <= maxMetadataSize) {
+            errorInfo.putMetadata("errorClass", errorClass)
+            errorInfo.putMetadata("messageParameters", messageParameters)
+          }
         }
       case _ =>
     }
@@ -200,8 +206,10 @@
     val withStackTrace =
       if (sessionHolderOpt.exists(
           _.session.conf.get(SQLConf.PYSPARK_JVM_STACKTRACE_ENABLED) && stackTrace.nonEmpty)) {
-        val maxSize = SparkEnv.get.conf.get(Connect.CONNECT_JVM_STACK_TRACE_MAX_SIZE)
-        errorInfo.putMetadata("stackTrace", StringUtils.abbreviate(stackTrace.get, maxSize))
+        val maxSize = Math.min(
+          SparkEnv.get.conf.get(Connect.CONNECT_JVM_STACK_TRACE_MAX_SIZE),
+          maxMetadataSize)
+        errorInfo.putMetadata("stackTrace", StringUtils.abbreviate(stackTrace.get, maxSize.toInt))
       } else {
         errorInfo
       }
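The stack-trace truncation above leans on Apache Commons Lang's `StringUtils.abbreviate`, which caps a string at a maximum width and marks the cut with "...". A tiny standalone sketch (the input string is invented):

    import org.apache.commons.lang3.StringUtils

    // Keeps at most 16 characters, the last three being the "..." marker.
    StringUtils.abbreviate("org.apache.spark.SparkException: boom", 16)
    // => "org.apache.sp..."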
2 changes: 1 addition & 1 deletion connector/docker-integration-tests/pom.xml
@@ -55,7 +55,7 @@
     <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
-      <version>18.0</version>
+      <version>19.0</version>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
@@ -1789,7 +1789,7 @@ class KafkaMicroBatchV2SourceSuite extends KafkaMicroBatchSourceSuiteBase {
       CheckAnswer(data: _*),
       Execute { query =>
         // The rate limit is 1, so there must be some delay in offsets per partition.
-        val progressWithDelay = query.recentProgress.map(_.sources.head).reverse.find { progress =>
+        val progressWithDelay = query.recentProgress.map(_.sources.head).findLast { progress =>
          // find the metrics that has non-zero average offsetsBehindLatest greater than 0.
          !progress.metrics.isEmpty && progress.metrics.get("avgOffsetsBehindLatest").toDouble > 0
        }
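This is SPARK-46509 applied to the Kafka suite: `.reverse.find` first materializes a reversed copy of the sequence, while `findLast` states the intent directly and, for indexed sequences, scans from the end without copying. A small equivalence sketch (standalone, values invented):

    val offsets = Seq(0.0, 2.5, 0.0, 3.1, 0.0)
    // Both return the last element matching the predicate.
    assert(offsets.reverse.find(_ > 0) == offsets.findLast(_ > 0))  // Some(3.1)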
56 changes: 28 additions & 28 deletions core/benchmarks/ZStandardBenchmark-jdk21-results.txt
@@ -2,48 +2,48 @@
 Benchmark ZStandardCompressionCodec
 ================================================================================================

-OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1051-azure
+OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1053-azure
 AMD EPYC 7763 64-Core Processor
 Benchmark ZStandardCompressionCodec: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool 672 681 10 0.0 67171.0 1.0X
-Compression 10000 times at level 2 without buffer pool 715 718 4 0.0 71458.8 0.9X
-Compression 10000 times at level 3 without buffer pool 831 835 4 0.0 83139.1 0.8X
-Compression 10000 times at level 1 with buffer pool 609 611 2 0.0 60881.5 1.1X
-Compression 10000 times at level 2 with buffer pool 648 649 1 0.0 64791.0 1.0X
-Compression 10000 times at level 3 with buffer pool 744 751 6 0.0 74392.4 0.9X
+Compression 10000 times at level 1 without buffer pool 674 920 293 0.0 67406.4 1.0X
+Compression 10000 times at level 2 without buffer pool 882 884 3 0.0 88195.1 0.8X
+Compression 10000 times at level 3 without buffer pool 973 978 4 0.0 97301.3 0.7X
+Compression 10000 times at level 1 with buffer pool 955 955 1 0.0 95452.0 0.7X
+Compression 10000 times at level 2 with buffer pool 994 996 2 0.0 99432.1 0.7X
+Compression 10000 times at level 3 with buffer pool 1093 1101 12 0.0 109300.9 0.6X

-OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1051-azure
+OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1053-azure
 AMD EPYC 7763 64-Core Processor
 Benchmark ZStandardCompressionCodec: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool 842 849 12 0.0 84240.0 1.0X
-Decompression 10000 times from level 2 without buffer pool 842 846 6 0.0 84185.2 1.0X
-Decompression 10000 times from level 3 without buffer pool 843 844 1 0.0 84285.4 1.0X
-Decompression 10000 times from level 1 with buffer pool 770 771 1 0.0 77024.9 1.1X
-Decompression 10000 times from level 2 with buffer pool 771 771 0 0.0 77120.4 1.1X
-Decompression 10000 times from level 3 with buffer pool 770 771 0 0.0 77031.9 1.1X
+Decompression 10000 times from level 1 without buffer pool 826 829 3 0.0 82591.4 1.0X
+Decompression 10000 times from level 2 without buffer pool 825 826 1 0.0 82533.4 1.0X
+Decompression 10000 times from level 3 without buffer pool 827 830 5 0.0 82715.3 1.0X
+Decompression 10000 times from level 1 with buffer pool 763 764 1 0.0 76271.6 1.1X
+Decompression 10000 times from level 2 with buffer pool 763 777 23 0.0 76321.2 1.1X
+Decompression 10000 times from level 3 with buffer pool 763 765 2 0.0 76286.1 1.1X

-OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1051-azure
+OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1053-azure
 AMD EPYC 7763 64-Core Processor
 Parallel Compression at level 3: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-Parallel Compression with 0 workers 48 50 3 0.0 376597.0 1.0X
-Parallel Compression with 1 workers 41 42 3 0.0 318927.3 1.2X
-Parallel Compression with 2 workers 38 40 2 0.0 297410.2 1.3X
-Parallel Compression with 4 workers 37 39 1 0.0 287605.8 1.3X
-Parallel Compression with 8 workers 39 40 1 0.0 301948.1 1.2X
-Parallel Compression with 16 workers 41 43 1 0.0 317095.6 1.2X
+Parallel Compression with 0 workers 49 50 1 0.0 384188.1 1.0X
+Parallel Compression with 1 workers 42 44 4 0.0 328139.4 1.2X
+Parallel Compression with 2 workers 40 42 1 0.0 309013.2 1.2X
+Parallel Compression with 4 workers 40 41 1 0.0 309732.2 1.2X
+Parallel Compression with 8 workers 41 43 2 0.0 319730.2 1.2X
+Parallel Compression with 16 workers 43 45 1 0.0 337944.2 1.1X

-OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1051-azure
+OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1053-azure
 AMD EPYC 7763 64-Core Processor
 Parallel Compression at level 9: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-Parallel Compression with 0 workers 174 175 1 0.0 1360596.3 1.0X
-Parallel Compression with 1 workers 189 228 24 0.0 1477060.7 0.9X
-Parallel Compression with 2 workers 109 118 15 0.0 851455.9 1.6X
-Parallel Compression with 4 workers 114 118 3 0.0 891964.9 1.5X
-Parallel Compression with 8 workers 115 122 4 0.0 899748.7 1.5X
-Parallel Compression with 16 workers 119 123 2 0.0 931210.7 1.5X
+Parallel Compression with 0 workers 160 161 1 0.0 1250203.7 1.0X
+Parallel Compression with 1 workers 196 197 2 0.0 1529028.2 0.8X
+Parallel Compression with 2 workers 114 121 10 0.0 892592.4 1.4X
+Parallel Compression with 4 workers 111 113 1 0.0 865617.7 1.4X
+Parallel Compression with 8 workers 112 117 2 0.0 878723.8 1.4X
+Parallel Compression with 16 workers 114 117 2 0.0 889199.7 1.4X