[pull] master from apache:master#50
Merged
pull[bot] merged 12 commits intowangyum:masterfrom Oct 23, 2022
Merged
Conversation
…files in Connect Python client ### What changes were proposed in this pull request? 1. Improve README for how to install `buf` dependency and then run the proto generated file script. 2. Improve the message for out-of-sync check script. ### Why are the changes needed? Improve developer experience either when touching Connect proto files or not touch those files but having master branch out of sync with local branch. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing checks. Closes #38335 from amaliujia/python_generate_buf_read_me. Authored-by: Rui Wang <rui.wang@databricks.com> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
### What changes were proposed in this pull request? 1, fix the code gen cmd; 2, add the check cmd ### Why are the changes needed? fix the wrong cmd ### Does this PR introduce _any_ user-facing change? yes, user will use correct cmd and know how to check ### How was this patch tested? doc only Closes #38339 from zhengruifeng/connect_doc_nit. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
…fy workflow ### What changes were proposed in this pull request? Upgrade actions/github-scripts from v3 to v6 and fix notify workflow ### Why are the changes needed? Node.js 12 actions are deprecated. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/. - Since github-script V5, change from `github.*` to `github.rest.*`, but `request`, `paginate` are unchanged. see also https://github.com/actions/github-script#breaking-changes-in-v5 - Since github-script V6, upgrade node12 to node16 ### Does this PR introduce _any_ user-facing change? No, dev only ### How was this patch tested? - Due to `pull_request_target`, the current PR is not in effect, we can only do test on local : set default branch to V6, and submit the PR Yikun#181 Notify works well: <img width="850" alt="image" src="https://user-images.githubusercontent.com/1736354/197310102-6c709716-8a99-422d-8d38-3f770b6925f0.png"> Update status set to failed as expeceted: <img width="898" alt="image" src="https://user-images.githubusercontent.com/1736354/197310119-30332769-0553-4ffa-816c-97a5ec0b3c27.png"> And `See test results` set right. https://github.com/Yikun/spark/pull/181/checks?check_run_id=9029035780 Closes #38341 from Yikun/upgrade-actions. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…nabled
### What changes were proposed in this pull request?
This PR fixes a bug in broadcast handling `PythonRunner` when encryption is enabed. Due to this bug the following pyspark script:
```
bin/pyspark --conf spark.io.encryption.enabled=true
...
bar = {"a": "aa", "b": "bb"}
foo = spark.sparkContext.broadcast(bar)
spark.udf.register("MYUDF", lambda x: foo.value[x] if x else "")
spark.sql("SELECT MYUDF('a') AS a, MYUDF('b') AS b").collect()
```
fails with:
```
22/10/21 17:14:32 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)/ 1]
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/Users/petertoth/git/apache/spark/python/lib/pyspark.zip/pyspark/worker.py", line 811, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/Users/petertoth/git/apache/spark/python/lib/pyspark.zip/pyspark/worker.py", line 87, in read_command
command = serializer._read_with_length(file)
File "/Users/petertoth/git/apache/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
return self.loads(obj)
File "/Users/petertoth/git/apache/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 471, in loads
return cloudpickle.loads(obj, encoding=encoding)
EOFError: Ran out of input
```
The reason for this failure is that we have multiple Python UDF referencing the same broadcast and in the current code:
https://github.com/apache/spark/blob/748fa2792e488a6b923b32e2898d9bb6e16fb4ca/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L385-L420
the number of broadcasts (`cnt`) is correct (1) but the broadcast id is serialized 2 times from JVM to Python ruining the next item that Python expects from JVM side.
Please note that the example above works in Spark 3.3 without this fix. That is because #36121 in Spark 3.4 modified `ExpressionSet` and so `udfs` in `ExtractPythonUDFs`:
https://github.com/apache/spark/blob/748fa2792e488a6b923b32e2898d9bb6e16fb4ca/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala#L239-L242
changed from `Stream` to `Vector`. When `broadcastVars` (and so `idsAndFiles`) is a `Stream` the example accidentaly works as the broadcast id is written to `dataOut` once (`oldBids.add(id)` in `idsAndFiles.foreach` is called before the 2nd item is calculated in `broadcastVars.flatMap`). But that doesn't mean that #36121 introduced the regression as `EncryptedPythonBroadcastServer` shouldn't serve the broadcast data 2 times (which `EncryptedPythonBroadcastServer` does now, but it is not noticed) as it could fail other cases when there are more than 1 broadcast used in UDFs).
### Why are the changes needed?
To fix a bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added new UT.
Closes #38334 from peter-toth/SPARK-40874-fix-broadcasts-in-python-udf.
Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? pin 'grpcio==1.48.1' 'protobuf==4.21.6' ### Why are the changes needed? to avoid connect-related conflict due to package upgrade ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? CI Closes #38338 from zhengruifeng/infra_pin_connect. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
### What changes were proposed in this pull request? Upgrade docker actions to cleanup warning ### Why are the changes needed? - docker/setup-qemu-action from v1 to v2: https://github.com/docker/setup-qemu-action/releases/tag/v2.0.0: Cleanup node 12 warning https://github.com/docker/setup-qemu-action/releases/tag/v2.1.0: Cleanup set-ouput save-state waring - docker/setup-buildx-action from v1 to v2: https://github.com/docker/setup-buildx-action/releases/tag/v2.0.0: Cleanup node 12 warning https://github.com/docker/setup-buildx-action/releases/tag/v2.1.0: Cleanup set-ouput save-state waring - docker/build-push-action from v2 to v3 https://github.com/docker/build-push-action/releases/tag/v3.0.0: Cleanup node 12 warning https://github.com/docker/build-push-action/releases/tag/v3.2.0: Cleanup set-ouput save-state waring - docker/login-action from v1 to v2 https://github.com/docker/login-action/releases/tag/v2.0.0: Cleanup node 12 warning https://github.com/docker/login-action/releases/tag/v2.1.0: Cleanup set-ouput save-state waring ### Does this PR introduce _any_ user-facing change? No, dev only ### How was this patch tested? CI passed Closes #38342 from Yikun/upgrade-docker. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What changes were proposed in this pull request? The pr aim to Update the error template of WRONG_NUM_PARAMS. ### Why are the changes needed? More general. ### Does this PR introduce _any_ user-facing change? Yes the PR changes user-facing error messages. ### How was this patch tested? - Existed UT. - Pass GA. Closes #38319 from panbingkun/update_WRONG_NUM_PARAMS_template. Authored-by: panbingkun <pbk1982@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
… arithmetic ops ### What changes were proposed in this pull request? Migrate the following errors in QueryExecutionErrors onto use error classes: unscaledValueTooLargeForPrecisionError -> UNSCALED_VALUE_TOO_LARGE_FOR_PRECISION decimalPrecisionExceedsMaxPrecisionError -> DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION integerOverflowError -> INTEGER_OVERFLOW outOfDecimalTypeRangeError -> OUT_OF_DECIMAL_TYPE_RANGE ### Why are the changes needed? Porting ArithmeticExceptions to the new error framework ### Does this PR introduce _any_ user-facing change? Yes, errors will indicate that it's controlled Spark exception ### How was this patch tested? ./build/sbt "catalyst/testOnly org.apache.spark.sql.types.DecimalSuite" ./build/sbt "sql/testOnly org.apache.spark.sql.execution.streaming.sources.RateStreamProviderSuite" ./build/sbt "core/testOnly testOnly org.apache.spark.SparkThrowableSuite" Closes #38273 from khalidmammadov/error_class2. Lead-authored-by: Khalid Mammadov <khalidmammadov9@gmail.com> Co-authored-by: khalidmammadov <khalidmammadov9@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…BC_TRANSACTION ### What changes were proposed in this pull request? This pr aims to Add one test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION to QueryExecutionErrorsSuite. ### Why are the changes needed? Add one test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION to QueryExecutionErrorsSuite. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manual test: ``` ./build/sbt "sql/testOnly *QueryExecutionErrorsSuite*" ``` All tests passed. Closes #38351 from panbingkun/SPARK-40391. Authored-by: panbingkun <pbk1982@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…onto error classes ### What changes were proposed in this pull request? This pr replace TypeCheckFailure by DataTypeMismatch in type checks in the string expressions, includes: - regexpExpressions.scala (RegExpReplace) - stringExpressions.scala (Etl) ### Why are the changes needed? Migration onto error classes unifies Spark SQL error messages. ### Does this PR introduce _any_ user-facing change? Yes. The PR changes user-facing error messages. ### How was this patch tested? - Add new UT - Update existed UT - Pass GA. Closes #38299 from panbingkun/SPARK-40756. Authored-by: panbingkun <pbk1982@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
…s onto error classes ### What changes were proposed in this pull request? In the PR, I propose to add new error sub-classes of the error class `DATATYPE_MISMATCH`, and use it in the case of type check failures of some interval expressions. ### Why are the changes needed? Migration onto error classes unifies Spark SQL error messages, and improves search-ability of errors. ### Does this PR introduce _any_ user-facing change? Yes. The PR changes user-facing error messages. ### How was this patch tested? By running the affected test suites: ``` $ build/sbt "test:testOnly *AnalysisSuite" $ build/sbt "test:testOnly *ExpressionTypeCheckingSuite" $ build/sbt "test:testOnly *ApproxCountDistinctForIntervalsSuite" ``` Closes #38237 from MaxGekk/type-check-fails-interval-exprs. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request? Bump Jackson Databind from 2.13.4.1 to 2.13.4.2 ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? There is a regression about Gradle in 2.13.4.1 and got fixed in 2.13.4.2 FasterXML/jackson-databind#3627 ### How was this patch tested? Existing UT. Closes #38355 from pan3793/SPARK-40886. Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Sean Owen <srowen@gmail.com>
pull bot
pushed a commit
that referenced
this pull request
Jul 21, 2025
…ingBuilder` ### What changes were proposed in this pull request? This PR aims to improve `toString` by `JEP-280` instead of `ToStringBuilder`. In addition, `Scalastyle` and `Checkstyle` rules are added to prevent a future regression. ### Why are the changes needed? Since Java 9, `String Concatenation` has been handled better by default. | ID | DESCRIPTION | | - | - | | JEP-280 | [Indify String Concatenation](https://openjdk.org/jeps/280) | For example, this PR improves `OpenBlocks` like the following. Both Java source code and byte code are simplified a lot by utilizing JEP-280 properly. **CODE CHANGE** ```java - return new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE) - .append("appId", appId) - .append("execId", execId) - .append("blockIds", Arrays.toString(blockIds)) - .toString(); + return "OpenBlocks[appId=" + appId + ",execId=" + execId + ",blockIds=" + + Arrays.toString(blockIds) + "]"; ``` **BEFORE** ``` public java.lang.String toString(); Code: 0: new #39 // class org/apache/commons/lang3/builder/ToStringBuilder 3: dup 4: aload_0 5: getstatic #41 // Field org/apache/commons/lang3/builder/ToStringStyle.SHORT_PREFIX_STYLE:Lorg/apache/commons/lang3/builder/ToStringStyle; 8: invokespecial #47 // Method org/apache/commons/lang3/builder/ToStringBuilder."<init>":(Ljava/lang/Object;Lorg/apache/commons/lang3/builder/ToStringStyle;)V 11: ldc #50 // String appId 13: aload_0 14: getfield #7 // Field appId:Ljava/lang/String; 17: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder; 20: ldc #55 // String execId 22: aload_0 23: getfield #13 // Field execId:Ljava/lang/String; 26: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder; 29: ldc #56 // String blockIds 31: aload_0 32: getfield #16 // Field blockIds:[Ljava/lang/String; 35: invokestatic #57 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String; 38: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder; 41: invokevirtual #61 // Method org/apache/commons/lang3/builder/ToStringBuilder.toString:()Ljava/lang/String; 44: areturn ``` **AFTER** ``` public java.lang.String toString(); Code: 0: aload_0 1: getfield #7 // Field appId:Ljava/lang/String; 4: aload_0 5: getfield #13 // Field execId:Ljava/lang/String; 8: aload_0 9: getfield #16 // Field blockIds:[Ljava/lang/String; 12: invokestatic #39 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String; 15: invokedynamic #43, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; 20: areturn ``` ### Does this PR introduce _any_ user-facing change? No. This is an `toString` implementation improvement. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#51572 from dongjoon-hyun/SPARK-52880. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )