
Conversation


pull bot commented Dec 9, 2022

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

HyukjinKwon and others added 8 commits December 9, 2022 16:21
### What changes were proposed in this pull request?

This PR proposes to document the correct way of running Spark Connect tests with `--parallelism 1` option in `./python/run-tests` script.

### Why are the changes needed?

Without this option, the tests cannot run due to port conflicts; they fail with a "port already in use" error.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually ran the commands.

Closes #38992 from HyukjinKwon/minor-docs-test.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…ame.unpivot`

### What changes were proposed in this pull request?
Implement `DataFrame.melt` and `DataFrame.unpivot` with a proto message:

1. Implement `DataFrame.melt` and `DataFrame.unpivot` for the Scala API
2. Implement `DataFrame.melt` and `DataFrame.unpivot` for the Python API (a minimal usage sketch follows this list)
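
For reference, a minimal PySpark sketch of the user-facing API (the data and column names below are illustrative, not taken from this PR):

```python
# Minimal sketch of DataFrame.unpivot / DataFrame.melt usage; data and column
# names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 11, 1.1), (2, 12, 1.2)],
    ["id", "int_val", "double_val"],
)

# melt is an alias of unpivot: turn the value columns into (variable, value) rows
long_df = df.unpivot(
    ids=["id"],
    values=["int_val", "double_val"],
    variableColumnName="variable",
    valueColumnName="value",
)
long_df.show()
```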

### Why are the changes needed?
for Connect API coverage

### Does this PR introduce _any_ user-facing change?
'No'. New API

### How was this patch tested?
New test cases.

Closes #38973 from beliefer/SPARK-41439.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…indows

### What changes were proposed in this pull request?

This PR enhances the Maven build configuration to automatically detect the platform and switch between PowerShell on Windows and Bash on non-Windows OSes to generate the `spark-version-info.properties` file.

### Why are the changes needed?

While building Spark, the `spark-version-info.properties` file [is generated using bash](https://github.com/apache/spark/blob/d62c18b7497997188ec587e1eb62e75c979c1c93/core/pom.xml#L560-L564). In Windows environment, if Windows Subsystem for Linux (WSL) is installed, it somehow overrides the other bash executables in the PATH, as noted in SPARK-40739. The bash in WSL has a different mounting configuration and thus, [the target location specified for spark-version-info.properties](https://github.com/apache/spark/blob/d62c18b7497997188ec587e1eb62e75c979c1c93/core/pom.xml#L561-L562) won't be the expected location. Ultimately, this leads to `spark-version-info.properties` to get excluded from the spark-core jar, thus causing the SparkContext initialization to fail with the above depicted error message.

This PR fixes the issue by directing the build system to use the right shell according to the platform.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I tested this by building on a Windows 10 PC.

```powershell
mvn -Pyarn '-Dhadoop.version=3.3.0' -DskipTests clean package
```

Once the build finished, I verified that `spark-version-info.properties` file was included in the spark-core jar.

![image](https://user-images.githubusercontent.com/10280768/205497898-80e53617-c991-460e-b04a-a3bdd4f298ae.png)

I also ran the SparkPi application and verified that it ran successfully without any errors.

![image](https://user-images.githubusercontent.com/10280768/205499567-f6e8e10a-dcbb-45fb-b282-fc29ba58adee.png)

Closes #38903 from GauthamBanasandra/spark-version-info-ps.

Authored-by: Gautham Banasandra <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
Remove hive-vector-code-gen and its dependent jars from spark distribution

### Why are the changes needed?
hive-vector-code-gen is not used in Spark.

Remove it to avoid vulnerability scanners' alerts.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Passed current test cases

Closes #38978 from zhouyifan279/SPARK-39948.

Authored-by: zhouyifan279 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?

Fix typo in SqlBaseLexer.g4

### Why are the changes needed?

Better documentation

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #38990 from jiaoqingbo/typefix.

Authored-by: jiaoqb <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…mal` to set null values for decimals in an unsafe row

### What changes were proposed in this pull request?

Change `InterpretedMutableProjection` to use `setDecimal` rather than `setNullAt` to set null values for decimals in unsafe rows.

### Why are the changes needed?

The following returns the wrong answer:

```
set spark.sql.codegen.wholeStage=false;
set spark.sql.codegen.factoryMode=NO_CODEGEN;

select max(col1), max(col2) from values
(cast(null  as decimal(27,2)), cast(null   as decimal(27,2))),
(cast(77.77 as decimal(27,2)), cast(245.00 as decimal(27,2)))
as data(col1, col2);

+---------+---------+
|max(col1)|max(col2)|
+---------+---------+
|null     |239.88   |
+---------+---------+
```
This is because `InterpretedMutableProjection` inappropriately uses `InternalRow#setNullAt` on unsafe rows to set null for decimal types with precision > `Decimal.MAX_LONG_DIGITS`.

When `setNullAt` is used, the pointer to the decimal's storage area in the variable length region gets zeroed out. Later, when `InterpretedMutableProjection` calls `setDecimal` on that field, `UnsafeRow#setDecimal` picks up the zero pointer and stores decimal data on top of the null-tracking bit set. Later updates to the null-tracking bit set (e.g., calls to `setNotNullAt`) further corrupt the decimal data (turning 245.00 into 239.88, for example). The stomping of the null-tracking bit set also can make non-null fields appear null (turning 77.77 into null, for example).

This bug can manifest for end-users after codegen fallback (say, if an expression's generated code fails to compile).

[Codegen for mutable projection](https://github.com/apache/spark/blob/89b2ee27d258dec8fe265fa862846e800a374d8e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1729) uses `mutableRow.setDecimal` for null decimal values regardless of precision or the type for `mutableRow`, so this PR does the same.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

Closes #38923 from bersprockets/unsafe_decimal_issue.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…and more input dataset types

### What changes were proposed in this pull request?
1. Support specifying a schema
2. Support more input types: ndarray and list (see the sketch after this list)
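
A hedged sketch of the kinds of inputs this adds support for (data, schema, and column names are made up for illustration, and assume a Spark version that accepts NumPy input):

```python
# Illustrative only: createDataFrame with an explicit schema and with an ndarray.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# a list of rows together with an explicit schema string
df1 = spark.createDataFrame([(1, "a"), (2, "b")], schema="id INT, name STRING")

# a NumPy ndarray as the input dataset
df2 = spark.createDataFrame(np.array([[1, 2], [3, 4]]), schema=["x", "y"])

df1.show()
df2.show()
```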

### Why are the changes needed?
for API coverage

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
Added tests for the new input types.

Closes #38979 from zhengruifeng/connect_create_df.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request?
According to https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html, `pandas.io.formats.style.Styler.to_latex` was introduced in pandas 1.3.0, so the check should be skipped for pandas versions before 1.3.0:

```
ERROR [0.180s]: test_style (pyspark.pandas.tests.test_dataframe.DataFrameTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", line 5795, in test_style
    check_style()
  File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", line 5793, in check_style
    self.assert_eq(pdf_style.to_latex(), psdf_style.to_latex())
AttributeError: 'Styler' object has no attribute 'to_latex'
```

Related: 58375a8
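
A hedged sketch of the kind of version gate this adds (the exact decorator placement in `pyspark.pandas.tests.test_dataframe` may differ):

```python
# Sketch only: gate the Styler.to_latex comparison on the pandas version.
import unittest
from distutils.version import LooseVersion

import pandas as pd


class DataFrameTest(unittest.TestCase):
    @unittest.skipIf(
        LooseVersion(pd.__version__) < LooseVersion("1.3.0"),
        "Styler.to_latex is only available in pandas 1.3.0 and above",
    )
    def test_style(self):
        ...  # compare pandas Styler.to_latex() with pandas-on-Spark Styler.to_latex()
```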

### Why are the changes needed?
This test breaks the branch-3.2 PySpark tests (with Python 3.6 + pandas 1.1.x), so it is better to add a `skipIf` for it.

See also #38982 (comment)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed
- Test on 3.2 branch: Yikun#194, https://github.com/Yikun/spark/actions/runs/3655564439/jobs/6177030747

Closes #39002 from Yikun/skip-check.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>
dongjoon-hyun and others added 3 commits December 12, 2022 11:32
### What changes were proposed in this pull request?

This PR is a follow-up to fix a Scala 2.13 test failure by using `toAttributeMap` before comparison.

### Why are the changes needed?

To recover Scala 2.13 tests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manually run Scala 2.13.
```
$ dev/change-scala-version.sh 2.13
$ build/sbt -Pscala-2.13 "sql/testOnly *.DataSourceV2SQLSuiteV1Filter -- -z SPARK-41378"
...
[info] DataSourceV2SQLSuiteV1Filter:
[info] - SPARK-41378: test column stats (3 seconds, 312 milliseconds)
[info] Run completed in 11 seconds, 159 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes #39038 from dongjoon-hyun/SPARK-41378.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
Implement `collection` functions P~Z (a short usage sketch follows this list), except:
1. `transform`, `transform_keys`, `transform_values`, `zip_with`
2. the `options` parameter in `to_csv`, `to_json`, `schema_of_csv`, `schema_of_json`
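
For illustration, two functions from that range as used from PySpark (a hedged sketch; these are examples, not an exhaustive list of what this PR covers):

```python
# Illustrative only: a couple of collection functions in the P~Z range.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.range(1).select(
    F.sequence(F.lit(1), F.lit(5)).alias("seq"),           # sequence 1..5
    F.to_json(F.struct(F.lit(1).alias("a"))).alias("js"),  # to_json without options
)
df.show()
```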

### Why are the changes needed?
API coverage

### Does this PR introduce _any_ user-facing change?
new functions

### How was this patch tested?
added UT

Closes #39033 from zhengruifeng/connect_function_collect_3.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?

This PR aims to upgrade the minimum Minikube version from 1.18.0 to 1.28.0 for Apache Spark 3.4.0.

### Why are the changes needed?

Minikube v1.28.0 was released on `Nov 4th, 2022` and is the latest release as of today.
- https://github.com/kubernetes/minikube/releases/tag/v1.28.0

GitHub Action CI has been using `Minikube 1.28.0` to test `K8s v1.25.3` and Homebrew also provides `1.28.0` by default.
- https://github.com/apache/spark/actions/runs/3681318787/jobs/6227888255
```
* minikube v1.28.0 on Ubuntu 20.04
...
* Downloading Kubernetes v1.25.3 preload ...
```

In addition, we can choose different K8s versions on this latest Minikube like the following.
```
$ minikube start --kubernetes-version=1.21.0
😄  minikube v1.28.0 on Darwin 13.1 (arm64)
❗  Kubernetes 1.21.0 has a known performance issue on cluster startup. It might take 2 to 3 minutes for a cluster to start.
❗  For more information, see: kubernetes/kubeadm#2395
✨  Automatically selected the docker driver
📌  Using Docker Desktop driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
💾  Downloading Kubernetes v1.21.0 preload ...
```

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change.

### How was this patch tested?

Pass the CIs.

Closes #39043 from dongjoon-hyun/SPARK-41502.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…comment

### What changes were proposed in this pull request?

This PR aims to update R version number comment from 4.0.4 to 4.1.2 in R Dockerfile.

### Why are the changes needed?

Apache Spark 3.3 used `R 4.0.4`, but Apache Spark 3.4 is using `R 4.1.2` in master branch.
```
$ docker run -it --rm apache/spark-r:3.3.1 R --version | grep 'R version'
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
```

```
$ docker run -it --rm kubespark/spark-r:dev R --version | grep 'R version'
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- GitHub Action logs show it.
    - https://github.com/apache/spark/actions/runs/3681318787/jobs/6227888255
```
Get:304 http://archive.ubuntu.com/ubuntu jammy/universe amd64 r-base-core amd64 4.1.2-1ubuntu2 [26.0 MB]
```

Closes #39044 from dongjoon-hyun/SPARK-41504.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
github-actions bot added the R label Dec 13, 2022
jianghaonan and others added 8 commits December 13, 2022 12:43
…ariable of *_PROTOC_EXEC_PATH

### What changes were proposed in this pull request?
This PR unifies the `*_PROTOC_EXEC_PATH` environment variables so that users can use the same environment variable to build and test the `core`, `connect`, and `protobuf` modules by using the profile named `-Puser-defined-protoc` with a custom `protoc` executable.

### Why are the changes needed?
As described in [SPARK-41485](https://issues.apache.org/jira/browse/SPARK-41485), there are currently 3 similar `*_PROTOC_EXEC_PATH` environment variables, but they all use the same `pb` version. Because they are consistent at compile time, the environment variable names are unified to simplify the build.

### Does this PR introduce _any_ user-facing change?
No. Using the official pre-built `protoc` binaries remains activated by default.

### How was this patch tested?
- Pass GitHub Actions
- Manual test on CentOS6u3 and CentOS7u4
```bash
export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe
./build/mvn clean install -pl core -Puser-defined-protoc -am -DskipTests -DskipDefaultProtoc
./build/mvn clean install -pl connector/connect/common -Puser-defined-protoc -am -DskipTests
./build/mvn clean install -pl connector/protobuf -Puser-defined-protoc -am -DskipTests
./build/mvn clean test -pl core -Puser-defined-protoc -DskipDefaultProtoc
./build/mvn clean test -pl connector/connect/common -Puser-defined-protoc
./build/mvn clean test -pl connector/protobuf -Puser-defined-protoc
```
and
```bash
export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe
./build/sbt clean "core/compile" -Puser-defined-protoc
./build/sbt clean "connect-common/compile" -Puser-defined-protoc
./build/sbt clean "protobuf/compile" -Puser-defined-protoc
./build/sbt "core/test" -Puser-defined-protoc
./build/sbt  "connect-common/test" -Puser-defined-protoc
./build/sbt  "protobuf/test" -Puser-defined-protoc
```

Closes #39036 from WolverineJiang/master.

Authored-by: jianghaonan <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?

Upgrade Protobuf version to 3.21.11

### Why are the changes needed?

There are some bug fixes in the latest release https://github.com/protocolbuffers/protobuf/releases:
* Use bit-field int values in buildPartial to skip work on unset groups of fields. (protocolbuffers/protobuf#10960)
* Mark nested builder as clean after clear is called (protocolbuffers/protobuf#10984)

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA tests

Closes #39042 from gengliangwang/upgradeProtobuf.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
…s under the current working directory on the driver in K8S cluster mode

### What changes were proposed in this pull request?
This PR places `spark.files`, `spark.jars`, and `spark.pyfiles` in the current working directory on the driver in K8s cluster mode.

### Why are the changes needed?
This mimics the behaviour of YARN and also helps users access files from the working directory (PWD). As mentioned in the JIRA, by doing this, users can, for example, leverage PEX to manage Python dependencies in Apache Spark:
```
pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested via unit test cases and also ran on local K8s cluster.

Closes #37417 from pralabhkumar/rk_k8s_local_resource.

Authored-by: pralabhkumar <[email protected]>
Signed-off-by: Holden Karau <[email protected]>
…EGACY_ERROR_TEMP_0020`

### What changes were proposed in this pull request?
This PR aims to reuse the error class `INVALID_TYPED_LITERAL` instead of `_LEGACY_ERROR_TEMP_1020`.
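
For context, a hedged example of the kind of statement that should now surface `INVALID_TYPED_LITERAL` (the specific literal and the exception class are assumptions, not taken from this PR):

```python
# Illustrative only: an invalid typed literal.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
try:
    spark.sql("SELECT DATE 'not-a-date'").show()
except Exception as e:  # the concrete exception class may vary by Spark version
    print(e)
```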

### Why are the changes needed?
Proper names of error classes to improve user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions.

Closes #39025 from LuciferYang/SPARK-41481.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…tExpressions

### What changes were proposed in this pull request?
This is a follow-up PR to #39010 to handle `NamedLambdaVariable`s too.

### Why are the changes needed?
To avoid possible issues with higher-order functions.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UTs.

Closes #39046 from peter-toth/SPARK-41468-fix-planexpressions-in-equivalentexpressions-follow-up.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…to make it more generic

### What changes were proposed in this pull request?
The PR aims to refactor the error message for `NUM_COLUMNS_MISMATCH` to make it more generic.

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Updated existing UT.
Pass GA.

Closes #38937 from panbingkun/SPARK-41406.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…P_1234

### What changes were proposed in this pull request?
In the PR, I propose to assign the name `ANALYZE_UNCACHED_TEMP_VIEW` to the error class `_LEGACY_ERROR_TEMP_1234`.

### Why are the changes needed?
Proper names of error classes should improve user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add new UT.
Pass GA.

Closes #39018 from panbingkun/LEGACY_ERROR_TEMP_1234.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…usedByApp when driver is shutting down

### What changes were proposed in this pull request?

Treat container `AllocationFailure` as not "exitCausedByApp" when the driver is shutting down.

The approach is suggested at #36991 (comment)

### Why are the changes needed?

I observed that some Spark applications successfully completed all jobs but failed during the shutdown phase with the reason `Max number of executor failures (16) reached`. The timeline is:

Driver - Jobs succeed; Spark starts the shutdown procedure.
```
2022-06-23 19:50:55 CST AbstractConnector INFO - Stopped Spark74e9431b{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
2022-06-23 19:50:55 CST SparkUI INFO - Stopped Spark web UI at http://hadoop2627.xxx.org:28446
2022-06-23 19:50:55 CST YarnClusterSchedulerBackend INFO - Shutting down all executors
```

Driver - A container allocation succeeds during the shutdown phase.
```
2022-06-23 19:52:21 CST YarnAllocator INFO - Launching container container_e94_1649986670278_7743380_02_000025 on host hadoop4388.xxx.org for executor with ID 24 for ResourceProfile Id 0
```

Executor - The executor cannot connect to the driver endpoint because the driver has already stopped the endpoint.
```
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
  at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:393)
  at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:81)
  at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
  at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:413)
  at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
  at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
  at scala.collection.immutable.Range.foreach(Range.scala:158)
  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:411)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
  ... 4 more
Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: Cannot find endpoint: spark://CoarseGrainedSchedulerhadoop2627.xxx.org:21956
  at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1(NettyRpcEnv.scala:148)
  at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1$adapted(NettyRpcEnv.scala:144)
  at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
  at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
  at org.apache.spark.util.ThreadUtils$$anon$1.execute(ThreadUtils.scala:99)
  at scala.concurrent.impl.ExecutionContextImpl$$anon$4.execute(ExecutionContextImpl.scala:138)
  at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
  at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
  at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
  at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
```

Driver - YarnAllocator receives the container launch error message and treats it as `exitCausedByApp`.
```
2022-06-23 19:52:27 CST YarnAllocator INFO - Completed container container_e94_1649986670278_7743380_02_000025 on host: hadoop4388.xxx.org (state: COMPLETE, exit status: 1)
2022-06-23 19:52:27 CST YarnAllocator WARN - Container from a bad node: container_e94_1649986670278_7743380_02_000025 on host: hadoop4388.xxx.org. Exit status: 1. Diagnostics: [2022-06-23 19:52:24.932]Exception from container-launch.
Container id: container_e94_1649986670278_7743380_02_000025
Exit code: 1
Shell output: main : command provided 1
main : run as user is bdms_pm
main : requested yarn user is bdms_pm
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /mnt/dfs/2/yarn/local/nmPrivate/application_1649986670278_7743380/container_e94_1649986670278_7743380_02_000025/container_e94_1649986670278_7743380_02_000025.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...

[2022-06-23 19:52:24.938]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:873)
  at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
  at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
  at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
  at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
  at scala.concurrent.Promise.trySuccess(Promise.scala:94)
  at scala.concurrent.Promise.trySuccess$(Promise.scala:94)
  at scala.concurrent.impl.Promise$DefaultPromise.trySuccess(Promise.scala:187)
  at org.apache.spark.rpc.netty.NettyRpcEnv.onSuccess$1(NettyRpcEnv.scala:225)
  at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$askAbortable$7(NettyRpcEnv.scala:246)
  at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$askAbortable$7$adapted(NettyRpcEnv.scala:246)
  at org.apache.spark.rpc.netty.RpcOutboxMessage.onSuccess(Outbox.scala:90)
  at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:195)
  at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
  at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
  at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
```

Driver - Eventually the application fails because the number of "failed" executors reaches the threshold.
```
2022-06-23 19:52:30 CST ApplicationMaster INFO - Final app status: FAILED, exitCode: 11, (reason: Max number of executor failures (16) reached)
```
### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Update UT.

Closes #38622 from pan3793/SPARK-39601.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
github-actions bot added the YARN label Dec 13, 2022
HyukjinKwon and others added 12 commits December 13, 2022 23:34
…gisterBlockManager in MiMa

### What changes were proposed in this pull request?

This PR is a followup of #38876 that excludes BlockManagerMessages.RegisterBlockManager in MiMa compatibility check.

### Why are the changes needed?

It fails the MiMa check, presumably with Scala 2.13, in other branches. It should be safer to exclude them all in the affected branches.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Filters copied from error messages. Will monitor the build in other branches.

Closes #39052 from HyukjinKwon/SPARK-41360-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…n Project

### What changes were proposed in this pull request?
This PR implements a new feature, implicit lateral column alias on the `Project` case, temporarily controlled by `spark.sql.lateralColumnAlias.enableImplicitResolution` (default false for now; the conf will be turned on once the feature is completely merged).

#### Lateral column alias
View https://issues.apache.org/jira/browse/SPARK-27561 for more details on lateral column alias.
There are two main cases to support: LCA in Project, and LCA in Aggregate.
```sql
-- LCA in Project. The base_salary references an attribute defined by a previous alias
SELECT salary AS base_salary, base_salary + bonus AS total_salary
FROM employee

-- LCA in Aggregate. The avg_salary references an attribute defined by a previous alias
SELECT dept, average(salary) AS avg_salary, avg_salary + average(bonus)
FROM employee
GROUP BY dept
```
This **implicit** lateral column alias (no explicit keyword, e.g. `lateral.base_salary`) should be supported.
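
A hedged sketch of the feature from PySpark once the flag is flipped on (the `employee` table and its columns are assumed to exist for illustration):

```python
# Illustrative only: enable the (temporary) flag and use a lateral column alias.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.lateralColumnAlias.enableImplicitResolution", "true")

spark.sql("""
    SELECT salary AS base_salary, base_salary + bonus AS total_salary
    FROM employee
""").show()
```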

#### High level design
This PR defines a new Resolution rule, `ResolveLateralColumnAlias` to resolve the implicit lateral column alias, covering the `Project` case.
It introduces a new leaf `NamedExpression` node, `LateralColumnAliasReference`, as a placeholder used to hold a reference that has been temporarily resolved as a reference to a lateral column alias.

The whole process is generally divided into two phases:
1) Recognize **resolved** lateral aliases and wrap the attributes referencing them with `LateralColumnAliasReference`.
2) When the whole operator is resolved, unwrap `LateralColumnAliasReference`. For Project, it further resolves the attributes and pushes down the referenced lateral aliases to the new Project.

For example:
```
// Before
Project [age AS a, 'a + 1]
+- Child

// After phase 1
Project [age AS a, lateralalias(a) + 1]
+- Child

// After phase 2
Project [a, a + 1]
+- Project [child output, age AS a]
   +- Child
```

#### Resolution order
Given this new rule, the name resolution order will be (higher -> lower):
```
local table column > local metadata attribute > local lateral column alias > all others (outer reference of subquery, parameters of SQL UDF, ..)
```

There is a recent refactor that moves the creation of `OuterReference` in the Resolution batch: #38851.
Because lateral column alias has a higher resolution priority than outer reference, it will try to resolve an `OuterReference` using the lateral column alias, similar to an `UnresolvedAttribute`. If successful, it strips the `OuterReference` and also wraps it with `LateralColumnAliasReference`.

### Why are the changes needed?
The lateral column alias is a popular feature that has been wanted for a long time. It is supported by lots of other database vendors (Redshift, Snowflake, etc.) and provides a better user experience.

### Does this PR introduce _any_ user-facing change?
Yes, as shown in the above example, it will be able to resolve lateral column alias. I will write the migration guide or release note when most PRs of this feature are merged.

### How was this patch tested?
Existing tests and newly added tests.

Closes #38776 from anchovYu/SPARK-27561-refactor.

Authored-by: Xinyi Yu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
… super.stop()

### What changes were proposed in this pull request?

This is a followup of #38622; I just noticed that `YarnClusterSchedulerBackend#stop` missed calling `super.stop()`.

### Why are the changes needed?

This follows up on the previous change; otherwise Spark may not shut down properly in YARN cluster mode.

### Does this PR introduce _any_ user-facing change?

No, unreleased change.

### How was this patch tested?

Existing UT.

Closes #39053 from pan3793/SPARK-39601-followup.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
This PR aims to upgrade Dropwizard Metrics to 4.2.13.

### Why are the changes needed?
The release notes are as follows:

- https://github.com/dropwizard/metrics/releases/tag/v4.2.13

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

Closes #39026 from LuciferYang/metrics-4213.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…ELATED_REFERENCE`

### What changes were proposed in this pull request?

This PR proposes to rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`.

Also, show `sqlExprs` rather than `treeNode`, which is more useful information for users.

### Why are the changes needed?

The sub-error class name is duplicated with its main class, `UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY`.

We should make all error class names clear and brief.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

```
./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"
```

Closes #38576 from itholic/SPARK-41062.

Lead-authored-by: itholic <[email protected]>
Co-authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…ke the tests to pass with/without ANSI mode

### What changes were proposed in this pull request?

This PR is another followup of #39034 that, instead, makes the tests pass with/without ANSI mode.

### Why are the changes needed?

Spark Connect uses an isolated Spark session, so setting the configuration on the PySpark side does not take effect. Therefore, the test still fails, see https://github.com/apache/spark/actions/runs/3681383627/jobs/6228030132.

We should make the tests pass with/without ANSI mode for now.

### Does this PR introduce _any_ user-facing change?
No, test-only

### How was this patch tested?

Manually tested via:

```bash
SPARK_ANSI_SQL_MODE=true ./python/run-tests --testnames 'pyspark.sql.tests.connect.test_connect_column'
```

Closes #39050 from HyukjinKwon/SPARK-41412.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…DataType

### What changes were proposed in this pull request?
1. The existing `LiteralExpression` is a mixture of `Literal`, `CreateArray`, `CreateStruct` and `CreateMap`; since we have added the collection functions `array`, `struct` and `create_map`, the `CreateXXX` expressions can be replaced with `UnresolvedFunction`.
2. Add a `dataType` field in `LiteralExpression`, so we can specify the DataType if needed; a special case is the typed null.
3. It is up to the `lit` function to infer the DataType, not `LiteralExpression` itself (see the sketch after this list).
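
On the user-facing side, the split roughly looks like the following sketch (`LiteralExpression` itself is internal and not shown; the column names are illustrative):

```python
# Illustrative only: `lit` infers the data type for ordinary literals, while a
# typed null needs the type supplied explicitly (e.g. via a cast).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

df = spark.range(1).select(
    F.lit(42).alias("inferred_int"),                      # type inferred from the value
    F.lit(None).cast(IntegerType()).alias("typed_null"),  # typed null
)
df.printSchema()
```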

### Why are the changes needed?
Refactor LiteralExpression to support DataType

### Does this PR introduce _any_ user-facing change?
No. `LiteralExpression` is an internal class and should not be exposed to end users.

### How was this patch tested?
added UT

Closes #39047 from zhengruifeng/connect_lit_datatype.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
….sql.tests.connect.test_connect_column

### What changes were proposed in this pull request?

This PR is a followup of #39047, which imports `BinaryType` that was removed in #39050. This was a logical conflict.

### Why are the changes needed?

To recover the build.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually verified by running `./dev/lint-python`.

Closes #39055 from HyukjinKwon/SPARK-41506-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…s into one file

### What changes were proposed in this pull request?

Move the two rules `WrapLateralColumnAliasReference` and `ResolveLateralColumnAliasReference` into one file `ResolveLateralColumnAlias.scala`, instead of one rule in `Analyzer.scala` and the other one in another file.
Also update the code comments.

### Why are the changes needed?

Found this issue while reviewing #39040.
This refactor should make the code easier to read and review.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UT

Closes #39054 from gengliangwang/refactorLCA.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…UM_ARGS.WITHOUT_SUGGESTION`

### What changes were proposed in this pull request?
This PR introduces sub-classes of `WRONG_NUM_ARGS`:

- WITHOUT_SUGGESTION
- WITH_SUGGESTION

then replaces the existing `WRONG_NUM_ARGS` with `WRONG_NUM_ARGS.WITH_SUGGESTION` and renames the error class `_LEGACY_ERROR_TEMP_1043` to `WRONG_NUM_ARGS.WITHOUT_SUGGESTION`.

### Why are the changes needed?
Proper names of error classes to improve user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add new test case

Closes #38940 from LuciferYang/legacy-1043.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…e/disable JSON partial results

### What changes were proposed in this pull request?

This PR adds a SQL config `spark.sql.json.enablePartialResults` to control SPARK-40646 change. This allows us to fall back to the behaviour before the change.

It was observed that SPARK-40646 could cause a performance regression for deeply nested schemas. I, however, could not reproduce the regression with Apache Spark JSON benchmarks (maybe we need to extend them, I can do it as a follow-up). Regardless, I propose to add a SQL config to have an ability to disable the change in case of performance degradation during JSON parsing.

Benchmark results are attached to the JIRA ticket.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

SQL config `spark.sql.json.enablePartialResults` is added to control the behaviour of SPARK-40646 JSON partial results parsing. Users can disable the feature if they find any performance regressions when reading JSON files.
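
A hedged sketch of how a user could opt out if they hit a regression (the input path is a placeholder):

```python
# Illustrative only: fall back to the pre-SPARK-40646 behaviour by disabling
# partial results for JSON parsing.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.json.enablePartialResults", "false")

df = spark.read.json("/path/to/input.json")  # placeholder path
df.show()
```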

### How was this patch tested?

I extended existing unit tests to test with flag enabled and disabled.

Closes #38784 from sadikovi/add-flag-json-parsing.

Authored-by: Ivan Sadikov <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…oc and revise config name

### What changes were proposed in this pull request?

This PR aims to add `PVC-oriented executor pod allocation` section to K8s documentation.
To be consistent with the two existing configurations, I revise the configuration name as follows.
```
- spark.kubernetes.driver.waitToReusePersistentVolumeClaims
+ spark.kubernetes.driver.waitToReusePersistentVolumeClaim
```
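
A hedged sketch of setting the renamed configuration from PySpark (purely illustrative):

```python
# Illustrative only: the renamed config as it might be set on a K8s submission.
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().set(
    "spark.kubernetes.driver.waitToReusePersistentVolumeClaim", "true"
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
```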

### Why are the changes needed?

To document the new feature.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review, since this is documentation.

Closes #39058 from dongjoon-hyun/SPARK-41514.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
huangxiaopingRD merged commit 0fd1f85 into huangxiaopingRD:master Dec 14, 2022
pull bot pushed a commit that referenced this pull request Apr 29, 2024
… spark docker image

### What changes were proposed in this pull request?
The PR aims to update the names of the packages removed when building the Spark Docker image.

### Why are the changes needed?
When our default image base was switched from `ubuntu 20.04` to `ubuntu 22.04`, the unused installation packages in the base image changed. In order to eliminate some warnings when building images and to free disk space more accurately, we need to correct the package list.

Before:
```
#35 [29/31] RUN apt-get remove --purge -y     '^aspnet.*' '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*'     snapd google-chrome-stable microsoft-edge-stable firefox     azure-cli google-cloud-sdk mono-devel powershell libgl1-mesa-dri || true
#35 0.489 Reading package lists...
#35 0.505 Building dependency tree...
#35 0.507 Reading state information...
#35 0.511 E: Unable to locate package ^aspnet.*
#35 0.511 E: Couldn't find any package by glob '^aspnet.*'
#35 0.511 E: Couldn't find any package by regex '^aspnet.*'
#35 0.511 E: Unable to locate package ^dotnet-.*
#35 0.511 E: Couldn't find any package by glob '^dotnet-.*'
#35 0.511 E: Couldn't find any package by regex '^dotnet-.*'
#35 0.511 E: Unable to locate package ^llvm-.*
#35 0.511 E: Couldn't find any package by glob '^llvm-.*'
#35 0.511 E: Couldn't find any package by regex '^llvm-.*'
#35 0.511 E: Unable to locate package ^mongodb-.*
#35 0.511 E: Couldn't find any package by glob '^mongodb-.*'
#35 0.511 EPackage 'php-crypt-gpg' is not installed, so not removed
#35 0.511 Package 'php' is not installed, so not removed
#35 0.511 : Couldn't find any package by regex '^mongodb-.*'
#35 0.511 E: Unable to locate package snapd
#35 0.511 E: Unable to locate package google-chrome-stable
#35 0.511 E: Unable to locate package microsoft-edge-stable
#35 0.511 E: Unable to locate package firefox
#35 0.511 E: Unable to locate package azure-cli
#35 0.511 E: Unable to locate package google-cloud-sdk
#35 0.511 E: Unable to locate package mono-devel
#35 0.511 E: Unable to locate package powershell
#35 DONE 0.5s

#36 [30/31] RUN apt-get autoremove --purge -y
#36 0.063 Reading package lists...
#36 0.079 Building dependency tree...
#36 0.082 Reading state information...
#36 0.088 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
#36 DONE 0.4s
```

After:
```
#38 [32/36] RUN apt-get remove --purge -y     'gfortran-11' 'humanity-icon-theme' 'nodejs-doc' || true
#38 0.066 Reading package lists...
#38 0.087 Building dependency tree...
#38 0.089 Reading state information...
#38 0.094 The following packages were automatically installed and are no longer required:
#38 0.094   at-spi2-core bzip2-doc dbus-user-session dconf-gsettings-backend
#38 0.095   dconf-service gsettings-desktop-schemas gtk-update-icon-cache
#38 0.095   hicolor-icon-theme libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data
#38 0.095   libatspi2.0-0 libbz2-dev libcairo-gobject2 libcolord2 libdconf1 libepoxy0
#38 0.095   libgfortran-11-dev libgtk-3-common libjs-highlight.js libllvm11
#38 0.095   libncurses-dev libncurses5-dev libphobos2-ldc-shared98 libreadline-dev
#38 0.095   librsvg2-2 librsvg2-common libvte-2.91-common libwayland-client0
#38 0.095   libwayland-cursor0 libwayland-egl1 libxdamage1 libxkbcommon0
#38 0.095   session-migration tilix-common xkb-data
#38 0.095 Use 'apt autoremove' to remove them.
#38 0.096 The following packages will be REMOVED:
#38 0.096   adwaita-icon-theme* gfortran* gfortran-11* humanity-icon-theme* libgtk-3-0*
#38 0.096   libgtk-3-bin* libgtkd-3-0* libvte-2.91-0* libvted-3-0* nodejs-doc*
#38 0.096   r-base-dev* tilix* ubuntu-mono*
#38 0.248 0 upgraded, 0 newly installed, 13 to remove and 0 not upgraded.
#38 0.248 After this operation, 99.6 MB disk space will be freed.
...
(Reading database ... 70597 files and directories currently installed.)
#38 0.304 Removing r-base-dev (4.1.2-1ubuntu2) ...
#38 0.319 Removing gfortran (4:11.2.0-1ubuntu1) ...
#38 0.340 Removing gfortran-11 (11.4.0-1ubuntu1~22.04) ...
#38 0.356 Removing tilix (1.9.4-2build1) ...
#38 0.377 Removing libvted-3-0:amd64 (3.10.0-1ubuntu1) ...
#38 0.392 Removing libvte-2.91-0:amd64 (0.68.0-1) ...
#38 0.407 Removing libgtk-3-bin (3.24.33-1ubuntu2) ...
#38 0.422 Removing libgtkd-3-0:amd64 (3.10.0-1ubuntu1) ...
#38 0.436 Removing nodejs-doc (12.22.9~dfsg-1ubuntu3.4) ...
#38 0.457 Removing libgtk-3-0:amd64 (3.24.33-1ubuntu2) ...
#38 0.488 Removing ubuntu-mono (20.10-0ubuntu2) ...
#38 0.754 Removing humanity-icon-theme (0.6.16) ...
#38 1.362 Removing adwaita-icon-theme (41.0-1ubuntu1) ...
#38 1.537 Processing triggers for libc-bin (2.35-0ubuntu3.6) ...
#38 1.566 Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
#38 1.577 Processing triggers for libglib2.0-0:amd64 (2.72.4-0ubuntu2.2) ...
(Reading database ... 56946 files and directories currently installed.)
#38 1.645 Purging configuration files for libgtk-3-0:amd64 (3.24.33-1ubuntu2) ...
#38 1.657 Purging configuration files for ubuntu-mono (20.10-0ubuntu2) ...
#38 1.670 Purging configuration files for humanity-icon-theme (0.6.16) ...
#38 1.682 Purging configuration files for adwaita-icon-theme (41.0-1ubuntu1) ...
#38 DONE 1.7s

#39 [33/36] RUN apt-get autoremove --purge -y
#39 0.061 Reading package lists...
#39 0.075 Building dependency tree...
#39 0.077 Reading state information...
#39 0.083 The following packages will be REMOVED:
#39 0.083   at-spi2-core* bzip2-doc* dbus-user-session* dconf-gsettings-backend*
#39 0.083   dconf-service* gsettings-desktop-schemas* gtk-update-icon-cache*
#39 0.083   hicolor-icon-theme* libatk-bridge2.0-0* libatk1.0-0* libatk1.0-data*
#39 0.083   libatspi2.0-0* libbz2-dev* libcairo-gobject2* libcolord2* libdconf1*
#39 0.083   libepoxy0* libgfortran-11-dev* libgtk-3-common* libjs-highlight.js*
#39 0.083   libllvm11* libncurses-dev* libncurses5-dev* libphobos2-ldc-shared98*
#39 0.083   libreadline-dev* librsvg2-2* librsvg2-common* libvte-2.91-common*
#39 0.083   libwayland-client0* libwayland-cursor0* libwayland-egl1* libxdamage1*
#39 0.083   libxkbcommon0* session-migration* tilix-common* xkb-data*
#39 0.231 0 upgraded, 0 newly installed, 36 to remove and 0 not upgraded.
#39 0.231 After this operation, 124 MB disk space will be freed.
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46258 from panbingkun/remove_packages_on_ubuntu.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
pull bot pushed a commit that referenced this pull request Jul 21, 2025
…ingBuilder`

### What changes were proposed in this pull request?

This PR aims to improve `toString` by using `JEP-280` instead of `ToStringBuilder`. In addition, `Scalastyle` and `Checkstyle` rules are added to prevent a future regression.

### Why are the changes needed?

Since Java 9, `String Concatenation` has been handled better by default.

| ID | DESCRIPTION |
| - | - |
| JEP-280 | [Indify String Concatenation](https://openjdk.org/jeps/280) |

For example, this PR improves `OpenBlocks` like the following. Both Java source code and byte code are simplified a lot by utilizing JEP-280 properly.

**CODE CHANGE**
```java

- return new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE)
-   .append("appId", appId)
-   .append("execId", execId)
-   .append("blockIds", Arrays.toString(blockIds))
-   .toString();
+ return "OpenBlocks[appId=" + appId + ",execId=" + execId + ",blockIds=" +
+     Arrays.toString(blockIds) + "]";
```

**BEFORE**
```
  public java.lang.String toString();
    Code:
       0: new           #39                 // class org/apache/commons/lang3/builder/ToStringBuilder
       3: dup
       4: aload_0
       5: getstatic     #41                 // Field org/apache/commons/lang3/builder/ToStringStyle.SHORT_PREFIX_STYLE:Lorg/apache/commons/lang3/builder/ToStringStyle;
       8: invokespecial #47                 // Method org/apache/commons/lang3/builder/ToStringBuilder."<init>":(Ljava/lang/Object;Lorg/apache/commons/lang3/builder/ToStringStyle;)V
      11: ldc           #50                 // String appId
      13: aload_0
      14: getfield      #7                  // Field appId:Ljava/lang/String;
      17: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      20: ldc           #55                 // String execId
      22: aload_0
      23: getfield      #13                 // Field execId:Ljava/lang/String;
      26: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      29: ldc           #56                 // String blockIds
      31: aload_0
      32: getfield      #16                 // Field blockIds:[Ljava/lang/String;
      35: invokestatic  #57                 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
      38: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      41: invokevirtual #61                 // Method org/apache/commons/lang3/builder/ToStringBuilder.toString:()Ljava/lang/String;
      44: areturn
```

**AFTER**
```
  public java.lang.String toString();
    Code:
       0: aload_0
       1: getfield      #7                  // Field appId:Ljava/lang/String;
       4: aload_0
       5: getfield      #13                 // Field execId:Ljava/lang/String;
       8: aload_0
       9: getfield      #16                 // Field blockIds:[Ljava/lang/String;
      12: invokestatic  #39                 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
      15: invokedynamic #43,  0             // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
      20: areturn
```

### Does this PR introduce _any_ user-facing change?

No. This is a `toString` implementation improvement.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51572 from dongjoon-hyun/SPARK-52880.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>