forked from apache/spark
[pull] master from apache:master #55
Merged
…Tests`: conversion and drop methods
### What changes were proposed in this pull request?
Moving slow tests out of `IndexesTests`: conversion and drop methods
### Why are the changes needed?
for testing parallelism
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
updated ci
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44536 from zhengruifeng/ps_test_idx_base_conversion_drop.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request?
In the PR, I propose to create `SparkThrowable` sub-classes only with an error class by making the constructor with `message` private.
### Why are the changes needed?
To improve user experience with Spark SQL by unifying error exceptions: the final goal is that all Spark exceptions should contain an error class.
### Does this PR introduce _any_ user-facing change?
No, since user code shouldn't throw `SparkThrowable` sub-classes, though it can be affected if it depends on error message formats.
### How was this patch tested?
By existing test suites like:
```
$ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44464 from MaxGekk/ban-messages-SparkThrowable-subclass.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…rray`
### What changes were proposed in this pull request?
This PR refines the docstrings of `get/array_zip/sort_array` and adds some new examples.
### Why are the changes needed?
To improve PySpark documentation
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44545 from LuciferYang/array-functions-sort.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
…dexesTests`
### What changes were proposed in this pull request?
Moving slow tests out of `IndexesTests`
### Why are the changes needed?
for testing parallelism
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ci
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44543 from zhengruifeng/ps_test_idx_base_sort_take.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…ffle`
### What changes were proposed in this pull request?
This PR refines the docstrings of `flatten/sequence/shuffle` and adds some new examples.
### Why are the changes needed?
To improve PySpark documentation
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44548 from LuciferYang/SPARK-46551.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
### What changes were proposed in this pull request?
This PR aims to upgrade `slf4j` from 2.0.9 to 2.0.10.
### Why are the changes needed?
The release notes are as follows:
- https://www.slf4j.org/news.html#2.0.10
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GA.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44544 from panbingkun/slf4j_api_2010.
Authored-by: panbingkun <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
…ror for empty fields
### What changes were proposed in this pull request?
Make `json_tuple` throw PySparkValueError for empty fields
### Why are the changes needed?
Python side should have the same check as the Scala side:
https://github.com/apache/spark/blob/fa4096eb6aba4c66f0d9c5dcbabdfc0804064fff/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L6330-L6334
### Does this PR introduce _any_ user-facing change?
yes
### How was this patch tested?
added ut
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44534 from zhengruifeng/py_check_functions.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
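The empty-fields check described in the commit above can be illustrated with a minimal, dependency-free sketch. The function name `check_json_tuple_fields` and the error text are hypothetical, and the built-in `ValueError` stands in for `PySparkValueError`; this is not PySpark's actual implementation.

```python
def check_json_tuple_fields(*fields: str) -> None:
    """Hypothetical stand-in for the argument check; not PySpark's actual code."""
    # Reject a call that passes no field names at all.
    if len(fields) == 0:
        # PySpark raises PySparkValueError; ValueError keeps this sketch standalone.
        raise ValueError("json_tuple requires at least one field name")


# A call with field names passes silently.
check_json_tuple_fields("f1", "f2")

# A call without field names fails fast on the Python side.
try:
    check_json_tuple_fields()
except ValueError as exc:
    error_message = str(exc)
```

Failing on the Python side gives the caller a Python exception at the call site rather than an error surfaced later from the JVM.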
…View/createOrReplaceTempView
### What changes were proposed in this pull request?
This PR proposes to improve the docstring of `DataFrame.createTempView` and `DataFrame.createOrReplaceTempView`.
### Why are the changes needed?
For better usability.
### Does this PR introduce _any_ user-facing change?
Yes, it improves user-facing documentation.
### How was this patch tested?
Manually ran the tests via:
```bash
python/run-tests --python-executable=python3 --testnames 'pyspark.sql.dataframe'
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44551 from HyukjinKwon/SPARK-46555.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…alTempView/createOrReplaceGlobalTempView
### What changes were proposed in this pull request?
This PR proposes to improve the docstring of `DataFrame.createGlobalTempView` and `DataFrame.createOrReplaceGlobalTempView`.
### Why are the changes needed?
For better usability.
### Does this PR introduce _any_ user-facing change?
Yes, it improves user-facing documentation.
### How was this patch tested?
Manually ran the tests via:
```bash
python/run-tests --python-executable=python3 --testnames 'pyspark.sql.dataframe'
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44552 from HyukjinKwon/SPARK-46556.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…ad function outputs named Row objects
### What changes were proposed in this pull request?
This PR fixes an issue when the `read` method of Python DataSourceReader yields named `Row` objects.
Currently, it ignores the name in the Row object:
```Python
def read(self, partition):
    yield Row(a=1, b=2)
    yield Row(b=3, a=2)
```
The result should be `[Row(a=1, b=2), Row(a=2, b=3)]`, instead of `[Row(a=1, b=2), Row(a=3, b=2)]`.
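The intended behavior, matching values to columns by name rather than by position, can be sketched in plain Python. Here dictionaries stand in for named `Row` objects, and `align_to_schema` and `schema_fields` are hypothetical names used only for illustration; the actual fix lives in PySpark's data source worker code.

```python
def align_to_schema(row: dict, schema_fields: list) -> tuple:
    """Order a named row's values by the schema's field order (illustrative only)."""
    # Look each value up by name, so the order used when the row was built is irrelevant.
    return tuple(row[name] for name in schema_fields)


schema_fields = ["a", "b"]
# Stand-ins for Row(a=1, b=2) and Row(b=3, a=2).
rows = [dict(a=1, b=2), dict(b=3, a=2)]

aligned = [align_to_schema(row, schema_fields) for row in rows]
print(aligned)  # [(1, 2), (2, 3)]
```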
### Why are the changes needed?
To fix an incorrect behavior.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44531 from allisonwang-db/spark-46540-named-rows.
Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…lain/printSchema
### What changes were proposed in this pull request?
This PR proposes to improve the docstring of `DataFrame.schema`, `DataFrame.explain` and `DataFrame.printSchema`.
### Why are the changes needed?
For better usability.
### Does this PR introduce _any_ user-facing change?
Yes, it improves user-facing documentation.
### How was this patch tested?
Manually ran the tests via:
```bash
python/run-tests --python-executable=python3 --testnames 'pyspark.sql.dataframe'
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44553 from HyukjinKwon/SPARK-46557.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
This PR proposes to issue a `FutureWarning` for `(DataFrame|Series).interpolate` with object dtype.
### Why are the changes needed?
To match the behavior of pandas. Using object dtype for `interpolate` is deprecated and will raise an exception in a future version, so we should issue the same warning that pandas does.
### Does this PR introduce _any_ user-facing change?
Given the DataFrame below,
```python
>>> psdf = ps.DataFrame({"A": ['a', 'b', 'c'], "B": [1, 2, 3]})
>>> psdf
   A  B
0  a  1
1  b  2
2  c  3
```
**Before**
```python
>>> psdf.interpolate() # Excluding column with object dtype without any warning unlike pandas
   B
0  1
1  2
2  3
```
**After**
```python
>>> psdf.interpolate() # Issuing a proper warning
FutureWarning: DataFrame.interpolate with object dtype is deprecated and will raise in a future version. Call df.infer_objects(copy=False) before interpolating instead.
warnings.warn(
   B
0  1
1  2
2  3
```
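The before/after behavior above amounts to emitting a `FutureWarning` when any column has object dtype while still excluding those columns from the result. A minimal sketch using only the standard library follows; `columns_to_interpolate` and the dtype mapping are hypothetical and are not the pandas-on-Spark implementation.

```python
import warnings


def columns_to_interpolate(dtypes: dict) -> list:
    """Return the non-object columns, warning if an object column is present."""
    if any(dtype == "object" for dtype in dtypes.values()):
        # Same category as the warning shown in the "After" output.
        warnings.warn(
            "DataFrame.interpolate with object dtype is deprecated and will "
            "raise in a future version.",
            FutureWarning,
            stacklevel=2,
        )
    # Object columns are still dropped, so the result itself is unchanged.
    return [name for name, dtype in dtypes.items() if dtype != "object"]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = columns_to_interpolate({"A": "object", "B": "int64"})

print(kept)  # ['B']
```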
### How was this patch tested?
No behavior changes, so the existing CI should pass.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44550 from itholic/SPARK-46553.
Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
Support v2 DESCRIBE TABLE EXTENDED with table stats
### Why are the changes needed?
Similar to #40058, bring the DS v1 and v2 commands to parity, e.g.
DESC EXTENDED table
| col_name | data_type | comment |
|-------------------|---------------------------|------------|
| ... | ... | ... |
| Statistics | 864 bytes, 2 rows | |
| ... | ... | ... |
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added the test `describe extended table with stats`
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44535 from Zouxxyy/dev/desc-table-stats.
Lead-authored-by: zouxxyy <[email protected]>
Co-authored-by: Zouxxyy <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…licate code that retrieves `MessageParameters` from `ErrorParams` in `GrpcExceptionConverter`
### What changes were proposed in this pull request?
This PR extracts a helper function `errorParamsToMessageParameters` to eliminate the duplicate code that retrieves `MessageParameters` from `ErrorParams` in `GrpcExceptionConverter`
### Why are the changes needed?
Eliminate the duplicate code.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44554 from LuciferYang/SPARK-46558.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…icks
### What changes were proposed in this pull request?
This PR wraps `export` in the package name with backticks because `export` will become a keyword in Scala 3.
### Why are the changes needed?
`export` will become a keyword in Scala 3. The Scala 2.13 compiler checks for relevant cases in the code, but it does not check package names. However, if we write a Scala file as follows and compile it with Scala 3,
```scala
package org.apache.spark.mllib.pmml.export
private class ExportABC() {}
```
it will throw an error:
```
bin/scalac test.scala -explain
-- [E040] Syntax Error: test.scala:1:36 ----------------------------------------
1 |package org.apache.spark.mllib.pmml.export
| ^^^^^^
| an identifier expected, but 'export' found
|-----------------------------------------------------------------------------
| Explanation (enabled by `-explain`)
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
| If you want to use 'export' as identifier, you may put it in backticks: `export`.
-----------------------------------------------------------------------------
1 error found
```
We can work around this by wrapping `export` in backticks.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44555 from LuciferYang/export-backtick.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
pull bot pushed a commit that referenced this pull request Nov 22, 2024
…ead pool
### What changes were proposed in this pull request?
This PR aims to use a meaningful class name prefix for REST Submission API thread pool instead of the default value of Jetty QueuedThreadPool, `"qtp"+super.hashCode()`.
https://github.com/dekellum/jetty/blob/3dc0120d573816de7d6a83e2d6a97035288bdd4a/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L64
### Why are the changes needed?
This is helpful during JVM investigation.
**BEFORE (4.0.0-preview2)**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28217 | grep qtp
"qtp1925630411-52" #52 daemon prio=5 os_prio=31 cpu=0.07ms elapsed=19.06s tid=0x0000000134906c10 nid=0xde03 runnable [0x0000000314592000]
"qtp1925630411-53" #53 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134ac6810 nid=0xc603 runnable [0x000000031479e000]
"qtp1925630411-54" #54 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x000000013491ae10 nid=0xdc03 runnable [0x00000003149aa000]
"qtp1925630411-55" #55 daemon prio=5 os_prio=31 cpu=0.08ms elapsed=19.06s tid=0x0000000134ac9810 nid=0xc803 runnable [0x0000000314bb6000]
"qtp1925630411-56" #56 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134ac9e10 nid=0xda03 runnable [0x0000000314dc2000]
"qtp1925630411-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134aca410 nid=0xca03 runnable [0x0000000314fce000]
"qtp1925630411-58" #58 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134acaa10 nid=0xcb03 runnable [0x00000003151da000]
"qtp1925630411-59" #59 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x0000000134acb010 nid=0xcc03 runnable [0x00000003153e6000]
"qtp1925630411-60-acceptor-0108e9815-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.11ms elapsed=19.06s tid=0x00000001317ffa10 nid=0xcd03 runnable [0x00000003155f2000]
"qtp1925630411-61-acceptor-11d90f2aa-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.10ms elapsed=19.06s tid=0x00000001314ed610 nid=0xcf03 waiting on condition [0x00000003157fe000]
```
**AFTER**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28317 | grep StandaloneRestServer
"StandaloneRestServer-52" #52 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284a8e10 nid=0xdb03 runnable [0x000000032cfce000]
"StandaloneRestServer-53" #53 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284acc10 nid=0xda03 runnable [0x000000032d1da000]
"StandaloneRestServer-54" #54 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284ae610 nid=0xd803 runnable [0x000000032d3e6000]
"StandaloneRestServer-55" #55 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284aec10 nid=0xd703 runnable [0x000000032d5f2000]
"StandaloneRestServer-56" #56 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284af210 nid=0xc803 runnable [0x000000032d7fe000]
"StandaloneRestServer-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284af810 nid=0xc903 runnable [0x000000032da0a000]
"StandaloneRestServer-58" #58 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284afe10 nid=0xcb03 runnable [0x000000032dc16000]
"StandaloneRestServer-59" #59 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284b0410 nid=0xcc03 runnable [0x000000032de22000]
"StandaloneRestServer-60-acceptor-04aefbaa8-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.13ms elapsed=60.05s tid=0x000000015cda1a10 nid=0xcd03 runnable [0x000000032e02e000]
"StandaloneRestServer-61-acceptor-148976251-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.12ms elapsed=60.05s tid=0x000000015cd1c810 nid=0xce03 waiting on condition [0x000000032e23a000]
```
### Does this PR introduce _any_ user-facing change?
No, the thread names are accessed during the debugging.
### How was this patch tested?
Manual review.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes apache#48924 from dongjoon-hyun/SPARK-50385.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: panbingkun <[email protected]>
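The idea of giving pool threads a recognizable name prefix is not specific to Jetty. As a rough analogue only (this is Python's standard library, not the Jetty `QueuedThreadPool` API used by the commit above), the sketch below names worker threads with the `thread_name_prefix` argument of `ThreadPoolExecutor`; the prefix value is borrowed from the AFTER output purely for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Name the pool's worker threads so they are easy to spot in a thread dump.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="StandaloneRestServer") as pool:
    # Ask a worker to report the name of the thread it runs on.
    worker_name = pool.submit(lambda: threading.current_thread().name).result()

print(worker_name)
```

With a prefix in place, filtering a dump by name selects exactly the pool of interest, which is the same benefit the `grep StandaloneRestServer` call shows.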
pull bot pushed a commit that referenced this pull request Jul 21, 2025
…ingBuilder`
### What changes were proposed in this pull request?
This PR aims to improve `toString` by `JEP-280` instead of `ToStringBuilder`. In addition, `Scalastyle` and `Checkstyle` rules are added to prevent a future regression.
### Why are the changes needed?
Since Java 9, `String Concatenation` has been handled better by default.
| ID | DESCRIPTION |
| - | - |
| JEP-280 | [Indify String Concatenation](https://openjdk.org/jeps/280) |

For example, this PR improves `OpenBlocks` like the following. Both Java source code and byte code are simplified a lot by utilizing JEP-280 properly.
**CODE CHANGE**
```java
- return new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE)
-   .append("appId", appId)
-   .append("execId", execId)
-   .append("blockIds", Arrays.toString(blockIds))
-   .toString();
+ return "OpenBlocks[appId=" + appId + ",execId=" + execId + ",blockIds=" +
+     Arrays.toString(blockIds) + "]";
```
**BEFORE**
```
public java.lang.String toString();
  Code:
     0: new #39 // class org/apache/commons/lang3/builder/ToStringBuilder
     3: dup
     4: aload_0
     5: getstatic #41 // Field org/apache/commons/lang3/builder/ToStringStyle.SHORT_PREFIX_STYLE:Lorg/apache/commons/lang3/builder/ToStringStyle;
     8: invokespecial #47 // Method org/apache/commons/lang3/builder/ToStringBuilder."<init>":(Ljava/lang/Object;Lorg/apache/commons/lang3/builder/ToStringStyle;)V
    11: ldc #50 // String appId
    13: aload_0
    14: getfield #7 // Field appId:Ljava/lang/String;
    17: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
    20: ldc #55 // String execId
    22: aload_0
    23: getfield #13 // Field execId:Ljava/lang/String;
    26: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
    29: ldc #56 // String blockIds
    31: aload_0
    32: getfield #16 // Field blockIds:[Ljava/lang/String;
    35: invokestatic #57 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
    38: invokevirtual #51 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
    41: invokevirtual #61 // Method org/apache/commons/lang3/builder/ToStringBuilder.toString:()Ljava/lang/String;
    44: areturn
```
**AFTER**
```
public java.lang.String toString();
  Code:
     0: aload_0
     1: getfield #7 // Field appId:Ljava/lang/String;
     4: aload_0
     5: getfield #13 // Field execId:Ljava/lang/String;
     8: aload_0
     9: getfield #16 // Field blockIds:[Ljava/lang/String;
    12: invokestatic #39 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
    15: invokedynamic #43, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
    20: areturn
```
### Does this PR introduce _any_ user-facing change?
No. This is a `toString` implementation improvement.
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes apache#51572 from dongjoon-hyun/SPARK-52880.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
See Commits and Changes for more details.
Created by pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )