Conversation

pull[bot] commented Nov 2, 2022

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

LuciferYang and others added 4 commits November 1, 2022 18:10
…s.xml`

### What changes were proposed in this pull request?
This PR changes the suppressed files path from `sql/core/src/main/java/org/apache/spark/sql/api.java/*` to `sql/core/src/main/java/org/apache/spark/sql/api/java/*`; the former appears to be an incorrect path.

### Why are the changes needed?
Corrects the `files` content in `checkstyle-suppressions.xml`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #38469 from LuciferYang/fix-java-supperessions.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
### What changes were proposed in this pull request?

Before this PR, calling `collect()` threw an exception recommending `toPandas()` instead.

With this PR, `collect()` returns a list of PySpark `Row` objects.

### Why are the changes needed?

Improve API coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38409 from amaliujia/python_support_collect.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
This reverts commit 9fc3aa0.

### Why are the changes needed?
The upgrade breaks the `dev/sbt-checkstyle` script; the error is below:
```
[error] org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 10; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
[error]         at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
[error]         at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
[error]         at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
[error]         at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
[error]         at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1473)
[error]         at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:914)
[error]         at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
[error]         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
[error]         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
[error]         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
[error]         at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
[error]         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
[error]         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
[error]         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
[error]         at scala.xml.factory.XMLLoader.parse(XMLLoader.scala:73)
[error]         at scala.xml.factory.XMLLoader.loadXML(XMLLoader.scala:54)
[error]         at scala.xml.factory.XMLLoader.loadXML$(XMLLoader.scala:53)
[error]         at scala.xml.XML$.loadXML(XML.scala:62)
[error]         at scala.xml.factory.XMLLoader.loadString(XMLLoader.scala:92)
[error]         at scala.xml.factory.XMLLoader.loadString$(XMLLoader.scala:92)
[error]         at scala.xml.XML$.loadString(XML.scala:62)
[error]         at com.etsy.sbt.checkstyle.Checkstyle$.checkstyle(Checkstyle.scala:35)
[error]         at com.etsy.sbt.checkstyle.CheckstylePlugin$autoImport$.$anonfun$checkstyleTask$1(CheckstylePlugin.scala:36)
[error]         at com.etsy.sbt.checkstyle.CheckstylePlugin$autoImport$.$anonfun$checkstyleTask$1$adapted(CheckstylePlugin.scala:34)
[error]         at scala.Function1.$anonfun$compose$1(Function1.scala:49)
[error]         at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:62)
[error]         at sbt.std.Transform$$anon$4.work(Transform.scala:68)
[error]         at sbt.Execute.$anonfun$submit$2(Execute.scala:282)
[error]         at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:23)
[error]         at sbt.Execute.work(Execute.scala:291)
[error]         at sbt.Execute.$anonfun$submit$1(Execute.scala:282)
[error]         at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
[error]         at sbt.CompletionService$$anon$2.call(CompletionService.scala:64)
[error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]         at java.lang.Thread.run(Thread.java:748)
```
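
For context, a minimal Scala sketch of the failure mode (not part of this revert, and assuming scala-xml 2.1+, whose default parser is hardened against DTDs) and of how a caller that trusts its input could opt out of the secure default:

```scala
import javax.xml.parsers.SAXParserFactory
import scala.xml.XML

val doc =
  """<?xml version="1.0"?>
    |<!DOCTYPE module PUBLIC "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    |  "https://checkstyle.org/dtds/configuration_1_3.dtd">
    |<module name="Checker"/>""".stripMargin

// Under the hardened defaults this line reproduces the stack trace above:
// XML.loadString(doc)  // SAXParseException: DOCTYPE is disallowed ...

// Supplying a custom SAXParser re-enables DOCTYPE declarations:
val factory = SAXParserFactory.newInstance()
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
val module = XML.withSAXParser(factory.newSAXParser()).loadString(doc)
```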

Closes #38476 from linhongliu-db/fix-sbt.

Authored-by: Linhong Liu <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…itrary columns

### What changes were proposed in this pull request?
Support DataFrame creation from a 2d NumPy array with an arbitrary number of columns.

### Why are the changes needed?
Currently, DataFrame creation from a 2d ndarray works only when the array has exactly 2 columns. We should provide complete support for DataFrame creation from a 2d ndarray.

As part of [SPARK-39405](https://issues.apache.org/jira/browse/SPARK-39405).

### Does this PR introduce _any_ user-facing change?
Yes.
Before
```py
>>> spark.createDataFrame(np.array([[1], [2]])).dtypes
Traceback (most recent call last):
...
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (2, 1), indices imply (2, 2)

>>> spark.createDataFrame(np.array([[1, 1, 1], [2, 2, 2]])).dtypes
Traceback (most recent call last):
...
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (2, 3), indices imply (2, 2)
```

After
```py
>>> spark.createDataFrame(np.array([[1], [2]])).dtypes
[('value', 'bigint')]

>>> spark.createDataFrame(np.array([[1, 1, 1], [2, 2, 2]])).dtypes
[('_1', 'bigint'), ('_2', 'bigint'), ('_3', 'bigint')]
```

### How was this patch tested?
Unit tests.

Closes #38473 from xinrong-meng/ncol_ndarr.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…or classes

### What changes were proposed in this pull request?
This PR replaces `TypeCheckFailure` with `DataTypeMismatch` in the type checks of the conditional expressions, which include the following (a sketch of the migration pattern follows the list):

1. If (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L61-L67
2. CaseWhen (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L175-L183
3. InSubquery (2):
https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L378-L396
4. In (1):
https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L453
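
As an illustration, a minimal sketch of the migration pattern (the standalone check below is hypothetical; the sub-class and parameter names mirror the `DATATYPE_MISMATCH` error class):

```scala
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{DataTypeMismatch, TypeCheckSuccess}
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.BooleanType

// Before: a free-form message
//   TypeCheckFailure(s"type of predicate expression in If should be boolean, not ${pred.dataType}")
// After: a structured error class with named message parameters
def checkPredicate(pred: Expression): TypeCheckResult =
  if (pred.dataType == BooleanType) {
    TypeCheckSuccess
  } else {
    DataTypeMismatch(
      errorSubClass = "UNEXPECTED_INPUT_TYPE",
      messageParameters = Map(
        "paramIndex" -> "1",
        "requiredType" -> "\"BOOLEAN\"",
        "inputSql" -> pred.sql,
        "inputType" -> pred.dataType.catalogString))
  }
```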

### Why are the changes needed?
Migration onto error classes unifies Spark SQL error messages.

### Does this PR introduce _any_ user-facing change?
Yes. The PR changes user-facing error messages.

### How was this patch tested?
1. Add new UT
2. Update existing UT
3. Pass GA

Closes #38438 from panbingkun/SPARK-40748.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
LuciferYang and others added 4 commits November 2, 2022 08:19
### What changes were proposed in this pull request?
A minor change to fix a Scala-related compilation warning:

```
[WARNING] /spark-source/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:105: [deprecation   | origin= | version=2.13.7] Wrap `given` in backticks to use it as an identifier, it will become a keyword in Scala 3.
```
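
The fix is mechanical; a minimal illustration with a hypothetical identifier (not the actual code in `QueryCompilationErrors.scala`):

```scala
// Warns on Scala 2.13.7+: `given` becomes a keyword in Scala 3
def describe(given: String): String = given.toUpperCase

// Warning-free and Scala 3 ready: wrap the identifier in backticks
def describeQuiet(`given`: String): String = `given`.toUpperCase
```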

### Why are the changes needed?
Fix a Scala-related compilation warning.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #38478 from LuciferYang/minor-wrap-given.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request?
Fix a few wrong or misleading comments in DAGSchedulerSuite.

### Why are the changes needed?
The wrong or misleading comments in DAGSchedulerSuite introduce confusion and make the code harder to understand.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No code changes, pure comment changes.
Original tests pass.

Closes #38371 from JiexingLi/fix-comments.

Authored-by: JiexingLi <[email protected]>
Signed-off-by: Mridul <mridul<at>gmail.com>
### What changes were proposed in this pull request?

This PR aims to update `cloudpickle` to `v2.2.0` for Apache Spark 3.4.0.

### Why are the changes needed?

SPARK-37457 updated `cloudpickle` to v2.0.0 in Apache Spark 3.3.0.

This update brings in the latest bug fixes:
- https://github.com/cloudpipe/cloudpickle/releases/tag/v2.2.0
- https://github.com/cloudpipe/cloudpickle/releases/tag/2.1.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #38474 from dongjoon-hyun/SPARK-40991.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…lient should have a default value=1

### What changes were proposed in this pull request?

To match the existing Python DataFrame API, this PR makes `Range.step` a required field, with the Python client keeping `1` as the default value for it.

### Why are the changes needed?

Matching the existing DataFrame API.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38471 from amaliujia/range_step_required.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
wangyum merged commit f03fdf9 into wangyum:master Nov 2, 2022
pull bot pushed a commit that referenced this pull request Nov 22, 2024
…ead pool

### What changes were proposed in this pull request?

This PR aims to use a meaningful class-name prefix for the REST Submission API thread pool instead of Jetty QueuedThreadPool's default, `"qtp"+super.hashCode()`.

https://github.com/dekellum/jetty/blob/3dc0120d573816de7d6a83e2d6a97035288bdd4a/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L64
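
A minimal sketch of the underlying Jetty API (illustrative only; the actual Spark wiring differs):

```scala
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.util.thread.QueuedThreadPool

// Naming the pool up front replaces Jetty's default "qtp<hashCode>-<n>"
// thread-name prefix with a recognizable one in jstack output.
val pool = new QueuedThreadPool(200)
pool.setName("StandaloneRestServer")
pool.setDaemon(true)
val server = new Server(pool)
```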

### Why are the changes needed?

This is helpful during JVM investigation.

**BEFORE (4.0.0-preview2)**

```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28217 | grep qtp
"qtp1925630411-52" #52 daemon prio=5 os_prio=31 cpu=0.07ms elapsed=19.06s tid=0x0000000134906c10 nid=0xde03 runnable  [0x0000000314592000]
"qtp1925630411-53" #53 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134ac6810 nid=0xc603 runnable  [0x000000031479e000]
"qtp1925630411-54" #54 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x000000013491ae10 nid=0xdc03 runnable  [0x00000003149aa000]
"qtp1925630411-55" #55 daemon prio=5 os_prio=31 cpu=0.08ms elapsed=19.06s tid=0x0000000134ac9810 nid=0xc803 runnable  [0x0000000314bb6000]
"qtp1925630411-56" #56 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134ac9e10 nid=0xda03 runnable  [0x0000000314dc2000]
"qtp1925630411-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134aca410 nid=0xca03 runnable  [0x0000000314fce000]
"qtp1925630411-58" #58 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134acaa10 nid=0xcb03 runnable  [0x00000003151da000]
"qtp1925630411-59" #59 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x0000000134acb010 nid=0xcc03 runnable  [0x00000003153e6000]
"qtp1925630411-60-acceptor-0108e9815-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.11ms elapsed=19.06s tid=0x00000001317ffa10 nid=0xcd03 runnable  [0x00000003155f2000]
"qtp1925630411-61-acceptor-11d90f2aa-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.10ms elapsed=19.06s tid=0x00000001314ed610 nid=0xcf03 waiting on condition  [0x00000003157fe000]
```

**AFTER**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28317 | grep StandaloneRestServer
"StandaloneRestServer-52" #52 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284a8e10 nid=0xdb03 runnable  [0x000000032cfce000]
"StandaloneRestServer-53" #53 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284acc10 nid=0xda03 runnable  [0x000000032d1da000]
"StandaloneRestServer-54" #54 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284ae610 nid=0xd803 runnable  [0x000000032d3e6000]
"StandaloneRestServer-55" #55 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284aec10 nid=0xd703 runnable  [0x000000032d5f2000]
"StandaloneRestServer-56" #56 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284af210 nid=0xc803 runnable  [0x000000032d7fe000]
"StandaloneRestServer-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284af810 nid=0xc903 runnable  [0x000000032da0a000]
"StandaloneRestServer-58" #58 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284afe10 nid=0xcb03 runnable  [0x000000032dc16000]
"StandaloneRestServer-59" #59 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284b0410 nid=0xcc03 runnable  [0x000000032de22000]
"StandaloneRestServer-60-acceptor-04aefbaa8-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.13ms elapsed=60.05s tid=0x000000015cda1a10 nid=0xcd03 runnable  [0x000000032e02e000]
"StandaloneRestServer-61-acceptor-148976251-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.12ms elapsed=60.05s tid=0x000000015cd1c810 nid=0xce03 waiting on condition  [0x000000032e23a000]
```

### Does this PR introduce _any_ user-facing change?

No; the thread names are only visible while debugging.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48924 from dongjoon-hyun/SPARK-50385.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: panbingkun <[email protected]>