forked from apache/spark
[pull] master from apache:master #58
Merged
…s.xml`

### What changes were proposed in this pull request?
This PR changes the suppressed files from `sql/core/src/main/java/org/apache/spark/sql/api.java/*` to `sql/core/src/main/java/org/apache/spark/sql/api/java/*`; the former appears to be an incorrect path.

### Why are the changes needed?
To correct the `files` content in `checkstyle-suppressions.xml`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GitHub Actions.

Closes #38469 from LuciferYang/fix-java-supperessions.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
### What changes were proposed in this pull request?
Before this PR, calling `collect()` threw an exception recommending `toPandas()` instead. With this PR, `collect()` returns a list of PySpark `Row` objects.

### Why are the changes needed?
Improve API coverage.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
UT

Closes #38409 from amaliujia/python_support_collect.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
This reverts commit 9fc3aa0.

### Why are the changes needed?
The upgrade breaks the `dev/sbt-checkstyle` script with the following error:
```
[error] org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 10; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
[error] at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
[error] at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
[error] at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
[error] at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
[error] at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1473)
[error] at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:914)
[error] at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
[error] at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
[error] at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
[error] at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
[error] at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
[error] at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
[error] at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
[error] at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
[error] at scala.xml.factory.XMLLoader.parse(XMLLoader.scala:73)
[error] at scala.xml.factory.XMLLoader.loadXML(XMLLoader.scala:54)
[error] at scala.xml.factory.XMLLoader.loadXML$(XMLLoader.scala:53)
[error] at scala.xml.XML$.loadXML(XML.scala:62)
[error] at scala.xml.factory.XMLLoader.loadString(XMLLoader.scala:92)
[error] at scala.xml.factory.XMLLoader.loadString$(XMLLoader.scala:92)
[error] at scala.xml.XML$.loadString(XML.scala:62)
[error] at com.etsy.sbt.checkstyle.Checkstyle$.checkstyle(Checkstyle.scala:35)
[error] at com.etsy.sbt.checkstyle.CheckstylePlugin$autoImport$.$anonfun$checkstyleTask$1(CheckstylePlugin.scala:36)
[error] at com.etsy.sbt.checkstyle.CheckstylePlugin$autoImport$.$anonfun$checkstyleTask$1$adapted(CheckstylePlugin.scala:34)
[error] at scala.Function1.$anonfun$compose$1(Function1.scala:49)
[error] at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:62)
[error] at sbt.std.Transform$$anon$4.work(Transform.scala:68)
[error] at sbt.Execute.$anonfun$submit$2(Execute.scala:282)
[error] at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:23)
[error] at sbt.Execute.work(Execute.scala:291)
[error] at sbt.Execute.$anonfun$submit$1(Execute.scala:282)
[error] at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
[error] at sbt.CompletionService$$anon$2.call(CompletionService.scala:64)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
```

Closes #38476 from linhongliu-db/fix-sbt.

Authored-by: Linhong Liu <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…itrary columns

### What changes were proposed in this pull request?
Support DataFrame creation from 2d NumPy array with arbitrary columns.

### Why are the changes needed?
Currently, DataFrame creation from 2d ndarray works only with 2 columns. We should provide complete support for DataFrame creation with 2d ndarray.

As part of [SPARK-39405](https://issues.apache.org/jira/browse/SPARK-39405).

### Does this PR introduce _any_ user-facing change?
Yes.

Before
```py
>>> spark.createDataFrame(np.array([[1], [2]])).dtypes
Traceback (most recent call last):
...
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (2, 1), indices imply (2, 2)

>>> spark.createDataFrame(np.array([[1, 1, 1], [2, 2, 2]])).dtypes
Traceback (most recent call last):
...
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (2, 3), indices imply (2, 2)
```

After
```py
>>> spark.createDataFrame(np.array([[1], [2]])).dtypes
[('value', 'bigint')]

>>> spark.createDataFrame(np.array([[1, 1, 1], [2, 2, 2]])).dtypes
[('_1', 'bigint'), ('_2', 'bigint'), ('_3', 'bigint')]
```

### How was this patch tested?
Unit tests.

Closes #38473 from xinrong-meng/ncol_ndarr.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
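The default column naming visible in the "After" output can be sketched in plain Python. This is only an illustration of the observed behavior, not PySpark's implementation: `default_column_names` is a hypothetical helper, and the names for widths other than the one-column and three-column cases shown in that output are extrapolated from it.

```python
from typing import List, Sequence


def default_column_names(rows: Sequence[Sequence[object]]) -> List[str]:
    """Hypothetical helper: guess default column names for 2D row data.

    Mirrors the "After" output: a single column is named "value",
    while wider data gets positional names "_1", "_2", ...
    """
    if not rows:
        raise ValueError("expected at least one row")
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("all rows must have the same number of columns")
    if ncols == 1:
        return ["value"]
    return [f"_{i}" for i in range(1, ncols + 1)]


# Same shapes as the two examples in the commit message.
names_one = default_column_names([[1], [2]])
names_three = default_column_names([[1, 1, 1], [2, 2, 2]])
```

The helper takes nested lists rather than an ndarray so that the sketch has no dependency beyond the standard library.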
…or classes

### What changes were proposed in this pull request?
This PR replaces `TypeCheckFailure` with `DataTypeMismatch` in the type checks of the conditional expressions, including:
1. If (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L61-L67
2. CaseWhen (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L175-L183
3. InSubquery (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L378-L396
4. In (1): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L453

### Why are the changes needed?
Migration onto error classes unifies Spark SQL error messages.

### Does this PR introduce _any_ user-facing change?
Yes. The PR changes user-facing error messages.

### How was this patch tested?
1. Add new UTs
2. Update existing UTs
3. Pass GA

Closes #38438 from panbingkun/SPARK-40748.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request?
A minor change to fix a Scala-related compilation warning:
```
[WARNING] /spark-source/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:105: [deprecation | origin= | version=2.13.7] Wrap `given` in backticks to use it as an identifier, it will become a keyword in Scala 3.
```

### Why are the changes needed?
Fix a Scala-related compilation warning.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GitHub Actions.

Closes #38478 from LuciferYang/minor-wrap-given.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request?
Fix a few wrong or misleading comments in DAGSchedulerSuite.

### Why are the changes needed?
The wrong or misleading comments in DAGSchedulerSuite cause confusion and make the code harder to understand.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No code changes, only comment changes. The original tests pass.

Closes #38371 from JiexingLi/fix-comments.

Authored-by: JiexingLi <[email protected]>
Signed-off-by: Mridul <mridul<at>gmail.com>
### What changes were proposed in this pull request?
This PR aims to update `cloudpickle` to `v2.2.0` for Apache Spark 3.4.0.

### Why are the changes needed?
SPARK-37457 updated `cloudpickle` to v2.0.0 in Apache Spark 3.3.0. This update brings in the latest bug fixes.
- https://github.com/cloudpipe/cloudpickle/releases/tag/v2.2.0
- https://github.com/cloudpipe/cloudpickle/releases/tag/2.1.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

Closes #38474 from dongjoon-hyun/SPARK-40991.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…lient should have a default value=1

### What changes were proposed in this pull request?
To match the existing Python DataFrame API, this PR makes `Range.step` required, while the Python client keeps `1` as the default value for this field.

### Why are the changes needed?
Matching the existing DataFrame API.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
UT

Closes #38471 from amaliujia/range_step_required.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
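A minimal sketch of the split described here, assuming a hypothetical `RangeMessage` dataclass standing in for the message and a hypothetical `client_range` function standing in for the Python client; neither is the actual Spark Connect code. The message has no default for `step`, while the client fills in `1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeMessage:
    """Hypothetical stand-in for the message: every field is required."""

    start: int
    end: int
    step: int  # no default here, so whoever builds the message must supply it


def client_range(start: int, end: int, step: int = 1) -> RangeMessage:
    """Hypothetical client-side helper: the default of 1 lives only here."""
    return RangeMessage(start=start, end=end, step=step)


default_msg = client_range(0, 10)           # step omitted by the caller
explicit_msg = client_range(0, 10, step=2)  # step given explicitly
```

Keeping the default on the client side means the message always carries an explicit step, so the receiving side never has to guess what an unset value means.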
pull bot pushed a commit that referenced this pull request on Nov 22, 2024
…ead pool

### What changes were proposed in this pull request?
This PR aims to use a meaningful class name prefix for the REST Submission API thread pool instead of the default value of Jetty QueuedThreadPool, `"qtp"+super.hashCode()`.

https://github.com/dekellum/jetty/blob/3dc0120d573816de7d6a83e2d6a97035288bdd4a/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L64

### Why are the changes needed?
This is helpful during JVM investigation.

**BEFORE (4.0.0-preview2)**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28217 | grep qtp
"qtp1925630411-52" #52 daemon prio=5 os_prio=31 cpu=0.07ms elapsed=19.06s tid=0x0000000134906c10 nid=0xde03 runnable [0x0000000314592000]
"qtp1925630411-53" #53 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134ac6810 nid=0xc603 runnable [0x000000031479e000]
"qtp1925630411-54" #54 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x000000013491ae10 nid=0xdc03 runnable [0x00000003149aa000]
"qtp1925630411-55" #55 daemon prio=5 os_prio=31 cpu=0.08ms elapsed=19.06s tid=0x0000000134ac9810 nid=0xc803 runnable [0x0000000314bb6000]
"qtp1925630411-56" #56 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134ac9e10 nid=0xda03 runnable [0x0000000314dc2000]
"qtp1925630411-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134aca410 nid=0xca03 runnable [0x0000000314fce000]
"qtp1925630411-58" #58 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134acaa10 nid=0xcb03 runnable [0x00000003151da000]
"qtp1925630411-59" #59 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x0000000134acb010 nid=0xcc03 runnable [0x00000003153e6000]
"qtp1925630411-60-acceptor-0108e9815-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.11ms elapsed=19.06s tid=0x00000001317ffa10 nid=0xcd03 runnable [0x00000003155f2000]
"qtp1925630411-61-acceptor-11d90f2aa-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.10ms elapsed=19.06s tid=0x00000001314ed610 nid=0xcf03 waiting on condition [0x00000003157fe000]
```

**AFTER**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28317 | grep StandaloneRestServer
"StandaloneRestServer-52" #52 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284a8e10 nid=0xdb03 runnable [0x000000032cfce000]
"StandaloneRestServer-53" #53 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284acc10 nid=0xda03 runnable [0x000000032d1da000]
"StandaloneRestServer-54" #54 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284ae610 nid=0xd803 runnable [0x000000032d3e6000]
"StandaloneRestServer-55" #55 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284aec10 nid=0xd703 runnable [0x000000032d5f2000]
"StandaloneRestServer-56" #56 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284af210 nid=0xc803 runnable [0x000000032d7fe000]
"StandaloneRestServer-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284af810 nid=0xc903 runnable [0x000000032da0a000]
"StandaloneRestServer-58" #58 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284afe10 nid=0xcb03 runnable [0x000000032dc16000]
"StandaloneRestServer-59" #59 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284b0410 nid=0xcc03 runnable [0x000000032de22000]
"StandaloneRestServer-60-acceptor-04aefbaa8-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.13ms elapsed=60.05s tid=0x000000015cda1a10 nid=0xcd03 runnable [0x000000032e02e000]
"StandaloneRestServer-61-acceptor-148976251-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.12ms elapsed=60.05s tid=0x000000015cd1c810 nid=0xce03 waiting on condition [0x000000032e23a000]
```

### Does this PR introduce _any_ user-facing change?
No. The thread names are only visible during debugging.

### How was this patch tested?
Manual review.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48924 from dongjoon-hyun/SPARK-50385.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: panbingkun <[email protected]>
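The same idea, giving pool threads a recognizable name prefix so they are easy to find in a thread dump, can be illustrated with Python's standard library. This is an analogy only, not the Jetty `QueuedThreadPool` change made in this PR; the prefix string is simply borrowed from the `jstack` output in the commit message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Prefix borrowed from the jstack output, used here purely for illustration.
PREFIX = "StandaloneRestServer"


def current_thread_name() -> str:
    """Return the name of the worker thread running this task."""
    return threading.current_thread().name


# thread_name_prefix replaces the executor's generic default worker names.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix=PREFIX) as pool:
    worker_name = pool.submit(current_thread_name).result()
```

With the prefix set, the worker's name starts with `StandaloneRestServer`, which is what makes the `grep StandaloneRestServer` style of filtering possible.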
See Commits and Changes for more details.
Created by pull[bot]