[pull] master from apache:master #58
Merged
….3.0
### What changes were proposed in this pull request?
1. Bump jekyll to 4.3.3.
2. Loosen the dependency spec for jekyll to make updates easier.
3. Don't mention Ruby 1 or 2 in the docs.
4. Don't use `sudo` with `gem` in the docs.
### Why are the changes needed?
1. Jekyll 4.3.2 is [broken on Ruby 3.3.0][1]. Jekyll 4.3.3 [fixes the issue][2].
2. There is no need to pin Jekyll in the Gemfile since it gets pinned automatically for us in the lock file. This makes updating dependencies via `bundle update` easier.
3. Both Ruby 1 and 2 are [EOL][eol]. We should not use or reference them in the docs.
4. Installing stuff as the superuser is explicitly discouraged by both pip and gem. Pip issues this warning:
> WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
And bundler issues this warning:
> Don't run Bundler as root. Installing your bundle as root will break this application for all non-root users on this machine.
We should not encourage this pattern in our docs.
[1]: jekyll/jekyll#9510
[2]: https://github.com/jekyll/jekyll/releases/tag/v4.3.3
[eol]: https://www.ruby-lang.org/en/news/2022/04/12/ruby-2-7-6-released/
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Building the docs against Ruby 3.2.2 and 3.3.0.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44628 from nchammas/SPARK-46626-jekyll-ruby.
Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
Fixing a copy + paste typo.
### Why are the changes needed?
Readability.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44625 from grundprinzip/fix_typo.
Authored-by: Martin Grund <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
This PR aims to use SPDX short identifier as `license`'s `name` field.
- https://spdx.org/licenses/Apache-2.0.html
### Why are the changes needed?
SPDX short identifier is recommended as `name` field by `Apache Maven`.
- https://maven.apache.org/pom.html#Licenses

ASF pom file has been using it. This PR aims to match with ASF pom file.
- apache/maven-apache-parent#118
- https://github.com/apache/maven-apache-parent/blob/7888bdb8ee653ecc03b5fee136540a607193c240/pom.xml#L46
```
<name>Apache-2.0</name>
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual review.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44631 from dongjoon-hyun/SPARK-46628.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…for a key in options
### What changes were proposed in this pull request?
Before SPARK-43529, there was a check from `visitPropertyKeyValues` that throws for null values for option keys. After SPARK-43529, a new function is used to support expressions in options, but the new function lost the check. This PR adds the check back.
### Why are the changes needed?
Throw an exception when an option value is null.
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
UT
### Was this patch authored or co-authored using generative AI tooling?
NO
Closes #44615 from amaliujia/fix_create_table_options.
Lead-authored-by: Rui Wang <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
Default ignoreSurroundingSpaces to true.
### Why are the changes needed?
To handle values interspersed between elements better
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Unit tests
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44629 from shujingyang-db/IGNORE_SURROUNDING_SPACES.
Authored-by: Shujing Yang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
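As a hedged illustration of the new default on the XML source (the row tag and path are made up, and an existing SparkSession `spark` is assumed), opting back out of it would look roughly like this:

```scala
// Sketch only: restore the pre-change behaviour for surrounding whitespace.
val df = spark.read
  .format("xml")
  .option("rowTag", "book")                       // made-up row tag
  .option("ignoreSurroundingSpaces", "false")     // the option now defaults to true
  .load("/tmp/books.xml")                         // made-up path
```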
… SqlApiConfHelper
### What changes were proposed in this pull request?
This PR proposes to introduce a new object named `SqlApiConfHelper` to contain shared code between `SqlApiConf` and `SqlConf`.
### Why are the changes needed?
As of now, SqlConf accesses some of the variables of SqlApiConf, while SqlApiConf also tries to initialize SqlConf upon initialization. This PR is to avoid a potential circular dependency between SqlConf and SqlApiConf. The shared variables, or access to the shared variables, are moved to the new `SqlApiConfHelper`. So when either SqlApiConf or SqlConf wants to initialize the other side, it only initializes the same third object.
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
Existing UT
### Was this patch authored or co-authored using generative AI tooling?
NO
Closes #44602 from amaliujia/refactor_sql_api.
Authored-by: Rui Wang <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
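A rough, self-contained sketch of the shared-helper pattern this describes (all names and the config type are simplified stand-ins, not the real classes):

```scala
import java.util.concurrent.atomic.AtomicReference

object ConfHelper {
  // Both config objects depend on this third object instead of on each other.
  val confGetter: AtomicReference[() => String] =
    new AtomicReference[() => String](() => "api-default")
}

object ApiConf {
  // The lightweight API side only reads through the helper.
  def current: String = ConfHelper.confGetter.get()()
}

object InternalConf {
  // The richer side registers itself with the helper, never touching ApiConf directly.
  def register(): Unit = ConfHelper.confGetter.set(() => "internal-conf")
}
```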
This reverts commit 82d0fb4.
… when ANSI mode is on
### What changes were proposed in this pull request?
This PR is a followup of #44513 that excludes `Decimal(5, 4)` for `10.34` that cannot be represented with ANSI mode on.
### Why are the changes needed?
ANSI build is broken (https://github.com/apache/spark/actions/runs/7455394893/job/20284415710):
```
org.apache.spark.SparkArithmeticException: [NUMERIC_VALUE_OUT_OF_RANGE] 10.34 cannot be represented as Decimal(5, 4). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error, and return NULL instead. SQLSTATE: 22003
== DataFrame ==
"cast" was called from
org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite.writeParquetFiles(ParquetTypeWideningSuite.scala:113)
  at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotChangeDecimalPrecisionError(QueryExecutionErrors.scala:116)
  at org.apache.spark.sql.errors.QueryExecutionErrors.cannotChangeDecimalPrecisionError(QueryExecutionErrors.scala)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing test cases should cover.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44632 from HyukjinKwon/SPARK-40876-followup.
Lead-authored-by: Hyukjin Kwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
Fix timeline tooltip content on streaming ui
### Why are the changes needed?
The d3 v7 mouseover event has changed; we shall get the values from the 2nd parameter instead of the 1st one, which points to the mouse event.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
(screenshot of the updated tooltip omitted)
Additionally, https://issues.apache.org/jira/browse/SPARK-46631 was created to add UT for drawTimeline. We don't do it in this patch because it needs code refactoring beyond the scope of the current PR.
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44633 from yaooqinn/SPARK-46627.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kousuke Saruta <[email protected]>
…xpressions and version() expression
### What changes were proposed in this pull request?
This PR moves us a bit closer to removing the CodegenFallback class and instead relying on RuntimeReplaceable with StaticInvoke. In this PR there are the following changes:
- Doing StaticInvoke + RuntimeReplaceable against the spark version expression.
- Adding the Unevaluable trait for DateTime expressions. These expressions need to be replaced during analysis anyhow, so we explicitly forbid eval from being called.
### Why are the changes needed?
The direction is to get away from CodegenFallback. This PR moves us closer to that destination.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Running existing tests.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44261 from dbatomic/codegenfallback_removal.
Lead-authored-by: Aleksandar Tomic <[email protected]>
Co-authored-by: Aleksandar Tomic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…ffledb.StoreVersion`
### What changes were proposed in this pull request?
This pr aims to override the `toString` method for `o.a.s.network.shuffledb.StoreVersion`.
### Why are the changes needed?
Avoid displaying `StoreVersionhashCode` in the `IOException` thrown after the checkVersion check fails in RocksDBProvider/LevelDBProvider, show something like:
```
cannot read state DB with version org.apache.spark.network.shuffledb.StoreVersion1f, incompatible with current version org.apache.spark.network.shuffledb.StoreVersion3e
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Add new test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44624 from LuciferYang/SPARK-46622.
Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
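A generic sketch of the idea (not the actual Java class in the network-shuffle module; the field names are assumptions): override `toString` so messages show the version fields rather than the default `ClassName@hashCode`-style text.

```scala
// Illustrative only: a readable toString instead of the default identity-based one.
class StoreVersion(val major: Int, val minor: Int) {
  override def toString: String = s"StoreVersion[$major.$minor]"
}
```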
…ields
### What changes were proposed in this pull request?
This fixes a minor bug in literal validation. The contract of `InternalRow` is that people should call `isNullAt` instead of relying on the `get` function to return null. `InternalRow` is an abstract class and it's not guaranteed that the `get` function can work for a null field. This PR fixes the literal validation to check `isNullAt` before getting the field value.
### Why are the changes needed?
Fix bugs for specific `InternalRow` implementations.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
new test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44640 from cloud-fan/literal.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
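A hedged sketch of the contract described above (the helper name is made up): callers should consult `isNullAt` before `get`.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.DataType

// Check isNullAt first; a given InternalRow implementation may not return null from get.
def fieldValue(row: InternalRow, ordinal: Int, dataType: DataType): Option[Any] =
  if (row.isNullAt(ordinal)) None else Option(row.get(ordinal, dataType))
```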
### What changes were proposed in this pull request?
Validate XML element name on write. Spark SQL permits spaces in field names, and they may even start with a number or special characters. These field names cannot be converted to XML element names. This PR adds validation to throw an error on non-compliant XML element names. This applies only to XML write. Validation is on by default. Users can choose to disable this validation.
### Why are the changes needed?
Same as above
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
New unit test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44634 from sandip-db/SPARK-46630-xml-validate-element-name.
Lead-authored-by: Sandip Agarwala <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
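A hedged sketch of the new behaviour (the DataFrame, path, and the `validateName` option name are assumptions; `spark` is an existing SparkSession):

```scala
// A column name that is not a legal XML element name.
val df = spark.range(3).withColumnRenamed("id", "1 bad name")

// Expected to throw after this change, since "1 bad name" cannot become an XML element:
df.write.format("xml").option("rowTag", "row").save("/tmp/xml-out")

// Assumed opt-out switch for the validation described above:
// df.write.format("xml").option("rowTag", "row").option("validateName", "false").save("/tmp/xml-out")
```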
### What changes were proposed in this pull request?
Enhance the visual appeal of the Spark doc website after #40269:
#### 1. There is a weird indent on the top right side of the first paragraph of the Spark 3.5.0 doc overview page
Before this PR:
<img width="680" alt="image" src="https://github.com/apache/spark/assets/1097932/84d21ca1-a4d0-4bd4-8f20-a34fa5db4000">
After this PR:
<img width="1035" alt="image" src="https://github.com/apache/spark/assets/1097932/4ffc0d5a-ed75-44c5-b20a-475ff401afa8">
#### 2. All the titles are too big and therefore less readable.
On the website https://spark.apache.org/downloads.html titles are h2, while on the doc site https://spark.apache.org/docs/latest/ titles are h1. So we should make the font size of titles smaller.
Before this PR:
<img width="935" alt="image" src="https://github.com/apache/spark/assets/1097932/5bbbd9eb-432a-42c0-98be-ff00a9099cd6">
After this PR:
<img width="965" alt="image" src="https://github.com/apache/spark/assets/1097932/dc94c1fb-6ac1-41a8-b4a4-19b3034125d7">
#### 3. The banner image can't be displayed correctly. Even when it shows up, it will be covered by the text.
To keep it simple, let's not show the banner image, as we did in https://spark.apache.org/docs/3.4.2/
<img width="570" alt="image" src="https://github.com/apache/spark/assets/1097932/f6d34261-a352-44e2-9633-6e96b311a0b3">
<img width="1228" alt="image" src="https://github.com/apache/spark/assets/1097932/c49ce6b6-13d9-4d8f-97a9-7ed8b037be57">
### Why are the changes needed?
Improve the visual appeal of the Spark doc website
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually build the docs and verify on a local setup.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44642 from gengliangwang/enhance_doc.
Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
Refactor `data_type_ops` tests again (the previous pr #44592 has been reverted)
### Why are the changes needed?
make `OpsTestBase` reusable and reuse it in the parity tests
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
ci
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44637 from zhengruifeng/ps_test_rere_data_type_ops_again.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
Retry `test_map_in_pandas_with_column_vector` and its parity test
### Why are the changes needed?
I am seeing this test and its parity test failing from time to time, which then fails `pyspark-sql` and `pyspark-connect`. It seems due to some log4j issue, e.g. https://github.com/zhengruifeng/spark/actions/runs/7459243602/job/20294868487
```
test_map_in_pandas_with_column_vector (pyspark.sql.tests.pandas.test_pandas_map.MapInPandasTests) ... ERROR StatusConsoleListener An exception occurred processing Appender File
 java.lang.IllegalArgumentException: found 1 argument placeholders, but provided 0 for pattern `0, VisitedIndex{visitedIndexes={}}: [] r:0`
  at org.apache.logging.log4j.message.ParameterFormatter.formatMessage(ParameterFormatter.java:233)
```
https://github.com/apache/spark/actions/runs/7460093200/job/20297508703
```
test_map_in_pandas_with_column_vector (pyspark.sql.tests.connect.test_parity_pandas_map.MapInPandasParityTests) ... ERROR StatusConsoleListener An exception occurred processing Appender File
 java.lang.IllegalArgumentException: found 1 argument placeholders, but provided 0 for pattern `0, VisitedIndex{visitedIndexes={}}: [] r:0`
  at org.apache.logging.log4j.message.ParameterFormatter.formatMessage(ParameterFormatter.java:233)
  at org.apache.logging.log4j.message.ParameterizedMessage.formatTo(ParameterizedMessage.java:266)
  at
```
This PR simply attempts to retry it after failures.
### Does this PR introduce _any_ user-facing change?
no, test-only
### How was this patch tested?
ci
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44641 from zhengruifeng/py_test_retry_mip.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
Add [custom Jekyll tags][custom] to enable us to conditionally include files in our documentation build in a more user-friendly manner. [This example][example] demonstrates how a custom tag can build on one of Jekyll's built-in tags.
[custom]: https://github.com/Shopify/liquid/wiki/Liquid-for-Programmers#create-your-own-tags
[example]: Shopify/liquid#370 (comment)
Without this change, files have to be included as follows:
```liquid
{% for static_file in site.static_files %}
  {% if static_file.name == 'generated-agg-funcs-table.html' %}
    {% include_relative generated-agg-funcs-table.html %}
    {% break %}
  {% endif %}
{% endfor %}
```
With this change, they can be included more intuitively in one of two ways:
```liquid
{% include_relative_if_exists generated-agg-funcs-table.html %}
{% include_api_gen generated-agg-funcs-table.html %}
```
`include_relative_if_exists` includes a file if it exists and substitutes an HTML comment if not. Use this tag when it's always OK for an include not to exist.
`include_api_gen` includes a file if it exists. If it doesn't, it tolerates the missing file only if one of the `SKIP_` flags is set. Otherwise it raises an error. Use this tag for includes that are generated for the language APIs. These files are required to generate complete documentation, but we tolerate their absence during development---i.e. when a skip flag is set. `include_api_gen` will place a visible text placeholder in the document and post a warning to the console to indicate that missing API files are being tolerated.
```sh
$ SKIP_API=1 bundle exec jekyll build
Configuration file: /Users/nchammas/dev/nchammas/spark/docs/_config.yml
            Source: /Users/nchammas/dev/nchammas/spark/docs
       Destination: /Users/nchammas/dev/nchammas/spark/docs/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
Warning: Tolerating missing API files because the following skip flags are set: SKIP_API
                    done in 1.703 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
```
This PR supersedes #44393.
### Why are the changes needed?
Jekyll does not have a succinct way to [check if a file exists][check], so the required directives to implement such functionality are very cumbersome. We need the ability to do this so that we can [build the docs successfully with `SKIP_API=1`][build], since many includes reference files that are only generated when `SKIP_API` is _not_ set.
[check]: jekyll/jekyll#7528
[build]: #44627
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually building and reviewing the docs, both with and without `SKIP_API=1`.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44630 from nchammas/SPARK-46437-conditional-jekyll-include.
Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
This PR fixes a bug in the Avro connector with regard to zero-length blocks. If a file contains one of these blocks, the Avro connector may return an incorrect number of records or even an empty DataFrame in some cases. This was due to the way the `hasNextRow` check worked. The `hasNext` method in Avro loads the next block, so if the block is empty, it would return false and the Avro connector would stop reading rows. However, we should continue checking the next block instead, until the sync point.
### Why are the changes needed?
Fixes a correctness bug in the Avro connector.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I added a unit test and a generated sample file to verify the fix. Without the patch, reading such a file would return fewer records or 0 compared to the actual number (depending on the maxPartitionBytes config).
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44635 from sadikovi/SPARK-46633.
Authored-by: Ivan Sadikov <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
This PR tweaks the docs build so that the general docs are first built with `SKIP_API=1` to ensure that the docs build works without any language being built beforehand.
### Why are the changes needed?
[Committers expect][1] docs to build with `SKIP_API=1` on a fresh checkout. Yet, our CI build does not ensure this. This PR corrects this gap.
[1]: https://github.com/apache/spark/pull/44393/files#r1444169083
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Via test commits against this PR. [The build now fails][f] if the docs reference an include that has not been generated yet.
[f]: https://github.com/nchammas/spark/actions/runs/7450949388/job/20271048581#step:30:29
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44627 from nchammas/skip-api-docs-build.
Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…alue
### What changes were proposed in this pull request?
This PR proposes to fix `Series.astype` to work properly with missing value.
### Why are the changes needed?
To follow the behavior of latest Pandas.
### Does this PR introduce _any_ user-facing change?
Yes, the bug is fixed to follow the behavior of Pandas:
**Before**
```python
>>> psser = ps.Series([decimal.Decimal(1), decimal.Decimal(2), decimal.Decimal(np.nan)])
>>> psser.astype(bool)
0     True
1     True
2    False
dtype: bool
```
**After**
```python
>>> psser = ps.Series([decimal.Decimal(1), decimal.Decimal(2), decimal.Decimal(np.nan)])
>>> psser.astype(bool)
0    True
1    True
2    True
dtype: bool
```
### How was this patch tested?
Enable the existing UTs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44570 from itholic/SPARK-37039.
Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
This PR adds license headers to `docs/_plugins` files.
### Why are the changes needed?
To comply with Apache License 2.0.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Existing CI should verify it, e.g., the linter.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44647 from HyukjinKwon/minor-license.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…t compression
### What changes were proposed in this pull request?
This PR aims to fix ORC tests to be independent from the change of default ORC compression.
### Why are the changes needed?
Currently, a few test cases have an implicit assumption.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44648 from dongjoon-hyun/SPARK-46643.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…f join
### What changes were proposed in this pull request?
fix the logic of ambiguous column detection in spark connect
### Why are the changes needed?
```
In [24]: df1 = spark.range(10).withColumn("a", sf.lit(0))
In [25]: df2 = df1.withColumnRenamed("a", "b")
In [26]: df1.join(df2, df1["a"] == df2["b"])
Out[26]: 23/12/22 09:33:28 ERROR ErrorUtils: Spark Connect RPC error during: analyze. UserId: ruifeng.zheng. SessionId: eaa2161f-4b64-4dbf-9809-af6b696d3005.
org.apache.spark.sql.AnalysisException: [AMBIGUOUS_COLUMN_REFERENCE] Column a is ambiguous. It's because you joined several DataFrame together, and some of these DataFrames are the same.
This column points to one of the DataFrame but Spark is unable to figure out which one.
Please alias the DataFrames with different names via DataFrame.alias before joining them,
and specify the column using qualified name, e.g. df.alias("a").join(df.alias("b"), col("a.id") > col("b.id")). SQLSTATE: 42702
at org.apache.spark.sql.catalyst.analysis.ColumnResolutionHelper.findPlanById(ColumnResolutionHelper.scala:555)
at
```
### Does this PR introduce _any_ user-facing change?
yes, fix a bug
### How was this patch tested?
added ut
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44532 from zhengruifeng/sql_connect_find_plan_id.
Lead-authored-by: Ruifeng Zheng <[email protected]>
Co-authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…her file formats
### What changes were proposed in this pull request?
This PR aims to improve `TPCDSQueryBenchmark` to support other file formats.
### Why are the changes needed?
Currently, `parquet` is hard-coded because it's the default value of `spark.sql.sources.default`.
https://github.com/apache/spark/blob/48d22e9f876f070d35ff3dd011bfbd1b6bccb4ac/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala#L77
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual.
**BEFORE**
```
$ build/sbt "sql/Test/runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location /tmp/tpcds-sf-1-orc-snappy/"
...
[info] 18:36:39.698 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
[info] java.lang.RuntimeException: file:/tmp/tpcds-sf-1-orc-snappy/catalog_page/part-00000-40446d2a-f814-4e26-b3e1-664b833bf041-c000.snappy.orc is not a Parquet file. Expected magic number at tail, but found [79, 82, 67, 25]
[info]   at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:565)
...
```
**AFTER**
```
$ JDK_JAVA_OPTIONS='-Dspark.sql.sources.default=orc' \
  build/sbt "sql/Test/runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location /tmp/tpcds-sf-1-orc-snappy/"
...
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q1
[info]   Stopped after 6 iterations, 2028 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q1                         305            338          24         1.5         660.4       1.0X
```
### Was this patch authored or co-authored using generative AI tooling?

Closes #44651 from dongjoon-hyun/SPARK-46646.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
Allow group by on columns of type CalendarInterval.
### Why are the changes needed?
Currently, Spark GROUP BY only allows orderable data types, otherwise the plan analysis fails:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala#L197-L203
However, this is too strict as GROUP BY only cares about equality, not ordering. The CalendarInterval type is not orderable (1 month and 30 days, we don't know which one is larger), but has well-defined equality. In fact, we already support `SELECT DISTINCT calendar_interval_type` in some cases (when hash aggregate is picked by the planner).
### Does this PR introduce _any_ user-facing change?
Yes, users will now be able to do group by on columns of type CalendarInterval.
### How was this patch tested?
By adding new UTs
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44538 from stefankandic/SPARK-46536-groupby-calendarInterval.
Lead-authored-by: Stefan Kandic <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
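A small hedged sketch of what is now allowed (assumes an existing SparkSession `spark`; `make_interval` is used here because it produces a `CalendarIntervalType` column):

```scala
// Grouping by a CalendarInterval column no longer fails analysis.
spark.sql("SELECT make_interval(0, 1) AS i")   // 1 month as a CalendarInterval
  .groupBy("i")
  .count()
  .show()
```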
### What changes were proposed in this pull request?
This PR proposes to exclude unittest-xml-reporting in Python 3.12 image
### Why are the changes needed?
`unittest-xml-reporting` does not seem to support Python 3.12, and this seems to hide the real error:
```
File "/__w/spark/spark/python/pyspark/streaming/tests/test_kinesis.py", line 118, in <module>
unittest.main(testRunner=testRunner, verbosity=2)
File "/usr/lib/python3.12/unittest/main.py", line 105, in __init__
self.runTests()
File "/usr/lib/python3.12/unittest/main.py", line 281, in runTests
self.result = testRunner.run(self.test)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/xmlrunner/runner.py", line 67, in run
test(result)
File "/usr/lib/python3.12/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/unittest/suite.py", line 122, in run
test(result)
File "/usr/lib/python3.12/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/unittest/suite.py", line 122, in run
test(result)
File "/usr/lib/python3.12/unittest/case.py", line 692, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/unittest/case.py", line 662, in run
result.stopTest(self)
File "/usr/local/lib/python3.12/dist-packages/xmlrunner/result.py", line 327, in stopTest
self.callback()
File "/usr/local/lib/python3.12/dist-packages/xmlrunner/result.py", line 235, in callback
test_info.test_finished()
File "/usr/local/lib/python3.12/dist-packages/xmlrunner/result.py", line 180, in test_finished
self.test_result.stop_time - self.test_result.start_time
^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: '_XMLTestResult' object has no attribute 'start_time'. Did you mean: 'stop_time'?
```
This is optional dependency in testing so we can exclude this (see https://github.com/apache/spark/actions/runs/7462843546/job/20306214215)
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
CI in this PR should test it out.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44652 from HyukjinKwon/SPARK-46645.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…dently
### What changes were proposed in this pull request?
This PR proposes to split the PyPy 3 and Python 3.10 builds.
### Why are the changes needed?
https://github.com/apache/spark/actions/runs/7462843546/job/20306241275
It seems like the build terminates in the middle because of OOM; we should split it.
### Does this PR introduce _any_ user-facing change?
No, dev-only
### How was this patch tested?
CI should verify the change.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44655 from HyukjinKwon/SPARK-46649.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
Closes #44342 from Aiden-Dong/aiden-dev.
Lead-authored-by: aiden <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…NTILE_DISC
### What changes were proposed in this pull request?
This PR will translate the aggregate functions `PERCENTILE_CONT` and `PERCENTILE_DISC` for pushdown.
- This PR adds `Expression[] orderingWithinGroups` into `GeneralAggregateFunc`, so that the DS V2 pushdown framework can compile the `WITHIN GROUP (ORDER BY ...)` clause easily.
- This PR also splits `visitInverseDistributionFunction` from `visitAggregateFunction`, so that the DS V2 pushdown framework can generate the `WITHIN GROUP (ORDER BY ...)` syntax easily.
- This PR also fixes a bug where `JdbcUtils` can't treat the precision and scale of decimals returned from JDBC.
### Why are the changes needed?
DS V2 supports pushing down `PERCENTILE_CONT` and `PERCENTILE_DISC`.
### Does this PR introduce _any_ user-facing change?
'No'. New feature.
### How was this patch tested?
New test cases.
### Was this patch authored or co-authored using generative AI tooling?
'No'.
Closes #44397 from beliefer/SPARK-46442.
Lead-authored-by: Jiaan Geng <[email protected]>
Co-authored-by: beliefer <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
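A hedged sketch of the kind of query the V2 pushdown framework can now compile into `WITHIN GROUP (ORDER BY ...)` against a JDBC source (the table and column names are made up; `spark` is an existing SparkSession):

```scala
spark.sql("""
  SELECT dept,
         percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
         percentile_disc(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary_disc
  FROM jdbc_employees
  GROUP BY dept
""").show()
```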
### What changes were proposed in this pull request?
This PR aims to use `zstd` as the default ORC compression. Note that Apache ORC v2.0 also uses `zstd` as the default compression via [ORC-1577](https://issues.apache.org/jira/browse/ORC-1577).
The following was the presentation about the usage of ZStandard.
- _The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro_
- [Slides](https://www.slideshare.net/databricks/the-rise-of-zstandard-apache-sparkparquetorcavro)
- [Youtube](https://youtu.be/dTGxhHwjONY)
### Why are the changes needed?
In general, `ZStandard` is better in terms of the file size.
```
$ aws s3 ls s3://dongjoon/orc2/tpcds-sf-10-orc-snappy/ --recursive --summarize --human-readable | tail -n1
   Total Size: 2.8 GiB

$ aws s3 ls s3://dongjoon/orc2/tpcds-sf-10-orc-zstd/ --recursive --summarize --human-readable | tail -n1
   Total Size: 2.4 GiB
```
As a result, the performance is also better in general in the cloud storage.
```
$ JDK_JAVA_OPTIONS='-Dspark.sql.sources.default=orc' \
  build/sbt "sql/Test/runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location s3a://dongjoon/orc2/tpcds-sf-1-orc-snappy"
...
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q1
[info]   Stopped after 2 iterations, 5712 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q1                        2708           2856         210         0.2        5869.3       1.0X
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q2
[info]   Stopped after 2 iterations, 7006 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q2                        3424           3503         113         0.7        1533.9       1.0X
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q3
[info]   Stopped after 2 iterations, 6577 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q3                        3146           3289         202         0.9        1059.0       1.0X
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q4
[info]   Stopped after 2 iterations, 36228 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q4                       17592          18114         738         0.3        3375.5       1.0X
...
```
```
$ JDK_JAVA_OPTIONS='-Dspark.sql.sources.default=orc' \
  build/sbt "sql/Test/runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location s3a://dongjoon/orc2/tpcds-sf-1-orc-zstd"
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q1
[info]   Stopped after 2 iterations, 5235 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q1                        2496           2618         172         0.2        5409.7       1.0X
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q2
[info]   Stopped after 2 iterations, 6765 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q2                        3338           3383          63         0.7        1495.6       1.0X
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q3
[info]   Stopped after 2 iterations, 5882 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q3                        2820           2941         172         1.1         949.1       1.0X
[info] Running benchmark: TPCDS Snappy
[info]   Running case: q4
[info]   Stopped after 2 iterations, 32925 ms
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.3
[info] Apple M1 Max
[info] TPCDS Snappy:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] q4                       16315          16463         208         0.3        3130.5       1.0X
...
```
### Does this PR introduce _any_ user-facing change?
Yes, the default ORC compression is changed.
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44654 from dongjoon-hyun/SPARK-46648.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
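A short sketch of overriding the new default where needed (paths are made up; `spark` and `df` are assumed to exist):

```scala
// Session-wide override back to the previous codec:
spark.conf.set("spark.sql.orc.compression.codec", "snappy")

// Or choose the codec per write:
df.write.option("compression", "zstd").orc("/tmp/orc-zstd")
```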
### What changes were proposed in this pull request?
Split `FrameTakeTests`
### Why are the changes needed?
for testing parallelism
### Does this PR introduce _any_ user-facing change?
no, test-only
### How was this patch tested?
ci
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #44656 from zhengruifeng/ps_test_split_take.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
As part of #44630 I neglected to update some places that still use the following Liquid directive pattern:
```liquid
{% for static_file in site.static_files %}
  {% if static_file.name == 'generated-agg-funcs-table.html' %}
    {% include_relative generated-agg-funcs-table.html %}
    {% break %}
  {% endif %}
{% endfor %}
```
This PR replaces all remaining instances of this pattern with the new `include_api_gen` Jekyll tag.
### Why are the changes needed?
For consistency in how we build our docs.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually building and reviewing the configuration docs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44663 from nchammas/configuration-include-api-gen.
Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…sv/to_csv`
### What changes were proposed in this pull request?
This pr refines the docstrings of `from_csv/schema_of_csv/to_csv` and adds some new examples.
### Why are the changes needed?
To improve PySpark documentation
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass Github Actions
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44639 from LuciferYang/csv-functions.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
…benchmark case name
### What changes were proposed in this pull request?
This PR aims to remove a hard-coded compression codec name from the `benchmark case name` in `TPCDSQueryBenchmark`.
### Why are the changes needed?
`GenTPCDSData` can generate datasets with compression codecs other than `snappy`. So, we had better remove the hard-coded `Snappy` because it's misleading.
```
$ JDK_JAVA_OPTIONS='spark.sql.parquet.compression.codec=zstd' \
  build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData --dsdgenDir /Users/dongjoon/DATA/tpcds-kit/tools --location /Users/dongjoon/DATA/tpcds-sf-1-zstd --scaleFactor 1 --numPartitions 1"
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44657 from dongjoon-hyun/SPARK-46652.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…avoid deadlock between maintenance thread and streaming aggregation operator
### What changes were proposed in this pull request?
Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator
### Why are the changes needed?
This change fixes a race condition that causes a deadlock between the task thread and the maintenance thread. This is primarily only possible with the streaming aggregation operator. In this case, we use 2 physical operators - `StateStoreRestoreExec` and `StateStoreSaveExec`. The first one opens the store in read-only mode and the 2nd one does the actual commit. However, the following sequence of events creates an issue:
1. Task thread runs the `StateStoreRestoreExec` and gets the store instance and thereby the DB instance lock
2. Maintenance thread fails with an error for some reason
3. Maintenance thread takes the `loadedProviders` lock and tries to call `close` on all the loaded providers
4. Task thread tries to execute the StateStoreRDD for the `StateStoreSaveExec` operator and tries to acquire the `loadedProviders` lock which is held by the thread above

So basically if the maintenance thread is interleaved between the `restore/save` operations, there is a deadlock condition based on the `loadedProviders` lock and the DB instance lock. The fix proposes to simply release the resources at the end of the `StateStoreRestoreExec` operator (note that `abort` for `ReadStateStore` is likely a misnomer - but we choose to follow the already provided API in this case).
Relevant Logs: Link - https://github.com/anishshri-db/spark/actions/runs/7356847259/job/20027577445?pr=4
```
2023-12-27T09:59:02.6362466Z 09:59:02.635 WARN org.apache.spark.sql.execution.streaming.state.StateStore: Error in maintenanceThreadPool
2023-12-27T09:59:02.6365616Z java.io.FileNotFoundException: File file:/home/runner/work/spark/spark/target/tmp/spark-8ef51f34-b9de-48f2-b8df-07e14599b4c9/state/0/1 does not exist
2023-12-27T09:59:02.6367861Z 	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:733)
2023-12-27T09:59:02.6369383Z 	at org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:177)
2023-12-27T09:59:02.6370693Z 	at org.apache.hadoop.fs.ChecksumFs.listStatus(ChecksumFs.java:571)
2023-12-27T09:59:02.6371781Z 	at org.apache.hadoop.fs.FileContext$Util$1.next(FileContext.java:1940)
2023-12-27T09:59:02.6372876Z 	at org.apache.hadoop.fs.FileContext$Util$1.next(FileContext.java:1936)
2023-12-27T09:59:02.6373967Z 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
2023-12-27T09:59:02.6375104Z 	at org.apache.hadoop.fs.FileContext$Util.listStatus(FileContext.java:1942)
2023-12-27T09:59:02.6376676Z 09:59:02.636 WARN org.apache.spark.sql.execution.streaming.state.StateStore: Error running maintenance thread
2023-12-27T09:59:02.6379079Z java.io.FileNotFoundException: File file:/home/runner/work/spark/spark/target/tmp/spark-8ef51f34-b9de-48f2-b8df-07e14599b4c9/state/0/1 does not exist
2023-12-27T09:59:02.6381083Z 	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:733)
2023-12-27T09:59:02.6382490Z 	at org.apache.hadoop.fs.DelegateToFileSystem.listStatus(DelegateToFileSystem.java:177)
2023-12-27T09:59:02.6383816Z 	at org.apache.hadoop.fs.ChecksumFs.listStatus(ChecksumFs.java:571)
2023-12-27T09:59:02.6384875Z 	at org.apache.hadoop.fs.FileContext$Util$1.next(FileContext.java:1940)
2023-12-27T09:59:02.6386294Z 	at org.apache.hadoop.fs.FileContext$Util$1.next(FileContext.java:1936)
2023-12-27T09:59:02.6387439Z 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
2023-12-27T09:59:02.6388674Z 	at org.apache.hadoop.fs.FileContext$Util.listStatus(FileContext.java:1942)
...
2023-12-27T10:01:02.4292831Z [info] - changing schema of state when restarting query - state format version 2 (RocksDBStateStore) *** FAILED *** (2 minutes)
2023-12-27T10:01:02.4295311Z [info]   Timed out waiting for stream: The code passed to failAfter did not complete within 120 seconds.
2023-12-27T10:01:02.4297271Z [info]   java.base/java.lang.Thread.getStackTrace(Thread.java:1619)
2023-12-27T10:01:02.4299084Z [info]   org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:277)
2023-12-27T10:01:02.4300948Z [info]   org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
...
2023-12-27T10:01:02.6474472Z 10:01:02.646 WARN org.apache.spark.sql.execution.streaming.state.RocksDB StateStoreId(opId=0,partId=0,name=default): Error closing RocksDB
2023-12-27T10:01:02.6482792Z org.apache.spark.SparkException: [CANNOT_LOAD_STATE_STORE.UNRELEASED_THREAD_ERROR] An error occurred during loading state. StateStoreId(opId=0,partId=0,name=default): RocksDB instance could not be acquired by [ThreadId: Some(1858)] as it was not released by [ThreadId: Some(3835), task: partition 0.0 in stage 513.0, TID 1369] after 120009 ms.
2023-12-27T10:01:02.6488483Z Thread holding the lock has trace: app//org.apache.spark.sql.execution.streaming.state.StateStore$.getStateStoreProvider(StateStore.scala:577)
2023-12-27T10:01:02.6490896Z app//org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:565)
2023-12-27T10:01:02.6493072Z app//org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:128)
2023-12-27T10:01:02.6494915Z app//org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
2023-12-27T10:01:02.6496232Z app//org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
2023-12-27T10:01:02.6497655Z app//org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
2023-12-27T10:01:02.6499153Z app//org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
2023-12-27T10:01:02.6556758Z 10:01:02.654 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 513.0 (TID 1369) (localhost executor driver): TaskKilled (Stage cancelled: [SPARK_JOB_CANCELLED] Job 260 cancelled part of cancelled job group cf26288c-0158-48ce-8a86-00a596dd45d8 SQLSTATE: XXKDA)
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing unit tests
```
[info] Run completed in 6 minutes, 20 seconds.
[info] Total number of tests run: 80
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 80, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### Was this patch authored or co-authored using generative AI tooling?
Yes
Closes #44542 from anishshri-db/task/SPARK-46547.
Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
pull bot pushed a commit that referenced this pull request on Nov 22, 2024:
…ead pool
### What changes were proposed in this pull request?
This PR aims to use a meaningful class name prefix for the REST Submission API thread pool instead of the default value of Jetty QueuedThreadPool, `"qtp"+super.hashCode()`.
https://github.com/dekellum/jetty/blob/3dc0120d573816de7d6a83e2d6a97035288bdd4a/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L64
### Why are the changes needed?
This is helpful during JVM investigation.
**BEFORE (4.0.0-preview2)**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28217 | grep qtp
"qtp1925630411-52" #52 daemon prio=5 os_prio=31 cpu=0.07ms elapsed=19.06s tid=0x0000000134906c10 nid=0xde03 runnable [0x0000000314592000]
"qtp1925630411-53" #53 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134ac6810 nid=0xc603 runnable [0x000000031479e000]
"qtp1925630411-54" #54 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x000000013491ae10 nid=0xdc03 runnable [0x00000003149aa000]
"qtp1925630411-55" #55 daemon prio=5 os_prio=31 cpu=0.08ms elapsed=19.06s tid=0x0000000134ac9810 nid=0xc803 runnable [0x0000000314bb6000]
"qtp1925630411-56" #56 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134ac9e10 nid=0xda03 runnable [0x0000000314dc2000]
"qtp1925630411-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134aca410 nid=0xca03 runnable [0x0000000314fce000]
"qtp1925630411-58" #58 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134acaa10 nid=0xcb03 runnable [0x00000003151da000]
"qtp1925630411-59" #59 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x0000000134acb010 nid=0xcc03 runnable [0x00000003153e6000]
"qtp1925630411-60-acceptor-0108e9815-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.11ms elapsed=19.06s tid=0x00000001317ffa10 nid=0xcd03 runnable [0x00000003155f2000]
"qtp1925630411-61-acceptor-11d90f2aa-ServerConnector1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.10ms elapsed=19.06s tid=0x00000001314ed610 nid=0xcf03 waiting on condition [0x00000003157fe000]
```
**AFTER**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28317 | grep StandaloneRestServer
"StandaloneRestServer-52" #52 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284a8e10 nid=0xdb03 runnable [0x000000032cfce000]
"StandaloneRestServer-53" #53 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284acc10 nid=0xda03 runnable [0x000000032d1da000]
"StandaloneRestServer-54" #54 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284ae610 nid=0xd803 runnable [0x000000032d3e6000]
"StandaloneRestServer-55" #55 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284aec10 nid=0xd703 runnable [0x000000032d5f2000]
"StandaloneRestServer-56" #56 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284af210 nid=0xc803 runnable [0x000000032d7fe000]
"StandaloneRestServer-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284af810 nid=0xc903 runnable [0x000000032da0a000]
"StandaloneRestServer-58" #58 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284afe10 nid=0xcb03 runnable [0x000000032dc16000]
"StandaloneRestServer-59" #59 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284b0410 nid=0xcc03 runnable [0x000000032de22000]
"StandaloneRestServer-60-acceptor-04aefbaa8-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.13ms elapsed=60.05s tid=0x000000015cda1a10 nid=0xcd03 runnable [0x000000032e02e000]
"StandaloneRestServer-61-acceptor-148976251-ServerConnector44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.12ms elapsed=60.05s tid=0x000000015cd1c810 nid=0xce03 waiting on condition [0x000000032e23a000]
```
### Does this PR introduce _any_ user-facing change?
No, the thread names are accessed during the debugging.
### How was this patch tested?
Manual review.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes apache#48924 from dongjoon-hyun/SPARK-50385.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: panbingkun <[email protected]>