Skip to content

Conversation

@EnricoMi
Copy link

@EnricoMi EnricoMi commented Mar 7, 2024

What changes were proposed in this pull request?

Add config spark.ui.custom.driver.log.url to Kubernetes resource manager allow setting log url templates simillar to spark.ui.custom.executor.log.url.

Allows for

--conf spark.ui.custom.driver.log.url="https://my-custom.url/api/logs?applicationId={{APP_ID}}

Supports these variables:

  • APP_ID: The unique application id
  • KUBERNETES_NAMESPACE: The namespace where the executor pods run
  • KUBERNETES_POD_NAME: The name of the pod that contains the executor
  • FILE_NAME: The name of the log, which is always "log"

Adds driver pod environment variable SPARK_DRIVER_POD_NAME.

Why are the changes needed?

Running Spark on Kubernetes requires persisting the logs elsewhere. Having the Spark UI link to those logs is very useful. This should be possible for driver logs similar to executor logs.

Does this PR introduce any user-facing change?

Spark UI provides links to logs when run on Kubernetes.

How was this patch tested?

Unit test and manually tested on minikube K8S cluster.

Was this patch authored or co-authored using generative AI tooling?

No

@EnricoMi EnricoMi changed the title K8s custom driver log url Support custom driver log url Mar 7, 2024
@EnricoMi EnricoMi force-pushed the k8s-custom-driver-log-url branch from 37a5a81 to e28de27 Compare March 22, 2024 14:53
@github-actions github-actions bot added the DOCS label Mar 26, 2024
@EnricoMi EnricoMi changed the title Support custom driver log url [SPARK-47573][K8S] Support custom driver log url Mar 26, 2024
@EnricoMi EnricoMi force-pushed the k8s-custom-driver-log-url branch 2 times, most recently from e12ba58 to a3fe442 Compare March 27, 2024 12:15
@EnricoMi EnricoMi force-pushed the k8s-custom-driver-log-url branch from 412e9d0 to 409ce35 Compare April 20, 2024 16:23
@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 30, 2024
@EnricoMi EnricoMi force-pushed the k8s-custom-driver-log-url branch from 409ce35 to b0e1cde Compare July 30, 2024 08:07
@github-actions github-actions bot closed this Jul 31, 2024
EnricoMi pushed a commit that referenced this pull request Sep 2, 2025
…ingBuilder`

### What changes were proposed in this pull request?

This PR aims to improve `toString` by `JEP-280` instead of `ToStringBuilder`. In addition, `Scalastyle` and `Checkstyle` rules are added to prevent a future regression.

### Why are the changes needed?

Since Java 9, `String Concatenation` has been handled better by default.

| ID | DESCRIPTION |
| - | - |
| JEP-280 | [Indify String Concatenation](https://openjdk.org/jeps/280) |

For example, this PR improves `OpenBlocks` like the following. Both Java source code and byte code are simplified a lot by utilizing JEP-280 properly.

**CODE CHANGE**
```java

- return new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE)
-   .append("appId", appId)
-   .append("execId", execId)
-   .append("blockIds", Arrays.toString(blockIds))
-   .toString();
+ return "OpenBlocks[appId=" + appId + ",execId=" + execId + ",blockIds=" +
+     Arrays.toString(blockIds) + "]";
```

**BEFORE**
```
  public java.lang.String toString();
    Code:
       0: new           #39                 // class org/apache/commons/lang3/builder/ToStringBuilder
       3: dup
       4: aload_0
       5: getstatic     #41                 // Field org/apache/commons/lang3/builder/ToStringStyle.SHORT_PREFIX_STYLE:Lorg/apache/commons/lang3/builder/ToStringStyle;
       8: invokespecial #47                 // Method org/apache/commons/lang3/builder/ToStringBuilder."<init>":(Ljava/lang/Object;Lorg/apache/commons/lang3/builder/ToStringStyle;)V
      11: ldc           #50                 // String appId
      13: aload_0
      14: getfield      #7                  // Field appId:Ljava/lang/String;
      17: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      20: ldc           #55                 // String execId
      22: aload_0
      23: getfield      #13                 // Field execId:Ljava/lang/String;
      26: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      29: ldc           #56                 // String blockIds
      31: aload_0
      32: getfield      #16                 // Field blockIds:[Ljava/lang/String;
      35: invokestatic  #57                 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
      38: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      41: invokevirtual #61                 // Method org/apache/commons/lang3/builder/ToStringBuilder.toString:()Ljava/lang/String;
      44: areturn
```

**AFTER**
```
  public java.lang.String toString();
    Code:
       0: aload_0
       1: getfield      #7                  // Field appId:Ljava/lang/String;
       4: aload_0
       5: getfield      #13                 // Field execId:Ljava/lang/String;
       8: aload_0
       9: getfield      #16                 // Field blockIds:[Ljava/lang/String;
      12: invokestatic  #39                 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
      15: invokedynamic #43,  0             // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
      20: areturn
```

### Does this PR introduce _any_ user-facing change?

No. This is an `toString` implementation improvement.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51572 from dongjoon-hyun/SPARK-52880.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
EnricoMi pushed a commit that referenced this pull request Sep 2, 2025
…onicalized expressions

### What changes were proposed in this pull request?

Make PullOutNonDeterministic use canonicalized expressions to dedup group and  aggregate expressions. This affects pyspark udfs in particular. Example:

```
from pyspark.sql.functions import col, avg, udf

pythonUDF = udf(lambda x: x).asNondeterministic()

spark.range(10)\
.selectExpr("id", "id % 3 as value")\
.groupBy(pythonUDF(col("value")))\
.agg(avg("id"), pythonUDF(col("value")))\
.explain(extended=True)
```

Currently results in a plan like this:

```
Aggregate [_nondeterministic#15](#15), [_nondeterministic#15 AS dummyNondeterministicUDF(value)#12, avg(id#0L) AS avg(id)#13, dummyNondeterministicUDF(value#6L)#8 AS dummyNondeterministicUDF(value)#14](#15%20AS%20dummyNondeterministicUDF(value)#12,%20avg(id#0L)%20AS%20avg(id)#13,%20dummyNondeterministicUDF(value#6L)#8%20AS%20dummyNondeterministicUDF(value)#14)
+- Project [id#0L, value#6L, dummyNondeterministicUDF(value#6L)#7 AS _nondeterministic#15](#0L,%20value#6L,%20dummyNondeterministicUDF(value#6L)#7%20AS%20_nondeterministic#15)
   +- Project [id#0L, (id#0L % cast(3 as bigint)) AS value#6L](#0L,%20(id#0L%20%%20cast(3%20as%20bigint))%20AS%20value#6L)
      +- Range (0, 10, step=1, splits=Some(2))
```

and then it throws:

```
[[MISSING_AGGREGATION] The non-aggregating expression "value" is based on columns which are not participating in the GROUP BY clause. Add the columns or the expression to the GROUP BY, aggregate the expression, or use "any_value(value)" if you do not care which of the values within a group is returned. SQLSTATE: 42803
```

- how canonicalized fixes this:
  -  nondeterministic PythonUDF expressions always have distinct resultIds per udf
  - The fix is to canonicalize the expressions when matching. Canonicalized means that we're setting the resultIds to -1, allowing us to dedup the PythonUDF expressions.
- for deterministic UDFs, this rule does not apply and "Post Analysis" batch extracts and deduplicates the expressions, as expected

### Why are the changes needed?

- the output of the query with the fix applied still makes sense - the nondeterministic UDF is invoked only once, in the project.

### Does this PR introduce _any_ user-facing change?

Yes, it's additive, it enables queries to run that previously threw errors.

### How was this patch tested?

- added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#52061 from benrobby/adhoc-fix-pull-out-nondeterministic.

Authored-by: Ben Hurdelhey <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
EnricoMi pushed a commit that referenced this pull request Nov 3, 2025
…building

### What changes were proposed in this pull request?

This PR aims to add `libwebp-dev` to fix `dev/infra/Dockerfile` building.

### Why are the changes needed?

To fix `build_infra_images_cache` GitHub Action job
- https://github.com/apache/spark/actions/workflows/build_infra_images_cache.yml

<img width="545" height="88" alt="Screenshot 2025-11-02 at 14 56 19" src="https://github.com/user-attachments/assets/f70d6093-6574-40f3-a097-ba5c9086f3c1" />

The root cause is identical with other Dockerfile failure.
```
#13 578.4 -------------------------- [ERROR MESSAGE] ---------------------------
#13 578.4 <stdin>:1:10: fatal error: ft2build.h: No such file or directory
#13 578.4 compilation terminated.
#13 578.4 --------------------------------------------------------------------
#13 578.4 ERROR: configuration failed for package 'ragg'
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. Especially, `Cache base image` test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52840 from dongjoon-hyun/SPARK-54141.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants