
Conversation


@peter-toth peter-toth commented Jul 13, 2025

What changes were proposed in this pull request?

This PR adds TIME type support to Spark Connect.

Why are the changes needed?

TIME type is new in 4.1 and Spark Connect should support it.

Does this PR introduce any user-facing change?

Yes, it adds basic TIME type support.

How was this patch tested?

Added new unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.


MaxGekk commented Jul 13, 2025

@dengziming Could you review this PR since you are working on a similar one: #51462?

@dengziming dengziming left a comment

Thank you @peter-toth for this PR. I think we should add a test case for this change, and I don't think your current test case makes sense: I duplicated it locally and it ran smoothly even without this change. Take a look at the screenshot below.

[screenshot]


Should we make precision optional, analogous to Decimal?

@dongjoon-hyun dongjoon-hyun left a comment

Thank you, @peter-toth .

@dongjoon-hyun dongjoon-hyun left a comment

BTW, the PR description looks wrong to me.

# How was this patch tested?

Manually:

scala> sql("SELECT TIME '12:13:14'").show()
+---------------+
|TIME '12:13:14'|
+---------------+
|       12:13:14|
+---------------+

As reported on the JIRA issue, show is handled differently and already works; collect was the problem.

$ bin/spark-connect-shell --remote sc://localhost:15002
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.1.0-preview1
      /_/

Type in expressions to have them evaluated.
Spark connect server version 4.1.0-preview1.
Spark session available as 'spark'.

scala> sql("SELECT TIME '12:13:14'").show()
+---------------+
|TIME '12:13:14'|
+---------------+
|       12:13:14|
+---------------+

scala> sql("SELECT TIME '12:13:14'").collect()
org.apache.spark.SparkException: org.apache.spark.sql.connect.common.InvalidPlanInput: [INTERNAL_ERROR] Does not support convert time(6) to connect proto types. SQLSTATE: XX000

@dongjoon-hyun dongjoon-hyun left a comment

Please add a test case like the above comment.

@dengziming

The difference is that we need to return the result schema, which is represented by a DataType. When calling show, the schema is converted to StringType, so it works well; but when calling collect, we need TimeType. This can't be tested by the tests I provided before.

-- sql("SELECT TIME '12:13:14'").show()
root {
  show_string {
    input {
      sql {
        query: "SELECT TIME \'12:13:14\'"
      }
    }
    num_rows: 20
    truncate: 20
    vertical: true
  }
}

-- sql("SELECT TIME '12:13:14'").collect()
root {
  sql {
    query: "SELECT TIME \'12:13:14\'"
  }
}
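
Concretely, the proto DataType returned for the result schema has to know about TIME. A minimal round-trip sketch of what that implies (the DataTypeProtoConverter entry points and the TimeType(precision) constructor are assumptions here, not verified against the final code):

```scala
// Hedged sketch: round-trip a TIME schema through the Connect proto converter.
// Converter/method names and TimeType(precision) are assumptions based on this
// discussion and the error message above, not the PR's final code.
import org.apache.spark.sql.connect.common.DataTypeProtoConverter
import org.apache.spark.sql.types.{StructField, StructType, TimeType}

val schema = StructType(Seq(StructField("t", TimeType(6))))       // time(6)
val proto = DataTypeProtoConverter.toConnectProtoType(schema)      // Catalyst -> proto
assert(DataTypeProtoConverter.toCatalystType(proto) == schema)     // proto -> Catalyst
```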


peter-toth commented Jul 14, 2025

@dongjoon-hyun, ah indeed, I ran the wrong test. Proper type support needs a few more changes.

> The difference is that we need to return the result schema, which is represented by a DataType. When calling show, the schema is converted to StringType, so it works well; but when calling collect, we need TimeType. This can't be tested by the tests I provided before.

@dengziming, that's expected.

@peter-toth peter-toth force-pushed the SPARK-52770-suppot-time-in-connect-proto branch from 68a1563 to 66361ef on July 14, 2025 at 12:11
@peter-toth peter-toth force-pushed the SPARK-52770-suppot-time-in-connect-proto branch from 66361ef to b16793b on July 14, 2025 at 13:18
@dongjoon-hyun dongjoon-hyun left a comment

Thank you for updating the PR, @peter-toth .


peter-toth commented Jul 14, 2025

> Thank you for updating the PR, @peter-toth.

Sure. I will add some more tests and adjust the description as well, but I don't yet see the best place for the sql("SELECT TIME '12:13:14'").collect() test.
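
Something along these lines, wherever it ends up living (a sketch only; that the client materializes TIME values as java.time.LocalTime is an assumption):

```scala
// Sketch of the end-to-end collect() check for TIME over Spark Connect.
// Assumes TIME values come back as java.time.LocalTime on the client side.
import java.time.LocalTime

val rows = spark.sql("SELECT TIME '12:13:14' AS t").collect()
assert(rows.length == 1)
assert(rows.head.getAs[LocalTime]("t") == LocalTime.of(12, 13, 14))
```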

// UnparsedDataType
Unparsed unparsed = 24;

Time time = 28;
@dongjoon-hyun dongjoon-hyun Jul 14, 2025

Just for my understanding: do you know where 25, 26, and 27 are, @peter-toth?

@peter-toth peter-toth Jul 14, 2025

Unfortunately these are not in order. A few lines up:

  Variant variant = 25;

and down:

  // Reserved for geometry and geography types
  reserved 26, 27;

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM from my side. Thank you, @peter-toth .

cc @MaxGekk , @grundprinzip , @HyukjinKwon

@HyukjinKwon HyukjinKwon left a comment

accessor = new TimestampAccessor(timeStampMicroTZVector);
} else if (vector instanceof TimeStampMicroVector timeStampMicroVector) {
accessor = new TimestampNTZAccessor(timeStampMicroVector);
} else if (vector instanceof TimeNanoVector timeNanoVector) {

Just a stupid question: when are these methods used? Why should we change Arrow-related methods when adding a Connect DataType?

@peter-toth peter-toth Jul 15, 2025

No, it isn't a stupid question. Arrow is used for streaming data between the Connect server and clients: https://spark.apache.org/docs/latest/spark-connect-overview.html#how-spark-connect-works
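
For TIME, the data travels as an Arrow TimeNanoVector (nanoseconds since midnight), and the client-side accessor has to turn those values back into time objects. A self-contained sketch of that decoding step (illustrative only, not the PR's accessor code):

```scala
// Standalone sketch: TIME data is carried in an Arrow TimeNanoVector holding
// nanoseconds-of-day; an accessor decodes it into java.time.LocalTime.
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.TimeNanoVector
import java.time.LocalTime

val allocator = new RootAllocator()
val vector = new TimeNanoVector("t", allocator)
vector.allocateNew(1)
vector.setSafe(0, LocalTime.of(12, 13, 14).toNanoOfDay) // nanos since midnight
vector.setValueCount(1)

val decoded = LocalTime.ofNanoOfDay(vector.get(0))      // what an accessor would return
assert(decoded == LocalTime.of(12, 13, 14))

vector.close()
allocator.close()
```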

checkAnswer(df, (0 until 6).map(i => Row(i)))
}

test("SPARK-52770: Support Time type in connect") {

The test seems good, but the test name seems a bit overkill; "Time type" would be enough.

@dengziming dengziming left a comment

The test name is a nitpick; it's OK to leave it as it is.

@peter-toth

Merged to master (4.1.0).

Thanks @dongjoon-hyun, @HyukjinKwon, @MaxGekk and @dengziming for the review!

@dongjoon-hyun

Thank you, @peter-toth and all!

dongjoon-hyun added a commit to apache/spark-connect-swift that referenced this pull request Oct 1, 2025
…th `4.1.0-preview2`

### What changes were proposed in this pull request?

This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`.

### Why are the changes needed?

There are many changes from Apache Spark 4.1.0.

- apache/spark#52342
- apache/spark#52256
- apache/spark#52271
- apache/spark#52242
- apache/spark#51473
- apache/spark#51653
- apache/spark#52072
- apache/spark#51561
- apache/spark#51563
- apache/spark#51489
- apache/spark#51507
- apache/spark#51462
- apache/spark#51464
- apache/spark#51442

This lets us use the latest bug fixes and new proto messages when developing new features for `4.1.0-preview2`.

```
$ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git
$ cd spark/sql/connect/common/src/main/protobuf/
$ protoc --swift_out=. spark/connect/*.proto
$ protoc --grpc-swift_out=. spark/connect/*.proto

// Remove empty GRPC files
$ cd spark/connect

$ grep 'This file contained no services' *
catalog.grpc.swift:// This file contained no services.
commands.grpc.swift:// This file contained no services.
common.grpc.swift:// This file contained no services.
example_plugins.grpc.swift:// This file contained no services.
expressions.grpc.swift:// This file contained no services.
ml_common.grpc.swift:// This file contained no services.
ml.grpc.swift:// This file contained no services.
pipelines.grpc.swift:// This file contained no services.
relations.grpc.swift:// This file contained no services.
types.grpc.swift:// This file contained no services.

$ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift
```

### Does this PR introduce _any_ user-facing change?

Pass the CIs.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #250 from dongjoon-hyun/SPARK-53777.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>