@vinodkc vinodkc commented Dec 3, 2025

What changes were proposed in this pull request?

This PR adds support for collecting column statistics for the TIME data type to enable optimization for queries involving TIME columns.

Why are the changes needed?

Without statistics collection support for TIME columns, Spark's optimizer cannot make informed decisions for queries involving TIME predicates and joins.
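To illustrate why min/max and null-count statistics matter for a TIME predicate, here is a minimal sketch of range-selectivity estimation under a uniform-distribution assumption. This is not Spark's implementation: the function names are hypothetical, and representing TIME values as microseconds since midnight is an assumption made only for this example.

```python
from datetime import time


def to_micros(t: time) -> int:
    """Convert a time-of-day to microseconds since midnight (illustrative encoding)."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


def estimate_range_selectivity(col_min: time, col_max: time,
                               lo: time, hi: time,
                               num_rows: int, num_nulls: int) -> float:
    """Estimate the fraction of rows satisfying lo <= col <= hi.

    Hypothetical helper: assumes non-null values are spread uniformly
    between the collected min and max statistics.
    """
    if num_rows == 0:
        return 0.0
    non_null_fraction = (num_rows - num_nulls) / num_rows
    mn, mx = to_micros(col_min), to_micros(col_max)
    # Clamp the predicate range to the range covered by the statistics.
    clamped_lo = max(to_micros(lo), mn)
    clamped_hi = min(to_micros(hi), mx)
    if clamped_hi < clamped_lo:
        return 0.0  # predicate lies entirely outside [min, max]
    if mx == mn:
        return non_null_fraction  # single distinct value, and it is in range
    return non_null_fraction * (clamped_hi - clamped_lo) / (mx - mn)


# A one-hour window over a column whose values span 08:00 to 18:00.
selectivity = estimate_range_selectivity(
    time(8, 0), time(18, 0), time(9, 0), time(10, 0),
    num_rows=1000, num_nulls=0,
)
```

Without min and max, an estimator of this kind has nothing to clamp against and must fall back to a default guess, which is the situation this PR is meant to avoid for TIME columns.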

Does this PR introduce any user-facing change?

Yes. Users can now collect statistics for TIME columns.

Before this PR:

CREATE TABLE events (id INT, event_time TIME(6)) USING parquet;
ANALYZE TABLE events COMPUTE STATISTICS FOR COLUMNS event_time;
-- No statistics collected, optimizer cannot use TIME column info

After this PR:

CREATE TABLE events (id INT, event_time TIME(6)) USING parquet;
ANALYZE TABLE events COMPUTE STATISTICS FOR COLUMNS event_time;
-- Statistics collected: min, max, distinct count, null count

DESCRIBE EXTENDED events event_time;
-- Shows:
-- min: 08:00:00.000000
-- max: 18:00:00.000000
-- num_nulls: 5
-- distinct_count: 1000
-- avg_col_len: 8
-- max_col_len: 8
-- histogram: NULL

Note: histogram is NULL because TIME represents time of day, which is fundamentally circular (periodic) data, whereas histograms are designed for linear data with a clear progression.
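As a small illustration of the circularity mentioned in the note, the sketch below shows how a time-of-day interval that crosses midnight does not map to a single contiguous range in linear order. The helper name linear_ranges is hypothetical and is not part of this PR or of Spark.

```python
from datetime import time


def linear_ranges(start: time, end: time) -> list[tuple[time, time]]:
    """Split a time-of-day interval into non-wrapping [lo, hi] pieces.

    Hypothetical helper: an interval such as 22:00 to 02:00 wraps past
    midnight, so in linear order it becomes two disjoint ranges.
    """
    if start <= end:
        return [(start, end)]
    # Wraps midnight: cover the tail of the day, then the head of the next.
    return [(start, time.max), (time.min, end)]


night_shift = linear_ranges(time(22, 0), time(2, 0))
day_shift = linear_ranges(time(8, 0), time(18, 0))
```

Here night_shift comes back as two ranges while day_shift is one, which is the kind of wrap-around that a bucketed, linearly ordered histogram does not represent naturally.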

How was this patch tested?

Added tests in StatisticsCollectionSuite, StatisticsCollectionTestBase, and StatisticsSuite.

Manual Testing:

-- Test various precisions
CREATE TABLE time_test (
  t0 TIME(0), t1 TIME(1), t2 TIME(2), 
  t3 TIME(3), t4 TIME(4), t5 TIME(5), t6 TIME(6)
) USING parquet;

INSERT INTO time_test VALUES
  (TIME '08:00:00', TIME '08:00:00.1', TIME '08:00:00.12',
   TIME '08:00:00.123', TIME '08:00:00.1234', 
   TIME '08:00:00.12345', TIME '08:00:00.123456');

ANALYZE TABLE time_test COMPUTE STATISTICS FOR COLUMNS 
  t0, t1, t2, t3, t4, t5, t6;

-- Verify all columns have statistics
DESCRIBE EXTENDED time_test t0;
DESCRIBE EXTENDED time_test t6;

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Dec 3, 2025