marcinsbd
Contributor

@marcinsbd marcinsbd commented Aug 28, 2025

Description

This change adds to the ORC file footer the calendar type that was used to represent dates and timestamps. Trino uses the proleptic Gregorian calendar. The motivation is compatibility: ORC files written by Trino should be read correctly by other tools that can read ORC files, such as Apache Hive.
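To illustrate why the calendar matters for old dates, here is a minimal sketch using only the Java standard library. It is not code from Trino or Hive; the class and method names (`CalendarGapSketch`, `prolepticEpochDay`, `hybridEpochDay`) are hypothetical. It compares `java.time.LocalDate`, which follows the proleptic ISO calendar, with `java.util.GregorianCalendar`, which by default switches from Julian to Gregorian rules in October 1582.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarGapSketch {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // Days since 1970-01-01 when the label is read in the proleptic Gregorian (ISO) calendar.
    public static long prolepticEpochDay(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay();
    }

    // Days since 1970-01-01 when the same label is read in the hybrid Julian/Gregorian calendar.
    public static long hybridEpochDay(int year, int month, int day) {
        GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        calendar.clear();                   // reset all fields so the time of day is midnight
        calendar.set(year, month - 1, day); // Calendar months are zero-based
        return Math.floorDiv(calendar.getTimeInMillis(), MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        // Before the 1582 cutover the two calendars disagree on which day a label names.
        long oldGap = hybridEpochDay(1582, 1, 1) - prolepticEpochDay(1582, 1, 1);
        // After the cutover both calendars agree.
        long modernGap = hybridEpochDay(2000, 1, 1) - prolepticEpochDay(2000, 1, 1);
        System.out.println("Gap in days for 1582-01-01: " + oldGap);
        System.out.println("Gap in days for 2000-01-01: " + modernGap);
    }
}
```

A reader that assumes the wrong calendar for a stored day count would shift pre-1582 values by this gap, which is why recording the calendar in the footer helps.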

Follow-up issue: #26865
Doc entry #26874

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Hive
* Fix ORC files written by Trino to ensure that dates and timestamps older than 1582 are read correctly by Apache Hive. ({issue}`26507`)

@cla-bot cla-bot bot added the cla-signed label Aug 28, 2025
@marcinsbd marcinsbd force-pushed the marcinsbd/make-trino-write-calendar-used-hive-orc branch from f733ee2 to cbb51f9 Compare August 28, 2025 10:51

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds support for setting the calendar type in ORC file footers for temporal data types. When writing ORC files, the system now detects if temporal types (DATE, TIMESTAMP, TIMESTAMP_INSTANT) are present and sets the appropriate calendar metadata in the footer.

Key changes:

  • Added CalendarKind enum with three calendar types: UNKNOWN_CALENDAR, JULIAN_GREGORIAN, and PROLEPTIC_GREGORIAN
  • Enhanced Footer class to include optional calendar metadata
  • Implemented logic to automatically set PROLEPTIC_GREGORIAN calendar when temporal types are detected during ORC writing
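The key changes above can be sketched roughly as follows. This is a simplified illustration, not the code merged in this PR: only the three `CalendarKind` constants come from the summary, while `TypeKind`, `TEMPORAL_TYPES`, `calendarFor`, and the class name are hypothetical stand-ins for the real ORC writer types.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class CalendarFooterSketch {
    // The three calendar kinds named in the review summary.
    public enum CalendarKind { UNKNOWN_CALENDAR, JULIAN_GREGORIAN, PROLEPTIC_GREGORIAN }

    // Hypothetical, reduced set of ORC type kinds used only for this sketch.
    public enum TypeKind { BOOLEAN, INT, LONG, STRING, DATE, TIMESTAMP, TIMESTAMP_INSTANT }

    // Temporal kinds whose presence triggers writing calendar metadata.
    public static final Set<TypeKind> TEMPORAL_TYPES =
            EnumSet.of(TypeKind.DATE, TypeKind.TIMESTAMP, TypeKind.TIMESTAMP_INSTANT);

    // Returns the calendar to record in the footer, or empty when no temporal column exists.
    public static Optional<CalendarKind> calendarFor(List<TypeKind> columnTypes) {
        boolean hasTemporal = columnTypes.stream().anyMatch(TEMPORAL_TYPES::contains);
        return hasTemporal ? Optional.of(CalendarKind.PROLEPTIC_GREGORIAN) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(calendarFor(List.of(TypeKind.INT, TypeKind.DATE)));
        System.out.println(calendarFor(List.of(TypeKind.STRING, TypeKind.LONG)));
    }
}
```

Modelling the footer field as an `Optional` matches the summary's description of the calendar metadata as optional: files without temporal columns carry no calendar entry in this sketch.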

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| CalendarKind.java | New enum defining the three supported calendar types |
| Footer.java | Added optional calendar field to store calendar metadata |
| OrcMetadataReader.java | Added conversion from protobuf CalendarKind to Trino CalendarKind |
| OrcMetadataWriter.java | Added conversion from Trino CalendarKind to protobuf and footer writing logic |
| OrcType.java | Added constant set defining temporal ORC types |
| OrcWriter.java | Added logic to detect temporal types and set PROLEPTIC_GREGORIAN calendar |
| TestOrcWriter.java | Added comprehensive tests for calendar footer writing functionality |

@marcinsbd marcinsbd force-pushed the marcinsbd/make-trino-write-calendar-used-hive-orc branch from cbb51f9 to f181afe Compare August 28, 2025 11:07
@marcinsbd marcinsbd force-pushed the marcinsbd/make-trino-write-calendar-used-hive-orc branch from f181afe to be4ac5f Compare September 3, 2025 11:10
@marcinsbd marcinsbd force-pushed the marcinsbd/make-trino-write-calendar-used-hive-orc branch from be4ac5f to aa57717 Compare September 15, 2025 13:00
@marcinsbd marcinsbd force-pushed the marcinsbd/make-trino-write-calendar-used-hive-orc branch from aa57717 to bfcee07 Compare September 18, 2025 10:43
Contributor

@chenjian2664 chenjian2664 left a comment


Everything looks good. The only open point is the decision around reading, which I expect will not significantly affect the PR's structure.

@marcinsbd
Contributor Author

I'm planning to continue work on the issue this week.

@marcinsbd marcinsbd force-pushed the marcinsbd/make-trino-write-calendar-used-hive-orc branch from 5d27751 to daabbcf Compare October 7, 2025 12:21
@marcinsbd marcinsbd force-pushed the marcinsbd/make-trino-write-calendar-used-hive-orc branch from daabbcf to 593e7f0 Compare October 8, 2025 14:36
@marcinsbd marcinsbd force-pushed the marcinsbd/make-trino-write-calendar-used-hive-orc branch from 593e7f0 to 5c98822 Compare October 9, 2025 06:58
@raunaqmorarka raunaqmorarka merged commit 8711771 into trinodb:master Oct 9, 2025
98 checks passed
@github-actions github-actions bot added this to the 478 milestone Oct 9, 2025
@marcinsbd marcinsbd deleted the marcinsbd/make-trino-write-calendar-used-hive-orc branch October 9, 2025 11:12
7 participants