@dain dain commented Mar 13, 2023

Description

Do not verify the length when determining whether a file is split; instead, only check that the start offset is zero. This is because Hive supports replacing files during a query, and users take advantage of this. The check is still correct, because a split file necessarily has a second split with a non-zero start offset.
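To illustrate the difference between the two checks, here is a minimal sketch. The class and method names (`SplitCheck`, `coversWholeFileByLength`, `startsAtZero`) are hypothetical and are not taken from the Trino code base; the length-based variant is only an assumption about what the previous check looked like.

```java
// Hypothetical helper for illustration only; not the actual Trino implementation.
final class SplitCheck
{
    private SplitCheck() {}

    // Assumed shape of the previous check: the split must start at zero AND
    // span the file size recorded at planning time. If the file is replaced
    // by one of a different length during the query, this returns false even
    // though the file was never split.
    static boolean coversWholeFileByLength(long start, long length, long plannedFileSize)
    {
        return start == 0 && length == plannedFileSize;
    }

    // Start-offset-only check: a split file always produces at least one
    // split with a non-zero start, so that split fails this test.
    static boolean startsAtZero(long start)
    {
        return start == 0;
    }
}
```

With a planned size of 100 bytes and a replacement file of 120 bytes, `coversWholeFileByLength(0, 120, 100)` is false while `startsAtZero(0)` is true, which is the case the description is concerned with.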

For all native file formats, hardcode the handling of the isSplittable check as it is a behavior of the code and not the file format.

Fixes #16492
Fixes #16510

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Hive
* Fix support for text formats with a single header row when the file is split. ({issue}`16492`)
* Fix support for reading text formats when the file is replaced during the query. ({issue}`16510`)

dain added 2 commits March 13, 2023 14:10
For legacy reasons, line-oriented files may be replaced during query
execution. The new file may have a different length, so avoid checking
that the length is consistent.
@dain dain requested a review from electrum March 13, 2023 23:24
@cla-bot cla-bot bot added the cla-signed label Mar 13, 2023
@github-actions github-actions bot added hive Hive connector tests:hive labels Mar 13, 2023
Do not check length when detecting a split file, because Hive allows
files to be replaced during a query.  Instead only the start position is
checked.
Allow a single header line for split files, because this always works
due to the way file split handling works.
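The header handling described in these commit messages could be sketched as follows. This is an assumption-laden illustration: `HeaderSkip` and `headerLinesToSkip` are hypothetical names, and the reasoning in the comments (the first line of a file always begins at offset zero, so it always belongs to the first split) is an interpretation of "the way file split handling works", not something stated in the pull request.

```java
// Hypothetical sketch; names and error message are illustrative only.
final class HeaderSkip
{
    private HeaderSkip() {}

    // Returns how many leading lines a reader for this split should skip.
    static int headerLinesToSkip(long splitStart, int headerCount)
    {
        if (headerCount < 0) {
            throw new IllegalArgumentException("headerCount is negative");
        }
        // Assumption: with several header rows, a later header row could
        // begin inside a later split, so a non-zero start is rejected.
        if (headerCount > 1 && splitStart != 0) {
            throw new IllegalStateException("Multiple header rows are not supported for a split file");
        }
        // A single header row starts at offset zero, so only the split that
        // starts at zero needs to skip it; other splits skip nothing.
        return splitStart == 0 ? headerCount : 0;
    }
}
```

Note that the split length never enters the decision, so a file replaced by one of a different length does not affect it.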
@dain dain force-pushed the fix-native-format branch from 9e027c9 to 5c6c871 Compare March 14, 2023 00:23
@dain dain merged commit 16ce73b into trinodb:master Mar 14, 2023
@dain dain deleted the fix-native-format branch March 14, 2023 03:35
@github-actions github-actions bot added this to the 411 milestone Mar 14, 2023

Labels

cla-signed hive Hive connector

Development

Successfully merging this pull request may close these issues.

Queries fail while using native hive formats
CSV headers broken in Hive tables

2 participants