Skip to content

Add missing serde/inputFormat combination for Hive#18989

Merged
findepi merged 1 commit intotrinodb:masterfrom
homar:homar/add_missing_serde_input_format_combination_for_hive
Sep 18, 2023
Merged

Add missing serde/inputFormat combination for Hive#18989
findepi merged 1 commit intotrinodb:masterfrom
homar:homar/add_missing_serde_input_format_combination_for_hive

Conversation

@homar
Copy link
Copy Markdown
Member

@homar homar commented Sep 11, 2023

Description

Support for this was accidentally removed in scope of #18556 (424)

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Add back support for ParquetHiveSerDe with TextInputFormat. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Sep 11, 2023
@findepi
Copy link
Copy Markdown
Member

findepi commented Sep 11, 2023

This combination was working before so we should keep supporting it.

add information which release or which code change affected it

( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

please feel in

@github-actions github-actions bot added tests:hive hive Hive connector labels Sep 11, 2023
@homar homar force-pushed the homar/add_missing_serde_input_format_combination_for_hive branch from d827ea9 to f7054ed Compare September 11, 2023 09:19
@homar homar self-assigned this Sep 11, 2023
Copy link
Copy Markdown
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for reads, but what happens if you attempt to write to such a table? I'm worried that this could silently create corrupted data that is not readable. I'd like to understand writing behavior for such a table with

  • Current Trino
  • Older version of Trino (before the change that prompted this fix)
  • Hive
  • Spark

@homar
Copy link
Copy Markdown
Member Author

homar commented Sep 11, 2023

This works for reads, but what happens if you attempt to write to such a table? I'm worried that this could silently create corrupted data that is not readable. I'd like to understand writing behavior for such a table with

  • Current Trino
  • Older version of Trino (before the change that prompted this fix)
  • Hive
  • Spark

Current trino/previous trino can't write to it as it is a non managed hive table.
Spark as you can see from tests can read and write.
Hive seems not to able to write to it.

@findepi
Copy link
Copy Markdown
Member

findepi commented Sep 11, 2023

Current trino/previous trino can't write to it as it is a non managed hive table.

there is a toggle to enable writes to non-managed tables.

@electrum
Copy link
Copy Markdown
Member

When you write from Spark, are the files written in Parquet format?

@electrum
Copy link
Copy Markdown
Member

The test seems to only read with Spark, not write. Can you also read and write with Hive?

@homar homar force-pushed the homar/add_missing_serde_input_format_combination_for_hive branch 3 times, most recently from 2fa9650 to 4a556da Compare September 12, 2023 15:51
@homar
Copy link
Copy Markdown
Member Author

homar commented Sep 12, 2023

@electrum sorry, my bad.
I modified the test.
Now it writes and reads with spark and it validates that the created file is parquet.
Reading and writing with hive doesn't really work, I also added code to validate that.
Writing with trino doesn't work with new version and it didn't work with the old. It fails on this line

throw new TrinoException(HIVE_UNSUPPORTED_FORMAT, format("Output format %s with SerDe %s is not supported", outputFormat, serde));

@homar homar force-pushed the homar/add_missing_serde_input_format_combination_for_hive branch from 4a556da to eefc5c8 Compare September 12, 2023 16:25
Copy link
Copy Markdown
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming the behavior. This change looks good. Can you also add a test to verify that writing fails with Trino? I'm assuming something similar to the Hive insert would work.

@homar homar force-pushed the homar/add_missing_serde_input_format_combination_for_hive branch from 5674d53 to 3789811 Compare September 12, 2023 21:31
@homar homar force-pushed the homar/add_missing_serde_input_format_combination_for_hive branch from 3789811 to c664648 Compare September 13, 2023 07:55
@homar
Copy link
Copy Markdown
Member Author

homar commented Sep 14, 2023

@electrum can you merge this ? I don't have the button

@findepi findepi merged commit 37579bb into trinodb:master Sep 18, 2023
@github-actions github-actions bot added this to the 427 milestone Sep 18, 2023
@findepi
Copy link
Copy Markdown
Member

findepi commented Sep 18, 2023

Release notes are required, with the following suggested text:

# Section
* Add back support for ParquetHiveSerDe with TextInputFormat. ({issue}`issuenumber`)

or maybe "[Hive] Fix reading from certain Parquet tables incorrectly created by Spark"

cc @colebow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed hive Hive connector

Development

Successfully merging this pull request may close these issues.

3 participants