Report error when partition schema and parquet file schema mismatch #11515

Closed
qqibrow wants to merge 1 commit into prestodb:master from qqibrow:report_error_when_file_schema_mismatch

Contributor

@qqibrow qqibrow commented Sep 18, 2018

The current Parquet reader throws an internal error when the Parquet file schema and the partition schema mismatch, e.g., double in the partition schema but float in the Parquet file. This patch proposes a friendly error message for that case.
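
A minimal sketch of the idea described above, not the code in this patch: catch the low-level exception at the read site and rethrow it with a message that names the column and both types. `FriendlyMismatch`, `ColumnReadException`, and `readColumn` are hypothetical names chosen for illustration; they are not Presto APIs.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration; none of these names are Presto APIs.
final class FriendlyMismatch
{
    private FriendlyMismatch() {}

    // User-facing error that carries a readable description of the mismatch.
    static final class ColumnReadException
            extends RuntimeException
    {
        ColumnReadException(String message, Throwable cause)
        {
            super(message, cause);
        }
    }

    // Runs the low-level read and converts an UnsupportedOperationException
    // into an error that names the column and both types.
    static <T> T readColumn(String column, String tableType, String fileType, Supplier<T> read)
    {
        try {
            return read.get();
        }
        catch (UnsupportedOperationException e) {
            throw new ColumnReadException(String.format(
                    "Column '%s' is declared as %s in the partition schema but stored as %s in the Parquet file",
                    column, tableType, fileType), e);
        }
    }
}
```

The original exception is kept as the cause, so the stack trace that previously surfaced as an internal error remains available for debugging.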

Contributor

findepi commented Sep 18, 2018

throw internal error when parquet schema and partition schema mismatch, e.g., double in partition schema but float in parquet file

I don't object to merging this (a helpful message is better than an unhelpful one), but just to make sure we're on the same page -- we should be able to read those files just fine.
I think #9422 is related.
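
To illustrate why such files could in principle be read: float to double is a lossless widening conversion, so a reader could coerce the stored value instead of failing. The sketch below is only an assumption about what such a check might look like; the `Widening` class and its single rule are hypothetical and do not reflect Presto's actual coercion rules.

```java
// Hypothetical sketch; the rule set here is an assumption, not Presto's coercion logic.
final class Widening
{
    private Widening() {}

    // Assumed rule set for this sketch only: identical types, plus float -> double.
    static boolean canWiden(String fileType, String tableType)
    {
        return fileType.equals(tableType)
                || (fileType.equals("float") && tableType.equals("double"));
    }

    // Java's float -> double conversion is a lossless widening conversion.
    static double readAsDouble(float stored)
    {
        return stored;
    }
}
```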

@findepi findepi requested a review from electrum September 18, 2018 20:24
Contributor Author

qqibrow commented Nov 9, 2018

@findepi I notice that this commit hasn't been updated for a while. What's the plan? Shall we proceed with this commit or close it?

Contributor

findepi commented Nov 9, 2018

@qqibrow I would love to hear @electrum's opinion.

Btw, you may wish to rebase and make sure Travis is happy.

Contributor Author

qqibrow commented Nov 12, 2018

@findepi sure. will do now.


guozygeorage commented Nov 16, 2018

Hive's query results are different from Presto's,
even though they use the same data source.

For example, the table is stored as Parquet.
In Presto, search_word = '童鞋' returns no results,
but search_word LIKE '童鞋%' does return results.
In Hive, both queries return results.

I don't know the reason. I hope you can help me solve it. This happens only sometimes.
Thank you very much.

Presto version: 0211
Hive version: hive-1.1.0-cdh5.10.2

[Screenshot: Hive query result]

[Screenshot: Presto query result]

@qqibrow @findepi @facebook-github-bot @zpao

Contributor Author

qqibrow commented Dec 7, 2018

The error message is not related to the PR. I don't know how to re-trigger the build, though.

The job exceeded the maximum time limit for jobs, and has been terminated.

@nezihyigitbasi
Contributor

I restarted the build.

    return columnReader.readPrimitive(field);
}
catch (UnsupportedOperationException e) {
    throw new ParquetCorruptionException(format(
Contributor

This is not really a corruption issue. Sometimes users just alter their tables and hit this issue as well.

try {
    return columnReader.readPrimitive(field);
}
catch (UnsupportedOperationException e) {
Contributor

Instead of relying on the UnsupportedOperationException thrown by the Type classes, it may be cleaner to throw a specific exception (e.g., ParquetSchemaMismatchException or something like that) in PrimitiveColumnReader::readValue(). What do you think?
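
A rough sketch of what that suggestion could look like, assuming the mismatch is detected where the value is decoded rather than inferred from a generic exception later. Only the exception name comes from the comment above; the `SchemaCheck` class, the string type names, and the `readValue` signature are hypothetical and do not mirror PrimitiveColumnReader.

```java
// Hypothetical sketch; only the exception name comes from the review comment.
final class SchemaCheck
{
    private SchemaCheck() {}

    // Dedicated exception so callers do not have to interpret a generic
    // UnsupportedOperationException.
    static final class ParquetSchemaMismatchException
            extends RuntimeException
    {
        ParquetSchemaMismatchException(String message)
        {
            super(message);
        }
    }

    // Returns the stored value if it fits the requested table type and
    // throws the dedicated exception otherwise.
    static Object readValue(String column, String tableType, Object stored)
    {
        switch (tableType) {
            case "double":
                if (stored instanceof Double) {
                    return stored;
                }
                break;
            case "bigint":
                if (stored instanceof Long) {
                    return stored;
                }
                break;
            default:
                break;
        }
        String storedType = (stored == null) ? "null" : stored.getClass().getSimpleName();
        throw new ParquetSchemaMismatchException(String.format(
                "Column '%s': table type %s does not match file value of type %s",
                column, tableType, storedType));
    }
}
```

Because the exception type is specific, the caller can report it as a schema mismatch rather than as file corruption, which matches the concern raised in the earlier review comment.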


stale bot commented Jun 6, 2019

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

@stale stale bot added the stale label Jun 6, 2019
@stale stale bot closed this Jun 13, 2019
5 participants