Skip to content

Fix minor bugs on querying Hudi MOR tables#17477

Merged
arunthirupathi merged 1 commit intoprestodb:masterfrom
7c00:bugfix-hudi-mor
Mar 17, 2022
Merged

Fix minor bugs on querying Hudi MOR tables#17477
arunthirupathi merged 1 commit intoprestodb:masterfrom
7c00:bugfix-hudi-mor

Conversation

@7c00
Copy link
Copy Markdown
Member

@7c00 7c00 commented Mar 16, 2022

This PR fixed two minor bugs on querying Hudi MOR tables.

  • Make HudiRealtimeSplitConverter work with MOR splits without log files
  • Fix HadoopExtendedFileSystem to support MOR reader constructor

It is extracted from #17463.

Test plan - Unit tests.

== NO RELEASE NOTE ==

@7c00
Copy link
Copy Markdown
Member Author

7c00 commented Mar 16, 2022

@codope @pratyakshsharma Hi, would you like to take a review on this bugfix?

Copy link
Copy Markdown
Contributor

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find @7c00 !
Would it be possible to add a test for this scenario? Possibly in TestCustomSplitConversionUtils

@7c00
Copy link
Copy Markdown
Member Author

7c00 commented Mar 16, 2022

Good find @7c00 ! Would it be possible to add a test for this scenario? Possibly in TestCustomSplitConversionUtils

Sure. I will do that.

@7c00 7c00 force-pushed the bugfix-hudi-mor branch from bfc3854 to b353355 Compare March 16, 2022 10:14
@7c00
Copy link
Copy Markdown
Member Author

7c00 commented Mar 16, 2022

rerun the test

@7c00 7c00 force-pushed the bugfix-hudi-mor branch from b353355 to 2301c7c Compare March 16, 2022 12:46
@7c00
Copy link
Copy Markdown
Member Author

7c00 commented Mar 17, 2022

@codope I haved added the tests. Could you take a second review?

Copy link
Copy Markdown
Contributor

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @7c00 for quick revert. One small question below.

@Override
public String getScheme()
{
return fs.getScheme();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this API?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we dont fix this method, MOR tables could not be queried. I added the intergration tests in #17463.
Following is part of the error messages:

Caused by: java.lang.UnsupportedOperationException: Not implemented by the HadoopExtendedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:219)
at org.apache.hudi.common.fs.FSUtils.isGCSFileSystem(FSUtils.java:592)
at org.apache.hudi.common.table.log.HoodieLogFileReader.getFSDataInputStream(HoodieLogFileReader.java:119)
at org.apache.hudi.common.table.log.HoodieLogFileReader.(HoodieLogFileReader.java:95)
at org.apache.hudi.common.table.log.HoodieLogFileReader.(HoodieLogFileReader.java:86)
at org.apache.hudi.common.table.log.HoodieLogFormat.newReader(HoodieLogFormat.java:282)
at org.apache.hudi.common.table.log.LogReaderUtils.readSchemaFromLogFileInReverse(LogReaderUtils.java:49)
at org.apache.hudi.common.table.log.LogReaderUtils.readLatestSchemaFromLogFiles(LogReaderUtils.java:77)
at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:85)
at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.(AbstractRealtimeRecordReader.java:67)
at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.(RealtimeCompactedRecordReader.java:62)
at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.(HoodieRealtimeRecordReader.java:47)
at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:323)
at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:272)
at com.facebook.presto.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$0(GenericHiveRecordCursorProvider.java:74)
at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:81)
at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:73)
at com.facebook.presto.hive.HivePageSourceProvider.getPageSourceFromCursorProvider(HivePageSourceProvider.java:571)

You can check the full error log at https://github.com/prestodb/presto/runs/5566744421?check_suite_focus=true.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, please review the PR at #17463

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@codope
Copy link
Copy Markdown
Contributor

codope commented Mar 17, 2022

@arunthirupathi Could you please take a quick pass? Would be good to have it in the upcoming release.

Copy link
Copy Markdown

@arunthirupathi arunthirupathi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please edit the description and title to reflect what bug it fixes ?

@7c00
Copy link
Copy Markdown
Member Author

7c00 commented Mar 17, 2022

@arunthirupathi I have updated the description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants