[HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication #4417

Merged
yihua merged 9 commits into apache:master from onehouseinc:ak/rpath-ref-1
Jan 11, 2022

@alexeykudinkin
Contributor

What is the purpose of the pull request

Unify Hive's FileInputFormat implementations to avoid unnecessary duplication

Brief change log

  • Extracted HoodieFileInputFormatBase
  • Rebased Parquet, HFile implementations onto HoodieFileInputFormatBase
  • Tidying up
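
The extraction listed in the change log can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code from this PR: only the idea of a shared base class comes from the description, while `FileInputFormatBaseSketch`, `ParquetFormatSketch`, `HFileFormatSketch`, `fileExtension` and `listFiles` are hypothetical names, and plain strings stand in for Hadoop types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BaseClassSketchDemo {
    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.parquet", "b.hfile", "c.parquet");
        System.out.println(new ParquetFormatSketch().listFiles(files)); // [a.parquet, c.parquet]
        System.out.println(new HFileFormatSketch().listFiles(files));   // [b.hfile]
    }
}

// Hypothetical stand-in for a shared base class: the listing logic lives here once.
abstract class FileInputFormatBaseSketch {
    // Format-specific hook: each subclass names the file extension it reads.
    protected abstract String fileExtension();

    // Shared logic that would otherwise be duplicated in every format-specific class.
    public final List<String> listFiles(List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String path : candidates) {
            if (path.endsWith(fileExtension())) {
                result.add(path);
            }
        }
        return result;
    }
}

// Hypothetical format-specific subclasses: they only supply what differs.
class ParquetFormatSketch extends FileInputFormatBaseSketch {
    @Override
    protected String fileExtension() {
        return ".parquet";
    }
}

class HFileFormatSketch extends FileInputFormatBaseSketch {
    @Override
    protected String fileExtension() {
        return ".hfile";
    }
}
```

With this shape, adding another file format means writing one small subclass rather than copying the listing code again.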

Verify this pull request

This pull request is already covered by existing tests.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@alexeykudinkin changed the title from "[WIP][HUDI-3082] Unify Hive's InputFormat implementations to avoid duplication" to "[WIP][HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication" on Dec 22, 2021
Alexey Kudinkin added 6 commits January 5, 2022 14:40
…ting FS for to be shared across File-format specific implementations

  - Snapshot queries
  - Incremental queries
Killing dead-code
@alexeykudinkin changed the title from "[WIP][HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication" to "[HUDI-3094] Unify Hive's InputFormat implementations to avoid duplication" on Jan 5, 2022
…en in the super-class by introducing standalone non-overridable `doListStatus` proxying to `FileInputFormat.listStatus`
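
The commit message above mentions a non-overridable `doListStatus` that proxies to `FileInputFormat.listStatus`. A rough sketch of that pattern, assuming the goal is to keep the framework's original listing reachable even when a subclass overrides `listStatus`: `FrameworkFormatSketch` is a simplified, hypothetical stand-in for Hadoop's `FileInputFormat`, and `BaseFormatSketch` and `CustomFormatSketch` are illustrative names, not classes from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DoListStatusDemo {
    public static void main(String[] args) {
        System.out.println(new CustomFormatSketch().listStatus()); // [part-0, part-1]
    }
}

// Simplified, hypothetical stand-in for a framework class with an overridable listing method.
class FrameworkFormatSketch {
    public List<String> listStatus() {
        return Arrays.asList("part-0", "part-1", ".hidden");
    }
}

// Shared base class: exposes the framework listing through a final method,
// so it cannot be overridden and always reaches the framework implementation.
abstract class BaseFormatSketch extends FrameworkFormatSketch {
    protected final List<String> doListStatus() {
        // super.listStatus() is bound to FrameworkFormatSketch, not to any subclass override.
        return super.listStatus();
    }
}

// A subclass is free to override listStatus() and build on the raw listing.
class CustomFormatSketch extends BaseFormatSketch {
    @Override
    public List<String> listStatus() {
        List<String> visible = new ArrayList<>();
        for (String name : doListStatus()) {
            if (!name.startsWith(".")) {
                visible.add(name);
            }
        }
        return visible;
    }
}
```

Because `doListStatus` is `final`, code in the base class can rely on it returning the unfiltered framework listing regardless of how subclasses customize `listStatus`.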
@yihua self-assigned this on Jan 11, 2022
Contributor

@yihua left a comment

Overall LGTM. I put up a couple of clarification questions.

import java.util.Map;

/**
* TODO java-doc

nit: docs can be put in now :)

Comment thread hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHFileInputFormat.java Outdated
@alexeykudinkin
Contributor Author

@hudi-bot run azure

@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • `@hudi-bot run azure`: re-run the last Azure build

@yihua merged commit 6cdcd89 into apache:master on Jan 11, 2022
Member

@vinothchandar left a comment

Can you confirm that there are no actual code changes beyond consolidating into hierarchies here?

@UseRecordReaderFromInputFormat
@UseFileSplitsFromInputFormat
- public class HoodieParquetInputFormat extends MapredParquetInputFormat implements Configurable {
+ public class HoodieParquetInputFormat extends HoodieFileInputFormatBase implements Configurable {

This is a problem. Hive code special-cases subclasses of `MapredParquetInputFormat` for some of its optimizations, so we need to maintain this hierarchy.
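
To illustrate the concern, here is a minimal sketch of how a type check on the class hierarchy stops matching once a class is re-parented. The exact check Hive performs is not shown in this thread, so an `instanceof`-style test is assumed; every class below is a hypothetical stand-in (`ParquetFormatBaseSketch` plays the role of `MapredParquetInputFormat`, `SharedBaseSketch` the role of a new common base class).

```java
class HierarchyCheckDemo {
    // Hypothetical stand-in for engine code that enables an optimization
    // only for a known input format or its subclasses.
    static boolean usesParquetFastPath(Object inputFormat) {
        return inputFormat instanceof ParquetFormatBaseSketch;
    }

    public static void main(String[] args) {
        System.out.println(usesParquetFastPath(new KeepsHierarchySketch())); // true
        System.out.println(usesParquetFastPath(new ReparentedSketch()));     // false
    }
}

// Stand-in for the class the engine special-cases.
class ParquetFormatBaseSketch {
}

// Stand-in for a newly introduced common base class.
class SharedBaseSketch {
}

// Still extends the special-cased class, so the check keeps matching.
class KeepsHierarchySketch extends ParquetFormatBaseSketch {
}

// Re-parented onto the shared base, so the check silently stops matching.
class ReparentedSketch extends SharedBaseSketch {
}
```

Nothing fails at compile time in the re-parented case; the optimization is simply skipped, which is why such a change is easy to miss in review.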

@alexeykudinkin
Contributor Author

@vinothchandar correct, it mostly moves code around to avoid duplication
