[HUDI-8771] Fix incorrect classification of input paths in InputPathHandler and HoodieInputFormatUtils #12495

Merged
wombatu-kun merged 1 commit into apache:master from skyshineb:hudi-8771-Incorrect-classification-of-input-paths-in-InputPathHandler
Dec 18, 2024

@skyshineb skyshineb commented Dec 16, 2024

Change Logs

  • Fix incorrect classification of input paths in InputPathHandler and HoodieInputFormatUtils.
  • HoodieInputFormatUtils.groupSnapshotPathsByMetaClient() was not strict enough and allowed a path to be owned by more than one table. Its use of forEach() could associate a single path with multiple metaClients.
  • Remove the unused HoodieInputFormatUtils.groupFileStatusForSnapshotPaths(): the logic that called this method was refactored in HUDI-3094, and that responsibility moved to InputPathHandler.
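
The multiple-ownership problem described in the second bullet can be illustrated with a minimal sketch. It uses plain strings instead of HoodieTableMetaClient, and the class name, the owns() helper, the two grouping methods, and the example paths are all hypothetical; it is not the Hudi implementation, only an assumption about how a forEach() over all candidates differs from taking the first match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SinglePathOwnershipSketch {

  // Hypothetical helper: true when inputPath is basePath itself or lies beneath it.
  static boolean owns(String basePath, String inputPath) {
    return inputPath.equals(basePath) || inputPath.startsWith(basePath + "/");
  }

  // forEach style: every matching base path claims the input path,
  // so one input path can end up grouped under several tables.
  static Map<String, List<String>> groupWithForEach(List<String> basePaths, List<String> inputPaths) {
    Map<String, List<String>> grouped = new LinkedHashMap<>();
    for (String inputPath : inputPaths) {
      basePaths.forEach(basePath -> {
        if (owns(basePath, inputPath)) {
          grouped.computeIfAbsent(basePath, k -> new ArrayList<>()).add(inputPath);
        }
      });
    }
    return grouped;
  }

  // First-match style: at most one base path claims each input path.
  // The winner is decided by list order, which is an assumption of this sketch.
  static Map<String, List<String>> groupWithFirstMatch(List<String> basePaths, List<String> inputPaths) {
    Map<String, List<String>> grouped = new LinkedHashMap<>();
    for (String inputPath : inputPaths) {
      basePaths.stream()
          .filter(basePath -> owns(basePath, inputPath))
          .findFirst()
          .ifPresent(basePath -> grouped.computeIfAbsent(basePath, k -> new ArrayList<>()).add(inputPath));
    }
    return grouped;
  }

  public static void main(String[] args) {
    // Hypothetical layout: one table nested under another table's base path.
    List<String> basePaths = Arrays.asList("/data/trips", "/data/trips/archive");
    List<String> inputPaths = Arrays.asList("/data/trips/archive/2024/12/16");

    // Both base paths claim the same input path.
    System.out.println(groupWithForEach(basePaths, inputPaths).keySet());
    // Exactly one base path claims the input path.
    System.out.println(groupWithFirstMatch(basePaths, inputPaths).keySet());
  }
}
```

In this sketch the first-match variant guarantees single ownership but does not choose the most specific table; which table should win when base paths overlap is outside what this PR description states.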

Impact

Reduces ambiguity in the classification of input paths.

Risk level (write none, low medium or high below)

Low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@skyshineb skyshineb marked this pull request as draft December 16, 2024 09:36
@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Dec 16, 2024
@skyshineb skyshineb force-pushed the hudi-8771-Incorrect-classification-of-input-paths-in-InputPathHandler branch from e62aa6e to 35fc0a7 Compare December 16, 2024 10:02
@skyshineb skyshineb marked this pull request as ready for review December 16, 2024 10:03
@github-actions github-actions Bot added size:S PR with lines of changes in (10, 100] and removed size:M PR with lines of changes in (100, 300] labels Dec 16, 2024
Option<HoodieTableMetaClient> matchedMetaClient = Option.fromJavaOptional(metaClientList.stream()
    .filter(metaClient -> {
      String basePathStr = metaClient.getBasePath().toString();
      return inputPathStr.equals(basePathStr) || inputPathStr.startsWith(basePathStr + "/");
    })
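
To see why the predicate above appends "/" before calling startsWith(), here is a small standalone sketch. The class name, the isUnderBasePath() helper, and the example paths are hypothetical, and it works on plain strings rather than Hudi types; it only shows the boundary behaviour of this kind of check.

```java
public class BasePathMatchSketch {

  // Same shape as the predicate in the diff: exact match, or a child path
  // that continues with a separator right after the base path.
  static boolean isUnderBasePath(String inputPathStr, String basePathStr) {
    return inputPathStr.equals(basePathStr) || inputPathStr.startsWith(basePathStr + "/");
  }

  public static void main(String[] args) {
    String basePath = "/warehouse/orders";

    // The base path itself matches.
    System.out.println(isUnderBasePath("/warehouse/orders", basePath));                  // true
    // A partition directory beneath the base path matches.
    System.out.println(isUnderBasePath("/warehouse/orders/2024/12/16", basePath));       // true
    // A sibling table that merely shares the string prefix does not match.
    System.out.println(isUnderBasePath("/warehouse/orders_archive/2024", basePath));     // false
    // A bare startsWith() without the separator would accept that sibling.
    System.out.println("/warehouse/orders_archive/2024".startsWith(basePath));           // true
  }
}
```

The sketch assumes base paths carry no trailing slash; a base path stored as "/warehouse/orders/" would need normalising before this comparison.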

@wombatu-kun wombatu-kun Dec 17, 2024

Shouldn't we apply the same fix to the checks in groupFileStatusForSnapshotPaths (lines 461 and 463)?

@skyshineb (Contributor, Author)

groupFileStatusForSnapshotPaths is not used anywhere. In HUDI-3094, its caller HoodieParquetInputFormat.listStatus() was removed and the related logic was moved to InputPathHandler, where it now belongs. So I deleted the method entirely.

@skyshineb skyshineb force-pushed the hudi-8771-Incorrect-classification-of-input-paths-in-InputPathHandler branch from 35fc0a7 to e5de599 Compare December 18, 2024 04:43
@skyshineb skyshineb force-pushed the hudi-8771-Incorrect-classification-of-input-paths-in-InputPathHandler branch from e5de599 to 402ecde Compare December 18, 2024 04:58
@wombatu-kun

@hudi-bot run azure

@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@wombatu-kun wombatu-kun merged commit 895957d into apache:master Dec 18, 2024
@skyshineb skyshineb deleted the hudi-8771-Incorrect-classification-of-input-paths-in-InputPathHandler branch December 18, 2024 08:43