

@nsivabalan
Contributor

What is the purpose of the pull request

Same as #3590, with added log statements to debug CI failures that are not reproducible locally.

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick one of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

prashantwason and others added 2 commits September 24, 2021 01:40
…of commits in data timeline.

- Added support for synchronous updates to the metadata table. A write is first committed to the metadata table, followed by the data table.
- The reader ignores any commits found only in the metadata table while serving files via the metadata table.
- Removed the option for runtime validation of the metadata table against file listing. Validation does not work in several cases, especially with multiple writers, so it is best to remove it.
- Fixed compactions on the Metadata Table. We cannot perform compaction if there are previous inflight operations on the dataset, because a compacted metadata base file at time Tx should represent all the actions on the dataset up to time Tx.
- Added support for buckets in metadata table partitions.

1. There will be a fixed number of buckets for each Metadata Table partition.
2. Buckets are implemented using filenames of the format bucket-ABCD, where ABCD is the bucket number. This allows easy identification of the files and their order while still keeping the names unique.
3. Buckets are pre-allocated at bootstrap time.
4. Currently the "files" partition has only 1 bucket. But this building block is required for the record-level index and other indices, so it is implemented here.
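The bucket naming scheme above can be sketched in Java as follows. This is a minimal illustration, assuming ABCD is a zero-padded, four-digit decimal bucket number (the commit message does not state the padding or base); `BucketFileNames`, `fileIdFor`, `bucketOf` and `preallocate` are hypothetical names, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hudi API: models the "bucket-ABCD" file naming scheme.
// Assumption: ABCD is a zero-padded, four-digit decimal bucket number.
final class BucketFileNames {
  private static final Pattern BUCKET_NAME = Pattern.compile("bucket-(\\d{4})");

  private BucketFileNames() {}

  // Formats a bucket number as a unique file name, e.g. 7 -> "bucket-0007".
  static String fileIdFor(int bucket) {
    if (bucket < 0 || bucket > 9999) {
      throw new IllegalArgumentException("bucket must fit in four digits: " + bucket);
    }
    // Locale.ROOT keeps the digits ASCII regardless of the JVM default locale.
    return String.format(Locale.ROOT, "bucket-%04d", bucket);
  }

  // Recovers the bucket number from a file name, or returns -1 if it is not a bucket name.
  static int bucketOf(String fileName) {
    Matcher m = BUCKET_NAME.matcher(fileName);
    return m.matches() ? Integer.parseInt(m.group(1)) : -1;
  }

  // Pre-allocates the fixed set of bucket names for one partition, done once at bootstrap.
  static List<String> preallocate(int numBuckets) {
    List<String> names = new ArrayList<>(numBuckets);
    for (int b = 0; b < numBuckets; b++) {
      names.add(fileIdFor(b));
    }
    return names;
  }
}
```

For example, `preallocate(3)` yields `bucket-0000`, `bucket-0001` and `bucket-0002`. Because the numbers are zero-padded, sorting the names as strings also sorts them by bucket number, which is one way to get the "easy identification of the files and their order" property.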

- Do not archive instants on the dataset if they are newer than the latest compaction on the metadata table.

LogBlocks written to the log file of the Metadata Table need to be validated: they are used only if they correspond to a completed action on the dataset.
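The compaction, archival and log-block rules described above can be expressed as three small predicates. This is a sketch under stated assumptions, not Hudi's implementation: instant times are assumed to be fixed-width timestamp strings (such as `20210924014000`) that sort chronologically as plain strings, and `MetadataGuards` and its methods are hypothetical names.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not Hudi code, of the metadata-table guards from the commit message.
// Assumption: instant times are fixed-width timestamp strings, so String.compareTo
// orders them chronologically.
final class MetadataGuards {
  private MetadataGuards() {}

  // A metadata compaction at time tx is allowed only if no data-table instant
  // earlier than tx is still inflight.
  static boolean canCompactMetadata(String tx, Collection<String> inflightDataInstants) {
    return inflightDataInstants.stream().noneMatch(t -> t.compareTo(tx) < 0);
  }

  // A data-table instant may be archived only if it is not newer than the
  // latest compaction on the metadata table.
  static boolean isArchivable(String instant, String latestMetadataCompaction) {
    return instant.compareTo(latestMetadataCompaction) <= 0;
  }

  // Keeps only the log blocks whose instant matches a completed data-table action;
  // blocks for commits that exist only in the metadata table are ignored.
  static List<String> validBlockInstants(List<String> logBlockInstants,
                                         Set<String> completedDataInstants) {
    return logBlockInstants.stream()
        .filter(completedDataInstants::contains)
        .collect(Collectors.toList());
  }
}
```

Keeping the checks as pure functions of instant times makes them easy to unit test independently of any file system or timeline implementation.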

- Handle Metadata Table upgrade/downgrade by deleting the table and re-bootstrapping. The two versions differ in schema, and it is complicated to check whether the table is in sync, so it is simpler to re-bootstrap, since only the file listing needs to be re-bootstrapped.

- Multiple writers on the data table are also supported with the metadata table. Since every operation on the metadata table writes to the same files (the file-listing partition has a single FileSlice), only single-writer access to the metadata table can be allowed. To ensure this, every commit on the data table is guarded by the data table lock. Prior to this patch, table services like compaction and clustering did not take any locks while committing to the data table; this patch adds the lock.

- Enabled the metadata table by default.
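The locking and ordering rules described above (metadata table committed first, then the data table, with every commit including table services taken under the data table lock) can be modeled roughly as below. This is a minimal sketch, assuming one in-process `ReentrantLock` stands in for the data table lock and in-memory lists stand in for the two timelines; `GuardedCommitter` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model, not Hudi code: in-memory lists stand in for the two timelines,
// and one in-process ReentrantLock stands in for the data table lock.
final class GuardedCommitter {
  private final ReentrantLock dataTableLock = new ReentrantLock();
  private final List<String> metadataTimeline = new ArrayList<>();
  private final List<String> dataTimeline = new ArrayList<>();

  // Used for every commit on the data table, including table services such as
  // compaction and clustering, so the metadata table only ever sees one writer at a time.
  void commit(String instantTime) {
    dataTableLock.lock();
    try {
      metadataTimeline.add(instantTime); // 1. commit to the metadata table first
      dataTimeline.add(instantTime);     // 2. then complete the commit on the data table
    } finally {
      dataTableLock.unlock();
    }
  }

  List<String> metadataTimeline() {
    return snapshot(metadataTimeline);
  }

  List<String> dataTimeline() {
    return snapshot(dataTimeline);
  }

  // Copies are taken under the same lock, so a copy never races with a concurrent add().
  private List<String> snapshot(List<String> timeline) {
    dataTableLock.lock();
    try {
      return new ArrayList<>(timeline);
    } finally {
      dataTableLock.unlock();
    }
  }
}
```

Because both appends happen inside the same critical section, the two timelines record commits in the same order even with several concurrent writers in one JVM. A real multi-writer deployment needs a lock shared across processes, which this sketch does not model.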
@nsivabalan nsivabalan added the status:in-progress Work in progress label Sep 25, 2021
@nsivabalan nsivabalan changed the title [DO_NOT_MERGE][WIP]Sync metadata debug [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug Sep 25, 2021
@hudi-bot
Collaborator

hudi-bot commented Sep 25, 2021

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run travis re-run the last Travis build
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan force-pushed the sync_metadata_debug branch 3 times, most recently from 34d27aa to 0597e01 September 25, 2021 19:56
@nsivabalan
Contributor Author

@hudi-bot azure run


@nsivabalan nsivabalan force-pushed the sync_metadata_debug branch 4 times, most recently from edf9f3a to 6b38fee September 26, 2021 18:27

Labels

status:in-progress Work in progress

3 participants