@danny0405
Contributor

…a' is true

Tips

What is the purpose of the pull request

(For example: This pull request adds a quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick one of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

danny0405 force-pushed the HUDI-3678 branch 2 times, most recently from 1f4b599 to d707c69 on March 22, 2022 07:15
danny0405 added the priority:blocker (Production down; release blocker) label on Mar 22, 2022
danny0405 force-pushed the HUDI-3678 branch 2 times, most recently from 88b9fed to ab52e9f on March 23, 2022 04:20
  // do not preserve FILENAME_METADATA_FIELD
  recordWithMetadataInSchema.put(FILENAME_METADATA_FIELD_POS, newFilePath.getName());
  fileWriter.writeAvro(hoodieRecord.getRecordKey(), recordWithMetadataInSchema);
  if (preserveMetadata && useWriterSchema) { // useWriterSchema will be true only in case of compaction.
Contributor

The reason I had to call rewriteRecord is that, on the update path, the indexedRecord passed in by the caller does not have the meta fields. Hence, I passed in the oldRecord, from which we can deduce the meta fields.

For example, the return value of combineAndGetUpdate(...) does not have the meta fields.

Not sure if we can remove it.
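A minimal sketch of the rewrite described in this comment, for the case where metadata is preserved. It is hypothetical and is not Hudi's implementation: records are modeled as `Map<String, Object>` rather than Avro `GenericRecord`, and `MetaFieldRewriteSketch` and `rewriteWithMetaFields` are invented names. The meta columns are copied from the old record and, as in the diff snippet, the file name meta field is not preserved but set to the name of the file being written.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: records are modeled as maps, not Avro GenericRecords.
class MetaFieldRewriteSketch {
  // Hudi's five meta column names, in the order they lead the schema.
  static final List<String> META_FIELDS = List.of(
      "_hoodie_commit_time",
      "_hoodie_commit_seqno",
      "_hoodie_record_key",
      "_hoodie_partition_path",
      "_hoodie_file_name");

  // Builds a record whose meta fields come from oldRecord and whose data fields come
  // from mergedRecord (which may lack meta fields entirely).
  static Map<String, Object> rewriteWithMetaFields(
      Map<String, Object> oldRecord, Map<String, Object> mergedRecord, String newFileName) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (String field : META_FIELDS) {
      result.put(field, oldRecord.get(field));
    }
    // Do not preserve the file name: point it at the new file. Re-putting an existing
    // key keeps its position in a LinkedHashMap.
    result.put("_hoodie_file_name", newFileName);
    for (Map.Entry<String, Object> e : mergedRecord.entrySet()) {
      if (!META_FIELDS.contains(e.getKey())) {
        result.put(e.getKey(), e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> oldRecord = new LinkedHashMap<>();
    oldRecord.put("_hoodie_commit_time", "001");
    oldRecord.put("_hoodie_commit_seqno", "001_0_1");
    oldRecord.put("_hoodie_record_key", "key1");
    oldRecord.put("_hoodie_partition_path", "2022/03/22");
    oldRecord.put("_hoodie_file_name", "old-file.parquet");
    oldRecord.put("value", 1);

    // The merged record carries only data fields, as described in the review.
    Map<String, Object> merged = new LinkedHashMap<>();
    merged.put("value", 2);

    System.out.println(rewriteWithMetaFields(oldRecord, merged, "new-file.parquet"));
  }
}
```

The design question in the thread is whether this extra copy is needed at all. It is only necessary if the merged record really arrives without meta fields.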

Contributor

Specifically, I am talking about this code snippet in HoodieMergeHandle. The combinedAvroRecord below does not contain any meta fields, so if you remove the rewrite with meta columns, I am not sure how that would pan out:

  public void write(GenericRecord oldRecord) {
    String key = KeyGenUtils.getRecordKeyFromGenericRecord(oldRecord, keyGeneratorOpt);
    boolean copyOldRecord = true;
    if (keyToNewRecords.containsKey(key)) {
      // If we have duplicate records that we are updating, then the hoodie record will be deflated after
      // writing the first record. So make a copy of the record to be merged
      HoodieRecord<T> hoodieRecord = keyToNewRecords.get(key).newInstance();
      try {
        Option<IndexedRecord> combinedAvroRecord =
            hoodieRecord.getData().combineAndGetUpdateValue(oldRecord,
                useWriterSchema ? tableSchemaWithMetaFields : tableSchema,
                config.getPayloadConfig().getProps());
        // ...

Contributor Author

Currently, we read the old records from the base file with the metadata fields in the schema; see:

For the new records in the incoming dataset, the schema also includes the metadata fields.
The combineAndGetUpdateValue method also handles records with the write schema (the schema with metadata fields).
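The schema argument in this discussion can be illustrated with a small hypothetical sketch. Schemas are modeled as lists of field names and records as maps, and `WriterSchemaSketch`, `withMetaFields` and `project` are invented names. The sketch is only loosely analogous to adding Hudi's meta columns to an Avro schema. A record viewed through the plain table schema loses its meta fields, while the same record viewed through the schema with metadata fields keeps them. This mirrors the `useWriterSchema ? tableSchemaWithMetaFields : tableSchema` choice in the HoodieMergeHandle snippet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: schemas are field-name lists and records are maps, not Avro types.
class WriterSchemaSketch {
  static final List<String> META_FIELDS = List.of(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name");

  // Builds a "schema with metadata fields": the meta fields first, then the table's own fields.
  static List<String> withMetaFields(List<String> tableFields) {
    List<String> fields = new ArrayList<>(META_FIELDS);
    for (String f : tableFields) {
      if (!META_FIELDS.contains(f)) {
        fields.add(f);
      }
    }
    return fields;
  }

  // Views a record through a schema: only schema fields are kept; missing ones become null.
  static Map<String, Object> project(Map<String, Object> record, List<String> schema) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String f : schema) {
      out.put(f, record.get(f));
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> oldRecord = new LinkedHashMap<>();
    oldRecord.put("_hoodie_record_key", "key1");
    oldRecord.put("value", 1);

    List<String> tableSchema = List.of("value");
    List<String> tableSchemaWithMetaFields = withMetaFields(tableSchema);

    // Meta fields are dropped when viewed through the plain table schema...
    System.out.println(project(oldRecord, tableSchema));
    // ...and kept when viewed through the schema with metadata fields.
    System.out.println(project(oldRecord, tableSchemaWithMetaFields));
  }
}
```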

Contributor

nsivabalan left a comment

Left one clarifying question.

@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405
Contributor Author

The CI failure is not caused by this patch, so I will just merge it.

danny0405 merged commit 8896864 into apache:master on Mar 25, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 12, 2022

3 participants