Skip to content

Conversation

@wzx140
Copy link
Contributor

@wzx140 wzx140 commented Jun 29, 2022

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

Copy link
Contributor

@yanghua yanghua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wzx140 IMO, you are trying to fix an issue. Correct me if I am wrong.
When fixing an issue, we should file a Jira ticket to track it.

@wzx140 wzx140 changed the title [MINOR] fix use of HoodieDataBlock#getRecordIterator [HUDI-4344] [MINOR] fix use of HoodieDataBlock#getRecordIterator Jun 29, 2022
@wzx140
Copy link
Contributor Author

wzx140 commented Jun 29, 2022

@yanghua Thank you for your correction. This pr just fix compile error in release-feature-rfc46 after rebase on master.

@yanghua
Copy link
Contributor

yanghua commented Jun 29, 2022

Ok, so make your PR title more reasonable?

@xushiyan xushiyan changed the title [HUDI-4344] [MINOR] fix use of HoodieDataBlock#getRecordIterator [HUDI-4344] fix use of HoodieDataBlock#getRecordIterator Jun 29, 2022
@xushiyan
Copy link
Member

@wzx140 if you linked a jira in title, then you can remove MINOR

@wzx140
Copy link
Contributor Author

wzx140 commented Jun 29, 2022

@xushiyan ok. I'll pay attention next time.

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@xushiyan xushiyan merged commit 9c361b7 into apache:release-feature-rfc46 Jun 29, 2022
wzx140 added a commit to wzx140/hudi that referenced this pull request Aug 3, 2022
wzx140 added a commit to wzx140/hudi that referenced this pull request Aug 28, 2022
wzx140 added a commit to wzx140/hudi that referenced this pull request Sep 18, 2022
wzx140 added a commit to wzx140/hudi that referenced this pull request Oct 3, 2022
wzx140 added a commit to wzx140/hudi that referenced this pull request Oct 5, 2022
wzx140 added a commit to wzx140/hudi that referenced this pull request Oct 6, 2022
wzx140 added a commit to wzx140/hudi that referenced this pull request Nov 30, 2022
[minor] add more test for rfc46 (apache#7003)

## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
wzx140 added a commit to wzx140/hudi that referenced this pull request Dec 1, 2022
[minor] add more test for rfc46 (apache#7003)

## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
wzx140 added a commit to wzx140/hudi that referenced this pull request Dec 2, 2022
[minor] add more test for rfc46 (apache#7003)

## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
wzx140 added a commit to wzx140/hudi that referenced this pull request Dec 3, 2022
[minor] add more test for rfc46 (apache#7003)

## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
wzx140 added a commit to wzx140/hudi that referenced this pull request Dec 9, 2022
[minor] add more test for rfc46 (apache#7003)

## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
wzx140 added a commit to wzx140/hudi that referenced this pull request Dec 13, 2022
[minor] add more test for rfc46 (apache#7003)

## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
alexeykudinkin pushed a commit to wzx140/hudi that referenced this pull request Dec 13, 2022
[minor] add more test for rfc46 (apache#7003)

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
wzx140 added a commit to wzx140/hudi that referenced this pull request Dec 13, 2022
[minor] add more test for rfc46 (apache#7003)

## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
alexeykudinkin pushed a commit to wzx140/hudi that referenced this pull request Dec 13, 2022
[minor] add more test for rfc46 (apache#7003)

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
alexeykudinkin pushed a commit to onehouseinc/hudi that referenced this pull request Dec 14, 2022
[minor] add more test for rfc46 (apache#7003)

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
wzx140 added a commit to wzx140/hudi that referenced this pull request Dec 14, 2022
[minor] add more test for rfc46 (apache#7003)

## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should
 use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

* Redirected Kryo registration to `HoodieKryoRegistrar`

* Registered additional classes likely to be serialized by Kryo

* Updated tests

* Fixed serialization of Avro's `Utf8` to serialize just the bytes

* Added tests

* Added custom `AvroUtf8Serializer`;
Tidying up

* Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils`

* `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`;
Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar`

* `lint`

* Fixing compilation for Spark 2.x

* Disabling flaky test

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

* Implemented serialization hooks for `HoodieSparkRecord`

* Added `TestHoodieSparkRecord`

* Added tests for Avro-based records

* Added test for `HoodieEmptyRecord`

* Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter`

* Properly handle deflated records

* Fixing `Row`s encoding

* Fixed `HoodieRecord` to be properly sealed/unsealed

* Fixed serialization of the `HoodieRecordGlobalLocation`

[MINOR] Additional fixes for apache#6745 (apache#6947)

* Tidying up

* Tidying up more

* Cleaning up duplication

* Tidying up

* Revisited legacy operating mode configuration

* Tidying up

* Cleaned up `projectUnsafe` API

* Fixing compilation

* Cleaning up `HoodieSparkRecord` ctors;
Revisited mandatory unsafe-projection

* Fixing compilation

* Cleaned up `ParquetReader` initialization

* Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection

* Cleaning up redundant exception spec

* Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary;
Cleaned up `MetadataValues`

* Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord`

* De-duplicate `HoodieSparkRecord` ctors;
Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord`

* Removed unnecessary copying

* Cleaned up projection for `HoodieSparkRecord` (dropping partition columns);
Removed unnecessary copying

* Fixing compilation

* Fixing compilation (for Flink)

* Cleaned up File Raders' interfaces:
  - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads)
  - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces

* Cleaned up File Readers impls (inline with then new interfaces)

* Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader`

* Tidying up

* Missing licenses

* Re-instate custom override for `HoodieAvroParquetReader`;
Tidying up

* Fixed missing cloning w/in `HoodieLazyInsertIterable`

* Fixed missing cloning in deduplication flow

* Allow `HoodieSparkRecord` to hold `ColumnarBatchRow`

* Missing licenses

* Fixing compilation

* Missing changes

* Fixed Spark 2.x validation whether the row was read as a batch

Fix comment in RFC46 (apache#6745)

* rename

* add MetadataValues in updateMetadataValues

* remove singleton in fileFactory

* add truncateRecordKey

* remove hoodieRecord#setData

* rename HoodieAvroRecord

* fix code style

* fix HoodieSparkRecordSerializer

* fix benchmark

* fix SparkRecordUtils

* instantiate HoodieWriteConfig on the fly

* add test

* fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo

* add broadcast

* fix comment

* remove unnecessary broadcast

* add unsafe check in spark record

* fix getRecordColumnValues

* remove spark.sql.parquet.writeLegacyFormat

* fix unsafe projection

* fix

* pass external schema

* update doc

* rename back to HoodieAvroRecord

* fix

* remove comparable wrapper

* fix comment

* fix comment

* fix comment

* fix comment

* simplify row copy

* fix ParquetReaderIterator

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

* Update the RFC-46 doc to fix comments feedback

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

* [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

* add schema finger print

* add benchmark

* a new way to config the merger

* fix

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
xushiyan pushed a commit that referenced this pull request Dec 14, 2022
## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

[MINOR] Additional fixes for #6745 (#6947)

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (#6132)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(#5629)

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
## Change Logs

 - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict".
 - Fix nest field exist in HoodieCatalystExpressionUtils
 - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro
 - Fallback to avro when use "merge into" sql
 - Fix some schema handling issue
 - Support delta streamer
 - Convert parquet schema to spark schema and then avro schema(in
 org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with
 parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as
 int/long . We will lose the logic type info if we directly convert it to avro schema.
 - Support schema evolution in parquet block

[Minor] fix multi deser avro payload (apache#7021)

In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore.

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>

[MINOR] Properly registering target classes w/ Kryo (apache#7026)

* Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name)

[MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977)

* Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable`

* Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly
serialize record's payload;
Implemented serialization hooks for `HoodieAvroIndexedRecord`;
Implemented serialization hooks for `HoodieEmptyRecord`;
Implemented serialization hooks for `HoodieAvroRecord`;

* Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row

[MINOR] Additional fixes for apache#6745 (apache#6947)

Co-authored-by: Shawy Geng <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>

[RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629)

Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>

[HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific  HoodieRecord (apache#5627)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005)

Co-authored-by: wangzixuan.wzxuan <[email protected]>

[HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927)

[MINOR] Fix type casting in TestHoodieHFileReaderWriter

[HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522)

Co-authored-by: Alexey Kudinkin <[email protected]>
Co-authored-by: wangzixuan.wzxuan <[email protected]>
Co-authored-by: gengxiaoyu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants