Fix comment in RFC46 #6745

wzx140 · 2022-09-22T15:05:25Z

Change Logs

Fix comments in rfc46. You can check all commits.

Impact

Describe any public API or user-facing feature change or any performance impact.

Risk level: none | low | medium | high

Choose one. If medium or high, explain what verification was done to mitigate the risks.

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

wzx140 · 2022-09-22T15:11:22Z

@alexeykudinkin Could you please review it? Really thanks! Geng Xiaoyu and me have fix comments in RFC46. I found @DefaultSerializer(HoodieSparkRecordSerializer.class) and user need not to set spark.kryo.registrator.

wzx140 · 2022-09-23T12:41:30Z

https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=11609&view=results

alexeykudinkin · 2022-09-23T18:52:00Z

@wzx140 can you please validate that @DefaultSerializer(HoodieSparkRecordSerializer.class) leads to HoodieSparkRecordSerializer being used by Kryo when shuffling?

alexeykudinkin · 2022-09-23T18:54:57Z

hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java

      if (combinedValue != oldValue) {
-        hoodieRecord.setData(combinedValue);
-        records.put(key, hoodieRecord);
+        records.put(key, hoodieRecord.newInstance(combinedValue));


@wzx140 why do we still need to do newInstance why can't we use HoodieRecord returned by the merger?

We can see the old logic. It make a new record that use combinedValue and other parts from hoodieRecord.

HoodieRecord<? extends HoodieRecordPayload> oldRecord = records.get(key); HoodieRecordPayload oldValue = oldRecord.getData(); HoodieRecordPayload combinedValue = hoodieRecord.getData().preCombine(oldValue); // If combinedValue is oldValue, no need rePut oldRecord if (combinedValue != oldValue) { HoodieOperation operation = hoodieRecord.getOperation(); records.put(key, new HoodieAvroRecord<>(new HoodieKey(key, hoodieRecord.getPartitionPath()), combinedValue, operation)); }

But that doesn't make sense, right? Previously preCombine would return HoodieRecordPayload we're now returning HoodieRecord then call getData and immediately write it back into HoodieRecord.

Let's simplify it

alexeykudinkin · 2022-09-23T19:05:56Z

hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/util/HoodieSparkRecordUtils.java

   * @param structType  {@link StructType} instance.
   * @return Column value if a single column, or concatenated String values by comma.
   */
  public static Object getRecordColumnValues(InternalRow row,


Let's not forget to change the interface to return Object[]

alexeykudinkin · 2022-09-23T19:06:10Z

...hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/HoodieSparkRecordSerializer.scala

- *                that needs to be serialized.
 */
-class SparkStructTypeSerializer(schemas: Map[Long, StructType]) extends KSerializer[HoodieSparkRecord] {
+class HoodieSparkRecordSerializer() extends KSerializer[HoodieSparkRecord] {


You can drop ()

...-client/hudi-spark-client/src/main/java/org/apache/hudi/commmon/model/HoodieSparkRecord.java

alexeykudinkin · 2022-09-23T19:26:37Z

...hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/HoodieSparkRecordSerializer.scala

    val record = datum.newInstance().asInstanceOf[HoodieSparkRecord]
    record.setStructType(null)
-    val stream = new ObjectOutputStream(output)
+    val byteStream = new ByteArrayOutputStream()


Sorry, i skimmed t/h this initially but overlooked how we're doing the serialization here:

We should only serialize the UnsafeRow that HoodieSparkRecord holds (see my comment above); UnsafeRow implements KryoSerializable so we can use these methods to properly write it out into the stream; we should not be relying on Java's object serialization framework for that

alexeykudinkin · 2022-09-23T19:28:03Z

hudi-common/src/main/java/org/apache/hudi/common/util/collection/FlatLists.java

+/**
+ * Space-efficient, comparable, immutable lists, copied from calcite core.
+ */
+public class FlatLists {


Shall we call this ComparableLists? "Flat" reference is confusing

It is copied from calcite core. And ComparableLists already exists in FlatLists.java.

wzx140 · 2022-09-24T15:58:18Z

@wzx140 can you please validate that @DefaultSerializer(HoodieSparkRecordSerializer.class) leads to HoodieSparkRecordSerializer being used by Kryo when shuffling?

Yes. I check it in UT with printing some message in HoodieSparkRecordSerializer.

wzx140 · 2022-09-25T03:39:57Z

@hudi-bot run azure

wzx140 · 2022-09-26T14:41:45Z

@alexeykudinkin Please review the update when you have time.

...-client/hudi-spark-client/src/main/java/org/apache/hudi/commmon/model/HoodieSparkRecord.java

alexeykudinkin · 2022-09-26T21:21:09Z

hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java

      if (combinedValue != oldValue) {
-        hoodieRecord.setData(combinedValue);
-        records.put(key, hoodieRecord);
+        records.put(key, hoodieRecord.newInstance(combinedValue));


But that doesn't make sense, right? Previously preCombine would return HoodieRecordPayload we're now returning HoodieRecord then call getData and immediately write it back into HoodieRecord.

Let's simplify it

alexeykudinkin · 2022-09-26T21:24:19Z

...di-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkParquetStreamWriter.java

  public HoodieSparkParquetStreamWriter(FSDataOutputStream outputStream,
      HoodieRowParquetConfig parquetConfig) throws IOException {
    this.writeSupport = parquetConfig.getWriteSupport();
+    this.writeSupport.enableLegacyFormat();


Why are we doing that?

Because in avro2parquetLog AvroWriteSupport#"parquet.avro.write-old-list-structure" default value is true and currently we use its default value. In order to be compatible, We should also let sparkRecord2parquetLog write with legency format

Sorry, not sure what avro2parquetLog is

If we use HoodieAvroRecord to write parquetLog, it will write with AvroWriteSupport#"parquet.avro.write-old-list-structure"=true. In order to be compatible, We should also use HoodieSparkRecord write parquetLog with legency format. So that we can use HoodieAvroRecord to read the parquet log which written by HoodieSparkRecord.

// Standard mode: // // <list-repetition> group <name> (LIST) { // repeated group list { // ^~~~ repeatedGroupName // <element-repetition> <element-type> element; // ^~~~~~~ elementFieldName // } // } // Legacy mode, with non-nullable elements: // // <list-repetition> group <name> (LIST) { // repeated <element-type> array; // ^~~~~ repeatedFieldName // }

alexeykudinkin · 2022-09-26T21:53:26Z

hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/util/HoodieSparkRecordUtils.java

+    List<Object> result = new ArrayList<>();
+    for (String col : columns) {
+      NestedFieldPath posList = HoodieInternalRowUtils.getCachedPosList(structType, col);
+      result.add(Option.ofNullable(HoodieUnsafeRowUtils.getNestedInternalRowValue(row, posList)).orElse("").toString());


@wzx140 We should not be converting values to strings

alexeykudinkin · 2022-09-26T21:54:32Z

hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieInternalRowUtils.scala

+  }
+
+  /**
+   * For UT.


There's common annotation used for this: @VisibleForTesting

alexeykudinkin · 2022-09-26T22:10:39Z

...spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala

          hoodieRecord
        }).toJavaRDD()
      case HoodieRecord.HoodieRecordType.SPARK =>
+        log.info("Use spark record")


Let's generalize this to log whatever record-type is used. Let's also downgrade to debug

alexeykudinkin · 2022-09-26T22:11:17Z

...spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala

-        HoodieInternalRowUtils.addCompressedSchema(structType)
+        val structTypeWithMetaField = HoodieInternalRowUtils.getCachedSchema(schemaWithMetaField)
+        val structTypeBC = sparkContext.broadcast(structType)
+        HoodieInternalRowUtils.broadcastCompressedSchema(List(structType, structTypeWithMetaField), sparkContext)


Why do we need this?

The broadcast ensures that every executor has the same fingerprint->schema cache. We can't make every executor initialize fingerprint->schema locally.

For example, if we initialize fingerprint->schema in sparksqlwriter df, it is possible that the number of dataframe partitions is less than the number of executors.

HoodieInternalRowUtils.addCompressedSchema(structType) df.queryExecution.toRdd.map(row => { val internalRow = row.copy() val (processedRow, writeSchema) = getSparkProcessedRecord(partitionCols, internalRow, dropPartitionColumns, structTypeBC.value) val recordKey = sparkKeyGenerator.getRecordKey(internalRow, structTypeBC.value) val partitionPath = sparkKeyGenerator.getPartitionPath(internalRow, structTypeBC.value) val key = new HoodieKey(recordKey.toString, partitionPath.toString) // See this We can't guarantee that this is executed in all executors HoodieInternalRowUtils.addCompressedSchema(structType) new HoodieSparkRecord(key, processedRow, writeSchema) }).toJavaRDD().asInstanceOf[JavaRDD[HoodieRecord[_]]]

As shown in the figure below, executor3 do not has the fingerPrint cache and can not deserialize the fingerPrint to schema.

Sorry, i misread your original intent with this change. Unfortunately, this is not going to work: for Broadcast to work exactly the same object has to be passed in (w/in captured Closure) from Driver to Executor (Broadcasts are identified by unique bid). In the current setup you're just overwriting static variables w/in HoodieInternalRowUtils creating new Broadcasts which aren't propagated to Executors (the reason it works in tests is b/c we run phony Spark cluster where both Driver and Executor share memory, ie they access the same loaded HoodieInternalRowUtils and this problem is not apparent)

We need a simpler solution here:

We should remove the schema from HoodieSparkRecords. At this point we ultimately are familiar with all the limitations of the approach which Spark uses to reduce overhead of serializing the schema along w/ every record -- it requires us to provide all the schemas upfront to be distributed to all Executors (via SparkConf which is read-only after SparkContext is created).

We will have to pass record's schema externally: we can approach it the same way it's currently done for ExpressionPayload in this PR

I don't agree with that Broadcast does not work. I have tested this on yarn and it worked. The process of broadcast: the driver will put the blockcast variable into the blockmanager. And executors will be obtained locally according to the block id, and if it is not available locally, it will be pulled from the remote driver.

If the Broadcast works, we do not need to pass record's schema externally right? Passing record's schema externally requires user of HoodieRecord to pass the right schema.

alexeykudinkin · 2022-09-26T22:21:16Z

hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala

-        HoodieWriteConfig.MERGER_IMPLS.defaultValue()),
-      mergerStrategy = optParams.getOrElse(HoodieWriteConfig.MERGER_STRATEGY.key(),
-        metaClient.getTableConfig.getMergerStrategy)
+      mergerImpls = mergerImpls,


Leaving comments here for generateRecordMerger as well:

Let's do config parsing here, passing in Seq[String] instead of String

Let's rename generateRecordMerger to createRecordMerger

alexeykudinkin · 2022-09-26T22:28:05Z

...spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala

        maxCompactionMemoryInBytes, config, internalSchema)
    }

+    log.info(s"Use ${logScanner.getRecordType}")


Please downgrade to debug

alexeykudinkin · 2022-09-26T22:31:04Z

...spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala

      .getParent
  }

  private def registerStructTypeSerializerIfNeed(schemas: List[StructType]): Unit = {


Why do we need this method?
Schema registration should be adhoc -- whenever someone is requesting to serialize HoodieSparkRecord we'd cache the schema, replacing w/ the fingerprint

wzx140 · 2022-10-01T09:08:17Z

@hudi-bot run azure

wzx140 · 2022-10-05T16:07:54Z

@alexeykudinkin I have implement without the Broadcast and schema inside HoodieSparkRecord and pass record's schema externally. Please review the update when you have time.

wzx140 · 2022-10-05T16:08:37Z

@hudi-bot run azure

wzx140 · 2022-10-06T07:08:00Z

CI fail with testReadFilterExistAfterBulkInsertPrepped. And it will fix after rebasing on master with commit HUDI-4923.

alexeykudinkin · 2022-10-10T22:32:20Z

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieConcatHandle.java

  @Override
  public void write(HoodieRecord oldRecord) {
-    String key = oldRecord.getRecordKey(keyGeneratorOpt);
-    Schema schema = useWriterSchemaForCompaction ? tableSchemaWithMetaFields : tableSchema;


@wzx140 would need to clarify this one

alexeykudinkin · 2022-10-10T23:01:49Z

hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordMerger.java

   * of the single record, both orders of operations applications have to yield the same result)
   */
-  Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
+  Pair<Option<HoodieRecord>, Schema> merge(HoodieRecord older, Schema oldSchema, HoodieRecord newer, Schema newSchema, Properties props) throws IOException;


I think we should invert this to be Option<Pair<HoodieRecord, Schema>>

alexeykudinkin · 2022-10-10T23:05:20Z

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java

          // If it is an IGNORE_RECORD, just copy the old record, and do not update the new record.
          copyOldRecord = true;
-        } else if (writeUpdateRecord(hoodieRecord, oldRecord, combinedRecord)) {
+        } else if (writeUpdateRecord(hoodieRecord, oldRecord, combinedRecord, combineRecordSchema)) {


nit: hoodieRecord > newRecord

alexeykudinkin · 2022-10-10T23:21:08Z

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java

  }

-  protected boolean writeUpdateRecord(HoodieRecord<T> hoodieRecord, HoodieRecord oldRecord, Option<HoodieRecord> combineRecordOp) throws IOException {
+  protected boolean writeUpdateRecord(HoodieRecord<T> hoodieRecord, HoodieRecord<T> oldRecord, Option<HoodieRecord> combineRecordOp, Schema writerSchema) throws IOException {


@wzx140 this API is rather confusing: why do we need to pass all 3 records at the same time?

Disregard. This is how it's right now, left note to self to revisit this.

...ient/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleWithChangeLog.java

hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/util/HoodieSparkRecordUtils.java

hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieInternalRowUtils.scala

hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerger.java

alexeykudinkin · 2022-10-11T22:38:26Z

...ient/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleWithChangeLog.java

+  protected boolean writeUpdateRecord(HoodieRecord<T> hoodieRecord, HoodieRecord<T> oldRecord, Option<HoodieRecord> combineRecordOp, Schema writerSchema)
      throws IOException {
-    final boolean result = super.writeUpdateRecord(hoodieRecord, oldRecord, combineRecordOp);
+    Option<HoodieRecord> savedCombineRecordOp = combineRecordOp.map(HoodieRecord::newInstance);


I see what you're working around now.

Let's leave a TODO and create a ticket to follow-up

Remove record deflation: instead of deflation, we should just make sure we avoid keeping refs to the Record instant itself (so that it'd be collected)

Remove these unnecessary newInstance invocations

.../hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieWriteHelper.java

...ient/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleWithChangeLog.java

...i-flink-client/src/main/java/org/apache/hudi/io/FlinkMergeAndReplaceHandleWithChangeLog.java

alexeykudinkin · 2022-10-11T23:00:04Z

...-client/hudi-spark-client/src/main/java/org/apache/hudi/commmon/model/HoodieSparkRecord.java

  public HoodieRecord rewriteRecord(Schema recordSchema, Properties props, Schema targetSchema) throws IOException {
+    StructType structType = HoodieInternalRowUtils.getCachedSchema(recordSchema);
    StructType targetStructType = HoodieInternalRowUtils.getCachedSchema(targetSchema);
    UTF8String[] metaFields = extractMetaField(targetStructType);


Right, but you're extracting this meta-fields from the current record, and so if it doesn't currently have it there's nothing to extract.
My point is we can't apply new schema to the record, until we rewrite the record into new schema -- we should use recordSchema to check whether it has the meta-fields

...-client/hudi-spark-client/src/main/java/org/apache/hudi/commmon/model/HoodieSparkRecord.java

alexeykudinkin · 2022-10-12T00:22:48Z

...ent/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkParquetReader.java

      }
    }.withConf(conf).build();
-    ParquetReaderIterator<InternalRow> parquetReaderIterator = new ParquetReaderIterator<>(reader, InternalRow::copy);
+    ParquetReaderIterator<InternalRow> parquetReaderIterator = new ParquetReaderIterator<>(reader,


@wzx140 let me elaborate on the context of my question:

I'm aware that Spark is using mutable buffers for reading the data, but we shouldn't need to copy it right away, we should only have to copy where absolutely necessary. Meaning that by default, if we'd just be iterating over the records from one file and writing it into another one we wouldn't need to make any copies, right? We can easily reuse the same buffer, and while we continue to read and write 1 record at a time (while iterating over provided Iterator).

val iter = parquet.read(file) while (iter.hasNext) { writeToFile(iter.next) }

We only really need to make a copy whenever we retain a reference to the Row, for ex, if we do reduceByKey (as below, or if we'd hold it in a Map, etc)

val iter = parquet.read(file) iter.reduceByKey((r1, r2) => { // NOTE: In this case we're holding refs to 2 rows simultaneously, hence we'd need to make a copy before that // since otherwise both r1 and r2 will be pointing to the same buffer })

So, we basically need:

Avoid blanket copying here

Find the place where we hold the refs to the row (as described above, this is the reason the test is failing) and allow copying in the least possible scope

Does it make sense?

hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieInternalRowUtils.scala

alexeykudinkin · 2022-10-12T00:42:41Z

hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java

  @Override
-  public Option<HoodieAvroIndexedRecord> toIndexedRecord(Schema schema, Properties props) throws IOException {
-    Option<IndexedRecord> avroData = getData().getInsertValue(schema, props);
+  public Option<HoodieAvroIndexedRecord> toIndexedRecord(Schema recordSchema, Properties props) throws IOException {


Sorry, my 2d statement was confusing, let me elaborate on my comment:

This API converts from one HoodieRecord to another HoodieRecord

There's no reason why we'd be returning an Option (from external perspective, it looks really strange that we actually return an Option, though we simply do a lateral conversion)

Ideally we should be returning an Option when this object is dereferenced (getData is called), but this would affect the API

Let's keep it as it is for now, we can change it later when we'll be getting rid of HoodieAvroRecord

alexeykudinkin · 2022-10-12T00:58:09Z

hudi-common/src/main/java/org/apache/hudi/common/util/ConfigUtils.java

+        .map(String::trim).distinct().collect(Collectors.toList());
+  }
+
+  public static Properties setDeDuping(Properties props) {


Let's move this util to HoodieAvroRecordMerger (since it's only useful with it)

alexeykudinkin · 2022-10-12T01:17:40Z

hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerger.java


-    return ((HoodieAvroRecord) newer).getData().combineAndGetUpdateValue(previousRecordAvroPayload.get().getData(), schema, props)
+    return ((HoodieAvroRecord) newer).getData().combineAndGetUpdateValue(previousAvroData.get(), schema, props)
        .map(combinedAvroPayload -> new HoodieAvroIndexedRecord((IndexedRecord) combinedAvroPayload));


@wzx140 we should actually produce HoodieAvroRecord here instead:

Merged records will be subsequently reshuffled a few times, and shuffling IndexedRecord will kill performance

Instead we should use HoodieAvroRecord holding some dummy RecordPayload just holds Avro bytes for us (and throws if anyone will try to use it for merging)

Does it make sense?

I don't think so. In HoodieAvroRecordMerger, we do combineAndGetUpdateValue before writing and shuffling is done before combineAndGetUpdateValue. We do preCombine with record which hold avro byte and then shuffle the result. So there is no performce loss. This is also the original logic.

You're right, i double-checked and we do indeed not shuffle the HoodieAvroIndexedRecord

alexeykudinkin · 2022-10-12T23:09:38Z

...ient/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleWithChangeLog.java

+  protected boolean writeUpdateRecord(HoodieRecord<T> newRecord, HoodieRecord<T> oldRecord, Option<HoodieRecord> combineRecordOpt, Schema writerSchema)
      throws IOException {
-    final boolean result = super.writeUpdateRecord(hoodieRecord, oldRecord, combineRecordOp);
+    // TODO Remove these unnecessary newInstance invocations


@wzx140 please create a ticket under RFC-46 epic, and link it here like TODO(HUDI-xxx)

alexeykudinkin · 2022-10-12T23:15:36Z

...nt/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseWriteHelper.java

-      I records, HoodieIndex<?, ?> index, int parallelism, String schema, Properties props, HoodieRecordMerger merge);
+  public I deduplicateRecords(
+      I records, HoodieIndex<?, ?> index, int parallelism, String schema, Properties props, HoodieRecordMerger merger) {
+    return innerDeduplicateRecords(records, index, parallelism, schema, HoodieAvroRecordMerger.withDeDuping(props), merger);


It's better to actually do the other way around:

Keep inherited method named deduplicateRecords

Create new private method deduplicateRecordsInternal modifying props and invoke it in this class

Well, deduplicateRecords also used in StreamWriteFunction in Flink. So we should modify props in deduplicateRecords.

alexeykudinkin · 2022-10-12T23:19:47Z

...-client/hudi-spark-client/src/main/java/org/apache/hudi/commmon/model/HoodieSparkRecord.java

  public HoodieRecord rewriteRecord(Schema recordSchema, Properties props, Schema targetSchema) throws IOException {
+    StructType structType = HoodieInternalRowUtils.getCachedSchema(recordSchema);
    StructType targetStructType = HoodieInternalRowUtils.getCachedSchema(targetSchema);
    UTF8String[] metaFields = extractMetaField(targetStructType);


@wzx140 you extract meta-fields from the current records using targetStructType which is a new schema.

Let's consider the case when current record doesn't have meta-fields (it doesn't have it in the schema), this code would fail unpredictably in that case since we're using field ordinals from targetStructType (targetStructType != structType)

alexeykudinkin · 2022-10-12T23:23:42Z

...ent/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkParquetReader.java

      }
    }.withConf(conf).build();
-    ParquetReaderIterator<InternalRow> parquetReaderIterator = new ParquetReaderIterator<>(reader, InternalRow::copy);
+    ParquetReaderIterator<InternalRow> parquetReaderIterator = new ParquetReaderIterator<>(reader,


@wzx140 that's exactly what i'm referring to -- let's move such copying to the least possible scope to make sure we're only making copies where we absolutely have (and nowhere else)

alexeykudinkin · 2022-10-13T17:31:47Z

hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerger.java


-    return ((HoodieAvroRecord) newer).getData().combineAndGetUpdateValue(previousRecordAvroPayload.get().getData(), schema, props)
+    return ((HoodieAvroRecord) newer).getData().combineAndGetUpdateValue(previousAvroData.get(), schema, props)
        .map(combinedAvroPayload -> new HoodieAvroIndexedRecord((IndexedRecord) combinedAvroPayload));


You're right, i double-checked and we do indeed not shuffle the HoodieAvroIndexedRecord

alexeykudinkin · 2022-10-14T02:00:41Z

hudi-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java

-  // For directly use InternalRow
-  private Function<T, T> mapper;
+  // Whether next is consumed
+  private boolean consumed;


Why are we changing this one?

ParquetReaderIterator#next will return the current row and get the next row. We should get rid of it if we do not want to copy row in ParquetReaderIterator.

T retVal = this.next; // This is what I want to remove if we not copy the retVal. this.next = read(); return retVal;

alexeykudinkin · 2022-10-14T02:40:19Z

hudi-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java

        }
      }
-      T retVal = this.next;
-      this.next = read();


This is wrong, we should not be doing read() in next, instead it should be like following:

def hasNext() { if (next == null) { next = read() } return next != null } def next() { if (next == null) throw val ret = next next = null return next }

wzx140 · 2022-10-14T07:56:42Z

@hudi-bot run azure

hudi-bot · 2022-10-14T08:13:44Z

CI report:

0650eb4 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure re-run the last Azure build

* Tidying up * Tidying up more * Cleaning up duplication * Tidying up * Revisited legacy operating mode configuration * Tidying up * Cleaned up `projectUnsafe` API * Fixing compilation * Cleaning up `HoodieSparkRecord` ctors; Revisited mandatory unsafe-projection * Fixing compilation * Cleaned up `ParquetReader` initialization * Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection * Cleaning up redundant exception spec * Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary; Cleaned up `MetadataValues` * Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord` * De-duplicate `HoodieSparkRecord` ctors; Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord` * Removed unnecessary copying * Cleaned up projection for `HoodieSparkRecord` (dropping partition columns); Removed unnecessary copying * Fixing compilation * Fixing compilation (for Flink) * Cleaned up File Raders' interfaces: - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads) - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces * Cleaned up File Readers impls (inline with then new interfaces) * Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader` * Tidying up * Missing licenses * Re-instate custom override for `HoodieAvroParquetReader`; Tidying up * Fixed missing cloning w/in `HoodieLazyInsertIterable` * Fixed missing cloning in deduplication flow * Allow `HoodieSparkRecord` to hold `ColumnarBatchRow` * Missing licenses * Fixing compilation * Missing changes * Fixed Spark 2.x validation whether the row was read as a batch

[minor] add more test for rfc46 (apache#7003) ## Change Logs - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict". - Fix nest field exist in HoodieCatalystExpressionUtils - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro - Fallback to avro when use "merge into" sql - Fix some schema handling issue - Support delta streamer - Convert parquet schema to spark schema and then avro schema(in org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as int/long . We will lose the logic type info if we directly convert it to avro schema. - Support schema evolution in parquet block [Minor] fix multi deser avro payload (apache#7021) In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore. Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> [MINOR] Properly registering target classes w/ Kryo (apache#7026) * Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name) * Redirected Kryo registration to `HoodieKryoRegistrar` * Registered additional classes likely to be serialized by Kryo * Updated tests * Fixed serialization of Avro's `Utf8` to serialize just the bytes * Added tests * Added custom `AvroUtf8Serializer`; Tidying up * Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils` * `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`; Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar` * `lint` * Fixing compilation for Spark 2.x * Disabling flaky test [MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977) * Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable` * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; Implemented serialization hooks for `HoodieAvroRecord`; * Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row * Implemented serialization hooks for `HoodieSparkRecord` * Added `TestHoodieSparkRecord` * Added tests for Avro-based records * Added test for `HoodieEmptyRecord` * Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter` * Properly handle deflated records * Fixing `Row`s encoding * Fixed `HoodieRecord` to be properly sealed/unsealed * Fixed serialization of the `HoodieRecordGlobalLocation` [MINOR] Additional fixes for apache#6745 (apache#6947) * Tidying up * Tidying up more * Cleaning up duplication * Tidying up * Revisited legacy operating mode configuration * Tidying up * Cleaned up `projectUnsafe` API * Fixing compilation * Cleaning up `HoodieSparkRecord` ctors; Revisited mandatory unsafe-projection * Fixing compilation * Cleaned up `ParquetReader` initialization * Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection * Cleaning up redundant exception spec * Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary; Cleaned up `MetadataValues` * Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord` * De-duplicate `HoodieSparkRecord` ctors; Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord` * Removed unnecessary copying * Cleaned up projection for `HoodieSparkRecord` (dropping partition columns); Removed unnecessary copying * Fixing compilation * Fixing compilation (for Flink) * Cleaned up File Raders' interfaces: - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads) - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces * Cleaned up File Readers impls (inline with then new interfaces) * Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader` * Tidying up * Missing licenses * Re-instate custom override for `HoodieAvroParquetReader`; Tidying up * Fixed missing cloning w/in `HoodieLazyInsertIterable` * Fixed missing cloning in deduplication flow * Allow `HoodieSparkRecord` to hold `ColumnarBatchRow` * Missing licenses * Fixing compilation * Missing changes * Fixed Spark 2.x validation whether the row was read as a batch Fix comment in RFC46 (apache#6745) * rename * add MetadataValues in updateMetadataValues * remove singleton in fileFactory * add truncateRecordKey * remove hoodieRecord#setData * rename HoodieAvroRecord * fix code style * fix HoodieSparkRecordSerializer * fix benchmark * fix SparkRecordUtils * instantiate HoodieWriteConfig on the fly * add test * fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo * add broadcast * fix comment * remove unnecessary broadcast * add unsafe check in spark record * fix getRecordColumnValues * remove spark.sql.parquet.writeLegacyFormat * fix unsafe projection * fix * pass external schema * update doc * rename back to HoodieAvroRecord * fix * remove comparable wrapper * fix comment * fix comment * fix comment * fix comment * simplify row copy * fix ParquetReaderIterator Co-authored-by: Shawy Geng <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> [RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132) * Update the RFC-46 doc to fix comments feedback * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629) * [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer. * add schema finger print * add benchmark * a new way to config the merger * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]> [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord (apache#5627) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927) [MINOR] Fix type casting in TestHoodieHFileReaderWriter [HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522) Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]>

[minor] add more test for rfc46 (apache#7003) - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict". - Fix nest field exist in HoodieCatalystExpressionUtils - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro - Fallback to avro when use "merge into" sql - Fix some schema handling issue - Support delta streamer - Convert parquet schema to spark schema and then avro schema(in org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as int/long . We will lose the logic type info if we directly convert it to avro schema. - Support schema evolution in parquet block [Minor] fix multi deser avro payload (apache#7021) In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore. Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> [MINOR] Properly registering target classes w/ Kryo (apache#7026) * Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name) * Redirected Kryo registration to `HoodieKryoRegistrar` * Registered additional classes likely to be serialized by Kryo * Updated tests * Fixed serialization of Avro's `Utf8` to serialize just the bytes * Added tests * Added custom `AvroUtf8Serializer`; Tidying up * Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils` * `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`; Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar` * `lint` * Fixing compilation for Spark 2.x * Disabling flaky test [MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977) * Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable` * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; Implemented serialization hooks for `HoodieAvroRecord`; * Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row * Implemented serialization hooks for `HoodieSparkRecord` * Added `TestHoodieSparkRecord` * Added tests for Avro-based records * Added test for `HoodieEmptyRecord` * Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter` * Properly handle deflated records * Fixing `Row`s encoding * Fixed `HoodieRecord` to be properly sealed/unsealed * Fixed serialization of the `HoodieRecordGlobalLocation` [MINOR] Additional fixes for apache#6745 (apache#6947) * Tidying up * Tidying up more * Cleaning up duplication * Tidying up * Revisited legacy operating mode configuration * Tidying up * Cleaned up `projectUnsafe` API * Fixing compilation * Cleaning up `HoodieSparkRecord` ctors; Revisited mandatory unsafe-projection * Fixing compilation * Cleaned up `ParquetReader` initialization * Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection * Cleaning up redundant exception spec * Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary; Cleaned up `MetadataValues` * Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord` * De-duplicate `HoodieSparkRecord` ctors; Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord` * Removed unnecessary copying * Cleaned up projection for `HoodieSparkRecord` (dropping partition columns); Removed unnecessary copying * Fixing compilation * Fixing compilation (for Flink) * Cleaned up File Raders' interfaces: - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads) - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces * Cleaned up File Readers impls (inline with then new interfaces) * Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader` * Tidying up * Missing licenses * Re-instate custom override for `HoodieAvroParquetReader`; Tidying up * Fixed missing cloning w/in `HoodieLazyInsertIterable` * Fixed missing cloning in deduplication flow * Allow `HoodieSparkRecord` to hold `ColumnarBatchRow` * Missing licenses * Fixing compilation * Missing changes * Fixed Spark 2.x validation whether the row was read as a batch Fix comment in RFC46 (apache#6745) * rename * add MetadataValues in updateMetadataValues * remove singleton in fileFactory * add truncateRecordKey * remove hoodieRecord#setData * rename HoodieAvroRecord * fix code style * fix HoodieSparkRecordSerializer * fix benchmark * fix SparkRecordUtils * instantiate HoodieWriteConfig on the fly * add test * fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo * add broadcast * fix comment * remove unnecessary broadcast * add unsafe check in spark record * fix getRecordColumnValues * remove spark.sql.parquet.writeLegacyFormat * fix unsafe projection * fix * pass external schema * update doc * rename back to HoodieAvroRecord * fix * remove comparable wrapper * fix comment * fix comment * fix comment * fix comment * simplify row copy * fix ParquetReaderIterator Co-authored-by: Shawy Geng <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> [RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132) * Update the RFC-46 doc to fix comments feedback * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629) * [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer. * add schema finger print * add benchmark * a new way to config the merger * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]> [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord (apache#5627) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927) [MINOR] Fix type casting in TestHoodieHFileReaderWriter [HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522) Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]>

[minor] add more test for rfc46 (apache#7003) ## Change Logs - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict". - Fix nest field exist in HoodieCatalystExpressionUtils - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro - Fallback to avro when use "merge into" sql - Fix some schema handling issue - Support delta streamer - Convert parquet schema to spark schema and then avro schema(in org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as int/long . We will lose the logic type info if we directly convert it to avro schema. - Support schema evolution in parquet block [Minor] fix multi deser avro payload (apache#7021) In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore. Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> [MINOR] Properly registering target classes w/ Kryo (apache#7026) * Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name) * Redirected Kryo registration to `HoodieKryoRegistrar` * Registered additional classes likely to be serialized by Kryo * Updated tests * Fixed serialization of Avro's `Utf8` to serialize just the bytes * Added tests * Added custom `AvroUtf8Serializer`; Tidying up * Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils` * `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`; Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar` * `lint` * Fixing compilation for Spark 2.x * Disabling flaky test [MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977) * Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable` * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; Implemented serialization hooks for `HoodieAvroRecord`; * Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row * Implemented serialization hooks for `HoodieSparkRecord` * Added `TestHoodieSparkRecord` * Added tests for Avro-based records * Added test for `HoodieEmptyRecord` * Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter` * Properly handle deflated records * Fixing `Row`s encoding * Fixed `HoodieRecord` to be properly sealed/unsealed * Fixed serialization of the `HoodieRecordGlobalLocation` [MINOR] Additional fixes for apache#6745 (apache#6947) * Tidying up * Tidying up more * Cleaning up duplication * Tidying up * Revisited legacy operating mode configuration * Tidying up * Cleaned up `projectUnsafe` API * Fixing compilation * Cleaning up `HoodieSparkRecord` ctors; Revisited mandatory unsafe-projection * Fixing compilation * Cleaned up `ParquetReader` initialization * Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection * Cleaning up redundant exception spec * Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary; Cleaned up `MetadataValues` * Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord` * De-duplicate `HoodieSparkRecord` ctors; Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord` * Removed unnecessary copying * Cleaned up projection for `HoodieSparkRecord` (dropping partition columns); Removed unnecessary copying * Fixing compilation * Fixing compilation (for Flink) * Cleaned up File Raders' interfaces: - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads) - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces * Cleaned up File Readers impls (inline with then new interfaces) * Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader` * Tidying up * Missing licenses * Re-instate custom override for `HoodieAvroParquetReader`; Tidying up * Fixed missing cloning w/in `HoodieLazyInsertIterable` * Fixed missing cloning in deduplication flow * Allow `HoodieSparkRecord` to hold `ColumnarBatchRow` * Missing licenses * Fixing compilation * Missing changes * Fixed Spark 2.x validation whether the row was read as a batch Fix comment in RFC46 (apache#6745) * rename * add MetadataValues in updateMetadataValues * remove singleton in fileFactory * add truncateRecordKey * remove hoodieRecord#setData * rename HoodieAvroRecord * fix code style * fix HoodieSparkRecordSerializer * fix benchmark * fix SparkRecordUtils * instantiate HoodieWriteConfig on the fly * add test * fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo * add broadcast * fix comment * remove unnecessary broadcast * add unsafe check in spark record * fix getRecordColumnValues * remove spark.sql.parquet.writeLegacyFormat * fix unsafe projection * fix * pass external schema * update doc * rename back to HoodieAvroRecord * fix * remove comparable wrapper * fix comment * fix comment * fix comment * fix comment * simplify row copy * fix ParquetReaderIterator Co-authored-by: Shawy Geng <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> [RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132) * Update the RFC-46 doc to fix comments feedback * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629) * [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer. * add schema finger print * add benchmark * a new way to config the merger * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]> [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord (apache#5627) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927) [MINOR] Fix type casting in TestHoodieHFileReaderWriter [HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522) Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]>

[minor] add more test for rfc46 (apache#7003) - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict". - Fix nest field exist in HoodieCatalystExpressionUtils - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro - Fallback to avro when use "merge into" sql - Fix some schema handling issue - Support delta streamer - Convert parquet schema to spark schema and then avro schema(in org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as int/long . We will lose the logic type info if we directly convert it to avro schema. - Support schema evolution in parquet block [Minor] fix multi deser avro payload (apache#7021) In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore. Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> [MINOR] Properly registering target classes w/ Kryo (apache#7026) * Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name) * Redirected Kryo registration to `HoodieKryoRegistrar` * Registered additional classes likely to be serialized by Kryo * Updated tests * Fixed serialization of Avro's `Utf8` to serialize just the bytes * Added tests * Added custom `AvroUtf8Serializer`; Tidying up * Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils` * `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`; Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar` * `lint` * Fixing compilation for Spark 2.x * Disabling flaky test [MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977) * Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable` * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; Implemented serialization hooks for `HoodieAvroRecord`; * Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row * Implemented serialization hooks for `HoodieSparkRecord` * Added `TestHoodieSparkRecord` * Added tests for Avro-based records * Added test for `HoodieEmptyRecord` * Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter` * Properly handle deflated records * Fixing `Row`s encoding * Fixed `HoodieRecord` to be properly sealed/unsealed * Fixed serialization of the `HoodieRecordGlobalLocation` [MINOR] Additional fixes for apache#6745 (apache#6947) * Tidying up * Tidying up more * Cleaning up duplication * Tidying up * Revisited legacy operating mode configuration * Tidying up * Cleaned up `projectUnsafe` API * Fixing compilation * Cleaning up `HoodieSparkRecord` ctors; Revisited mandatory unsafe-projection * Fixing compilation * Cleaned up `ParquetReader` initialization * Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection * Cleaning up redundant exception spec * Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary; Cleaned up `MetadataValues` * Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord` * De-duplicate `HoodieSparkRecord` ctors; Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord` * Removed unnecessary copying * Cleaned up projection for `HoodieSparkRecord` (dropping partition columns); Removed unnecessary copying * Fixing compilation * Fixing compilation (for Flink) * Cleaned up File Raders' interfaces: - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads) - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces * Cleaned up File Readers impls (inline with then new interfaces) * Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader` * Tidying up * Missing licenses * Re-instate custom override for `HoodieAvroParquetReader`; Tidying up * Fixed missing cloning w/in `HoodieLazyInsertIterable` * Fixed missing cloning in deduplication flow * Allow `HoodieSparkRecord` to hold `ColumnarBatchRow` * Missing licenses * Fixing compilation * Missing changes * Fixed Spark 2.x validation whether the row was read as a batch Fix comment in RFC46 (apache#6745) * rename * add MetadataValues in updateMetadataValues * remove singleton in fileFactory * add truncateRecordKey * remove hoodieRecord#setData * rename HoodieAvroRecord * fix code style * fix HoodieSparkRecordSerializer * fix benchmark * fix SparkRecordUtils * instantiate HoodieWriteConfig on the fly * add test * fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo * add broadcast * fix comment * remove unnecessary broadcast * add unsafe check in spark record * fix getRecordColumnValues * remove spark.sql.parquet.writeLegacyFormat * fix unsafe projection * fix * pass external schema * update doc * rename back to HoodieAvroRecord * fix * remove comparable wrapper * fix comment * fix comment * fix comment * fix comment * simplify row copy * fix ParquetReaderIterator Co-authored-by: Shawy Geng <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> [RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132) * Update the RFC-46 doc to fix comments feedback * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629) * [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer. * add schema finger print * add benchmark * a new way to config the merger * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]> [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord (apache#5627) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927) [MINOR] Fix type casting in TestHoodieHFileReaderWriter [HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522) Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]>

[minor] add more test for rfc46 (apache#7003) ## Change Logs - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict". - Fix nest field exist in HoodieCatalystExpressionUtils - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro - Fallback to avro when use "merge into" sql - Fix some schema handling issue - Support delta streamer - Convert parquet schema to spark schema and then avro schema(in org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as int/long . We will lose the logic type info if we directly convert it to avro schema. - Support schema evolution in parquet block [Minor] fix multi deser avro payload (apache#7021) In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore. Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> [MINOR] Properly registering target classes w/ Kryo (apache#7026) * Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name) * Redirected Kryo registration to `HoodieKryoRegistrar` * Registered additional classes likely to be serialized by Kryo * Updated tests * Fixed serialization of Avro's `Utf8` to serialize just the bytes * Added tests * Added custom `AvroUtf8Serializer`; Tidying up * Extracted `HoodieCommonKryoRegistrar` to leverage in `SerializationUtils` * `HoodieKryoRegistrar` > `HoodieSparkKryoRegistrar`; Rebased `HoodieSparkKryoRegistrar` onto `HoodieCommonKryoRegistrar` * `lint` * Fixing compilation for Spark 2.x * Disabling flaky test [MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977) * Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable` * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; Implemented serialization hooks for `HoodieAvroRecord`; * Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row * Implemented serialization hooks for `HoodieSparkRecord` * Added `TestHoodieSparkRecord` * Added tests for Avro-based records * Added test for `HoodieEmptyRecord` * Fixed sealing/unsealing for `HoodieRecord` in `HoodieBackedTableMetadataWriter` * Properly handle deflated records * Fixing `Row`s encoding * Fixed `HoodieRecord` to be properly sealed/unsealed * Fixed serialization of the `HoodieRecordGlobalLocation` [MINOR] Additional fixes for apache#6745 (apache#6947) * Tidying up * Tidying up more * Cleaning up duplication * Tidying up * Revisited legacy operating mode configuration * Tidying up * Cleaned up `projectUnsafe` API * Fixing compilation * Cleaning up `HoodieSparkRecord` ctors; Revisited mandatory unsafe-projection * Fixing compilation * Cleaned up `ParquetReader` initialization * Revisited `HoodieSparkRecord` to accept either `UnsafeRow` or `HoodieInternalRow`, and avoid unnecessary copying after unsafe-projection * Cleaning up redundant exception spec * Make sure `updateMetadataFields` properly wraps `InternalRow` into `HoodieInternalRow` if necessary; Cleaned up `MetadataValues` * Fixed meta-fields extraction and `HoodieInternalRow` composition w/in `HoodieSparkRecord` * De-duplicate `HoodieSparkRecord` ctors; Make sure either only `UnsafeRow` or `HoodieInternalRow` are permitted inside `HoodieSparkRecord` * Removed unnecessary copying * Cleaned up projection for `HoodieSparkRecord` (dropping partition columns); Removed unnecessary copying * Fixing compilation * Fixing compilation (for Flink) * Cleaned up File Raders' interfaces: - Extracted `HoodieSeekingFileReader` interface (for key-ranged reads) - Pushed down concrete implementation methods into `HoodieAvroFileReaderBase` from the interfaces * Cleaned up File Readers impls (inline with then new interfaces) * Rebsaed `HoodieBackedTableMetadata` onto new `HoodieSeekingFileReader` * Tidying up * Missing licenses * Re-instate custom override for `HoodieAvroParquetReader`; Tidying up * Fixed missing cloning w/in `HoodieLazyInsertIterable` * Fixed missing cloning in deduplication flow * Allow `HoodieSparkRecord` to hold `ColumnarBatchRow` * Missing licenses * Fixing compilation * Missing changes * Fixed Spark 2.x validation whether the row was read as a batch Fix comment in RFC46 (apache#6745) * rename * add MetadataValues in updateMetadataValues * remove singleton in fileFactory * add truncateRecordKey * remove hoodieRecord#setData * rename HoodieAvroRecord * fix code style * fix HoodieSparkRecordSerializer * fix benchmark * fix SparkRecordUtils * instantiate HoodieWriteConfig on the fly * add test * fix HoodieSparkRecordSerializer. Replace Java's object serialization with kryo * add broadcast * fix comment * remove unnecessary broadcast * add unsafe check in spark record * fix getRecordColumnValues * remove spark.sql.parquet.writeLegacyFormat * fix unsafe projection * fix * pass external schema * update doc * rename back to HoodieAvroRecord * fix * remove comparable wrapper * fix comment * fix comment * fix comment * fix comment * simplify row copy * fix ParquetReaderIterator Co-authored-by: Shawy Geng <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> [RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132) * Update the RFC-46 doc to fix comments feedback * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629) * [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer. * add schema finger print * add benchmark * a new way to config the merger * fix Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]> [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord (apache#5627) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927) [MINOR] Fix type casting in TestHoodieHFileReaderWriter [HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522) Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]>

## Change Logs - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict". - Fix nest field exist in HoodieCatalystExpressionUtils - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro - Fallback to avro when use "merge into" sql - Fix some schema handling issue - Support delta streamer - Convert parquet schema to spark schema and then avro schema(in org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as int/long . We will lose the logic type info if we directly convert it to avro schema. - Support schema evolution in parquet block [Minor] fix multi deser avro payload (#7021) In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore. Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> [MINOR] Properly registering target classes w/ Kryo (#7026) * Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name) [MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (#6977) * Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable` * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; Implemented serialization hooks for `HoodieAvroRecord`; * Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row [MINOR] Additional fixes for #6745 (#6947) Co-authored-by: Shawy Geng <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> [RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (#6132) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(#5629) Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]> [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord (#5627) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (#6005) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (#5927) [MINOR] Fix type casting in TestHoodieHFileReaderWriter [HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (#5522) Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]>

## Change Logs - Add HoodieSparkValidateDuplicateKeyRecordMerger behaving the same as ValidateDuplicateKeyPayload. We should use it with config "hoodie.sql.insert.mode=strict". - Fix nest field exist in HoodieCatalystExpressionUtils - Fix rewrite in HoodieInternalRowUtiles to support type promoted as avro - Fallback to avro when use "merge into" sql - Fix some schema handling issue - Support delta streamer - Convert parquet schema to spark schema and then avro schema(in org.apache.hudi.io.storage.HoodieSparkParquetReader#getSchema). Some types in avro are not compatible with parquet. For ex, decimal as int32/int64 in parquet will convert to int/long in avro. Because avro do not has decimal as int/long . We will lose the logic type info if we directly convert it to avro schema. - Support schema evolution in parquet block [Minor] fix multi deser avro payload (apache#7021) In HoodieAvroRecord, we will call isDelete, shouldIgnore before we write it to the file. Each method will deserialize HoodiePayload. So we add deserialization method in HoodieRecord and call this method once before calling isDelete or shouldIgnore. Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> [MINOR] Properly registering target classes w/ Kryo (apache#7026) * Added `HoodieKryoRegistrar` registering necessary Hudi's classes w/ Kryo to make their serialization more efficient (by serializing just the class id, in-liue the fully qualified class-name) [MINOR] Make sure all `HoodieRecord`s are appropriately serializable by Kryo (apache#6977) * Make sure `HoodieRecord`, `HoodieKey`, `HoodieRecordLocation` are all `KryoSerializable` * Revisited `HoodieRecord` serialization hooks to make sure they a) could not be overridden, b) provide for hooks to properly serialize record's payload; Implemented serialization hooks for `HoodieAvroIndexedRecord`; Implemented serialization hooks for `HoodieEmptyRecord`; Implemented serialization hooks for `HoodieAvroRecord`; * Revisited `HoodieSparkRecord` to transiently hold on to the schema so that it could project row [MINOR] Additional fixes for apache#6745 (apache#6947) Co-authored-by: Shawy Geng <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> [RFC-46][HUDI-4414] Update the RFC-46 doc to fix comments feedback (apache#6132) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4301] [HUDI-3384][HUDI-3385] Spark specific file reader/writer.(apache#5629) Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]> [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord (apache#5627) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4344] fix usage of HoodieDataBlock#getRecordIterator (apache#6005) Co-authored-by: wangzixuan.wzxuan <[email protected]> [HUDI-4292][RFC-46] Update doc to align with the Record Merge API changes (apache#5927) [MINOR] Fix type casting in TestHoodieHFileReaderWriter [HUDI-3378][HUDI-3379][HUDI-3381] Migrate usage of HoodieRecordPayload and raw Avro payload to HoodieRecord (apache#5522) Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: wangzixuan.wzxuan <[email protected]> Co-authored-by: gengxiaoyu <[email protected]>

wzx140 changed the title ~~Fix comment~~ Fix comment in RFC46 Sep 22, 2022

yihua added priority:critical Production degraded; pipelines stalled writer-core labels Sep 22, 2022

yihua assigned bvaradar Sep 22, 2022

yihua added big-needle-movers priority:blocker Production down; release blocker and removed priority:critical Production degraded; pipelines stalled labels Sep 22, 2022

wzx140 force-pushed the fix-comment branch from a2febfb to 7cec5ae Compare September 23, 2022 04:33

alexeykudinkin reviewed Sep 23, 2022

View reviewed changes

yihua assigned alexeykudinkin and unassigned bvaradar Sep 24, 2022

wzx140 force-pushed the fix-comment branch from 7cec5ae to cfc2f99 Compare September 24, 2022 18:49

wzx140 force-pushed the fix-comment branch from bd8029e to 2cd84b4 Compare September 26, 2022 14:38

wzx140 force-pushed the fix-comment branch from 2cd84b4 to 8937349 Compare September 26, 2022 16:25

alexeykudinkin reviewed Sep 26, 2022

View reviewed changes

wzx140 force-pushed the fix-comment branch 2 times, most recently from 5365302 to f2823f9 Compare September 29, 2022 18:17

wzx140 closed this Oct 5, 2022

wzx140 reopened this Oct 5, 2022

wzx140 force-pushed the fix-comment branch from 7525a09 to af8e587 Compare October 5, 2022 19:19

alexeykudinkin reviewed Oct 11, 2022

View reviewed changes

fix comment

102ba04

alexeykudinkin reviewed Oct 12, 2022

View reviewed changes

fix comment

5fc3060

alexeykudinkin reviewed Oct 12, 2022

View reviewed changes

fix comment

a598907

wzx140 force-pushed the fix-comment branch from 2d69488 to a598907 Compare October 13, 2022 09:42

simplify row copy

0650eb4

alexeykudinkin approved these changes Oct 14, 2022

View reviewed changes

alexeykudinkin reviewed Oct 14, 2022

View reviewed changes

fix ParquetReaderIterator

f2e5741

alexeykudinkin mentioned this pull request Oct 14, 2022

[MINOR] Additional fixes for #6745 #6947

Merged

4 tasks

alexeykudinkin merged commit 19ed31f into apache:release-feature-rfc46 Oct 15, 2022

Fix comment in RFC46 #6745

Fix comment in RFC46 #6745

Uh oh!

Conversation

wzx140 commented Sep 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Change Logs

Impact

Contributor's checklist

Uh oh!

wzx140 commented Sep 22, 2022

Uh oh!

wzx140 commented Sep 23, 2022

Uh oh!

alexeykudinkin commented Sep 23, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wzx140 commented Sep 24, 2022

Uh oh!

wzx140 commented Sep 25, 2022

Uh oh!

wzx140 commented Sep 26, 2022

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wzx140 commented Oct 1, 2022

Uh oh!

wzx140 commented Oct 5, 2022

Uh oh!

wzx140 commented Oct 5, 2022

Uh oh!

wzx140 commented Sep 22, 2022 •

edited

Loading