[HUDI-76] Add CSV Source support for Hudi Delta Streamer #1165

yihua · 2019-12-31T20:00:24Z

What is the purpose of the pull request

Add CSV Source support for Hudi Delta Streamer

Brief change log

Verify this pull request

This change added tests and can be verified as follows:

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

vinothchandar · 2020-01-01T05:03:21Z

Is this still WIP?

yihua · 2020-01-01T05:06:54Z

@vinothchandar I'm adding unit test cases.

bvaradar · 2020-01-06T18:27:19Z

@yihua : Please ping me once the PR is ready for review.

Thanks,
Balaji.V

yihua · 2020-01-19T07:44:08Z

@bvaradar This PR is ready for review.

@leesf @vinothchandar Feel free to also review this PR. I'm not sure if we can merge this PR by the release cut. If not, we can add this feature to the next release.

Thanks @UZi5136225 for helping test the functionality of this PR and reporting the issue of corrupt data generated from DeltaStreamer with text files (CSV format with no header line). The latter has been fixed in another PR.

yihua · 2020-01-19T07:54:13Z

TestCsvDFSSource will be added once #1239 is merged.

vinothchandar · 2020-01-19T23:38:42Z

@yihua are you still targeting this for the next release?

yihua · 2020-01-20T01:01:12Z

@vinothchandar From my side, the code change is ready. I'm not sure if it can be reviewed and merged in time. I'm fine with pushing this to v0.6.0.

yihua · 2020-01-20T01:04:09Z

@bvaradar @leesf Could any of you review this PR by EOD?

yihua · 2020-01-26T23:25:34Z

@bvaradar I added more javadoc and checked that Spark CSV supports timestamp-type fields.

vinothchandar · 2020-02-05T06:21:13Z

cc @pratyakshsharma could you help review this?

pratyakshsharma · 2020-02-05T13:22:04Z

@vinothchandar ack.

vinothchandar · 2020-02-14T02:00:24Z

cc @nsivabalan could you take over in case @pratyakshsharma is busy..

pratyakshsharma · 2020-02-14T08:36:02Z

@vinothchandar will review it by EOD today. :)

nsivabalan · 2020-02-15T02:34:04Z

Sure, go ahead. I also plan to review it sometime, but will let you be the primary reviewer.

hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java

vinothchandar · 2020-02-22T03:53:23Z

@nsivabalan Please shepherd this across the finish line :)

nsivabalan · 2020-02-22T17:00:39Z

Sure. Will take care.

yihua

@pratyakshsharma Thanks for the review! Good catches on the nits. I've addressed them.

@nsivabalan The PR is ready for a final pass.

yihua · 2020-02-26T07:31:32Z

hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java

Yes. The CSV format does not support nested schemas well, so to test the CSV source we need to generate the test CSV data with a flattened schema.
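For illustration only, here is a minimal sketch of what flattening a nested record into CSV-friendly columns could look like. It is not the code in HoodieTestDataGenerator; the class `FlattenSketch`, the `fare` field name, and the `_` joiner are assumptions made for this example (the `amount` and `currency` leaves are borrowed from the struct type that appears later in this thread).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi: flattens nested maps into one level.
final class FlattenSketch {

  // Recursively copies leaf values into `out`, joining nested keys with '_'.
  static void flattenInto(String prefix, Map<String, Object> in, Map<String, Object> out) {
    for (Map.Entry<String, Object> e : in.entrySet()) {
      String key = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
      if (e.getValue() instanceof Map) {
        @SuppressWarnings("unchecked")
        Map<String, Object> child = (Map<String, Object>) e.getValue();
        flattenInto(key, child, out);
      } else {
        out.put(key, e.getValue());
      }
    }
  }

  // Returns a new map with only top-level (flat) columns, in insertion order.
  static Map<String, Object> flatten(Map<String, Object> record) {
    Map<String, Object> out = new LinkedHashMap<>();
    flattenInto("", record, out);
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> fare = new LinkedHashMap<>();   // assumed nested field
    fare.put("amount", 12.5);
    fare.put("currency", "USD");

    Map<String, Object> record = new LinkedHashMap<>();
    record.put("_row_key", "key-1");
    record.put("timestamp", 0L);
    record.put("fare", fare);

    // Each entry of the flattened map corresponds to one CSV column.
    System.out.println(flatten(record));
  }
}
```

With a record shaped like this, the nested `fare` struct becomes two plain columns, which is the kind of schema a CSV reader can handle.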

nsivabalan

Minor comments. Will merge once addressed.

hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/CsvDFSSource.java

hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java

hudi-utilities/src/test/java/org/apache/hudi/utilities/UtilitiesTestBase.java

hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestCsvDFSSource.java

nsivabalan · 2020-02-29T03:59:11Z

One question about using nested schema. Can you remind me what happens if someone passes in a nested schema for CsvDeltaStreamer?

nsivabalan · 2020-03-09T22:53:15Z

@yihua : Can we get this across the line?

yihua · 2020-03-10T20:16:07Z

Sorry for the delay. I'll get to this PR this week.

yihua · 2020-03-11T05:58:42Z

> One question about using nested schema. Can you remind me what happens if someone passes in a nested schema for CsvDeltaStreamer?

I used the code in the appendix below to test a nested schema with the CSV reader in Spark. It throws the following exception, which means that the Spark CSV source does not currently support nested schemas.

In most cases, CSV schemas should be flat. Whether a nested schema is supported for the CSV source depends on Spark's behavior (Spark may add support in the future), so we don't enforce the check in our Hudi code.

org.apache.spark.sql.AnalysisException: CSV data source does not support struct<amount:double,currency:string> data type.;

	at org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:69)
	at org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:67)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
	at org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifySchema(DataSourceUtils.scala:67)
	at org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifyReadSchema(DataSourceUtils.scala:41)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:400)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
	at org.apache.hudi.utilities.sources.CsvDFSSource.fromFiles(CsvDFSSource.java:120)
	at org.apache.hudi.utilities.sources.CsvDFSSource.fetchNextBatch(CsvDFSSource.java:93)
	at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
	at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73)
	at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:66)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:317)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
	at org.apache.hudi.utilities.TestHoodieDeltaStreamer.testCsvDFSSourceWithNestedSchema(TestHoodieDeltaStreamer.java:812)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)

Appendix: a simple diff for testing nested CSV schema:

diff --git a/hudi-utilities/null/parquetFiles/.1.parquet.crc b/hudi-utilities/null/parquetFiles/.1.parquet.crc
new file mode 100644
index 00000000..f48941c4
Binary files /dev/null and b/hudi-utilities/null/parquetFiles/.1.parquet.crc differ
diff --git a/hudi-utilities/null/parquetFiles/1.parquet b/hudi-utilities/null/parquetFiles/1.parquet
new file mode 100644
index 00000000..7780cb89
Binary files /dev/null and b/hudi-utilities/null/parquetFiles/1.parquet differ
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
index 4b69d223..e2921a5f 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
@@ -21,7 +21,6 @@ package org.apache.hudi.utilities.deltastreamer;
 import org.apache.hudi.AvroConversionUtils;
 import org.apache.hudi.DataSourceUtils;
 import org.apache.hudi.client.HoodieWriteClient;
-import org.apache.hudi.keygen.KeyGenerator;
 import org.apache.hudi.client.WriteStatus;
 import org.apache.hudi.common.model.HoodieCommitMetadata;
 import org.apache.hudi.common.model.HoodieRecord;
@@ -40,6 +39,7 @@ import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.hive.HiveSyncConfig;
 import org.apache.hudi.hive.HiveSyncTool;
 import org.apache.hudi.index.HoodieIndex;
+import org.apache.hudi.keygen.KeyGenerator;
 import org.apache.hudi.utilities.UtilHelpers;
 import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.Operation;
 import org.apache.hudi.utilities.exception.HoodieDeltaStreamerException;
@@ -332,6 +332,7 @@ public class DeltaSync implements Serializable {
     }
 
     JavaRDD<GenericRecord> avroRDD = avroRDDOptional.get();
+    List<GenericRecord> r = avroRDD.collect();
     JavaRDD<HoodieRecord> records = avroRDD.map(gr -> {
       HoodieRecordPayload payload = DataSourceUtils.createPayload(cfg.payloadClassName, gr,
           (Comparable) DataSourceUtils.getNestedFieldVal(gr, cfg.sourceOrderingField, false));
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/RowSource.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/RowSource.java
index 9e289f10..6f0cc8f8 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/RowSource.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/RowSource.java
@@ -41,6 +41,7 @@ public abstract class RowSource extends Source<Dataset<Row>> {
   @Override
   protected final InputBatch<Dataset<Row>> fetchNewData(Option<String> lastCkptStr, long sourceLimit) {
     Pair<Option<Dataset<Row>>, String> res = fetchNextBatch(lastCkptStr, sourceLimit);
+    Row[] x = (Row[]) res.getKey().get().collect();
     return res.getKey().map(dsr -> {
       SchemaProvider rowSchemaProvider = new RowBasedSchemaProvider(dsr.schema());
       return new InputBatch<>(res.getKey(), res.getValue(), rowSchemaProvider);
diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
index 43f76904..46761703 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
@@ -19,7 +19,6 @@
 package org.apache.hudi.utilities;
 
 import org.apache.hudi.DataSourceWriteOptions;
-import org.apache.hudi.keygen.SimpleKeyGenerator;
 import org.apache.hudi.common.HoodieTestDataGenerator;
 import org.apache.hudi.common.model.HoodieCommitMetadata;
 import org.apache.hudi.common.model.HoodieTableType;
@@ -33,11 +32,12 @@ import org.apache.hudi.common.util.FSUtils;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.TypedProperties;
 import org.apache.hudi.config.HoodieCompactionConfig;
-import org.apache.hudi.exception.TableNotFoundException;
 import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.TableNotFoundException;
 import org.apache.hudi.hive.HiveSyncConfig;
 import org.apache.hudi.hive.HoodieHiveClient;
 import org.apache.hudi.hive.MultiPartKeysValueExtractor;
+import org.apache.hudi.keygen.SimpleKeyGenerator;
 import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
 import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.Operation;
 import org.apache.hudi.utilities.schema.FilebasedSchemaProvider;
@@ -101,6 +101,7 @@ public class TestHoodieDeltaStreamer extends UtilitiesTestBase {
   private static final String PROPS_FILENAME_TEST_SOURCE = "test-source.properties";
   private static final String PROPS_FILENAME_TEST_INVALID = "test-invalid.properties";
   private static final String PROPS_FILENAME_TEST_CSV = "test-csv-dfs-source.properties";
+  private static final String PROPS_FILENAME_TEST_CSV_NESTED = "test-csv-dfs-source-nested.properties";
   private static final String PROPS_FILENAME_TEST_PARQUET = "test-parquet-dfs-source.properties";
   private static final String PARQUET_SOURCE_ROOT = dfsBasePath + "/parquetFiles";
   private static final int PARQUET_NUM_RECORDS = 5;
@@ -728,7 +729,51 @@ public class TestHoodieDeltaStreamer extends UtilitiesTestBase {
       csvProps.setProperty("hoodie.deltastreamer.csv.header", Boolean.toString(hasHeader));
     }
 
-    UtilitiesTestBase.Helpers.savePropsToDFS(csvProps, dfs, dfsBasePath + "/" + PROPS_FILENAME_TEST_CSV);
+    UtilitiesTestBase.Helpers
+        .savePropsToDFS(csvProps, dfs, dfsBasePath + "/" + PROPS_FILENAME_TEST_CSV);
+
+    String path = sourceRoot + "/1.csv";
+    HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
+    UtilitiesTestBase.Helpers.saveCsvToDFS(
+        hasHeader, sep,
+        Helpers.jsonifyRecords(dataGenerator.generateInserts("000", CSV_NUM_RECORDS, true)),
+        dfs, path);
+  }
+
+  private void prepareCsvDFSSourceNested(
+      boolean hasHeader, char sep, boolean useSchemaProvider, boolean hasTransformer)
+      throws IOException {
+    String sourceRoot = dfsBasePath + "/csvFiles";
+    String recordKeyField = (hasHeader || useSchemaProvider) ? "_row_key" : "_c0";
+
+    // Properties used for testing delta-streamer with CSV source
+    TypedProperties csvProps = new TypedProperties();
+    csvProps.setProperty("include", "base.properties");
+    csvProps.setProperty("hoodie.datasource.write.recordkey.field", recordKeyField);
+    csvProps.setProperty("hoodie.datasource.write.partitionpath.field", "not_there");
+    if (useSchemaProvider) {
+      csvProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file",
+          dfsBasePath + "/source.avsc");
+      if (hasTransformer) {
+        csvProps.setProperty("hoodie.deltastreamer.schemaprovider.target.schema.file",
+            dfsBasePath + "/target.avsc");
+      }
+    }
+    csvProps.setProperty("hoodie.deltastreamer.source.dfs.root", sourceRoot);
+
+    if (sep != ',') {
+      if (sep == '\t') {
+        csvProps.setProperty("hoodie.deltastreamer.csv.sep", "\\t");
+      } else {
+        csvProps.setProperty("hoodie.deltastreamer.csv.sep", Character.toString(sep));
+      }
+    }
+    if (hasHeader) {
+      csvProps.setProperty("hoodie.deltastreamer.csv.header", Boolean.toString(hasHeader));
+    }
+
+    UtilitiesTestBase.Helpers
+        .savePropsToDFS(csvProps, dfs, dfsBasePath + "/" + PROPS_FILENAME_TEST_CSV_NESTED);
 
     String path = sourceRoot + "/1.csv";
     HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
@@ -739,7 +784,8 @@ public class TestHoodieDeltaStreamer extends UtilitiesTestBase {
   }
 
   private void testCsvDFSSource(
-      boolean hasHeader, char sep, boolean useSchemaProvider, String transformerClassName) throws Exception {
+      boolean hasHeader, char sep, boolean useSchemaProvider, String transformerClassName)
+      throws Exception {
     prepareCsvDFSSource(hasHeader, sep, useSchemaProvider, transformerClassName != null);
     String tableBasePath = dfsBasePath + "/test_csv_table" + testNum;
     String sourceOrderingField = (hasHeader || useSchemaProvider) ? "timestamp" : "_c0";
@@ -753,6 +799,25 @@ public class TestHoodieDeltaStreamer extends UtilitiesTestBase {
     testNum++;
   }
 
+  @Test
+  public void testCsvDFSSourceWithNestedSchema() throws Exception {
+    prepareCsvDFSSourceNested(true, ',', true, false);
+    String tableBasePath = dfsBasePath + "/test_csv_table" + testNum;
+    String sourceOrderingField = "timestamp";
+    HoodieDeltaStreamer deltaStreamer =
+        new HoodieDeltaStreamer(TestHelpers.makeConfig(
+            tableBasePath, Operation.INSERT, CsvDFSSource.class.getName(),
+            null, PROPS_FILENAME_TEST_CSV_NESTED, false,
+            true, 1000, false, null, null, sourceOrderingField), jsc);
+    deltaStreamer.sync();
+
+    Row[] x = (Row[]) sqlContext.read().format("org.apache.hudi")
+        .load(tableBasePath + "/*/*.parquet")
+        .collect();
+    TestHelpers.assertRecordCount(CSV_NUM_RECORDS, tableBasePath + "/*/*.parquet", sqlContext);
+    testNum++;
+  }
+
   @Test
   public void testCsvDFSSourceWithHeaderWithoutSchemaProviderAndNoTransformer() throws Exception {
     // The CSV files have header, the columns are separated by ',', the default separator
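
The CSV-related properties set by the test helper in the diff above can be sketched in isolation as follows. This is a simplified illustration, not the actual test code: it uses plain `java.util.Properties` instead of Hudi's `TypedProperties`, the helper name `csvSourceProps` and the `/tmp/csvFiles` path are hypothetical, and the record key choice ignores the `useSchemaProvider` flag that the real test also consults. The property keys themselves are the ones that appear in the diff.

```java
import java.util.Properties;

// Hypothetical stand-alone sketch of the properties a CSV source run might use.
final class CsvSourcePropsSketch {

  static Properties csvSourceProps(String sourceRoot, char sep, boolean hasHeader) {
    Properties props = new Properties();
    props.setProperty("hoodie.deltastreamer.source.dfs.root", sourceRoot);

    // Without a header (and without a schema provider) Spark names columns _c0, _c1, ...
    props.setProperty("hoodie.datasource.write.recordkey.field", hasHeader ? "_row_key" : "_c0");

    // Only set the separator when it differs from the default ','.
    if (sep != ',') {
      // A tab is written as the two characters backslash + t, as in the test helper.
      props.setProperty("hoodie.deltastreamer.csv.sep",
          sep == '\t' ? "\\t" : Character.toString(sep));
    }
    // Only set the header flag when the files actually carry a header line.
    if (hasHeader) {
      props.setProperty("hoodie.deltastreamer.csv.header", Boolean.toString(true));
    }
    return props;
  }

  public static void main(String[] args) {
    // Tab-separated files with a header line.
    csvSourceProps("/tmp/csvFiles", '\t', true)
        .forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

Leaving the separator and header keys unset for the default case mirrors the test helper, which relies on the reader's defaults (comma separator, no header).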

yihua

@nsivabalan Thanks for the review. Please take another look.

hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/CsvDFSSource.java

hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java

hudi-utilities/src/test/java/org/apache/hudi/utilities/UtilitiesTestBase.java

hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestCsvDFSSource.java

codecov-io · 2020-03-11T06:45:08Z

Codecov Report

Merging #1165 into master will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #1165      +/-   ##
============================================
+ Coverage     67.69%   67.74%   +0.04%     
- Complexity      243      253      +10     
============================================
  Files           338      339       +1     
  Lines         16371    16396      +25     
  Branches       1672     1676       +4     
============================================
+ Hits          11083    11108      +25     
  Misses         4548     4548              
  Partials        740      740

Impacted Files	Coverage Δ	Complexity Δ
...rg/apache/hudi/utilities/sources/CsvDFSSource.java	`100.00% <100.00%> (ø)`	`10.00 <10.00> (?)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 23afe7a...cf765df.

nsivabalan · 2020-03-15T19:45:59Z

@yihua : LGTM. Can you squash the commits? I will merge after that.

nsivabalan · 2020-03-15T19:47:16Z

@vinothchandar : do you think we can have a blog on deltastreamer with csv?

yihua · 2020-03-16T02:01:57Z

I plan to write one after this is merged.

yihua · 2020-03-16T02:04:38Z

@nsivabalan Done squashing the commits.

vinothchandar · 2020-03-16T05:43:21Z

Yes, let's please do a blog!

nsivabalan · 2020-03-17T04:19:26Z

I don't see an option to merge the PR. Is it that @leesf is yet to approve? or do I need to request permission or something ?

leesf · 2020-03-17T04:23:01Z

> I don't see an option to merge the PR. Is it that @leesf is yet to approve? or do I need to request permission or something ?

@nsivabalan Please refer to this wiki to get github write access to the repository. https://cwiki.apache.org/confluence/display/HUDI/Committer+On-boarding+Guide

nsivabalan · 2020-03-19T14:00:32Z

@leesf : Thanks. I got the permission now.

leesf · 2020-03-19T14:18:10Z

> @leesf : Thanks. I got the permission now.

You are welcome, and nice shot. Just one minor tip: please merge (squash & merge / rebase & merge) with [HUDI-xxx] at the beginning of the commit message. :)

nsivabalan · 2020-03-19T14:45:07Z

got it, sure.

vinothchandar · 2020-03-19T16:43:09Z

@leesf this has happened enough times now, that we probably need a Code Review guide as well? wdyt

leesf · 2020-03-20T01:23:54Z

> @leesf this has happened enough times now, that we probably need a Code Review guide as well? wdyt

Agreed. I would like to update https://cwiki.apache.org/confluence/display/HUDI/Committer+On-boarding+Guide to add a Code Review guide for new committers, wdyt?

vinothchandar · 2020-03-20T02:13:14Z

I feel it can go on the contributing guide.. Code reviews are also contributing :) .. either way is fine by me.. Draft something and share on the mailing list?

leesf · 2020-03-20T02:18:21Z

> I feel it can go on the contributing guide.. Code reviews are also contributing :) .. either way is fine by me.. Draft something and share on the mailing list?

Sure, will draft it when I get a chance.

leesf self-requested a review January 4, 2020 00:26

vinothchandar assigned bvaradar Jan 6, 2020

yihua force-pushed the HUDI-76-deltastreamer-csv-source branch 6 times, most recently from c73b7e9 to f37bd95 Compare January 15, 2020 01:54

yihua force-pushed the HUDI-76-deltastreamer-csv-source branch from f37bd95 to 03f37c4 Compare January 19, 2020 07:22

yihua changed the title ~~[HUDI-76][WIP] Add CSV Source support for Hudi Delta Streamer~~ [HUDI-76] Add CSV Source support for Hudi Delta Streamer Jan 19, 2020

yihua force-pushed the HUDI-76-deltastreamer-csv-source branch from e68ad87 to f9bf397 Compare January 19, 2020 19:20

yihua mentioned this pull request Jan 21, 2020

[HUDI-552] Fix the schema mismatch in Row-to-Avro conversion #1246

Merged

5 tasks

vinothchandar self-assigned this Feb 5, 2020

pratyakshsharma reviewed Feb 16, 2020

hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java (outdated)

pratyakshsharma reviewed Feb 16, 2020

hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java (outdated)

pratyakshsharma reviewed Feb 16, 2020

hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java (outdated)

yihua force-pushed the HUDI-76-deltastreamer-csv-source branch from bb188f2 to e23adf4 Compare February 26, 2020 07:30

yihua commented Feb 26, 2020

nsivabalan reviewed Feb 29, 2020

yihua force-pushed the HUDI-76-deltastreamer-csv-source branch from e23adf4 to fb6bc0b Compare March 11, 2020 06:14

yihua commented Mar 11, 2020

[HUDI-76] Add CSV Source support for Hudi Delta Streamer

cf765df

yihua force-pushed the HUDI-76-deltastreamer-csv-source branch from fb6bc0b to cf765df Compare March 16, 2020 02:04

nsivabalan merged commit a752b7b into apache:master Mar 19, 2020

vamsikarnika pushed a commit to vamsikarnika/hudi that referenced this pull request Jan 30, 2025

[AUDIT-1146]: Changelog for release-v1.81.0 (apache#1165)

2d09fa3

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
