Skip to content

Conversation

@dongkelun
Copy link
Contributor

Tips

What is the purpose of the pull request

Add support allowDuplicateInserts in HoodieJavaClient

Brief change log

(for example:)

  • Add support allowDuplicateInserts in HoodieJavaClient

Verify this pull request

This change added tests and can be verified as follows:

  • Added testHoodieConcatHandle in TestJavaCopyOnWriteActionExecutor

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@pratyakshsharma
Copy link
Contributor

LGTM!

@vinothchandar vinothchandar self-assigned this Sep 23, 2021
@vinothchandar
Copy link
Member

cc @yihua

#3696 is refactoring a bunch of this to ensure such cross client inconsistencies are minimized. Will take a look once we land that.

Copy link
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments on the tests, but given the other PR is delayed. Might be okay to land this and reconcile. but still this is just duplicating code from the spark action executor

records2.add(new HoodieRecord(new HoodieKey(insertRow1.getRowKey(), insertRow1.getPartitionPath()), insertRow1));
records2.add(new HoodieRecord(new HoodieKey(insertRow2.getRowKey(), insertRow2.getPartitionPath()), insertRow2));

Thread.sleep(1000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid relying on sleep in tests. You can generate the commit times in the test, for greater control? Also could we move this test as a separate one for HoodieConcatHandle?

}

@Test
public void testHoodieConcatHandle() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondeing if this can be rewritten more nicely using the TestTable abstraction. cc @xushiyan .

@dongkelun dongkelun force-pushed the HUDI-2417 branch 5 times, most recently from 4feee2b to dc76423 Compare December 22, 2021 03:30
@dongkelun
Copy link
Contributor Author

@vinothchandar Hello, I have added comments, removed sleep and rebase with latest master, and put the test cases in separate class: TestHoodieConcatHandle.Can you take a look?

int index = 0;
for (GenericRecord record : fileRecords) {
assertEquals(records1.get(index).getRecordKey(), record.get("_row_key").toString());
if (index == 3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also add assertion for index = 2. I assume "number" value should be 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll add it later

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes in general is good. just 1 nit. Can we try to see if we can leverage HoodieTestDataGenerator to write tests.
you can do something like this
dataGen.generateInserts : batch1
dataGen.generateUpdates : batch2
if not for allowDuplicates config set, total record count in snapshot view will be just batch1 records (assuming we updated all records from batch1)
if config is set to true, total record count in snapshot view will be batch1 records + batch2 records. and you can do some assertions to ensure both batch of records are present.

@dongkelun
Copy link
Contributor Author

changes in general is good. just 1 nit. Can we try to see if we can leverage HoodieTestDataGenerator to write tests. you can do something like this dataGen.generateInserts : batch1 dataGen.generateUpdates : batch2 if not for allowDuplicates config set, total record count in snapshot view will be just batch1 records (assuming we updated all records from batch1) if config is set to true, total record count in snapshot view will be batch1 records + batch2 records. and you can do some assertions to ensure both batch of records are present.

OK, I'll try

@nsivabalan
Copy link
Contributor

since you have already put in time to write some tests, let already existing tests be there. may be in addition, give it a try.

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan merged commit 1f7b6b2 into apache:master Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants