Conversation

@hameizi commented Dec 31, 2021

Previously, the delete logic wrote all delete data to the eq-delete file even when the same key was already covered by the pos-delete file. This PR changes the logic to write to the eq-delete file only the delete data that does not exist in the pos-delete file. With the old logic, a single snapshot could emit several deletes but only one insert for one primary key; when we replay the data via streaming read, this causes errors in some scenarios (e.g., aggregate count).
The following shows the difference between the old delete logic and the new one.
table schema:

int key;  (primary key) 
string data;

old logic:
tx1:

insert (1,'aa');

tx2:

delete (1,'aa');  --> eq-delete file add (1,'aa')
insert (1,'bb');
delete (1,'bb');  --> eq-delete file add (1,'bb'), pos-delete file add (1,filepath)
insert (1,'cc');

result:
eq-delete file has (1,'aa'),(1,'bb')
pos-delete file has (1,filepath)

new logic:
tx1:

insert (1,'aa');

tx2:

delete (1,'aa');  --> eq-delete file add (1,'aa')
insert (1,'bb');
delete (1,'bb');  --> pos-delete file add (1,filepath)
insert (1,'cc');

result:
eq-delete file has (1,'aa')
pos-delete file has (1,filepath)

Actually, the data (1,'bb') is unnecessary in the eq-delete file: when we call applyPosdelete, (1,'bb') is already removed from the result, so no data matches (1,'bb') when we call applyEqdelete. In the streaming-read scenario, the old logic sends two deletes, (1,'aa') and (1,'bb'), for key 1 but only one insert (1,'cc') to the downstream operator, while the new logic sends just one delete and one insert.
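
To make that concrete, below is a minimal standalone sketch (hypothetical types and names, not the actual Iceberg reader API) of the apply order: position deletes are checked before equality deletes, and equality deletes only match rows from files with an older sequence number, which is why (1,'cc') survives while (1,'bb') never needs an eq-delete entry.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    // Simplified model of reading one data file: fileSeq is the file's
    // sequence number, posDeletes are row positions in this file, and each
    // eq-delete carries the sequence number it was committed at.
    class ApplyDeletesSketch {
      record EqDelete(String key, long seq) {}

      static List<String[]> apply(List<String[]> rows, long fileSeq,
                                  Set<Integer> posDeletes, List<EqDelete> eqDeletes) {
        List<String[]> result = new ArrayList<>();
        for (int pos = 0; pos < rows.size(); pos++) {
          if (posDeletes.contains(pos)) {
            continue; // applyPosdelete runs first: (1,'bb') is dropped by position
          }
          String key = rows.get(pos)[0];
          if (eqDeletes.stream().anyMatch(d -> d.key().equals(key) && d.seq() > fileSeq)) {
            continue; // applyEqdelete: only rows from older files, like (1,'aa'), match
          }
          result.add(rows.get(pos));
        }
        return result;
      }
    }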

github-actions bot added the core label Dec 31, 2021
hameizi commented Dec 31, 2021

@openinx @rdblue can you help take a look?

rdblue commented Jan 3, 2022

I'm assuming that the equality delete column in your example is the ID. Is that correct?

If so, then there is still some value in writing the delete because you get the previous columns written into the file. That allows reconstructing a CDC stream with the delete.

hameizi commented Jan 4, 2022

> I'm assuming that the equality delete column in your example is the ID. Is that correct?
>
> If so, then there is still some value in writing the delete because you get the previous columns written into the file. That allows reconstructing a CDC stream with the delete.

@rdblue But when we reconstruct a CDC stream with the delete, we can't guarantee the order of delete data and append data, so we can only guarantee eventual consistency. This PR also guarantees eventual consistency.

PathOffset previous = insertedRowMap.get(key);

eqDeleteWriter.write(row);
if (previous != null){
Contributor:

Style: there should be a space between ) and {.

// TODO attach the previous row if has a positional-delete row schema in appender factory.
posDeleteWriter.delete(previous.path, previous.rowOffset, null);
}
insertedRowMap.put(copiedKey, pathOffset);
Contributor:

Why is this changing as well? I think that the logic here was intended to implement upsert just in case there are duplicate inserts without a delete. @openinx, can you take a look at this?
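
A standalone sketch of the opportunistic dedup being discussed (hypothetical, simplified types; the real writer keys the map on a projected struct rather than a String):

    import java.util.HashMap;
    import java.util.Map;

    // When the same key is inserted twice with no delete in between, put()
    // returns the earlier insert location, and we pos-delete that row so
    // only the latest insert survives.
    class DedupOnInsertSketch {
      record PathOffset(String path, long rowOffset) {}

      private final Map<String, PathOffset> insertedRowMap = new HashMap<>();

      void write(String key, String path, long rowOffset) {
        PathOffset previous = insertedRowMap.put(key, new PathOffset(path, rowOffset));
        if (previous != null) {
          posDelete(previous); // drop the duplicated row by position
        }
      }

      private void posDelete(PathOffset offset) {
        System.out.printf("pos-delete %s @ %d%n", offset.path(), offset.rowOffset());
      }
    }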

Contributor:

@hameizi can you reply to this comment?

Contributor Author:

@rdblue Yes, as you said, the logic here was intended to implement upsert just in case there are duplicate inserts without a delete. This logic can be retained; I just think duplicate inserts should be avoided by the upstream semantics rather than by the writer. If needed, I will roll back this change.

@hameizi commented Feb 10, 2022:

@rdblue After testing, I think this change is necessary, and I think the old logic is wrong, because the old logic writes the pos-delete in the write function when it should happen in the delete function. The test case below is also puzzling:

@Test
public void testUpsertDataWithFullRowSchema() throws IOException {
  List<Integer> eqDeleteFieldIds = Lists.newArrayList(dataFieldId);
  Schema eqDeleteRowSchema = table.schema();
  GenericTaskDeltaWriter deltaWriter = createTaskWriter(eqDeleteFieldIds, eqDeleteRowSchema);
  deltaWriter.write(createRecord(1, "aaa"));
  deltaWriter.write(createRecord(2, "bbb"));
  deltaWriter.write(createRecord(3, "aaa"));
  deltaWriter.write(createRecord(3, "ccc"));
  deltaWriter.write(createRecord(4, "ccc"));

  // Commit the first transaction.
  WriteResult result = deltaWriter.complete();
  Assert.assertEquals("Should have a data file", 1, result.dataFiles().length);
  Assert.assertEquals("Should have a pos-delete file for deduplication purpose", 1, result.deleteFiles().length);
  Assert.assertEquals("Should be pos-delete file", FileContent.POSITION_DELETES, result.deleteFiles()[0].content());
  Assert.assertEquals(1, result.referencedDataFiles().length);
  commitTransaction(result);
  Assert.assertEquals("Should have expected records", expectedRowSet(ImmutableList.of(
      createRecord(2, "bbb"),
      createRecord(3, "aaa"),
      createRecord(4, "ccc")
  )), actualRowSet("*"));
}
This test writes a duplicate key and returns just one row, but users should avoid writing duplicate keys; deduplication should rely on the write.upsert.enable config (#2863), which executes delete semantics rather than write semantics (write semantics are the wrong semantics here) when there are duplicate inserts.
So we just need to record the position of the key in insertedRowMap when executing the write function, and then write the pos-delete file when executing the delete function, as sketched below.
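
In other words, write() would only record the key's position, roughly like this (a sketch reusing names from the diff hunks above; PathOffset.of, dataWriter, and the currentPath()/currentRows() accessors are assumed, not quoted from the patch):

    public void write(T row) throws IOException {
      // Remember where this key landed so a later delete() can turn into a
      // pos-delete; no delete file is written from the write path itself.
      PathOffset pathOffset = PathOffset.of(currentPath(), currentRows());
      insertedRowMap.put(copiedKey, pathOffset);  // copiedKey: defensive copy of the key
      dataWriter.write(row);
    }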

Contributor:

Let's remove this change.

The logic here is to opportunistically catch duplicates when there are only inserts. This is not intended to replace the real upsert logic, which requires calling delete as you noted. Instead, it is here because we're updating the insertedRowMap and may get a previous insert location. When that happens, the right thing to do is to delete the duplicate row.

I also just realized that the changes below are incorrect. Instead of calling internalPosDelete(key), this checks the insertedRowMap itself using get. The logic in internalPosDelete used remove, so that the entry was removed before the insert occurred and we have a second opportunistic check.

To fix this, you should instead update internalPosDelete to return true if a row was deleted and false otherwise. Then you can update your previous != null check like this:

    public void delete(T row) throws IOException {
      if (!internalPosDelete(structProjection.wrap(asStructLike(row)))) {
        eqDeleteWriter.write(row);
      }
    }
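
For reference, the boolean-returning internalPosDelete this suggests might look like the following sketch (names reused from the diff hunks above; not necessarily the merged code):

    private boolean internalPosDelete(StructLike key) {
      PathOffset previous = insertedRowMap.remove(key);
      if (previous != null) {
        // remove() before the next insert keeps the map consistent and gives
        // the second opportunistic duplicate check described above.
        posDeleteWriter.delete(previous.path, previous.rowOffset, null);
        return true;
      }
      return false;
    }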

PathOffset previous = insertedRowMap.get(structLikeKey);

eqDeleteWriter.write(key);
if (previous != null){
Contributor:

Style is incorrect here as well.

hameizi commented Jan 24, 2022

@rdblue Code style fixed in the new commit.

hameizi commented Feb 9, 2022

@rdblue Are there any more questions about this PR?

rdblue commented Feb 9, 2022

@hameizi, I'm mainly just waiting for a reply to the comment above. I wanted more information about that change. Thanks!

OutputFileFactory fileFactory, FileIO io, long targetFileSize) {
protected BaseTaskWriter(
PartitionSpec spec, FileFormat format, FileAppenderFactory<T> appenderFactory,
OutputFileFactory fileFactory, FileIO io, long targetFileSize) {
Contributor:

These lines don't need to change. Can you revert this?

@hameizi force-pushed the change-delete-logic branch from aa179de to 8ad553d on February 14, 2022 02:21
hameizi commented Feb 14, 2022

@rdblue Fixed everything. Can you take a look?

@hameizi requested a review from rdblue on February 14, 2022 05:56
Sets.newHashSet(result.deleteFiles()[0].content(), result.deleteFiles()[1].content()));
Assert.assertEquals(1, result.deleteFiles().length);
Assert.assertEquals(Sets.newHashSet(FileContent.POSITION_DELETES),
Sets.newHashSet(result.deleteFiles()[0].content()));
Contributor:

Unnecessary whitespace change, and this is not the correct indentation.

@rdblue merged commit 2247e77 into apache:master on Feb 14, 2022
rdblue commented Feb 14, 2022

Thanks, @hameizi! I merged this since there was only a style problem left.

nastra pushed a commit to nastra/iceberg that referenced this pull request May 18, 2022
@Xiangakun

The comment in TestDeltaTaskWriter.testCdcEvents should also be updated since it may confuse the user.


hililiwei pushed a commit to hililiwei/iceberg that referenced this pull request Jun 29, 2022