Skip to content

Conversation

@chenjunjiedada
Copy link
Collaborator

When running unit tests from the previous sequence number PR, the unit test testSequenceNumberForMergeAppend failed due to the manifest entry sequence number is not correct. That is because we rewrite the manifest with unassigned sequence number. To fix that, we could always assign sequence number to the adding manifest from add(ManifestFile) API.

This also includes some tests for sequence number from the previous PR.

@chenjunjiedada chenjunjiedada force-pushed the add-sequence-number-for-manifest branch from e5cc6f0 to 3a49d62 Compare April 27, 2020 13:24
@rdblue
Copy link
Contributor

rdblue commented Apr 27, 2020

The logic from #588 wasn't correct because the sequence number is not yet determined when new manifests are written in a commit. That's why they must be inherited.

For example, if I have 2 concurrent writers appending data after seq 22, both will attempt to commit seq 23 and only one will succeed. The other one needs to retry to commit seq 24. If that writer had written seq 23 in its manifests, then those need to be rewritten. Inheriting avoids needing to rewrite manifests. Only the manifest list is rewritten, which needs to happen after a conflict anyway.

@chenjunjiedada
Copy link
Collaborator Author

chenjunjiedada commented Apr 28, 2020

I see, the problem is addressed in current logic and it can be verified by indexAndValidateSnapshots when committing. But I think the unit tests are still valid for current logic. Could you please take another look? The failed unit test is based on current logic.

@rdblue
Copy link
Contributor

rdblue commented Apr 28, 2020

When I run the tests locally, I see that they fail for the current master, but are fixed in the v2-manifests PR, #963. Looks like the test case is addressed in those updates. I think the tests look good, so let's add those. We can either find what fixed inheritance in the other PR and add it here, or wait for that one to be merged.

@chenjunjiedada
Copy link
Collaborator Author

I prefer to wait for PR to get merged since the unit tests may need to updated as well. I will take a look at the PR as well to find what is fixed.

Does the logic of precondition checking make sense to you? I think that is needed.

@rdblue
Copy link
Contributor

rdblue commented Apr 29, 2020

Yes, I think those precondition checks are good.

@chenjunjiedada
Copy link
Collaborator Author

chenjunjiedada commented May 6, 2020

In #963, the MergingSnapshotProducer#createManifest callswriter.add(entry) which has a wrapper always set the sequence number to null so that inheritance logic could work. While in the current master, the createManifest does not wrap the entry so that the sequence number is inherited from its manifest, which is -1, then the later inheritance logic is bypassed.

@rdblue, I add the fix here and remove unrelated changes, could you please have a look?

@chenjunjiedada chenjunjiedada force-pushed the add-sequence-number-for-manifest branch 2 times, most recently from 04622f3 to 95cd1f0 Compare May 6, 2020 09:43
@chenjunjiedada chenjunjiedada changed the title Always assign sequence number for adding manifest Add unit tests for sequence number May 12, 2020
table.newFastAppend().appendFile(FILE_A).commit();
Assert.assertEquals(1, table.currentSnapshot().sequenceNumber());
ManifestFile manifestFile = table.currentSnapshot().manifests().get(0);
Assert.assertEquals(1, manifestFile.sequenceNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions should be improved. Each assertion should have context, like "Current sequence number should match", and the length of all lists should be validated with an assertion to catch unexpected additions

Since this pattern is repeated, I'd recommend a helper method like validateManfiest and validateManifestEntries for the manifest list. And the entries can be validated using a new variation of validateManifestEntries that includes sequence numbers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I add two helper function, one is used to verify data files in one manifest , another is used to verify data files in the table.

}

@Test
public void testSequenceNumberForMergeAppend() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be two cases for this test. One with a default merge setting (no merging) and one with the min merge count changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


Assert.assertEquals("the sequence number of manifest should be 3", 3, manifestFile.sequenceNumber());

for (ManifestEntry entry : ManifestFiles.read(manifestFile,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I noted above, I think it would be cleaner to add sequence number of a variant of validateManifestEntries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Assert.assertEquals("the sequence number of snapshot should be 3",
3, table.currentSnapshot().sequenceNumber());

ManifestFile newManifest = table.currentSnapshot().manifests().get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on ordering in the manifest list. Can you avoid assumptions like this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will use snapshot id to filter.


@Test
public void testCommitConflict() {
Transaction txn = table.newTransaction();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't use transactions. There is no need for it in a simple conflict test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

table.newFastAppend().appendFile(FILE_B).commit();

AssertHelpers.assertThrows("Should failed due to conflict",
IllegalStateException.class, "last operation has not committed", txn::commitTransaction);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some confusion about what this is doing. This IllegalStateException indicates that one of the operations in a transaction was not committed, not that there was a conflict committing to the table. If you want to validate a conflict, then you need to set the number of retries to 0 so that the CommitFailedException is thrown.

Most tests that validate a conflict call apply on the first change, then call commit on the conflicting change, and then call commit on the first change. Retry allows the commit to succeed, but the result of the commit will be different. In this case, you'd have a higher sequence number.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

AssertHelpers.assertThrows("Should failed due to conflict",
IllegalStateException.class, "last operation has not committed", txn::commitTransaction);

Assert.assertEquals(1, TestTables.load(tableDir, "test").currentSnapshot().sequenceNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use table, not load.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.


@Test
public void testConcurrentCommit() throws InterruptedException {
ExecutorService threadPool = Executors.newFixedThreadPool(4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for a thread pool test here, as long as the single-threaded tests are written correctly. We are not trying to test conflict detection -- instead we want to test what happens when there is a conflict and we can simulate that with a specific order of calls.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

AppendFiles appendFiles = table.newFastAppend().appendFile(FILE_C);
appendFiles.apply();
table.newFastAppend().appendFile(FILE_D).commit();
appendFiles.commit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is more like what I would expect for this test.

for (ManifestEntry entry : ManifestFiles.read(manifestFile, table.io(),
table.ops().current().specsById()).entries()) {
if (entry.file().path().equals(FILE_C.path())) {
Assert.assertEquals(table.currentSnapshot().sequenceNumber(), entry.sequenceNumber().longValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When values are predictable, like sequence numbers, we usually want to test that they are specific values, not that they match some other variable. It would be better to test that FILE_C has sequence number 3.

This also needs to validate that FILE_D was committed first and has sequence number 2.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK


table.manageSnapshots().rollbackTo(snapshotId).commit();

Assert.assertEquals(1, TestTables.load(tableDir, "test").currentSnapshot().sequenceNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to test that the table metadata's last sequence number is still 2. One way to do that is to validate that the next sequence number assigned is 3.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

}

@Test
public void testMultipleTxnOperations() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really test much because all of the transactions do not conflict and only contain one operation. I think there should be a few test cases (separate methods):

  1. A transaction with 1 operation
  2. A transaction with 2 operations
  3. A transaction with an operation that doesn't change the sequence number (like expiration)
  4. A transaction with 1 operation that must retry due to conflict

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

table.currentSnapshot().sequenceNumber());

// pick the snapshot that's staged but not committed
Snapshot wapSnapshot = readMetadata().snapshots().get(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should always save the current snapshot ID just after a commit and fetch that snapshot by ID instead of accessing the position in a list.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can not use currentSnapshot() to get staged snapshot. Can we?

// WAP commit
table.newAppend()
.appendFile(FILE_B)
.set("wap.id", "123456789")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of a WAP ID? The commit is staged, so there is no reason to mix in WAP logic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Snapshot wapSnapshot = readMetadata().snapshots().get(1);

Assert.assertEquals("the snapshot sequence number should be 2", 2,
wapSnapshot.sequenceNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for WAP, just use stagedSnapshot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

@Test
public void testSequenceNumberForCherryPicking() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to cherry picking, it would be great to have a test case for fast-forward, which would not change the sequence number.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

table.manageSnapshots().cherrypick(wapSnapshot.snapshotId()).commit();

Assert.assertEquals("the snapshot sequence number should be 4",
4, table.currentSnapshot().sequenceNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also assert that the picked files also have the new sequence number.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still needs to be updated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The picked file is validated in the next line. Do you mean that?

validateDataFiles(files(FILE_A, FILE_B, FILE_C), seqs(1, 4, 3));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry I missed that.

@chenjunjiedada
Copy link
Collaborator Author

@rdblue, The latest commit includes changes from #1038, will rebase if that gets merged.

@chenjunjiedada
Copy link
Collaborator Author

Looks like no need to rebase. @rdblue, would you please help to take another look?

@rdblue
Copy link
Contributor

rdblue commented May 25, 2020

@chenjunjiedada, it looks like the changes from #1038 are included in the diff for this one. Could you rebase to remove them? This may merge cleanly because the changes are identical, but it is harder to review with changes from other PRs mixed in.

}

@Test
public void testFastAppend() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation for #1038 was to cover more cases for each action. Instead of having one test case for FastAppend, we should add the necessary assertions to all FastAppend tests. That keeps the tests in one place and increases coverage of situations that validate sequence numbers.

Could you make sure that the cases that you're testing here are covered by assertions in the FastAppend tests so we don't need this case?

Ideally, I'd prefer to add assertions to all of the operation tests instead of this suite, but for now I think it would be reasonable to just update the cases in FastAppend and add the rest of this suite. We can replace test cases here with assertions in the right test suites over a few more PRs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case could be coved by testEmptyTableAppend and testEmptyTableAppendManifest. Let me remove this.

The idea to move these unit tests to each operation test respectively sounds reasonable to me. I will try to create some following PRs to clean this.

@chenjunjiedada chenjunjiedada force-pushed the add-sequence-number-for-manifest branch from 983bbcd to 97d3b3d Compare May 26, 2020 03:50
validateDataFiles(files(FILE_A, FILE_B), seqs(1, 2));
}

void validateDataFiles(Iterator<DataFile> files, Iterator<Long> expectedSeqs) {
Copy link
Contributor

@rdblue rdblue May 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should use the validation methods from TableTestBase instead of adding its own. This method causes the test cases in this PR to be confusing because sometimes the sequence number is for a delete. Instead, the test cases need to be very specific about metadata.

All the metadata for every change should be validated:

  • Each operation that produces a sequence number should assert the number of changed manifests
  • Each changed manifest should have its entries validated, including file status, snapshot id, and sequence number
  • Each operation should validate the lastSequenceNumber in table metadata
  • Operations that do not produce a sequence number should validate that the sequence number did not change using the last sequence number in table metadata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will update accordingly.

CommitFailedException.class, "Injected failure", txn::commitTransaction);

Assert.assertEquals("Snapshot sequence number should match expected",
1, table.currentSnapshot().sequenceNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion validates that the table's current snapshot didn't change, not that the table did not assign as sequence number to the failed transaction. This should check TableMetadata.lastSequenceNumber instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@Test
public void testMultipleOperationsTransaction() {
Transaction txn = table.newTransaction();
txn.newAppend().appendFile(FILE_A).commit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sequence number of FILE_A needs to be validated, along with the rest of the metadata for this commit. Transactions will still produce a snapshot for every operation, it will just swap the existing state for the final state.

txn.commitTransaction();

Assert.assertEquals("Snapshot sequence number should match expected",
2, table.currentSnapshot().sequenceNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to validate the table's last sequence number.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

txn1.commitTransaction();
txn2.commitTransaction();
txn3.commitTransaction();
txn4.commitTransaction();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table state resulting from each transaction needs to be validated, not just the final state.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.


table.newFastAppend().appendFile(FILE_C).commit();
Assert.assertEquals("Snapshot sequence number should match expected",
3, table.currentSnapshot().sequenceNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is a good test. But you do need to add validations for the rest of the metadata.

@chenjunjiedada
Copy link
Collaborator Author

Interesting, it didn't trigger the CI.

@chenjunjiedada chenjunjiedada force-pushed the add-sequence-number-for-manifest branch from cafa905 to 798d95d Compare June 1, 2020 03:42
@chenjunjiedada
Copy link
Collaborator Author

@rdblue, I 'm going to complement the unit tests in some following PRs for sequence number as you mentioned in sync-up meeting minutes. Do you want the assertions to follow this as well?

All the metadata for every change should be validated:
Each operation that produces a sequence number should assert the number of changed manifests
Each changed manifest should have its entries validated, including file status, snapshot id, and sequence number
Each operation should validate the lastSequenceNumber in table metadata
Operations that do not produce a sequence number should validate that the sequence number did not change using the last sequence number in table metadata.

Are assertions in the latest change of this commit too much?

@rdblue
Copy link
Contributor

rdblue commented Jun 2, 2020

@chenjunjiedada, the tests are looking good. I wouldn't say that there are too many asserts, although I don't think you need to validate both the files in the snapshot and each manifest separately.

The main thing is to make sure any assumption you're making is validated. For example, if you validate the contents of snapshot.manifests().get(0) then you also need to check that there is only one manifest, or that you're checking the manifest you expect.

Also, many of these tests need to check other conditions in addition to the sequence numbers -- that's why it's important to add sequence numbers to the existing tests instead of adding a few new tests. For example, the tests for MergeAppend need to make assertions about what manifests were merged and the contents of each manifest, in addition to the sequence numbers of the manifests and every entry in the manifest (because of the merge).

I'm going to go ahead and merge this into master since it looks like there are a lot of checks. Then we can remove these tests when the operation test suites are updated to cover the same cases.

Thanks for working on this, it's a big help to have someone looking into this area and adding tests for sequence numbers.

@rdblue rdblue merged commit e1399fd into apache:master Jun 2, 2020
@chenjunjiedada
Copy link
Collaborator Author

@rdblue , Thanks for the guidance! Will follow them in the following PRs.

@chenjunjiedada chenjunjiedada deleted the add-sequence-number-for-manifest branch June 4, 2020 12:06
cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants