Skip to content

Migrate Write sub-classes in spark-extensions to JUnit5 and AssertJ style#9670

Merged
nastra merged 11 commits intoapache:mainfrom
tomtongue:mig-junit5-write-extensions
Feb 20, 2024
Merged

Migrate Write sub-classes in spark-extensions to JUnit5 and AssertJ style#9670
nastra merged 11 commits intoapache:mainfrom
tomtongue:mig-junit5-write-extensions

Conversation

@tomtongue
Copy link
Contributor

@tomtongue tomtongue commented Feb 7, 2024

Migrate the following "Write" sub-classes in spark-extensions to JUnit 5 and AssertJ style along with #9613, and #9086.

Current progress

  • SparkRowLevelOperationsTestBase (extends ExtensionsTestBase)
  • TestDelete (extends SparkRowLevelOperationsTestBase)
    • TestCopyOnWriteDelete
    • TestMergeOnReadDelete
  • TestMerge (extends SparkRowLevelOperationsTestBase)
    • TestCopyOnWriteMerge
    • TestMergeOnReadMerge
  • TestUpdate (extends SparkRowLevelOperationsTestBase)
    • TestMergeOnReadUpdate
    • TestCopyOnWriteUpdate
  • TestConflictValidation
  • TestWriteAborts

@github-actions github-actions bot added the spark label Feb 7, 2024
@tomtongue tomtongue force-pushed the mig-junit5-write-extensions branch from 8b6abaf to e069cee Compare February 7, 2024 06:47
@tomtongue tomtongue changed the title [WIP] Migrate Write sub-classes in spark-extensions to JUnit5 and AsserJ style [WIP] Migrate Write sub-classes in spark-extensions to JUnit5 and AssertJ style Feb 7, 2024
}

@Rule public TemporaryFolder temp = new TemporaryFolder();
@TempDir private Path temp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CatalogTestBase already has temp, so this shouldn't be required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review. Will remove this in the next commit.

this.planningMode = planningMode;
}
@Parameter(index = 3)
protected String fileFormat;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
protected String fileFormat;
protected FileFormat fileFormat;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Let me do this.

@TestTemplate
public void testDeleteFileThenMetadataDelete() throws Exception {
Assume.assumeFalse("Avro does not support metadata delete", fileFormat.equals("avro"));
assumeThat(fileFormat.equalsIgnoreCase("avro"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once fileFormat is of type FileFormat, this check can be changed to assumeThat(fileFormat).as(..).isNotEqualTo(FileFormat.AVRO)

List<DataFile> dataFilesAfter = TestHelpers.dataFiles(table, branch);
Assert.assertTrue(
"Data file should have been removed", dataFilesBefore.size() > dataFilesAfter.size());
assertThat(dataFilesBefore.size() > dataFilesAfter.size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use .hasSizeGreaterThan() here

@TestTemplate
public void testDeleteFromEmptyTable() {
Assume.assumeFalse("Custom branch does not exist for empty table", "test".equals(branch));
assumeThat("test".equals(branch)).as("Custom branch does not exist for empty table").isFalse();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assumeThat("test".equals(branch)).as("Custom branch does not exist for empty table").isFalse();
assumeThat(branch).as("Custom branch does not exist for empty table").isNotEqualTo("test");

@TestTemplate
public void testDeleteFromNonExistingCustomBranch() {
Assume.assumeTrue("Test only applicable to custom branch", "test".equals(branch));
assumeThat("test".equals(branch)).as("Test only applicable to custom branch").isTrue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@TestTemplate
public void testDeleteWithMultipleRowGroupsParquet() throws NoSuchTableException {
Assume.assumeTrue(fileFormat.equalsIgnoreCase("parquet"));
assumeThat(fileFormat.equalsIgnoreCase("parquet")).isTrue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assumeThat(fileFormat.equalsIgnoreCase("parquet")).isTrue();
assumeThat(fileFormat).isEqualTo(FileFormat.PARQUET);

@TestTemplate
public void testDeleteOnNonIcebergTableNotSupported() {
Assume.assumeTrue(catalogName.equalsIgnoreCase("spark_catalog"));
assumeThat(catalogName.equalsIgnoreCase("spark_catalog")).isTrue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

public synchronized void testDeleteWithSerializableIsolation() throws InterruptedException {
// cannot run tests with concurrency for Hadoop tables without atomic renames
Assume.assumeFalse(catalogName.equalsIgnoreCase("testhadoop"));
assumeThat(catalogName.equalsIgnoreCase("testhadoop")).isFalse();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also update all the other places in this PR

@TestTemplate
public void testDeleteToWapBranch() throws NoSuchTableException {
Assume.assumeTrue("WAP branch only works for table identifier without branch", branch == null);
assumeThat(branch == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assumeThat(branch == null)
assumeThat(branch).as(..).isNull();

@TestTemplate
public void testDeleteToWapBranchWithTableBranchIdentifier() throws NoSuchTableException {
Assume.assumeTrue("Test must have branch name part in table identifier", branch != null);
assumeThat(branch != null).as("Test must have branch name part in table identifier").isTrue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use .isNotNull()

@TestTemplate
public void testUpdateEmptyTable() {
Assume.assumeFalse("Custom branch does not exist for empty table", "test".equals(branch));
assumeThat("test".equals(branch)).as("Custom branch does not exist for empty table").isFalse();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previously mentioned. We want to avoid using isFalse / isTrue as much as possible, because it won't print enough context when the assertion/assumption ever fails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thank you!

@tomtongue tomtongue force-pushed the mig-junit5-write-extensions branch from cd62c06 to d3460dc Compare February 13, 2024 12:01
@tomtongue
Copy link
Contributor Author

tomtongue commented Feb 13, 2024

Working on the fix of errors.
When I tested them on local, SnapshotIsolation tests in TestMerge/TestUpdate were sometimes successful, but sometimes failure.

for (int numOperations = 0; numOperations < 20; numOperations++) {
while (shouldAppend.get() && barrier.get() < numOperations * 2) {
sleep(10);
sleep(200);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this being flaky and requires a bump in wait time? We might want to update this to use Awaitility in a follow-up PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! yes, the test seems to be flaky. After the updates, the test sometimes failed and sometimes succeeded. I was investigating the issue, but let me update the part with Awaitlity.


appendFiles.commit();
sleep(10);
sleep(1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to update this to Awaitility in a follow-up PR


appendFiles.commit();
sleep(10);
sleep(1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previously mentioned about using Awaitility here

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomtongue the changes LGTM, I just left a few questions. Did you plan to do anything else because the title still contains WIP?

@tomtongue
Copy link
Contributor Author

tomtongue commented Feb 15, 2024

Thanks for the review @nastra . No, there's one thing that just I update snapshotIsolation tests by adding Awaitility. I'm working on implementing the Awaitility to this PR. (If it should be included in another PR, please let me know).

@tomtongue tomtongue changed the title [WIP] Migrate Write sub-classes in spark-extensions to JUnit5 and AssertJ style Migrate Write sub-classes in spark-extensions to JUnit5 and AssertJ style Feb 15, 2024
@github-actions github-actions bot added the build label Feb 18, 2024

appendFiles.commit();
sleep(10);
Awaitility.await().pollInterval(Duration.ofMillis(10)).until(() -> true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this makes a lot of sense to replace this sleep with Awaitility. I wonder whether this sleep here is even necessary

Copy link
Contributor Author

@tomtongue tomtongue Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me revert this to sleep. Currently the snapshotIsolation tests for TestDelete, TestMerge and TestUpdate are still flaky. From my current investigation, the test failure seems to be related to the appendFiles with other write operations. And the sleep seems to work for waiting for the appendFiles.commit().

Copy link
Contributor Author

@tomtongue tomtongue Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got the reason of the test failure. The reason seems to be the conflicts of the appended file names. When I tested using a randomized name for each appendFile, all the tests succeeded (the previous version sets the randomized names). And in this case, as you said, sleep is no longer needed. Let me fix the file name configuration.

protected DataFile writeDataFile(Table table, List<GenericRecord> records) {
try {
OutputFile file = Files.localOutput(temp.newFile());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary newline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. Will remove it.

}
int currentNumOperations = numOperations;
Awaitility.await()
.pollInterval(Duration.ofMillis(10))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of these should have .atMost(5, TimeUnit.SECONDS) because we don't want to wait indefinitely for the condition to happen

Copy link
Contributor Author

@tomtongue tomtongue Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I will add atMost to all of the tests. Updated.

@tomtongue
Copy link
Contributor Author

@nastra Reflected on your comments. If there's any part that I should change, please let me know.

@nastra nastra merged commit c4d827e into apache:main Feb 20, 2024
@tomtongue
Copy link
Contributor Author

Thank you!

@nastra
Copy link
Contributor

nastra commented Feb 20, 2024

thanks for getting this done @tomtongue. Looking forward to other PRs from you for the remaining files in spark-extensions

@tomtongue
Copy link
Contributor Author

Of course, I'm working on migrating procedures and other tests. Will create 2 or 3 PRs later👍

@tomtongue tomtongue deleted the mig-junit5-write-extensions branch February 20, 2024 11:48
bitsondatadev pushed a commit to bitsondatadev/iceberg that referenced this pull request Mar 3, 2024
devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants