Skip to content

Data, Spark, Flink: Migrate TestAppenderFactory and subclasses to JUnit5#9862

Merged
nastra merged 1 commit intoapache:mainfrom
nk1506:junit5
Mar 11, 2024
Merged

Data, Spark, Flink: Migrate TestAppenderFactory and subclasses to JUnit5#9862
nastra merged 1 commit intoapache:mainfrom
nk1506:junit5

Conversation

@nk1506
Copy link
Contributor

@nk1506 nk1506 commented Mar 4, 2024

Issue #9085

In this patch I planned to replace TestBase with TableTestBase for following base classes.

  1. TestBaseTaskWriter
  2. TestAppenderFactory
  3. TestTaskEqualityDeltaWriter

}

@Before
public TableTestBase() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because there are so many subclasses of TableTestBase, we decided to introduce TestBase, which is a JUnit5 version of TableTestBase. That way, we don't have to adjust all subclasses in one go and can therefore split up the required changes into smaller chunks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my understanding, We won't be able to get rid of TableTestBase . TestBase has it's own implementation of ParameterizedTestExtension. Many of the subclass of TableTestBase has different flavour of ParameterizedTestExtension. I will plan to use TestBase wherever it fits. I will also add a javadoc for both class which one should be used when. Meanwhile , Could you please take a highlevel look for the Temp directory or file initialization? Also I will remove all the classes on which @tomtongue was working.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nk1506 TestBase is what needs to be used and it uses the ParameterizedTestExtension that is compatible with JUnit5. The ParameterizedTestExtension used by TableTestBase and its subclasses only works with JUnit4. If you look for example at #9790, you'll see how parameterized JUnit4 tests need to be migrated to parameterized JUnit5 tests.

That being said, every subclass of TableTestBase that is migrated to JUnit5 needs to now extend TestBase and use org.apache.iceberg.ParameterizedTestExtension

Copy link
Contributor Author

@nk1506 nk1506 Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra what will you recommend for class like TestAppenderFactory which has different set of parameters.

@Parameterized.Parameters(name = "FileFormat={0}, Partitioned={1}")
  public static Object[] parameters() {
    return new Object[][] {
      new Object[] {"avro", false},
      new Object[] {"avro", true},
      new Object[] {"orc", false},
      new Object[] {"orc", true},
      new Object[] {"parquet", false},
      new Object[] {"parquet", true}
    };
  }

If this class will extend TestBase instead of TableTestBase there will be a conflict on parameters definitions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be handled similar to how it's done for TestFlinkMetaDataTable

package org.apache.iceberg;

import static org.apache.iceberg.types.Types.NestedField.required;
import static org.assertj.core.api.AssertionsForClassTypes.assertThat;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong import


import static org.apache.iceberg.PartitionSpec.unpartitioned;
import static org.apache.iceberg.types.Types.NestedField.required;
import static org.assertj.core.api.AssertionsForClassTypes.assertThat;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import static org.assertj.core.api.AssertionsForClassTypes.assertThat;
import static org.assertj.core.api.Assertions.assertThat;

import static org.apache.iceberg.PartitionSpec.unpartitioned;
import static org.apache.iceberg.types.Types.NestedField.required;
import static org.assertj.core.api.AssertionsForClassTypes.assertThat;
import static org.assertj.core.api.AssertionsForClassTypes.assertThatThrownBy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import static org.assertj.core.api.AssertionsForClassTypes.assertThatThrownBy;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

import org.junit.jupiter.api.extension.ExtendWith;

@RunWith(Parameterized.class)
@ExtendWith(ParameterizedTestExtension.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to use ParameterizedTestExtension from the iceberg package. Same for Parameter / Parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am using the same, It might nor be showing with the import . since it's part of same package

@nk1506 nk1506 force-pushed the junit5 branch 3 times, most recently from 310ccdc to 35a0ddf Compare March 6, 2024 08:33
@nk1506 nk1506 marked this pull request as ready for review March 6, 2024 08:53
@nastra
Copy link
Contributor

nastra commented Mar 6, 2024

@nk1506 can you please split this PR up into smaller and more manageble pieces so that it's easier to review the changes? I would suggest to either focus on a subset of tests in iceberg-core or on iceberg-data, but having changes across multiple Gradle modules that aren't related makes it quite difficult to review this PR


List<FlinkInputSplit> splits = generateSplits();
Assert.assertEquals("Should have 10 splits", 10, splits.size());
assertThat(10).as("Should have 10 splits").isEqualTo(splits.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the actual/expected is the wrong way here. This should be assertThat(splits).hasSize(10).

@nk1506 nk1506 changed the title Core, Spark, Flink: Migrate tests that depend on TableTestBase to JUnit5 Data, Spark, Flink: Migrate tests that depend on TableTestBase to JUnit5 Mar 6, 2024
@nk1506
Copy link
Contributor Author

nk1506 commented Mar 7, 2024

I would suggest to either focus on a subset of tests in iceberg-core or on iceberg-data

@nastra , Most of the changes are coming here because of change in WriterTestBase and impacted subclass . Should I create another PR only with WriterTestBase and it's subclass changes?

@nastra
Copy link
Contributor

nastra commented Mar 7, 2024

@nk1506 if changes to WriterTestBase and its subclasses create so many changes, then yes, it's better to isolate this and handle it in a separate PR as otherwise it's difficult to review a PR with diffs to 40+ files

WriteResult result = writer.complete();
Assert.assertEquals(0, result.dataFiles().length);
Assert.assertEquals(0, result.deleteFiles().length);
assertThat(result.dataFiles().length).isEqualTo(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertThat(result.dataFiles().length).isEqualTo(0);
assertThat(result.dataFiles()).isEmpty();

or assertThat(result.dataFiles()).hasSize(0);

please also update other places in this PR accordingly


for (Path path : files) {
Assert.assertFalse(Files.exists(path));
assertThat(Files.exists(path)).isFalse();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertThat(Files.exists(path)).isFalse();
assertThat(path).doesNotExist();

new Object[] {"parquet", true}
};
}
@Parameter protected FileFormat format;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have index = 1 and partitioned should have index = 2. The formatVersion is a parameterized field in TestBase with ìndex = 0, so it needs to be part of parameters()` (similar to how it's done in https://github.com/apache/iceberg/blob/main/data/src/test/java/org/apache/iceberg/io/TestGenericSortedPosDeleteWriter.java#L65-L70)

super(FORMAT_V2);
this.format = FileFormat.fromString(fileFormat);
this.partitioned = partitioned;
@Parameters(name = "FileFormat={0}, partitioned={1}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Parameters(name = "FileFormat={0}, partitioned={1}")
@Parameters(name = "FormatVersion={0}, FileFormat={1}, partitioned={2}")

this.format = FileFormat.fromString(fileFormat);
this.partitioned = partitioned;
@Parameters(name = "FileFormat={0}, partitioned={1}")
public static List<Object> parameters() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static List<Object> parameters() {
protected static List<Object> parameters() {

this.tableDir = Files.createTempDirectory(temp, "junit").toFile();
assertThat(tableDir.delete()).isTrue(); // created by table create

this.formatVersion = FORMAT_V2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatVersion is a parameterized field and needs to be set in parameters() instead of here

@nastra nastra changed the title Data, Spark, Flink: Migrate tests that depend on TableTestBase to JUnit5 Data, Spark, Flink: Migrate TestAppenderFactory and subclasses to JUnit5 Mar 8, 2024
this.tableDir = Files.createTempDirectory(temp, "junit").toFile();
assertThat(tableDir.delete()).isTrue(); // created by table create

this.formatVersion = FORMAT_V2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as I mentioned in the other class. This should be part of parameters()

super(FORMAT_V2);
this.format = FileFormat.fromString(fileFormat);
@Parameters(name = "FileFormat = {0}")
public static List<Object> parameters() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be protected

Assert.assertEquals(
ImmutableList.of(posRecord.copy("file_path", dataFile.path(), "pos", 0L)),
readRecordsAsList(posDeleteSchema, posDeleteFile.path()));
assertThat(FileContent.POSITION_DELETES).isEqualTo(posDeleteFile.content());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actual/expected is the wrong way here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants