@aokolnychyi
Contributor

This PR moves our snapshot and migrate actions to use the new API.

}

@Override
public SnapshotTable tableLocation(String location) {
Contributor Author

This check will have to be refined separately.

public CreateAction withProperties(Map<String, String> properties) {
this.additionalProperties.putAll(properties);
return this;
protected void setDestCatalogAndIdent(CatalogPlugin catalog, Identifier ident) {
Contributor Author

Called from as.

Member

We let you use "as" with Migration? Maybe I'm forgetting how this worked.

Member

I think this was needed before because Snapshot and Migrate descended from the CreateTable Actions. Now I don't think you need the dest catalog at all.

Contributor Author

This is still the parent class where we need to reference the dest catalog. However, I think your idea makes sense. We can probably remove the dest fields from the parent class as they are initialized differently now. Then we don't have to do that weird if statement in the constructor.

I've pushed an update. Let me know if you prefer the old approach.
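
A minimal sketch of the idea discussed in this thread, not the code in this PR: the parent class keeps only the source fields, and each subclass decides where the destination comes from. All class names are hypothetical, plain strings stand in for Spark's `CatalogPlugin` and `Identifier`, and the assumption that a migrate writes back to the source identifier is mine.

```java
// Hypothetical sketch: names are illustrative and Strings stand in for
// Spark's CatalogPlugin and Identifier types.
abstract class TableCreationActionSketch {
    private final String sourceCatalog;
    private final String sourceTableIdent;

    TableCreationActionSketch(String sourceCatalog, String sourceTableIdent) {
        this.sourceCatalog = sourceCatalog;
        this.sourceTableIdent = sourceTableIdent;
    }

    String sourceCatalog() {
        return sourceCatalog;
    }

    String sourceTableIdent() {
        return sourceTableIdent;
    }

    // No dest fields in the parent: subclasses supply the destination.
    abstract String destCatalog();

    abstract String destTableIdent();
}

// Assumption: a migrate replaces the source table in place.
final class MigrateActionSketch extends TableCreationActionSketch {
    MigrateActionSketch(String sourceCatalog, String sourceTableIdent) {
        super(sourceCatalog, sourceTableIdent);
    }

    @Override
    String destCatalog() {
        return sourceCatalog();
    }

    @Override
    String destTableIdent() {
        return sourceTableIdent();
    }
}

// A snapshot learns its destination later, through as(...).
final class SnapshotActionSketch extends TableCreationActionSketch {
    private String destCatalog;
    private String destTableIdent;

    SnapshotActionSketch(String sourceCatalog, String sourceTableIdent) {
        super(sourceCatalog, sourceTableIdent);
    }

    SnapshotActionSketch as(String catalog, String ident) {
        this.destCatalog = catalog;
        this.destTableIdent = ident;
        return this;
    }

    @Override
    String destCatalog() {
        if (destCatalog == null) {
            throw new IllegalStateException("Destination not set; call as(...) first");
        }
        return destCatalog;
    }

    @Override
    String destTableIdent() {
        if (destTableIdent == null) {
            throw new IllegalStateException("Destination not set; call as(...) first");
        }
        return destTableIdent;
    }
}
```

With this shape the parent constructor no longer needs a conditional to decide whether destination arguments were supplied.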

@aokolnychyi
Contributor Author

cc @RussellSpitzer


@Override
public long importedDataFilesCount() {
return importedDataFilesCount;
Member

Not for this PR, but now that I think about it, we should probably also let the user know how many metadata files were created.

Contributor Author

Good idea! Now that we have the result interface, we can evolve it. Could you create an issue for this, @RussellSpitzer?

@flyrain, would you be interested in picking it up?
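
One way a result interface could evolve without breaking existing implementations is a default method. The sketch below is only an illustration of that pattern: the interface name, `createdMetadataFilesCount()`, and the `-1` "not reported" convention are all hypothetical and are not part of the API in this PR.

```java
// Hypothetical sketch: none of these names are the API from this PR.
interface SnapshotResultSketch {
    long importedDataFilesCount();

    // Hypothetical addition. A default method keeps older implementations
    // compiling; -1 is an assumed convention meaning "not reported".
    default long createdMetadataFilesCount() {
        return -1L;
    }
}

// An implementation written before the new method existed.
final class LegacyResultSketch implements SnapshotResultSketch {
    private final long importedDataFilesCount;

    LegacyResultSketch(long importedDataFilesCount) {
        this.importedDataFilesCount = importedDataFilesCount;
    }

    @Override
    public long importedDataFilesCount() {
        return importedDataFilesCount;
    }
}

// An implementation that also reports metadata files.
final class DetailedResultSketch implements SnapshotResultSketch {
    private final long importedDataFilesCount;
    private final long createdMetadataFilesCount;

    DetailedResultSketch(long importedDataFilesCount, long createdMetadataFilesCount) {
        this.importedDataFilesCount = importedDataFilesCount;
        this.createdMetadataFilesCount = createdMetadataFilesCount;
    }

    @Override
    public long importedDataFilesCount() {
        return importedDataFilesCount;
    }

    @Override
    public long createdMetadataFilesCount() {
        return createdMetadataFilesCount;
    }
}
```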

Member

@aokolnychyi
Contributor Author

I see more and more Tez-related failures on recent PRs.

org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerWithEngine > testCBOWithSelectedColumnsOverlapJoin[fileFormat=AVRO, engine=tez, catalog=HIVE_CATALOG] FAILED
    java.lang.IllegalArgumentException: Failed to execute Hive query 'SELECT c.first_name, o.order_id FROM default.orders o JOIN default.customers c ON o.customer_id = c.customer_id ORDER BY o.order_id DESC': Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
        at org.apache.iceberg.mr.hive.TestHiveShell.executeStatement(TestHiveShell.java:151)
        at org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerWithEngine.testCBOWithSelectedColumnsOverlapJoin(TestHiveIcebergStorageHandlerWithEngine.java:217)

cc @pvary @marton-bod

@pvary
Contributor

pvary commented Apr 8, 2021

> I see more and more Tez-related failures on recent PRs.

Could you please provide a link for any of the failed test runs? I would like to get the logs for the failed tests.

Thanks,
Peter

@aokolnychyi
Contributor Author

@pvary
Contributor

pvary commented Apr 9, 2021

> @pvary, here is one run
>
> https://github.com/apache/iceberg/pull/2362/checks?check_run_id=2290454777

I see this at the end of the logs:

2021-04-08T01:04:25.1453911Z ##[error]The operation was canceled.

Was it cancelled manually, or was it cancelled automatically because of the failures?

I am asking because some time ago (in #1789) I introduced a feature to collect the test logs in exactly this kind of situation (flaky test failures on CI). It creates a test logs artifact for failed runs. That artifact is not available for this run.

If the run was cancelled automatically then I have to check what changed around the build process. OTOH if the run was cancelled manually then I need to find a non-cancelled run with the same failures.

Thanks,
Peter

@aokolnychyi
Contributor Author

Hmm, I think your analysis is right, @pvary. It looks like we are hitting timeouts, though. The latest CI job on this PR, for example, failed with the timeout of 360 minutes.

@aokolnychyi
Contributor Author

I'll try to see what causes the timeout tomorrow.

@aokolnychyi
Contributor Author

Looks like the job I pointed to also failed with a timeout.

@pvary
Contributor

pvary commented Apr 9, 2021

> Looks like the job I pointed to also failed with a timeout.

Maybe this is related to resource problems when trying to create the Tez session?

When we create a new TezAM, we request a new container from YARN. Maybe the issue is that we do not get the new YARN container (because of missing resources) and we keep waiting until the timeout is reached. It might be a good idea to add @Timeout to the tests.
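
To illustrate what a per-test timeout buys here, the following is a framework-independent sketch using only `java.util.concurrent`: a call that may block (for example, waiting for a container) is given a bounded wait and fails fast instead of running into the CI job limit. The class and method names are hypothetical; this is not how a test framework's timeout annotation is implemented, and it is not code from this PR.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long a potentially blocking call may run.
final class BoundedWaitSketch {
    private BoundedWaitSketch() {
    }

    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            // Throws TimeoutException if the task has not finished in time.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the stuck task so it does not keep running.
            future.cancel(true);
            throw e;
        } finally {
            // Always release the worker thread.
            executor.shutdownNow();
        }
    }
}
```

A test that wraps its session setup this way fails after the chosen bound with a `TimeoutException`, which makes the stuck step visible in the test report.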

@aokolnychyi force-pushed the v2-snapshot-table-action branch from 54ec4bd to 90aacfb on April 9, 2021 20:14
@aokolnychyi
Contributor Author

Rebased this one. @RussellSpitzer, could you take a look?


Spark3CreateAction(SparkSession spark, CatalogPlugin sourceCatalog, Identifier sourceTableIdent,
CatalogPlugin destCatalog, Identifier destTableIdent) {
BaseTableMigrationSparkAction(SparkSession spark, CatalogPlugin sourceCatalog, Identifier sourceTableIdent) {
Member

I liked having a name here that is neither "migrate" nor "snapshot", because if an error occurs during a snapshot I don't want the stack trace to make it look like a migrate was running.

Member

@RussellSpitzer left a comment

This makes sense to me. My only real issue is that the naming is a little confusing.

We have
BaseTableMigrationSparkAction
| - BaseMigrateTableSparkAction
| - BaseSparkTableSnapshot Action

I think the base class here should be "Create" or something more visually distinct because currently we have TableMigration and MigrateTable, and as I noted I think we should keep our Snapshot traces clear of any references to migrating.

That said I wouldn't hold this back based on naming, so feel free to ignore.

I still think BaseTableCreationSparkAction is fine :)

@aokolnychyi
Contributor Author

Agree, @RussellSpitzer. Let me fix it.

@aokolnychyi aokolnychyi merged commit 6eaabb5 into apache:master Apr 13, 2021
@aokolnychyi
Contributor Author

Thanks for reviewing, @RussellSpitzer!

stevenzwu pushed a commit to stevenzwu/iceberg that referenced this pull request Jul 28, 2021