Spark: Positional deletes creates partitioned path on unpartitioned tables #7685

Merged (4 commits, Jun 6, 2023)

Conversation

@Fokko (Contributor) commented May 22, 2023

I was doing some work on the Python side:

#6775

But I ran into an issue while creating integration tests for the positional deletes: the delete file paths ended up with double slashes:

s3://warehouse/default/test_positional_mor_deletes/data//00000-32-70be11f7-3c4b-40e0-b35a-334e97ef6554-00001-deletes.parquet

It looks like the partition struct is non-null even though the table is unpartitioned, so the writer tries to build a partitioned path. With the empty struct we end up with a double slash (//) in the path, which MinIO doesn't accept.
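For illustration, a minimal, hypothetical stand-alone sketch (not the Iceberg code itself; only the format pattern mirrors the default location provider shown further below) of how an empty partition path turns into the double slash:

  // Hypothetical demo values; an unpartitioned/empty partition struct yields an
  // empty partition path, and "<dataLocation>/<partitionPath>/<filename>" then
  // contains two adjacent slashes.
  public class DoubleSlashDemo {
    public static void main(String[] args) {
      String dataLocation = "s3://warehouse/default/test_positional_mor_deletes/data";
      String partitionPath = ""; // what an empty partition struct turns into
      String filename = "00000-32-70be11f7-3c4b-40e0-b35a-334e97ef6554-00001-deletes.parquet";
      // prints ...data//00000-...-deletes.parquet, i.e. the double slash
      System.out.println(String.format("%s/%s/%s", dataLocation, partitionPath, filename));
    }
  }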

OutputFileFactory.java

  public EncryptedOutputFile newOutputFile(PartitionSpec spec, StructLike partition) { // partition is a StructCopy
    String newDataLocation = locations.newDataLocation(spec, partition, generateFilename());
    OutputFile rawOutputFile = io.newOutputFile(newDataLocation);
    return encryptionManager.encrypt(rawOutputFile);
  }

ClusteredWriter.java

      // copy the partition key as the key object may be reused
      this.currentPartition = StructCopy.copy(partition);  // partition is a StructProjection
      this.currentWriter = newWriter(currentSpec, currentPartition);

I still have to dig into why there is a StructProjection.

Resolves #7678

Can be reproduced by running the following:

CREATE OR REPLACE TABLE default.test_positional_mor_deletes (
    number integer,
    letter string
)
USING iceberg
TBLPROPERTIES (
    'write.delete.mode'='merge-on-read',
    'write.update.mode'='merge-on-read',
    'write.merge.mode'='merge-on-read',
    'format-version'='2'
);


INSERT INTO default.test_positional_mor_deletes
VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c'),
    (4, 'd'),
    (5, 'e'),
    (6, 'f'),
    (7, 'g'),
    (8, 'h'),
    (9, 'i'),
    (10, 'j'),
    (11, 'k'),
    (12, 'l');


DELETE FROM default.test_positional_mor_deletes WHERE number % 2 = 0;

@github-actions github-actions bot added the core label May 22, 2023
@Fokko (Contributor Author) commented May 22, 2023

Full stack trace:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Time taken: 6.282 seconds
23/05/22 18:05:13 WARN OptimizeMetadataOnlyDeleteFromIcebergTable: Cannot translate expression to source filter: ((number#8 % 2) = 0)
23/05/22 18:05:14 ERROR Utils: Aborting task
software.amazon.awssdk.services.s3.model.S3Exception: Object name contains unsupported characters. (Service: S3, Status Code: 400, Request ID: 176199AF6FDBE35D, Extended Request ID: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95)
	at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
	at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9321)
	at org.apache.iceberg.aws.s3.S3OutputStream.completeUploads(S3OutputStream.java:435)
	at org.apache.iceberg.aws.s3.S3OutputStream.close(S3OutputStream.java:269)
	at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:38)
	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1204)
	at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:255)
	at org.apache.iceberg.deletes.PositionDeleteWriter.close(PositionDeleteWriter.java:76)
	at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:122)
	at org.apache.iceberg.io.RollingFileWriter.close(RollingFileWriter.java:147)
	at org.apache.iceberg.io.RollingPositionDeleteWriter.close(RollingPositionDeleteWriter.java:35)
	at org.apache.iceberg.io.ClusteredWriter.closeCurrentWriter(ClusteredWriter.java:119)
	at org.apache.iceberg.io.ClusteredWriter.close(ClusteredWriter.java:111)
	at org.apache.iceberg.io.ClusteredPositionDeleteWriter.close(ClusteredPositionDeleteWriter.java:34)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.close(SparkPositionDeltaWrite.java:477)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.commit(SparkPositionDeltaWrite.java:460)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteDeltaExec.scala:176)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteDeltaExec.scala:203)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteDeltaExec.scala:142)
	at org.apache.spark.sql.execution.datasources.v2.DeltaWithMetadataWritingSparkTask.run(WriteDeltaExec.scala:208)
	at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.$anonfun$writeWithV2$2(WriteDeltaExec.scala:101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
23/05/22 18:05:14 ERROR DeltaWithMetadataWritingSparkTask: Aborting commit for partition 191 (task 11, attempt 0, stage 3.0)
23/05/22 18:05:14 WARN S3FileIO: Failed to delete object at path s3://warehouse/default/test_positional_mor_deletes/data//00191-11-ddb8bffc-85e8-4ace-b7f0-ec261db6f663-00001-deletes.parquet
23/05/22 18:05:14 WARN SparkCleanupUtil: Deleted only 0 of 1 file(s) using bulk deletes (partition 191 (task 11, attempt 0, stage 3.0))
23/05/22 18:05:14 ERROR DeltaWithMetadataWritingSparkTask: Aborted commit for partition 191 (task 11, attempt 0, stage 3.0)
23/05/22 18:05:14 ERROR Executor: Exception in task 191.0 in stage 3.0 (TID 11)
software.amazon.awssdk.services.s3.model.S3Exception: Object name contains unsupported characters. (Service: S3, Status Code: 400, Request ID: 176199AF6FDBE35D, Extended Request ID: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95)
	at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
	at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9321)
	at org.apache.iceberg.aws.s3.S3OutputStream.completeUploads(S3OutputStream.java:435)
	at org.apache.iceberg.aws.s3.S3OutputStream.close(S3OutputStream.java:269)
	at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:38)
	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1204)
	at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:255)
	at org.apache.iceberg.deletes.PositionDeleteWriter.close(PositionDeleteWriter.java:76)
	at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:122)
	at org.apache.iceberg.io.RollingFileWriter.close(RollingFileWriter.java:147)
	at org.apache.iceberg.io.RollingPositionDeleteWriter.close(RollingPositionDeleteWriter.java:35)
	at org.apache.iceberg.io.ClusteredWriter.closeCurrentWriter(ClusteredWriter.java:119)
	at org.apache.iceberg.io.ClusteredWriter.close(ClusteredWriter.java:111)
	at org.apache.iceberg.io.ClusteredPositionDeleteWriter.close(ClusteredPositionDeleteWriter.java:34)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.close(SparkPositionDeltaWrite.java:477)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.commit(SparkPositionDeltaWrite.java:460)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteDeltaExec.scala:176)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteDeltaExec.scala:203)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteDeltaExec.scala:142)
	at org.apache.spark.sql.execution.datasources.v2.DeltaWithMetadataWritingSparkTask.run(WriteDeltaExec.scala:208)
	at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.$anonfun$writeWithV2$2(WriteDeltaExec.scala:101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
23/05/22 18:05:14 WARN TaskSetManager: Lost task 191.0 in stage 3.0 (TID 11) (10.201.34.222 executor driver): software.amazon.awssdk.services.s3.model.S3Exception: Object name contains unsupported characters. (Service: S3, Status Code: 400, Request ID: 176199AF6FDBE35D, Extended Request ID: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95)
	at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
	at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9321)
	at org.apache.iceberg.aws.s3.S3OutputStream.completeUploads(S3OutputStream.java:435)
	at org.apache.iceberg.aws.s3.S3OutputStream.close(S3OutputStream.java:269)
	at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:38)
	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1204)
	at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:255)
	at org.apache.iceberg.deletes.PositionDeleteWriter.close(PositionDeleteWriter.java:76)
	at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:122)
	at org.apache.iceberg.io.RollingFileWriter.close(RollingFileWriter.java:147)
	at org.apache.iceberg.io.RollingPositionDeleteWriter.close(RollingPositionDeleteWriter.java:35)
	at org.apache.iceberg.io.ClusteredWriter.closeCurrentWriter(ClusteredWriter.java:119)
	at org.apache.iceberg.io.ClusteredWriter.close(ClusteredWriter.java:111)
	at org.apache.iceberg.io.ClusteredPositionDeleteWriter.close(ClusteredPositionDeleteWriter.java:34)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.close(SparkPositionDeltaWrite.java:477)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.commit(SparkPositionDeltaWrite.java:460)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteDeltaExec.scala:176)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteDeltaExec.scala:203)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteDeltaExec.scala:142)
	at org.apache.spark.sql.execution.datasources.v2.DeltaWithMetadataWritingSparkTask.run(WriteDeltaExec.scala:208)
	at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.$anonfun$writeWithV2$2(WriteDeltaExec.scala:101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

23/05/22 18:05:14 ERROR TaskSetManager: Task 191 in stage 3.0 failed 1 times; aborting job
23/05/22 18:05:14 ERROR WriteDeltaExec: Data source write support org.apache.iceberg.spark.source.SparkPositionDeltaWrite$PositionDeltaBatchWrite@425b142e is aborting.
23/05/22 18:05:14 ERROR WriteDeltaExec: Data source write support org.apache.iceberg.spark.source.SparkPositionDeltaWrite$PositionDeltaBatchWrite@425b142e aborted.
org.apache.spark.SparkException: Writing job aborted
	at org.apache.spark.sql.errors.QueryExecutionErrors$.writingJobAbortedError(QueryExecutionErrors.scala:767)
	at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.writeWithV2(WriteDeltaExec.scala:129)
	at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.writeWithV2$(WriteDeltaExec.scala:72)
	at org.apache.spark.sql.execution.datasources.v2.WriteDeltaExec.writeWithV2(WriteDeltaExec.scala:50)
	at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:332)
	at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:331)
	at org.apache.spark.sql.execution.datasources.v2.WriteDeltaExec.run(WriteDeltaExec.scala:50)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:212)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 191 in stage 3.0 failed 1 times, most recent failure: Lost task 191.0 in stage 3.0 (TID 11) (10.201.34.222 executor driver): software.amazon.awssdk.services.s3.model.S3Exception: Object name contains unsupported characters. (Service: S3, Status Code: 400, Request ID: 176199AF6FDBE35D, Extended Request ID: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95)
	at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
	at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9321)
	at org.apache.iceberg.aws.s3.S3OutputStream.completeUploads(S3OutputStream.java:435)
	at org.apache.iceberg.aws.s3.S3OutputStream.close(S3OutputStream.java:269)
	at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:38)
	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1204)
	at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:255)
	at org.apache.iceberg.deletes.PositionDeleteWriter.close(PositionDeleteWriter.java:76)
	at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:122)
	at org.apache.iceberg.io.RollingFileWriter.close(RollingFileWriter.java:147)
	at org.apache.iceberg.io.RollingPositionDeleteWriter.close(RollingPositionDeleteWriter.java:35)
	at org.apache.iceberg.io.ClusteredWriter.closeCurrentWriter(ClusteredWriter.java:119)
	at org.apache.iceberg.io.ClusteredWriter.close(ClusteredWriter.java:111)
	at org.apache.iceberg.io.ClusteredPositionDeleteWriter.close(ClusteredPositionDeleteWriter.java:34)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.close(SparkPositionDeltaWrite.java:477)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.commit(SparkPositionDeltaWrite.java:460)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteDeltaExec.scala:176)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteDeltaExec.scala:203)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteDeltaExec.scala:142)
	at org.apache.spark.sql.execution.datasources.v2.DeltaWithMetadataWritingSparkTask.run(WriteDeltaExec.scala:208)
	at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.$anonfun$writeWithV2$2(WriteDeltaExec.scala:101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)
	at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.writeWithV2(WriteDeltaExec.scala:98)
	... 64 more
Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Object name contains unsupported characters. (Service: S3, Status Code: 400, Request ID: 176199AF6FDBE35D, Extended Request ID: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95)
	at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
	at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9321)
	at org.apache.iceberg.aws.s3.S3OutputStream.completeUploads(S3OutputStream.java:435)
	at org.apache.iceberg.aws.s3.S3OutputStream.close(S3OutputStream.java:269)
	at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:38)
	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1204)
	at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:255)
	at org.apache.iceberg.deletes.PositionDeleteWriter.close(PositionDeleteWriter.java:76)
	at org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:122)
	at org.apache.iceberg.io.RollingFileWriter.close(RollingFileWriter.java:147)
	at org.apache.iceberg.io.RollingPositionDeleteWriter.close(RollingPositionDeleteWriter.java:35)
	at org.apache.iceberg.io.ClusteredWriter.closeCurrentWriter(ClusteredWriter.java:119)
	at org.apache.iceberg.io.ClusteredWriter.close(ClusteredWriter.java:111)
	at org.apache.iceberg.io.ClusteredPositionDeleteWriter.close(ClusteredPositionDeleteWriter.java:34)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.close(SparkPositionDeltaWrite.java:477)
	at org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.commit(SparkPositionDeltaWrite.java:460)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteDeltaExec.scala:176)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteDeltaExec.scala:203)
	at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteDeltaExec.scala:142)
	at org.apache.spark.sql.execution.datasources.v2.DeltaWithMetadataWritingSparkTask.run(WriteDeltaExec.scala:208)
	at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.$anonfun$writeWithV2$2(WriteDeltaExec.scala:101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

@Fokko Fokko requested a review from aokolnychyi May 22, 2023 23:55
@github-actions github-actions bot added the spark label May 22, 2023
@@ -435,7 +443,10 @@ public void delete(InternalRow metadata, InternalRow id) throws IOException {

      InternalRow partition = metadata.getStruct(partitionOrdinal, partitionRowWrapper.size());
      StructProjection partitionProjection = partitionProjections.get(specId);
      partitionProjection.wrap(partitionRowWrapper.wrap(partition));

      if (partitionProjection != null) {
Contributor

What about adding a separate interface for partition projections? That way, we can avoid an extra if branch in code that gets invoked in a tight loop. It would also cover the other places with the same issue: this PR updates DELETE commands, but we have a similar problem in UPDATE and MERGE, as those can write deletes against an unpartitioned spec.

interface PartitionProjection extends StructLike {

  static PartitionProjection create(PartitionSpec spec, StructType commonPartitionType) {
    ...
  }

  PartitionProjection wrap(StructLike newStruct);

  class UnpartitionedSpecProjection implements PartitionProjection {
    // wrap() always returns null
  }

  class PartitionedSpecProjection implements PartitionProjection {
    // delegates to StructProjection and uses spec.partitionType()
  }
}

The naming above is totally random.

Member

I thought @stevenzwu already made the change so that projecting a null struct just returns null, or am I remembering that wrong?

Contributor Author

Thanks @RussellSpitzer for jumping in. It looks like it is related to nested structs: #7507

I was able to reproduce this on master. @aokolnychyi I've updated the PR, let me know what you think.

@stevenzwu (Contributor) commented May 23, 2023

@RussellSpitzer PR #7507 limited its scope to nested null structs only. Originally I was also thinking about failing the wrap call if the root struct is null, to avoid the null check on the root struct object, but Ryan made a valid point that the get methods can be called before wrap is ever called, so we still need the null check. Ryan also thinks a null root struct value may be possible inside a collection, so we kept the old behavior of StructProjection wrapping a null root value.

Contributor Author

Thanks for the context @stevenzwu. I think this is different since it affects the partitions. cc @rdblue

@Fokko changed the title from "Spark: Positional deletes creates unpartitioned path" to "Spark: Positional deletes creates partitioned path on unpartitioned tables" on May 23, 2023

@Override
public PartitionProjection wrap(StructLike newStruct) {
  return null;
@stevenzwu (Contributor) commented May 23, 2023

I traced through the code; this seems to be the relevant fix.

The wrapped partition value flows into OutputFileFactory, which then calls the LocationProvider interface.

  public EncryptedOutputFile newOutputFile(PartitionSpec spec, StructLike partition) {
    String newDataLocation = locations.newDataLocation(spec, partition, generateFilename());
    OutputFile rawOutputFile = io.newOutputFile(newDataLocation);
    return encryptionManager.encrypt(rawOutputFile);
  }

The location provider implementation (e.g. the default one) would produce an empty partition path string for an unpartitioned table.

    @Override
    public String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename) {
      return String.format("%s/%s/%s", dataLocation, spec.partitionToPath(partitionData), filename);
    }

I am wondering if the fix should be done in OutputFileFactory: if the spec is unpartitioned, it should call the other newDataLocation method of the LocationProvider interface, the one that takes no partition.

  /**
   * Return a fully-qualified data file location for the given filename.
   *
   * @param filename a file name
   * @return a fully-qualified location URI for a data file
   */
  String newDataLocation(String filename);

  /**
   * Return a fully-qualified data file location for the given partition and filename.
   *
   * @param spec a partition spec
   * @param partitionData a tuple of partition data for data in the file, matching the given spec
   * @param filename a file name
   * @return a fully-qualified location URI for a data file
   */
  String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename);

Contributor

Basically, I am saying the second newOutputFile in OutputFileFactory can behave the same as the first method when the spec is unpartitioned (see the sketch after the snippet below).

```java
  /** Generates an {@link EncryptedOutputFile} for unpartitioned writes. */
  public EncryptedOutputFile newOutputFile() {
    OutputFile file = io.newOutputFile(locations.newDataLocation(generateFilename()));
    return encryptionManager.encrypt(file);
  }

  /** Generates an {@link EncryptedOutputFile} for partitioned writes in a given spec. */
  public EncryptedOutputFile newOutputFile(PartitionSpec spec, StructLike partition) {
    String newDataLocation = locations.newDataLocation(spec, partition, generateFilename());
    OutputFile rawOutputFile = io.newOutputFile(newDataLocation);
    return encryptionManager.encrypt(rawOutputFile);
  }
```

Contributor Author

@stevenzwu that's similar to what I had before: ca63c4b

I think we expect the struct to be null when there is no partition. It looks like this is expected in many parts of the code. I'm fine either way, but I'd love to get the opinion of @rdblue or @aokolnychyi.

Contributor

@stevenzwu stevenzwu May 25, 2023

Thanks for the context. I missed the earlier part of the discussion.

What about the one-liner fix in the first commit? Does it cause failures in other code paths?
880cb49

Contributor Author

The other fix also resolved it for positional deletes. But as Anton mentioned, the fix that we have right now also covers the UPDATE and MERGE paths.

Contributor

@aokolnychyi aokolnychyi May 26, 2023

I wonder if passing an empty struct instead of null will have any impact in other places. For instance, the DataFiles and FileMetadata builders explicitly nullify partitions if the spec is unpartitioned, so even if we pass an empty struct, null will be used for an unpartitioned spec (which is good). Let me check other places. If there is no issue, maybe we can go ahead and fix the output factory/writer.

Contributor

If we always want null, I think adding PartitionProjection is reasonable to avoid if/else in tight loops and to reuse this logic in other places. That said, we know our file builders will always set null for unpartitioned tables. We only need to fix our writers.

What if we fix RollingFileWriter? It currently checks partition == null.

```java
private EncryptedOutputFile newFile() {
  if (partition == null) {
    return fileFactory.newOutputFile();
  } else {
    return fileFactory.newOutputFile(spec, partition);
  }
}
```

We could replace it with spec.isUnpartitioned().
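
A sketch of what that could look like; the guard shown here also keeps the null check, matching the RollingFileWriter diff that appears later in this PR:

```java
private EncryptedOutputFile newFile() {
  // Unpartitioned specs use the filename-only factory method, avoiding the empty partition path.
  if (spec.isUnpartitioned() || partition == null) {
    return fileFactory.newOutputFile();
  } else {
    return fileFactory.newOutputFile(spec, partition);
  }
}
```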

I also don't mind adding an extra branch to OutputFileFactory to be safe but I guess that's optional.

Contributor

Fixing the writers seems to be a smaller change, so maybe we can start with that?

Contributor Author

@aokolnychyi That works for me. Let me update the PR

@aokolnychyi
Contributor

Let me catch up today. Sorry for the delay.

```diff
@@ -109,7 +109,7 @@ protected void openCurrentWriter() {
   }

   private EncryptedOutputFile newFile() {
-    if (partition == null) {
+    if (spec.isUnpartitioned() || partition == null) {
```
Contributor

This check needs to be added to other writers, like FanoutWriter and ClusteredWriter. They all have the same pattern.

```java
    return partition == null
        ? fileFactory.newOutputFile()
        : fileFactory.newOutputFile(spec, partition);
  }
```

Originally I was wondering if this check should be done in the OutputFileFactory method below. But I see that is not appropriate, as the Javadoc clearly indicates that this method should be used for partitioned writes. Hence the caller needs to separate unpartitioned from partitioned writes.

```java
  /** Generates an {@link EncryptedOutputFile} for partitioned writes in a given spec. */
  public EncryptedOutputFile newOutputFile(PartitionSpec spec, StructLike partition)
```

Contributor Author

I'm fine with adding those as well. WDYT, @aokolnychyi?

@Fokko Fokko marked this pull request as draft June 5, 2023 14:58
@Fokko Fokko marked this pull request as ready for review June 5, 2023 14:58
```diff
@@ -137,7 +137,7 @@ protected EncryptedOutputFile newOutputFile(
     Preconditions.checkArgument(
         spec.isUnpartitioned() || partition != null,
         "Partition must not be null when creating output file for partitioned spec");
-    return partition == null
+    return spec.isUnpartitioned() || partition == null
```
Contributor

Nit: not caused by this PR, but since we are touching the code, should we use the same if-else style?

I am fine with either if-else or the ternary operator. I know @aokolnychyi prefers if-else for multi-line statements based on a previous review.
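
For reference, a sketch of the if-else form applied to the return statement in the diff above (the factory calls are assumed to mirror the ternary being replaced):

```java
    // Same guard as the merged ternary, rewritten in if-else style for consistency.
    if (spec.isUnpartitioned() || partition == null) {
      return fileFactory.newOutputFile();
    } else {
      return fileFactory.newOutputFile(spec, partition);
    }
```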

Contributor Author

Great catch @stevenzwu. Consistency is king 👍🏻

@Fokko Fokko requested a review from aokolnychyi June 5, 2023 22:11
@aokolnychyi
Contributor

Let me take another look in a moment.

@aokolnychyi aokolnychyi merged commit 18c343b into apache:master Jun 6, 2023
@Fokko Fokko deleted the fd-fix-bug branch June 6, 2023 22:13
@aokolnychyi
Contributor

aokolnychyi commented Jun 6, 2023

Thanks, @Fokko! Thanks for reviewing, @stevenzwu!

Could we also follow up to remove newOutputFile methods in ClusteredWriter and FanoutWriter? I don't think those are used anymore.

@Fokko
Contributor Author

Fokko commented Jun 6, 2023

Thanks for the review @aokolnychyi and @stevenzwu!

Could we also follow up to remove newOutputFile methods in ClusteredWriter and FanoutWriter? I don't think those are used anymore.

These are public; I think we have to deprecate them first.
