AWS SDK isn't bundled with the application #303

prakash-42 opened this issue Apr 7, 2024 · 3 comments
Labels
question Further information is requested

Comments

@prakash-42

Hi! I wasn't sure about the correct forum for asking my question, hope this is the right place.

When I tried to package and run the application (following the steps in the README), I got the following error:

java.lang.NoClassDefFoundError: com/amazonaws/AmazonClientException

I think the AWS SDK isn't bundled with the application by default. Do I need to add this dependency myself (by modifying the project's pom.xml), or is there a different recommended way to get the AWS SDK libraries at runtime?

I did notice that the PR for issue #201 explicitly removes the AWS SDK, but I couldn't understand the motivation behind that. Please guide me on this, thank you!
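
For anyone debugging the same NoClassDefFoundError, one way to confirm whether a class is visible at runtime is a tiny classpath probe. This is only a sketch: `ClasspathCheck` and `isPresent` are hypothetical names, not part of this project, and the probe has to run with the same classpath as the application to say anything useful.

```java
// Hypothetical helper, not part of the project: reports whether a class
// can be located by the current class loader, without initializing it.
final class ClasspathCheck {

    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or one of its own dependencies failed to link
            return false;
        }
    }

    public static void main(String[] args) {
        // Class named in the error message above; pass other names as arguments to probe them
        String[] names = args.length > 0
                ? args
                : new String[] {"com.amazonaws.AmazonClientException"};
        for (String name : names) {
            System.out.println(name + " -> " + (isPresent(name) ? "found" : "missing"));
        }
    }
}
```

If the class is reported missing, that is consistent with the AWS SDK classes not being on the classpath of the packaged application.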

@ismailsimsek
Member

ismailsimsek commented Apr 7, 2024

@prakash-42 if you use org.apache.iceberg.aws.s3.S3FileIO, you don't need the AWS bundle. That's the recommended FileIO to use for AWS/S3.

example setup below:

debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.s3.endpoint=http://minio:9000
debezium.sink.iceberg.s3.path-style-access=true
debezium.sink.iceberg.s3.access-key-id=admin
debezium.sink.iceberg.s3.secret-access-key=password

Further details are in the Iceberg documentation.

ismailsimsek added the question (Further information is requested) label Apr 8, 2024
@prakash-42
Author

Thanks for your response, @ismailsimsek. The error went away after I switched to using S3FileIO instead of org.apache.hadoop.fs.s3a.S3AFileSystem. However, I have run into a different problem after this.

I am trying to set up this project with the catalog-impl as org.apache.iceberg.aws.glue.GlueCatalog. Here are my configuration properties for it:

# Iceberg sink config
debezium.sink.iceberg.table-prefix=debeziumcdc_
debezium.sink.iceberg.upsert=true
debezium.sink.iceberg.upsert-keep-deletes=true
debezium.sink.iceberg.write.format.default=parquet
debezium.sink.iceberg.catalog-name=mycatalog

# S3 config using Glue catalog And S3FileIO
debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
debezium.sink.iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
debezium.sink.iceberg.warehouse=s3://poc_bucket/icebergcatalog
# debezium.sink.iceberg.type=iceberg # Gives error
debezium.sink.iceberg.catalog-type=hadoop
debezium.sink.iceberg.format-version=2

When I try to run the application, it fails on startup with the following error:

Caused by: org.apache.iceberg.exceptions.ValidationException: Invalid S3 URI, cannot determine scheme: file:/home/glue_user/workspace/spark-warehouse/debezium_offset_storage_custom_table/metadata/00000-2a2503fc-a6db-47f2-9ac9-ce21a29322cb.metadata.json
        at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
        at org.apache.iceberg.aws.s3.S3URI.<init>(S3URI.java:72)

I'm not sure which property I should set so that it creates paths starting with s3:// instead of file:/. (I thought that debezium.sink.iceberg.warehouse should control this part, but now I'm not sure.) Can you suggest any tips for debugging this? Sorry for pestering you; I think this tool can greatly simplify our data lake's CDC process, which is why I wanted to set it up.
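
To make the mismatch in the error message concrete: the rejected location starts with file:, while the configured warehouse starts with s3://. The sketch below only compares URI schemes with `java.net.URI`. It is not Iceberg's validation code (per the stack trace, that lives in `S3URI`), and `LocationScheme`, `schemeOf` and `isS3` are hypothetical names with a deliberately simplified check.

```java
import java.net.URI;

// Hypothetical helper for illustration; a simplified stand-in, not Iceberg's S3URI logic.
final class LocationScheme {

    // Returns the scheme part of a location string, e.g. "s3" or "file"
    static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    // Simplified assumption: treat only the literal "s3" scheme as an S3 location
    static boolean isS3(String location) {
        return "s3".equals(schemeOf(location));
    }

    public static void main(String[] args) {
        String[] locations = {
            "s3://poc_bucket/icebergcatalog",                  // warehouse value from the config above
            "file:/home/glue_user/workspace/spark-warehouse"   // prefix of the path in the error
        };
        for (String location : locations) {
            System.out.println(location + " -> scheme=" + schemeOf(location)
                    + ", s3=" + isS3(location));
        }
    }
}
```

Running it shows that the two locations have different schemes, so the metadata path in the error does not appear to come from the s3:// warehouse setting.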

@ismailsimsek
Member

@prakash-42 you don't need the second line below. These two do the same thing: they both set the catalog.

debezium.sink.iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
debezium.sink.iceberg.catalog-type=hadoop

Outside of that, the config looks correct to me.

Leaving the documentation for the AWS Iceberg integration here: https://iceberg.apache.org/docs/1.5.0/aws/#glue-catalog
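
A small pre-flight check along these lines could flag a properties file that sets both catalog keys before the application starts. This is a sketch under stated assumptions: `CatalogConfigCheck` and its methods are hypothetical, the two property keys are taken from the configuration posted in this thread, and the check only detects that both keys are present. It does not reproduce how the sink resolves them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical pre-flight check, not part of the project.
final class CatalogConfigCheck {

    // Property keys as they appear in the configuration posted above
    static final String CATALOG_IMPL = "debezium.sink.iceberg.catalog-impl";
    static final String CATALOG_TYPE = "debezium.sink.iceberg.catalog-type";

    // Parses properties-file text into a Properties object
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True when both catalog-impl and catalog-type are present
    static boolean setsCatalogTwice(Properties props) {
        return props.containsKey(CATALOG_IMPL) && props.containsKey(CATALOG_TYPE);
    }

    public static void main(String[] args) {
        String posted =
                CATALOG_IMPL + "=org.apache.iceberg.aws.glue.GlueCatalog\n"
              + CATALOG_TYPE + "=hadoop\n";
        String trimmed =
                CATALOG_IMPL + "=org.apache.iceberg.aws.glue.GlueCatalog\n";

        System.out.println("as posted: both keys set = " + setsCatalogTwice(parse(posted)));
        System.out.println("after removing catalog-type: both keys set = " + setsCatalogTwice(parse(trimmed)));
    }
}
```

With the catalog-type line removed as suggested, only catalog-impl remains and the check no longer reports a duplicate.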
