
Conversation

Contributor

@rajarshisarkar rajarshisarkar commented Mar 16, 2022

This change adds S3 tags to objects when they are deleted using S3FileIO. Users can pass custom tags as part of the catalog properties, which allows them to manage their storage lifecycle.

Spark SQL launch command:

spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.warehouse=s3://iceberg-warehouse/s3-tagging \
    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key=my_val \
    --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2 \
    --conf spark.sql.catalog.my_catalog.s3.delete.tags.my_key3=my_val3 \
    --conf spark.sql.catalog.my_catalog.s3.delete-enabled=false

Tags in S3 after delete:

aws s3api get-object-tagging --bucket iceberg-warehouse --key s3-tagging/metadata/00000-5d37f925-be01-44d0-87fd-15513606ff6b.metadata.json
{
    "TagSet": [
        {
            "Key": "my_key2",
            "Value": "my_val2"
        },
        {
            "Key": "my_key3",
            "Value": "my_val3"
        },
        {
            "Key": "my_key",
            "Value": "my_val"
        }
    ]
}

Note: my_key=my_val and my_key2=my_val2 are the tags that were applied when the object was written.


cc: @jackye1995 @arminnajafi @singhpk234 @amogh-jahagirdar @xiaoxuandev @yyanyy

Contributor

@singhpk234 singhpk234 left a comment

Thanks for the change, @rajarshisarkar!

@rajarshisarkar rajarshisarkar marked this pull request as ready for review March 18, 2022 03:24
Comment on lines 145 to 172
Tasks.foreach(paths)
.noRetry()
.suppressFailureWhenFinished()
.onFailure((path, exc) -> LOG.warn("Failed to add delete tags: {}", path, exc))
.run(this::doSoftDelete);
Contributor

@amogh-jahagirdar amogh-jahagirdar Mar 18, 2022

Should we have a configurable thread pool here, so we can get better throughput when fetching and updating tags? I've been thinking S3FileIO should implicitly have a thread pool, but maybe that doesn't fit the FileIO model.

 Tasks.foreach(paths)
          .noRetry()
          .executeWith(s3FileIoThreadPool)
          .suppressFailureWhenFinished()
          .onFailure((path, exc) -> LOG.warn("Failed to add delete tags: {}", path, exc))
          .run(this::doSoftDelete);

Contributor Author

Thanks, I have introduced an executor service configurable via s3.delete.tags.num-threads.

.getObjectTagging(getObjectTaggingRequest);
// Get existing tags, if any and then add the delete tags
Set<Tag> tags = Sets.newHashSet();
if (getObjectTaggingResponse != null && getObjectTaggingResponse.hasTagSet()) {
Contributor

I don't think getObjectTaggingResponse would ever be null; we can at least trust the AWS client with that :)

if (getObjectTaggingResponse != null && getObjectTaggingResponse.hasTagSet()) {
tags.addAll(getObjectTaggingResponse.tagSet());
}
tags.addAll(awsProperties.s3DeleteTags());
Contributor

nit: newline after if block

@danielcweeks
Contributor

I'm not sure I really follow why we would do this. It seems like we're trying to overlap use cases by hijacking delete behavior.

This also requires that the configuration of the bucket aligns with the tags to actually achieve the desired behavior.

Overall, it feels like this is a bit of a reach in terms of functionality.

@jackye1995
Contributor

jackye1995 commented Mar 20, 2022

It seems like we're trying to overlap use cases by hijacking delete behavior.

Thanks for the feedback, Daniel! I think this is not trying to hijack the delete behavior. Based on what we've heard from some key customers, tagging an object is now the preferred S3 delete implementation in many companies. Data loss is typically a sev0 incident in a corporation, so data platform teams prefer to gradually retire deleted files to different tiers and let S3 handle the lifecycle transition.

Note that this is not a discussion of table deletion/snapshot expiration with purge or not. For platforms that allow people to do table deletion with purge, I would imagine this feature to be preferred, since users could accidentally run a DROP TABLE. And even for platforms that do deletion without purge, eventually there is a team who does the "purge" part, and based on the customer discussions we have had, this is typically done by tagging files and then defining a lifecycle policy so S3 takes care of the transition of files.

This also requires that the configuration of the bucket aligns with the tags to actually achieve the desired behavior.

This can be explained in 3 parts:

  1. Ironically, to satisfy the use case I described, many teams today have to enable S3 object versioning to avoid hard-deleting data files. However, that becomes a huge increase in cost, and users then have to implement custom workflows to deal with versioned objects. So some sort of bucket configuration is always needed for an enterprise data lake setup, and this tagging approach is going to save cost rather than increase cost for those users.

  2. There are many features that won't work with the default S3 behavior and need bucket-level settings; some other features we plan to add, like S3 acceleration mode and S3 dual-stack, are all like that. It's mostly done for backwards compatibility. We see strong adoption of using a lifecycle policy to soft-delete data instead of doing a direct S3 object deletion. There have recently also been threads in similar products like Hudi and Delta for such use cases. I would consider this feature to fall into the category of "new features that users have to opt in to" on the S3 side.

  3. From the Iceberg perspective, a file is no longer part of the table if it is not part of the Iceberg metadata tree. FileIO.delete is always used AFTER a table-level commit that removes the file from the metadata tree (in the full table deletion case it is also done after the catalog operation that removes the table from the catalog), so this semantic is still intact even if a file is not actually deleted, for people who would like to use such a feature.

Overall, it feels like this is a bit of a reach in terms of functionality.

Let me know what you think about this topic, and whether you have any better solutions than the proposed one. I think this is a small feature that could enable a lot of valuable use cases. Similar to the access point feature we are adding in S3FileIO, we consider that an intermediate solution which could also be solved at the Iceberg level with a larger spec change to support relative file paths. I think there is value in offering this feature for people who know how to use it, while we think of anything better that could solve such retention at the Iceberg spec level.

@rdblue
Contributor

rdblue commented Mar 20, 2022

@danielcweeks, @jackye1995, I think there's definitely value in being able to add tags to let lifecycle policies handle this. But I find it really confusing that adding a delete tag prevents the object from being deleted and instead just tags it.

I think that tagging an object before deleting it, and skipping the delete because the tag takes care of it, should be separate configurations. It's reasonable for someone to replace deletes with tags, or to tag an object and still delete it. I can also imagine situations where client-side deletes aren't allowed and the warehouse relies on an orphan files service. That means all of the combinations of "skip delete" and "tag before delete" are valid use cases, so we should separate those.

How about implementing this with an s3.delete-enabled flag and the s3.delete.tags config?

@jackye1995
Contributor

I can also imagine situations where client-side deletes aren't allowed and the warehouse relies on an orphan files service. That means all of the combinations of "skip delete" and "tag before delete" are valid use cases, so we should separate those.

Yes, totally agree. However, should it be s3.delete-enabled, or a broader io.delete-enabled feature? If a client-side delete is not allowed, it should be applied to all FileIOs instead of just S3.

I thought about the 4 possible combinations, and I did not ask for an additional -enabled flag because:

  1. delete enabled + tags set: this case does not really exist, because once the object is deleted in S3 there is no tag anymore.
  2. delete disabled + tags set: this is the feature in the PR
  3. delete enabled + no tag: this is the existing behavior
  4. delete disabled + no tag: this is the case we agree that an orphan file service would take care of everything, and it goes back to my question of whether this should be a feature flag across all FileIOs. If so, we can publish a follow-up PR for that.
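The four combinations above can be sketched as a small decision function. This is purely illustrative: the enum and method names are hypothetical, not the PR's actual API, and the "tag then delete" branch reflects the behavior the discussion converges on rather than the original proposal.

```java
// Illustrative sketch of the four delete/tag combinations discussed above.
// All names here are hypothetical; this is not the PR's code.
public class DeleteDecision {
  public enum Action { HARD_DELETE, TAG_ONLY, TAG_THEN_DELETE, NO_OP }

  public static Action decide(boolean deleteEnabled, boolean hasDeleteTags) {
    if (deleteEnabled) {
      // Cases 1 and 3: the object is removed; tags, if any, are applied first.
      return hasDeleteTags ? Action.TAG_THEN_DELETE : Action.HARD_DELETE;
    }
    // Cases 2 and 4: the object stays; an S3 lifecycle policy or an
    // orphan-file service is expected to clean it up later.
    return hasDeleteTags ? Action.TAG_ONLY : Action.NO_OP;
  }
}
```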

@rdblue
Contributor

rdblue commented Mar 20, 2022

@jackye1995, for case 1, I thought tags would remain on objects in versioned buckets that are deleted with a delete marker. So you could use a tag to control when the old version gets cleaned up, but logically delete it immediately. Isn't that a valid case?

For the question about s3.delete-enabled or io.delete-enabled, I would go with s3.delete-enabled although I could be convinced otherwise.

First, if the intent is to replace the S3 delete with a tag operation instead, then it seems like you wouldn't want to disable deletes across all storage for a table. If you have files stored across multiple schemes, then it doesn't make sense to disable deletes for other schemes.

Second, io.delete-enabled would address just use case 4. In that case, why delegate to FileIO to skip the delete rather than just not calling delete in the library? I don't think we would want io.delete-enabled because it would be impossible to make sure all of the FileIO implementations implement it consistently. Plus, you'd still have the problem where you probably don't want to turn it on in all object stores at once. You may have a different orphan file cleanup procedure for each one, or maybe for one you don't have a reliable prefix to list.

@rajarshisarkar
Contributor Author

I have implemented the s3.delete-enabled flag, which is separate from the s3.delete.tags config. This should satisfy both the "skip delete" and "tag before delete" use cases.

@rajarshisarkar rajarshisarkar force-pushed the s3-delete-tags branch 3 times, most recently from f0c8f91 to fc553b4 Compare March 21, 2022 12:47
Comment on lines 110 to 122
if (executorService == null) {
synchronized (S3OutputStream.class) {
if (executorService == null) {
executorService = MoreExecutors.getExitingExecutorService(
(ThreadPoolExecutor) Executors.newFixedThreadPool(
awsProperties.s3FileIoDeleteTagThreads(),
new ThreadFactoryBuilder()
.setDaemon(true)
.setNameFormat("iceberg-s3fileio-deletetags-%d")
.build()));
}
}
}
Contributor

Can we shift this to a private function?

Contributor

@amogh-jahagirdar amogh-jahagirdar Mar 21, 2022

I think there's a utility for this that we can leverage. ThreadPools.newWorkerPool(String namePrefix, int poolSize)

https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/ThreadPools.java#L62
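For reference, that utility roughly amounts to a fixed pool of named daemon threads. A self-contained stand-in (not the Iceberg code itself, which may differ in detail) might look like:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in sketch for org.apache.iceberg.util.ThreadPools.newWorkerPool:
// a fixed pool whose threads are daemons with a predictable name prefix.
public class WorkerPools {
  public static ThreadFactory namedDaemonFactory(String namePrefix) {
    AtomicInteger count = new AtomicInteger(0);
    return runnable -> {
      Thread thread = new Thread(runnable, namePrefix + "-" + count.getAndIncrement());
      thread.setDaemon(true); // daemon threads do not block JVM shutdown
      return thread;
    };
  }

  public static ExecutorService newWorkerPool(String namePrefix, int poolSize) {
    return Executors.newFixedThreadPool(poolSize, namedDaemonFactory(namePrefix));
  }
}
```

Using the shared utility avoids hand-rolling the double-checked locking and ThreadFactoryBuilder setup shown in the diff above.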

Contributor Author

Thanks for letting me know about the utility method.

.suppressFailureWhenFinished()
.onFailure((path, exc) -> LOG.warn("Failed to add delete tags: {}", path, exc))
.run(this::doSoftDelete);
return;
Contributor

We should not return here, as the flow will never reach the code below if delete tagging is enabled.

private transient S3Client client;
private MetricsContext metrics = MetricsContext.nullMetrics();
private final AtomicBoolean isResourceClosed = new AtomicBoolean(false);
private static volatile ExecutorService executorService;
Contributor

[question] Any reason the executor service needs to be static?

Contributor

nit: static variable should be placed before instance variables


/**
* Used by {@link S3FileIO} to tag objects when deleting. When this config is set, objects are
* tagged with the configured key-value pairs instead of being hard-deleted. This is considered a
Contributor

based on the new behavior, the line here should say "objects are tagged with the configured key-value pairs before deletion"

public static final String S3FILEIO_DELETE_TAGS_THREADS = "s3.delete.tags.num-threads";

/**
* Used by {@link S3FileIO} to delete files.
Contributor

The description here is not accurate; it should say something like "Determines if S3FileIO delete the object when io.delete() is called, default to true. Once disabled, users are expected to set tags through S3_DELETE_TAGS_PREFIX and manage deleted files through S3 lifecycle policy"

}
}

private void doSoftDelete(String path) throws S3Exception {
Contributor

the method should now be named "tagFileToDelete"?

Contributor

Or a generic (still helper) tagFile(String path, List deleteTags). The caller, in this case deleteFiles, would be responsible for passing in the deletion tags.

Contributor Author

I have made the changes and named it tagFileToDelete(String path, Set<Tag> deleteTags).
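The get-merge-put flow in tagFileToDelete can be illustrated with plain maps. This is a sketch of the merge semantics only: the actual implementation collects the SDK's Tag objects into a Set, and the "delete tags win on key collision" behavior shown here is an assumption for illustration, not something the PR spells out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the tag-merge step in tagFileToDelete, using plain maps
// keyed by tag key instead of the SDK's Tag type. Assumption: a delete
// tag overrides an existing tag that has the same key.
public class TagMerge {
  public static Map<String, String> mergeTags(Map<String, String> existingTags,
                                              Map<String, String> deleteTags) {
    Map<String, String> merged = new LinkedHashMap<>(existingTags);
    merged.putAll(deleteTags); // delete tags take precedence on collision
    return merged;
  }
}
```

The merged map would then be converted back into a TagSet for a PutObjectTagging call, as in the get-object-tagging output shown at the top of this PR.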

this.s3 = s3;
this.awsProperties = awsProperties;
if (executorService == null) {
synchronized (S3OutputStream.class) {
Contributor

should be S3FileIO.class

@danielcweeks
Contributor

@jackye1995 I think we need to really clarify the use cases and make sure we're coming up with a solution that really addresses those. This may be a good topic for the next sync as well, since it's getting a little involved in terms of object lifecycle management in S3.

I would say the typical uses cases would be:

Versioned Bucket: Life-cycle policy for non-current versions+delete markers with direct delete as there's no requirement for tags.

Non-Versioned Bucket: Life-cycle policy for tagged objects and no direct delete (deleting after tagging doesn't really make sense).

I'm not sure if there's a use case for both versioned bucket and tagging for delete.

I would also question the use of lifecycle policies for transitioning tagged objects, since there's a cost associated with the tagging, the transition and, depending on the tier, a minimum charge. It feels like the use case where it becomes more cost-effective narrows pretty quickly (as compared to just using a versioned bucket).

Given that there are these different strategies, we might actually want to separate this behavior into a DeletionStrategy interface where we can provide implementations targeted to the use case. That would also allow us to isolate some things (like the executor service, which is currently created whether it's used or not).

@rdblue
Contributor

rdblue commented Mar 23, 2022

I talked with Dan about this and he made me question my take on io.delete-enabled vs s3.delete-enabled. I'm good either way.

@jackye1995
Contributor

I'm not sure if there's a use case for both versioned bucket and tagging for delete.

Yes, after doing some more research I think this is the right conclusion. I thought there was such a use case based on what @rdblue described, but I've never heard of any.

Regarding having a DeleteStrategy, I don't have a strong opinion on the interface either. The original proposal of introducing s3.delete.tags is a minimal integration, but if some other approach is easier for your integration use case we can totally go with that.

I don't know what this interface would look like, though; do you have anything in mind? Because unlike a bulk operation, this is conceptually just a "delete" in the user's mind, which should ideally reuse the existing delete interface. So the strategy feels more like a config to me, where we could have s3.delete.strategy=delete(default)|tagging to choose from.
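To make the config-based alternative concrete, parsing such a property might look like the sketch below. The s3.delete.strategy property is hypothetical: it was only floated in this discussion and is not part of the PR.

```java
import java.util.Locale;

// Hypothetical parsing for an s3.delete.strategy property, as floated
// above. This config does not exist in the PR; it only illustrates the
// "strategy as config" alternative to a DeleteStrategy interface.
public class S3DeleteStrategy {
  public enum Strategy { DELETE, TAGGING }

  public static Strategy fromProperty(String value) {
    if (value == null || value.isEmpty()) {
      return Strategy.DELETE; // default: plain S3 object deletion
    }
    switch (value.toLowerCase(Locale.ROOT)) {
      case "delete":
        return Strategy.DELETE;
      case "tagging":
        return Strategy.TAGGING;
      default:
        throw new IllegalArgumentException("Unknown s3.delete.strategy: " + value);
    }
  }
}
```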


/**
* Determines if S3FileIO delete the object when io.delete() is called, default to true. Once
* disabled, users are expected to set tags through S3_DELETE_TAGS_PREFIX and manage deleted files
Contributor

nit: {@link #S3_DELETE_TAGS_PREFIX}

Contributor Author

I have made the changes.

@jackye1995
Contributor

@danielcweeks do you have any additional suggestions? If not, I think the approach Ryan suggested and the current code path look good to me.

public static final String S3FILEIO_DELETE_THREADS = "s3.delete.num-threads";

/**
* Determines if S3FileIO delete the object when io.delete() is called, default to true. Once
Contributor

Nit:

Suggested change
* Determines if S3FileIO delete the object when io.delete() is called, default to true. Once
* Determines if S3FileIO deletes the object when io.delete() is called, default to true. Once

Contributor Author

I have made the changes.

try {
tagFileToDelete(path, awsProperties.s3DeleteTags());
} catch (S3Exception e) {
LOG.warn("Failed to add delete tags: {}", path, e);
Contributor

[minor] Should we log both the delete tags and the path:

Suggested change
LOG.warn("Failed to add delete tags: {}", path, e);
LOG.warn("Failed to add delete tags: {} to {}", awsProperties.s3DeleteTags().toString(), path, e);

Contributor Author

Made the changes.

}

private void tagFileToDelete(String path, Set<Tag> deleteTags) throws S3Exception {
S3URI location = new S3URI(path);
Contributor

Can we wire in s3BucketToAccessPointMapping, to use access points in tag manipulation?

Adding tags via an access point is an allowed operation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points-usage-examples.html#add-tag-set-ap (both GetObjectTagging and PutObjectTagging operations are allowed).

Contributor Author

@rajarshisarkar rajarshisarkar Mar 29, 2022

Nice catch, I missed adding this after merging master.

return client;
}

private ExecutorService executorService() {
Contributor

We should probably shutdown the ExecutorService on close of the FileIO

Contributor Author

Thanks, I have made the changes.

Contributor

I think we cannot close it, because it's a static variable and shared across FileIOs, which is similar to the one in S3OutputStream. We can either not close it, or make it an instance variable instead. I think not closing it is fine given we are already doing that in S3OutputStream; any thoughts?

Contributor Author

I feel not closing it is fine. We can keep it static for parity with S3OutputStream.

@danielcweeks
Contributor

@rajarshisarkar and @jackye1995 One small comment about proper shutdown above, but other than that I'm generally ok with this. I think we can tackle refactoring some of this logic later (the S3FileIO is now mostly bulk/delete/tagging, so I think there's an opportunity to clean this up).

It is important to note that for the bulk use cases, tagging will be quite a bit slower since we have to tag each object individually. That may not scale particularly well, but at least the option to tag is there.

@jackye1995
Contributor

This looks good to me, the only point with concern now is the executor service. I think we can merge this, and address the issue if there is a better way from @danielcweeks, otherwise the current implementation is consistent with what we had in S3OutputStream. Thanks for the work @rajarshisarkar, and thanks for the review @rdblue @danielcweeks @singhpk234 @amogh-jahagirdar !

@jackye1995 jackye1995 merged commit 605bffb into apache:master Mar 30, 2022
@rdblue
Contributor

rdblue commented Mar 30, 2022

Thanks for getting this in, @amogh-jahagirdar and @jackye1995!
