Add Attributes API Exposing Broader Set of Object Metadata #5334

Open
tustvold opened this issue Jan 26, 2024 · 14 comments
Labels: enhancement (Any new improvement worthy of an entry in the changelog), help wanted, object-store (Object Store Interface)

Comments

@tustvold
Contributor

tustvold commented Jan 26, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The current definition of ObjectMeta reflects the information returned by the various store listing APIs, however, this is an incomplete subset of the attributes of an object.

A number of requests have been made to allow reading and writing a broader set of this data:

Describe the solution you'd like

I would like a new struct called Attributes that is included on PutOptions and GetResult, containing an iterable collection of key-value pairs. The values would be typed as String, and the keys would be a new enumeration called Attribute:

#[non_exhaustive] // This will allow adding new attributes without it being a breaking change
enum Attribute {
    /// The MIME type of the object
    ContentType,
    /// An identifier for a managed encryption key
    EncryptionKey,
    /// User-defined object metadata
    Metadata(String),
    // ...
}

ObjectStore implementations would be expected to error on attributes they don't understand or don't support, and all supported attributes would be expected to round-trip. The expectation is that each Attribute would be supported by a majority of stores, i.e. we wouldn't look to support store-specific functionality through this mechanism.

An optional feature flag would enable support for attributes in LocalFileSystem using xattr.
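A minimal sketch of what such a collection might look like, using only the standard library. The `Attribute` variants are the ones listed in the proposal; the `HashMap` backing, the derives, and the `insert`, `get`, and `iter` methods are assumptions made for illustration, not the eventual object_store API.

```rust
use std::collections::HashMap;

/// Keys from the proposal; `#[non_exhaustive]` lets variants be added later.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Attribute {
    /// The MIME type of the object
    ContentType,
    /// An identifier for a managed encryption key
    EncryptionKey,
    /// User-defined object metadata
    Metadata(String),
}

/// Hypothetical container: a map from `Attribute` to a `String` value.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Attributes(HashMap<Attribute, String>);

impl Attributes {
    pub fn new() -> Self {
        Self::default()
    }

    /// Sets `key` to `value`, returning the previous value if there was one.
    pub fn insert(&mut self, key: Attribute, value: impl Into<String>) -> Option<String> {
        self.0.insert(key, value.into())
    }

    /// Direct lookup, so callers need not scan the whole collection.
    pub fn get(&self, key: &Attribute) -> Option<&str> {
        self.0.get(key).map(String::as_str)
    }

    /// Iterates over every pair, as a store would when building a request.
    pub fn iter(&self) -> impl Iterator<Item = (&Attribute, &str)> {
        self.0.iter().map(|(k, v)| (k, v.as_str()))
    }
}
```

Under these assumptions, a caller would build an `Attributes` value, attach it to a put, and read individual values back with `get` from the result of a get.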

Describe alternatives you've considered

I originally debated adding the ability to specify arbitrary headers in PutOptions, but this broke encapsulation and made it hard to use ObjectStore without knowing the concrete backing implementation. Whilst the Attribute mechanism doesn't eliminate this issue, as not all implementations will support all Attribute, it at least avoids implementation code needing to know the particulars of how Azure vs AWS encode encryption information in request/response headers, or how Azure uses a different header to write vs read Content-Disposition (I have no idea why), etc...

Additional context

We can't add Attributes to ObjectMeta as the listing APIs do not return this information.

We can't merge TagSet with Attributes as the tag set is only returned by special API calls.

tustvold added the enhancement and object-store labels on Jan 26, 2024
tustvold changed the title from "Add ObjectAttrs" to "Add Attributes API Exposing Broader Set of Object Metadata" on Jan 26, 2024
@tustvold
Contributor Author

tustvold commented Jan 26, 2024

Any thoughts @Xuanwo @wjones127 @Turbo87 @roeap @thinkharderdev @alamb?

@Xuanwo
Member

Xuanwo commented Jan 26, 2024

Interesting design!

I would like a new struct called Attributes that is included on PutOptions and GetResult, containing an iterable collection of key-value pairs.

If I have understood correctly, this collection doesn't provide api like get and users will need to iterate the entire collection to get content_type?

Would you like to share more motivation about this design? I'm guessing you are trying to keep the struct size low.

@tustvold
Contributor Author

If I have understood correctly, this collection doesn't provide api like get and users will need to iterate the entire collection to get content_type?

Sorry, that's an omission on my part; we definitely would provide such an API. I was just thinking from the perspective of the implementations, which will just enumerate the collection.

@Turbo87
Contributor

Turbo87 commented Jan 26, 2024

Looks good in general, but I'm a little worried about the "store errors if it can't handle the attribute" part.

In the crates.io codebase we're primarily using the S3 store, but for local dev we use the file store instead, and for testing we use in-memory. Cache-Control is probably not something the file store would support though, so in that case the local dev setup would fail to work.

I'm wondering if something like an ignore_unexpected_attributes: true option might be possible to allow use cases like this?

@tustvold
Contributor Author

tustvold commented Jan 26, 2024

The intention is all attributes would be supported for all first-party stores. The erroring is only to give us space to add more without it being a breaking API change, or for making xattr an optional feature
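To make the two behaviours discussed here concrete, the following is a rough sketch of a pre-put check: by default it rejects the first attribute the store cannot handle, and with a flag modelled on the suggested `ignore_unexpected_attributes` it drops such attributes instead. `UnsupportedAttribute`, `validate`, and the `supports` callback are hypothetical names, and `EncryptionKey` merely stands in for whichever attribute a given store lacks.

```rust
/// Attribute keys, as in the proposal (subset shown).
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Attribute {
    ContentType,
    EncryptionKey,
    Metadata(String),
}

/// Hypothetical error returned when a store cannot persist an attribute.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedAttribute(pub Attribute);

/// Hypothetical check a store could run before a put.
///
/// `supports` reports whether the store can round-trip a given attribute.
/// With `ignore_unexpected` set, unsupported attributes are dropped rather
/// than rejected.
pub fn validate(
    attrs: Vec<(Attribute, String)>,
    supports: impl Fn(&Attribute) -> bool,
    ignore_unexpected: bool,
) -> Result<Vec<(Attribute, String)>, UnsupportedAttribute> {
    let mut accepted = Vec::with_capacity(attrs.len());
    for (key, value) in attrs {
        if supports(&key) {
            accepted.push((key, value));
        } else if !ignore_unexpected {
            // Strict mode: surface the first attribute that would be lost.
            return Err(UnsupportedAttribute(key));
        }
    }
    Ok(accepted)
}
```

Erroring by default means a silently dropped attribute cannot go unnoticed; the lenient mode trades that guarantee for portability across stores.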

@Turbo87
Contributor

Turbo87 commented Jan 26, 2024

I see, sounds good to me then :)

@roeap
Contributor

roeap commented Jan 26, 2024

I would support this feature as well, as I have seen some cases where people make heavy use of custom metadata who right now would be blocked from adopting object store.

@alamb
Contributor

alamb commented Jan 26, 2024

It seems reasonable to me given what I understand of it -- maybe it would be worth a quick API PR sketch to see how major a change / breaking change it would be 🤔

@wjones127
Member

I think this seems reasonable. I'm particularly interested in the encryption part at the moment.

it at least avoids implementation code needing to know the particulars of how Azure vs AWS encode encryption information in request/response headers

This is appealing. I'm sure downstream libraries, including ours, would appreciate being able to add support for object store encryption without having to do this piecemeal for each cloud.

IIUC, this API would require passing in encryption information for each request. In Lance, we might have this configured at the client level, but it doesn't seem unreasonable that we could do that in our ObjectStore wrapper.

@tustvold
Contributor Author

IIUC, this API would require passing in encryption information for each request. In Lance, we might have this configured at the client level, but it doesn't seem unreasonable that we could do that in our ObjectStore wrapper.

We could definitely also support encryption options on the stores as well, or instead, I don't feel strongly
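One way the wrapper approach mentioned above might look, as a sketch only: a wrapper holds a client-level key identifier and adds it to each put unless the request already carries one. The `PutWithAttributes` trait, `RecordingStore`, and `EncryptingStore` are all hypothetical stand-ins and do not correspond to the real `ObjectStore` trait, which is async and has a much larger surface.

```rust
use std::collections::HashMap;

/// Attribute keys, as in the proposal (subset shown).
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Attribute {
    ContentType,
    EncryptionKey,
    Metadata(String),
}

/// Simplified stand-in for the proposed `Attributes` collection.
pub type Attributes = HashMap<Attribute, String>;

/// Hypothetical minimal store interface: a put that carries attributes.
pub trait PutWithAttributes {
    fn put(&mut self, path: &str, attributes: Attributes);
}

/// In-memory stand-in used to observe what the wrapper forwards.
#[derive(Default)]
pub struct RecordingStore {
    pub puts: Vec<(String, Attributes)>,
}

impl PutWithAttributes for RecordingStore {
    fn put(&mut self, path: &str, attributes: Attributes) {
        self.puts.push((path.to_string(), attributes));
    }
}

/// Wrapper holding a client-level key that it applies to every put,
/// unless the caller already set one for that request.
pub struct EncryptingStore<S> {
    pub inner: S,
    pub key_id: String,
}

impl<S: PutWithAttributes> PutWithAttributes for EncryptingStore<S> {
    fn put(&mut self, path: &str, mut attributes: Attributes) {
        attributes
            .entry(Attribute::EncryptionKey)
            .or_insert_with(|| self.key_id.clone());
        self.inner.put(path, attributes);
    }
}
```

Letting a per-request value win over the wrapper's default is a design choice of this sketch; a stricter wrapper could reject conflicting keys instead.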

@thinkharderdev
Contributor

Sounds good in general

We can't add Attributes to ObjectMeta as the listing APIs do not return this information.

Would we add another method like get_object_attributes(&self,object_meta: &ObjectMeta) -> Result<Attributes>?

Also, certain attributes may be returned by head (thinking specifically of storage class in AWS, which is something we've had to deal with recently). Would those be added to ObjectMeta?

Whilst the Attribute mechanism doesn't eliminate this issue, as not all implementations will support all Attribute

Is the idea that Attributes will encode a superset of metadata across all first-party store implementations?

It's not the most elegant thing in the world, but an explicit API such as supports_attribute(&self, attribute: Attribute) -> bool could be useful for writing store-implementation-independent code, since you can write branching logic on the application side based on which attributes are supported.
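A rough sketch of how such a capability query could drive application-side branching. The trait name `AttributeSupport`, the `PlainStore` type, and the `supported_only` helper are hypothetical; the method also takes `&Attribute` rather than an owned value, which is an assumption made here to avoid cloning `Metadata` keys.

```rust
/// Attribute keys, as in the proposal (subset shown).
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Attribute {
    ContentType,
    EncryptionKey,
    Metadata(String),
}

/// Hypothetical capability query, modelled on the suggestion above.
pub trait AttributeSupport {
    fn supports_attribute(&self, attribute: &Attribute) -> bool;
}

/// Stand-in for a store that cannot persist encryption settings.
pub struct PlainStore;

impl AttributeSupport for PlainStore {
    fn supports_attribute(&self, attribute: &Attribute) -> bool {
        !matches!(attribute, Attribute::EncryptionKey)
    }
}

/// Application-side branching: keep only the attributes the store can handle.
pub fn supported_only<S: AttributeSupport>(
    store: &S,
    wanted: Vec<(Attribute, String)>,
) -> Vec<(Attribute, String)> {
    wanted
        .into_iter()
        .filter(|(key, _)| store.supports_attribute(key))
        .collect()
}
```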

@tustvold
Contributor Author

tustvold commented Jan 27, 2024

Would we add another method like

There wouldn't be a need, get_opts with GetOptions::head set to true will return a GetResult with the Attributes

Is the idea that Attributes will encode a superset of metadata across all first-party store implementations?

They're actually remarkably consistent w.r.t this, in part as many are standard HTTP concepts, so it would just be this shared set.

@flokli

flokli commented Feb 21, 2024

@tustvold What's the status of this? Is there some draft implementation to play with? Or should someone give writing it a try?

@tustvold
Contributor Author

I've not had time to work on this, and am unlikely to have time in the next few weeks. Happy for someone else to pick this up
