
Use reconstructed ListBlobs marker to provide list offset support in MicrosoftAzure store #6174

Draft
wants to merge 4 commits into
base: master
from


andrebsguedes
Contributor

@andrebsguedes andrebsguedes commented Aug 1, 2024

Which issue does this PR close?

Closes #6173.

Rationale for this change

The opaque token provided as a marker to ListBlobs is a trivial encoding of the key to start listing from (see the marker_for_offset code comment). In this PR we propose an experimental feature flag that implements offset behavior for listing by relying on this fact.
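To make the rationale concrete, here is a minimal, self-contained Rust sketch of why starting the listing server-side matters. It does not talk to Azure: `FakeListService`, `Page`, `list_page`, `drain` and the two `list_with_offset_*` functions are hypothetical names, and the marker is modelled as the plain key to start from (inclusive), an assumption standing in for the opaque token. The filtering variant mirrors a generic client-side fallback (list from the beginning, discard keys `<= offset`); the marker variant starts the very first request at the offset.

```rust
use std::cell::Cell;
use std::collections::BTreeSet;
use std::ops::Bound::{self, Included, Unbounded};

/// Hypothetical in-memory stand-in for a paginated listing service (not Azure).
/// Keys come back in lexicographic order, `page_size` per request.
struct FakeListService {
    keys: BTreeSet<String>,
    page_size: usize,
    /// Number of page requests served, used to compare the two strategies.
    calls: Cell<usize>,
}

/// One page of results plus the marker for the next request, if any.
struct Page {
    keys: Vec<String>,
    next_marker: Option<String>,
}

impl FakeListService {
    /// Serves one page starting at `marker` (inclusive).
    /// Assumption: the marker is just the key to start from, not an opaque token.
    fn list_page(&self, marker: Option<&str>) -> Page {
        self.calls.set(self.calls.get() + 1);
        let start: Bound<&str> = match marker {
            Some(m) => Included(m),
            None => Unbounded,
        };
        let end: Bound<&str> = Unbounded;
        let mut iter = self.keys.range::<str, _>((start, end));
        let keys: Vec<String> = iter.by_ref().take(self.page_size).cloned().collect();
        // The first key not returned becomes the marker for the next page.
        let next_marker = iter.next().cloned();
        Page { keys, next_marker }
    }
}

/// Follows markers from `start` to the end, keeping only keys strictly
/// greater than `offset` (list-with-offset excludes the offset itself).
fn drain(svc: &FakeListService, start: Option<String>, offset: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut marker = start;
    loop {
        let page = svc.list_page(marker.as_deref());
        out.extend(page.keys.into_iter().filter(|k| k.as_str() > offset));
        match page.next_marker {
            Some(m) => marker = Some(m),
            None => return out,
        }
    }
}

/// Client-side fallback: every page before the offset is still fetched.
fn list_with_offset_filtering(svc: &FakeListService, offset: &str) -> Vec<String> {
    drain(svc, None, offset)
}

/// Marker-based approach: the first request already starts at the offset.
fn list_with_offset_marker(svc: &FakeListService, offset: &str) -> Vec<String> {
    drain(svc, Some(offset.to_string()), offset)
}

fn main() {
    let keys: BTreeSet<String> = (0..100).map(|i| format!("data/part-{i:03}")).collect();
    let svc = FakeListService { keys, page_size: 10, calls: Cell::new(0) };
    let offset = "data/part-089";

    let filtered = list_with_offset_filtering(&svc, offset);
    let filtering_calls = svc.calls.replace(0);
    let marked = list_with_offset_marker(&svc, offset);
    let marker_calls = svc.calls.get();

    assert_eq!(filtered, marked);
    println!(
        "{} keys; filtering: {filtering_calls} requests, marker: {marker_calls} requests",
        filtered.len()
    );
}
```

With 100 keys, pages of 10 and an offset near the end, both variants return the same 10 keys, but the filtering variant issues 10 page requests versus 2 for the marker variant. The gap grows with the number of keys before the offset, which is the cost a reconstructed marker avoids.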

@github-actions github-actions bot added the object-store Object Store Interface label Aug 1, 2024
@tustvold
Contributor

tustvold commented Aug 1, 2024

Are there any official SDKs that implement this? That would give some measure of confidence that this is both correct and likely to remain well supported. I also wonder which of Azure's three blob storage "flavors" support this; it would be very out of character for them to actually be consistent...

FYI @alexwilcoxson-rel, who filed the related #5653 but was unable to get it to work.

@andrebsguedes
Contributor Author

@tustvold Well, to be fair, I was not even aware that Hadoop was also relying on this (thanks for the pointer), so no, I don't know of any official SDKs that implement this hack haha, and you are probably right that it breaks under hierarchical namespaces. That is why my goal was to introduce it under an experimental/unstable flag: it shifts the responsibility for relying on this to the user (in this case, me) while still making the functionality available.

This is something I would keep as a patch on my side but thought that maybe others could potentially benefit from it too if they know what they are doing : )

I can totally understand if this is deemed too hacky (even if under a flag).

@alexwilcoxson-rel
Contributor

Yeah, before we went down the road of implementing this in object_store, we tried doing what the Hadoop code does, via Python, against the different blob APIs (blob.core.windows.net and dfs.core.windows.net). Also, the Hadoop code does differentiate between hierarchical and non-hierarchical namespaces.

Furthermore we have been in talks with Microsoft and the Azure Storage team about this. Will follow up once they get back to us about whether this will be documented and supported rather than requiring this type of workaround.

@Xuanwo
Member

Xuanwo commented Aug 2, 2024

> Furthermore we have been in talks with Microsoft and the Azure Storage team about this. Will follow up once they get back to us about whether this will be documented and supported rather than requiring this type of workaround.

That will be great for the entire community. Thanks in advance.

@anihitk07

> Yeah before we went down the road of impl in object_store we tried doing what hadoop code is doing via python and the different blob apis (blob.core.windows.net and dfs.core.windows.net). Also the hadoop code does differentiate between hierarchical namespace and not.
>
> Furthermore we have been in talks with Microsoft and the Azure Storage team about this. Will follow up once they get back to us about whether this will be documented and supported rather than requiring this type of workaround.

Hey, one of my customers believes this could potentially unblock a few challenges on their side. May I ask whom you are talking to within MS, on both the SDK and the Storage engineering side?

@tustvold
Contributor

tustvold commented Aug 8, 2024

I'm going to mark this as a draft whilst we wait to hear back from MS about first-party support for this.

@alexwilcoxson-rel
Copy link
Contributor

Last I heard, they are still working on making a private preview with this feature available.

@alexwilcoxson-rel
Copy link
Contributor

Latest update: I have tested listing with a startFrom parameter against a private preview storage account Microsoft provided us. However, it won't be generally available until sometime next year.

In the meantime, they once again directed us towards what Hadoop is doing.

This workaround only works against the dfs.core.windows.net and associated endpoints.

I tested it against a patched version of OpenDAL's azdls (which has an implementation of the ObjectStore trait) with success: apache/opendal#5242

@Xuanwo
Copy link
Member

Xuanwo commented Oct 25, 2024

> Latest update. I have tested list with a startFrom parameter against a private preview storage account Microsoft provided us. However it won't be generally available until sometime next year.

That's really nice. Happy to know.

> I tested it against a patched version of OpenDAL's azdls (which has an impl of ObjectStore trait) with success: apache/opendal#5242

Thank you for this.

Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use reconstructed ListBlobs marker to provide list offset support in MicrosoftAzure store
5 participants