Replies: 2 comments 3 replies
-
In general, object_store intentionally abstracts away the notion of directories. Not only are they a portability headache, since many stores don't support them, but the vast majority of workloads actively don't want to care about them, instead delegating catalog responsibilities to something better suited to this, e.g. an RDBMS, Delta Lake, Iceberg, etc. I'm therefore not sure how useful this would be. Edit: see also #284
Why would it provide better performance in this case?
  
-
Hi @tustvold,
Because the DFS endpoint exposes a filesystem-like API that supports recursive listing in one call. The Blob endpoint is object-store oriented: it only lists a flat set of objects under a given prefix. To traverse a hierarchy recursively with the Blob API, the client must repeatedly enumerate each "folder level" and stitch the results together. That means many network round-trips and higher latency. The DFS endpoint, by contrast, is built for hierarchical namespaces. Its
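To make the round-trip argument above concrete, here is a minimal toy model in Python. It is only a sketch: `ToyStore`, `list_one_level`, `list_recursive` and `walk_level_by_level` are hypothetical names invented for illustration, nothing here calls Azure or object_store, and pagination is ignored. It simply counts simulated requests for a level-by-level walk versus a single recursive listing.

```python
class ToyStore:
    """In-memory stand-in for a storage account; counts simulated requests."""

    def __init__(self, paths):
        self.paths = sorted(paths)
        self.requests = 0

    def list_one_level(self, prefix):
        """Delimiter-style listing: files directly under prefix, plus sub-prefixes."""
        self.requests += 1
        files, dirs = [], set()
        for p in self.paths:
            if not p.startswith(prefix):
                continue
            rest = p[len(prefix):]
            if "/" in rest:
                # Deeper object: report only its first "folder" component.
                dirs.add(prefix + rest.split("/", 1)[0] + "/")
            else:
                files.append(p)
        return files, sorted(dirs)

    def list_recursive(self, prefix):
        """Single recursive listing, as a filesystem-like API might offer (assumption)."""
        self.requests += 1
        return [p for p in self.paths if p.startswith(prefix)]


def walk_level_by_level(store, prefix):
    """Traverse the hierarchy one folder level per request and stitch results together."""
    found, pending = [], [prefix]
    while pending:
        files, dirs = store.list_one_level(pending.pop())
        found.extend(files)
        pending.extend(dirs)
    return sorted(found)
```

In this model the level-by-level walk issues one request per "folder", so its cost grows with the number of directories, while the recursive listing stays at one request regardless of depth.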
  
-
Hi,
Azure Storage comes in different flavors. A "regular" Azure Storage account is a classic object store and comes with a blob endpoint (blob.core.windows.net, blob.fabric.windows.com).
Azure Storage accounts with hierarchical namespaces enabled (aka Azure Data Lake Gen2, Fabric OneLake) come with a second "DFS" (Distributed Filesystem) endpoint, dfs.core.windows.net (dfs.fabric.windows.com), that has its own REST API: https://learn.microsoft.com/en-us/rest/api/storageservices/data-lake-storage-gen2
This endpoint allows for potentially massive performance improvements in the following scenarios:
These are single atomic operations via DFS. With the Blob endpoint they require many individual blob operations and are therefore slower.
To my understanding, DFS URLs are already accepted, but ultimately only the blob endpoint is used by object_store.
I propose implementing the DFS endpoint in object_store and introducing a feature flag that lets users specify whether the DFS endpoint should be used.
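As a rough illustration of what such an opt-in switch could look like, here is a tiny Python sketch. The function name `azure_endpoint` and the `use_dfs` parameter are hypothetical and not part of object_store; only the two host suffixes come from the description above, and defaulting to the blob endpoint is an assumption.

```python
def azure_endpoint(account: str, use_dfs: bool = False) -> str:
    """Return the base URL for a storage account (hypothetical helper).

    use_dfs is an assumed opt-in flag; the blob endpoint stays the default.
    """
    service = "dfs" if use_dfs else "blob"
    return f"https://{account}.{service}.core.windows.net"
```

Keeping the blob endpoint as the default would leave existing users unaffected, while accounts with hierarchical namespaces could opt in explicitly.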