find Method Implementation for HNS Buckets #735
Hybrid Approach for HNS Buckets: For HNS-enabled buckets, the find method uses a hybrid strategy. It concurrently fetches all files with a standard recursive object listing and all folder resources with the Storage Control API. The second call is required to discover empty folders: a folder that contains no objects never appears in an object listing, so without it such folders would be missing from the output when withdirs=True.
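The merge step can be sketched roughly as follows. This is a minimal illustration, not the PR's code: list_files_recursive and list_folders are hypothetical stand-ins that return canned data in place of the real GCS object listing and Storage Control folder listing, and hybrid_find is a hypothetical name. It shows the two listings running concurrently and folders (including an empty one) being added only when withdirs is set.

```python
import asyncio


# Hypothetical stand-in for the standard recursive object listing.
# It returns only objects (files); folders are never reported here.
async def list_files_recursive(path):
    return [
        {"name": f"{path}/a/file1.txt", "type": "file", "size": 10},
        {"name": f"{path}/b/file2.txt", "type": "file", "size": 20},
    ]


# Hypothetical stand-in for the Storage Control API folder listing.
# It includes "empty", a folder with no objects that the object
# listing above cannot reveal.
async def list_folders(path):
    return [
        {"name": f"{path}/a", "type": "directory", "size": 0},
        {"name": f"{path}/b", "type": "directory", "size": 0},
        {"name": f"{path}/empty", "type": "directory", "size": 0},
    ]


async def hybrid_find(path, withdirs=False):
    # Issue both listings at once instead of one after the other.
    files, folders = await asyncio.gather(
        list_files_recursive(path), list_folders(path)
    )
    entries = {f["name"]: f for f in files}
    if withdirs:
        # Add folder entries without overwriting any file entry of the same name.
        for d in folders:
            entries.setdefault(d["name"], d)
    # Return entries sorted by path.
    return dict(sorted(entries.items()))


if __name__ == "__main__":
    print(list(asyncio.run(hybrid_find("bucket", withdirs=True))))
```

Keying the merged result by path makes the union of the two listings idempotent, so a folder reported by both sources would still appear once.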
Fallback and Error Handling: For non-HNS (flat) buckets, the method falls back to the parent class's find implementation. Likewise, if the bucket's HNS status cannot be determined (for example, because of missing permissions), it logs a warning and uses the standard non-HNS behavior, so find keeps working.
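The dispatch logic can be sketched as below. This is a simplified illustration under stated assumptions: FlatFileSystem, HNSAwareFileSystem, _is_hns_bucket and _hns_find are hypothetical names, and the HNS status is injected through the constructor so the fallback paths can be exercised without a real bucket.

```python
import logging

logger = logging.getLogger(__name__)


class FlatFileSystem:
    """Hypothetical parent class providing the standard, non-HNS find."""

    def find(self, path, withdirs=False):
        return ["flat-result"]


class HNSAwareFileSystem(FlatFileSystem):
    def __init__(self, hns_status):
        # hns_status is True, False, or an exception instance that
        # simulates a failed HNS lookup (e.g. a permissions error).
        self._hns_status = hns_status

    def _is_hns_bucket(self, bucket):
        # Hypothetical helper; a real implementation would query bucket metadata.
        if isinstance(self._hns_status, Exception):
            raise self._hns_status
        return self._hns_status

    def _hns_find(self, path, withdirs):
        # Placeholder for the hybrid file + folder listing.
        return ["hns-result"]

    def find(self, path, withdirs=False):
        bucket = path.split("/", 1)[0]
        try:
            is_hns = self._is_hns_bucket(bucket)
        except Exception as exc:
            # Could not determine HNS status: warn and treat the bucket as flat.
            logger.warning(
                "Could not determine HNS status of %s (%s); using non-HNS find",
                bucket,
                exc,
            )
            is_hns = False
        if not is_hns:
            return super().find(path, withdirs=withdirs)
        return self._hns_find(path, withdirs)
```

For example, HNSAwareFileSystem(PermissionError("denied")).find("bucket/dir") logs a warning and returns the parent's result instead of raising.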
Caching Strategy: So that subsequent ls() calls can be served from memory, the directory cache (dircache) is populated with the complete set of files and folders discovered during the find operation. This happens regardless of the withdirs parameter, so the cache is always complete. Caching is skipped when a prefix is used, however, because a prefix-filtered result is only a partial listing and would make later ls() calls return incomplete directory contents.
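A rough sketch of how such a cache fill could work, assuming (as in fsspec-style filesystems) that dircache maps a directory path to a list of entry dicts. populate_dircache is a hypothetical helper, not the PR's function. Entries are grouped by parent directory, folders with no children still get an empty listing, and nothing is cached when a prefix is given.

```python
import posixpath
from collections import defaultdict


def populate_dircache(dircache, root, entries, prefix=None):
    """Hypothetical helper: fill dircache from a complete find() listing.

    entries is an iterable of dicts with "name" and "type" keys covering
    both files and folders, collected regardless of withdirs.
    """
    if prefix:
        # A prefix-filtered listing is partial; caching it would make later
        # ls() calls return incomplete directory contents.
        return
    # Start with the root so it gets an entry even if it is empty.
    listing = defaultdict(list, {root: []})
    for entry in entries:
        # File each entry under its parent directory.
        listing[posixpath.dirname(entry["name"])].append(entry)
        if entry["type"] == "directory":
            # An empty folder still needs an (empty) cached listing;
            # setdefault keeps any children already recorded.
            listing.setdefault(entry["name"], [])
    dircache.update(listing)
```

Caching empty folders explicitly is what lets a later ls() on such a folder return an empty result from the cache instead of going back to the API.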
Testing: The new functionality is validated through a new integration test suite (gcsfs/tests/integration/test_extended_hns.py) that runs against a real GCS HNS bucket. These tests are also integrated into the Cloud Build CI pipeline.