Skip to content

Conversation

@Mahalaxmibejugam
Copy link
Contributor

Hybrid Approach for HNS Buckets: The find method uses a hybrid strategy for HNS-enabled buckets. It concurrently fetches all files using a standard recursive API call and all folder objects via the Storage Control API. This approach is essential to discover and include empty folders in the output when withdirs=True.

Fallback and Error Handling: For non-HNS (flat) buckets, the method gracefully falls back to the parent find implementation. Similarly, if there's an error in determining the bucket's HNS status (e.g., due to permissions), it logs a warning and defaults to the standard, non-HNS behavior to ensure functionality is maintained.

Caching Strategy: To make subsequent ls() calls highly efficient, the directory cache (dircache) is populated with a complete list of both files and folders discovered during the find operation. This caching occurs irrespective of the withdirs parameter, ensuring the cache is always comprehensive. However, this caching is skipped when a prefix is used to prevent storing partial and potentially misleading directory listings.

Testing: The new functionality is validated through a new integration test suite (gcsfs/tests/integration/test_extended_hns.py) that runs against a real GCS HNS bucket. These tests are also integrated into the Cloud Build CI pipeline.

@ankitaluthra1
Copy link
Collaborator

/gcbrun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants