GCS High Performance Parallel Listing #567

Open
hanseaston opened this issue Jul 20, 2023 · 0 comments
hanseaston commented Jul 20, 2023

Hello Martin,

I am currently a SWE intern at Google, working on the GCS team.

We are thinking of optimizing the listing operation in GCSFS, and want to get your initial approval on this.

In particular, we are thinking of using multiple processes (via Python's `concurrent.futures` module) and an optional GCS Insights Service to speed up the listing operation. Currently, listing 1 million objects takes about 300 seconds. In our internal experiments, the optimized listing was about 10x faster on the same setup.
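
To make the multi-process idea concrete, here is a minimal sketch of one way it could look: split the namespace into prefixes and list each prefix in its own worker. This is not the GCSFS API or our internal implementation. `FAKE_BUCKET`, `list_prefix`, and `parallel_list` are hypothetical names, and the in-memory list stands in for real paginated list requests.

```python
import concurrent.futures as cf
from typing import Callable, Iterable, List

# Hypothetical stand-in for a bucket: a flat, sorted list of object names.
FAKE_BUCKET: List[str] = [
    f"data/{shard}/{i:04d}.bin" for shard in "abcd" for i in range(5)
]


def list_prefix(prefix: str) -> List[str]:
    """Hypothetical lister: return every object name under one prefix.

    A real implementation would issue paginated list requests here.
    """
    return [name for name in FAKE_BUCKET if name.startswith(prefix)]


def parallel_list(
    prefixes: Iterable[str],
    lister: Callable[[str], List[str]] = list_prefix,
    executor_cls=cf.ProcessPoolExecutor,
    max_workers: int = 4,
) -> List[str]:
    """List several prefixes concurrently and concatenate the results.

    Assumes the prefixes do not overlap; otherwise names would repeat.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the output is
        # grouped by prefix in the order the prefixes were given.
        chunks = pool.map(lister, prefixes)
        return [name for chunk in chunks for name in chunk]
```

With a process pool, `lister` has to be a picklable, module-level function, and the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn workers. Passing `executor_cls=cf.ThreadPoolExecutor` runs the same sketch with threads instead.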

The Insights Service will be an optional configuration that the client can pass into the GCSFS listing call, so everything should be backwards compatible.

Let me know your initial thoughts. I will be working on the codebase and should hopefully have a PR out in the next few weeks.

My email is open: [email protected], if you ever have anything you wish to discuss offline.
