-
Notifications
You must be signed in to change notification settings - Fork 84
feat: use only loop executor for fsspec source
#999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ab20f43 to
e397896
Compare
fsspec optional backends such as s3fs
…ec-optional-backends * origin/source-futures-submit: Future init do not use named arguments for the path/url annotation update submit interface
fsspec optional backends such as s3fsfsspec source
* origin/main: fix: url and object splitting for local files (#1007)
|
Trace for the error: |
jpivarski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good simplification! I see that you need work-arounds for fsspec backends that don't have async and Python 3.8, which doesn't have to_thread, and that's okay.
Although sshfs has been added to the test dependencies, I think the only ssh test is disabled. I wonder if running sshd on the test-runner and connecting to
ssh `whoami`@localhostwould be an option? It's not a big deal.
Isolating the glitchiness of the test to S3 is good—we can provide the functionality without testing it because it's one of the things fsspec is supposed to do on its own. (We should only be responsible for using the fsspec API correctly in Uproot—there's a "separation of concerns.") It's too bad that it can't be reproduced outside of Uproot, but I know you put a lot of time into trying to do that.
I think this PR is ready to go as-is! Thanks!
Good idea I can try this in a different PR, I can use the cache directory for
I'm 90% sure it's |
This PR removes the support for different types of executors for the
fsspecsource. Theuse_threadsandnum_workersoptions are also dropped.The loop executor exists only for compatibility with the remaining sources and does not hold any resources. The loop is accessed directly from
fsspec.asynandsubmitis basically justasyncio.run_coroutine_threadsafe.With this PR, whenever the
fsspecfilesystem does not supportasynccalls, the methods are wrapped into a coroutine that runs them in a separate thread so they are not blocking. This may spawn too many short-lived threads but from limited testing I haven't found this to be an issue. The alternative would be to run them as coroutines in thefsspecloop but they wouldn't run concurrently as they are blocking calls. Another alternative would be to use something likehttps://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executorto run the coroutines in an executor. Edit: #999 (comment)This PR took a long time to finish mainly due to an intermittent error in the CI. After much testing I am still not sure what the cause of this is, but I think it has to do with
s3fsspecifically. I could not reproduce this error outside of pytest running on the uproot repo, I even created another repository to test this but could not reproduce it using the same pytest code and uproot version. Since it's proven so hard to trigger and I don't think this PR causes this error (since it can also be reproduced on the main branch of uproot), I think we can merge this PR and I will try to fix / understand the error in another PR where I enable thes3fstests.The code I used to reproduce this:
Notice in the start where I have defined multiple pairs of
fs, paththen I get some bytes from the file. Only when I run this line with thes3backend it produces the error. Using a sync-only backend such asgithubor another async one such ashttpsdoes not cause the error to trigger, so this is why I thinks3fsis the culprit. I disabled the test that useds3fsand the error also disappeared.