Compatibility with Python multiprocessing #72
This library is thread-safe but isn't safe to fork or access from multiple different processes, so your approach of creating a separate instance per forked process is correct. What configuration were you using in 7.17 that was working but now isn't in 8.0+?
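A minimal sketch of the per-process approach described above; the host URL, index name, and query are illustrative assumptions, not details from this thread:

```python
import multiprocessing
from elasticsearch import Elasticsearch

def search_worker(query):
    # Create the client inside the worker so each process owns its own
    # sockets; a client created before fork() is shared across processes.
    client = Elasticsearch("http://localhost:9200")
    try:
        return client.search(index="my-index", query=query)["hits"]["total"]
    finally:
        client.close()

if __name__ == "__main__":
    with multiprocessing.Pool(processes=3) as pool:
        totals = pool.map(search_worker, [{"match_all": {}}] * 3)
    print(totals)
```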
@sethmlarson The earlier approach was to have a single global `Elasticsearch()` instance. Would it be possible to add information regarding thread/process safety to the manual?
@redbaron4 I understand. Could you copy and paste the code you were using in 7.17 so I can see how the client was configured and try to reproduce the problem?
Sorry, I misunderstood your earlier comment. Here's how I configure the client: the script builds a single shared `Elasticsearch()` instance inside a getter function, and any function that needs to use the client calls that getter to obtain the shared instance. This is the function that was getting the garbled response when more than one worker was used.
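A minimal sketch of the pattern being described, assuming a module-level client behind a getter function; the host URL, timeout, and names are placeholder assumptions:

```python
from elasticsearch import Elasticsearch

# Hypothetical reconstruction of the pattern described above: a single
# module-level client shared by all callers. The host URL, timeout, and
# function name are placeholders, not the reporter's actual code.
_ES_CLIENT = Elasticsearch("http://localhost:9200", request_timeout=60)

def get_es_client():
    # Every caller receives the same shared instance -- fine across
    # threads, but not across forked processes.
    return _ES_CLIENT
```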
We have a custom Python script that performs some calculations on elements of an index. During the course of the calculations, it is necessary to fetch the list of timestamps matching a search criterion from a backing index. This is done using the `helpers.scan()` paradigm. Since a search can take a long time (we are searching among millions of documents), our idea was to create a `multiprocessing.Pool` and then use `map` to perform the searches in parallel (we use 3 workers).

The scheme worked until Elasticsearch 7.17. After the upgrade to Elasticsearch 8.1.0, we updated the script dependency (`elasticsearch-py` to 8.1.0) and noticed that random searches began failing with an `unable to deserialize` error. Notice that the response seems to have the body of another response tacked onto it, which is probably what is causing the error.

There is no error if we set the number of workers to 1, which makes me suspect that the transport is not playing well with multiple workers spawned using `multiprocessing`. We initialize the Elasticsearch instance once globally and then each "worker" uses that instance to perform the search. Any ideas how we can make this scheme play well with the transport library? A sketch of the scheme follows below.
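A minimal sketch of the failing scheme as described, assuming a globally shared client inherited by forked workers; the index, field names, and queries are placeholders:

```python
import multiprocessing
from elasticsearch import Elasticsearch, helpers

# The pattern described above: one client created in the parent process
# and inherited by every forked worker.
es = Elasticsearch("http://localhost:9200")

def fetch_timestamps(criteria):
    # All workers scan through the same inherited connections, so responses
    # from concurrent searches can interleave on the shared sockets.
    return [
        hit["_source"]["@timestamp"]
        for hit in helpers.scan(es, index="my-index", query={"query": criteria})
    ]

if __name__ == "__main__":
    criteria = [{"term": {"host": f"host-{i}"}} for i in range(9)]
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(fetch_timestamps, criteria)
```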
UPDATE

We modified the script so that each worker creates its own `Elasticsearch()` instance at spawn time (in addition to the one created by the script). The workers only ever use their own instance, and now the script is working correctly.
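A minimal sketch of the working scheme, using a `Pool` initializer to give each worker its own client at spawn time; the URL, index, and names are placeholders:

```python
import multiprocessing
from elasticsearch import Elasticsearch, helpers

worker_es = None  # set per process by the Pool initializer

def init_worker():
    # Runs once in every worker at spawn time: each process gets its own
    # client, so no connections cross process boundaries.
    global worker_es
    worker_es = Elasticsearch("http://localhost:9200")

def fetch_timestamps(criteria):
    # Uses this process's private client instead of an inherited one.
    return [
        hit["_source"]["@timestamp"]
        for hit in helpers.scan(worker_es, index="my-index", query={"query": criteria})
    ]

if __name__ == "__main__":
    criteria = [{"term": {"host": f"host-{i}"}} for i in range(9)]
    with multiprocessing.Pool(processes=3, initializer=init_worker) as pool:
        results = pool.map(fetch_timestamps, criteria)
```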