Send task's functions separately with send_object() #42

Open
rafa-be opened this issue Nov 8, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

Collaborator

rafa-be commented Nov 8, 2024

Client.send_object() allows constant arguments to be sent only once when Client.submit() is called multiple times.

It would be nice to also allow functions to be serialized and sent once using the same mechanism:

func_ref = client.send_object(my_function)

fut_1 = client.submit(func_ref, arg_1)
fut_2 = client.submit(func_ref, arg_2)
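To make the proposal concrete, here is a minimal in-process sketch of how a reference returned by send_object() could be accepted by submit() in place of a function. `ToyClient` and `ObjectRef` are hypothetical names, not Scaler's actual classes; the stdlib `pickle` stands in for Cloudpickle, and tasks run locally instead of on a worker. The `serializations` counter shows that the function is pickled once, however many tasks use the reference.

```python
import math
import pickle
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical handle returned by send_object(); only the ID travels with tasks."""
    object_id: int


class ToyClient:
    """In-process stand-in for a client (NOT Scaler's real Client)."""

    def __init__(self) -> None:
        self._store: Dict[int, bytes] = {}  # plays the role of the remote object store
        self._next_id = 0
        self.serializations = 0             # how many times an object was pickled

    def _serialize(self, obj: Any) -> bytes:
        self.serializations += 1
        return pickle.dumps(obj)

    def send_object(self, obj: Any) -> ObjectRef:
        # Serialize and "upload" once; later submits only carry the ref.
        ref = ObjectRef(self._next_id)
        self._next_id += 1
        self._store[ref.object_id] = self._serialize(obj)
        return ref

    def submit(self, fn: Union[Callable[..., Any], ObjectRef], *args: Any) -> Any:
        if isinstance(fn, ObjectRef):
            payload = self._store[fn.object_id]  # already serialized: no pickling here
        else:
            payload = self._serialize(fn)        # serialized on every call
        # A real client would ship `payload` and `args` to a worker and return a
        # future; here the task runs immediately to keep the sketch self-contained.
        return pickle.loads(payload)(*args)


client = ToyClient()
sqrt_ref = client.send_object(math.sqrt)
results = [client.submit(sqrt_ref, x) for x in (16, 25)]
```

Passing `math.sqrt` directly to `submit()` instead would increment `serializations` on every call.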
rafa-be added the enhancement (New feature or request) label Nov 8, 2024
Collaborator Author

rafa-be commented Feb 27, 2025

I had some time to experiment with this. Sadly, the gain is much smaller than I expected (0.05 to 0.1 ms per call to submit()).

That's because the client already caches the function, thanks to how the function ID is generated (from an MD5 hash).

Thus the only benefit is that the function is serialized only once, and Cloudpickle's function serialization is already quite fast.
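The caching behaviour described above can be sketched as follows, assuming the function ID is the MD5 digest of the pickled function (what Scaler actually feeds into the hash is an assumption here). `HashCachingClient` is a hypothetical stand-in that only records what would go over the wire: the function body is sent once, but `pickle.dumps()` still runs on every `submit()`, which is the only cost a `send_object()` reference would remove.

```python
import hashlib
import math
import pickle
from typing import Any, Callable, List, Set, Tuple


class HashCachingClient:
    """Hypothetical client whose function IDs are MD5 digests of the pickled function."""

    def __init__(self) -> None:
        self._sent_ids: Set[bytes] = set()           # IDs the scheduler already holds
        self.messages: List[Tuple[str, bytes]] = []  # log of what would be sent

    def submit(self, fn: Callable[..., Any], *args: Any) -> bytes:
        payload = pickle.dumps(fn)                   # serialization cost paid every call
        function_id = hashlib.md5(payload).digest()
        if function_id not in self._sent_ids:
            # First time this function is seen: ship its body once.
            self._sent_ids.add(function_id)
            self.messages.append(("function", payload))
        # Every task references the function by its ID only.
        self.messages.append(("task", function_id + pickle.dumps(args)))
        return function_id


client = HashCachingClient()
id_a = client.submit(math.sqrt, 16)
id_b = client.submit(math.sqrt, 25)
```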

>>> import math
>>> from scaler import Client
>>> c = Client("tcp://127.0.0.1:1234")
>>> sqrt_ref = c.send_object(math.sqrt)

>>> %timeit fut = c.submit(math.sqrt, 16)
74.3 μs ± 325 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

>>> %timeit fut = c.submit(sqrt_ref, 16)
39.4 μs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

>>> %timeit c.submit(math.sqrt, 16).result()
1.98 ms ± 39.2 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

>>> %timeit c.submit(sqrt_ref, 16).result()
1.87 ms ± 41.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
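A quick check of the savings implied by the timings above (mean values copied from the %timeit output, converted to microseconds):

```python
# Mean timings from the %timeit runs above, in microseconds.
submit_only = {"function": 74.3, "ref": 39.4}
round_trip = {"function": 1980.0, "ref": 1870.0}


def saving(timings):
    """Absolute saving (us) and saving as a fraction of the plain-function time."""
    delta = timings["function"] - timings["ref"]
    return delta, delta / timings["function"]


submit_delta, submit_frac = saving(submit_only)
rt_delta, rt_frac = saving(round_trip)
print(f"submit(): {submit_delta:.1f} us saved ({submit_frac:.0%})")
print(f"submit().result(): {rt_delta:.0f} us saved ({rt_frac:.1%})")
```

This gives roughly 35 us (about 47%) saved on the submission itself, but only about 0.11 ms (about 5.6%) of a full round trip including result().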

I wouldn't say the gain is negligible, but I'm not sure it's worth the (limited) extra complexity it adds to the client's code.

See rafa-be/scaler@df430e6 for the implementation.
