
Add ability to load multiple copies of a model across processes #31052

Merged

Conversation

damccorm (Contributor) commented Apr 19, 2024

Sometimes loading one copy of a model per process isn't feasible, but you still want more than one copy loaded per worker (e.g. a GPU that can hold 3 copies of the model). This change gives users the ability to express that.

Design doc - https://docs.google.com/document/d/1FmKrBHkb8YTYz_Dcec7JlTqXwy382ar8Gxicr_s13c0/edit
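For context, here is a minimal usage sketch of the new option. This is illustrative only: the handler choice (PytorchModelHandlerTensor), the model class, and the model path are placeholders, and it assumes the concrete handler forwards the new model_copies keyword described in the diff below.

    import apache_beam as beam
    import torch
    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

    # Placeholder model and path; any framework handler that accepts
    # model_copies would be configured the same way.
    model_handler = PytorchModelHandlerTensor(
        state_dict_path='gs://my-bucket/linear_model.pt',
        model_class=torch.nn.Linear,
        model_params={'in_features': 10, 'out_features': 1},
        # Load exactly 3 copies of the model per worker, shared across its
        # processes (e.g. a GPU with room for 3 copies).
        model_copies=3)

    with beam.Pipeline() as pipeline:
      _ = (
          pipeline
          | beam.Create([torch.rand(10) for _ in range(100)])
          | RunInference(model_handler)
          | beam.Map(print))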


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch):

  • Build python source distribution and wheels
  • Python tests
  • Java tests
  • Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@damccorm damccorm marked this pull request as ready for review April 22, 2024 19:42
damccorm (Contributor, Author) commented Apr 22, 2024

R: @tvalentyn @liferoad

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Eight review threads on sdks/python/apache_beam/ml/inference/base.py were resolved during review (six marked outdated).
@@ -234,6 +235,9 @@ def __init__(
memory pressure if you load multiple copies. Given a model that
consumes N memory and a machine with W cores and M memory, you should
set this to True if N*W > M.
model_copies: The exact number of models that you would like loaded
onto your machine. This can be useful if you exactly know your CPU or
Contributor commented:

Possible wording suggestion:

This can be useful if you exactly know your CPU or GPU capacity and want to maximize resource utilization.

If set, large_model becomes a no-op.

Maybe this should be a ValueError if a user specifies both?

damccorm (Contributor, Author) replied:

> Possible wording suggestion:

Updated

> Maybe this should be a ValueError if a user specifies both?

I don't think it should be a ValueError. If you set large_model to True and also set this param, that's still a reasonable configuration, and a no-op makes sense IMO since we're still honoring your choice.

tvalentyn (Contributor) commented Apr 23, 2024:

Ah, my concern was not an incorrect configuration but the cognitive burden for users: they would be wondering whether to set only one param or both for their use case, when in the end it doesn't matter. But now it also seems that large_model becomes redundant, since it is equivalent to passing model_copies=1, right?

Possibly, except that using model_copies is currently disallowed with KeyedMH, while large_model might still allow that.

damccorm (Contributor, Author) replied:

> but now it also seems that large_model becomes redundant as it is equivalent to passing model_copies = 1, right?

That's right, though long term I'd like us to do smarter things here (e.g. large_model could become "pack as many copies of the model as will fit"). There's some discussion of this general idea in the design doc.
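A quick numeric illustration of the N*W > M rule of thumb quoted in the diff above, and of the large_model / model_copies relationship discussed here; all numbers are made up:

    # Illustrative numbers only.
    model_memory_gb = 20   # N: memory needed by one copy of the model
    worker_cores = 8       # W: cores (hence default model copies) per worker
    worker_memory_gb = 64  # M: memory available on the worker

    # 20 * 8 = 160 GB > 64 GB, so loading one copy per core will not fit.
    # In that case set large_model=True (roughly equivalent to model_copies=1),
    # or pick an explicit count that fits, e.g. model_copies=3 (3 * 20 = 60 GB <= 64 GB).
    print(model_memory_gb * worker_cores > worker_memory_gb)  # True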

tvalentyn (Contributor) left a comment:

Left one more comment on the design doc.


@@ -952,6 +977,12 @@ def get_preprocess_fns(self) -> Iterable[Callable[[Any], Any]]:
def should_skip_batching(self) -> bool:
return True

def share_model_across_processes(self) -> bool:
Contributor commented:

(cleanup, can be deferred)

We can leverage reflection here and delegate calls to the base handler via __getattr__, as in

https://github.com/apache/beam/blob/37609ba70fab2216edc338121bf2f3a056a1e490/sdks/python/apache_beam/internal/gcp/auth.py

Per https://stackoverflow.com/questions/2405590/how-do-i-override-getattr-without-breaking-the-default-behavior, explicitly defined methods should take priority.

damccorm (Contributor, Author) replied:

This is a good idea and I think we should do it, but I agree deferring is right to keep this PR to a single purpose.
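For reference, a minimal sketch of the delegation pattern suggested above; the class and method names are hypothetical illustrations, not the actual base.py code:

    class _DelegatingHandlerSketch:
      """Hypothetical wrapper that forwards unknown lookups to a wrapped handler."""

      def __init__(self, unkeyed):
        self._unkeyed = unkeyed

      # Explicitly defined methods still win, because Python only calls
      # __getattr__ when normal attribute lookup fails.
      def should_skip_batching(self):
        return True

      def __getattr__(self, attr):
        # Forward anything not defined on the wrapper (e.g.
        # share_model_across_processes, model_copies) to the wrapped handler.
        return getattr(self._unkeyed, attr)

Because __getattr__ only fires when normal lookup fails, methods the wrapper defines explicitly keep taking priority, matching the Stack Overflow note above.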

One more review thread on sdks/python/apache_beam/ml/inference/base.py was resolved (marked outdated).
@damccorm damccorm merged commit 3c8a881 into apache:master Apr 25, 2024
73 checks passed
@damccorm damccorm deleted the users/damccorm/runInferenceMultiProcess branch April 25, 2024 15:32
damccorm added a commit that referenced this pull request Apr 25, 2024
* Add ability to load multiple copies of a model across processes

* push changes I had locally not remotely

* Lint

* naming + lint

* Changes from feedback
damccorm added a commit that referenced this pull request Apr 26, 2024
…) (#31104)

* Add ability to load multiple copies of a model across processes

* push changes I had locally not remotely

* Lint

* naming + lint

* Changes from feedback