Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Model merging with delta objects #4177

Merged
merged 14 commits into from
Oct 6, 2022

Conversation

byronxu99
Copy link
Collaborator

@byronxu99 byronxu99 commented Sep 29, 2022

  • Add a new VW::model_delta object that internally keeps a VW::workspace
  • Define operators + and - such that deltas are created by subtracting two workspaces, and can be added to update a workspace
  • Add a merge_delta function that combines multiple deltas into a single delta
  • Refactor model merging to be implemented via delta merging

@byronxu99 byronxu99 marked this pull request as ready for review September 30, 2022 15:05
Copy link
Collaborator

@olgavrou olgavrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like model_delta is just a wrapper around a workspace that is used to communicate that this holds a delta

Wondering if the +/- operators should be operators of the workspace class leading to a more intuitive API for those operators, and it would also mean that we avoid wrapping workspaces into deltas in order to merge them

If we need to indicate that a workspace is a diff we could add a bool field indicating that it is a diff, or maybe model_delta could be derived from workpace

Thoughts?

@byronxu99
Copy link
Collaborator Author

Yes, as it is right now model_delta simply is a wrapper around VW::workspace that makes types explicit and specifies ownership. I think this provides a greater level of safety than the alternative of having an is_delta flag inside of workspace.

The - operator already operates on the workspace class, but returns a model_delta. The + operator I've defined as an "update a workspace with new changes" operation, and thus requires a workspace and a model_delta.

I specifically do not want + to be a "merge" operation, that is, to work on two workspaces or two deltas. The reason is that it won't work for merging GD with save_resume, which must have access to the set of all workspaces being merged at once, instead of just two of them at a time. Otherwise, we could have implemented the merge function in terms of +, -, and a "multiply by scale factor" operation, which I would find more intuitive.

The case of "wrapping workspaces into deltas in order to merge them" isn't as bad as it seems as first. In federated learning, we would be directly receiving deltas to pass to the merge function - having to merge workspaces via deltas would be the less common use case. It also simplifies the merge logic inside of each and every reduction, because now there's no longer a requirement to check for a base workspace and subtract it if it exists.

@byronxu99 byronxu99 merged commit c2d0f6c into VowpalWabbit:master Oct 6, 2022
@byronxu99 byronxu99 deleted the model_delta_merging branch October 6, 2022 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants