[feat] Support remote mass updates #3203

Merged
alberttorosyan merged 26 commits into aimhubio:main on Aug 14, 2024


peter-sk
Contributor

@peter-sk peter-sk commented Aug 3, 2024

For tracked runs with long sequences (think millions of steps), copying, moving, or syncing runs to a remote repository is prohibitively slow (think 20 minutes to several hours between two computers in different organizations, each with 1 Gbit/s to the backbone). The cause is the overhead incurred by the many RPC calls.

This PR adds a mass update function, "update", that sets multiple (key/path, value) tuples in a single RPC call.

The result is that copying a run with millions of steps now takes between a few seconds and a couple of minutes, i.e., a speed-up of approx. 100x when copying with a chunk size of 128.
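The effect of chunking on the number of round trips can be sketched as follows. This is only an illustration, not the code from this PR: `CountingRemoteTree`, `chunked`, `copy_one_by_one`, and `copy_in_chunks` are hypothetical names, and the "remote" tree is an in-memory stand-in that merely counts simulated RPC calls.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Item = Tuple[int, Any]


class CountingRemoteTree:
    """Hypothetical stand-in for a remote tree; counts simulated RPC calls."""

    def __init__(self) -> None:
        self.data: Dict[int, Any] = {}
        self.rpc_calls = 0

    def __setitem__(self, key: int, value: Any) -> None:
        self.rpc_calls += 1  # one simulated round trip per value
        self.data[key] = value

    def update(self, values: List[Item]) -> None:
        self.rpc_calls += 1  # one simulated round trip per chunk
        for key, value in values:
            self.data[key] = value


def chunked(items: Iterable[Item], size: int) -> Iterator[List[Item]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def copy_one_by_one(source: Iterable[Item], target: CountingRemoteTree) -> None:
    for key, value in source:
        target[key] = value


def copy_in_chunks(source: Iterable[Item], target: CountingRemoteTree, chunk_size: int = 128) -> None:
    for chunk in chunked(source, chunk_size):
        target.update(chunk)


steps = [(i, float(i)) for i in range(1000)]
slow, fast = CountingRemoteTree(), CountingRemoteTree()
copy_one_by_one(steps, slow)
copy_in_chunks(steps, fast)
print(slow.rpc_calls, fast.rpc_calls)  # 1000 calls versus 8 calls
```

In this toy model the call count drops by roughly the chunk size; the wall-clock gain on a real network also depends on latency and payload size.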

If this PR is accepted (or similar functionality is integrated some other way), we would be happy to open another PR that integrates it into the "aim runs cp", "aim runs mv", etc. commands and adds an "aim runs sync" command that copies only new data, allowing efficient, close-to-real-time replication of repositories.

@CLAassistant

CLAassistant commented Aug 3, 2024

CLA assistant check
All committers have signed the CLA.

@@ -183,7 +183,7 @@ def index(
).subtree('meta')
meta_run_tree = meta_tree.subtree('chunks').subtree(run_hash)
meta_run_tree.finalize(index=index)
-if meta_run_tree['end_time'] is None:
+if meta_run_tree.get('end_time') is None:
Contributor Author

This is a bug in the current aim code. If there is no value for 'end_time', the code raises an exception instead of returning None.
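The difference between the two access styles can be illustrated with a plain `dict` as a stand-in. This is an assumption for illustration only: Aim's tree view is not a `dict`, and the comment above is the only source here for how it behaves. The `meta` contents are made up.

```python
# Hypothetical run metadata that has no 'end_time' entry yet.
meta = {'start_time': 1722700000.0}

# Subscript access raises when the key is missing.
try:
    meta['end_time']
    raised = False
except KeyError:
    raised = True

# .get() returns None for a missing key, so the `is None` check works.
end_time = meta.get('end_time')
if end_time is None:
    status = 'run has no end_time yet'
else:
    status = 'run finished'
```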

@@ -1,4 +1,4 @@
-from typing import Any, Iterator, Tuple, Union
+from typing import Any, Iterator, List, Tuple, Union
Contributor Author

Add List for type annotation of update function below.

@@ -115,6 +115,20 @@ def items(self, path: Union[AimObjectKey, AimObjectPath] = ()) -> Iterator[Tuple
(key,) = path
yield key, value

def update(
Contributor Author

Update from list of (path/key, object) tuples.

@@ -31,6 +31,9 @@ def values(self) -> Iterator[Any]:
def items(self) -> Iterator[Tuple[int, Any]]:
yield from self.tree.items()

def update(self, values: List[Tuple[int, Any]]):
Contributor Author

Send call down to tree.

@@ -98,6 +98,13 @@ def items_eager(self, path: Union[AimObjectKey, AimObjectPath] = ()) -> List[Tup
def items(self, path: Union[AimObjectKey, AimObjectPath] = ()) -> Iterator[Tuple[AimObjectKey, AimObject]]:
return self.items_eager(path)

def update(
Contributor Author

Mass update via RPC.

@@ -163,6 +170,13 @@ def keys(
def items(self, path: Union[AimObjectKey, AimObjectPath] = ()) -> Iterator[Tuple[AimObjectKey, AimObject]]:
return self.tree.items(self.absolute_path(path))

def update(
Contributor Author

Send call down to tree.
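The "send call down to tree" pattern could look roughly like the sketch below: a view converts each relative key or path to an absolute path and forwards the whole list in one call. The body of `update` is not shown in this thread, so everything here is an assumption; `ToyTree` and `ToyView` are hypothetical classes, and only the `absolute_path` name is borrowed from the `items` context line above.

```python
from typing import Any, Dict, List, Tuple, Union

Key = Union[int, str]
Path = Tuple[Key, ...]


class ToyTree:
    """Hypothetical flat store keyed by absolute path tuples."""

    def __init__(self) -> None:
        self.data: Dict[Path, Any] = {}

    def update(self, values: List[Tuple[Path, Any]]) -> None:
        # A remote implementation would ship `values` in a single RPC call.
        for path, value in values:
            self.data[path] = value


class ToyView:
    """Hypothetical view rooted at `prefix` inside a ToyTree."""

    def __init__(self, tree: ToyTree, prefix: Path = ()) -> None:
        self.tree = tree
        self.prefix = prefix

    def absolute_path(self, path: Union[Key, Path]) -> Path:
        # Accept either a single key or a tuple of keys.
        if not isinstance(path, tuple):
            path = (path,)
        return self.prefix + path

    def update(self, values: List[Tuple[Union[Key, Path], Any]]) -> None:
        # Resolve every path, then forward the whole list in one call.
        self.tree.update([(self.absolute_path(p), v) for p, v in values])


tree = ToyTree()
view = ToyView(tree, prefix=('chunks', 'run_hash'))
view.update([(0, 0.5), (('loss', 1), 0.25)])
```

Resolving paths inside the view keeps the lower-level tree unaware of prefixes, so a remote tree only needs one batched entry point.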

@peter-sk peter-sk marked this pull request as ready for review August 3, 2024 20:10
@alberttorosyan alberttorosyan changed the title Support remote mass updates [feat] Support remote mass updates Aug 12, 2024
Member

@alberttorosyan alberttorosyan left a comment


@peter-sk, thanks for your contribution. The changes look good!

Would you mind adding an entry to the CHANGELOG.md file?

On a separate note, update performance can be increased further by implementing batch updates in the lower-level storage classes (Container/RocksContainer). Once this PR is merged, we can implement that, or assist you if you're interested in making another contribution.

@alberttorosyan alberttorosyan merged commit 424b624 into aimhubio:main Aug 14, 2024
9 of 13 checks passed
3 participants