🐛 Bug

I have been using Aim to track item-detection experiments. We have a back end running on one of our remote servers, which we use to track our training and evaluation data. The data consists of either float values or image data (mostly `numpy.NDArray[numpy.uint8]`). I have observed massive performance differences between tracking data to a remote Aim server and tracking to a local Aim server running on my laptop.

For instance, tracking a JSON file with 3000 lines (see the attachment in the To reproduce section) takes more than 15 minutes to push to the remote server, while the exact same job takes less than 10 seconds locally(!).

I have tried to debug this by pushing batches of data instead of making one call per metric, but nothing seems to make a difference. To add more unexpected information to the picture, tracking 95 images (each approx. 4 MB) to the exact same server took only one minute. I think this means the delay is not related to the size of the data being tracked (the images total almost 400 MB, while the raw JSON data is 4.6 MB) 🤷‍♂️

I would really appreciate it if someone could shed some light on this: is this difference in performance expected, and are there any tracking or hardware optimizations we could use to speed it up? As it works now, it is really not usable.

To reproduce

(A ~3000-line JSON metrics file was attached to the original report.)

Expected behavior

Pushing the metrics should not take more than 15 minutes.

Environment
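For illustration, here is a minimal sketch of the tracking pattern described above, using Aim's Python client. The server address, experiment name, and JSON field names are placeholders (my assumptions, not taken from the attachment):

```python
import json

from aim import Run

# Remote tracking: replace host/port with your server's address (assumed here).
run = Run(repo="aim://tracking-server:53800", experiment="item-detection")

# One track() call per metric value -- the slow case described above.
with open("metrics.json") as f:  # stand-in for the ~3000-line attachment
    records = json.load(f)

for record in records:
    run.track(
        record["value"],      # hypothetical field names; the real
        name=record["name"],  # attachment's schema is not shown
        step=record["step"],
    )
```

Pointing `repo` at a local path (or omitting it) instead of the `aim://` address is what makes the same loop finish in seconds, per the timings above.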
@diogo-sr thanks for raising this issue. Performance in general, and of the tracking server in particular, is a priority for the team. @mihran113, could you please take a look? Could you please also share the results we got after re-implementing the tracking server?
Slightly related to PR #3203. Copying a tracked sequence of ~1M steps (a few megabytes of data) is really slow; the PR updates Aim so that the remote tree can be updated in chunks.

I am not sure how easy this would be to integrate into direct tracking to a remote repository. For now we are tracking to a local repository and syncing the runs to a remote repository in close-to-real time using custom sync code (a sketch of the pattern follows), which we would be happy to contribute to Aim once the backend supports chunk updates.
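A rough sketch of that local-first pattern: track to a local repo, then periodically copy runs to the remote server. The paths, the server address, and the `aim runs cp` invocation are assumptions about the setup (check whether your Aim version ships that subcommand and those flags); the actual custom sync code is not shown here.

```python
import subprocess

LOCAL_REPO = "/data/experiments"             # directory holding the local .aim repo (assumed)
REMOTE_REPO = "aim://tracking-server:53800"  # remote Aim server (assumed address)

def sync_runs(run_hashes):
    """Push locally tracked runs to the remote repository.

    Relies on the `aim runs cp` subcommand; availability and exact flags
    depend on your Aim version, so treat this as a sketch rather than a
    guaranteed API.
    """
    subprocess.run(
        ["aim", "runs", "--repo", LOCAL_REPO,
         "cp", "--destination", REMOTE_REPO, *run_hashes],
        check=True,
    )

# Example: sync two runs by hash (hypothetical hashes).
sync_runs(["a1b2c3d4e5f6", "f6e5d4c3b2a1"])
```

Since training writes only to local disk, the remote round-trips happen off the critical path, which is what keeps tracking itself fast.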