🐛 Bug
Hi,
first of all, thanks for this great tool. It is a pleasure to use.
I have one project, though, where I am training a network on very many tasks (thousands). What works really well is logging a distribution of metrics (say, the distribution of correlation coefficients over the different tasks).
What does NOT work well: I also want to log individual metrics for these tasks so that I can easily find out which tasks train well and which ones have problems. I don't get that information from the distribution plot. I don't necessarily want to look at the individual metric curve of every task, but storing the numbers in Aim is really useful because it also makes them easy to access programmatically.
Logging many tasks to Aim does not seem to be a problem in itself. But when I open the UI, it stops responding completely (especially if I "accidentally" click the metrics tab). There are of course some metrics (such as aggregated metrics and the loss curve) that I do want to see, so clicking that tab is not entirely "accidental".
I did see the existing issues about performance problems with very many runs, but I think this is orthogonal to them in some sense, so I am opening a separate issue.
Expected behavior
either:
The UI is able to deal with this number of metrics, for example through lazy loading (which the Aim UI already seems to do in many places, but apparently not enough).
or:
Let me declare while logging that some metrics are aggregated (just normal metrics) and that others are part of a collection in which each individual scalar belongs to one task, basically a vector of metrics. This could help the UI decide how to treat them. Note that, at least in my use case, the tasks have names, not just positions in the vector, so it is really more like a dict.
I realize that option 2 is a feature request rather than a bug report. If so, my apologies.
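To make the distinction in option 2 concrete, here is a minimal plain-Python sketch of the two kinds of data: a few aggregated scalars on one side and a dict of named per-task scalars on the other. The helper names `summarize` and `worst_tasks` and the example task names are hypothetical and only serve as illustration; nothing here is part of the Aim API.

```python
from statistics import mean, median


def summarize(per_task):
    """Collapse a {task_name: value} dict into a few aggregated scalars."""
    values = list(per_task.values())
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def worst_tasks(per_task, k=3):
    """Return the k task names with the lowest values, worst first."""
    return sorted(per_task, key=per_task.get)[:k]


# Hypothetical per-task correlation coefficients for a single epoch.
per_task_corr = {"task_a": 0.91, "task_b": 0.12, "task_c": 0.55, "task_d": 0.78}

aggregated = summarize(per_task_corr)          # the "normal metric" side
struggling = worst_tasks(per_task_corr, k=2)   # needs the named per-task values
```

The aggregated values are what I would want plotted by default, while the per-task dict is mainly needed for lookups such as `worst_tasks`.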
Environment
Aim Version: v3.24.0
Python version: 3.12.5
pip version: 24.2
OS: linux/OSX
Any other relevant information
To reproduce
Create a new Aim repo.
Run a Python script that simulates logging 5 epochs with 5000 tasks.
Spin up the UI and click around a bit.
The runs window loads really slowly and it has problems displaying the table. The metrics tab basically completely blocks.
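A minimal sketch of a script along the lines of the reproduction steps above. It assumes that each task is logged under its own metric name, that Aim is installed, and that the current directory already holds an Aim repo. The function names, the metric naming scheme (`corr_task_0000`, ...), and the fake values are illustrative assumptions, not the exact script used for this report.

```python
import math
import random


def simulate_metrics(n_tasks=5000, n_epochs=5, seed=0):
    """Yield (epoch, task_name, value) triples of fake per-task correlations."""
    rng = random.Random(seed)
    for epoch in range(n_epochs):
        for task in range(n_tasks):
            # Fake correlation coefficient in [-1, 1] that drifts upward per epoch.
            value = math.tanh(0.3 * epoch + rng.gauss(0.0, 0.5))
            yield epoch, f"task_{task:04d}", value


def log_to_aim(n_tasks=5000, n_epochs=5):
    """Log one metric per task to the Aim repo in the current directory."""
    # Imported lazily so that simulate_metrics can be used without Aim installed.
    from aim import Run

    run = Run()
    for epoch, task_name, value in simulate_metrics(n_tasks, n_epochs):
        # Assumption: one distinct metric name per task, i.e. 5000 metrics per run.
        run.track(value, name=f"corr_{task_name}", epoch=epoch)
    run.close()
```

Calling `log_to_aim()` in a directory initialized with `aim init`, and then starting the UI with `aim up`, should produce a run with 5000 separate metrics to click through.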