Ask if overwrite existing ClearML experiment #9865

Closed
mikel-brostrom wants to merge 18 commits into ultralytics:master from mikel-brostrom:patch-1

Conversation

@mikel-brostrom

@mikel-brostrom mikel-brostrom commented Oct 20, 2022

Overwriting experiment data without asking is very painful and a huge waste of time. This can be solved by a simple y/n question.

If a task with the chosen name already exists in ClearML, ask if you want to overwrite it.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced the ClearML integration with a timeout-guarded confirmation to avoid overwriting existing experiments in YOLOv5.

📊 Key Changes

  • Introduced a Timeout decorator to prevent hanging during user input.
  • Added a check to avoid overwriting existing ClearML experiments.
  • Implemented a default timeout of 20 seconds for user confirmation when a potential experiment overwrite is detected.

🎯 Purpose & Impact

  • 🎨 Enhances User Experience: By introducing a timeout, the change helps avoid accidental overwrites and enhances user control without stalling the process.
  • 🔒 Increases Robustness: Preventing accidental overwriting ensures that previous experiment data remains intact, which is valuable for data integrity.
  • 💡 Streamlines Workflow: This automated check can streamline experiments by notifying users of possible naming conflicts, fostering a smoother workflow when using ClearML with YOLOv5.
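The flow in this summary (a y/n prompt guarded by a timeout that falls back to a default answer) could be sketched as follows. This is a minimal illustration assuming a Unix signal-based timeout; `confirm_overwrite` and its defaults are hypothetical names, not the PR's actual code:

```python
import signal

class TimeoutExpired(Exception):
    pass

def _raise_timeout(signum, frame):
    raise TimeoutExpired

def confirm_overwrite(prompt="Overwrite existing ClearML experiment? (y/n): ",
                      seconds=20, default="y", input_fn=input):
    """Ask a y/n question; fall back to `default` after `seconds` (Unix only)."""
    signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)  # schedule SIGALRM in `seconds`
    try:
        answer = input_fn(prompt).strip().lower()
    except TimeoutExpired:
        print(f"Timed out. Using '{default}'")
        answer = default
    finally:
        signal.alarm(0)  # cancel any pending alarm
    return answer == "y"
```

The `input_fn` parameter is injected only so the prompt can be exercised without a live terminal.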

@AyushExel
Contributor

@mikel-brostrom nice idea. But I'm not sure we should have a function that stalls forever waiting for input. Instead, this function should time out after a fixed interval if the user doesn't respond and go ahead with the default setting. What do you think?

@glenn-jocher
Member

See Timeout class in utils/general

    # YOLOv5 Timeout class. Usage: @Timeout(seconds) decorator or 'with Timeout(seconds):' context manager
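A minimal stand-in for such a class, sketched here for illustration (the real implementation lives in utils/general.py and may differ in detail):

```python
import signal

class Timeout:
    """Sketch of a signal-based timeout usable as a 'with' context manager.
    Illustrative only; see utils/general.py for YOLOv5's actual class."""

    def __init__(self, seconds, timeout_msg="", suppress_timeout_errors=True):
        self.seconds = int(seconds)
        self.msg = timeout_msg
        self.suppress = suppress_timeout_errors

    def _handler(self, signum, frame):
        raise TimeoutError(self.msg)

    def __enter__(self):
        signal.signal(signal.SIGALRM, self._handler)  # Unix only
        signal.alarm(self.seconds)  # raise TimeoutError after self.seconds

    def __exit__(self, exc_type, exc_val, exc_tb):
        signal.alarm(0)  # cancel the scheduled alarm
        if self.suppress and exc_type is TimeoutError:
            return True  # swallow the timeout and continue
```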

@mikel-brostrom
Author

Let's add @Timeout(10) on __init__ under ClearmlLogger? Should we change the warning message as well then? What do you think @AyushExel & @glenn-jocher?

@AyushExel
Contributor

Maybe add 20 sec? 10 sec will get users like me anxious 😆
Yeah, for the message, something like this -
Overwrite existing ClearML experiment? (y/n): 20 sec timeout set
and after timeout
Timed out. Using 'y'

@mikel-brostrom
Author

Ok, let's do 20 xD

@thepycoder
Contributor

Busting in here to add my 2 cents:

ClearML will not by default overwrite an experiment (even one with the same name) UNLESS it has no artifacts/model files etc. This was originally designed to not clutter the interface with tasks that have no value to the user.

If you don't like this behaviour, the easiest way to stop it from happening is to add reuse_last_task_id=False to the ClearML task init call. @mikel-brostrom would adding this argument to Task.init fix the issue for you?

If so, we could add it like this by default or allow the user to pass a new command line argument to set this?
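A sketch of what this would look like in the logger's Task.init call. Only reuse_last_task_id is the ClearML SDK argument being discussed; the project/task names are placeholders:

```python
# Keyword arguments for clearml.Task.init; reuse_last_task_id is the point
# here, the project/task names are illustrative placeholders.
task_init_kwargs = dict(
    project_name="YOLOv5",
    task_name="exp",
    reuse_last_task_id=False,  # never silently reuse/overwrite a previous task
)

# In the ClearML logger this would be applied as:
#   from clearml import Task
#   task = Task.init(**task_init_kwargs)
```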

@mikel-brostrom
Author

mikel-brostrom commented Nov 29, 2022

The problem I see is that sometimes you may want to interrupt a training with partial results, as they may be enough for your particular experiment. Especially for those of us low on GPUs. This means that no artifacts/models get uploaded, which leads to this issue. I think reuse_last_task_id=False would be enough @thepycoder, yes. Whether this should be the default behavior or configurable by a command line argument is a question for @glenn-jocher or @AyushExel

@AyushExel
Contributor

@thepycoder @mikel-brostrom yeah I'd also prefer reuse_last_task_id=False. But then can we also manually reuse a previous id by passing it somehow?

@AyushExel
Contributor

The way I see it is that by default the experiments should not be overwritten but I should be able to overwrite them if I want to

@thepycoder
Contributor

thepycoder commented Nov 29, 2022

I agree, there are 2 parts to this though:

  1. Set reuse_last_task_id=False and allow users to override it with an arg if they want to. I think we are in agreement that this makes sense to do.

  2. Being able to override a specific task by its ID does not really work out of the box: it overrides the latest (if it had no model files) or none at all. We CAN use a specific ID to resume training, provided there is a "latest" model that was captured. But this is a completely different feature :)

EDIT: Actually overriding a task by its ID is in fact possible! I don't know if it's the best way to go, but it is available should you think it a better option @AyushExel
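For completeness, the ClearML SDK documents that reuse_last_task_id also accepts a task ID string, which targets one specific task rather than the most recent one. A hedged sketch (the helper name and the project/task names are hypothetical):

```python
def reuse_kwargs(task_id=None):
    """Build clearml.Task.init kwargs: False means never reuse a previous
    task; a task ID string means reuse/overwrite that specific task."""
    return dict(
        project_name="YOLOv5",            # placeholder names
        task_name="exp",
        reuse_last_task_id=task_id if task_id else False,
    )

# Default run:       Task.init(**reuse_kwargs())
# Overwrite by ID:   Task.init(**reuse_kwargs("<your-task-id>"))
```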

@AyushExel
Contributor

@thepycoder
I think if overwriting a task by its ID is complicated then let's just go with method 1 and not overwrite anything by default. What do you think?

@thepycoder
Contributor

I'll get right to it :)

@thepycoder
Contributor

@mikel-brostrom @AyushExel I've made a PR that sets the default behaviour to not override, and allows you to override it either with the newest task or a specific one by ID. Thanks to ClearML already supporting this, it was straightforward to add :)

Thanks all!

@glenn-jocher
Member

@thepycoder @mikel-brostrom hi guys, sorry for the delay here. Is this PR good to go? I see a related PR in #10363; do these work together, or should I do one or the other?

@glenn-jocher glenn-jocher added the TODO High priority items label Dec 3, 2022
@glenn-jocher
Member

@mikel-brostrom @thepycoder I'd like to close this out along with #10363. Are these two PRs related, should we do one or the other? Can you guys please let me know the status of these two? Thanks!

Also as I mentioned in #10363 we have an existing --exist-ok argument that is meant to allow for re-use of existing projects:
(screenshot: the --exist-ok argument definition in the train.py argparser)
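For reference, a flag like --exist-ok is a standard store_true argparse argument; a self-contained sketch of how such a flag behaves (the help text here is approximate, not copied from train.py):

```python
import argparse

parser = argparse.ArgumentParser()
# Mirrors the kind of flag glenn-jocher refers to (help text approximate)
parser.add_argument("--exist-ok", action="store_true",
                    help="existing project/name ok, do not increment")

print(parser.parse_args(["--exist-ok"]).exist_ok)  # → True
print(parser.parse_args([]).exist_ok)              # → False
```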

@mikel-brostrom
Author

Correct me if I am wrong @thepycoder, but your PR solves my issue stated in #9865 as well, right, just with ClearML's internal functionality instead of my workaround? In that case #10363 is the one to merge. I have, however, not tried this out.

@thepycoder
Contributor

@mikel-brostrom Is right, #10363 is the one to use here (Sorry for stealing your spotlight @mikel-brostrom thanks so much for reporting!)

I've just updated it though, according to @glenn-jocher 's comments on it.

@mikel-brostrom
Author

> Sorry for stealing your spotlight @mikel-brostrom thanks so much for reporting!

np @thepycoder
glad to help 😄

@mikel-brostrom
Author

closing this down

@glenn-jocher glenn-jocher removed the TODO High priority items label Dec 5, 2022
@glenn-jocher
Member

glenn-jocher commented Dec 5, 2022

@mikel-brostrom good news 😃! Your original issue may now be fixed ✅ in PR #10363 by @thepycoder.

To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View updated notebooks (Run on Gradient, Open In Colab, Open In Kaggle)
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
