Ask if overwrite existing ClearML experiment #9865

Closed
mikel-brostrom wants to merge 18 commits into ultralytics:master from mikel-brostrom:patch-1

Conversation

@mikel-brostrom

@mikel-brostrom mikel-brostrom commented Oct 20, 2022

Overwriting experiment data without asking is very painful and a huge waste of time. This can be solved by a simple y/n question.

If a task with the chosen name already exists in ClearML, ask if you want to overwrite it.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced the ClearML integration with a timeout-guarded confirmation to avoid overwriting existing experiments in YOLOv5.

📊 Key Changes

  • Introduced a Timeout decorator to prevent hanging during user input.
  • Added a check to avoid overwriting existing ClearML experiments.
  • Implemented a default timeout of 20 seconds for user confirmation when a potential experiment overwrite is detected.

🎯 Purpose & Impact

  • 🎨 Enhances User Experience: By introducing a timeout, the change helps avoid accidental overwrites and enhances user control without stalling the process.
  • 🔒 Increases Robustness: Preventing accidental overwriting ensures that previous experiment data remains intact, which is valuable for data integrity.
  • 💡 Streamlines Workflow: This automated check can streamline experiments by notifying users of possible naming conflicts, fostering a smoother workflow when using ClearML with YOLOv5.
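The flow in this summary (a y/n prompt guarded by a timeout that falls back to a default answer) could be sketched as follows. This is a minimal illustration assuming a Unix signal-based timeout; `confirm_overwrite` and its defaults are hypothetical names, not the PR's actual code:

```python
import signal

class TimeoutExpired(Exception):
    pass

def _raise_timeout(signum, frame):
    raise TimeoutExpired

def confirm_overwrite(prompt="Overwrite existing ClearML experiment? (y/n): ",
                      seconds=20, default="y", input_fn=input):
    """Ask a y/n question; fall back to `default` after `seconds` (Unix only)."""
    signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)  # schedule SIGALRM in `seconds`
    try:
        answer = input_fn(prompt).strip().lower()
    except TimeoutExpired:
        print(f"Timed out. Using '{default}'")
        answer = default
    finally:
        signal.alarm(0)  # cancel any pending alarm
    return answer == "y"
```

The `input_fn` parameter is injected only so the prompt can be exercised without a live terminal.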

@AyushExel
Contributor

@mikel-brostrom nice idea. But I'm not sure we should have a function that stalls forever waiting for input. Instead, this function should time out after a fixed interval if the user doesn't respond and go ahead with the default setting. What do you think?

@glenn-jocher
Member

See Timeout class in utils/general

    # YOLOv5 Timeout class. Usage: @Timeout(seconds) decorator or 'with Timeout(seconds):' context manager
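A minimal stand-in for such a class, sketched here for illustration (the real implementation lives in utils/general.py and may differ in detail):

```python
import signal

class Timeout:
    """Sketch of a signal-based timeout usable as a 'with' context manager.
    Illustrative only; see utils/general.py for YOLOv5's actual class."""

    def __init__(self, seconds, timeout_msg="", suppress_timeout_errors=True):
        self.seconds = int(seconds)
        self.msg = timeout_msg
        self.suppress = suppress_timeout_errors

    def _handler(self, signum, frame):
        raise TimeoutError(self.msg)

    def __enter__(self):
        signal.signal(signal.SIGALRM, self._handler)  # Unix only
        signal.alarm(self.seconds)  # raise TimeoutError after self.seconds

    def __exit__(self, exc_type, exc_val, exc_tb):
        signal.alarm(0)  # cancel the scheduled alarm
        if self.suppress and exc_type is TimeoutError:
            return True  # swallow the timeout and continue
```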

@mikel-brostrom
Author

Let's add @Timeout(10) on __init__ under ClearmlLogger? Should we change the warning message as well then? What do you think @AyushExel & @glenn-jocher?

@AyushExel
Contributor

Maybe add 20 sec? 10 sec will get users like me anxious 😆
Yeah, for the message, something like this -
Overwrite existing ClearML experiment? (y/n): 20 sec timeout set
and after timeout
Timed out. Using 'y'

@mikel-brostrom
Author

Ok, let's do 20 xD

@thepycoder
Contributor

Busting in here to add my 2 cents:

ClearML will not by default overwrite an experiment (even one with the same name) UNLESS it has no artifacts/model files etc. This was originally designed to not clutter the interface with tasks that have no value to the user.

If you don't like this behaviour, the easiest way to stop it from happening is to add reuse_last_task_id=False to the ClearML task init call. @mikel-brostrom would adding this argument to Task.init fix the issue for you?

If so, we could add it like this by default or allow the user to pass a new command line argument to set this?
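A sketch of what this would look like in the logger's Task.init call. Only reuse_last_task_id is the ClearML SDK argument being discussed; the project/task names are placeholders:

```python
# Keyword arguments for clearml.Task.init; reuse_last_task_id is the point
# here, the project/task names are illustrative placeholders.
task_init_kwargs = dict(
    project_name="YOLOv5",
    task_name="exp",
    reuse_last_task_id=False,  # never silently reuse/overwrite a previous task
)

# In the ClearML logger this would be applied as:
#   from clearml import Task
#   task = Task.init(**task_init_kwargs)
```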

@mikel-brostrom
Author

mikel-brostrom commented Nov 29, 2022

The problem I see is that sometimes you may want to interrupt a training with partial results, as they may be enough for your particular experiment. Especially for those of us low on GPUs. This means that no artifacts/models get uploaded, which leads to this issue. I think reuse_last_task_id=False would be enough @thepycoder, yes. Whether this should be the default behavior or configurable by a command line argument is a question for @glenn-jocher or @AyushExel

@AyushExel
Contributor

@thepycoder @mikel-brostrom yeah I'd also prefer reuse_last_task_id=False. But then can we also manually reuse a previous id by passing it somehow?

@AyushExel
Contributor

The way I see it is that by default the experiments should not be overwritten but I should be able to overwrite them if I want to

@thepycoder
Contributor

thepycoder commented Nov 29, 2022

I agree, there are 2 parts to this though:

  1. Set reuse_last_task_id=False and allow users to override it with an arg if they want to. I think we are in agreement that this makes sense to do.

  2. Being able to override a specific task by its ID does not really work out of the box: it overrides the latest (if it had no model files) or none at all. We CAN use a specific ID to resume training, provided there is a "latest" model that was captured. But this is a completely different feature :)

EDIT: Actually overriding a task by its ID is in fact possible! I don't know if it's the best way to go, but it is available should you think it a better option @AyushExel
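For completeness, the ClearML SDK documents that reuse_last_task_id also accepts a task ID string, which targets one specific task rather than the most recent one. A hedged sketch (the helper name and the project/task names are hypothetical):

```python
def reuse_kwargs(task_id=None):
    """Build clearml.Task.init kwargs: False means never reuse a previous
    task; a task ID string means reuse/overwrite that specific task."""
    return dict(
        project_name="YOLOv5",            # placeholder names
        task_name="exp",
        reuse_last_task_id=task_id if task_id else False,
    )

# Default run:       Task.init(**reuse_kwargs())
# Overwrite by ID:   Task.init(**reuse_kwargs("<your-task-id>"))
```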

@AyushExel
Contributor

@thepycoder
I think if overwriting a task by its ID is complicated then let's just go with method 1 and not overwrite anything by default. What do you think?

@thepycoder
Contributor

I'll get right to it :)

@thepycoder
Contributor

@mikel-brostrom @AyushExel I've made a PR that sets the default behaviour to not override, and allows you to override it either with the newest task or a specific one by ID. Thanks to ClearML already supporting this, it was straightforward to add :)

Thanks all!

@glenn-jocher
Member

@thepycoder @mikel-brostrom hi guys, sorry for the delay here. Is this PR good to go? I see a related PR in #10363; do these work together, or should I do one or the other?

@glenn-jocher glenn-jocher added the TODO High priority items label Dec 3, 2022
@glenn-jocher
Member

@mikel-brostrom @thepycoder I'd like to close this out along with #10363. Are these two PRs related, should we do one or the other? Can you guys please let me know the status of these two? Thanks!

Also as I mentioned in #10363 we have an existing --exist-ok argument that is meant to allow for re-use of existing projects:
(screenshot: the --exist-ok argument definition in the train.py argparser)
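For reference, a flag like --exist-ok is a standard store_true argparse argument; a self-contained sketch of how such a flag behaves (the help text here is approximate, not copied from train.py):

```python
import argparse

parser = argparse.ArgumentParser()
# Mirrors the kind of flag glenn-jocher refers to (help text approximate)
parser.add_argument("--exist-ok", action="store_true",
                    help="existing project/name ok, do not increment")

print(parser.parse_args(["--exist-ok"]).exist_ok)  # → True
print(parser.parse_args([]).exist_ok)              # → False
```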

@mikel-brostrom
Author

Correct me if I am wrong @thepycoder, but your PR solves my issue stated in #9865 as well, right, just with ClearML's internal functionality instead of my workaround? In that case #10363 is the one to merge. I have, however, not tried this out.

@thepycoder
Contributor

@mikel-brostrom Is right, #10363 is the one to use here (Sorry for stealing your spotlight @mikel-brostrom thanks so much for reporting!)

I've just updated it though, according to @glenn-jocher 's comments on it.

@mikel-brostrom
Author

> Sorry for stealing your spotlight @mikel-brostrom thanks so much for reporting!

np @thepycoder
glad to help 😄

@mikel-brostrom
Author

closing this down

@glenn-jocher glenn-jocher removed the TODO High priority items label Dec 5, 2022
@glenn-jocher
Member

glenn-jocher commented Dec 5, 2022

@mikel-brostrom good news 😃! Your original issue may now be fixed ✅ in PR #10363 by @thepycoder.

To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View updated notebooks (Run on Gradient, Open In Colab, Open In Kaggle)
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
