Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hash, not experiment names in all DVC commands and as IDs in the table #3945

Open
shcheklein opened this issue May 21, 2023 · 9 comments
Labels
A: experiments Area: experiments table webview and everything related A: integration Area: DVC integration layer bug Something isn't working

Comments

@shcheklein
Copy link
Member

shcheklein commented May 21, 2023

E.g. /usr/local/bin/python -m dvc plots diff workspace testy-leno.

We should not be using testy-leno - it's ambiguous.

Only if we don't know the hash yet it's fine to use name (or even better - some other id? how does DVC handle a collision like this? how does it identify them)?

We can't use these names as IDs in the table either. Now it seems action selection and other stuff works for duplicates simultaneously.

Read this thread for the context: https://iterativeai.slack.com/archives/CB41NAL8H/p1684707770045189 . We easily hit duplicates since that repo sets random seed (quite regular story in ML) and our experiment names repeat as we add more commits. Here is the logic how they are generated:

https://github.com/iterative/dvclive/blob/main/src/dvclive/dvc.py#L150

@shcheklein shcheklein added the A: integration Area: DVC integration layer label May 21, 2023
@mattseddon
Copy link
Member

You can still get collisions on the short hash. Are you suggesting using the full hash as the ID?

@mattseddon
Copy link
Member

We also currently do not touch the stdout when providing updates back to the user in the form of toast notifications. We would need to update the output after switching to the hash.

I.e the following would be much less useful with the full hash instead of experiment name:

image

@shcheklein
Copy link
Member Author

We would need to update the output after switching to the hash.

No need for this - one thing is to see some ambiguous text, another one is not being able to run an action (and overall this is DVC's issue, not extension's).

You can still get collisions on the short hash.

This is I'm less worried to be honest - this is indeed rare.

@shcheklein shcheklein changed the title Use hash, not experiment names in all DVC commands Use hash, not experiment names in all DVC commands and as IDs in the table May 22, 2023
@shcheklein shcheklein added A: experiments Area: experiments table webview and everything related priority-p1 Regular product backlog labels May 22, 2023
@shcheklein
Copy link
Member Author

Update the story a bit with more context. Set p1 unless we decide to change the logic on name gen in DVC / DVCLive to use an independent random generator.

@shcheklein
Copy link
Member Author

Made a PR in DVCLive to reduce the possibility https://github.com/iterative/dvclive/pull/576/files . If merged it becomes less priority, but still a bug - we can't use names as unique IDs.

@mattseddon
Copy link
Member

iterative/dvc#9490 has been merged on the DVC side.

@shcheklein shcheklein added bug Something isn't working and removed priority-p1 Regular product backlog labels May 22, 2023
@shcheklein
Copy link
Member Author

Put a bug label, removed p1 for now.

@mattseddon
Copy link
Member

The discussion in iterative/dvc#9485 is related

@aschuh-hf
Copy link

I also ran into this issue when I wanted to remove an experiment that I had saved under the same name with respect to the latest branch HEAD. Basically I did a manual "rebase", but then used the "Experiments" context menu for convenience to remove the original experiment.

dvc exp apply olive-vies
git add . & git commit -m "..."
dvc exp save olive-vies

The experiment that was deleted was the latest, though I right-clicked on the older one. If I had done it using the CLI, it would have been obvious that the name was ambiguous. But using the VS Code context menu I assumed it would then know which one I want to delete given I right-clicked on it's entry in the table.

It seems if one was able to use --rev alongside the experiment name, the ambiguity in my case could be worked around:

$ dvc exp remove --rev 2c85e5f olive-vies
ERROR: Either provide an `experiment` argument, or use the `--rev` or `--all-commits` or `--queue` flag.

For now, I have to remove the old experiment before creating the updated new one:

dvc exp apply olive-vies
git add . & git commit -m "..."
dvc exp remove olive-vies
dvc exp save olive-vies

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: experiments Area: experiments table webview and everything related A: integration Area: DVC integration layer bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants