@pmrowla
Contributor

@pmrowla pmrowla commented Mar 14, 2023

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Closes #8478
Closes #9348

@pmrowla pmrowla self-assigned this Mar 14, 2023
@pmrowla
Contributor Author

pmrowla commented Mar 14, 2023

current state:
asciicast

The rows marked Running are displaying the current/live state of the tempdir for the running experiments. Those rows are marked workspace (<exp_name>) for now, but workspace is a placeholder here (not really sure how those rows should be labeled). For active checkpoint runs, you would get the typical grouping of finished checkpoint commits with a tip labeled workspace (exp_name).

@pmrowla
Contributor Author

pmrowla commented Mar 14, 2023

The new json format currently looks like:

[
  {
    "rev":"workspace",
    "name":null,
    "data":{
      "rev":"workspace",
      "timestamp":null,
      "params":{
        "params.yaml":{
          "data":{
            "prepare":{
              "split":0.2,
              "seed":20170428
            },
            "featurize":{
              "max_features":400,
              "ngrams":2
            },
            "train":{
              "seed":20170428,
              "n_est":50,
              "min_split":0.01
            }
          }
        }
      },
      "metrics":{...},
      "deps":{...},
      "outs":{...}
    },
    "error":null,
    "children":null
  },
  {
    "rev":"079882fbd3281bd26983fb5768db5d4628e3a34b",
    "name":"main",
    "data":{
      "rev":"079882fbd3281bd26983fb5768db5d4628e3a34b",
      "timestamp":"2023-02-20T13:36:53",
      "params":{...},
      "metrics":{...},
      "deps":{...},
      "outs":{...}
    },
    "error":null,
    "children":[
      {
        "revs":[
          {
            "rev":"e8f17cec96b03357fbb87af553c335009019a7e8",
            "name":"swish-purl",
            "data":{
              "rev":"e8f17cec96b03357fbb87af553c335009019a7e8",
              "timestamp":"2023-03-14T19:32:22",
              "params":{...},
              "metrics":{...},
              "deps":{...},
              "outs":{...}
            },
            "error":null,
            "children":null
          }
        ],
        "executor":null,
        "name":"swish-purl"
      },
      ...
    ]
  }
]

There are two basic dict structures here, and the entire table/mapping of baseline+exp revs is represented by nesting them:

ExpState (git commit state)

{
  rev: str // revision (git sha or 'workspace')
  name: str // optional name that identifies rev (may be an exp name or git tag/branch name)
  data: {} // DVC repo data dictionary for this rev (params/metrics/deps/outs, data matches the current format for these dicts)
  error: {} // optional error info for this rev
  children: [] // list of ExpRanges that branch off of this rev
}

If we add plots support for vscode, we would probably want to add something like plots-datapoints in ExpState.data

ExpRange (ordered range of git (or exp) commits that should be grouped together)

{
  revs: [] // list of ExpStates containing the actual git/exp commits in this range
  name: str // optional name that applies to this entire range (i.e. an exp name that may apply to an entire checkpoint run)
  executor: {} // optional dictionary containing executor state
}

For regular dvc experiments, revs will just be a list of length 1. For checkpoints, revs would contain all checkpoint commits in the run. revs may also contain an additional workspace entry denoting the live/current uncommitted workspace state for an actively running experiment.
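For anyone consuming the new format, the nesting described above can be sketched as a pair of Python dataclasses plus a walker that flattens baselines and their experiment ranges into table rows. This is only an illustrative sketch: `parse_state`, `parse_range`, and `iter_rows` are hypothetical helper names, not DVC APIs, and the field set simply mirrors the two dicts listed in this comment.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class ExpState:
    """One revision: a git commit, an exp commit, or 'workspace'."""
    rev: str
    name: Optional[str] = None
    data: Optional[dict] = None
    error: Optional[dict] = None
    children: Optional[List["ExpRange"]] = None


@dataclass
class ExpRange:
    """Ordered group of ExpStates that belong together."""
    revs: List[ExpState]
    name: Optional[str] = None
    executor: Optional[dict] = None


def parse_state(raw: dict) -> ExpState:
    # Hypothetical helper: build an ExpState from one decoded JSON object.
    children = raw.get("children")
    return ExpState(
        rev=raw["rev"],
        name=raw.get("name"),
        data=raw.get("data"),
        error=raw.get("error"),
        children=[parse_range(c) for c in children] if children else None,
    )


def parse_range(raw: dict) -> ExpRange:
    # Hypothetical helper: build an ExpRange from one decoded JSON object.
    return ExpRange(
        revs=[parse_state(r) for r in raw["revs"]],
        name=raw.get("name"),
        executor=raw.get("executor"),
    )


def iter_rows(states: List[ExpState], depth: int = 0) -> Iterator[Tuple[int, ExpState]]:
    # Depth-first walk: each state, then the revs of every range under it.
    for state in states:
        yield depth, state
        for exp_range in state.children or []:
            yield from iter_rows(exp_range.revs, depth + 1)
```

Because children hold ranges and ranges hold states, the same walker would also cope with deeper recursive nesting if that is ever emitted.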

Example executor states:

  • failed experiment:
"executor":{"state":"failed"}
  • queued experiment:
"executor":{
  "state":"queued",
  "name":null,
  "local":null
}
  • running experiment:
"executor":{
  "state":"running",
  "name":"dvc-task",
  "local":{
    "root":"/Users/pmrowla/git/example-get-started/.dvc/tmp/exps/tmpp07ed2zm",
    "log":"/Users/pmrowla/git/example-get-started/.dvc/tmp/exps/run/4a75a0bf....out",
    "pid":2036,
    "returncode":null
  }
}

For our current purposes, DVC experiments won't ever have children. Checkpoints that branch off of another checkpoint will be a sibling of the original checkpoint instead (and a child of the baseline commit), but in theory we could more accurately represent the git tree structure by doing additional recursive nesting.

@mattseddon I think this covers everything vscode currently consumes (aside from plots), but I'm not really tied to this data schema so feel free to suggest/request any changes

@pmrowla
Contributor Author

pmrowla commented Mar 14, 2023

For testing purposes, the current PR should handle all of the current CLI use cases except for the --sort- related parameters. (vscode won't work because of the changed --json format)

@pmrowla
Contributor Author

pmrowla commented Mar 14, 2023

One other thing to note is that with this PR, git fetch for running experiments is now disabled by default (matching what used to be exp show --no-fetch). Data for live/active experiments is collected directly from the tempdir without git fetch, so exp show will no longer update .git refs. This should help with performance/stability, and exp show should no longer trigger follow-up exp show calls when used from vscode.

(This does mean that vscode filewatcher will have to watch executor.local.root for metrics file changes instead)
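As a rough illustration of what a consumer would need to do, the sketch below gathers the directories to watch from decoded exp show --json output. It assumes the structure shown earlier in this thread (baselines with children ranges, each carrying an executor dict); `running_roots` is a hypothetical helper name, not part of DVC or the extension.

```python
from typing import List


def running_roots(show_output: List[dict]) -> List[str]:
    """Collect executor.local.root for every range whose executor is running."""
    roots = []
    for baseline in show_output:
        for exp_range in baseline.get("children") or []:
            executor = exp_range.get("executor") or {}
            local = executor.get("local") or {}
            # Queued and failed experiments have no live directory to watch.
            if executor.get("state") == "running" and local.get("root"):
                roots.append(local["root"])
    return roots
```

Queued entries (where local is null) and finished entries (where executor is null) are skipped, so only directories with live metrics are returned.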

@dberenbaum
Contributor

The rows marked Running are displaying the current/live state of the tempdir for the running experiments. Those rows are marked workspace (<exp_name>) for now, but workspace is a placeholder here (not really sure how those rows should be labeled). For active checkpoint runs, you would get the typical grouping of finished checkpoint commits with a tip labeled workspace (exp_name).

I think we previously showed the task id here, which I think is the sha of the queued changes? Is that still what we show for failed experiments? If so, can we keep that like <task_id> [<exp_name>]? That sha may not reflect the current state, but I guess the same is true for failed experiments.

@pmrowla
Contributor Author

pmrowla commented Mar 15, 2023

I think we previously showed the task id here, which I think is the sha of the queued changes? Is that still what we show for failed experiments? If so, can we keep that like <task_id> [<exp_name>]? That sha may not reflect the current state, but I guess the same is true for failed experiments.

Yeah we can do this, but I wasn't sure if we wanted some better way of indicating that a row was live/current and not a commit.

The task ID is technically the SHA for the queued experiment stash commit. For failed experiments it makes sense to display that SHA, since it also allows the user to see the params/deps settings from that queued stash commit (so they can determine if the params/deps were related to why it failed)

But in the live/running case we'd be displaying the active (tempdir) workspace state and not the queued state, but also still displaying the same label as the queued state

@mattseddon
Contributor

I went through the points that I listed in #8478 (comment) and tried to group bullet points with things that were mentioned above.

  • Status is used to determine the state an experiment is in. We actually aggregate this data to work out whether there is a single experiment running and then stop the user from performing the majority of other experiment actions whilst that is happening.

may also contain an additional workspace entry denoting the live/current uncommitted workspace state for an actively running experiment

How will we determine that a non-checkpoint experiment with a logger attached (e.g. DVCLive) is running in the workspace? What will the data structure look like? Would an example of an ExpState entry running in the workspace look like this:

...
{
  rev: '079882fbd3281bd26983fb5768db5d4628e3a34b',
  name: 'main',
  data: {
    ...
  },
  error: null,
  children: [
    {
      revs: [
        {
          ...
          rev: 'e8f17cec96b03357fbb87af553c335009019a7e8',
          name: 'swish-purl',
          data: {
            ...
          }
        }
      ],
      executor: {
        state: 'running',
        name: 'workspace',
        local: ?,
        pid: 1234
      },
      ...
      name: 'swish-purl'
    }
  ]
}

If yes, will there be some associated UI changes? Or will this information still be in the workspace ExpState entry?

For checkpoints, revs would contain all checkpoint commits in the run.

Is revs an ordered list with the tip as the 1st entry? Will each rev in the ExpRange still contain fields like checkpoint_tip/checkpoint_parent?

  • We recreate logic that is in DVC to add [ ] or ( ) to the displayed name of experiments, e.g. f596aa8 [mixed-sacs].

How difficult would it be to add in a displayName or similar?

  • Whether or not there are checkpoint experiments in the workspace by reading all available dvc.yamls.

Is it possible to get this from the data if children is empty? Would it be difficult to add this information?

But in the live/running case we'd be displaying the active (tempdir) workspace state and not the queued state, but also still displaying the same label as the queued state

Could display <tempdir> [<exp_name>] or queue [<exp_name>] or temp [<exp_name>]

Would there be any reason to include the original task_id in the data?

[N] One thing that jumped out just from looking at the data is that rev is duplicated under the data key:

    "rev":"079882fbd3281bd26983fb5768db5d4628e3a34b",
    "name":"main",
    "data":{
      "rev":"079882fbd3281bd26983fb5768db5d4628e3a34b",

@pmrowla
Contributor Author

pmrowla commented Mar 15, 2023

How will we determine that a non-checkpoint experiment with a logger attached (e.g. DVCLive) is running in the workspace? What will the data structure look like? Would an example of an ExpState entry running in the workspace look like this:
If yes, will there be some associated UI changes? Or will this information still be in the workspace ExpState entry?

Workspace runs will still have a local executor state. Essentially there's now a difference between the top level workspace commit and the running experiment. They will have the same params/metrics data, but the top level workspace is just raw state, and has no experiment/executor information attached to it.

So when something is running in the workspace it will look like:

[
  {
    "rev":"workspace",
    "name":null,
    "data":{},
    "error":null,
    "children":null
  },
  {
    "rev":"079882fbd3281bd26983fb5768db5d4628e3a34b",
    "name":null,
    "data":{},
    "error":null,
    "children":[
      {
        "revs":[
          {
            "rev":"workspace",
            "name":"dormy-akes",
            "data":{
              "rev":"workspace",
              "timestamp":null,
              "params":{},
              "metrics":{},
              "deps":{},
              "outs": {}
            },
            "error":null,
            "children":null
          }
        ],
        "executor":{
          "state":"running",
          "name":"workspace",
          "local":{
            "root":"/Users/pmrowla/git/example-get-started",
            "log":null,
            "pid":null,
            "returncode":null
          }
        },
        "name":"dormy-akes"
      }
    ]
  }
]

In the CLI table this now looks like:
Screenshot 2023-03-15 at 3 25 34 PM

So the top level/bolded workspace is always just the raw current workspace state (whether or not the user is running anything).

The actual active experiment is tracked separately (workspace [dormy-akes] here). If this was a checkpoint run, children[0].revs would contain multiple experiments, with workspace [dormy-akes] at the top, and then each additional committed checkpoint (with all of them grouped by the same workspace executor information)

There are still some bugs here: 079882f should show up as main, and executor.local.pid isn't filled in right now for workspace runs, but it should be (there will be no log entry, though).

We could also consider displaying a stash commit SHA instead of workspace for the experiment line as well, but the SHA wouldn't actually correspond to a task ID so I'm not sure if it would be useful.

@pmrowla
Contributor Author

pmrowla commented Mar 15, 2023

Is revs an ordered list with the tip as the 1st entry? Will each rev in the ExpRange still contain fields like checkpoint_tip/checkpoint_parent?

Yes, revs is now an ordered list, with revs[0] as tip and revs[len(revs) - 1] as the base.

I dropped the use of checkpoint_tip and checkpoint_parent entirely with this PR, so you do not need to display the (<parent>) any more.
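A small sketch of how a consumer could rely on that ordering, assuming each entry in revs is a decoded ExpState dict and that a live entry, when present, is the first one with rev set to "workspace" (as in the workspace example earlier in the thread). `split_range` is a hypothetical helper name.

```python
from typing import List


def split_range(revs: List[dict]) -> dict:
    """Split an ExpRange's revs into tip, base, live entry, and committed revs."""
    if not revs:
        raise ValueError("an ExpRange is expected to contain at least one rev")
    # revs[0] is the tip; revs[-1] is the base of the range.
    live = revs[0] if revs[0]["rev"] == "workspace" else None
    # Everything after a live entry is a committed checkpoint, newest first.
    committed = revs[1:] if live is not None else list(revs)
    return {"tip": revs[0], "base": revs[-1], "live": live, "committed": committed}
```

For a regular (non-checkpoint) experiment, revs has length 1, so tip and base are the same entry.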

@pmrowla
Contributor Author

pmrowla commented Mar 15, 2023

How difficult would it be to add in a displayName or similar?

We could do this but display name in the CLI is just rev [name] for nested experiments (anything inside children). For top level commits we just prefer name over rev when possible.

(Honestly I don't think vscode should even feel tied to this display format, given that you have more UI space and could always display both revs and names if you wanted)
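The display rule described above is small enough to sketch directly. This is an approximation, not DVC's implementation: `display_name` is a hypothetical helper, and the 7-character abbreviation is an assumption based on the short SHAs that appear in this thread.

```python
from typing import Optional


def display_name(rev: str, name: Optional[str], nested: bool, short_len: int = 7) -> str:
    """Approximate the CLI label for one row of the exp show table."""
    # "workspace" is kept whole; SHAs are abbreviated (length is an assumption).
    short = rev if rev == "workspace" else rev[:short_len]
    if nested:
        # Experiments inside children are shown as "rev [name]".
        return f"{short} [{name}]" if name else short
    # Top-level commits prefer the name (branch/tag) over the rev.
    return name or short
```

A client with more UI space could ignore this and show rev and name in separate columns.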

@pmrowla
Contributor Author

pmrowla commented Mar 15, 2023

Is it possible to get this from the data if children is empty? Would it be difficult to add this information?

It's not possible right now, but we could add fields in data to indicate things about the repo/pipeline config that vscode wants to know

Maybe something like

{"data": {"meta": {"has_checkpoints": true}}}

@pmrowla
Contributor Author

pmrowla commented Mar 15, 2023

Would there be any reason to include the original task_id in the data?

If you think it could be useful in vscode I can add this to the executor.local data

@pmrowla
Contributor Author

pmrowla commented Mar 15, 2023

[N] One thing that jumped out just from looking at the data is that rev is duplicated under the data key:

This is intentional for now as it is useful for how things are organized and cached internally in DVC, but for vscode's purposes the outer rev and data.rev will always match

@mattseddon
Contributor

Is revs an ordered list with the tip as the 1st entry? Will each rev in the ExpRange still contain fields like checkpoint_tip/checkpoint_parent?

Yes, revs is now an ordered list, with revs[0] as tip and revs[len(revs) - 1] as the base.

I dropped the use of checkpoint_tip and checkpoint_parent entirely with this PR, so you do not need to display the (<parent>) any more.

How difficult would it be to add in a displayName or similar?

We could do this but display name in the CLI is just rev [name] for nested experiments (anything inside children). For top level commits we just prefer name over rev when possible.

(Honestly I don't think vscode should even feel tied to this display format, given that you have more UI space and could always display both revs and names if you wanted)

This is all good stuff, means I can drop a lot of replicated logic from the extension (especially code relating to adding (<parent>) to the UI). For now, we'll continue to follow the format that is used in the CLI. We can revisit this later.

In the CLI table this now looks like: Screenshot 2023-03-15 at 3 25 34 PM

🙏🏻 Thanks for explaining. I'll update our UI as well.

[Q] (only partially related). Now that we are collecting data from temporary directories, is there much of a reason to provide the option to run experiments in the workspace? Should --temp be the default now?

There are still some bugs here: 079882f should show up as main, and executor.local.pid isn't filled in right now for workspace runs, but it should be (there will be no log entry, though).

👍🏻

Is it possible to get this from the data if children is empty? Would it be difficult to add this information?

It's not possible right now, but we could add fields in data to indicate things about the repo/pipeline config that vscode wants to know

Maybe something like

{"data": {"meta": {"has_checkpoints": true}}}

This would be very useful.

Would there be any reason to include the original task_id in the data?

If you think it could be useful in vscode I can add this to the executor.local data

Sounds good.

@mattseddon
Contributor

mattseddon commented Mar 15, 2023

My primary focus right now is making sure that I can integrate #9146. That should only take a couple of days. After that, I'll be able to work on consuming this new format and making the appropriate updates. That work will give me a lot more insights on the shape of the API.

Mentioning this because it would be good to coordinate releasing this 🙏🏻.

@pmrowla
Contributor Author

pmrowla commented Mar 16, 2023

[Q] (only partially related). Now that we are collecting data from temporary directories, is there much of a reason to provide the option to run experiments in the workspace? Should --temp be the default now?

There are certain repo/pipeline setups where tempdir runs don't work, mostly because users rely on relative paths to files that aren't tracked by git or DVC and aren't actually listed as pipeline dependencies, so keeping workspace support is probably still required.

@dberenbaum
Contributor

Re: plots, @skshetry do you think collecting live plots for queued experiments should be part of this or should we do it some other way? Should dvc plots diff be able to accept queue task ids and collect live metrics for those?

@daavoo If we expose this as an API, could we use it in get started to analyze the completed experiments in the notebook?

@mattseddon
Contributor

mattseddon commented Mar 17, 2023

Is revs an ordered list with the tip as the 1st entry? Will each rev in the ExpRange still contain fields like checkpoint_tip/checkpoint_parent?

Yes, revs is now an ordered list, with revs[0] as tip and revs[len(revs) - 1] as the base.

I dropped the use of checkpoint_tip and checkpoint_parent entirely with this PR, so you do not need to display the (<parent>) any more.

Regarding resuming experiments.

In the extension we use checkpoint_parent/checkpoint_tip to generate the continuation from a resumed experiment shown in the Trends section plots below. The plots shown in the Data Series are generated by data that comes out of plots diff. birch-math was resumed from sedgy-orle.

image

With the dropping of the two fields will the entry for [birch-math] contain all of the checkpoints from the experiment that was resumed?

Asking because it would be great to drop the continuation logic in treeverse/vscode-dvc#3466.

@pmrowla
Contributor Author

pmrowla commented Mar 17, 2023

With the dropping of the two fields will the entry for [birch-math] contain all of the checkpoints from the experiment that was resumed?

Yes, after the changes the resumed experiment now just contains the duplicated revisions. So each exp can be handled separately, you don't have to worry about the continuation stuff.

In the CLI it looks like:
Screenshot 2023-03-17 at 11 31 11 AM
json output: https://gist.github.com/pmrowla/d0d7b56f07130001bacabca55b05e4ec

(where muggy-judo was resumed from tiled-puku after 6e0a930)

@skshetry
Collaborator

Re: plots, @skshetry do you think collecting live plots for queued experiments should be part of this or should we do it some other way? Should dvc plots diff be able to accept queue task ids and collect live metrics for those?

@dberenbaum, is there anything more to live plots than dvc plots show --cd /tmp-workspace --json?

Since it is part of experiments, it feels like it should be part of experiments, and plots --json should probably not support task id. But no strong opinion.

@dberenbaum
Contributor

@dberenbaum, is there anything more to live plots than dvc plots show --cd /tmp-workspace --json?

I think that pretty much covers it, except that it is needed for multiple queued experiments at once.

I don't think there's any strong product reason to prefer one implementation over another here since I doubt users will bother monitoring live plots from the CLI.

@pmrowla
Contributor Author

pmrowla commented Apr 6, 2023

I think the executor typing should be something like

type Executor =
  | {
      state: 'queued'
      name: null
      local: null
    }
  | {
      state: 'running' | 'failed' | 'success'
      name: 'dvc-task' | 'workspace' | null
      local: {
        root: string
        log: string
        task_id: string
        pid: number
        returncode: null
      } | null
    }
  | null // assume success for null

Whether or not failed/success includes the local executor information depends on whether or not the task queue data is still available for a given experiment (the task queue logs/pid information/etc can be removed without removing the actual exp git commit/ref)

Other than that the types look correct. You can double check things against typed DVC dataclasses as well: https://github.com/pmrowla/dvc/blob/exp-live-metrics/dvc/repo/experiments/serialize.py
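For a consumer written in Python, the same rules could be applied with two small helpers. This is a sketch under the typing proposed above, including the "assume success for null" convention; `executor_state` and `has_local_info` are hypothetical names and not part of DVC.

```python
from typing import Optional


def executor_state(executor: Optional[dict]) -> str:
    """Return the experiment state, treating a null executor as success."""
    if executor is None:
        return "success"
    return executor["state"]


def has_local_info(executor: Optional[dict]) -> bool:
    """True when task queue data (root/log/pid) is still available."""
    # local is null for queued experiments and may be missing or null for
    # failed/successful ones whose task queue data has been removed.
    return bool(executor and executor.get("local"))
```

Keeping the two checks separate reflects the point that state and the availability of local executor data vary independently for failed and successful runs.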

@mattseddon
Contributor

[Q] Is this a known bug for checkpoint experiments running in the workspace:

image

Revs are duplicated under a workspace record and the tip

@pmrowla
Contributor Author

pmrowla commented Apr 17, 2023

That's definitely a bug, I'll look into it today

@pmrowla
Contributor Author

pmrowla commented Apr 17, 2023

@mattseddon I can't reproduce that state w/this PR rebased against the latest DVC main

@mattseddon
Contributor

@pmrowla I'm unable to call plots diff with the name of an experiment running in the workspace (command returns failed with ERROR: unknown Git revision 'name'). Is that behaviour expected? Should I be avoiding doing that?

@pmrowla
Contributor Author

pmrowla commented Apr 19, 2023

@pmrowla I'm unable to call plots diff with the name of an experiment running in the workspace (command returns failed with ERROR: unknown Git revision 'name'). Is that behaviour expected? Should I be avoiding doing that?

It happens because there is no experiment commit generated yet (and no experiment ref with that name). Is there a reason you need to pass the actual name? Just doing dvc plots diff --json (with no revs) should work in this case.
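One way a client could apply this is to build the command line from the selected entries and leave out anything that has no commit yet. The sketch below only constructs the argument list and does not run DVC. It assumes that an uncommitted running experiment is recognizable by rev being "workspace", as in the JSON examples in this thread; `plots_diff_argv` is a hypothetical helper name.

```python
from typing import List


def plots_diff_argv(selected: List[dict]) -> List[str]:
    """Build a `dvc plots diff --json` command for the selected entries."""
    # Entries still at rev "workspace" have no exp commit or ref, so their
    # names cannot be resolved as git revisions and are left out.
    revs = [
        entry["name"] or entry["rev"]
        for entry in selected
        if entry["rev"] != "workspace"
    ]
    return ["dvc", "plots", "diff", *revs, "--json"]
```

When only the running workspace experiment is selected, this reduces to dvc plots diff --json with no revs, which is the form suggested above.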

@mattseddon
Contributor

@pmrowla I'm unable to call plots diff with the name of an experiment running in the workspace (command returns failed with ERROR: unknown Git revision 'name'). Is that behaviour expected? Should I be avoiding doing that?

It happens because there is no experiment commit generated yet (and no experiment ref with that name). Is there a reason you need to pass the actual name? Just doing dvc plots diff --json (with no revs) should work in this case.

When an experiment starts running in the workspace we auto-select it for plotting. We had logic for checkpoint experiments that would select the workspace under that condition. I was hoping to remove/simplify that logic but we can keep it as a workaround.

All that means we end up with this behaviour:

Screen.Recording.2023-04-19.at.3.49.04.pm.mov

@mattseddon
Contributor

We should be ready to accept these changes on the VS Code side tomorrow. Can we schedule a release soon? 🙏🏻

@pmrowla
Contributor Author

pmrowla commented Apr 19, 2023

@mattseddon I just need to update some tests in this PR and then we can merge/release on the DVC side, should be able to push a release out tomorrow

@mattseddon
Contributor

mattseddon commented Apr 19, 2023

Found an edge case where running an experiment in the workspace along with the queue stops the record in the workspace from being shown until the queue finishes processing:

Screen.Recording.2023-04-20.at.8.42.50.am.mov

bijou-eggs (workspace) does not show up until snowy-snips (queue) finishes.

Not a blocker for me but others may disagree. LMK if you want me to raise a separate issue to track 🙏🏻.

Edit: I did also confirm this happens via CLI only.

Contributor Author

The removed tests here are all either duplicated in tests/func/experiments/test_show or no longer needed.

The output format specific CLI unit tests are no longer needed, since all of the output is now generated as a standard rich table and then converted to csv/md using rich. So what we need to test in DVC is that the correct flags to generate the expected table are passed into experiments.show from the CLI, and that we generate the table data correctly (and not that rich knows how to output properly formatted markdown)

@pmrowla pmrowla marked this pull request as ready for review April 20, 2023 07:10
@pmrowla pmrowla changed the title [WIP] exp: refactor show behavior exp: refactor show behavior Apr 20, 2023
@pmrowla pmrowla added the ui (user interface / interaction) and A: experiments (Related to dvc exp) labels Apr 20, 2023
Contributor Author

These were tests for functions that no longer exist after the standardized collection/data structure changes.

@codecov

codecov bot commented Apr 20, 2023

Codecov Report

Patch coverage: 85.73% and project coverage change: -0.29 ⚠️

Comparison is base (10f5234) 92.94% compared to head (8ff6290) 92.66%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9170      +/-   ##
==========================================
- Coverage   92.94%   92.66%   -0.29%     
==========================================
  Files         461      461              
  Lines       37317    37199     -118     
  Branches     5372     5370       -2     
==========================================
- Hits        34685    34470     -215     
- Misses       2097     2187      +90     
- Partials      535      542       +7     
Impacted Files Coverage Δ
dvc/repo/experiments/queue/tempdir.py 66.31% <29.16%> (-15.63%) ⬇️
dvc/repo/experiments/queue/celery.py 81.71% <61.81%> (-6.10%) ⬇️
dvc/repo/experiments/collect.py 82.80% <82.80%> (ø)
dvc/commands/experiments/show.py 89.83% <83.33%> (-3.74%) ⬇️
dvc/repo/experiments/queue/workspace.py 79.02% <91.48%> (-3.02%) ⬇️
dvc/repo/experiments/serialize.py 85.71% <92.00%> (+1.56%) ⬆️
dvc/repo/experiments/show.py 92.94% <92.85%> (+2.48%) ⬆️
dvc/commands/queue/status.py 100.00% <100.00%> (ø)
dvc/repo/experiments/diff.py 89.47% <100.00%> (ø)
dvc/repo/experiments/queue/base.py 86.56% <100.00%> (+0.25%) ⬆️
... and 3 more

... and 3 files with indirect coverage changes

@pmrowla
Contributor Author

pmrowla commented Apr 20, 2023

Found an edge case where running an experiment in the workspace along with the queue stops the record in the workspace from being shown until the queue finishes processing:

This will be fixed in the release (issue was replacing the list of workspace runs with the list of queue runs instead of concatenating the two lists together)
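The class of bug described here is easy to show in miniature. The sketch below is not the DVC code: `collect_active_buggy` and `collect_active` are hypothetical names used only to contrast overwriting one list with the other against concatenating them.

```python
from typing import List


def collect_active_buggy(workspace_runs: List[str], queue_runs: List[str]) -> List[str]:
    """Illustrates the bug: queue runs replace workspace runs when present."""
    active = list(workspace_runs)
    if queue_runs:
        # Overwrites instead of extending, so workspace runs are hidden
        # until the queue has nothing left to report.
        active = list(queue_runs)
    return active


def collect_active(workspace_runs: List[str], queue_runs: List[str]) -> List[str]:
    """Fixed version: keep both workspace and queue runs."""
    return [*workspace_runs, *queue_runs]
```

With the experiment names from the report, the buggy version returns only the queued run while the queue is busy, which matches the workspace record not appearing until the queue finishes.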

@pmrowla pmrowla merged commit ec090c5 into treeverse:main Apr 20, 2023
@pmrowla pmrowla deleted the exp-live-metrics branch April 20, 2023 07:58
@pmrowla
Contributor Author

pmrowla commented Apr 20, 2023

This is released in DVC 2.55.0

Labels

A: experiments (Related to dvc exp), breaking-change, ui (user interface / interaction)

Development

Successfully merging this pull request may close these issues.

exp: include names for running experiments
exp queue: live metrics

4 participants