add metadata put in workflow #19195

lchu6 · 2021-10-07T19:33:54Z

Why are these changes needed?

Add metadata to workflow. Currently, users have no way to attach metadata to a step or a workflow run, and workflow runtime metrics (other than status) are neither captured nor checkpointed.

We are adding several kinds of metadata:

Step-level user metadata. Can be set with step.options(metadata={}).
Step-level pre-run metadata. Captures pre-run metadata such as step_start_time; more metrics can be added later.
Step-level post-run metadata. Captures post-run metadata such as step_end_time; more metrics can be added later.
Workflow-level user metadata. Can be set with workflow.run(metadata={}).
Workflow-level pre-run metadata. Captures pre-run metadata such as workflow_start_time; more metrics can be added later.
Workflow-level post-run metadata. Captures post-run metadata such as workflow_end_time; more metrics can be added later.
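To illustrate the user-facing shape described above, here is a minimal sketch that does not use Ray at all. `ToyStep` and `validate_user_metadata` are hypothetical names invented for this example; only the `options(metadata={})` call shape comes from the description. The JSON check is an assumption modeled on the later commit "add json serialization check for user metadata", not a copy of that code.

```python
import json
from typing import Any, Callable, Dict, Optional


def validate_user_metadata(metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the user metadata, rejecting values that cannot be stored as JSON."""
    if metadata is None:
        return {}
    if not isinstance(metadata, dict):
        raise ValueError("metadata must be a dict")
    try:
        json.dumps(metadata)
    except (TypeError, ValueError) as exc:
        raise ValueError("metadata must be JSON serializable") from exc
    return dict(metadata)


class ToyStep:
    """Hypothetical stand-in for a workflow step; this is not the Ray class."""

    def __init__(self, func: Callable[..., Any],
                 user_metadata: Optional[Dict[str, Any]] = None) -> None:
        self.func = func
        self.user_metadata = validate_user_metadata(user_metadata)

    def options(self, metadata: Optional[Dict[str, Any]] = None) -> "ToyStep":
        # Same call shape as step.options(metadata={}): return a new step that
        # carries the validated metadata and leave the original untouched.
        return ToyStep(self.func, metadata)


step = ToyStep(lambda x: x + 1)
tagged = step.options(metadata={"owner": "data-team", "priority": 3})
```

Returning a new object from `options` is a design choice of this sketch; it keeps the untagged step reusable.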

Related issue number

#17090

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

python/ray/workflow/common.py

python/ray/workflow/execution.py

fishbone · 2021-10-07T21:37:08Z

I think the high-level direction looks good, but we probably can improve the structure a little bit.

There are three kinds of metadata each for workflows and steps (6 in total):

For step metadata

user meta
- do it in commit step
pre step meta
- do it in commit step
post step meta
- do it in checkpoint output

For workflow metadata

user meta
- do it in run/run_async
pre meta/post meta
- do it in workflow access

Let's focus on step metadata first and then we can get to workflow metadata later.

fishbone

Thanks for the contribution! The high-level direction looks good. Let's address the comments, add some tests, and do another round of review.

lchu6 · 2021-10-07T23:45:48Z

@iycheng thanks for the suggestion.

user meta (do it in commit step)
I have it in here:

ray/python/ray/workflow/step_executor.py

Line 209 in 76e2130

wf_storage._put(

which is inside _write_step_inputs, which the commit step uses. I think this is a good place, since it is written together with all the other step inputs. Or are you suggesting doing it somewhere else?
pre step meta (do it in commit step)
I have it here:

ray/python/ray/workflow/step_executor.py

Line 365 in 76e2130

asyncio.get_event_loop().run_until_complete(store._put(

right before the actual step execution (_wrap_run) on the next line. Is the suggestion to put it inside

ray/python/ray/workflow/step_executor.py

Line 371 in 76e2130

commit_step(store, step_id, None, e, outer_most_step_id)

and

ray/python/ray/workflow/step_executor.py

Line 381 in 76e2130

commit_step(store, step_id, persisted_output, None, outer_most_step_id)

?
If so, I would assume that the start_time should still be recorded before the _wrap_run line, and that we keep that value and pass it into commit_step (as an extra argument to commit_step, I assume?).
post step meta (do it in checkpoint output)
I have it here:

ray/python/ray/workflow/step_executor.py

Line 404 in 76e2130

step_end_metadata = {'end_time': time.time()}

which is after commit_step. I guess the suggestion is to put it inside commit_step, in the save_step_output portion. Correct?

fishbone · 2021-10-08T18:01:46Z

user meta looks good to me.

I get your point. I think you are right about the pre-run and post-run metadata.
Let's add two functions to workflow_storage, save_step_prerun_metadata and save_step_postrun_metadata, and call them before and after _wrap_run.

ray/python/ray/workflow/step_executor.py

Lines 367 to 369 in 76e2130

    
           persisted_output, volatile_output = _wrap_run( 
        
               func, step_type, step_id, catch_exceptions, max_retries, *args, 
        
               **kwargs)
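The suggestion above can be sketched as follows. This is a toy model under stated assumptions, not the code from the PR: `ToyStorage` and `execute_step` are hypothetical, the plain function call stands in for `_wrap_run`, and only the two method names and the `start_time`/`end_time` keys come from the thread. What should happen to post-run metadata when the step raises is not covered here.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ToyStorage:
    """Hypothetical in-memory storage keyed by (step_id, kind)."""

    def __init__(self) -> None:
        self.data: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def save_step_prerun_metadata(self, step_id: str, metadata: Dict[str, Any]) -> None:
        self.data[(step_id, "pre_run")] = metadata

    def save_step_postrun_metadata(self, step_id: str, metadata: Dict[str, Any]) -> None:
        self.data[(step_id, "post_run")] = metadata


def execute_step(store: ToyStorage, step_id: str, func: Callable[..., Any],
                 *args: Any, clock: Callable[[], float] = time.time,
                 **kwargs: Any) -> Any:
    # Storage only persists what it is given; the timestamps (application
    # logic) are computed here, on either side of the step body.
    store.save_step_prerun_metadata(step_id, {"start_time": clock()})
    result = func(*args, **kwargs)  # stands in for _wrap_run(...)
    store.save_step_postrun_metadata(step_id, {"end_time": clock()})
    return result
```

The `clock` parameter is an addition of this sketch so that the timestamps can be made deterministic in tests.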

lchu6 · 2021-10-11T17:38:16Z

@iycheng Update:

For all 6, here is where they are now:

step_user_metadata is written within _write_step_inputs, so it is saved together with all the other attributes of WorkflowData.
The remaining 5 are stored with newly created methods in workflow_storage: save_step_prerun_metadata, save_step_postrun_metadata, save_workflow_user_metadata, save_workflow_prerun_metadata and save_workflow_postrun_metadata.
The two step-level ones are placed where we agreed: before and after _wrap_run. Workflow-level user metadata is saved inside execution.run, which is where run/run_async are called from. For workflow-level pre/post-run metadata, I am not sure of the best place inside workflow_access. I currently save the pre-run metadata at the beginning of run_or_resume and the post-run metadata inside update_step_status (i.e. post-run metadata is recorded whenever FAILED or SUCCESSFUL is captured). Let me know if you have a better place in mind.

By the way, I think the code changes are now much cleaner, thanks to the early feedback.
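For readers following the workflow-level placement described above, here is a rough sketch of the idea: record pre-run metadata when a run starts, and record post-run metadata only when a terminal status is captured. `ToyWorkflowAccess`, the `Status` enum, and the `RUNNING` value are assumptions made for illustration; the thread names only `run_or_resume`, `update_step_status`, and the `FAILED`/`SUCCESSFUL` statuses.

```python
import time
from enum import Enum
from typing import Any, Callable, Dict


class Status(Enum):
    RUNNING = "RUNNING"        # assumed non-terminal status, for illustration
    SUCCESSFUL = "SUCCESSFUL"
    FAILED = "FAILED"


TERMINAL_STATUSES = {Status.SUCCESSFUL, Status.FAILED}


class ToyWorkflowAccess:
    """Hypothetical stand-in; not the class in workflow_access."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        # workflow_id -> {"pre_run": {...}, "post_run": {...}}
        self.metadata: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def run_or_resume(self, workflow_id: str) -> None:
        # Pre-run metadata is written first, before any step is scheduled.
        record = self.metadata.setdefault(workflow_id, {})
        record["pre_run"] = {"start_time": self.clock()}

    def update_status(self, workflow_id: str, status: Status) -> None:
        # Post-run metadata is written only once a terminal status is seen.
        if status in TERMINAL_STATUSES:
            record = self.metadata.setdefault(workflow_id, {})
            record["post_run"] = {"end_time": self.clock()}
```

In this sketch a resumed workflow overwrites its earlier `start_time`; whether the real implementation should do the same is a separate question that the thread does not settle.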

fishbone

Thanks a lot for the update! It looks nice. I have a few comments, which shouldn't be too hard to fix:

Let's keep workflow storage clean, without application-related logic.
Except in the user-facing API, let's rename metadata to user_metadata when it comes from the user. The user-facing API has only one type of metadata, so there is no confusion there, but internally we have several kinds.
Once those are fixed, please add some test cases; you can follow the pattern in python/ray/workflow/tests/test_basic_workflows_2.py.

python/ray/workflow/common.py

python/ray/workflow/execution.py

python/ray/workflow/step_executor.py

python/ray/workflow/step_function.py

python/ray/workflow/step_executor.py

lchu6 · 2021-10-12T00:28:25Z

@iycheng added tests in 93eb53f.

fishbone · 2021-10-12T01:39:38Z

There is a lint failure. Could you check this doc and format the code?
https://docs.ray.io/en/latest/getting-involved.html#lint-and-formatting

fishbone · 2021-10-12T01:40:25Z

It looks like an rllib test failed, which doesn't seem related to this PR. Could you merge master and push again?

python/ray/workflow/virtual_actor_class.py

python/ray/workflow/tests/test_metadata_put.py

fishbone · 2021-10-12T02:02:48Z

Everything else looks good! It's almost there and thanks for the work!

fishbone · 2021-10-12T07:11:05Z

lint failure

lchu6 · 2021-10-12T15:16:17Z

lint failure

@iycheng can you help me with this one? I checked all the failed checks, and only one is related: lint, with the following error:

Warning, treated as error:
--
  | /ray/python/ray/workflow/common.py:docstring of ray.workflow.common.Workflow.run:27:Definition list ends without a blank line; unexpected unindent.

However, scripts/format.sh didn't apply any fix, and I couldn't find any problem when manually checking the docstring of def run.

fishbone · 2021-10-12T17:42:16Z

Usually I run ci/travis/format.sh

fishbone

Thanks for the contribution!

fishbone · 2021-10-13T03:09:20Z

Test failure looks unrelated. @lchu-ibm could you give more details in the description part?

lchu6 · 2021-10-13T03:31:34Z

@iycheng done with updating description.

Why are these changes needed?

Quick fix for metadata put. Currently, when workflow-level metadata is not given, `null` is written to `user_run_metadata.json`; this fix makes it write `{}` instead.

Related issue number

original issue: #17090
original PR: #19195
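The `null` versus `{}` behavior mentioned above follows directly from how Python's `json` module serializes `None`. A minimal sketch, assuming the metadata is written with `json.dumps`; `serialize_user_metadata` is a hypothetical helper, not the function changed by the fix:

```python
import json
from typing import Any, Dict, Optional


def serialize_user_metadata(metadata: Optional[Dict[str, Any]]) -> str:
    """Serialize user metadata, treating a missing value as an empty dict."""
    return json.dumps(metadata if metadata is not None else {})


before = json.dumps(None)               # the JSON text "null"
after = serialize_user_metadata(None)   # the JSON text "{}"
```

Writing `{}` means readers of the file can always treat the result as a dict without a separate `None` check.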

lchu6 force-pushed the metadata branch from 8207e61 to 76e2130 October 7, 2021 19:36

lchu6 requested a review from fishbone October 7, 2021 19:43

lchu6 assigned fishbone Oct 7, 2021

fishbone reviewed Oct 7, 2021

python/ray/workflow/common.py

fishbone reviewed Oct 7, 2021

python/ray/workflow/common.py

fishbone reviewed Oct 7, 2021

python/ray/workflow/execution.py

fishbone requested changes Oct 7, 2021

fishbone assigned wuisawesome Oct 7, 2021

lchu6 requested a review from ericl as a code owner October 11, 2021 16:55

fishbone requested changes Oct 11, 2021

fishbone reviewed Oct 11, 2021

python/ray/workflow/step_executor.py

lchu6 added 12 commits October 11, 2021 21:42

add metadata put in workflow

63ebaf9

change type hint for step user metadata

263a5cd

put prerun and postrun meta into workflow_storage

d06ab0e

add workflow level metadata put

d618fb4

fix type hint

f685598

re-order code additions

94b4734

change the location of where workflow metadata recorded

7ad2b4c

move application logic out of workflow storage api

882df49

change checkpoint files naming

5aa1b18

fix attribute naming change

bdc54bd

fix type

148bcff

change metadata file names

a0cb8be

add tests for workflow metadata

bbeeb5e

fishbone reviewed Oct 12, 2021

python/ray/workflow/virtual_actor_class.py

fishbone reviewed Oct 12, 2021

python/ray/workflow/tests/test_metadata_put.py

change metadata name

6ae3aa5

lchu6 force-pushed the metadata branch from 93eb53f to 6ae3aa5 October 12, 2021 02:57

add json serialization check for user metadata

c496afa

lchu6 force-pushed the metadata branch from 846c5ab to c496afa October 12, 2021 04:01

lint

4dfc0ed

lchu6 and others added 4 commits October 12, 2021 13:45

‘empty’

7c0fe88

try to fix

db34a29

up

bfb67ee

Merge remote-tracking branch 'upstream/master' into metadata

e4f8f48

fishbone approved these changes Oct 13, 2021

fishbone merged commit ce64e6d into ray-project:master Oct 13, 2021

lchu6 mentioned this pull request Oct 13, 2021

[workflow] fix workflow user metadata return when None is given #19356

Merged

6 tasks

lchu6 deleted the metadata branch October 13, 2021 22:47
