Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metrics] Add metrics on operand and subtask executions #2947

Merged
merged 1 commit into from
Apr 24, 2022

Conversation

zhongchun
Copy link
Contributor

What do these changes do?

  • Add scheduling, band and op metrics
    mars.scheduling.submitted_subtask_count: The count of submitted subtasks to all bands.
    mars.scheduling.finished_subtask_count: The count of finished subtasks of all bands.
    mars.scheduling.canceled_subtask_count: The count of canceled subtasks of all bands.
    mars.band.submitted_subtask_number: The number of submitted subtask to a band.
    mars.band.unsubmitted_subtask_number: The number of unsubmitted subtask to a band.
    mars.operand.executed_number: The number of executed operands.
  • Optimize metrics arguments check

Related issue number

Fixes #xxxx

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

* Add scheduling, band and op metrics
* Optimize metrics arguments check
@qinxuye
Copy link
Collaborator

qinxuye commented Apr 23, 2022

Some metrics named with suffix count, but some with number, looks a bit weird though, is it possible to unify the naming?

@zhongchun
Copy link
Contributor Author

Some metrics named with suffix count, but some with number, looks a bit weird though, is it possible to unify the naming?

Thanks for you suggestion.
_count means metric type is Counter, _number means its type is Gauge. Just to distinguish them from their types.

@qinxuye
Copy link
Collaborator

qinxuye commented Apr 23, 2022

Some metrics named with suffix count, but some with number, looks a bit weird though, is it possible to unify the naming?

Thanks for you suggestion. _count means metric type is Counter, _number means its type is Gauge. Just to distinguish them from their types.

OK, can we have a document about the naming specification about metrics? Just add it to https://github.com/mars-project/mars/tree/master/docs/source/development ?

@qinxuye qinxuye added this to In progress in Distributed via automation Apr 23, 2022
@qinxuye qinxuye added this to PR-In progress in v0.9 Release via automation Apr 23, 2022
@qinxuye qinxuye added this to the v0.9.0rc3 milestone Apr 23, 2022
Copy link
Collaborator

@qinxuye qinxuye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, like I said, we can add a document about adding metrics.

@zhongchun
Copy link
Contributor Author

LGTM overall, like I said, we can add a document about adding metrics.

Ok, I'll add docs soon.

@zhongchun
Copy link
Contributor Author

Some metrics named with suffix count, but some with number, looks a bit weird though, is it possible to unify the naming?

Thanks for you suggestion. _count means metric type is Counter, _number means its type is Gauge. Just to distinguish them from their types.

OK, can we have a document about the naming specification about metrics? Just add it to https://github.com/mars-project/mars/tree/master/docs/source/development ?

Added in #2955, please take a look.

@qinxuye qinxuye changed the title [Metrics] Add some metrics [Metrics] Add metrics for scheduling Apr 24, 2022
@@ -39,6 +40,11 @@
_op_type_to_size_estimator: Dict[Type[OperandType], Callable] = dict()


op_executed_number = Metrics.counter(
"mars.operand.executed_number", "The number of executed operands.", ("op",)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of what kind of objects executed? Please make the name of the counter and the variable more explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about mars.operator_executed_number? And the description is The number of executed operators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I've ignored the prefix.. Already merged.

Copy link
Collaborator

@qinxuye qinxuye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@wjsi wjsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wjsi wjsi merged commit 11dc135 into mars-project:master Apr 24, 2022
Distributed automation moved this from In progress to Done Apr 24, 2022
v0.9 Release automation moved this from PR-In progress to PR-Done Apr 24, 2022
@qinxuye qinxuye changed the title [Metrics] Add metrics for scheduling [Metrics] Add metrics on operand and subtask executions Apr 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Distributed
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants