[Metric] Fix prometheus metric backend #3124

Merged
wjsi merged 5 commits into mars-project:master
Jun 9, 2022

Conversation

zhongchun

@zhongchun zhongchun commented Jun 7, 2022

What do these changes do?

  • Fix an error when executing a task with the Prometheus metric backend
  • Add detailed docs for the Prometheus metric backend

Related issue number

Fixes #3123

The Prometheus metric backend can be used as follows:

Prepare Env

How to use

  1. Create a new Mars session
In [1]: import numpy as np
   ...:
   ...: import mars
   ...: import mars.dataframe as md
   ...:
   ...: session = mars.new_session(n_worker=1, n_cpu=2, web=True, config={"metrics.backend": "prometheus"})
   ...:
Finished startup prometheus http server and port is 46137
Finished startup prometheus http server and port is 32280
Finished startup prometheus http server and port is 39947
Finished startup prometheus http server and port is 25067
Web service started at http://0.0.0.0:11244
  2. Configure and start Prometheus
scrape_configs:
  - job_name: 'mars'

    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:46137', 'localhost:32280', 'localhost:39947', 'localhost:25067']
$ prometheus --config.file=promconfig.yaml
level=info ts=2022-06-07T13:05:01.484Z caller=main.go:296 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2022-06-07T13:05:01.484Z caller=main.go:332 msg="Starting Prometheus" version="(version=2.13.1, branch=non-git, revision=non-git)"
level=info ts=2022-06-07T13:05:01.484Z caller=main.go:333 build_context="(go=go1.13.1, [email protected], date=20191018-01:13:04)"
level=info ts=2022-06-07T13:05:01.485Z caller=main.go:334 host_details=(darwin)
level=info ts=2022-06-07T13:05:01.485Z caller=main.go:335 fd_limits="(soft=256, hard=unlimited)"
level=info ts=2022-06-07T13:05:01.485Z caller=main.go:336 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2022-06-07T13:05:01.487Z caller=main.go:657 msg="Starting TSDB ..."
level=info ts=2022-06-07T13:05:01.488Z caller=web.go:450 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2022-06-07T13:05:01.494Z caller=head.go:514 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2022-06-07T13:05:01.495Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=1
level=info ts=2022-06-07T13:05:01.495Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=1 maxSegment=1
level=info ts=2022-06-07T13:05:01.497Z caller=main.go:672 fs_type=1a
level=info ts=2022-06-07T13:05:01.497Z caller=main.go:673 msg="TSDB started"
level=info ts=2022-06-07T13:05:01.497Z caller=main.go:743 msg="Loading configuration file" filename=promconfig_mars.yaml
level=info ts=2022-06-07T13:05:01.501Z caller=main.go:771 msg="Completed loading of configuration file" filename=promconfig_mars.yaml
level=info ts=2022-06-07T13:05:01.501Z caller=main.go:626 msg="Server is ready to receive web requests."
  3. Run a task
df1 = md.DataFrame(np.random.randint(0, 3, size=(10, 4)),
                   columns=list('ABCD'), chunk_size=5)
df2 = md.DataFrame(np.random.randint(0, 3, size=(10, 4)),
                   columns=list('ABCD'), chunk_size=5)

r = md.merge(df1, df2, on='A').execute()
  4. Check the Prometheus web URL
    http://localhost:9090
    image
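
The port list in the scrape config has to match the ports printed when the session starts, and those ports change on every run. A minimal sketch of generating the config from the startup output follows. `parse_ports` and `build_scrape_config` are hypothetical helper names, not part of Mars or Prometheus, and the sketch assumes the log line format shown in step 1.

```python
import re


def parse_ports(startup_output):
    """Extract metric ports from lines such as
    'Finished startup prometheus http server and port is 46137'."""
    return [int(p) for p in re.findall(r"port is (\d+)", startup_output)]


def build_scrape_config(ports, job_name="mars", scrape_interval="5s", host="localhost"):
    """Render a scrape_configs section with one static target per port."""
    targets = ", ".join(f"'{host}:{port}'" for port in ports)
    return (
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        f"    scrape_interval: {scrape_interval}\n"
        "    static_configs:\n"
        f"      - targets: [{targets}]\n"
    )


if __name__ == "__main__":
    # Startup output copied from the session in step 1.
    output = (
        "Finished startup prometheus http server and port is 46137\n"
        "Finished startup prometheus http server and port is 32280\n"
        "Finished startup prometheus http server and port is 39947\n"
        "Finished startup prometheus http server and port is 25067\n"
    )
    print(build_scrape_config(parse_ports(output)))
```

The printed text can be saved to the config file passed to `prometheus --config.file=...`.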

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@zhongchun zhongchun requested a review from a team as a code owner June 7, 2022 14:40
@qinxuye

qinxuye commented Jun 8, 2022

Nice work, could you please add the description about how to start prometheus backend to the doc as well? Thanks in advance.

@qinxuye qinxuye added the type: bug, to be backported, and mod: metric labels Jun 8, 2022
@qinxuye qinxuye added this to In progress in Distributed via automation Jun 8, 2022
@qinxuye qinxuye added this to PR-In progress in v0.10 Release via automation Jun 8, 2022
@qinxuye qinxuye added this to the v0.10.0a1 milestone Jun 8, 2022
@zhongchun

Nice work, could you please add the description about how to start prometheus backend to the doc as well? Thanks in advance.

OK.

global _metric_backend
_metric_backend = "console"
global _init
_init = False

shutdown_metrics only sets the global variable _init to False. How does it shut down the HTTP server started by the Prometheus metrics backend?


Thanks for your reminder. We should do some cleanup, but there is no stop or shutdown in prometheus_client.
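
For context on the cleanup question, here is a minimal standard-library sketch of the general pattern: an HTTP server can only be stopped later if the code that starts it keeps a handle to the server and its thread. This is an illustration under stated assumptions, not the prometheus_client API and not the code in this PR. `StoppableMetricsServer` and `_DemoMetricsHandler` are hypothetical names, and the handler serves a fixed sample line instead of real metrics.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class _DemoMetricsHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with one fixed sample line."""

    def do_GET(self):
        body = b"demo_metric 1.0\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


class StoppableMetricsServer:
    """Keeps the server and thread handles so the server can be stopped."""

    def __init__(self, port=0):
        # Port 0 asks the OS for a free port.
        self._server = ThreadingHTTPServer(("127.0.0.1", port), _DemoMetricsHandler)
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)

    @property
    def port(self):
        return self._server.server_address[1]

    def start(self):
        self._thread.start()

    def stop(self):
        if self._thread.is_alive():
            self._server.shutdown()  # ends the serve_forever() loop
            self._thread.join()
        self._server.server_close()  # releases the listening socket
```

A shutdown hook could then call `stop()` on the stored instance in addition to resetting a flag such as `_init`.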


@qinxuye qinxuye left a comment


LGTM


@wjsi wjsi left a comment


LGTM

@wjsi wjsi merged commit ff0e925 into mars-project:master Jun 9, 2022
Distributed automation moved this from In progress to Done Jun 9, 2022
v0.10 Release automation moved this from PR-In progress to PR-Done Jun 9, 2022
qinxuye pushed a commit to qinxuye/mars that referenced this pull request Jun 9, 2022
wjsi pushed a commit that referenced this pull request Jun 9, 2022
@qinxuye qinxuye added the backported already label and removed the to be backported label Jun 9, 2022
Labels: backported already, mod: metric, type: bug
Projects: Distributed (Done)
Development

Successfully merging this pull request may close these issues.

[BUG] Error when executing a task with prometheus metric
4 participants