
prometheus support #1507

Open
fujita opened this issue Nov 22, 2017 · 32 comments
@fujita
Member

fujita commented Nov 22, 2017

Is anyone interested in this?

@CoreTex

CoreTex commented Nov 23, 2017

Yes, but more interesting would be Zabbix.

@benerudolph

Maybe there is a common way to handle both Prometheus and other monitoring systems generically?

@nstgt
Contributor

nstgt commented Nov 25, 2017

These metrics would be helpful for us:

  • route counts: adj-in, accepted, and local RIB (for route server mode)
  • per-peer BGP packet counters
  • buffering status
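The metrics listed above map naturally onto the Prometheus text exposition format, which is plain text and can be rendered with the standard library alone. A minimal sketch in Go; the metric names, labels, and values are hypothetical stand-ins, not taken from any gobgp API:

```go
// Render hypothetical gobgp metrics in the Prometheus text exposition format.
package main

import (
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// formatMetric renders one sample, e.g. bgp_routes{peer="10.0.0.1"} 42.
// Labels are sorted so the output is deterministic.
func formatMetric(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	if len(pairs) == 0 {
		return fmt.Sprintf("%s %g", name, value)
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

// writeMetrics renders all samples; in a real exporter an http.HandlerFunc
// registered at /metrics would call this with the ResponseWriter. The
// values here are hard-coded stand-ins for numbers read from the daemon.
func writeMetrics(w io.Writer) {
	fmt.Fprintln(w, formatMetric("bgp_routes", map[string]string{"peer": "10.0.0.1", "rib": "adj_in"}, 42))
	fmt.Fprintln(w, formatMetric("bgp_messages_received_total", map[string]string{"peer": "10.0.0.1"}, 1234))
	fmt.Fprintln(w, formatMetric("bgp_up", nil, 1))
}

func main() {
	writeMetrics(os.Stdout)
}
```

Whether this runs as a goroutine inside gobgpd or as a separate exporter daemon, the scrape endpoint looks the same to Prometheus.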

@kilo-bit

kilo-bit commented Dec 8, 2017

Prometheus support would be greatly appreciated.

@imcom
Contributor

imcom commented Dec 26, 2017

+1 for Prometheus

@tonidas

tonidas commented Jan 4, 2018

+1 for Prometheus!

@xiaopeng163
Contributor

+1 for Prometheus!

@greenpau
Contributor

greenpau commented May 2, 2018

Is anyone interested in this?

@fujita, I am interested 😃 There are at least two ways to do that.

  1. A separate project, gobgp_exporter, which queries the gobgp API.
  2. A separate goroutine in gobgp.

Any preference?

@greenpau
Contributor

greenpau commented May 3, 2018

@fujita , I looked at the gobgp code and I personally like option 1. Could you create a separate project, say gobgp_exporter, for the exporter under osrg? I could put up a PR for the exporter code pretty fast. The client API for gobgp is pretty straightforward, e.g. the showRibInfo() function.

@fujita
Member Author

fujita commented Jun 2, 2018

Sorry about the delay, I've just started to investigate implementation options.

@greenpau why can't we pick the best of options 1 and 2? Having a separate project (exporter) forces users to manage another daemon. Can't we run a goroutine that executes the gRPC API and handles HTTP requests from Prometheus?

I've just created a branch that only exports the received count for each peer:
fujita@090cca2

@greenpau
Contributor

greenpau commented Jun 2, 2018

@fujita, it depends on the circumstances.

  1. Sync vs. async polling... imagine you want to monitor a routing table containing thousands of routes. Do you want the polling to be sync or async?
  2. Containerizing daemons. Once you containerize gobgp with Prometheus built in, the different goroutines compete for the same CPU, because a container gets one CPU.
  3. Reloading configs. How do you reload the Prometheus config?
  4. etc.

Prometheus exporters are designed on a per-app basis. Compare node_exporter and blackbox_exporter and you will get the gist.

@fujita
Member Author

fujita commented Jun 3, 2018

  1. Sync vs. async polling.

I don't understand. I don't see how this is related to the exporter and exposing models.

  2. Containerizing daemons.

1 CPU for a container? Why? You can configure whatever you like.

  3. Reloading configs. How do you reload the Prometheus config?

I had a quick look at the official exporters (mysql, consul). Looks like they don't support config reloading.

@greenpau
Contributor

greenpau commented Jun 4, 2018

@fujita , see below.

1 cpu for a container? Why? You can configure what you like.

Not in containers. With docker for example, you can configure cpusets. However, that setting just allows you to switch between CPUs, it does not allow running on multiple CPUs concurrently.

I don't understand. I don't see how this is related to the exporter and exposing models.

Polling = scraping

Sync scraping means you make a call to the exporter's URL, and data collection starts then. Once the data is collected, it is returned to your scraper.

Async scraping means that data collection runs independently of the scrape calls. In that mode, the scraper gets the data from a data-collection buffer.

Why does that matter? Consider an exporter with hundreds of metrics where data collection takes 1-3s. That's the time your GoBGP server routine competes with the collector routines for CPU time. It obviously does not matter when you collect the status of BGP peers, because that probably takes 0.00001s.

Looks like they don't support the config reloading.

Right. Say you want to change the configuration of the exporter: would you restart GoBGP and drop the neighbor relationships with your peers?

@fujita
Member Author

fujita commented Jun 4, 2018

Not in containers. With docker for example, you can configure cpusets. However, that setting just allows you to switch between CPUs, it does not allow running on multiple CPUs concurrently.

Hmm, really? Inside a single container, can you try running two processes, each executing an endless loop? Then look at the output of the top command on the host OS.

@greenpau
Contributor

greenpau commented Jun 4, 2018

Hmm, really?

Your GoBGP and Prometheus goroutines are not separate processes. They are goroutines; hence they will be on the same CPU.

@greenpau
Contributor

greenpau commented Jun 4, 2018

can you try to run two processes; each execute an endless loop?

@fujita , that would work :-) However, those would NOT be parent and child processes like in a container. They would be two separate parents. In a container, you have a single parent (akin to init) and a bunch of child processes.

@greenpau
Contributor

greenpau commented Jun 4, 2018

By default, processes can run on any CPU. However, processes can be instructed to run on a predetermined selection of CPUs, by changing the affinity of the process. Child processes inherit the CPU affinities of their parents.
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Reference_Guide/chap-Realtime_Reference_Guide-Affinity.html

@fujita , I refreshed my memory on the subject. Here is a snippet to run the test you proposed.

As the sample runs below demonstrate, the amount of real and CPU time
consumed when running the program will depend on intra-core caching
effects and whether the processes are using the same CPU.

@fujita
Member Author

fujita commented Jun 4, 2018

can you try running two processes, each executing an endless loop?

@fujita, that would work :-) However, those would NOT be parent and child processes like in a container. They would be two separate parents. In a container, you have a single parent (akin to init) and a bunch of child processes.

Please run a process that creates two child processes executing an endless loop, and waits forever. Then see the output of top on the host.

@greenpau
Contributor

greenpau commented Jun 5, 2018

Please run a process that creates two child processes executing an endless loop, and waits forever. Then see the output of top on the host.

@fujita , yes, two processes running in a container can be on separate CPUs. My mistake.

The test script:

import os

def child():
    # the child just reports its PID and exits
    print('\nA new child ', os.getpid())
    os._exit(0)

def parent():
    while True:
        newpid = os.fork()
        if newpid == 0:
            # fork() returns 0 in the child process...
            child()
        else:
            # ...and the child's PID in the parent
            pids = (os.getpid(), newpid)
            print("parent: %d, child: %d\n" % pids)
        reply = input("q for quit / c for new fork")
        if reply == 'c':
            continue
        else:
            break

parent()

Running the test:

# python test.py
parent: 29, child: 30

q for quit / c for new fork('\nA new child ', 30)

The two processes are 1847 (parent) and 1848 (child):

$ ps -ef | grep python
root      1847  1817  0 20:05 pts/3    00:00:00 python test.py
root      1848  1847  0 20:05 pts/3    00:00:00 [python] <defunct>

CPU Sets:

# cat /proc/1847/status  | grep Cpus
Cpus_allowed:	f
Cpus_allowed_list:	0-3

# cat /proc/1848/status  | grep Cpus
Cpus_allowed:	f
Cpus_allowed_list:	0-3

The CPU each process last ran on (field 39 of /proc/<pid>/stat):

#  cat /proc/1848/stat | cut -d' ' -f39
1

#  cat /proc/1847/stat | cut -d' ' -f39
0

@fujita
Member Author

fujita commented Jun 5, 2018

Good, now you understand that goroutines DON'T compete for a single CPU inside a container. A container is just namespaces and processes from the perspective of the Linux kernel, so it can use as many CPUs as you have.

For me, the reason some official exporters don't support reloading is that it's not very useful. So I'll go with the current design. If someone needs reloading, it's pretty easy to implement their own gobgp exporter by using gobgp/exporter as a package; they just need a main function.

@greenpau
Contributor

greenpau commented Jun 5, 2018

now you understand goroutines DON'T compete for a single CPU inside a container.

@fujita , overconfidence 😆 they DO compete when there is only 1 CPU to compete over 😄 They also compete when you have more goroutines than CPUs. I get that an application, over time, might use all CPUs. How do you know that goroutines are being load-balanced across all CPUs?

@fujita
Member Author

fujita commented Jun 5, 2018

There is no difference between the exporter design and the current design in that respect.

@greenpau
Contributor

greenpau commented Jun 5, 2018

FYI, check this article for CPU and memory profiling in Go.

@jeffbean
Contributor

Tally allows you to define and emit metrics in different ways. It's an option that lets you configure multiple ways of collecting/emitting metrics. Just a suggestion.

@greenpau
Contributor

@fujita , I started with an external exporter here: https://github.com/greenpau/gobgp_exporter
Would love to get your feedback on the code. There are probably some issues with concurrency; I'm still getting up to speed with Go. I spent my weekend on this 😄

@fujita
Member Author

fujita commented Aug 23, 2018

@greenpau nice, but why did you modify the pb.go file?

@greenpau
Contributor

greenpau commented Aug 24, 2018

why did you modify the pb.go file?

@fujita , I guess the primary reason is that I want to control the timeout on the client.

Collect() will be called on each scrape. If the daemon does not respond, the scrape holds the session and does not time out right away. This results in hanging scrapes. It is not long, but it should time out within a second or two.
In my last commit, I added logic so that the GoBGP polling and the scraping happen asynchronously. There is also some concurrency logic.

Additionally (somewhat related, but not quite), since the changes to the API (hiding it behind /internal), the only way to get at some functionality (e.g. things related to ptype, any, etc.) is by adding extra functions the way I did with the extended client. Not ideal. I would prefer to have that code upstream, but I don't know whether you want it there.

@greenpau
Contributor

I think I am close to getting some of the things mentioned here done.

@fujita
Member Author

fujita commented Aug 25, 2018

Why not use context.WithTimeout?

@greenpau
Contributor

@fujita, because I don’t know how context works. I will take time to read up on it.

@greenpau
Contributor

Why not use context.WithTimeout?

@fujita , I was reading up on Go's context package. Say I implement the context with a timeout, and the timeout gets triggered. How do I close the connection? (Ideally, in case of a timeout, I would prefer not to keep half-closed connections.) How do I assess whether the connection is still alive?

@nemith

nemith commented Oct 9, 2020

@greenpau you do a select on ctx.Done() at some point in your connection handling. This is automatic with the net package, I believe.
