Jaeger grafana mixin is broken #2956

Closed
bravecobra opened this issue Apr 26, 2021 · 20 comments · Fixed by #3331

@bravecobra

Describe the bug
I tried following the mixin documentation to generate the CRDs for monitoring Jaeger in Grafana, but the mixin is broken because the referenced git repos have moved. No idea where to begin to fix this.

The generated files lack documentation on how to use them, so I'm stuck.

To Reproduce
Steps to reproduce the behavior:

  1. Just follow the docs at https://github.com/jaegertracing/jaeger/tree/master/monitoring/jaeger-mixin

Expected behavior
A working config to apply to Kubernetes to monitor Jaeger. No idea why this mixin stuff is being added, as it's way too complicated for a simple dashboard in Grafana.

@jpkrohling
Contributor

@gouthamve, would you be interested in maintaining the mixin? Otherwise, it might be better to remove it, and point people to a blog post instead.

@bravecobra
Author

I'd be happy with a blog post, as long as I can get monitoring with Prometheus and Grafana to work. I already have a prometheus-operator instance and a jaeger-operator instance running. It's just a matter of hooking them up.

@jpkrohling
Contributor

Have you seen this blog post? This was made at the time I added the mixin to this repository:

https://developers.redhat.com/blog/2019/08/28/build-a-monitoring-infrastructure-for-your-jaeger-installation/

@bravecobra
Author

@jpkrohling Yes, I saw that one, but it fails at the jb install step because the repo references are out of date. That blog post is no longer usable in its current state.

@jpkrohling
Contributor

Sorry about that. I knew it was risky to use quite a few tools in that blog post, even though they are an example of what people might use in production to provision their resources. In any case, you can get an initial setup working with simpler steps. I can help you with the Jaeger parts of the equation, but can't really provide much help with Prometheus/Grafana beyond what they have in their docs. I would recommend taking a look at kube-prometheus for those parts. The quick start from its readme should give you enough to have a Grafana dashboard with Prometheus pre-configured as a data source.
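For reference, the quick start boils down to roughly the following (a sketch only; check the kube-prometheus readme for the current commands, since the repository URL and manifest layout here are assumptions):

git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl create -f manifests/setup   # CRDs and the prometheus-operator itself
kubectl create -f manifests/        # Prometheus, Alertmanager, Grafana, exporters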

Also make sure that you have a Jaeger instance running, probably provisioned by the jaeger-operator.
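If you don't have an instance yet, the smallest possible Jaeger CR looks like this (it deploys the all-in-one image with in-memory storage; the name simplest is just the conventional example name):

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest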

Once you have that ready, you'll need to either create a PodMonitor with a selector that matches all your Jaeger instances, or create a Service (backed by the same pods) plus a ServiceMonitor. The service should expose the Jaeger monitoring port, which is 14269 if you are using the collector as a separate component. Once the PodMonitor/ServiceMonitor is in place, the prometheus-operator should update your Prometheus config to add the Jaeger instances to the target list, which might take a few minutes.
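A minimal sketch of the PodMonitor variant, assuming the Jaeger pods carry an app: jaeger label and name their admin port (14269) admin-http; verify both against your actual pod spec:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger              # assumption: matches all Jaeger pods in this namespace
  podMetricsEndpoints:
  - port: admin-http           # assumption: the name of the container port serving 14269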

The jaeger-operator wouldn't be relevant here unless you want to monitor the operator itself, in which case it works the same as above, with your PodMonitor/Service backed by the operator pods instead.

Let me know if you have enough to get unblocked, or if you need more clarification on specific steps.

@bravecobra
Author

Yeah, that was pretty much what I hoped the mixin would provide and how I understood it would work. Having to craft the CRDs myself, though, would take much longer. Is there no one who can provide the generated CRDs as an example to start from? I already have Jaeger, Prometheus and Grafana up and running using their operators. I'm only missing the extra service and the service monitor to get the data into Prometheus. Hooking up a dashboard from there shouldn't be that hard, unless that all changed in the meantime. (I found a couple at https://grafana.com/grafana/dashboards?search=jaeger)
Btw, thanks for taking the time to figure this out.

@jpkrohling
Contributor

I'd need more time to give you a configuration that I know works, but the getting started guide from the Prometheus operator has an example of both a service and a service monitor, which I copy here:

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

The port in the service has to be 14269. The selector in the service should be the same as the one on the service that backs the collector deployment. Using the simplest example, this would be the selector:

  selector:
    app: jaeger
    app.kubernetes.io/component: all-in-one
    app.kubernetes.io/instance: simplest
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: simplest
    app.kubernetes.io/part-of: jaeger

You can also leave some labels out, like the name and the instance ones, which would likely match more pods. Make sure not to have query and collector pods as part of the same service, as they use different ports for their metrics.

The service should have a set of labels like these, if you want it to resemble the service that fronts the collector deployments:

  labels:
    app: jaeger
    app.kubernetes.io/component: service-collector
    app.kubernetes.io/instance: simplest
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: simplest-collector
    app.kubernetes.io/part-of: jaeger

Those labels would then be part of the selector for the service monitor. Again, you can remove a few of them, but make sure to keep at least the component.

Hope this helps.

@bravecobra
Author

Perfect, that should get me started. I'll keep you posted on the progress. Once I have a working example, I'll make sure to document it for future reference and for validation of whether the approach is production-safe.

@bravecobra
Author

@jpkrohling, following your instructions, I seem to have it working now.
So, I deployed an all-in-one jaeger instance with the jaeger-operator. I also deployed a prometheus instance with the prometheus-operator. Now let's tie them together and let prometheus monitor the jaeger instance.

I added an extra service, jaeger-admin, pointing to the admin port of the jaeger instance. It would be nice if the jaeger-operator could create this instead.

kind: Service
apiVersion: v1
metadata:
  name: jaeger-admin
  labels:
    app: jaeger
  namespace: infrastructure
spec:
  selector:
    app: jaeger
    app.kubernetes.io/component: all-in-one
    app.kubernetes.io/instance: jaeger
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/part-of: jaeger
  ports:
  - name: admin-http
    port: 14269

Note that I deployed my jaeger instance in an infrastructure namespace, so the jaeger-admin service logically lives in that same namespace. You might want to change that accordingly.

Then I added a ServiceMonitor to prometheus to let it scrape the metrics from the new jaeger-admin service.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jaeger
  namespace: infrastructure
  labels:
    app: jaeger
    release: prometheus
spec:
  jobLabel: jaeger-metrics
  selector:
    matchLabels:
      app: jaeger
  namespaceSelector:
    matchNames:
    - infrastructure
  endpoints:
  - port: admin-http

Again, change the namespace depending on where you deployed the jaeger instance.
Then I gave prometheus a hint to reload its configuration (https://www.robustperception.io/reloading-prometheus-configuration) by hitting the reload endpoint: curl -X POST http://localhost:9090/-/reload
That made prometheus start scraping the jaeger-admin service and it appeared among the targets.
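Concretely, that was just a port-forward plus the reload call (prometheus-operated is the service the prometheus-operator creates for my instance; adjust the service name and namespace to your setup, and note the operator's config-reloader usually picks up new ServiceMonitors on its own after a few minutes, so this step may not even be necessary):

kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring &
curl -X POST http://localhost:9090/-/reload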
[screenshot: the jaeger-admin target listed among the Prometheus targets]

I then imported a Jaeger dashboard into Grafana (https://grafana.com/grafana/dashboards/10001), which gave me output there:
[screenshot: the Jaeger dashboard rendering data in Grafana]

Success!!
Adding a new PrometheusRule to report problems might be a nice future addition.
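As a rough illustration of what such a rule could look like (the alert name, metric, and threshold below are made up for the example; check which metric names your Jaeger version actually exposes):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: jaeger
  namespace: infrastructure
  labels:
    app: jaeger
    release: prometheus
spec:
  groups:
  - name: jaeger.rules
    rules:
    - alert: JaegerCollectorDroppingSpans                         # hypothetical alert
      expr: rate(jaeger_collector_spans_dropped_total[5m]) > 0    # verify the metric name for your version
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: Jaeger is dropping spans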

In hindsight, that was all relatively trivial, but somehow this is missing from the documentation. The mixin part really threw me off, and since it's broken and unmaintained, I would suggest taking it out if no one is willing to pick it up.
This might be a nice topic for a blog post and/or some documentation addition.

As the report is about the mixin being broken, I'll let you guys decide what to do with the status of this issue.

@jpkrohling
Contributor

Adding a new PrometheusRule to report problems might be a nice future addition.

Could you open a feature request in the jaeger-operator repository?

The mixin part really threw me off, and since it's broken and unmaintained, I would suggest taking it out if no one is willing to pick it up.

Would you be willing to send a PR removing it?

This might be a nice topic for a blog post and/or some documentation addition.

I think it would be great material for this: https://www.jaegertracing.io/docs/1.22/monitoring/ . How do you feel about helping us here? This is the source of that page: https://github.com/jaegertracing/documentation/blob/master/content/docs/1.22/monitoring.md . Perhaps a sub-page would make sense? Something like "Monitoring with Prometheus on Kubernetes"?

@Kampe

Kampe commented Jul 29, 2021

Does the operator not have a flag to expose metrics over admin ports at the service level?

@jpkrohling
Contributor

Not via the regular service. I think we have a feature request to create a monitoring service, exposing only this port.

@esnible
Contributor

esnible commented Sep 23, 2021

I also tried to follow the mixin docs and failed. I can give you my detailed notes. I don't want to attempt a PR yet because I never did get it running. I did make progress.

go get github.com/google/go-jsonnet/cmd/jsonnet gives the warning "installing executables with 'go get' in module mode is deprecated." I am using Go 1.17; I assume with Go 1.18 it won't work at all. See https://golang.org/doc/go-get-install-deprecation

I installed with

go install github.com/google/go-jsonnet/cmd/jsonnet@latest
go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest

The jb install instructions fail with "error: pathspec 'master' did not match any file(s) known to git". To fix this, the last thing installed needed to change from github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@master to github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@main.
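In other words, the last line of the readme's jb install command becomes:

jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@main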

I tried to follow the instructions to make the complete monitoring stack. jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet failed with "RUNTIME ERROR: couldn't open import "kube-prometheus/kube-prometheus.libsonnet": no match locally or in the Jsonnet library paths". The file was renamed in January as part of prometheus-operator/kube-prometheus@1eedb90#diff-d262f5c7fd279c6ca3e9e1e56fac31c64b41c94490873b182683b35ecf4c2cec but the doc wasn't changed at that time.
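Presumably the matching fix in monitoring-setup.jsonnet is to import the renamed file (main.libsonnet in current kube-prometheus, though verify the name against your vendor directory):

// before
local kp = import 'kube-prometheus/kube-prometheus.libsonnet';
// after (assuming the file is now kube-prometheus/main.libsonnet)
local kp = import 'kube-prometheus/main.libsonnet';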

At that point kubectl apply -f manifests/ makes progress, but complains because of an "error validating "manifests/dashboards-jaeger.json": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false"
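That dashboards-jaeger.json is a Grafana dashboard, not a Kubernetes object, so the validation error is expected; a simple workaround (just a sketch) is to move it aside and import it into Grafana separately:

mkdir -p dashboards
mv manifests/dashboards-jaeger.json dashboards/
kubectl apply -f manifests/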

(At this point 13 pods are running successfully in the default namespace. I tried to put them into the observability namespace, but failed.)

@odidev

odidev commented Oct 13, 2021

Hi Team,

I am also trying to deploy monitoring for Jaeger on Linux, following the documentation.

As pointed out by @esnible in the above comment, I am also facing the same issues while executing the jsonnet command. Below are the error logs:

$ jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet 

RUNTIME ERROR: couldn't open import "kube-prometheus/kube-prometheus.libsonnet": no match locally or in the Jsonnet library paths 

Are you planning to update the monitoring documentation to resolve these issues?

@jpkrohling
Contributor

Yes, I have this on my queue.

@jpkrohling
Contributor

I submitted a PR to get this fixed. Would anyone be interested in trying it out and giving me feedback?

@odidev

odidev commented Oct 21, 2021

@jpkrohling, thank you for this PR.

I followed the updated mixin documentation, and I am getting some issues while executing the jsonnet command:

$ ~/bin/jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet
RUNTIME ERROR: Unexpected type function
        vendor/kube-prometheus/components/grafana.libsonnet:(41:16)-(71:4)      object <anonymous>
        vendor/kube-prometheus/components/grafana.libsonnet:73:11-15
        monitoring-setup.jsonnet:36:34-50       object <anonymous>
        During manifestation

Can you please provide some pointers on this?

@jpkrohling
Contributor

I'm on this, and I'm not sure yet what's wrong. For now, remove the grafana part from your jsonnet file:

{ ['grafana-' + name + '.json']: kp.grafana[name] for name in std.objectFields(kp.grafana) } +

@jpkrohling
Contributor

Looks like it was a temporary fluke somewhere. Could you please try removing your vendor directory and running the jb install commands again before retrying the jsonnet command? Also, I submitted a new PR to fix a small issue with the example as shown in the readme file: #3336
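To spell out that vendor reset (assuming you're in the jaeger-mixin directory with the readme's jsonnetfile.json already in place):

rm -rf vendor
jb install                      # re-fetches the dependencies pinned in jsonnetfile.json
jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet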

@odidev

odidev commented Oct 25, 2021

@jpkrohling, thanks for the updates. All the commands run successfully now, and a PodMonitor named “tracing” has been created.

$ kubectl get PodMonitor tracing -n observability 

NAME      AGE 
tracing   37s 
