
Conversation

@andrzej-stencel
Contributor

@andrzej-stencel andrzej-stencel commented Feb 18, 2025

Proposed commit message

Update github.com/prometheus/prometheus dependency to v0.300.1
by running go get github.com/prometheus/[email protected].
This version matches the version used in OTel v0.120.0,
and this update is needed to include OTel v0.120.0 in Elastic Agent.
See https://github.com/elastic/elastic-agent/pull/6912/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6R496.

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@andrzej-stencel andrzej-stencel requested a review from a team as a code owner February 18, 2025 13:47
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 18, 2025
@mergify
Contributor

mergify bot commented Feb 18, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR, @andrzej-stencel? 🙏
For such fixes, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch, where /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@andrzej-stencel andrzej-stencel added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-8.x Automated backport to the 8.x branch with mergify labels Feb 18, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 18, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@andrzej-stencel
Contributor Author

cc @blakerouse for 👀 as you created a similar PR before.

Added `fallbackType == ""` and `skipOMCTSeries == false`.
OMCT series are OpenMetrics CreatedTimestamp timeseries,
and I believe the default for this is false.
@andrzej-stencel
Contributor Author

andrzej-stencel commented Feb 18, 2025

I've pushed a commit to fix the compilation error by adding missing parameters to the textparse.New call.

See [email protected] (before this change) and [email protected] (after this change).

I've set the fallbackType argument to an empty string ("") and skipOMCTSeries to false.
OMCT series are OpenMetrics CreatedTimestamp timeseries,
and I believe the default for this is false.
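
For reference, a minimal sketch of what such a call could look like, following the New signature quoted later in this thread; the package, function, and variable names here are illustrative, not the actual Beats code:

package prometheus // illustrative package name, not the Beats package

import (
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/model/textparse"
)

// newParser builds a parser with an empty fallbackType (no fallback when the
// Content-Type header is missing or unrecognized) and skipOMCTSeries=false
// (keep OpenMetrics CreatedTimestamp series). labels.NewSymbolTable is assumed
// to be the symbol table constructor in this Prometheus version.
func newParser(body []byte, contentType string) (textparse.Parser, error) {
	st := labels.NewSymbolTable()
	return textparse.New(body, contentType, "" /* fallbackType */, false /* parseClassicHistograms */, false /* skipOMCTSeries */, st)
}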

@andrzej-stencel andrzej-stencel marked this pull request as ready for review February 18, 2025 14:47
@andrzej-stencel andrzej-stencel requested a review from a team as a code owner February 18, 2025 14:47
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Feb 18, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz cmacknz requested a review from a team February 18, 2025 15:48
@cmacknz
Member

cmacknz commented Feb 18, 2025

Pinging @elastic/obs-infraobs-integrations as they own the prometheus integration that depends on this and are most likely to know how to test any impact this might have. https://github.com/elastic/integrations/blob/119156a21359b7a56756e91743b18d6970b9c643/packages/prometheus/manifest.yml#L35

@MichaelKatsoulis
Contributor

In https://github.com/elastic/beats/blob/26fecc126fe80b648471e401a7b175cf2165ab1b/metricbeat/mb/testing/testdata.go#L119C6-L119C23 the default content type is application/json, which is not supported as a media type by the new textparse.New (https://github.com/prometheus/prometheus/blob/1f56e8492c31a558ccea833027db4bd7f8b6d0e9/model/textparse/interface.go#L128):

func New(b []byte, contentType, fallbackType string, parseClassicHistograms, skipOMCTSeries bool, st *labels.SymbolTable) (Parser, error) {
	mediaType, err := extractMediaType(contentType, fallbackType)
	// err may be nil or something we want to warn about.

	switch mediaType {
	case "application/openmetrics-text":
		return NewOpenMetricsParser(b, st, func(o *openMetricsParserOptions) {
			o.SkipCTSeries = skipOMCTSeries
		}), err
	case "application/vnd.google.protobuf":
		return NewProtobufParser(b, parseClassicHistograms, st), err
	case "text/plain":
		return NewPromParser(b, st), err
	default:
		return nil, err
	}
}

For the same behaviour as in previous versions we could use text/plain instead.
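
To make the comparison concrete, a sketch (not the actual testdata.go change) of how the two content types behave with the New function quoted above; the helper name and package are made up:

package example // illustrative only

import (
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/model/textparse"
)

// parseTestBody shows the difference: "application/json" is not handled by
// the switch in New above, so it returns a nil parser, while "text/plain"
// selects the classic Prometheus text parser, matching the old behaviour.
func parseTestBody(body []byte) (textparse.Parser, error) {
	st := labels.NewSymbolTable()
	// p, err := textparse.New(body, "application/json", "", false, false, st) // p == nil
	return textparse.New(body, "text/plain", "", false, false, st)
}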

@MichaelKatsoulis
Contributor

MichaelKatsoulis commented Feb 21, 2025

> @MichaelKatsoulis have you been able to reproduce this test failure locally? I tried on main after bumping the k8s libraries, and it didn't fail once in 100 runs.

@swiatekm
Yes, it is failing many times locally. Did you clean the Go test cache before running the tests?
I was using:

go clean -testcache && /usr/local/go/bin/go test -timeout 30s -run ^TestNewLeaderElectionManager$ github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes -v

Also the PR CI was failing almost every time. Check one Buildkite output that shows exactly that.

I updated the test logic because I think it was faulty in the part where it forced the lease renewals; I believe it is better now.
I also tested in a real cluster and there was no problem with the new library, so there was no bug.

@MichaelKatsoulis
Contributor

For me the PR is good to be merged!

@MichaelKatsoulis
Contributor

@elastic/obs-infraobs-integrations based on a concern raised by @tommyers-elastic, do you believe that the compatibility of the new github.com/prometheus/prometheus v0.300.1 library with the older Prometheus 2.x versions should be tested?

@tommyers-elastic
Contributor

I have found at least one way this can cause a breaking change for Prometheus 2.x users: it can occur when scraping targets that do not send Content-Type headers, as described here.

Running the prometheus collector Metricbeat module on main works fine for such endpoints, but running with the changes here results in errors and no metrics reported.

{
    "@timestamp": "2025-02-24T20:37:21.290Z",
    "ecs": {
      "version": "8.0.0"
    },
    "host": {
      "name": "Toms-MacBook-Pro.local"
    },
    "agent": {
      "id": "77b8b5ed-245e-4ddc-a98c-195ffa174328",
      "name": "Toms-MacBook-Pro.local",
      "type": "metricbeat",
      "version": "9.1.0",
      "ephemeral_id": "f47013a6-7c33-42a6-a227-44a52db2a257"
    },
    "metricset": {
      "name": "collector",
      "period": 10000
    },
    "service": {
      "address": "http://localhost:9188/metrics",
      "type": "prometheus"
    },
    "event": {
      "dataset": "prometheus.collector",
      "module": "prometheus",
      "duration": 1019196708
    },
    "error": {
      "message": "unable to decode response from prometheus endpoint: failed to parse families: non-compliant scrape target sending blank Content-Type and no fallback_scrape_protocol specified for target"
    }
  }
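
For anyone wanting to reproduce this locally, here is a minimal sketch of a scrape target that deliberately sends no Content-Type header (the port matches the error above; the metric name is made up):

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		// Setting the header map entry to nil suppresses Go's automatic
		// content-type sniffing, so the response carries no Content-Type at all.
		w.Header()["Content-Type"] = nil
		fmt.Fprintln(w, "# HELP demo_requests_total A hypothetical counter.")
		fmt.Fprintln(w, "# TYPE demo_requests_total counter")
		fmt.Fprintln(w, "demo_requests_total 42")
	})
	log.Fatal(http.ListenAndServe(":9188", nil))
}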

The OTel Collector Prometheus receiver actually suffered from the same bug after it recently upgraded to Prometheus 0.300.x, see open-telemetry/opentelemetry-collector-contrib#38018.

I am continuing to test, but right now I don't think we can merge this without more thorough consideration of breaking changes and more testing and validation.

FYI @lalit-satapathy

@ArthurSens

> I have found at least one way this can cause a breaking change for Prometheus 2.x users: it can occur when scraping targets that do not send Content-Type headers, as described here. [...] I am continuing to test, but right now I don't think we can merge this without more thorough consideration of breaking changes and more testing and validation.

Hey 👋 -- Prometheus maintainer here and the person who updated the Prometheus dependency in the collector (I got here after @andrzej-stencel pinged me 😄).

Your comment is spot on! This is a breaking change in Prometheus, but it doesn't necessarily need to be one in the collector and its distributions. Prometheus started to fail scrapes whose content negotiation yields no Content-Type header, but that's where the new fallback_scrape_protocol config comes into play.

In the collector, during start-up we read the provided configuration for the Prometheus receiver and, if fallback_scrape_protocol is not set, we add fallback_scrape_protocol: PrometheusText0.0.4 to every scrape configuration.
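
For illustration, the effective scrape configuration then looks roughly like this (a sketch; the job name and target are placeholders, and in the collector this default is injected in code rather than written by the user):

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: example                         # placeholder
          fallback_scrape_protocol: PrometheusText0.0.4
          static_configs:
            - targets: ["localhost:9188"]           # placeholder target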

@ArthurSens

We also noticed breaking changes once we upgraded the prometheus/common dependency to 0.62.0, which relaxed its requirements about what counts as a valid metric/label name: now every UTF-8 character is valid, not just the legacy set of ASCII letters, digits, underscores, and colons.

I've noticed that in this PR you managed to keep prometheus/common below 0.62, but the next releases of Prometheus require it. I just wanted to make sure you're aware :)
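
To illustrate the name-validation change, a sketch assuming the prometheus/common model API (NameValidationScheme, UTF8Validation, LegacyValidation, and IsValidMetricName are recalled from memory and should be verified against v0.62.0):

package main

import (
	"fmt"

	"github.com/prometheus/common/model"
)

func main() {
	name := model.LabelValue("http.requests.total") // dots were previously invalid

	// Assumed API: UTF-8 validation (the default from common v0.62.0 on)
	// accepts essentially any UTF-8 string as a metric/label name.
	model.NameValidationScheme = model.UTF8Validation
	fmt.Println(model.IsValidMetricName(name)) // true

	// Assumed API: the legacy scheme only allows ASCII letters, digits,
	// underscores, and colons.
	model.NameValidationScheme = model.LegacyValidation
	fmt.Println(model.IsValidMetricName(name)) // false
}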

@cmacknz
Member

cmacknz commented Feb 25, 2025

I am removing the EDOT collector's dependency on Beats in elastic/elastic-agent#7023 in the 9.0 branch to give us more time to sort this out.

@khushijain21
Contributor

khushijain21 commented Mar 11, 2025

Is this still required for updating the OTel components to v0.120.0? I'm asking because the parent PR needs to be merged into elastic-agent to pick up the recent changeset from the ES exporter.

See the blocked issues: elastic/elastic-agent#7233 (comment)

@mergify
Contributor

mergify bot commented Mar 27, 2025

This pull request is now in conflict. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b update-prometheus-to-match-otel-v0.120.0 upstream/update-prometheus-to-match-otel-v0.120.0
git merge upstream/main
git push upstream update-prometheus-to-match-otel-v0.120.0

@mauri870
Member

mauri870 commented Apr 1, 2025

This PR is blocking multiple tasks related to the OTel SRE work. Does anyone have any updates on this? According to the most recent messages, we found some breaking changes with the proposed fix, which are preventing it from being merged.

@ishleenk17
Member

Please follow this issue #42916 for ongoing Prometheus Migration tasks.

@mauri870
Member

mauri870 commented Apr 7, 2025

FYI we migrated directly to OTel v0.121.0 in elastic/elastic-agent#7686. Closing this PR since it is now obsolete.

@mauri870 mauri870 closed this Apr 7, 2025
@mauri870 mauri870 reopened this Apr 7, 2025
@mergify
Contributor

mergify bot commented Apr 7, 2025

This pull request is now in conflict. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b update-prometheus-to-match-otel-v0.120.0 upstream/update-prometheus-to-match-otel-v0.120.0
git merge upstream/main
git push upstream update-prometheus-to-match-otel-v0.120.0

@mauri870
Member

mauri870 commented Apr 7, 2025

Sorry, I did not mean to close this, I thought it was elastic/elastic-agent#6912. Apologies.

@rdner
Member

rdner commented Apr 22, 2025

@andrzej-stencel @cmacknz do we still need this PR, or did we solve it elsewhere?

@tommyers-elastic
Contributor

@rdner we can close this, it's being addressed in #43540.
