textfile: the same metric occurring with different labels in two different prom files trigger error due to inconsistent help message in processFile #1885

Closed
zmedico opened this issue Nov 14, 2020 · 14 comments · Fixed by #2475



zmedico commented Nov 14, 2020

Host operating system: output of uname -a

Linux hostname 5.4.65 #1 SMP Tue Sep 15 21:22:36 UTC 2020 x86_64 AMD EPYC 7551P 32-Core Processor AuthenticAMD GNU/Linux

node_exporter version: output of node_exporter --version

# node_exporter --version
node_exporter, version 1.0.1 (branch: HEAD, revision: 3715be6ae899f2a9b9dbfd9c39f3e09a7bd4559f)
  build user:       root@1f76dbbcfa55
  build date:       20200616-12:44:12
  go version:       go1.14.4

node_exporter command line flags

node_exporter -- --collector.textfile.directory=/var/lib/node_exporter/ --collector.bonding --collector.buddyinfo --collector.ntp --no-collector.infiniband --no-collector.nfs --no-collector.nfsd --no-collector.wifi --no-collector.zfs --web.listen-address=127.0.0.1:9100

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

Write two different prom files to the textfile collector directory, containing metrics with the same names but different labels. For example:

$ cat /var/lib/node_exporter/app-instance-1.prom
# TYPE app_metric_name counter
app_metric_name{instance="1"} 0
$ cat /var/lib/node_exporter/app-instance-2.prom
# TYPE app_metric_name counter
app_metric_name{instance="2"} 0

What did you expect to see?

Expected metrics with the same name but different labels to be collected from both prom files.

What did you see instead?

Metrics were successfully collected from the first prom file, while metrics from the second prom file were rejected by the prometheus.checkMetricConsistency function at prometheus/registry.go#L914.
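
The same consistency failure can be reproduced outside node_exporter with a small client_golang program (a hypothetical sketch; the collector type and values are made up to mirror the two files above):

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// inconsistentCollector is a hypothetical collector that emits two samples of
// the same metric name whose Desc objects carry different help strings,
// mimicking what the textfile collector does when it derives the default
// help text from each file's path.
type inconsistentCollector struct{}

func (inconsistentCollector) Describe(chan<- *prometheus.Desc) {}

func (inconsistentCollector) Collect(ch chan<- prometheus.Metric) {
	d1 := prometheus.NewDesc("app_metric_name",
		"Metric read from /var/lib/node_exporter/app-instance-1.prom",
		nil, prometheus.Labels{"instance": "1"})
	d2 := prometheus.NewDesc("app_metric_name",
		"Metric read from /var/lib/node_exporter/app-instance-2.prom",
		nil, prometheus.Labels{"instance": "2"})
	ch <- prometheus.MustNewConstMetric(d1, prometheus.CounterValue, 0)
	ch <- prometheus.MustNewConstMetric(d2, prometheus.CounterValue, 0)
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(inconsistentCollector{})
	// Gather applies the registry's consistency checks and returns the same
	// "has help ... but should have ..." error quoted below.
	if _, err := reg.Gather(); err != nil {
		fmt.Println(err)
	}
}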

The inconsistency can be avoided by replacing the full path of the prom file with the parent directory name, like this:

diff --git a/collector/textfile.go b/collector/textfile.go
index 50c1807..f464d44 100644
--- a/collector/textfile.go
+++ b/collector/textfile.go
@@ -254,7 +254,7 @@ func (c *textFileCollector) processFile(name string, ch chan<- prometheus.Metric
 
 	for _, mf := range families {
 		if mf.Help == nil {
-			help := fmt.Sprintf("Metric read from %s", path)
+			help := fmt.Sprintf("Metric read from %s", c.path)
 			mf.Help = &help
 		}
 	}

The error was not logged by default, but I was able to log it with the patch below, which reports:
* [from Gatherer #2] collected metric app_metric_name label:<name:"instance" value:"2" > counter:<value:0 > has help "Metric read from /var/lib/node_exporter/app-instance-2.prom" but should have "Metric read from /var/lib/node_exporter/app-instance-1.prom"

--- node_exporter-1.0.1/node_exporter.go
+++ node_exporter-1.0.1/node_exporter.go
@@ -31,6 +31,7 @@
 	"github.com/prometheus/node_exporter/collector"
 	"github.com/prometheus/node_exporter/https"
 	kingpin "gopkg.in/alecthomas/kingpin.v2"
+	stdlog "log"
 )
 
 // handler wraps an unfiltered http.Handler but uses a filtered handler,
@@ -121,6 +122,7 @@
 	handler := promhttp.HandlerFor(
 		prometheus.Gatherers{h.exporterMetricsRegistry, r},
 		promhttp.HandlerOpts{
+			ErrorLog:            stdlog.New(os.Stderr, "", stdlog.LstdFlags),
 			ErrorHandling:       promhttp.ContinueOnError,
 			MaxRequestsInFlight: h.maxRequests,
 			Registry:            h.exporterMetricsRegistry,
zmedico changed the title from "textfile: the same metric occurring with different labels in two different prom files due to inconsistent help message in processFile" to "textfile: the same metric occurring with different labels in two different prom files trigger error due to inconsistent help message in processFile" on Nov 14, 2020

zmedico commented Nov 14, 2020

Looks like this is the same root cause as #1600.

discordianfish commented:

> The inconsistency can be avoided by replacing the full path of the prom file with the parent directory name, like this:

Might be a good option. @SuperQ wdyt? And maybe @juliusv has an opinion on this.

brian-brazil commented:

IIRC, @SuperQ previously said that the use case of having the same metric in two files was explicitly not supported.

The directory name wouldn't be much use, as it'd be the same across all metrics, so we'd no longer be providing a default value that helps you figure out which metric came from which file.

discordianfish commented:

Makes sense, but I'd still like to support the use case of having two separate processes report the same metric via the textfile collector. If they had to write to the same metrics file, they would need to coordinate/merge their state somehow, which seems annoying. Any alternative suggestions?


juliusv commented Nov 24, 2020

If someone wants to contribute code for this, the best solution to the problem would seem to be to group identically named metrics from different files into a single output group with just one HELP comment, saying something like Metric read from <file-1> and <file-2>.
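
A rough sketch of what that grouping could look like (a hypothetical helper, not existing node_exporter code; it assumes the per-file family maps that processFile parses are collected into a single map keyed by file path, and it ignores the case where files disagree on an explicit HELP):

package collector

import (
	"fmt"
	"sort"
	"strings"

	dto "github.com/prometheus/client_model/go"
)

// mergeFamilies is a hypothetical helper: it combines metric families that
// share a name across several .prom files into one family and, where no
// explicit HELP was given, synthesizes a single default help string listing
// every file the metric was read from.
func mergeFamilies(perFile map[string]map[string]*dto.MetricFamily) map[string]*dto.MetricFamily {
	merged := map[string]*dto.MetricFamily{}
	sources := map[string][]string{}

	for path, families := range perFile {
		for name, mf := range families {
			if existing, ok := merged[name]; ok {
				// Same metric name seen in another file: append its samples.
				existing.Metric = append(existing.Metric, mf.Metric...)
			} else {
				merged[name] = mf
			}
			sources[name] = append(sources[name], path)
		}
	}

	for name, mf := range merged {
		if mf.Help == nil {
			// e.g. "Metric read from a.prom and b.prom"
			sort.Strings(sources[name])
			help := fmt.Sprintf("Metric read from %s", strings.Join(sources[name], " and "))
			mf.Help = &help
		}
	}
	return merged
}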


brian-brazil commented Nov 24, 2020

> Any alternative suggestions?

If both files contain the same explicit HELP then this logic doesn't apply, and it'll all work.
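
Applied to the files from the report, that would mean adding an identical HELP line to both; the HELP text itself is arbitrary as long as both files agree, for example:

$ cat /var/lib/node_exporter/app-instance-1.prom
# HELP app_metric_name Per-instance application counter.
# TYPE app_metric_name counter
app_metric_name{instance="1"} 0
$ cat /var/lib/node_exporter/app-instance-2.prom
# HELP app_metric_name Per-instance application counter.
# TYPE app_metric_name counter
app_metric_name{instance="2"} 0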


juliusv commented Nov 24, 2020

This is true, but it would be even better to make things work automatically even if the user forgot to provide their own HELP strings, especially since this is not an error that you notice on startup; for many people it will be a silent error that they don't notice unless they're really on top of their monitoring.

brian-brazil commented:

That's partially due to ContinueOnError, which is a dangerous setting, as it hides failures like this in a non-deterministic way.


juliusv commented Nov 24, 2020

Agreed, the Node Exporter sets ContinueOnError here:

ErrorHandling: promhttp.ContinueOnError,

That seems dangerous, but I'm not deep enough into the Node Exporter to say what kinds of things it would break if we set that to e.g. HTTPErrorOnError instead.
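
For reference, such a change would look roughly like this in the HandlerOpts block quoted above (an untested sketch of the alternative setting, not a proposed patch):

 	handler := promhttp.HandlerFor(
 		prometheus.Gatherers{h.exporterMetricsRegistry, r},
 		promhttp.HandlerOpts{
-			ErrorHandling:       promhttp.ContinueOnError,
+			// Fail the whole scrape with HTTP 500 when any collector errors,
+			// instead of silently dropping the affected metrics.
+			ErrorHandling:       promhttp.HTTPErrorOnError,
 			MaxRequestsInFlight: h.maxRequests,
 			Registry:            h.exporterMetricsRegistry,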

brian-brazil commented:

Basically, if there's a problem, the whole scrape would fail with HTTPErrorOnError, rather than effectively silently failing in a non-deterministic way like it currently does.


juliusv commented Nov 24, 2020

Yes, that much is clear. I was just wondering whether it would be too problematic to change now. Looking into why ContinueOnError was initially introduced into the Node Exporter, it was added in dace41e, explicitly for the textfile collector.


lymweb commented May 31, 2021

Same problem here.

discordianfish commented:

Yeah, this is still an open issue. Feel free to implement Julius' suggestion:

> If someone wants to contribute code for this, the best solution to the problem would seem to be to group identically named metrics from different files into a single output group with just one HELP comment, saying something like Metric read from <file-1> and <file-2>.


wkhere commented Jan 27, 2022

Ouch, I've also hit this and would benefit from such metric merging.

kai11 pushed a commit to kai11/cronmanager that referenced this issue Mar 28, 2022
Instead of a single crons.prom file, each job now has its own. However,
by default node_exporter's textfile collector doesn't merge them and uses
only the first one.

Details in prometheus/node_exporter#1885

Updates:
* instead of a single crons.prom file, each job now has - and locks - its own
* added the same HELP header to all files
* cron_job metric renamed to cronjob
* minor README updates