Incorrect handling of non-ASCII characters in regular expressions #347

Closed
maksim77 opened this issue Jul 31, 2018 · 7 comments

Comments

@maksim77

Host operating system:

  • Darwin air.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
  • Linux prometheus 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 x86_64 x86_64 GNU/Linux

blackbox_exporter version: output of blackbox_exporter -version

blackbox_exporter, version 0.12.0 (branch: HEAD, revision: 4a22506cf0cf139d9b2f9cde099f0012d9fcabde)
  build user:       root@634195974c8e
  build date:       20180227-11:51:19
  go version:       go1.10

What is the blackbox.yml module config.

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: "ip4"

  http_2xx_regex:
    prober: http
    timeout: 5s
    http:
      method: GET
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: "ip4"
      fail_if_not_matches_regexp:
        - "Сделанно на платформе"

What is the prometheus.yml scrape config.

  - job_name: blackbox_regex
    metrics_path: /probe
    params:
      module: [http_2xx_regex]
    static_configs:
      - targets:
        - http://some_url.ru
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
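
For reference, the relabeling above makes Prometheus scrape the exporter at 127.0.0.1:9115 and pass the original target as a query parameter. The sketch below builds a roughly equivalent probe URL by hand, including the `debug=true` parameter asked about next. The helper name `probe_url` is hypothetical, and the exact percent-encoding Prometheus emits may differ from what `urlencode` produces.

```python
from urllib.parse import urlencode

# Taken from the `replacement` value in relabel_configs above.
EXPORTER_ADDRESS = "127.0.0.1:9115"


def probe_url(target: str, module: str, debug: bool = False) -> str:
    """Build a /probe URL for manual testing (hypothetical helper)."""
    params = {"module": module, "target": target}
    if debug:
        # Asks the exporter to include probe logs in the response.
        params["debug"] = "true"
    return f"http://{EXPORTER_ADDRESS}/probe?{urlencode(params)}"


url = probe_url("http://some_url.ru", "http_2xx_regex", debug=True)
print(url)
```

Opening the printed URL in a browser or with curl runs the probe once, independently of the Prometheus scrape loop.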

What logging output did you get from adding &debug=true to the probe URL?

ts=2018-07-31T08:41:39.327158227Z caller=http.go:65 module=http_2xx_regex target=prometheus.io level=error msg="Body did not match regular expression" regexp="(Сделано на платформе)"

What did you do that produced an error?

Just set the fail_if_not_matches_regexp parameter to a string containing non-ASCII (Unicode) characters.

What did you expect to see?

Successful check that the substring is found in the response

What did you see instead?

The check always fails, because the string "Сделанно на платформе" does not match "СÐеÐ" аннРнна Ð Ð Ð "аÑ' нрÐðµ""
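
One way a mismatch like this can arise, sketched in Python rather than in the exporter's Go code: the same phrase has different byte sequences in different encodings, so a pattern written in UTF-8 cannot match a body served in a legacy Cyrillic charset. The thread does not establish which encoding the page actually used, so `cp1251` here is purely an assumption for illustration, and the sample phrase uses the spelling from the debug log.

```python
import re

# Sample phrase; spelling taken from the debug log above.
PATTERN_TEXT = "Сделано на платформе"

# The same markup serialized in two encodings yields different bytes.
utf8_body = f"<p>{PATTERN_TEXT}</p>".encode("utf-8")
cp1251_body = f"<p>{PATTERN_TEXT}</p>".encode("cp1251")  # assumed legacy encoding

# A byte-level pattern built from the UTF-8 form of the phrase.
pattern = re.compile(re.escape(PATTERN_TEXT.encode("utf-8")))

utf8_matches = pattern.search(utf8_body) is not None
cp1251_matches = pattern.search(cp1251_body) is not None

# Reading UTF-8 bytes with a single-byte charset produces mojibake
# similar in appearance to the garbled text quoted above.
mojibake = PATTERN_TEXT.encode("utf-8").decode("latin-1")

print("UTF-8 body matches:  ", utf8_matches)
print("cp1251 body matches: ", cp1251_matches)
print("UTF-8 read as Latin-1:", repr(mojibake))
```

If the pattern matches the UTF-8 body but not the real page, checking the page's actual byte encoding (for example with a hex dump of the response) is a reasonable next step.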

@brian-brazil
Contributor

What do the headers say is the encoding of that output page? Is the string you used valid utf-8?

@maksim77
Author

@brian-brazil Of course!
Maybe the full log provides more information: https://pastebin.com/r5QF7Mx3

@brian-brazil
Contributor

I'm looking for the HTTP response headers, which aren't included.

This is probably not something I can debug from here, so you may need to dig into this yourself.

@maksim77
Author

Oh... I see.
The string is present in the HTML source, but the response headers contain only "Content-Encoding: gzip".

@brian-brazil
Contributor

Okay, so it'd be interpreted as ASCII then I think. If this is still happening when you've an appropriate Content-Type header, then that's an issue.
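
The idea of honoring the charset declared in Content-Type before matching can be sketched as follows. This is a Python illustration of the general technique, not a description of how blackbox_exporter processes response bodies. The function names and the UTF-8 fallback are assumptions made for the example.

```python
import re
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def body_matches(body: bytes, content_type: Optional[str], pattern: str,
                 fallback: str = "utf-8") -> bool:
    """Decode the body with the declared charset, then apply the regex."""
    charset = charset_from_content_type(content_type) or fallback
    try:
        text = body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back rather than fail the check outright.
        text = body.decode(fallback, errors="replace")
    return re.search(pattern, text) is not None


phrase = "Сделано на платформе"  # sample phrase from the debug log
page = phrase.encode("cp1251")   # assumed legacy-encoded page body

print(body_matches(page, "text/html; charset=windows-1251", phrase))
print(body_matches(page, None, phrase))
```

With the charset declared, the legacy-encoded body decodes to the expected text and the pattern matches. Without any Content-Type, the fallback decoding yields replacement characters and the match fails.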

@maksim77
Author

maksim77 commented Jul 31, 2018

Thank you! I understand.

@brian-brazil
Contributor

Gzipped responses should be handled transparently.
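
As a small illustration of why gzip is a separate concern from character encoding: Content-Encoding describes compression of the byte stream only, and decompression restores exactly the original bytes, in whatever charset they were written. The Python sketch below uses a made-up sample body and does not involve the exporter.

```python
import gzip

# Sample body; the phrase is taken from the debug log in this issue.
original = "<p>Сделано на платформе</p>".encode("utf-8")

compressed = gzip.compress(original)    # what travels with Content-Encoding: gzip
restored = gzip.decompress(compressed)  # what a client sees after decoding

print(restored == original)
print(restored.decode("utf-8"))
```

The charset question therefore remains the same before and after gzip handling.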
