Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add regexp matching of HTTP response headers to the http probe #419

Merged
merged 11 commits into from
Feb 21, 2019
Merged
20 changes: 18 additions & 2 deletions CONFIGURATION.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,14 +58,30 @@ The other placeholders are specified separately.
# Probe fails if SSL is not present.
[ fail_if_not_ssl: <boolean> | default = false ]

# Probe fails if response matches regex.
# Probe fails if response body matches regex.
fail_if_matches_regexp:
[ - <regex>, ... ]

# Probe fails if response does not match regex.
# Probe fails if response body does not match regex.
fail_if_not_matches_regexp:
[ - <regex>, ... ]

# Probe fails if response header matches regex. For headers with multiple values, fails if *at least one* matches
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full stop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added here and in other places

fail_if_header_matches_regexp:
[ - [ header: <string>,
[ regex: <regex>, ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation seems off here

Copy link
Contributor Author

@gvsmirnov gvsmirnov Feb 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, does not look quite good. I got it from the prometheus config documentation, e.g. here when the same level contains both optional and non-optional parameters, then it looks like this (comments stripped):

job_name: <job_name>
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

In here, though, the regex should be named regexp and be made non-optional, so it would look like this:

  # Probe fails if response header matches regex. For headers with multiple values, fails if *at least one* matches.
  fail_if_header_matches:
    [ - [ header: <string>,
          regexp: <regex>,
          [ allow_missing: <boolean> | default = false ]
        ], ...
    ]

Would that be OK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking around, the only similar one we have is some of pagerduty for the AM. There we break it out as a different type.

Copy link
Contributor Author

@gvsmirnov gvsmirnov Feb 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example right in this repo is query_response in tcp_probe, it does not break it out as a separate type. That's why I did not do it here. Can extract both or keep inlined like above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For query_response they're all optional though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, extracted only the http header match spec.

[ allow_missing: <boolean> | default = false ]
], ...
]

# Probe fails if response header does not match regex. For headers with multiple values, fails if *none* match
fail_if_header_not_matches_regexp:
[ - [ header: <string>,
[ regex: <regex>, ]
[ allow_missing: <boolean> | default = false ]
], ...
]

# Configuration for TLS protocol of HTTP probe.
tls_config:
[ <tls_config> ]
Expand Down
52 changes: 39 additions & 13 deletions config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -80,19 +80,27 @@ type Module struct {

type HTTPProbe struct {
// Defaults to 2xx.
ValidStatusCodes []int `yaml:"valid_status_codes,omitempty"`
ValidHTTPVersions []string `yaml:"valid_http_versions,omitempty"`
IPProtocol string `yaml:"preferred_ip_protocol,omitempty"`
IPProtocolFallback bool `yaml:"ip_protocol_fallback,omitempty"`
NoFollowRedirects bool `yaml:"no_follow_redirects,omitempty"`
FailIfSSL bool `yaml:"fail_if_ssl,omitempty"`
FailIfNotSSL bool `yaml:"fail_if_not_ssl,omitempty"`
Method string `yaml:"method,omitempty"`
Headers map[string]string `yaml:"headers,omitempty"`
FailIfMatchesRegexp []string `yaml:"fail_if_matches_regexp,omitempty"`
FailIfNotMatchesRegexp []string `yaml:"fail_if_not_matches_regexp,omitempty"`
Body string `yaml:"body,omitempty"`
HTTPClientConfig config.HTTPClientConfig `yaml:"http_client_config,inline"`
ValidStatusCodes []int `yaml:"valid_status_codes,omitempty"`
ValidHTTPVersions []string `yaml:"valid_http_versions,omitempty"`
IPProtocol string `yaml:"preferred_ip_protocol,omitempty"`
IPProtocolFallback bool `yaml:"ip_protocol_fallback,omitempty"`
NoFollowRedirects bool `yaml:"no_follow_redirects,omitempty"`
FailIfSSL bool `yaml:"fail_if_ssl,omitempty"`
FailIfNotSSL bool `yaml:"fail_if_not_ssl,omitempty"`
Method string `yaml:"method,omitempty"`
Headers map[string]string `yaml:"headers,omitempty"`
FailIfBodyMatchesRegexp []string `yaml:"fail_if_matches_regexp,omitempty"`
FailIfBodyNotMatchesRegexp []string `yaml:"fail_if_not_matches_regexp,omitempty"`
FailIfHeaderMatchesRegexp []HeaderMatch `yaml:"fail_if_header_matches_regexp,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a string, so I'd remove the _regexp from the name to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would the confusion arise from? Is there a convention that properties ending with _regexp are always strings that represent regular expression?

I could rename the properties so it would be like this:

# ...
fail_if_body_matches_regexp:
  - 'Thou shall not pass'
fail_if_header_matches:
  - header: Host
    regexp: example.com

It would lose the consistency between property names for header and body, though. Is it worth it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the data structures are different

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

FailIfHeaderNotMatchesRegexp []HeaderMatch `yaml:"fail_if_header_not_matches_regexp,omitempty"`
Body string `yaml:"body,omitempty"`
HTTPClientConfig config.HTTPClientConfig `yaml:"http_client_config,inline"`
}

type HeaderMatch struct {
Header string `yaml:"header,omitempty"`
Regexp string `yaml:"regexp,omitempty"`
AllowMissing bool `yaml:"allow_missing,omitempty"`
}

type QueryResponse struct {
Expand Down Expand Up @@ -217,3 +225,21 @@ func (s *QueryResponse) UnmarshalYAML(unmarshal func(interface{}) error) error {
}
return nil
}

// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (s *HeaderMatch) UnmarshalYAML(unmarshal func(interface{}) error) error {
type plain HeaderMatch
if err := unmarshal((*plain)(s)); err != nil {
return err
}

if s.Header == "" {
return errors.New("header name must be set for HTTP header matchers")
}

if !s.AllowMissing && s.Regexp == "" {
return errors.New("regexp must be set for required HTTP headers")
}

return nil
}
4 changes: 4 additions & 0 deletions config/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,10 @@ func TestLoadBadConfigs(t *testing.T) {
ConfigFile: "testdata/invalid-dns-module.yml",
ExpectedError: "error parsing config file: query name must be set for DNS module",
},
{
ConfigFile: "testdata/invalid-http-header-match.yml",
ExpectedError: "error parsing config file: regexp must be set for required HTTP headers",
},
}
for i, test := range tests {
err := sc.ReloadConfig(test.ConfigFile)
Expand Down
11 changes: 11 additions & 0 deletions config/testdata/blackbox-good.yml
Original file line number Diff line number Diff line change
Expand Up @@ -66,3 +66,14 @@ modules:
ip_protocol_fallback: false
validate_answer_rrs:
fail_if_matches_regexp: [test]
http_header_match_origin:
prober: http
timeout: 5s
http:
method: GET
headers:
Origin: example.com
fail_if_header_not_matches_regexp:
- header: Access-Control-Allow-Origin
regexp: '(\*|example\.com)'
allow_missing: false
8 changes: 8 additions & 0 deletions config/testdata/invalid-http-header-match.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
modules:
http_headers:
prober: http
timeout: 5s
http:
fail_if_header_not_matches_regexp:
- header: Access-Control-Allow-Origin
allow_missing: false
12 changes: 10 additions & 2 deletions example.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,21 @@ modules:
headers:
Host: vhost.example.com
Accept-Language: en-US
Origin: example.com
no_follow_redirects: false
fail_if_ssl: false
fail_if_not_ssl: false
fail_if_matches_regexp:
fail_if_matches_regexp: # Matches regexp against response body
- "Could not connect to database"
fail_if_not_matches_regexp:
fail_if_not_matches_regexp: # Matches regexp against response body
- "Download the latest version here"
fail_if_header_matches_regexp: # Verifies that no cookies are set
- header: Set-Cookie
allow_missing: true
regexp: '.*'
fail_if_header_not_matches_regexp:
- header: Access-Control-Allow-Origin
regexp: '(\*|example\.com)'
tls_config:
insecure_skip_verify: false
preferred_ip_protocol: "ip4" # defaults to "ip6"
Expand Down
84 changes: 81 additions & 3 deletions prober/http.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ import (
"net/http"
"net/http/cookiejar"
"net/http/httptrace"
"net/textproto"
"net/url"
"regexp"
"strconv"
Expand All @@ -44,7 +45,7 @@ func matchRegularExpressions(reader io.Reader, httpConfig config.HTTPProbe, logg
level.Error(logger).Log("msg", "Error reading HTTP body", "err", err)
return false
}
for _, expression := range httpConfig.FailIfMatchesRegexp {
for _, expression := range httpConfig.FailIfBodyMatchesRegexp {
re, err := regexp.Compile(expression)
if err != nil {
level.Error(logger).Log("msg", "Could not compile regular expression", "regexp", expression, "err", err)
Expand All @@ -55,7 +56,7 @@ func matchRegularExpressions(reader io.Reader, httpConfig config.HTTPProbe, logg
return false
}
}
for _, expression := range httpConfig.FailIfNotMatchesRegexp {
for _, expression := range httpConfig.FailIfBodyNotMatchesRegexp {
re, err := regexp.Compile(expression)
if err != nil {
level.Error(logger).Log("msg", "Could not compile regular expression", "regexp", expression, "err", err)
Expand All @@ -69,6 +70,68 @@ func matchRegularExpressions(reader io.Reader, httpConfig config.HTTPProbe, logg
return true
}

func matchRegularExpressionsOnHeaders(header http.Header, httpConfig config.HTTPProbe, logger log.Logger) bool {
for _, headerMatchSpec := range httpConfig.FailIfHeaderMatchesRegexp {
values := textproto.MIMEHeader(header)[headerMatchSpec.Header]
if len(values) == 0 {
if !headerMatchSpec.AllowMissing {
level.Error(logger).Log("msg", "Missing required header", "header", headerMatchSpec.Header)
return false
} else {
continue // no need to match any regex on missing headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"No", and full stop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here and in the other place.

}
}

re, err := regexp.Compile(headerMatchSpec.Regexp)
if err != nil {
level.Error(logger).Log("msg", "Could not compile regular expression", "regexp", headerMatchSpec.Regexp, "err", err)
return false
}

for _, val := range values {
if re.MatchString(val) {
level.Error(logger).Log("msg", "Header matched regular expression", "header", headerMatchSpec.Header,
"regexp", headerMatchSpec.Regexp, "value_count", len(values))
return false
}
}
}
for _, headerMatchSpec := range httpConfig.FailIfHeaderNotMatchesRegexp {
values := textproto.MIMEHeader(header)[headerMatchSpec.Header]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't canonicalise the key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and added tests

if len(values) == 0 {
if !headerMatchSpec.AllowMissing {
level.Error(logger).Log("msg", "Missing required header", "header", headerMatchSpec.Header)
return false
} else {
continue // no need to match any regex on missing headers
}
}

re, err := regexp.Compile(headerMatchSpec.Regexp)
if err != nil {
level.Error(logger).Log("msg", "Could not compile regular expression", "regexp", headerMatchSpec.Regexp, "err", err)
return false
}

anyHeaderValueMatched := false

for _, val := range values {
if re.MatchString(val) {
anyHeaderValueMatched = true
break
}
}

if !anyHeaderValueMatched {
level.Error(logger).Log("msg", "Header did not match regular expression", "header", headerMatchSpec.Header,
"regexp", headerMatchSpec.Regexp, "value_count", len(values))
return false
}
}

return true
}

// roundTripTrace holds timings for a single HTTP roundtrip.
type roundTripTrace struct {
tls bool
Expand Down Expand Up @@ -173,6 +236,11 @@ func ProbeHTTP(ctx context.Context, target string, module config.Module, registr
Help: "Indicates if probe failed due to regex",
})

probeFailedDueToHeaders = prometheus.NewGauge(prometheus.GaugeOpts{
Name: "probe_failed_due_to_headers",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probe_http_

falied_due_regex should already cover this

Copy link
Contributor Author

@gvsmirnov gvsmirnov Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probe_http

Thee are several other labels that do not have the http: probe_ssl_earliest_cert_expiry and probe_failed_due_to_regex. Should they be changed, too? Or can I just ditch the newly introduced probe_failed_due_to_headers and keep the rest as is?

EDIT: formatting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be changed, but at least the ssl one is commonly used for alerting so we need to be a little careful so not now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will not touch the existing labels.

But I am not sure if representing the failed header matches with probe_failed_due_to_regex is the right way. It would be helpful if the user could see immediately from the metrics where the failure originated from. I do not see any drawbacks to this either (the extra disk usage would be negligible).

If you do insist, I can reuse probe_failed_due_to_regex. Do you insist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest reusing, either that or have two more clearly named metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, reused existing metric. Renaming existing ones would be a more dangerous breaking change indeed.

Help: "Indicates if probe failed due to headers",
})

probeHTTPLastModified = prometheus.NewGauge(prometheus.GaugeOpts{
Name: "probe_http_last_modified_timestamp_seconds",
Help: "Returns the Last-Modified HTTP response header in unixtime",
Expand All @@ -190,6 +258,7 @@ func ProbeHTTP(ctx context.Context, target string, module config.Module, registr
registry.MustRegister(statusCodeGauge)
registry.MustRegister(probeHTTPVersionGauge)
registry.MustRegister(probeFailedDueToRegex)
registry.MustRegister(probeFailedDueToHeaders)

httpConfig := module.HTTP

Expand Down Expand Up @@ -320,7 +389,16 @@ func ProbeHTTP(ctx context.Context, target string, module config.Module, registr
level.Info(logger).Log("msg", "Invalid HTTP response status code, wanted 2xx", "status_code", resp.StatusCode)
}

if success && (len(httpConfig.FailIfMatchesRegexp) > 0 || len(httpConfig.FailIfNotMatchesRegexp) > 0) {
if success && (len(httpConfig.FailIfHeaderMatchesRegexp) > 0 || len(httpConfig.FailIfHeaderNotMatchesRegexp) > 0) {
success = matchRegularExpressionsOnHeaders(resp.Header, httpConfig, logger)
if success {
probeFailedDueToHeaders.Set(0)
} else {
probeFailedDueToHeaders.Set(1)
}
}

if success && (len(httpConfig.FailIfBodyMatchesRegexp) > 0 || len(httpConfig.FailIfBodyNotMatchesRegexp) > 0) {
success = matchRegularExpressions(resp.Body, httpConfig, logger)
if success {
probeFailedDueToRegex.Set(0)
Expand Down
Loading