
Handling of compressed HTTP responses #684

Closed
mem opened this issue Aug 25, 2020 · 9 comments · Fixed by #764

Comments

@mem
Contributor

mem commented Aug 25, 2020

Host operating system: output of uname -a

Linux clara 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

blackbox_exporter version: output of blackbox_exporter -version

blackbox_exporter, version 0.17.0 (branch: fix_content_length, revision: 58e365f15172f93709d2e4de4863bc12a8aef368)
  build user:       marcelo@clara
  build date:       20200825-16:25:42
  go version:       go1.14.7

What is the blackbox.yml module config.

modules:
  http:
    prober: http
    http:
      method: GET
      headers:
        Accept-Encoding: gzip
      fail_if_body_matches_regexp:
        - "<!DOCTYPE html>"
      preferred_ip_protocol: "ip4"
      ip_protocol_fallback: false

What is the prometheus.yml scrape config.

None.

What logging output did you get from adding &debug=true to the probe URL?

Logs for the probe:
ts=2020-08-25T16:28:51.176627Z caller=main.go:304 module=http target=https://prometheus.io/ level=info msg="Beginning probe" probe=http timeout_seconds=119.5
ts=2020-08-25T16:28:51.177415706Z caller=http.go:342 module=http target=https://prometheus.io/ level=info msg="Resolving target address" ip_protocol=ip4
ts=2020-08-25T16:28:51.255273005Z caller=http.go:342 module=http target=https://prometheus.io/ level=info msg="Resolved target address" ip=172.67.141.23
ts=2020-08-25T16:28:51.255468626Z caller=client.go:252 module=http target=https://prometheus.io/ level=info msg="Making HTTP request" url=https://172.67.141.23/ host=prometheus.io
ts=2020-08-25T16:28:51.51694598Z caller=main.go:119 module=http target=https://prometheus.io/ level=info msg="Received HTTP response" status_code=200
ts=2020-08-25T16:28:51.517282451Z caller=main.go:119 module=http target=https://prometheus.io/ level=info msg="Response timings for roundtrip" roundtrip=0 start=2020-08-25T10:28:51.255713943-06:00 dnsDone=2020-08-25T10:28:51.255713943-06:00 connectDone=2020-08-25T10:28:51.308865333-06:00 gotConn=2020-08-25T10:28:51.371621155-06:00 responseStart=2020-08-25T10:28:51.516747842-06:00 end=2020-08-25T10:28:51.517265582-06:00
ts=2020-08-25T16:28:51.51759115Z caller=main.go:304 module=http target=https://prometheus.io/ level=info msg="Probe succeeded" duration_seconds=0.340772252



Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.077859237
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.340772252
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.053151324
probe_http_duration_seconds{phase="processing"} 0.145126765
probe_http_duration_seconds{phase="resolve"} 0.077859237
probe_http_duration_seconds{phase="tls"} 0.115907188
probe_http_duration_seconds{phase="transfer"} 0.000517688
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 0
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 4585
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 2
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.601109704e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.628424e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.628424e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="44c9e62838db98e79918c841e4b72529849e2e7e6654e337a32a74ec502edaa7"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.3"} 1



Module configuration:
prober: http
http:
    preferred_ip_protocol: ip4
    method: GET
    headers:
        Accept-Encoding: gzip
    fail_if_body_matches_regexp:
        - <!DOCTYPE html>
tcp:
    ip_protocol_fallback: true
icmp:
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true

What did you do that produced an error?

See configuration above. Target is https://prometheus.io/

What did you expect to see?

probe_success 0

What did you see instead?

probe_success 1

By adding:

    headers:
        Accept-Encoding: gzip

the server is correctly returning a compressed response. The uncompressed body matches the specified regular expression, and my expectation is that the probe fails.

BBE is not decompressing the response, so the regular expression is applied to the compressed body and therefore does not match.

@brian-brazil
Contributor

I don't think it's the responsibility of this exporter to try to decode arbitrary opaque data that a server sends back based on hand-configured headers.

@mem
Contributor Author

mem commented Aug 27, 2020

I can understand the argument, but I still think there's something that needs changing.

  • Since BBE is not passing an Accept-Encoding header explicitly, per RFC 7231, the server can consider any content to be acceptable and might respond with compressed data. Users can add header validations to verify that this is either happening or not happening. Users can also add Accept-Encoding headers to verify that the server is handling the request as expected.
  • We are talking about Content-Encoding, not Content-Type, so this is not about handling arbitrary data formats (jpeg, zip, etc.; that's Content-Type), but a known set of compression algorithms that "allow a representation's data to be compressed without losing the identity of its underlying media type." By not handling Content-Encoding we limit the scope in which body validations can operate (and the user has no way of making BBE behave differently), which is what this issue is about.
  • probe_http_uncompressed_body_length will never differ from probe_http_content_length. This is what actually started me down the path of handling compression: the existence of this metric set the expectation that BBE would decompress the body, and I suspect it did at some point, because looking at the git history, one of the underlying libraries changed behavior along the way.

Is adding something like "ValidContentEncodings" (similar to ValidResponseCodes) a solution here? What I mean is: instead of making BBE transparently handle Content-Encoding, have the user specify which encodings should be handled (out of a specific set). It would be the user's responsibility to pass whatever Accept-Encoding header is appropriate and to set up header validations to verify the responses.

Or should BBE simply handle a known set of encodings (without something like ValidContentEncodings, since the user has a way to set up the headers), maybe with an option to disable this?

@brian-brazil
Contributor

a known set of compression algorithms

I disagree that they're known, a server can return in principle any arbitrary encoding it wants to.

have the user specify which encodings should be handled (out of a specific set)?

The user can already check response headers, so I don't see what having a 2nd way to do that would gain us.

@mem
Contributor Author

mem commented Aug 27, 2020

a known set of compression algorithms

I disagree that they're known, a server can return in principle any arbitrary encoding it wants to.

While that is true, RFC 7231 does say:

The following content-coding values are defined by this
specification:

 compress (and x-compress): See Section 4.2.1 of [RFC7230].

 deflate: See Section 4.2.2 of [RFC7230].

 gzip (and x-gzip): See Section 4.2.3 of [RFC7230].

Current browsers support "br" in addition to these, but I'm fine supporting only what the RFC lists. That said, compress is not clearly defined, as decompressing it requires a parameter that is not specified in the response (and it's likely not used anywhere on the modern Internet).

have the user specify which encodings should be handled (out of a specific set)?

The user can already check response headers, so I don't see what having a 2nd way to do that would gain us.

The question is more about which values should be automatically handled by BBE.

As an example, if the user sets Accept-Encoding: * (or leaves it out entirely), how would BBE know which format to decompress? The user can verify that the Content-Encoding header has a value they expect, but what I'm asking is which formats BBE should decompress automatically. That's why I raised the possibility of a setting that disables automatic decompression: in other words, either decompress whatever is supported, or don't decompress at all (with the default being don't decompress, so that the existing behavior is preserved).

@brian-brazil
Contributor

While that is true, RFC 7231 does say:

I was not aware; that changes things a bit. Are implementations required to support these, though?

how would BBE know which format to decompress?

I'm still not getting your point here, why would you need to configure the blackbox exporter to tell the exporter what features the blackbox exporter has?

@mem
Contributor Author

mem commented Feb 17, 2021

Sorry I didn't come back to this issue in a timely manner...

RFC 7231 says:

3.1.2.2.  Content-Encoding

   The "Content-Encoding" header field indicates what content codings
   have been applied to the representation, beyond those inherent in the
   media type, and thus what decoding mechanisms have to be applied in
   order to obtain data in the media type referenced by the Content-Type
   header field.  Content-Encoding is primarily used to allow a
   representation's data to be compressed without losing the identity of
   its underlying media type.

   [...]

   Unlike Transfer-Encoding (Section 3.3.1 of [RFC7230]), the codings
   listed in Content-Encoding are a characteristic of the
   representation; the representation is defined in terms of the coded
   form, and all other metadata about the representation is about the
   coded form unless otherwise noted in the metadata definition.
   Typically, the representation is only decoded just prior to rendering
   or analogous usage.

   If the media type includes an inherent encoding, such as a data
   format that is always compressed, then that encoding would not be
   restated in Content-Encoding even if it happens to be the same
   algorithm as one of the content codings.  [...]

In order to be able to apply validations like fail_if_body_matches_regexp, BBE needs to get at the data as it was before encoding.

If I can restate the original issue, I think there are two things to consider:

  1. There's a bad interaction between allowing users to specify arbitrary headers and providing checks that will work against the data before encoding. By using Accept-Encoding: gzip the user is asking the server to compress if possible. The Content-Type of that response is still text/html, which is what the paragraph above is trying to explain: the encoding is not changing the response, just changing the representation of the response.
  2. By not specifying Accept-Encoding we are defaulting to anything goes, meaning that BBE should be prepared to deal with well known encodings (compress, gzip, deflate, optionally brotli).

What I was trying to say in later responses is that some users might actually want to verify that the response is compressed (by looking at the data exactly as returned from the server), writing a regular expression that matches the compressed data, something like \x1f\x8b\x08\x00\x00...; that would require a mechanism to prevent BBE from decompressing the response before applying the check.
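To make that scenario concrete, a module along these lines (a sketch using existing blackbox.yml fields; the exact option names should be checked against the documentation) would verify that the server compressed the response, by checking both the response header and the gzip magic bytes at the start of the raw body:

```yaml
modules:
  http_expect_gzip:
    prober: http
    http:
      method: GET
      headers:
        Accept-Encoding: gzip
      # fail unless the server says it compressed the body...
      fail_if_header_not_matches:
        - header: Content-Encoding
          regexp: gzip
      # ...and unless the body actually starts with the gzip magic bytes
      fail_if_body_not_matches_regexp:
        - '^\x1f\x8b'
```

This only works as long as BBE applies the body regexp to the data exactly as the server returned it, which is why automatic decompression would need to be opt-in or disableable.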

The other point that I think is relevant is that we do have a metric probe_http_uncompressed_body_length that suggests to the users that we will decompress the response if it's compressed.

@mem
Contributor Author

mem commented Feb 17, 2021

@roidelapluie I would appreciate your take on this.

@roidelapluie
Member

I think that if we support this, we should do it in a way where we do not let the user specify the headers; instead we specify the compression ourselves, e.g. compression: gzip, which would set the header and decode the answer accordingly. We would also force the target to reply compressed (the blackbox exporter would then verify the compression). WDYT?

@mem
Contributor Author

mem commented Feb 19, 2021

That sounds good to me. If the user is trying to verify that including Accept-Encoding: gzip actually causes the server to return a compressed response, then adding an option on our side to verify which compression to request and expect is a good way to express that. This is good if the user is trying to verify the behavior with common browser requests.

Braindump:

Adding compression: gzip would cause BBE to reject a configuration that also includes any Accept-Encoding header other than exactly gzip.

The user can still add Accept-Encoding: gzip by hand without using compression: gzip. In that case BBE would neither decompress the response nor report a failure if it's not compressed. This would allow users to verify, for example, that the server is not compressing the response even when the header is present.

Can the user verify that the server is behaving as expected in the absence of Accept-Encoding? That header is optional, and leaving it out means "give me whatever you can, and I'll deal with it". The user might want to verify either that the server is returning compressed data or that it isn't. The former is easy: check the header, then check the body with a regular expression that matches the compressed data. The latter, too: verify the header is not present, then match the body against the uncompressed data.

Yeah, that sounds good to me.

mem added a commit that referenced this issue Mar 16, 2021
If the module configuration specifies the "compression" option
blackbox_exporter will try to decompress the response using the
specified algorithm. If the response is not compressed using that
algorithm, the probe will fail.

It validates that the "Accept-Encoding" header is either absent, or that
it specifies the same algorithm as the "compression" option. If the
"Accept-Encoding" header is present but it specifies a different
algorithm, the probe will fail.

If the compression option is *not* used, probe_http_content_length and
probe_http_uncompressed_body_length will have the same value
corresponding to the original content length. If the compression option
is used and the content can be decompressed, probe_http_content_length
will report the original content length as it currently does, and
probe_http_uncompressed_body_length will report the length of the body
after decompression as expected.

Fixes #684

Signed-off-by: Marcelo E. Magallon <[email protected]>
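Based on that commit message, a module using the new option might look something like this (a sketch; the exact syntax is defined by PR #764):

```yaml
modules:
  http_gzip:
    prober: http
    http:
      method: GET
      # ask the server for gzip, verify the response is gzip-compressed,
      # and decode it before body validations run
      compression: gzip
      fail_if_body_matches_regexp:
        - "<!DOCTYPE html>"
```

With this configuration the regexp is applied to the decompressed body, so the probe from the original report would fail as expected.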
mem added a commit that referenced this issue Mar 16, 2021
mem added a commit that referenced this issue Apr 19, 2021
@mem mem closed this as completed in #764 Apr 19, 2021