
out_http: add gzip option #4528

Merged
1 commit merged into fluent:master on Jul 6, 2024

Conversation

@rockliffelewis (Contributor) commented Jun 13, 2024

Which issue(s) this PR fixes:
Fixes #4410

What this PR does / why we need it:
Adds a compress gzip option to out_http so the HTTP request body can be gzip-compressed before it is sent, saving bandwidth.
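
A minimal configuration sketch of how this could be used, assuming the parameter is named compress as described above (the match pattern and endpoint are illustrative only):

<match app.**>
  @type http
  endpoint http://example.com:9880/app.log
  compress gzip
  <format>
    @type json
  </format>
</match>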

Docs Changes:
fluent/fluentd-docs-gitbook#514

Release Note:

@daipom (Contributor) left a comment

Thanks for the PR!
I think we need to take the buffer's compression setting into account.
I commented on the details below, so please check.

Resolved review threads: lib/fluent/plugin/out_http.rb, test/plugin/test_out_http.rb (outdated), lib/fluent/plugin/out_http.rb (outdated)
@daipom (Contributor) commented Jun 14, 2024

I'd like to add test patterns like this: c684509

However, the case of sending the gzipped chunk as-is fails in my environment (only one record is decompressed on the test server). I'm checking the cause.

Failure: test_write_with_gzip[buffer_compress: "gzip", json_array: false](HTTPOutputTest::GZIP)
/home/daipom/work/fluentd/fluentd/test/plugin/test_out_http.rb:564:in `test_write_with_gzip'
     561:         result.content_type
     562:       )
     563:       assert_equal 'gzip', result.headers['content-encoding']
  => 564:       assert_equal test_events, result.data
     565:       assert_not_empty result.headers
     566:     end
     567:   end
<[{"bool"=>true, "message"=>"hello", "num"=>10},
 {"bool"=>false, "message"=>"hello", "num"=>11}]> expected but was
<[{"bool"=>true, "message"=>"hello", "num"=>10}]>

diff:
? [{"bool"=>true, "message"=>"hello", "num"=>10},
?                                               ]
?                                               ?
-  {"bool"=>false, "message"=>"hello", "num"=>11}]

@rockliffelewis (Contributor, Author)

> I'd like to add test patterns like this: c684509
>
> However, the case of sending the gzipped chunk as-is fails in my environment. (Only one record decompressed on the test server.) I'm checking the cause.


I also noticed this in my testing; I'm assuming the behavior is correct but not what the test expects.

It's delivering a single gzipped message rather than batching several messages into one request.

I can add a few more tests to cover the different option combinations.

@rockliffelewis (Contributor, Author)

The issue here is that the test harness doesn't really handle receiving multiple HTTP requests in one test, and in the gzipped-buffer case the messages arrive one by one. I'm not entirely sure how to fix this properly without doing something like this:

  assert_equal(
    data[:buffer_compress] == 'gzip' && !data[:json_array] ? [test_events[0]] : test_events,
    result.data
  )

@daipom (Contributor) commented Jun 14, 2024

@rockliffelewis Thanks for considering it!
I found the cause!

The cause is that Ruby's zlib does not transparently handle gzip concatenation.
Fluentd's chunk handles it as follows:

def string_decompress(compressed_data)
  io = StringIO.new(compressed_data)
  out = ''
  loop do
    gz = Zlib::GzipReader.new(io)
    out << gz.read
    unused = gz.unused
    gz.finish
    unless unused.nil?
      adjust = unused.length
      io.pos -= adjust
    end
    break if io.eof?
  end
  out
end
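
A minimal sketch of the underlying behaviour, using a hypothetical two-record payload: Zlib::GzipReader#read stops at the end of the first gzip member, which is why string_decompress above has to loop and rewind over #unused.

require 'zlib'
require 'stringio'

# Two gzip members concatenated back to back, roughly how a gzip-compressed
# buffer chunk appends records (payload is illustrative).
payload = Zlib.gzip("record1\n") + Zlib.gzip("record2\n")

gz = Zlib::GzipReader.new(StringIO.new(payload))
puts gz.read.inspect    # => "record1\n" -- only the first member is decoded
puts gz.unused.bytesize # the remaining bytes (the second member) are left in #unused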

So, as you say, it's generally not a problem.
However, if we consider a use case where the gzipped data is forwarded to in_http, this could be a problem: currently, in_http does not support gzip concatenation.

# Content Encoding
# =================
# Decode payload according to the "Content-Encoding" header.
# For now, we only support 'gzip' and 'deflate'.
begin
  if @content_encoding == 'gzip'.freeze
    @body = Zlib::GzipReader.new(StringIO.new(@body)).read
  elsif @content_encoding == 'deflate'.freeze
    @body = Zlib::Inflate.inflate(@body)
  end
rescue
  @log.warn 'fails to decode payload', error: $!.to_s
  send_response_and_close(RES_400_STATUS, {}, "")
  return
end

So, considering the use case of forwarding the gzipped data to in_http, it would be safer for the first implementation to start with a simple specification that always recompresses the chunk (see the sketch below).

What do you think about this?
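
A minimal sketch of this "always recompress" approach, assuming Net::HTTP and a hypothetical endpoint and payload: the formatted chunk body is gzipped as a single member and sent with a Content-Encoding: gzip header, which a receiver such as in_http can then decode with a single GzipReader#read.

require 'zlib'
require 'net/http'
require 'uri'

# Hypothetical endpoint and payload, for illustration only.
uri = URI.parse("http://localhost:9880/app.log")
body = '[{"message":"hello"},{"message":"world"}]'

req = Net::HTTP::Post.new(uri.request_uri)
req["Content-Type"] = "application/json"
req["Content-Encoding"] = "gzip"
req.body = Zlib.gzip(body)  # one gzip member covering the whole request body

Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }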

@rockliffelewis (Contributor, Author)

Ah, I see! I didn't realise the buffer was doing something fancy like that. I wouldn't expect most HTTP receivers to implement that either, so at best we would need a recompress flag.

In my use case I want to use in_http as the receiver, and I want maximum compression, so I would end up buffering uncompressed anyway.

Signed-off-by: Lewis Rockliffe <[email protected]>
@daipom (Contributor) left a comment

Thanks! LGTM.
I will wait a bit before merging so the other maintainers can take a look.

@ashie (Member) left a comment

LGTM
Thanks for your contribution!

@ashie merged commit e5ca1c7 into fluent:master on Jul 6, 2024
16 checks passed
@ashie modified the milestone: v1.16.6 on Jul 30, 2024
@daipom added this to the v1.17.1 milestone on Aug 15, 2024
Development

Successfully merging this pull request may close these issues.

Compression Support in out_http plugin
3 participants