-
Notifications
You must be signed in to change notification settings - Fork 1k
Fixes #31274 - stream reports to save memory #8136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I am having some issues with our app tho: https://discuss.rubyonrails.org/t/actioncontroller-live-chokes-at-buf-push/76490 Investigating. |
447f2c3 to
49489a5
Compare
| response.headers['Cache-Control'] = 'no-cache' | ||
| response.headers['Last-Modified'] = '0' | ||
| response.headers['ETag'] = nil | ||
| response.stream.write("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't writing an empty string into an IO be essentially a no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://edgeapi.rubyonrails.org/classes/ActionController/Live.html states:
Calling write or close on the response stream will cause the response object to be committed. Make sure all headers are set before calling write or close on your stream.
| when :html | ||
| @output.write '<tr>' | ||
| if row_data.is_a? Array | ||
| @output.write row_data.map { |cell| "<td>#{ERB::Util.html_escape(cell)}</td>" }.join('') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this compare to writing the cells one by one performance wise?
row_data.each do |cell|
@output.write "<td>#{ERB::Util.html_escape(cell)}</td>"
end| response.headers['ETag'] = nil | ||
| response.stream.write("") | ||
| # TODO @composer.report_filename | ||
| @composer.render(output: response.stream, params: params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps instead of writing directly to the steam, we can let rails handle it by passing an Enumerator to response_body similar to how we do it in https://github.com/theforeman/foreman/blob/develop/app/controllers/concerns/foreman/controller/csv_responder.rb and https://github.com/theforeman/foreman/blob/develop/app/services/csv_exporter.rb?
ekohl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ActionController::Live is really meant to do Server Side Events. They have their own format. Also note that I don't think Apache on EL7 today supports HTTP/2 which means you can only have 6 HTTP connections to a host (with HTTP/2 that's 100 streams).
I think what you are looking for here is Transfer-Encoding: chunked. See https://coderwall.com/p/kad56a/streaming-large-data-responses-with-rails for an example.
| if @composer.valid? | ||
| response = @composer.render(params: params) | ||
| send_data response, type: @composer.mime_type, filename: @composer.report_filename | ||
| response.headers['Content-Type'] = @composer.mime_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://edgeapi.rubyonrails.org/classes/ActionController/Live.html suggests it should be text/event-stream
| response.headers['Cache-Control'] = 'no-cache' | ||
| response.headers['Last-Modified'] = '0' | ||
| response.headers['ETag'] = nil | ||
| response.stream.write("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://edgeapi.rubyonrails.org/classes/ActionController/Live.html states:
Calling write or close on the response stream will cause the response object to be committed. Make sure all headers are set before calling write or close on your stream.
| param :gzip, :bool, desc: N_('Compress the report uzing gzip'), default_value: false | ||
| param :report_format, ReportTemplateFormat.selectable.map(&:id), desc: N_("Report format, defaults to '%s'") % ReportTemplateFormat.default.id | ||
|
|
||
| def generate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://edgeapi.rubyonrails.org/classes/ActionController/Live.html suggests you need to implement stream to actually send the data (in a separate thread).
|
@lzap are you planning to work on this? |
|
I do. |
|
Note for myself: This is relevant: https://piotrmurach.com/articles/streaming-large-zip-files-in-rails/ I need to take a look later. |
|
Hello @lzap, given this is a draft for a very long time we'd like to know if you have a plan to move this out of the draft in a foreseeable future. If we don't get a reply in 2 weeks, we'll close the PR but it can always be reopened. |
|
2 weeks have passed, closing. Feel free to reopen when you get back to this. |
|
Sorry folks, I did not find time for this. It would be nice if somebody could pick this up, I feel like really close its just I am unsure how to finish the actual streaming. |
The way we generate reports is memory unfriendly. All rows are stored in an array as hashes and then dumped into string which is returned to the client. In background job case, string is stored in the db or emailed.
Refactoring: Rewrite report_headers, report_row and report_render helpers so they immediately stream data into an IO object. Do not use any intermediate data structure, all formats we currently support (JSON, YAML. CSV, plaintext) do support this. When called via API, data can be streamed directly over TCP. In case of background processing, data can be streamed via SQL LOAD postgresql directly into blob/text.
This could be implemented without any user-facing changes - the three helpers will remain the same, but they will immediately stream content instead of storing anything in memory. The last call (report_render) will output optional footer and close the stream.
This is the first cut which implements streaming when using API/CLI:
Only CSV and HTML implemented for now. I am currently investigating issue when development server gets stuck for some requests, maybe there is a different API than Live::Buffer available I need to dig it. All I need is an IO object I can pass into a template, this does not need to be "live" necessary just a regular IO stream.
New test was added to check all supported formats before the patch was made to ensure it works okay. Minimum user-level breaking was done, most templates if not all should work without any changes.
TODO: