@lzap
Member

@lzap lzap commented Nov 12, 2020

The way we generate reports is memory-unfriendly. All rows are stored in an array of hashes and then dumped into a string, which is returned to the client. In the background job case, the string is stored in the DB or emailed.

Refactoring: Rewrite the report_headers, report_row and report_render helpers so they immediately stream data into an IO object. Do not use any intermediate data structure; all formats we currently support (JSON, YAML, CSV, plaintext) allow this. When called via the API, data can be streamed directly over TCP. In the case of background processing, data can be streamed via SQL LOAD directly into a PostgreSQL blob/text.

This could be implemented without any user-facing changes: the three helpers will remain the same, but they will immediately stream content instead of storing anything in memory. The last call (report_render) will output an optional footer and close the stream.
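
To make the idea concrete, here is a minimal sketch of helpers that write straight into an IO object. It is not the code from this PR: the StreamingReport class is hypothetical, only CSV is covered, and report_row taking an array of values is an assumption made to keep the example short.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in for the report helpers; only CSV is sketched here.
class StreamingReport
  def initialize(output)
    @output = output # any IO-like object that responds to #write
  end

  # Writes the header line immediately.
  def report_headers(*names)
    @output.write(CSV.generate_line(names))
  end

  # Writes one row immediately; no rows are accumulated in memory.
  def report_row(values)
    @output.write(CSV.generate_line(values))
  end

  # CSV has no footer, so the last call only flushes and returns the IO.
  def report_render
    @output.flush
    @output
  end
end

io = StringIO.new
report = StreamingReport.new(io)
report.report_headers('X', 'Y', 'Z')
report.report_row([1, 2, 3])
report.report_render
puts io.string
```

StringIO is used only so the sketch can run on its own; any object with a write method (a file, a socket, a response stream) could be passed in instead.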

This is the first cut which implements streaming when using API/CLI:

bundle exec bin/hammer report-template generate --id 128
X,Y,Z
1,2,3

Only CSV and HTML are implemented for now. I am currently investigating an issue where the development server gets stuck on some requests. Maybe there is a different API than Live::Buffer available; I need to dig into it. All I need is an IO object I can pass into a template. It does not necessarily need to be "live", just a regular IO stream.

A new test was added before the patch was made; it checks all supported formats to ensure everything still works. User-level breakage was kept to a minimum: most templates, if not all, should work without any changes.

TODO:

  • implement V2 API for all formats
  • implement for the background job (stream into postgres)
  • test on development
  • test on production

@lzap lzap requested a review from ares November 12, 2020 17:48
@lzap
Member Author

lzap commented Nov 13, 2020

I am having some issues with our app though: https://discuss.rubyonrails.org/t/actioncontroller-live-chokes-at-buf-push/76490

Investigating.

@lzap lzap force-pushed the report-streaming-31274 branch from 447f2c3 to 49489a5 Compare November 13, 2020 15:17
response.headers['Cache-Control'] = 'no-cache'
response.headers['Last-Modified'] = '0'
response.headers['ETag'] = nil
response.stream.write("")
Contributor

Shouldn't writing an empty string into an IO be essentially a no-op?

Member

https://edgeapi.rubyonrails.org/classes/ActionController/Live.html states:

Calling write or close on the response stream will cause the response object to be committed. Make sure all headers are set before calling write or close on your stream.

when :html
@output.write '<tr>'
if row_data.is_a? Array
@output.write row_data.map { |cell| "<td>#{ERB::Util.html_escape(cell)}</td>" }.join('')
Contributor

How does this compare to writing the cells one by one, performance-wise?

row_data.each do |cell|
  @output.write "<td>#{ERB::Util.html_escape(cell)}</td>"
end
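
One way to answer this is to measure both variants. The sketch below is an illustration, not part of the PR: write_joined, write_each and elapsed are hypothetical helper names, and StringIO stands in for the real output stream, so timings against a socket or response stream may differ.

```ruby
require 'erb'
require 'stringio'

# Variant from the patch: build the whole row, then write once.
def write_joined(output, row_data)
  output.write(row_data.map { |cell| "<td>#{ERB::Util.html_escape(cell)}</td>" }.join)
end

# Variant from the review comment: one write per cell.
def write_each(output, row_data)
  row_data.each { |cell| output.write("<td>#{ERB::Util.html_escape(cell)}</td>") }
end

# Returns the wall-clock seconds spent in the block.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

row_data = Array.new(50) { |i| "cell <#{i}>" }

# Sanity check first: both variants must produce identical output.
joined = StringIO.new
each_cell = StringIO.new
write_joined(joined, row_data)
write_each(each_cell, row_data)
same_output = joined.string == each_cell.string

join_time = elapsed { 2_000.times { write_joined(StringIO.new, row_data) } }
each_time = elapsed { 2_000.times { write_each(StringIO.new, row_data) } }
puts format('join: %.4fs, each: %.4fs, same output: %s', join_time, each_time, same_output)
```

The numbers depend on the Ruby version and on how expensive a single write is for the target stream, so they are best measured against the stream actually used in the controller.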

response.headers['ETag'] = nil
response.stream.write("")
# TODO @composer.report_filename
@composer.render(output: response.stream, params: params)
Member

@ekohl ekohl left a comment

I think ActionController::Live is really meant for Server-Sent Events, which have their own format. Also note that I don't think Apache on EL7 supports HTTP/2 today, which means a client can only have about 6 concurrent HTTP connections to a host (with HTTP/2 that would be 100 streams).

I think what you are looking for here is Transfer-Encoding: chunked. See https://coderwall.com/p/kad56a/streaming-large-data-responses-with-rails for an example.
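
A minimal sketch of that approach, assuming a Rack-style response body: the body only has to respond to each and yield strings, and the server can send every yielded string as it is produced. ReportBody is a hypothetical class, not taken from the PR or from the linked article, and the controller wiring (for example assigning the object as the response body and omitting Content-Length) is not shown.

```ruby
# Hypothetical Rack-style body: the server calls #each and can send every
# yielded string right away instead of waiting for the full report.
class ReportBody
  def initialize(rows)
    @rows = rows # any enumerable, ideally lazy (e.g. a batched DB query)
  end

  def each
    yield "X,Y,Z\n"
    @rows.each { |row| yield "#{row.join(',')}\n" }
  end
end

# Stand-in for the web server: collect the chunks as they are produced.
body = ReportBody.new([[1, 2, 3], [4, 5, 6]].lazy)
chunks = []
body.each { |chunk| chunks << chunk }
puts chunks.inspect
```

Because rows are pulled one at a time from the enumerable, memory use stays flat as long as the row source itself does not load everything up front.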

if @composer.valid?
response = @composer.render(params: params)
send_data response, type: @composer.mime_type, filename: @composer.report_filename
response.headers['Content-Type'] = @composer.mime_type

param :gzip, :bool, desc: N_('Compress the report using gzip'), default_value: false
param :report_format, ReportTemplateFormat.selectable.map(&:id), desc: N_("Report format, defaults to '%s'") % ReportTemplateFormat.default.id

def generate
Member

https://edgeapi.rubyonrails.org/classes/ActionController/Live.html suggests you need to implement stream to actually send the data (in a separate thread).

@ezr-ondrej
Member

@lzap are you planning to work on this?

@lzap
Member Author

lzap commented Mar 17, 2021

I do.

@lzap
Member Author

lzap commented Apr 8, 2021

Note for myself: This is relevant: https://piotrmurach.com/articles/streaming-large-zip-files-in-rails/

I need to take a look later.

@ares
Member

ares commented Nov 10, 2021

Hello @lzap, given this has been a draft for a very long time, we'd like to know if you have a plan to move it out of draft in the foreseeable future.

If we don't get a reply in 2 weeks, we'll close the PR but it can always be reopened.

@tbrisker
Member

2 weeks have passed, closing. Feel free to reopen when you get back to this.

@tbrisker tbrisker closed this Nov 25, 2021
@lzap
Member Author

lzap commented Mar 2, 2022

Sorry folks, I did not find time for this. It would be nice if somebody could pick this up. I feel like it is really close; it's just that I am unsure how to finish the actual streaming.
