[Reporting] Abstract reports storage #106821
Conversation
ee1bccf to 729ba1f
Pinging @elastic/kibana-app-services (Team:AppServices)
Let's prefix unused variables with _ to stop warnings in the editor:
- _write(chunk: Buffer | string, encoding: string, callback: Callback) {
+ _write(chunk: Buffer | string, _encoding: string, callback: Callback) {
getContentStream always seems to be called only once per file, so I don't see a benefit in returning it from a factory function.
We could remove the factory wrapper by having getContentStream take reporting as its first argument.
There are a lot of factory functions in Reporting, but they are a leftover of the legacy platform. The existing code like that should all be cleaned up in this PR: #106940
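A minimal sketch of the suggested refactor (the `ReportingCore` stub and its `getContent` method below are placeholders for illustration, not the PR's actual code):

```typescript
// Hypothetical stand-in for Kibana's ReportingCore.
interface ReportingCore {
  getContent(documentId: string): string;
}

// Factory style (the pattern being questioned): an extra closure over `reporting`.
const contentStreamFactory = (reporting: ReportingCore) =>
  (documentId: string): string => reporting.getContent(documentId);

// Plain-function style (the suggestion): `reporting` as an explicit first argument.
function getContentStream(reporting: ReportingCore, documentId: string): string {
  return reporting.getContent(documentId);
}

// Both styles produce the same result; the plain function just drops one layer.
const fakeReporting: ReportingCore = {
  getContent: (id) => `content-of-${id}`,
};

const viaFactory = contentStreamFactory(fakeReporting)("doc-1");
const direct = getContentStream(fakeReporting, "doc-1");
```

Since the stream is only created once per file, the closure buys nothing over passing the dependency explicitly.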
tsullivan left a comment:
This is looking great! I'm excited to get this in.
I left a few comments, but the main concern is that we need to make sure that when a report document is queried from Elasticsearch, the report was created by the authenticated user.
We need to add the filter that was in jobsQueries.getContent, which matched the document's user against the authenticated user.
This preserves the requirement that users cannot download reports created by other users.
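A hedged sketch of such an ownership filter (the `created_by` field name and the query shape below are assumptions for illustration, not verified against this PR's code):

```typescript
// Build an Elasticsearch query that matches a report document only when it
// was created by the authenticated user. Field names are hypothetical.
function getJobContentQuery(jobId: string, username: string | false): Record<string, any> {
  return {
    bool: {
      filter: [
        { term: { _id: jobId } },
        // A `false` username would mean security is disabled and the
        // report was created without an owner.
        { term: { created_by: username } },
      ],
    },
  };
}

const query = getJobContentQuery("job-1", "elastic");
```

With both terms in the `filter` clause, a document owned by another user simply doesn't match, so the content is never read.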
I added that at first, but then decided not to, because I don't think it belongs there. The stream itself should be as simple as possible and responsible only for reading and writing data. Besides, we already have this check in the store, and it is performed here before reading from the stream.
++ After I wrote my previous comment, I realized that .get is checking the username for us.
I had in mind that the abstracted file storage mechanism would be used inside of the task runner functions (aka execute_job functions), so they can take control of streaming their output content to storage as it becomes available.
Internally, the file storage mechanism could chunk the data into multiple documents as it arrives, which lends itself towards solving #18322
It might be good to create a stream variable in _performJob and pass it to the task runner functions on line 247. We could have the task runners return the stream back again, because the _seq_no and _primary_term need to be updated. Then we could remove the parts of execute_report.ts that handle the entire output content. Something like that would be the only way to allow the csv.maxSizeBytes setting to be unlimited.
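A rough sketch of that flow (`performJob`, `runCsvTask`, and the in-memory chunk collection below are placeholders for illustration; the real task runners would write to the Elasticsearch-backed content stream):

```typescript
import { PassThrough, Writable } from "stream";

type TaskRunner = (output: Writable) => Promise<void>;

// A task runner that streams its output incrementally instead of returning
// the whole content at once, so a size cap like csv.maxSizeBytes is no
// longer forced by memory.
const runCsvTask: TaskRunner = async (output) => {
  output.write("header1,header2\n");
  output.write("a,b\n");
  output.end();
};

// The job runner creates the stream and hands it to the task runner.
async function performJob(taskRunner: TaskRunner): Promise<string> {
  const stream = new PassThrough();
  const chunks: Buffer[] = [];
  stream.on("data", (chunk: Buffer) => chunks.push(chunk));
  const ended = new Promise<void>((resolve) => stream.on("end", resolve));
  await taskRunner(stream);
  await ended;
  return Buffer.concat(chunks).toString();
}
```

Here the collected content is only concatenated for demonstration; the point of the proposal is that nothing ever needs to hold the full output in one place.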
I agree we should be writing inside the execute job as we are getting data, but IMO it's OK to do that in a follow-up PR, keeping this PR as small as possible and just abstracting our storage without any other changes.
I have addressed that in the latest commit. The task runner functions no longer return content but write it to the stream.
Let's also get @ppisljar to review
x-pack/plugins/reporting/server/routes/lib/job_response_handler.ts
ppisljar left a comment:
LGTM after Tim's concerns are addressed.
tsullivan left a comment:
LGTM
This tackles the issue very well. GREAT WORK!
💚 Build Succeeded
To update your PR or re-run it, just comment with:
* Add duplex content stream
* Add content stream factory
* Move report contents gathering and writing to the content stream
* Update jobs executors to use content stream instead of returning report contents

# Conflicts:
# x-pack/plugins/reporting/server/export_types/printable_pdf/execute_job/index.test.ts
if_seq_no?: number;
}

export class ContentStream extends Duplex {
What is the expected behaviour of writing to a stream? Will we only ever allow writing to non-existing document IDs?
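For context, a minimal in-memory sketch of a Duplex with this shape (this buffers chunks in an array purely for illustration; the PR's actual ContentStream persists chunks to an Elasticsearch document instead):

```typescript
import { Duplex } from "stream";

type Callback = (error?: Error | null) => void;

class InMemoryContentStream extends Duplex {
  private chunks: Buffer[] = [];

  _write(chunk: Buffer | string, _encoding: string, callback: Callback) {
    // A real implementation would index the chunk into Elasticsearch here.
    this.chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
    callback();
  }

  _read() {
    // Drain whatever has been written so far, then signal end-of-stream.
    for (const chunk of this.chunks.splice(0)) {
      this.push(chunk);
    }
    this.push(null);
  }
}

const stream = new InMemoryContentStream();
stream.write("hello, ");
stream.write("world");
stream.end();
const content = stream.read();
```

In this sketch writing always appends; whether the real ContentStream should refuse to write to an existing document ID is exactly the open question above.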
Summary
This pull request encapsulates the reporting storage logic behind a Node.js stream.
Resolves #98726.
Checklist
For maintainers