[Reporting] support long dashboards via multiple snapshots#248566

Closed
pmuellr wants to merge 14 commits into elastic:main from
pmuellr:138812-reporting-rectangles

@pmuellr pmuellr commented Jan 10, 2026

resolves #138812

Summary

Reports generated from "long" dashboards often contain artifacts, as described in the issue above.

The problem is that Chrome does not support generating a screenshot taller than 16K rows. Because we also use a zoom setting of 2, only dashboards shorter than 8K rows render without artifacts.
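A minimal sketch of that limit check, assuming "16K" means 16384 device pixels; the constant and both function names are hypothetical and not taken from the PR:

```typescript
// Assumption: Chrome's screenshot height limit, in device pixels ("16K rows").
const MAX_SCREENSHOT_DEVICE_PX = 16 * 1024;

// Tallest dashboard (in CSS pixels) that fits in one screenshot at a given zoom.
function maxDashboardHeight(zoom: number): number {
  return Math.floor(MAX_SCREENSHOT_DEVICE_PX / zoom);
}

// True when a dashboard of this height would exceed the single-screenshot limit.
function needsTiling(heightCssPx: number, zoom: number): boolean {
  return heightCssPx > maxDashboardHeight(zoom);
}
```

With a zoom of 2 the limit works out to 8192 CSS pixels, which lines up with the "8K" figure above.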

To fix this, instead of taking a single snapshot of the entire dashboard, we will partition the dashboard into a vertical set of "tiles" of a fixed height, and take a snapshot of each tile. We then stitch them back together with @pdf-lib/upng.
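As a rough illustration of the partitioning step (not the code in this PR), here is a sketch that splits a capture area into fixed-height tiles. The `Rect` shape and `partitionIntoTiles` name are assumptions made for the example:

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Split `area` into a vertical stack of tiles, each at most `tileHeight` tall.
// The last tile is shorter when the height is not an exact multiple.
function partitionIntoTiles(area: Rect, tileHeight: number): Rect[] {
  if (tileHeight <= 0) {
    throw new Error('tileHeight must be positive');
  }
  const tiles: Rect[] = [];
  for (let offset = 0; offset < area.height; offset += tileHeight) {
    tiles.push({
      x: area.x,
      y: area.y + offset,
      width: area.width,
      height: Math.min(tileHeight, area.height - offset),
    });
  }
  return tiles;
}
```

Each resulting rectangle would then be captured as its own snapshot before stitching.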

We'll just do this for dashboards that are so long they need to be generated this way.

Note that it would also be possible to support very WIDE dashboards, but that would take a fair amount of work and would probably be extremely slow. It's easy to compose image tiles vertically; it will be harder to compose them horizontally as well.

New dependency @pdf-lib/upng

Purpose: What is this dependency used for? Briefly explain its role in your changes.

We extract "tiles" from a large image - basically a subset of the image. Then we combine them to form the full image. To do this, we need code that converts a .png-formatted image to RGBA pixels, and vice versa. This package provides that.
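To show why raw RGBA pixels make the combining step simple, here is a dependency-free sketch of vertical stitching. It assumes each tile has already been decoded to row-major RGBA bytes by a PNG library; the `RgbaImage` type and `stitchVertically` name are hypothetical, and PNG decoding and encoding are deliberately left out:

```typescript
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8Array; // row-major RGBA, 4 bytes per pixel
}

// Stack equally wide RGBA tiles top to bottom. Because rows are stored
// contiguously, this amounts to concatenating the tile buffers in order.
function stitchVertically(tiles: RgbaImage[]): RgbaImage {
  const first = tiles[0];
  if (!first) {
    throw new Error('at least one tile is required');
  }
  let height = 0;
  for (const tile of tiles) {
    if (tile.width !== first.width) {
      throw new Error('all tiles must have the same width');
    }
    if (tile.data.length !== tile.width * tile.height * 4) {
      throw new Error('tile data does not match its dimensions');
    }
    height += tile.height;
  }
  const data = new Uint8Array(first.width * height * 4);
  let offset = 0;
  for (const tile of tiles) {
    data.set(tile.data, offset);
    offset += tile.data.length;
  }
  return { width: first.width, height, data };
}
```

Horizontal composition would not reduce to concatenation like this, since each output row would have to interleave rows from several tiles.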

Justification: Why is adding this dependency the best approach?

Because this package is already indirectly included in Kibana via pdf-lib.

Alternatives explored: Were other options considered (e.g., using existing internal libraries/utilities, implementing the functionality directly)? If so, why was this dependency chosen over them?

Actually, I'd prefer to use a native solution, like sharp, but that will no doubt add more size, and I think we're trying to avoid native packages at this point.

Existing dependencies: Does Kibana have a dependency providing similar functionality? If so, why is the new one preferred?

Not one that is explicitly depended on. We actually do have sharp and also pngjs as devDependencies, but moving those into dependencies, again, adds more size and product runtime deps. It feels better to use the existing, implicitly depended-on, already-shipped package.

Checklist

Check the PR satisfies following conditions.

Reviewers should verify this PR satisfies this list as well.

@pmuellr pmuellr added release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Platform ResponseOps team (formerly the Cases and Alerting teams) t// backport:all-open Backport to all branches that could still receive a release Feature:Reporting:Screenshot Reporting issues pertaining to PNG/PDF file export labels Jan 10, 2026
@kibanamachine

Project deployments require a Github label, please add one or more of ci:project-deploy-(elasticsearch|observability|security) and trigger the job through the checkbox again.

@kibanamachine

Cloud deployments require a Github label, please add ci:cloud-deploy or ci:cloud-redeploy and trigger the job through the checkbox again.

@pmuellr pmuellr added the ci:cloud-deploy Create or update a Cloud deployment label Jan 14, 2026
@pmuellr pmuellr marked this pull request as ready for review January 14, 2026 04:23
@pmuellr pmuellr requested review from a team as code owners January 14, 2026 04:23
@elasticmachine

Pinging @elastic/response-ops (Team:ResponseOps)


kibanamachine commented Jan 14, 2026

Dependency Review Bot Analysis 🔍

Found 1 new third-party dependencies:

| Package | Version | Vulnerabilities | Health Score |
| --- | --- | --- | --- |
| @pdf-lib/upng | 1.0.1 | 🔴 C: 0, 🟠 H: 0, 🟡 M: 0, 🟢 L: 0 | @pdf-lib/upng |

Self Checklist

To help with the review, please update the PR description to address the following points for each new third-party dependency listed above:

  • Purpose: What is this dependency used for? Briefly explain its role in your changes.
  • Justification: Why is adding this dependency the best approach?
  • Alternatives explored: Were other options considered (e.g., using existing internal libraries/utilities, implementing the functionality directly)? If so, why was this dependency chosen over them?
  • Existing dependencies: Does Kibana have a dependency providing similar functionality? If so, why is the new one preferred?

Thank you for providing this information!


pmuellr commented Jan 14, 2026

Note for anyone looking at the "new" dep @pdf-lib/upng: this was already an implicit dependency, via pdf-lib. We're just referencing it specifically. It doesn't look to be exposed via pdf-lib itself, so I pulled it in explicitly.

From main:

https://github.com/elastic/kibana/blob/c75d09e86439f2213e779fcc4c3cdb8f679b9507/yarn.lock#L28148-L28156

https://github.com/elastic/kibana/blob/c75d09e86439f2213e779fcc4c3cdb8f679b9507/yarn.lock#L11615-L11620

@pmuellr pmuellr added ci:cloud-deploy Create or update a Cloud deployment ci:cloud-redeploy Always create a new Cloud deployment and removed ci:cloud-deploy Create or update a Cloud deployment labels Jan 20, 2026
@pmuellr pmuellr added the ci:cloud-persist-deployment Persist cloud deployment indefinitely label Jan 20, 2026

pmuellr commented Jan 20, 2026

/ci


pmuellr commented Jan 21, 2026

I'm not having any success generating a moderately long report using the current level of code. Here's a log of what I'm seeing:

04:36:45.089Z  evaluate GetTimeRange                                                                                                                                         
04:36:45.090Z  evaluate ElementPositionAndAttributes                                                                                                                         
04:36:45.090Z  evaluate GetVisualisationsRenderErrors                                                                                                                        
04:36:45.605Z  timeRange: Jan 13, 2026 @ 00:00:00.000 to Jan 20, 2026 @ 23:36:34.025                                                                                         
04:36:45.608Z  Setting viewport to: width=1430 height=8881 scaleFactor=2                                                                                                     
04:36:45.608Z  taking screenshots                                                                                                                                            
04:36:46.200Z  getTiledScreenshot: allocated buffer: width: 1430; height: 8744; bufferSize: 200062720                                                                        
04:36:46.200Z  getTiledScreenshot: getting tile: {\"x\":0,\"y\":137,\"width\":1430,\"height\":8000}                                                                          
04:36:46.260Z  Message in browser console: { text: \"Detected a viewport resize: width=1430 height=8881 scaleFactor:2\"                                                         
04:36:52.738Z  Protocol error (Page.captureScreenshot): Target closed                                                                                                        
04:36:52.739Z  deleting chromium user data directory at [/usr/share/kibana/data/chromium-xx9o8L]                                                                             
04:36:52.739Z  It looks like the browser is no longer being used. Closing the browser...                                                                                     
04:36:52.739Z  Protocol error (Page.captureScreenshot): Target closed                                                                                                        
04:36:52.739Z  Protocol error (Performance.getMetrics): Session closed. Most likely the page has been closed.                                                                
04:36:52.740Z  Attempting to close browser...                                                                                                                                
04:36:52.740Z  Saving execution error for PNGV2 job 181f6b3d-33f8-480a-8294-f999fb225ae4: Error: Protocol error (Page.captureScreenshot): Target closed                      
04:36:52.778Z  Browser closed.                                                                                                                                               
04:36:52.778Z  child process closed                                                                                                                                          
04:36:53.651Z  Job 181f6b3d-33f8-480a-8294-f999fb225ae4 failed on its last attempt and will not be retried. Error: Protocol error (Page.captureScreenshot): Target closed.   
04:36:53.651Z  Reports running: 0.                                                                                                                                           

So the report is 1430 wide by 8881 high. Each pixel will be 4 bytes - RGBA, and we're at zoom:2, which means double the width and height (so 4 times bigger), so 1430 * 8881 * 4 * 4 == 203197280.

I originally read that as 20MB, but it's 203,197,280 - 200MB! We allocate space for that right when we log getTiledScreenshot: allocated buffer: ..., and the next log line is where we get the first tile of the screen shot. And then the browser closes!
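The arithmetic can be checked with a small helper. This is only a sketch of the calculation described here; `rgbaBufferSize` is a hypothetical name, not a function from the PR:

```typescript
const BYTES_PER_PIXEL = 4; // RGBA

// Bytes needed to hold a raw RGBA image of the given CSS-pixel size at a
// given zoom. Zoom scales both width and height, so the cost grows with zoom squared.
function rgbaBufferSize(widthCssPx: number, heightCssPx: number, zoom: number): number {
  return widthCssPx * zoom * (heightCssPx * zoom) * BYTES_PER_PIXEL;
}

const viewportBytes = rgbaBufferSize(1430, 8881, 2); // full viewport from the log
const allocatedBytes = rgbaBufferSize(1430, 8744, 2); // buffer size reported in the log
```

Here `viewportBytes` is 203197280 and `allocatedBytes` is 200062720, the latter matching the `bufferSize` in the log line.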

My current guess is that Node.js has allocated the memory that Chromium needed to build the screenshot - it will need a fair amount of memory to get a shot that big as well. I'll try with a smaller tile size, but I'm not hopeful this technique is going to be practical. And it makes me wonder if even a native png tile stitching function would use significantly less memory. Just not a great "job" for a low-end, low-memory, non-GPU Linux box?


elasticmachine commented Jan 27, 2026

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #28 / "before all" hook in "{root}"
  • [job] [logs] FTR Configs #119 / Canvas Canvas app Canvas PDF Report Generation Print PDF button creates a PDF with correct response headers
  • [job] [logs] FTR Configs #119 / Canvas Canvas app Canvas PDF Report Generation Print PDF button creates a PDF with correct response headers
  • [job] [logs] Jest Tests #3 / Create Layout creates preserve layout instance
  • [job] [logs] Jest Tests #3 / Create Layout creates preserve layout instance

Metrics [docs]

✅ unchanged

History


pmuellr commented Jan 29, 2026

Going to close this PR and fix the "long report" issue with PR #248785.

The "tiling" approach in this PR does work, but requires a lot of memory. It may be useful in the future if we move to some kind of centralized printing service where memory isn't as much of an issue.

@pmuellr pmuellr closed this Jan 29, 2026

Development

Successfully merging this pull request may close these issues.

[Reporting][Research] Large dashboard layout screen capture contains glitch