How to get a stream using Page.printToPDF method? #308

baofeidyz · 2024-01-31T07:06:52Z

I am attempting to convert HTML to a PDF file, but the page contains very large image data, which makes the process slow. I tried calling the Page.printToPDF method with the transferMode parameter set to ReturnAsStream.

However, the value of result['stream'] is always '1'. I don't understand why this happens or how to read the PDF data from it. Any assistance would be greatly appreciated.

My Chrome version is: 121.0.6167.85 (x86_64)
My system OS version is: macOS 13.5.2

I am using Python with Selenium, and the code is as follows:

import time

from selenium import webdriver

options = webdriver.ChromeOptions()
# options.add_argument("--headless=new")
# options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options)
driver.get("https://nodejs.org/api/fs.html")
scroll_distance = 200
scroll_interval = 0.1
current_scroll_position = driver.execute_script("return window.scrollY;")
num_scrolls = int(
    (driver.execute_script("return document.body.scrollHeight;") - current_scroll_position) / scroll_distance)

for i in range(num_scrolls):
    print(f'scroll pages {i + 1}/{num_scrolls}')
    driver.execute_script(f"window.scrollBy(0, {scroll_distance});")
    time.sleep(scroll_interval)

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
# https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
result = driver.execute_cdp_cmd(
    "Page.printToPDF",
    {
        "printBackground": True,
        "generateTaggedPDF": True,
        "transferMode": "ReturnAsStream"
    })
pdf_stream = result['stream']

with open('demo.pdf', 'ab') as pdf_file:
    chunk_size = 1024
    for chunk in pdf_stream:
        pdf_file.write(chunk)

wynnw · 2025-02-21T15:47:17Z

For anyone else who runs into this question in the future: I had the same issue and figured out that the result['stream'] value is a stream handle, not the PDF data itself. It must be passed to the CDP IO.read method as the handle parameter. You need to decode the returned data when it is base64-encoded and keep reading until the eof flag is set. It's pretty easy to do, and it saves a lot of memory overhead.
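
A minimal sketch of that read loop, in case it helps. The helper name save_cdp_stream and the 1 MiB chunk size are my own choices for illustration, not part of Selenium or CDP. It assumes IO.read returns the data, base64Encoded and eof fields described in the DevTools protocol docs, and it calls IO.close at the end to release the handle.

```python
import base64


def save_cdp_stream(execute_cdp_cmd, handle, out_file, chunk_size=1024 * 1024):
    """Drain a CDP stream handle into a binary file object.

    execute_cdp_cmd is any callable shaped like driver.execute_cdp_cmd(cmd, params).
    Returns the number of bytes written. (Hypothetical helper, not a Selenium API.)
    """
    total = 0
    try:
        while True:
            chunk = execute_cdp_cmd("IO.read", {"handle": handle, "size": chunk_size})
            data = chunk.get("data", "")
            # Binary payloads such as PDFs are expected to arrive base64-encoded.
            if chunk.get("base64Encoded"):
                raw = base64.b64decode(data)
            else:
                raw = data.encode("utf-8")
            out_file.write(raw)
            total += len(raw)
            if chunk.get("eof"):
                break
    finally:
        # Release the stream on the browser side, even if reading failed.
        execute_cdp_cmd("IO.close", {"handle": handle})
    return total


# Usage with the driver from the question (assumes the same Selenium session):
# result = driver.execute_cdp_cmd(
#     "Page.printToPDF",
#     {"printBackground": True, "transferMode": "ReturnAsStream"})
# with open("demo.pdf", "wb") as pdf_file:
#     save_cdp_stream(driver.execute_cdp_cmd, result["stream"], pdf_file)
```

Opening the file with 'wb' rather than 'ab' avoids appending a second PDF to an existing demo.pdf on repeated runs.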
