[Question] Trying to connect to existing playwright session via Chromium CDP #11442

Closed
oliverswitzer opened this issue Jan 17, 2022 · 5 comments


@oliverswitzer

oliverswitzer commented Jan 17, 2022

Hi,

I have a particular use case that involves starting a Playwright Chromium session from a Node script and then connecting to that session with chromium.connectOverCDP from a Playwright wrapper library in another language (I'm using playwright-elixir).

The reason I want to do this is to take advantage of Node libraries that add functionality on top of Playwright, like header rotation, while still using my preferred language to drive Playwright. I expect these libraries to make the following modifications to the browser session:

  • set properties on window
  • add initialization scripts
  • set headers
  • set cookies
  • override browser capabilities

My hope is that connecting to that session via CDP will let me inherit the modifications made to it by the Playwright libraries I plan on using.

Here's an example of two of the libraries I was hoping to use:

I do not wish to re-implement these library behaviors in Elixir, and would much rather attach to an existing session that was started in node that has used one of these libraries.

I have been reading this previous issue, which seems to state that you can use Chromium with chromium.connectOverCDP to connect to a pre-existing Playwright session.

I have attempted to recreate what this comment from @dmitrysteblyuk and this comment from @mxschmitt suggest with the following script, but have run into a few issues.

The breakdown is:

  1. Launch a Chromium session with DevTools and the remote debugger enabled
  2. Modify the browser context by adding cookies, setting initialization scripts, etc.
  3. Create a page in that context and navigate to Google
  4. Create a new browser with connectOverCDP(), using the CDP WebSocket URL retrieved from the locally running debugger
  5. Verify that the new browser has a pre-existing context, as well as a page pointed at Google
  6. Verify that the modifications made in step 2 to the first browser context have persisted (properties set on window, modified headers, etc.)
  7. Navigate to a new URL using the existing page.

So far I have not been able to get past step 5.

import { chromium } from "playwright";
import fetch from "node-fetch";

(async () => {
  // 1. 
  const browserOne = await chromium.launch({
    args: ["--remote-debugging-port=9222"],
    devtools: true,
    headless: false,
  });

  // 2.
  const contextOne = await browserOne.newContext({
    userAgent: "Some Overriden User Agent",
  });
  await contextOne.addCookies([
    {
      name: "Some cookie",
      value: "Some cookie value",
      url: "https://example.com",
    },
  ]);
  await contextOne.addInitScript(() => (window.hello = "hello"));

  // 3. 
  const pageOne = await contextOne.newPage();
  await pageOne.goto("https://google.com");

  // 4.
  // BEGIN: Try to connect to previously created Chromium session via CDP
  const [{ webSocketDebuggerUrl: debugWsUrl }] = await fetch(
    "http://localhost:9222/json/list"
  ).then((r) => r.json());

  const browserTwo = await chromium.connectOverCDP(debugWsUrl);

  // 5.
  // Shows 1 context
  console.log(
    "Number of contexts in CDP browser session: ",
    browserTwo.contexts().length
  );
  const contextTwo = browserTwo.contexts()[0];

  // Shows 0 pages on said context
  console.log(
    "Number of pages in CDP browser session: ",
    contextTwo.pages().length
  );

  // Creating a new page from the context we find in the CDP session blows up with an error. I'd actually like to use the
  // pre-existing page to navigate, but there isn't one, so instead I try to create a new page:
  
  const pageTwo = await contextTwo.newPage();

  // That blows up with:
  //  browserContext.newPage: Cannot read property 'pageOrError' of undefined
  //    at file:///Users/oliverswitzer/workspace/playwright-cdp-spike/first_session.js:50:36 { name: 'TypeError' }
})();

Here is the repo with the above script that you can use to reproduce this issue.

While I can see a pre-existing browser context after connecting over CDP to the browserOne session, that context does not contain any pre-existing pages.

Also, when I try to create a new page with await contextTwo.newPage(); on the context obtained via CDP, I get the following error:

node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

browserContext.newPage: Cannot read property 'pageOrError' of undefined
    at file:///Users/oliverswitzer/workspace/playwright-cdp-spike/first_session.js:50:36 {
  name: 'TypeError'
}

While googling the error I received when creating a new page, I found this old Playwright issue thread from 2020, which attributes the error to an outdated version of Chrome. However, in this case I am using Chromium, and I have also verified that my version is the one Playwright expects.

I verified this by opening Chromium from my library cache (~/Library/Caches/ms-playwright/chromium-939194/chrome-mac/Chromium.app) and comparing its version with the one pinned in npm for Playwright:

[screenshots of the two version numbers omitted]

Any insights on how to proceed would be greatly appreciated!

Thanks

@mxschmitt
Member

mxschmitt commented Jan 19, 2022

Something like this might work for you:

//@ts-check
import { chromium } from 'playwright';

(async () => {
  const context1 = await chromium.launchPersistentContext('', {
    args: ['--remote-debugging-port=9222'],
    headless: false,
    userAgent: 'Some Overriden User Agent',
  });

  await context1.addCookies([
    {
      name: 'Some cookie',
      value: 'Some cookie value',
      url: 'https://example.com',
    },
  ]);
  await context1.addInitScript(() => window.hello = 'hello');

  const pageOne = await context1.newPage();
  await pageOne.goto('https://google.com');

  const browser2 = await chromium.connectOverCDP('http://localhost:9222');

  // Shows 1 context
  console.log(
      'Number of contexts in CDP browser session: ',
      browser2.contexts().length
  );
  const contextTwo = browser2.contexts()[0];

  // Shows 0 pages
  console.log(
      'Number of pages in CDP browser session: ',
      contextTwo.pages().length
  );

  // Creating a new page blows up with error:
  //
  // browserContext.newPage: Cannot read property 'pageOrError' of undefined
  //    at file:///Users/oliverswitzer/workspace/playwright-cdp-spike/first_session.js:50:36 { name: 'TypeError' }
  const pageTwo = await contextTwo.newPage();
  await browser2.close();
  await context1.close();
})();

@oliverswitzer
Author

@mxschmitt that totally works, thank you! It sounds like the magic sauce I was missing was launchPersistentContext?

Also, I noticed that connectOverCDP accepts a CDP WebSocket URL as well. Is there any reason to use the specific remote DevTools session's WebSocket URL, as I did in my example, rather than just passing localhost:9222 as the endpoint?

Thank you!

@mxschmitt
Member

mxschmitt commented Jan 19, 2022

Yes!

There is no difference, because internally we do the same thing:

async function urlToWSEndpoint(progress: Progress, endpointURL: string) {
  if (endpointURL.startsWith('ws'))
    return endpointURL;
  progress.log(`<ws preparing> retrieving websocket url from ${endpointURL}`);
  const httpURL = endpointURL.endsWith('/') ? `${endpointURL}json/version/` : `${endpointURL}/json/version/`;
  const request = endpointURL.startsWith('https') ? https : http;
  const json = await new Promise<string>((resolve, reject) => {
    request.get(httpURL, resp => {
      if (resp.statusCode! < 200 || resp.statusCode! >= 400) {
        reject(new Error(`Unexpected status ${resp.statusCode} when connecting to ${httpURL}.\n` +
          `This does not look like a DevTools server, try connecting via ws://.`));
      }
      let data = '';
      resp.on('data', chunk => data += chunk);
      resp.on('end', () => resolve(data));
    }).on('error', reject);
  });
  return JSON.parse(json).webSocketDebuggerUrl;
}
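To make the equivalence concrete, here is a minimal, dependency-free sketch of the URL handling in urlToWSEndpoint, rewritten in plain JavaScript. The names isWebSocketEndpoint, versionUrlFor and resolveWsEndpoint are hypothetical, and resolveWsEndpoint assumes Node 18+ for the global fetch; this is an illustration of the logic above, not Playwright's actual implementation.

```javascript
// Hypothetical sketch mirroring urlToWSEndpoint's URL handling,
// without Playwright's internal Progress logger.
// Assumes Node 18+ (global fetch) for resolveWsEndpoint.

// True when the endpoint is already a WebSocket URL (ws:// or wss://).
function isWebSocketEndpoint(endpointURL) {
  return endpointURL.startsWith("ws");
}

// Builds the DevTools version URL the same way the snippet does:
// append "json/version/" with or without a separating slash.
function versionUrlFor(endpointURL) {
  return endpointURL.endsWith("/")
    ? `${endpointURL}json/version/`
    : `${endpointURL}/json/version/`;
}

// Resolves an http(s) endpoint to the browser-level WebSocket URL;
// WebSocket URLs are returned unchanged.
async function resolveWsEndpoint(endpointURL) {
  if (isWebSocketEndpoint(endpointURL)) return endpointURL;
  const httpURL = versionUrlFor(endpointURL);
  const res = await fetch(httpURL);
  if (res.status < 200 || res.status >= 400) {
    throw new Error(`Unexpected status ${res.status} when connecting to ${httpURL}`);
  }
  const json = await res.json();
  return json.webSocketDebuggerUrl;
}
```

With a DevTools server listening on port 9222, resolveWsEndpoint("http://localhost:9222") should yield the same browser-level URL that connectOverCDP("http://localhost:9222") resolves internally, which is why passing either form is expected to behave identically.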

@mxschmitt
Member

Closing since this issue seems answered. Please file a new issue for further questions.

@Ayan-Bandyopadhyay

@mxschmitt This still seems to be an issue in the Python library. launch_persistent_context automatically creates a page, but that page is not accessible through the CDP connection, which shows 0 pages.

Here is a script that reproduces the error:

import asyncio
import os
import shutil

import requests
from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as pw:
        context_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "chrome_data")
        # Start from a fresh user data directory on every run
        if os.path.exists(context_path):
            shutil.rmtree(context_path)
        context = await pw.chromium.launch_persistent_context(
            user_data_dir=context_path,
            args=["--remote-debugging-port=9222"],
            headless=False,
        )

        # Prints out 1
        print("Number of pages in context:", len(context.pages))
        info = requests.get("http://localhost:9222/json/list").json()
        ws_url = info[0]["webSocketDebuggerUrl"]

        browser2 = await pw.chromium.connect_over_cdp(ws_url)

        # Prints out 1
        print("Number of contexts:", len(browser2.contexts))

        # Prints out 0
        print("Number of pages: ", len(browser2.contexts[0].pages))

        # This line throws an error
        page2 = await browser2.contexts[0].new_page()


asyncio.run(main())
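One detail shared by this script and the original Node script may be worth checking: both read webSocketDebuggerUrl from /json/list, whereas the urlToWSEndpoint snippet earlier in the thread reads it from /json/version. In Chromium, /json/version reports the browser-level endpoint (path /devtools/browser/<id>), while /json/list entries are individual targets, typically with paths like /devtools/page/<id>. Whether this accounts for the 0 pages and the newPage error is not confirmed in this thread. The sketch below, in JavaScript like the earlier Node examples and using hypothetical helpers devtoolsTargetKind and assertBrowserEndpoint, only shows how to check which kind of URL is being passed to connectOverCDP.

```javascript
// Hypothetical helpers for illustration; not part of Playwright's API.
// Assumes Chromium's DevTools URL layout:
//   /json/version -> ws://host:port/devtools/browser/<id>  (browser endpoint)
//   /json/list    -> ws://host:port/devtools/page/<id>     (individual targets)

// Returns "browser", "page", or "unknown" based on the URL path.
function devtoolsTargetKind(wsUrl) {
  const { pathname } = new URL(wsUrl);
  // "/devtools/page/ABC".split("/") -> ["", "devtools", "page", "ABC"]
  const [, prefix, kind] = pathname.split("/");
  if (prefix !== "devtools") return "unknown";
  return kind === "browser" || kind === "page" ? kind : "unknown";
}

// Throws unless the URL points at the browser-level endpoint;
// otherwise returns it unchanged so it can be passed straight on.
function assertBrowserEndpoint(wsUrl) {
  const kind = devtoolsTargetKind(wsUrl);
  if (kind !== "browser") {
    throw new Error(
      `Expected a /devtools/browser/... endpoint, got "${kind}": ${wsUrl}`
    );
  }
  return wsUrl;
}
```

For example, wrapping the URL as chromium.connectOverCDP(assertBrowserEndpoint(debugWsUrl)) in the original Node script would fail fast if debugWsUrl came from a page target in /json/list.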
