[Bug] puppeteer-extra-plugin-stealth cannot bypass meet.google.com protection with Chrome 122 #898

Open
MaximKalinin opened this issue Jun 20, 2024 · 24 comments
Labels
issue: bug report A bug has been reported needs triage

Comments

@MaximKalinin

MaximKalinin commented Jun 20, 2024

Describe the bug

The puppeteer-extra-plugin-stealth package works on https://meet.google.com with puppeteer 22.1.0 (Chrome for Testing 121.0.6167.85). However, with puppeteer 22.2.0 (Chrome for Testing 122.0.6261.57), the website detects the automation tool and blocks access. It looks like either some old tricks stopped working or some new detection methods were added.

Code Snippet
To reproduce the issue, save the following code as test.js and run node test.js, with these dependencies in package.json:

    "puppeteer": "22.2.0",
    "puppeteer-extra": "3.3.6",
    "puppeteer-extra-plugin-stealth": "2.11.2",

(async () => {
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  puppeteer.use(StealthPlugin());

  const browser = await puppeteer.launch({
    headless: false,
  });

  const page = await browser.newPage({
    storageState: { cookies: [], origins: [] },
    permissions: ['microphone', 'camera'],
  });
  await page.goto('https://meet.google.com/aaa-aaaa-aaa');
})();

To verify that it works with older versions of Chrome, just replace puppeteer version in package.json:

-   "puppeteer": "22.2.0",
+   "puppeteer": "22.1.0",
    "puppeteer-extra": "3.3.6",
    "puppeteer-extra-plugin-stealth": "2.11.2",

Versions

  System:
    OS: macOS 14.5
    CPU: (8) arm64 Apple M1 Pro
    Memory: 94.69 MB / 16.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 20.10.0 - ~/.nvm/versions/node/v20.10.0/bin/node
    npm: 10.2.3 - ~/.nvm/versions/node/v20.10.0/bin/npm
  npmPackages:
    playwright-extra: 4.3.6 => 4.3.6 
    puppeteer: 22.2.0 => 22.2.0 
    puppeteer-extra: 3.3.6 => 3.3.6 
    puppeteer-extra-plugin-stealth: 2.11.2 => 2.11.2 

@Slyracoon23

I am experiencing this issue as well. Are there any workarounds?

@Mrkk1

Mrkk1 commented Jan 7, 2025

I am also very interested in how this issue is progressing. I still cannot successfully enter Google Meet.

@sonhm3029

I am also facing the same issue. Has anyone solved it yet?

@vladtreny

vladtreny commented Jan 10, 2025

They detect that the iframe is forged and that the codecs ("cds") are changed.

Instead of:

  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  puppeteer.use(StealthPlugin());

Use:

    const pp = StealthPlugin()
    pp.enabledEvasions.delete('iframe.contentWindow')
    pp.enabledEvasions.delete('media.codecs')
    puppeteer.use(pp)

import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'

(async () => {
    const pp = StealthPlugin()
    pp.enabledEvasions.delete('iframe.contentWindow')
    pp.enabledEvasions.delete('media.codecs')
    puppeteer.use(pp)
    const browser = await puppeteer.launch({
        headless: false,
    })
    const page = await browser.newPage({
        storageState: {cookies: [], origins: []},
        permissions: ['microphone', 'camera'],
    })
    await page.goto('https://meet.google.com/aaa-aaaa-aaa')
})()
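
The change above can be wrapped in a small helper. This is only a sketch under assumptions: `disableEvasions` is a hypothetical name, not part of puppeteer-extra-plugin-stealth, and it relies only on `enabledEvasions` behaving like a `Set`, as the `.delete(...)` calls above imply. It reports which names were actually removed, so a typo in an evasion name does not pass silently.

```javascript
// Hypothetical helper (not part of the stealth plugin's API).
// Assumes `plugin.enabledEvasions` is a Set of evasion names.
function disableEvasions(plugin, names) {
  const removed = [];
  for (const name of names) {
    // Set.prototype.delete returns true only if the entry existed.
    if (plugin.enabledEvasions.delete(name)) {
      removed.push(name);
    }
  }
  return removed;
}

// Stand-in object so the sketch runs without a browser; with the real
// plugin you would pass the result of StealthPlugin() instead.
const fakePlugin = {
  enabledEvasions: new Set([
    'iframe.contentWindow',
    'media.codecs',
    'navigator.webdriver',
  ]),
};

// 'media.codec' is a deliberate typo to show that it is not reported as removed.
const removed = disableEvasions(fakePlugin, [
  'iframe.contentWindow',
  'media.codecs',
  'media.codec',
]);

console.log(removed);
console.log([...fakePlugin.enabledEvasions]);
```

With the real plugin, the call would presumably look like `disableEvasions(pp, ['iframe.contentWindow', 'media.codecs'])` followed by `puppeteer.use(pp)`.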

@Mrkk1

Mrkk1 commented Jan 10, 2025

That's great. Thanks for the solution.

@sonhm3029

@vladtreny Thanks, let me try.

@sonhm3029

@vladtreny I've tried what you suggested and it still does not work in headless mode.

@Mrkk1

Mrkk1 commented Jan 14, 2025

@vladtreny I've tried what you suggested and it still does not work in headless mode.

I have succeeded with this method, so you must have made a mistake somewhere. Did you use "page.setUserAgent"? If so, please do not call setUserAgent.

@sonhm3029

@Mrkk1 Oh, thanks very much, it works now. You totally saved me :V

@Mrkk1

Mrkk1 commented Jan 14, 2025

@Mrkk1 Oh, thanks very much, it works now. You totally saved me :V

Hahahaha, we are all helping each other

@MaximKalinin
Author

I have found a workaround: running Chrome with the --headless=new flag. This made it work without any plugins.
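
The comment does not say how the flag was passed, so the following is only a sketch of one possibility: handing it to Chrome through the `args` launch option. `newHeadlessLaunchOptions` is a hypothetical helper name, and setting `headless: false` alongside the flag is an assumption, made so that the explicit flag alone decides the mode.

```javascript
// Hypothetical helper: builds launch options that request Chrome's new
// headless mode through a command-line flag.
function newHeadlessLaunchOptions(extraArgs = []) {
  return {
    // Assumption: headless is left off on the Puppeteer side so that the
    // explicit flag below decides the mode.
    headless: false,
    args: ['--headless=new', ...extraArgs],
  };
}

const options = newHeadlessLaunchOptions();
console.log(options);

// With plain puppeteer (no plugins) this would presumably be used as:
//   const browser = await puppeteer.launch(newHeadlessLaunchOptions());
```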

@sonhm3029

I have checked again, and now I don't use anything else, just:

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());

When I run this in headless mode, everything works OK. It's weird, because I remember trying that before and it did not work.

"puppeteer": "^22.1.0",
"puppeteer-extra": "^3.3.6",
"puppeteer-extra-plugin-stealth": "^2.11.2"

@Mrkk1

Mrkk1 commented Jan 14, 2025

StealthPlugin

That's what I did before. I couldn't get in. I think Google did something yesterday and stopped testing.

@Mrkk1

Mrkk1 commented Jan 14, 2025

puppeteer.use(StealthPlugin());

To keep things stable, I will not change the code any further and will stick with the earlier solution of deleting the "cds" evasions.

@Tushar-Kapil

import puppeteer from "puppeteer-extra";
import { PrismaClient } from "@prisma/client";
import Stealth from "puppeteer-extra-plugin-stealth";

const prisma = new PrismaClient();

const pp = Stealth();

pp.enabledEvasions.delete('iframe.contentWindow');
pp.enabledEvasions.delete('media.codecs')

puppeteer.use(pp);

export const scrapeExtraInfoBySymbol = async () => {
const DOMAIN = "https://www.nseindia.com";
const INFO_PAGE = `${DOMAIN}/get-quotes/equity?symbol=`;

try {
const stocks = await prisma.nifty_fifty.findMany();

const browser = await puppeteer.launch({
  headless: false, 
  args: [
    '--disable-features=SameSiteByDefaultCookies,CookiesWithoutSameSiteMustBeSecure', // Disable SameSite cookie restrictions
    '--disable-web-security', // Disable web security to bypass restrictions
    '--disable-blink-features=AutomationControlled', // Prevent detection of automation features
    '--no-sandbox', // Prevent sandboxing errors
    '--disable-setuid-sandbox'
  ],
});

const page = await browser.newPage();
await page.setBypassCSP(true); // Ensure CSP is bypassed

// Mimicking headers from the real browser request
await page.setExtraHTTPHeaders({
  'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
  'accept-language': 'en-US,en;q=0.9',
  'referer': DOMAIN,
  'origin': DOMAIN,
});

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

for (const stock of stocks) {
  if (stock.symbol.includes("&")) {
    stock.symbol = stock.symbol.replace("&", "%26"); // Replace "&" with "%26" to avoid errors
  }
  const stockPage = `${INFO_PAGE}${stock.symbol}`;
  console.log(`Navigating to ${stockPage}`);

  try {
    await page.goto(stockPage, { waitUntil: 'networkidle2', timeout: 20000 });

    console.log(`Scraping extra information for ${stock.symbol}`);

    // Scraping logic to extract Adjusted P/E and Basic Industry
    const extraInfo = await page.evaluate(() => {
      const getValue = (selector) => {
        const element = document.querySelector(selector);
        return element?.textContent?.trim() || null;
      };

      return {
        adjustedPE: getValue('#SectoralIndxPE + td'), // Adjusted P/E value
        basicIndustry: getValue('#BasicIndustry + td'), // Basic Industry value
      };
    });

    console.log(`Extra Info for ${stock.symbol}:`, extraInfo);

    // Update database with extracted info (uncomment to enable database updates)
    // if (extraInfo.adjustedPE || extraInfo.basicIndustry) {
    //   await prisma.nifty_fifty.update({
    //     where: { symbol: stock.symbol },
    //     data: {
    //       adjusted_pe: extraInfo.adjustedPE ? parseFloat(extraInfo.adjustedPE) : null,
    //       basic_industry: extraInfo.basicIndustry,
    //     },
    //   });
    // }

  } catch (err) {
    console.error(`Error scraping ${stock.symbol}:`, err.message);
  }
}

await browser.close();

} catch (err) {
console.error('Error in scrapeExtraInfoBySymbol:', err.message);
}
};

// Utility function to generate a random number between a given range
function getRandomNumber(min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min;
}

export const startScraping = (time) => {
scrapeNse();
setInterval(scrapeNse, time);
};

Hey, can you guys look at this code? I have tried everything, but the data is still not loading on the website. Am I doing something wrong?
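
Separately from the detection problem, the symbol handling in the code above could be sketched differently. The loop overwrites `stock.symbol` with its escaped form, so the commented-out database update would look up the escaped value rather than the original symbol. A minimal sketch that leaves the symbol untouched and escapes it only when building the URL, where `buildQuoteUrl` is a hypothetical helper name:

```javascript
const DOMAIN = 'https://www.nseindia.com';
const INFO_PAGE = `${DOMAIN}/get-quotes/equity?symbol=`;

// Hypothetical helper: escapes the symbol for use in a query string
// without mutating the original value.
function buildQuoteUrl(symbol) {
  // encodeURIComponent escapes "&" as "%26" along with other reserved characters.
  return `${INFO_PAGE}${encodeURIComponent(symbol)}`;
}

const stock = { symbol: 'M&M' };
const stockPage = buildQuoteUrl(stock.symbol);

console.log(stockPage);
console.log(stock.symbol); // still the unescaped symbol
```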

@vladtreny

Try removing the --disable-web-security argument.

@Tushar-Kapil

Still not working. Sometimes it just randomly works, about 1 in 10-15 times.

Try to remove argument --disable-web-security

@vladtreny

Remove all entries from args:

const browser = await puppeteer.launch({
  headless: false, 
  args: [ ],
});

@Tushar-Kapil

Tried it, still not working 🙁

@vladtreny

Your IP is probably detected or rate limited.
Take a screenshot of the error.

@Tushar-Kapil

Image

These are the errors I am getting in the console for NSE in the Puppeteer browser with headless set to false.

@vladtreny

vladtreny commented Jan 19, 2025

Add this line after the page is created:

await page.evaluateOnNewDocument(() => delete Function.prototype.toString)

    const page = await browser.newPage()
    await page.evaluateOnNewDocument(() => delete Function.prototype.toString)
    await page.setBypassCSP(true)  

@Tushar-Kapil

Add this line after the page is created:

await page.evaluateOnNewDocument(() => delete Function.prototype.toString)

    const page = await browser.newPage()
    await page.evaluateOnNewDocument(() => delete Function.prototype.toString)
    await page.setBypassCSP(true) // Ensure CSP is bypassed

Wow! This is the only solution that worked. Thanks for the help.

@Zei33

Zei33 commented Jan 19, 2025

So I have just spent the last few days solving my evasion issues. It's been a serious challenge.

I have been using the following websites to test what issues are popping up.

After finally integrating the rebrowser patch to deal with CDP detection, and combining it with puppeteer-extra and stealth, I passed all of the tests.

But still, when I attempted to access Google search, I'd immediately get the dreaded recaptcha. The only sign that something was off was that the IP Address field that shows up below the recaptcha had a bunch of jumbled characters appended to it. (IP is changed for privacy of course).

IP address: 124.252.21.138 ≠ }���

I got the same sort of thing when I connected through one of our proxies for an office in another city. It seems the symbols change depending on your IP, but remain consistent for each IP address.

IP address: 57.93.50.73 ≠ :`3M

I knew for sure that puppeteer-extra-plugin-stealth was causing the issue, because when I disabled it, the problem went away and I bypassed the captcha (although that's not a solution, because I need stealth for the features it provides).

I have literally no idea how @vladtreny figured this out, but I can confirm without a doubt that the following code solves the issue:

import { type Browser, type Page } from "puppeteer";
const puppeteer = require("puppeteer");
const { addExtra } = require("puppeteer-extra");
const puppeteerExtra = addExtra(puppeteer);

const StealthPlugin = require("puppeteer-extra-plugin-stealth");
const stealth = StealthPlugin();
stealth.enabledEvasions.delete('iframe.contentWindow');
stealth.enabledEvasions.delete('media.codecs');
puppeteerExtra.use(stealth);

I can also confirm that you must turn off both the iframe.contentWindow evasion and the media.codecs evasion. You can't just turn off one or the other, it doesn't work.

Another confirmation (at least for me) is that headless mode is not required for evasion. However I have to admit, I've done a significant amount of customisation to make my browser look legitimate, so there's a chance that this won't apply to everyone by default.

I just want to write off other solutions to this problem for now:

await page.evaluateOnNewDocument(() => delete Function.prototype.toString)
await page.setBypassCSP(true) // Ensure CSP is bypassed

Doesn't seem to work. In fact, bypassing CSP is suspicious, and I've specifically set it to false. Though it might help some people. Somehow I highly doubt that deleting toString is a good idea, considering several of Google's detections use that function to check certain fields. 🤷 But hey, might help someone since not everyone is targeting Google.
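
To illustrate why deleting toString could itself be noticeable, here is a minimal sketch that runs in Node rather than in a browser page. It uses a separate `vm` context so the process's own `Function.prototype` is left alone, and `probe` is just an illustrative function name. After the deletion, calling `toString()` on a function falls through to `Object.prototype.toString`, so it no longer returns source text.

```javascript
const vm = require('node:vm');

// Run in a fresh context so this process's own Function.prototype is untouched.
const result = vm.runInNewContext(`
  const before = (function probe() {}).toString();
  delete Function.prototype.toString;
  // The lookup now reaches Object.prototype.toString instead.
  const after = (function probe() {}).toString();
  ({ before, after });
`);

console.log(result.before); // source text of the function
console.log(result.after);  // generic object tag, no source text
```

Any script on the page that inspects function source would see the generic tag, which is one plausible way the change could be observed. Whether a given site actually checks for this is not something the sketch can show.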

So thank you very much vlad, for the excellent and auspiciously timely solution.

Side note: anyone working on something similar should be aware that puppeteer-extra-plugin-stealth is nowhere near sufficient to evade Google's detection. It took an incredible amount of effort to refine my solution and I'm sure it will only be temporary. If you need to work with a serious website like Google Search, you're going to need to bring your best and write a lot of custom solutions.

7 participants