[Bug] BOT detected when headless mode #614

adiwirak · 2022-01-25T11:50:25Z

Describe the bug
When I enable headless mode, the website I am scraping detects me as a bot.
I know this is because of the WAF: the request gets redirected to /_Incapsula_Resource?...

Any ideas on how to bypass this problem?

If I force headless: false instead, I run into another problem: my OS is Linux without a GUI.

Versions

"dependencies": { "cheerio": "*", "express": "^4.17.1", "moment": "^2.29.1", "mongodb": "^4.2.0", "mysql": "^2.18.1", "puppeteer-extra": "^3.2.3", "puppeteer-extra-plugin-adblocker": "^2.12.0", "puppeteer-extra-plugin-stealth": "^2.9.0", "request-promise": "^4.2.6", "shelljs": "^0.8.4", "socket.io": "^4.4.0", "socket.io-client": "^4.4.0", "sprintf-js": "^1.1.2", "telegraf": "^4.5.2", "util": "^0.12.4" }

sk91 · 2022-01-27T13:35:23Z

In my case, I needed the browser to run in headful mode in a Docker container with VNC.
Maybe my solution will help you find a workaround.
I solved it by using fluxbox (http://fluxbox.org/).

# worker-base image
FROM node:14.18-slim

## install base deps
RUN apt-get update \
  && apt-get install -yq --no-install-recommends \
  gnupg  \
  curl \
  gconf-service \
  libasound2 \
  libatk1.0-0 \
  libc6 \
  libcairo2 \
  libcups2 \
  libdbus-1-3 \
  libexpat1 \
  libfontconfig1 \
  libgcc1 \
  libgconf-2-4 \
  libgdk-pixbuf2.0-0 \
  libglib2.0-0 \
  libgtk-3-0 \
  libnspr4 \
  libpango-1.0-0 \
  libpangocairo-1.0-0 \
  libstdc++6 \
  libx11-6 \
  libx11-xcb1 \
  libxcb1 \
  libxcomposite1 \
  libxcursor1 \
  libxdamage1 \
  libxext6 \
  libxfixes3 \
  libxi6 \
  libxrandr2 \
  libxrender1 \
  libxss1 \
  libxtst6 \
  ca-certificates \
  fonts-liberation \
  libappindicator1 \
  libnss3 \
  lsb-release \
  xdg-utils \
  wget \
  x11vnc \
  x11-xkb-utils \
  xfonts-100dpi \
  xfonts-75dpi \
  xfonts-scalable \
  xfonts-cyrillic \
  x11-apps xvfb \
  fonts-ipafont-gothic \
  fonts-wqy-zenhei \
  fonts-thai-tlwg \
  fonts-kacst \
  ttf-freefont \
  fluxbox \
  procps \
  x11-utils \
  eterm \
  xterm \
  netcat

# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install -y --no-install-recommends \
  google-chrome-stable \
  && rm -rf /var/lib/apt/lists/* \
  && rm -rf /src/*.deb

ENV DISPLAY=:99
ENV X11VNC_PASSWORD=password
ENV XVFB_SCREEN_SIZE=1024x768x24
WORKDIR /usr/src/app
COPY ./scripts/worker-entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

entrypoint.sh

#! /bin/sh
export DISPLAY=${DISPLAY:-:0} # Select screen 0 by default.
export XVFB_SCREEN_SIZE=${XVFB_SCREEN_SIZE:-1024x768x24}
export X11VNC_PASSWORD=${X11VNC_PASSWORD:-password}
xdpyinfo
rm -rf /tmp/.X99-lock
rm -rf .X11-unix
sleep 1
! pgrep -a Xvfb && Xvfb $DISPLAY -screen 0 ${XVFB_SCREEN_SIZE} -ac &
sleep 1
# Note: "&>" is a bash extension; under /bin/sh it backgrounds the command instead,
# so the portable ">/dev/null 2>&1" form is used here.
if command -v x11vnc >/dev/null 2>&1; then
  # ! pgrep -a x11vnc && x11vnc -bg -forever -passwd ${X11VNC_PASSWORD} -ncache 10 -ncache_cr -quiet -display WAIT$DISPLAY &
  ! pgrep -a x11vnc && x11vnc -bg --shared -forever -passwd ${X11VNC_PASSWORD} -quiet -display WAIT$DISPLAY &
fi
if command -v fluxbox >/dev/null 2>&1; then
  ! pgrep -a fluxbox && fluxbox 2>/dev/null &
fi
echo "IP: $(hostname -I) ($(hostname))"

exec "$@"
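
On the Node side, a container like this only needs the browser pointed at the virtual display. Below is a minimal sketch of how the launch options could be derived from the same environment variables; the helper names (windowSizeFromScreen, buildLaunchOptions, openPage) are hypothetical, and the --no-sandbox flag is an assumption for a container that runs Chrome as root, not something stated in this thread.

```javascript
// Sketch only: helper names are illustrative, not part of puppeteer-extra.

// Turn an Xvfb screen spec such as "1024x768x24" into Chrome's "1024,768".
function windowSizeFromScreen(screenSize) {
  const [width, height] = screenSize.split('x');
  return `${width},${height}`;
}

// Build launch options from the variables set in the Dockerfile above.
function buildLaunchOptions(env = process.env) {
  const display = env.DISPLAY || ':99';
  const screenSize = env.XVFB_SCREEN_SIZE || '1024x768x24';
  return {
    headless: false, // headful, rendered on the Xvfb display
    executablePath: '/usr/bin/google-chrome-stable', // installed by the google-chrome-stable package
    args: [
      `--display=${display}`,
      `--window-size=${windowSizeFromScreen(screenSize)}`,
      '--no-sandbox', // assumption: Chrome runs as root inside the container
    ],
  };
}

// Open a page with the stealth plugin enabled. Modules are required lazily
// so the helpers above can be used without a browser installed.
async function openPage(url, env = process.env) {
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  puppeteer.use(StealthPlugin());
  const browser = await puppeteer.launch(buildLaunchOptions(env));
  const page = await browser.newPage();
  await page.goto(url);
  return { browser, page };
}
```

Calling openPage('https://example.com') from the worker process would then show the browser window in the VNC session.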

StackedQueries · 2022-01-27T14:31:14Z

Using just Xvfb solved most of these problems for me. There is an older xvfb package for Node that can manage the displays. Essentially, start Xvfb via the package (or a homemade script) and point the browser at that display via the launch args in puppeteer. IIRC it would be something like --display=:${displayId}. It's worth mentioning that working with sessions, multiple screens, etc. will require some manipulation of the lock files, as in @sk91's entrypoint.sh:

rm -rf /tmp/.X99-lock
rm -rf .X11-unix
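
If the lock files are handled from Node instead of a shell script, a slightly more careful variant is to delete a lock only when the process that owns it is gone. The sketch below relies on the X server convention of writing its PID into /tmp/.X<N>-lock for display :<N>; the function names are hypothetical and the tmpDir parameter exists only to make the helper easy to try out.

```javascript
const fs = require('fs');
const path = require('path');

// Path of the lock file for display :<displayNum>.
function lockFilePath(displayNum, tmpDir = '/tmp') {
  return path.join(tmpDir, `.X${displayNum}-lock`);
}

// Signal 0 does not kill anything; it only checks whether the PID exists.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return err.code === 'EPERM';
  }
}

// Remove the lock file only if its owner is no longer running.
// Returns true when a stale lock was removed, false otherwise.
function removeStaleLock(displayNum, tmpDir = '/tmp') {
  const file = lockFilePath(displayNum, tmpDir);
  if (!fs.existsSync(file)) return false;
  const pid = parseInt(fs.readFileSync(file, 'utf8').trim(), 10);
  if (Number.isInteger(pid) && pid > 0 && isProcessAlive(pid)) return false;
  fs.unlinkSync(file);
  return true;
}
```

Calling removeStaleLock(99) before starting Xvfb on display :99 leaves a running server untouched, unlike an unconditional rm -rf.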

adiwirak · 2022-01-28T03:47:49Z

I've tried using xvfb, but I always get an error when calling xvfb.startSync():

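    // Excerpt from a class method, hence the references to `this`.
    const Xvfb = require('xvfb');
    const xvfb = new Xvfb({
        silent: true,
        xvfb_args: ['-screen', '0', '1280x720x24', '-ac'],
    });
    xvfb.startSync();
    // Use the public display() accessor instead of the private _display field.
    this.config.args.push('--display=' + xvfb.display());
    this.VirtLayar = xvfb; // keep a reference so the display can be stopped later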

I don't understand: is this because my PC doesn't support it, or did I get the installation wrong?
Can you explain step by step how to install and use it?

StackedQueries · 2022-01-28T14:23:26Z

Can you provide the error you are getting? silent: false should give you some information regarding xvfb errors as well. This is essentially the logic I would follow.

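const Xvfb = require('xvfb')
const puppeteer = require('puppeteer-extra')

const display = new Xvfb({
  displayNum: 1,
  reuse: false,
  silent: true, // set to false to see Xvfb's own error output
  xvfb_args: ['-screen', '0', '1280x720x24', '-ac', '-noreset']
})

// Start the virtual display before launching the browser.
display.startSync()

;(async () => {
  const browser = await puppeteer.launch({
    headless: false, // run headful on the virtual display
    args: [`--display=${display.display()}`] // display() returns e.g. ":1"
  })
  // ... use the browser, then clean up ...
  await browser.close()
  display.stopSync()
})()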

adiwirak · 2022-01-30T08:53:30Z

Thank you very much. My problem is finally solved with the Xvfb module.

But I'm still curious about the constructor options of the module.

Take the following example:

    const xvfb = new Xvfb({
        silent: true, reuse: true,
        xvfb_args: ["-screen", "0", '1280x720x24', "-ac"],
    });
    xvfb.startSync()

If I don't set reuse: true, I get an error. Can you explain why this happens?

This code (module) is executed several times with different parameters, so my guess is that the instances clash over the screen. Is my guess correct?

So, in your opinion, should each module create its own virtual monitor, or should they all use one virtual monitor?

I don't really understand how a virtual monitor is created, how many can be created, and so on.
Please explain. I would really appreciate it.

StackedQueries · 2022-01-31T16:24:41Z

Sure :) Whether to use the reuse option really depends on the use case. From the docs: "reuse - whether to reuse an existing Xvfb instance if it already exists on the X display referenced by displayNum". As I understand it, with reuse: true the module uses an Xvfb that is already running on that display instead of failing, which would explain the error you get without it when the code runs more than once. I just continuously use the same display.

It's important to understand the difference between displays and screens as well. Displays are what you reference when starting puppeteer (e.g. :0), and screens are contained in displays. Screens shouldn't really matter in your use case, though. Multiple puppeteer instances can use the same display. I would also recommend checking out the man page for Xvfb.
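
To make the display/screen distinction concrete, here is a small sketch. X display names have the form [host]:display[.screen], so ":99" and ":99.0" both refer to screen 0 of display 99. The parseDisplay and launchOnSharedDisplay helpers are hypothetical names, and display number 99 is only an example value.

```javascript
// Split an X display name such as ":99", ":99.0" or "localhost:10.0".
function parseDisplay(name) {
  const match = /^([^:]*):(\d+)(?:\.(\d+))?$/.exec(name);
  if (!match) throw new Error(`Not a valid X display name: ${name}`);
  return {
    host: match[1], // empty string means the local machine
    display: Number(match[2]),
    screen: match[3] === undefined ? 0 : Number(match[3]), // screen defaults to 0
  };
}

// Launch several browsers on one shared virtual display.
// Modules are required lazily so parseDisplay works without them installed.
async function launchOnSharedDisplay(count, displayNum = 99) {
  const Xvfb = require('xvfb');
  const puppeteer = require('puppeteer-extra');
  const xvfb = new Xvfb({
    displayNum,
    reuse: true, // attach to an Xvfb already running on this display
    silent: true,
    xvfb_args: ['-screen', '0', '1280x720x24', '-ac'],
  });
  xvfb.startSync();
  const browsers = [];
  for (let i = 0; i < count; i += 1) {
    browsers.push(await puppeteer.launch({
      headless: false,
      args: [`--display=${xvfb.display()}`],
    }));
  }
  return { xvfb, browsers };
}
```

With this layout every module can call launchOnSharedDisplay with the same display number, so there is one virtual monitor and no conflict over who starts it.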

soshimee · 2022-02-11T14:06:40Z

I have the exact same issue... except I'm on Windows. The protection service on the website I'm trying to scrape is "StackPath." It passes after a few seconds without headless mode, but it gets blocked instantly with headless mode.

Postur · 2022-02-21T19:47:53Z

For me, Google detects that I'm headless.
It refuses to log me in because the 'browser may not be secure' or whatever.

Does anyone have a fix for this?

I don't want to add a display to my environment, I need headless.

adiwirak added issue: bug report A bug has been reported needs triage labels Jan 25, 2022

vogler mentioned this issue Feb 2, 2022

headless mode fails at hcaptcha challenge vogler/free-games-claimer#2

Closed
