fails to work with tor/socks proxy #233

izidan · 2019-07-09T19:06:29Z

Would you please provide a sample showing how it should work with a tor/socks proxy?

I tried both of the following settings using .defaults() and still had no luck:

{ proxy: "socks://127.0.0.1:9050" }

{ agentClass: SocksProxyAgent, agentOptions: { protocol: "socks:", host: "localhost", port: 9050 } }


ghost · 2019-07-09T21:55:47Z

If it works with request or request-promise then it'll work with Cloudscraper with very few exceptions.
This isn't one of those exceptions, those options will be passed to the underlying library as you'd expect.

Replace your calls to cloudscraper with request-promise to make sure your code works as intended with request first. There is documentation: https://github.com/request/request#requestoptions-callback

const cloudscraper = require('request-promise') // <-- Does it work in this case?

If the fault lies with request, the issue should be opened there.

izidan · 2019-07-09T22:05:17Z

It throws the following error with the default settings below, even though the request module works by itself when called directly with the same settings:

{ agentClass: require('socks-proxy-agent'), agentOptions: { protocol: 'socks:', host: '127.0.0.1', port: '9050' }}

CaptchaError: captcha
at validate (.\node_modules\cloudscraper\index.js:236:11)
at onCloudflareResponse (.\node_modules\cloudscraper\index.js:197:5)
at onRequestResponse (.\node_modules\cloudscraper\index.js:176:5)
at Request.<anonymous> (.\node_modules\cloudscraper\index.js:137:7)
at Object.onceWrapper (events.js:286:20)
at Request.emit (events.js:198:13)
at Request.<anonymous> (.\node_modules\request\request.js:1161:10)
at Request.emit (events.js:198:13)
at IncomingMessage.<anonymous> (.\node_modules\request\request.js:1083:12)
at Object.onceWrapper (events.js:286:20)
at IncomingMessage.emit (events.js:203:15)
at endReadableNT (_stream_readable.js:1129:12)
at process._tickCallback (internal/process/next_tick.js:63:19)

izidan · 2019-07-09T22:07:45Z

Same error as above when the defaults are set up with the following settings:

{ agent: (new require('socks-proxy-agent'))('socks://127.0.0.1:9050') }

izidan · 2019-07-09T22:12:31Z

and when called with default settings being { proxy: 'socks://127.0.0.1:9050' } it throws the following error

RequestError: Error: tunneling socket could not be established, statusCode=501
at onRequestResponse (.\node_modules\cloudscraper\index.js:153:21)
at Request.<anonymous> (.\node_modules\cloudscraper\index.js:132:7)
at Object.onceWrapper (events.js:286:20)
at Request.emit (events.js:198:13)
at Request.onRequestError (.\node_modules\request\request.js:881:8)
at ClientRequest.emit (events.js:198:13)
at ClientRequest.onConnect (.\node_modules\tunnel-agent\index.js:168:23)
at Object.onceWrapper (events.js:286:20)
at ClientRequest.emit (events.js:198:13)
at Socket.socketOnData (_http_client.js:475:11)
at Socket.emit (events.js:198:13)
at addChunk (_stream_readable.js:288:12)
at readableAddChunk (_stream_readable.js:269:11)
at Socket.Readable.push (_stream_readable.js:224:10)
at TCP.onStreamRead [as onread] (internal/stream_base_commons.js:94:17) +0ms

ghost · 2019-07-09T22:16:24Z

request doesn't throw exceptions when you get a CAPTCHA; Cloudscraper does.
It's working with both libraries, or you wouldn't get the CAPTCHA error.

You may log the response that you're getting:

const cloudscraper = require('cloudscraper')
cloudscraper.get(url).then(console.log, error => console.error(error.response ? error.response.body.toString('utf-8') : error))

> and when called with default settings being { proxy: 'socks://127.0.0.1:9050' } it throws the following error

I'm guessing that those settings aren't working with request either.
I'm going to close this issue now since it is working when it works with the request library.
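
A note on the 501 above, offered as a sketch rather than a definitive diagnosis: the stack trace passes through tunnel-agent, which is what request uses to tunnel through HTTP proxies, so a SOCKS endpoint such as Tor's port 9050 most likely has to be supplied through an agent rather than through `proxy`. The helper below is hypothetical (`proxyOptions` is not part of Cloudscraper or request) and takes the agent class as a parameter so that it runs without socks-proxy-agent installed.

```javascript
// Hypothetical helper: map a proxy URL onto request-style options.
// AgentClass is injected by the caller (e.g. require('socks-proxy-agent')).
function proxyOptions (proxyUrl, AgentClass) {
  const { protocol, hostname, port } = new URL(proxyUrl)

  // HTTP(S) proxies can go through request's `proxy` option directly.
  if (protocol === 'http:' || protocol === 'https:') {
    return { proxy: proxyUrl }
  }

  // SOCKS endpoints are handed to an agent instead, using the same
  // agentClass/agentOptions shape that appears earlier in this thread.
  if (protocol.startsWith('socks')) {
    return {
      agentClass: AgentClass,
      agentOptions: { protocol, host: hostname, port }
    }
  }

  throw new Error('Unsupported proxy protocol: ' + protocol)
}

// Intended usage (not executed here, requires both packages):
// const options = proxyOptions('socks://127.0.0.1:9050', require('socks-proxy-agent'))
// const cloudscraper = require('cloudscraper').defaults(options)
```

Whether Cloudflare then serves a CAPTCHA to the Tor exit node is a separate question from whether the connection itself is established.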

izidan · 2019-07-09T22:20:35Z

Using const cloudscraper = require('request-promise') with either the agent option or agentClass + agentOptions works fine: the HTML is returned with the captcha check, although it fails to get past the security page.

ghost · 2019-07-09T22:23:25Z

Yeah, unfortunately Cloudscraper can't solve those for you, but we do have some convenience methods to help solve them using a third-party service. They're mentioned in the README, and there are examples.

I hope that helps

izidan · 2019-07-09T22:32:26Z

I am really confused with your reply above, Cloudscraper already works to resolve the captcha and pass using normal http proxy yet it fails with socks proxy, and I believe this has nothing to do with request or request-promise

simply this works

node -e "require('cloudscraper').defaults({ proxy: 'http://54.37.21.29:3128' }).get('https://www.nicholashumphreys.com').then(console.log, console.error)"

whilst the same page over socks proxy fails

node -e "require('cloudscraper').defaults({ agentClass: require('socks-proxy-agent'), agentOptions: { protocol: 'socks:', host: '127.0.0.1', port: '9050' } }).get('https://www.nicholashumphreys.com').then(console.log, console.error)"

I shall leave it to you to figure out why and please reopen the issue as it is a bug in the captcha handler itself.

ghost · 2019-07-09T22:39:42Z

> I am really confused with your reply above, Cloudscraper already works to resolve the captcha and pass using normal http proxy yet it fails with socks proxy, and I believe this has nothing to do with request or request-promise

I apologize if my response wasn't clear but Cloudscraper has never handled CAPTCHA, has never solved CAPTCHA ever. If you have a CAPTCHA handler, please share the code so I'll know what we're dealing with here.

> whilst the same page over socks proxy fails

You're connecting to the target server using two different IPs. It's most probable that the target has blocked Tor users. I don't find it odd in the least that the web proxy gets through without a CAPTCHA but Tor gets flagged.

> I shall leave it to you to figure out why and please reopen the issue as it is a bug in the captcha handler itself.

Please identify a problem with the library as I have not. I'll reopen this for a while.
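
The two-IP explanation can be checked by asking an IP echo service which address it sees through each route. A minimal sketch, assuming `get` is any function shaped like request-promise's `get`, and assuming `https://api.ipify.org` as the echo service (any equivalent would do); `exitIps` is a hypothetical name, not something Cloudscraper provides.

```javascript
// Assumption: a plain-text IP echo endpoint; swap in any equivalent service.
const IP_ECHO_URL = 'https://api.ipify.org'

// `get(options)` must return a Promise of the response body.
// `routes` maps a label to the proxy-related options for that route.
async function exitIps (get, routes) {
  const result = {}

  for (const [label, options] of Object.entries(routes)) {
    try {
      const body = await get({ ...options, url: IP_ECHO_URL })
      result[label] = String(body).trim()
    } catch (error) {
      // Record the failure instead of aborting the comparison.
      result[label] = 'error: ' + error.message
    }
  }

  return result
}

// Intended usage (not executed here, requires the packages and a running Tor):
// exitIps(require('request-promise').get, {
//   web: { proxy: 'http://54.37.21.29:3128' },
//   tor: { agentClass: require('socks-proxy-agent'),
//          agentOptions: { protocol: 'socks:', host: '127.0.0.1', port: '9050' } }
// }).then(console.log)
```

If the two labels report different addresses, the target is seeing two different clients, which fits the explanation that only the Tor address gets flagged.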

izidan · 2019-07-09T22:51:33Z

@pro-src you are right. It seems to be an issue with Cloudflare blocking the Tor proxy, even though it works via the Tor Browser. I shall close the issue and investigate further. Thanks for your time.

ghost · 2019-07-09T22:54:03Z

If it works in the Tor Browser then I'd definitely like to have it working in Cloudscraper. The only Tor-related conversation besides this issue is #202.

I'd be glad to know what your investigation yields.

ghost · 2019-07-09T22:58:40Z

To be clear, we want to imitate the browser to avoid the CAPTCHA being sent in the first place. This might be TLS/SSL related and could be pretty complicated... If you want, open another issue: "CAPTCHA when using Tor", the issue template would be great for this.

The Tor exit node is more than likely changing, which could give you different results. It'd help if you could configure your instance of Tor to only use one exit node, the same exit node that Tor Browser is currently using...

Cheers

brunogaspar · 2019-07-10T23:36:35Z

@pro-src It's also failing on my end with regular proxies with or without user & password.

Accessing your own website without a proxy works as expected; with a proxy, it fails with the same error as in the OP.

The interesting part, though, is that if I make my whole network use this proxy, or even just the browser, it works fine.

Do you have any pointers on where to look? I'm a tad new to CF bypassing, but I can manage. I just need a few pointers on where to look so I can also help you fix this :)

Thanks!

ghost · 2019-07-11T06:03:19Z

Hi @brunogaspar,

Give this a couple of tries

npm i --save cloudscraper proxy-lists
node issue-233.js > results.txt

issue-233.js

const ProxyLists = require('proxy-lists')
const cloudscraper = require('cloudscraper')
const jar = cloudscraper.jar

// cloudscraper.debug = true

const results = {}
const failing = '\u001B[0;31m\u001B[1m\u001B[5mx\u001B[0m'
const passing = '\u001B[0;32m\u001B[1m\u001B[5m✓\u001B[0m'

let id = -1, attempts = 0, max = 30, timeout = 60000

process.on('uncaughtException', console.error)
process.on('unhandledRejection', console.error)

ProxyLists.getProxies({ protocols: ['http'] })
  .on('data', proxies =>
    // Fire off one test per proxy; outcomes are collected in `results`
    proxies.forEach(o => test(`http://${o.ipAddress}:${o.port}`)))
  .on('error', error => console.error(error.message))

async function test(proxy, url = 'https://pro-src.com') {
  if (attempts++ > max) return
  if (id === -1) id = setTimeout(stop, timeout)

  try {
    results[proxy] = 'timed out'
    const html = await cloudscraper.get({ proxy, url, jar: jar() })

    console.error(proxy, '\t\t', results[proxy] = passing)
    console.error('Result:\n', preview(html))
  } catch(error) {
    console.error(proxy, '\t\t', results[proxy] = failing)
    console.error(error.name + ':', error.message)
    if (error.response) {
      console.error('Result:\n', preview(error.response.body))
    }
  }
}

function preview(html) {
  try {
    return String(html).match(/<title>([\S\s]+)<\/title>/i)[0]
  }
  catch(e) {
    return html ? String(html).slice(0, 77) + '...' : html
  }
}

function stop() {
  for (let url in results) {
    // console.log(url, '\t\t', results[url])
    console.log(results[url] === passing ? 'pass:' : 'fail:', url)
  }

  process.exit(0)
}

results.txt

fail: http://182.23.7.226:8080
fail: http://150.95.151.68:8191
fail: http://2.230.19.211:8080
fail: http://77.50.91.199:8080
fail: http://98.172.142.99:8080
pass: http://201.249.61.60:8080
fail: http://151.248.63.154:8080
fail: http://101.50.1.2:80
fail: http://150.95.151.68:8197
fail: http://49.156.37.52:8080
fail: http://41.188.149.42:8080
fail: http://207.182.135.123:8118
fail: http://91.233.228.102:80
pass: http://36.91.156.75:8080
fail: http://46.209.150.83:8080
pass: http://181.196.245.78:8080
pass: http://49.156.35.22:8080
fail: http://150.95.151.68:8185
fail: http://180.247.5.10:8080
fail: http://122.152.138.139:8118
fail: http://208.110.114.65:8080
fail: http://91.217.42.4:8080
fail: http://134.209.180.95:80
fail: http://108.61.186.207:8080
fail: http://188.32.48.236:8081
fail: http://205.158.57.2:53281
pass: http://84.22.197.42:8080
fail: http://157.230.164.220:8118
pass: http://113.53.83.69:8080
pass: http://31.25.139.223:8080
fail: http://209.80.12.180:80

ghost · 2019-07-11T06:23:39Z

@brunogaspar

Based on those results, the problem is most likely the proxies. I had gotten much better results but I felt that output was the most realistic. Random proxies aren't reliable and they're bound to trigger a CAPTCHA at some point.

If despite a couple of runs, all of them fail, you're most likely dealing with the TLS/SSL issue that I mentioned above. However, I seriously doubt that to be the case due to the fact that it's working when you don't use a proxy.

If it's the worst case scenario, have a look at #229

Cheers

brunogaspar · 2019-07-11T09:30:22Z

I'll see what I can come up with, but it is a tad weird that it works fine browser-wise.

One of the Python implementations sort of works. Of course, the more requests you make, the higher the likelihood of getting a captcha, I suppose.

But thanks for the pointers!

izidan · 2019-07-11T09:56:04Z

@pro-src it works with anonymous proxies (for example, https://free-proxy-list.net/anonymous-proxy.html). The reason is that once Cloudflare marks your IP as suspicious, it gets blocked and can't even get to the captcha page.

@brunogaspar the reason it works with browsers is that Cloudflare sets cookies once the captcha page is passed, hence browsers work fine even behind Tor. I tried it with Firefox and Opera.

The problem still remains of solving the captcha first to get the cookies so they can be reused later, regardless of whether the Node.js requests are made directly, via an HTTP proxy, or via Tor.

now when i came across cloudscraper i was under the impression that it takes care of solving the captcha, but when tried it with tor it failed to work, so not sure if the statement on the first page stands true here

> This small library encapsulates logic which extracts challenge, solves it, submits and returns the request page body.

Here is a simple one-liner to test it with:
node -e "require('cloudscraper').defaults({ agentClass: require('socks-proxy-agent'), agentOptions: { protocol: 'socks:', host: '127.0.0.1', port: '9050' } }).get('https://www.nicholashumphreys.com').then(console.log, console.error)"

It would be really great to get Cloudscraper to work with socks proxies by just using proxy: 'socks://127.0.0.1:9050', and also if you could swap in node-fetch and make-fetch-happen.

The strategy in place now is to cycle the requests through a list of anonymous proxies to work around hitting the captcha page in the first place.
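
The cycling strategy could be sketched roughly as follows. `getViaFirstWorkingProxy` is a hypothetical helper, not a Cloudscraper API; `get` stands in for `cloudscraper.get` and is passed in so the sketch can be exercised without network access.

```javascript
// Hypothetical helper: try each proxy in order until one request succeeds.
// `get(options)` must return a Promise of the response body and reject on
// failure (a CaptchaError or RequestError, in Cloudscraper's case).
async function getViaFirstWorkingProxy (get, url, proxies) {
  const errors = []

  for (const proxy of proxies) {
    try {
      const body = await get({ url, proxy })
      return { proxy, body }
    } catch (error) {
      // Remember why this proxy failed, then move on to the next one.
      errors.push(`${proxy}: ${error.name}`)
    }
  }

  throw new Error('All proxies failed:\n' + errors.join('\n'))
}

// Intended usage (not executed here, requires cloudscraper):
// const cloudscraper = require('cloudscraper')
// getViaFirstWorkingProxy(options => cloudscraper.get(options), url, proxyList)
//   .then(({ proxy, body }) => console.log('passed via', proxy), console.error)
```

Trying proxies one at a time keeps the request rate low; the test script earlier in the thread instead fires all requests at once and gives each its own cookie jar.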

izidan · 2019-07-11T10:02:10Z

On a side note, Cloudflare will never block an IP belonging to a mobile network operator no matter how many requests are made, as all of the operator's clients are likely to share a few internet gateways, so try testing it via a mobile hotspot :)

ghost · 2019-07-11T19:02:55Z

> now when i came across cloudscraper i was under the impression that it takes care of solving the captcha, but when tried it with tor it failed to work, so not sure if the statement on the first page stands true here

There seems to be some general unawareness of Cloudflare challenges, CAPTCHA, the browser, and this project.

All I can say is learn to read.

ghost · 2019-07-12T02:36:00Z

Related to #234

ghost closed this as completed Jul 9, 2019

ghost reopened this Jul 9, 2019

ghost added the needs: more info label Jul 9, 2019

izidan closed this as completed Jul 9, 2019

ghost added invalid and removed needs: more info labels Jul 11, 2019

This was referenced Jul 11, 2019

Error: tunneling socket could not be established #236

Closed

Always Captcha (with proxies) #234

Closed

Proxies always throw errors #237

Closed

ghost removed the invalid label Aug 1, 2019
