This repository has been archived by the owner on May 9, 2020. It is now read-only.

fails to work with tor/socks proxy #233

Closed
izidan opened this issue Jul 9, 2019 · 20 comments

Comments

@izidan

izidan commented Jul 9, 2019

Would you please provide a sample showing how it should work with a Tor/SOCKS proxy?

I tried both of the following settings using .defaults() and still had no luck:

{ proxy: "socks://127.0.0.1:9050" }

{ agentClass: SocksProxyAgent, agentOptions: { protocol: "socks:", host: "localhost", port: 9050 } }

@ghost

ghost commented Jul 9, 2019

Hi @izidan,

If it works with request or request-promise, then it'll work with Cloudscraper, with very few exceptions.
This isn't one of those exceptions; those options are passed to the underlying library as you'd expect.

Replace your calls to cloudscraper with request-promise to make sure your code works as intended with request first. There is documentation: https://github.com/request/request#requestoptions-callback

const cloudscraper = require('request-promise') // <-- Does it work in this case?

If the fault lies with request, the issue should be opened there.

@izidan
Author

izidan commented Jul 9, 2019

It throws the following error with the default settings below, even though the request module works on its own when called directly with the same settings:

{ agentClass: require('socks-proxy-agent'), agentOptions: { protocol: 'socks:', host: '127.0.0.1', port: '9050' }}

CaptchaError: captcha
at validate (.\node_modules\cloudscraper\index.js:236:11)
at onCloudflareResponse (.\node_modules\cloudscraper\index.js:197:5)
at onRequestResponse (.\node_modules\cloudscraper\index.js:176:5)
at Request.<anonymous> (.\node_modules\cloudscraper\index.js:137:7)
at Object.onceWrapper (events.js:286:20)
at Request.emit (events.js:198:13)
at Request.<anonymous> (.\node_modules\request\request.js:1161:10)
at Request.emit (events.js:198:13)
at IncomingMessage.<anonymous> (.\node_modules\request\request.js:1083:12)
at Object.onceWrapper (events.js:286:20)
at IncomingMessage.emit (events.js:203:15)
at endReadableNT (_stream_readable.js:1129:12)
at process._tickCallback (internal/process/next_tick.js:63:19)

@izidan
Author

izidan commented Jul 9, 2019

Same error as above when the defaults are set up with the following settings:

{ agent: new (require('socks-proxy-agent'))('socks://127.0.0.1:9050') }

@izidan
Author

izidan commented Jul 9, 2019

And when called with the default settings { proxy: 'socks://127.0.0.1:9050' }, it throws the following error:

RequestError: Error: tunneling socket could not be established, statusCode=501
at onRequestResponse (.\node_modules\cloudscraper\index.js:153:21)
at Request.<anonymous> (.\node_modules\cloudscraper\index.js:132:7)
at Object.onceWrapper (events.js:286:20)
at Request.emit (events.js:198:13)
at Request.onRequestError (.\node_modules\request\request.js:881:8)
at ClientRequest.emit (events.js:198:13)
at ClientRequest.onConnect (.\node_modules\tunnel-agent\index.js:168:23)
at Object.onceWrapper (events.js:286:20)
at ClientRequest.emit (events.js:198:13)
at Socket.socketOnData (_http_client.js:475:11)
at Socket.emit (events.js:198:13)
at addChunk (_stream_readable.js:288:12)
at readableAddChunk (_stream_readable.js:269:11)
at Socket.Readable.push (_stream_readable.js:224:10)
at TCP.onStreamRead [as onread] (internal/stream_base_commons.js:94:17) +0ms
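
The 501 most likely arises because request's `proxy` option tunnels through the proxy with an HTTP CONNECT request (note `tunnel-agent` in the trace), and Tor's SOCKS port does not speak HTTP. Below is a minimal sketch of choosing the option shape from the proxy URL scheme. `proxyOptionsFor` is a hypothetical helper, not part of cloudscraper or request, and the agent class is passed in as a parameter so the snippet has no dependencies; in practice it would be something like `require('socks-proxy-agent')`.

```javascript
// Hypothetical helper (not part of cloudscraper or request): pick the
// request option shape that matches the proxy URL scheme.
function proxyOptionsFor(proxyUrl, SocksAgentClass) {
  const { protocol, hostname, port } = new URL(proxyUrl)

  if (protocol === 'http:' || protocol === 'https:') {
    // HTTP(S) proxies can be handed to request's `proxy` option directly.
    return { proxy: proxyUrl }
  }

  if (/^socks(4|4a|5|5h)?:$/.test(protocol)) {
    // SOCKS proxies need a custom agent, as in the agentClass settings above.
    // 1080 is the conventional SOCKS port, used here when none is given.
    return {
      agentClass: SocksAgentClass,
      agentOptions: { protocol, host: hostname, port: port ? Number(port) : 1080 }
    }
  }

  throw new Error(`Unsupported proxy scheme: ${protocol}`)
}
```

The returned object could then be passed to `.defaults()`, so the calling code would not need to care which kind of proxy is configured.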

@ghost

ghost commented Jul 9, 2019

request doesn't throw exceptions when you get a CAPTCHA; Cloudscraper does.
The proxy is working with both libraries, or you wouldn't get the CAPTCHA error.

You may log the response that you're getting:

const cloudscraper = require('cloudscraper')
cloudscraper.get(url).then(console.log, error => console.error(error.response.body.toString('utf-8')))
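
The one-liner above assumes `error.response` exists, which may not be the case for connection-level failures such as the tunneling error reported earlier. A slightly more defensive sketch follows; `describeError` is a hypothetical helper, and it relies only on the `name`, `message` and `response.body` properties already used in this thread.

```javascript
// Hypothetical helper: summarize an error without assuming that a
// response (or a response body) is attached to it.
function describeError(error, maxLength = 200) {
  const lines = [`${error.name}: ${error.message}`]
  const body = error.response && error.response.body

  if (body !== undefined && body !== null) {
    // The body may be a Buffer or a string; normalize before truncating.
    const text = Buffer.isBuffer(body) ? body.toString('utf8') : String(body)
    lines.push(text.length > maxLength ? text.slice(0, maxLength) + '...' : text)
  }

  return lines.join('\n')
}
```

It would be used as `cloudscraper.get(url).then(console.log, error => console.error(describeError(error)))`.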

> and when called with default settings being { proxy: 'socks://127.0.0.1:9050' } it throws the following error

I'm guessing that those settings aren't working with request either.
I'm going to close this issue now, since Cloudscraper works whenever the same settings work with the request library.

@ghost ghost closed this as completed Jul 9, 2019
@izidan
Author

izidan commented Jul 9, 2019

Using const cloudscraper = require('request-promise') with either the agent or the agentClass + agentOptions options works fine: the HTML of the CAPTCHA check is returned, although it fails to get past the security page.

@ghost

ghost commented Jul 9, 2019

Yeah, unfortunately Cloudscraper can't solve those for you, but we do have some convenience methods to help solve them using a third-party service. They're mentioned in the README, and there are examples.

I hope that helps

@izidan
Author

izidan commented Jul 9, 2019

I am really confused by your reply above. Cloudscraper already manages to resolve the captcha and get through when using a normal HTTP proxy, yet it fails with a SOCKS proxy, and I believe this has nothing to do with request or request-promise.

Simply put, this works:

node -e "require('cloudscraper').defaults({ proxy: 'http://54.37.21.29:3128' }).get('https://www.nicholashumphreys.com').then(console.log, console.error)"

whilst the same page over a SOCKS proxy fails:

node -e "require('cloudscraper').defaults({ agentClass: require('socks-proxy-agent'), agentOptions: { protocol: 'socks:', host: '127.0.0.1', port: '9050' } }).get('https://www.nicholashumphreys.com').then(console.log, console.error)"

I shall leave it to you to figure out why and please reopen the issue as it is a bug in the captcha handler itself.

@ghost

ghost commented Jul 9, 2019

> I am really confused with your reply above, Cloudscraper already works to resolve the captcha and pass using normal http proxy yet it fails with socks proxy, and I believe this has nothing to do with request or request-promise

I apologize if my response wasn't clear, but Cloudscraper has never handled or solved a CAPTCHA. If you have a CAPTCHA handler, please share the code so I'll know what we're dealing with here.

> whilst the same page over socks proxy fails

You're connecting to the target server using two different IPs. Most probably, the target has blocked Tor users. I don't find it odd in the least that the web proxy gets through without a CAPTCHA but Tor gets flagged.

> I shall leave it to you to figure out why and please reopen the issue as it is a bug in the captcha handler itself.

Please identify a problem with the library, as I have not found one. I'll reopen this for a while.

@ghost ghost reopened this Jul 9, 2019
@ghost ghost added the needs: more info label Jul 9, 2019
@izidan
Author

izidan commented Jul 9, 2019

@pro-src you are right, it seems to be an issue with Cloudflare blocking the Tor proxy itself, even though it works via the Tor Browser. I shall close the issue and investigate further. Thanks for your time.

@izidan izidan closed this as completed Jul 9, 2019
@ghost

ghost commented Jul 9, 2019

If it works in the Tor Browser, then I'd definitely like to have it working in Cloudscraper. The only Tor-related conversation besides this issue is #202.

I'd be glad to know what your investigation yields.

@ghost

ghost commented Jul 9, 2019

To be clear, we want to imitate the browser to avoid the CAPTCHA being sent in the first place. This might be TLS/SSL-related and could be pretty complicated... If you want, open another issue, "CAPTCHA when using Tor"; the issue template would be great for this.

The Tor exit node is more than likely changing, which could give you different results. It'd help if you could configure your instance of Tor to use only one exit node, the same exit node that Tor Browser is currently using...

Cheers

@brunogaspar

@pro-src It's also failing on my end with regular proxies, with or without a user and password.

Accessing your own website without a proxy works as expected; with a proxy, it fails with the same error as in the OP.

The interesting part, though, is that if I make my whole network, or even just the browser, use this proxy, it works fine.

Do you have any pointers on where to look? I'm a tad new to CF bypassing, but I can manage; I just need a few pointers on where to look so I can also help you fix this :)

Thanks!

@ghost

ghost commented Jul 11, 2019

Hi @brunogaspar,

Give this a couple of tries:

npm i --save cloudscraper proxy-lists
node issue-233.js > results.txt

issue-233.js
const ProxyLists = require('proxy-lists')
const cloudscraper = require('cloudscraper')
const jar = cloudscraper.jar

// cloudscraper.debug = true

const results = {}
const failing = '\u001B[0;31m\u001B[1m\u001B[5mx\u001B[0m'
const passing = '\u001B[0;32m\u001B[1m\u001B[5m✓\u001B[0m'

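// id: handle of the shutdown timer (-1 until the first test starts)
// attempts/max: no new tests are started once attempts exceeds max
// timeout: milliseconds to wait before printing the results and exiting
let id = -1, attempts = 0, max = 30, timeout = 60000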

process.on('uncaughtException', console.error)
process.on('unhandledRejection', console.error)

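// Start one test per proxy as batches arrive; the return values are unused,
// so forEach is used rather than map.
ProxyLists.getProxies({ protocols: ['http'] })
  .on('data', proxies =>
    proxies.forEach(o => test(`http://${o.ipAddress}:${o.port}`)))
  .on('error', error => console.error(error.message))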

async function test(proxy, url = 'https://pro-src.com') {
  if (attempts++ > max) return
  if (id === -1) id = setTimeout(stop, timeout)

  try {
    results[proxy] = 'timed out'
    const html = await cloudscraper.get({ proxy, url, jar: jar() })

    console.error(proxy, '\t\t', results[proxy] = passing)
    console.error('Result:\n', preview(html))
  } catch(error) {
    console.error(proxy, '\t\t', results[proxy] = failing)
    console.error(error.name + ':', error.message)
    if (error.response) {
      console.error('Result:\n', preview(error.response.body))
    }
  }
}

function preview(html) {
  try {
    return String(html).match(/<title>([\S\s]+)<\/title>/i)[0]
  }
  catch(e) {
    return html ? String(html).slice(0, 77) + '...' : html
  }
}

function stop() {
  for (let url in results) {
    // console.log(url, '\t\t', results[url])
    console.log(results[url] === passing ? 'pass:' : 'fail:', url)
  }

  process.exit(0)
}
results.txt
fail: http://182.23.7.226:8080
fail: http://150.95.151.68:8191
fail: http://2.230.19.211:8080
fail: http://77.50.91.199:8080
fail: http://98.172.142.99:8080
pass: http://201.249.61.60:8080
fail: http://151.248.63.154:8080
fail: http://101.50.1.2:80
fail: http://150.95.151.68:8197
fail: http://49.156.37.52:8080
fail: http://41.188.149.42:8080
fail: http://207.182.135.123:8118
fail: http://91.233.228.102:80
pass: http://36.91.156.75:8080
fail: http://46.209.150.83:8080
pass: http://181.196.245.78:8080
pass: http://49.156.35.22:8080
fail: http://150.95.151.68:8185
fail: http://180.247.5.10:8080
fail: http://122.152.138.139:8118
fail: http://208.110.114.65:8080
fail: http://91.217.42.4:8080
fail: http://134.209.180.95:80
fail: http://108.61.186.207:8080
fail: http://188.32.48.236:8081
fail: http://205.158.57.2:53281
pass: http://84.22.197.42:8080
fail: http://157.230.164.220:8118
pass: http://113.53.83.69:8080
pass: http://31.25.139.223:8080
fail: http://209.80.12.180:80
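
In the run above, 7 of the 31 proxies passed. To compare several runs, the `pass:`/`fail:` lines can be tallied with a few lines of code. This is a minimal sketch; `summarize` is a hypothetical helper, and the sample lines are taken from the output above.

```javascript
// Hypothetical helper: count the "pass:" and "fail:" lines that
// issue-233.js prints, ignoring anything else in the text.
function summarize(text) {
  const summary = { pass: 0, fail: 0 }

  for (const line of text.split(/\r?\n/)) {
    const match = /^(pass|fail):\s+\S+/.exec(line.trim())
    if (match) summary[match[1]] += 1
  }

  const total = summary.pass + summary.fail
  return { ...summary, total, passRate: total ? summary.pass / total : 0 }
}

const sample = [
  'fail: http://182.23.7.226:8080',
  'pass: http://201.249.61.60:8080',
  'fail: http://101.50.1.2:80',
  'pass: http://36.91.156.75:8080'
].join('\n')

console.log(summarize(sample))
```

For a real run, the input would come from something like `fs.readFileSync('results.txt', 'utf8')`.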

@ghost

ghost commented Jul 11, 2019

@brunogaspar

Based on those results, the problem is most likely the proxies. I had gotten much better results but I felt that output was the most realistic. Random proxies aren't reliable and they're bound to trigger a CAPTCHA at some point.

If despite a couple of runs, all of them fail, you're most likely dealing with the TLS/SSL issue that I mentioned above. However, I seriously doubt that to be the case due to the fact that it's working when you don't use a proxy.

If it's the worst case scenario, have a look at #229

Cheers

@ghost ghost added invalid and removed needs: more info labels Jul 11, 2019
@brunogaspar

I'll see what I can come up with, but it is a tad weird that it works fine in the browser.

One of the Python implementations sort of works. Of course, the more requests you make, the higher the likelihood of getting a CAPTCHA, I suppose.

But thanks for the pointers!

@izidan
Author

izidan commented Jul 11, 2019

@pro-src it works with anonymous proxies, for example https://free-proxy-list.net/anonymous-proxy.html. The reason is that once Cloudflare marks your IP as suspicious, it gets blocked and can't even get to the CAPTCHA page.

@brunogaspar the reason it works with browsers is that Cloudflare uses cookies once the CAPTCHA page is passed, hence browsers work fine even behind Tor. I tried it with Firefox and Opera.

The problem remains of solving the CAPTCHA first to get the cookies, so that they can be reused later, regardless of whether the Node.js requests are made directly, via an HTTP proxy, or via Tor.

Now, when I came across Cloudscraper I was under the impression that it takes care of solving the CAPTCHA, but when I tried it with Tor it failed to work, so I'm not sure if the statement on the first page holds true here:

> This small library encapsulates logic which extracts challenge, solves it, submits and returns the request page body.

Here is a simple one-liner to test it with:
node -e "require('cloudscraper').defaults({ agentClass: require('socks-proxy-agent'), agentOptions: { protocol: 'socks:', host: '127.0.0.1', port: '9050' } }).get('https://www.nicholashumphreys.com').then(console.log, console.error)"

It would be really great to get Cloudscraper to work with SOCKS proxies by just using proxy: 'socks://127.0.0.1:9050', and also if you could swap in node-fetch and make-fetch-happen.

The strategy in place now is to cycle the requests through a list of anonymous proxies to work around hitting the CAPTCHA page in the first place.
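
A rough sketch of that cycling strategy, under stated assumptions: `getWithRotation` is a hypothetical helper, and `fetchPage(proxy, url)` stands in for a call such as `cloudscraper.get({ proxy, url })` that returns a promise. The error names checked are the ones that appear in the traces earlier in this thread.

```javascript
// Hypothetical sketch: try each proxy in turn until one returns a page.
// `fetchPage(proxy, url)` is supplied by the caller and must return a promise.
async function getWithRotation(proxies, url, fetchPage, maxAttempts = proxies.length) {
  let lastError = new Error('No proxies supplied')

  for (let i = 0; i < maxAttempts && proxies.length > 0; i++) {
    const proxy = proxies[i % proxies.length]
    try {
      return { proxy, body: await fetchPage(proxy, url) }
    } catch (error) {
      // Only a CAPTCHA or a connection failure justifies moving on to the
      // next proxy; any other error is rethrown to the caller.
      if (error.name !== 'CaptchaError' && error.name !== 'RequestError') throw error
      lastError = error
    }
  }

  // Every attempt failed: surface the most recent error.
  throw lastError
}
```

Returning the proxy alongside the body lets the caller keep using a proxy that got through, rather than starting again from the top of the list.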

@izidan
Author

izidan commented Jul 11, 2019

On a side note, Cloudflare will never block an IP belonging to a mobile network operator no matter how many requests are made, as all of the operator's clients are likely to share a few internet gateways, so try testing via a mobile hotspot :)

@ghost
Copy link

ghost commented Jul 11, 2019

> now when i came across cloudscraper i was under the impression that it takes care of solving the captcha, but when tried it with tor it failed to work, so not sure if the statement on the first page stands true here

There seems to be some general unawareness of Cloudflare challenges, CAPTCHA, the browser, and this project.

All I can say is learn to read.

@ghost

ghost commented Jul 12, 2019

Related to #234

@ghost ghost removed the invalid label Aug 1, 2019