Post request fails after initial cloudflare bypass. #259
Hi @luizkc, I'm sure we can figure this out. Would you mind sharing the debug output? Redirect the stdout and stderr to a file or xclip and pastebin it: node index.js > out.txt 2>&1
node index.js |& xclip -i -sel clipboard
Which Python module? The Python module of the same name is plagued by plagiarism and license issues, has known vulnerabilities, spits in the face of FOSS, and is not to be trusted. It's a rip-off of the original cfscrape. Use cfscrape instead; it's maintained. The challenge-solving code in all of these libraries was written by me (including the plagiarized one), and they all generally work the same way if you're using equivalent options. The only exception is the redirect behavior. The Python modules handle redirects in a non-standard way by always reusing the original request method instead of switching over to the A very similar and recently solved issue: #255
When the If you're trying to post JSON, try the
const gen = await cloudscraper.post({
url: "https://nakedcph.com/auth/submit",
resolveWithFullResponse: true,
followOriginalHttpMethod: true,
simple: false,
headers: headers,
json: {
_AntiCsrfToken: Csrf,
firstName: "CLOUDSCRAPER TEST",
email: "[email protected]",
password: "MyPassword123",
"g-recaptcha-response": String(captchaRes),
action: "register"
}
}) Cheers |
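To make the redirect point concrete, here is a minimal sketch of how a client might choose the method for the request that follows a redirect. It models common browser-style conventions only; it is not the implementation used by cloudscraper, the request library, or the Python modules. The function name `methodAfterRedirect` is hypothetical, and its third parameter merely mirrors the `followOriginalHttpMethod` option used in the snippet above.

```javascript
// Hypothetical helper: decide which HTTP method follows a redirect.
// This is a model of common client behavior, not any library's real code.
function methodAfterRedirect(status, method, followOriginalHttpMethod = false) {
  // Non-standard mode: always reuse the original method.
  if (followOriginalHttpMethod) return method;
  // 307 and 308 are defined to preserve the method.
  if (status === 307 || status === 308) return method;
  // 303 switches to GET (HEAD stays HEAD).
  if (status === 303) return method === 'HEAD' ? 'HEAD' : 'GET';
  // By long-standing convention, POST becomes GET on 301/302.
  if ((status === 301 || status === 302) && method === 'POST') return 'GET';
  return method;
}

console.log(methodAfterRedirect(302, 'POST'));       // GET
console.log(methodAfterRedirect(302, 'POST', true)); // POST
console.log(methodAfterRedirect(307, 'POST'));       // POST
```

If two clients disagree on this step, the same POST can arrive at the server as a GET in one client and as a POST in the other, which is worth ruling out when a port behaves differently.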
Hi @pro-src. Thanks for the quick reply. I tried sending the request as you said and it still did not work. I used this module in Python, is this the one that is dangerous to use? I have read issue 255 and tried everything in there, but in this case, it still didn't resolve the issue for me. I believe issue 255 had a similar problem but not the same, although it does try sending a Here is my out.txt file as you requested! I hope I'm doing something really stupid and that the solution is simple. I do apologize in advance if that is the case. Thanks again for the help! Edit: it was sending a |
Yw 😄
Yes❗ I used to own that pypi.org project. Unfortunately, it is owned by a very cunning individual now. You've been warned. I've noticed that you're attempting to send pseudo HTTP/2 headers. The underlying request library doesn't support HTTP/2 and adds the
HAR file excerpt
{
"request": {
"method": "GET",
"url": "https://www.nakedcph.com/auth/view?op=register",
"httpVersion": "http/2.0",
"headers": [
{
"name": ":method",
"value": "GET"
},
{
"name": ":authority",
"value": "www.nakedcph.com"
},
{
"name": ":scheme",
"value": "https"
},
{
"name": ":path",
"value": "/auth/view?op=register"
}
]
}
All of the above headers should be omitted when imitating a browser's HTTP/1 request. I'm using https://httpbin.org which responds with the request info that you sent to demonstrate the difference between the
Request - formData option
require('cloudscraper').get({
uri: 'https://httpbin.org/anything',
formData: { test: 'foobar' }
}).then(console.log)
Request as reported by httpbin.org
{
"args": {},
"data": "",
"files": {},
"form": {
"test": "foobar"
},
"headers": {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.8",
"Content-Length": "165",
"Content-Type": "multipart/form-data; boundary=--------------------------864593522106200008829719",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36"
},
"json": null,
"method": "GET",
"origin": "54.166.62.111, 54.166.62.111",
"url": "https://httpbin.org/anything"
}
Request - json option
require('cloudscraper').get({
uri: 'https://httpbin.org/anything',
json: { test: 'foobar' }
}).then(console.log)
Request as reported by httpbin.org
{
args: {},
data: '{"test":"foobar"}',
files: {},
form: {},
headers: {
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.8',
'Content-Length': '17',
'Content-Type': 'application/json',
Host: 'httpbin.org',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36'
},
json: { test: 'foobar' },
method: 'GET',
origin: '54.166.62.111, 54.166.62.111',
url: 'https://httpbin.org/anything'
}
Perform the same tests with your Python code to ensure that everything lines up. If I had the debug output from Python, maybe I could pinpoint the issue. Prepend the following to your Python code to generate similar debug output:
Python code snippet
import logging
try:
from http.client import HTTPConnection # py3
except ImportError:
from httplib import HTTPConnection # py2
HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
# import cfscrape
# scraper = cfscrape.create_scraper()
# print(scraper.get('https://google.com')) |
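As a sketch of the advice about HTTP/2 pseudo-headers: when headers are copied from a HAR export, every entry whose name starts with a colon can be dropped before the rest is passed as a plain headers object. The helper name `harHeadersToObject` is hypothetical, and the `accept-language` entry is added here for illustration only; it is not part of the HAR excerpt above.

```javascript
// Hypothetical helper: convert a HAR "headers" array into a plain object,
// skipping HTTP/2 pseudo-headers such as :method, :authority, :scheme, :path.
function harHeadersToObject(harHeaders) {
  const headers = {};
  for (const { name, value } of harHeaders) {
    if (name.startsWith(':')) continue; // pseudo-header, not valid in HTTP/1
    headers[name] = value;
  }
  return headers;
}

const harHeaders = [
  { name: ':method', value: 'GET' },
  { name: ':authority', value: 'www.nakedcph.com' },
  { name: ':scheme', value: 'https' },
  { name: ':path', value: '/auth/view?op=register' },
  // Assumed extra entry, for illustration only:
  { name: 'accept-language', value: 'en-US,en;q=0.8' }
];

console.log(harHeadersToObject(harHeaders));
```

The resulting object could then be supplied as the `headers` option of a request, assuming the remaining entries are ones you actually want to send.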
With fresh eyes, the server expects the body to be
const cloudscraper = require('cloudscraper')
const { headers: defaultHeaders } = cloudscraper.defaultParams
const uri = new URL('https://www.nakedcph.com/auth/view?op=register')
const response = await cloudscraper.post({
uri: new URL('/auth/submit', uri.href),
resolveWithFullResponse: true,
followOriginalHttpMethod: true,
json: true,
simple: false,
headers: {
...defaultHeaders,
Origin: uri.origin,
Referer: uri.href,
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'X-AntiCsrfToken': csrf,
'X-Requested-With': 'XMLHttpRequest'
},
form: {
_AntiCsrfToken: csrf,
firstName: username,
email: username + '@gmail.com',
password: password,
'g-recaptcha-response': gRes,
action: 'register'
}
}) |
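The request above passes the fields through the `form` option, which in the request library produces a URL-encoded body, whereas the `json` option with an object produces a JSON body (as the httpbin output earlier in the thread shows). The following sketch uses only Node built-ins to show how the same fields look on the wire in each encoding. The helper name `encodeBody` and the sample field values are assumptions for illustration.

```javascript
// Hypothetical helper: encode the same fields as JSON or as a URL-encoded form.
function encodeBody(fields, kind) {
  if (kind === 'json') {
    return {
      contentType: 'application/json',
      body: JSON.stringify(fields)
    };
  }
  return {
    contentType: 'application/x-www-form-urlencoded',
    body: new URLSearchParams(fields).toString()
  };
}

// Sample values, for illustration only.
const fields = { firstName: 'My Name', action: 'register' };

console.log(encodeBody(fields, 'json'));
console.log(encodeBody(fields, 'form'));
```

A server that parses only one of these encodings will see an empty or invalid body when sent the other, so comparing the two outputs against what the browser sends is a quick sanity check.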
@pro-src I love you. Thanks. Following the above changes you made to the request, I was able to get a 500 response instead of a 403. The 500 response included a Takeaways:
Hope this helps anyone else having this issue and thank you so much @pro-src for all of the help. My issue is resolved 😊 |
There hasn't been a whole lot of feedback concerning the reCaptcha related API. Per your feedback, I've sent a PR(#260) to soft deprecate Secondly, I fixed a few bugs:
Finally, the regular expressions have been greatly improved. The siteKey can be found 4 times within Cloudflare's reCaptcha (v2) response, e.g. https://captcha.website, and this update is aware of all of them. Previously, this would only match 2: the data-sitekey attribute and the fallback. Could I get you to test this and report back whether pinning the URL and/or siteKey is still necessary?
OR git clone --single-branch --branch recaptcha https://github.com/pro-src/cloudscraper
cd cloudscraper
# Feel free to replace npm with yarn in any of these commands
npm install # Optionally add --production, if skipping test
npm test # Optional but recommended
# If you're going to manually update your require calls, you're done
# Otherwise register cloudscraper with NPM globally
npm link
# Proceed to create a symlink to cloudscraper in your project's node_modules/
cd ../my-project
npm link cloudscraper
node index.js |
Sorry for missing this! I was about to post another issue when I saw this. Will be performing these tests today and reporting back. |
@pro-src Did some more testing. I'm able to bypass the initial Cloudflare captcha page here. However, when sending the post request we revised above, I keep getting Any ideas on the fix? This is my code:
var cloudscraper = require("cloudscraper")
const captchaAPI = require("imagetyperz-api")
const { headers: defaultHeaders } = cloudscraper.defaultParams
//cloudscraper.debug = true
let captchaRes
async function onCaptcha(options, response, body) {
const captcha = response.captcha
// solveCaptcha is a method that you supply: pass it the href and sitekey, and it returns a response token
const token = await solveCaptcha(response.request.uri.href, captcha.siteKey)
captcha.form["g-recaptcha-response"] = token
captcha.submit()
}
// python sitekey = '6LeNqBUUAAAAAFbhC-CS22rwzkZjr_g4vMmqD_qo'
async function solveCaptcha(uri, sitekey) {
captchaAPI.set_access_key("MY CAPTCHA SOLVING API KEY")
const params = {
page_url: uri,
sitekey: sitekey
}
console.log("Solving captcha...")
const id = await captchaAPI.submit_recaptcha(params)
const token = await captchaAPI.retrieve_recaptcha(id)
captchaRes = token
return token
}
async function run() {
const req = await cloudscraper.get({
uri: "https://nakedcph.com",
onCaptcha: onCaptcha,
resolveWithFullResponse: true,
simple: false,
headers: {
"user-agent":
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36",
Accept: "application/json, text/javascript, */*; q=0.01",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br"
}
})
if (req.statusCode === 200) {
const AntiCsrfToken = req.body.match(
/setRequestHeader\('X-AntiCsrfToken', '(.+)'/
)[1]
const post = await cloudscraper.post({
uri: "https://nakedcph.com/auth/submit",
onCaptcha: onCaptcha,
resolveWithFullResponse: true,
followOriginalHttpMethod: true,
json: true,
simple: false,
headers: {
...defaultHeaders,
"user-agent":
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36",
Accept: "application/json, text/javascript, */*; q=0.01",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"x-AntiCsrfToken": AntiCsrfToken,
"x-Requested-With": "XMLHttpRequest",
origin: "https://www.nakedcph.com",
referer: "https://www.nakedcph.com/auth/view?op=register"
},
form: {
_AntiCsrfToken: AntiCsrfToken,
firstName: "My Name",
email: "[email protected]",
password: "MyPassword",
"g-recaptcha-response": captchaRes,
action: "register"
}
})
console.log("ACC RESULT:")
console.log(post.statusCode)
console.log(post.body)
return
}
}
run()
Running the following code with Something to note: The sitekey I use to solve captchas in Python is different from the one scraped by Cloudscraper in the example above. Very weird that the sitekey commented out works in the Python version but yields Thanks for helping out, hopefully we can get this working and the module bugs sorted out ASAP! |
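One detail in the code above that may be worth hardening, independent of the main problem: the token regex uses a greedy `(.+)` and indexes the match result directly, so it can over-capture when more quotes follow on the same line and throws if the pattern is absent. Below is a defensive sketch; the helper name `extractCsrfToken` and the sample markup are hypothetical, and nothing here is confirmed to be the cause of the failure.

```javascript
// Hypothetical helper: pull the anti-CSRF token out of a page body.
// Uses a negated character class instead of a greedy (.+) and
// returns null rather than throwing when the pattern is missing.
function extractCsrfToken(html) {
  const match = html.match(/setRequestHeader\('X-AntiCsrfToken',\s*'([^']+)'/);
  return match ? match[1] : null;
}

// Sample markup, for illustration only.
const sampleHtml =
  "xhr.setRequestHeader('X-AntiCsrfToken', 'abc123'); xhr.setRequestHeader('X-Other', 'zzz');";

console.log(extractCsrfToken(sampleHtml));      // abc123
console.log(extractCsrfToken('<html></html>')); // null
```

With the greedy pattern from the posted code, the same sample line would capture everything up to the last single quote rather than just `abc123`.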
@pro-src any updates? |
Not as of yet. |
was the issue resolved? Just wondering why this was closed with no reply. I'm still having issues with the example I sent on the updated version! Any help is appreciated. Thanks. |
You said:
The issue was reopened to address the bugs that I discovered and was subsequently closed automatically by GitHub once the fix PR was merged. I understand there's a new issue that's related to the old one, but would you mind opening a new issue for that and just referencing this one? I have been meaning to attend to your issue.
That's very weird indeed, considering that the Python regex is merely: 'data-sitekey="(.+?)"' whereas Cloudscraper's primary siteKey regex is robust (not mentioning the fallbacks): /\sdata-sitekey=["']?([^\s"'<>&]+)/ So if anything, the Python code would be failing you. I would just create a simple (working) snippet to show the difference if there was one, but I don't see it. Feel free to prove me wrong. Somebody will eventually get around to your issue. If you would like me personally to expedite your issue, consider becoming a patron. Thanks for your understanding. |
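For anyone who wants to check the two patterns side by side, here is a small sketch that runs both regexes quoted above against three attribute styles. The sample HTML strings are hypothetical; the sitekey value is the one commented out in the code earlier in the thread. The helper name `firstGroup` is likewise made up for this example.

```javascript
// The two patterns quoted in the comment above.
const pythonStyle = /data-sitekey="(.+?)"/;
const cloudscraperStyle = /\sdata-sitekey=["']?([^\s"'<>&]+)/;

const siteKey = '6LeNqBUUAAAAAFbhC-CS22rwzkZjr_g4vMmqD_qo';

// Hypothetical markup variants: double-quoted, single-quoted, unquoted.
const samples = [
  `<div class="g-recaptcha" data-sitekey="${siteKey}"></div>`,
  `<div class='g-recaptcha' data-sitekey='${siteKey}'></div>`,
  `<div class=g-recaptcha data-sitekey=${siteKey}></div>`
];

// Return the first capture group, or null when there is no match.
function firstGroup(re, text) {
  const match = text.match(re);
  return match ? match[1] : null;
}

const results = samples.map((html) => ({
  python: firstGroup(pythonStyle, html),
  cloudscraper: firstGroup(cloudscraperStyle, html)
}));

console.log(results);
```

On these samples the simpler pattern only matches the double-quoted form, while the Cloudscraper pattern matches all three, which is consistent with the point that the primary regex alone should not be what yields a different key.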
@pro-src I do not mind becoming a patron to get assisted ASAP! Just tell me how to go about doing that and how much I should pledge to be worth your time :) Would love to work on my issue specifically with you if possible. If becoming a patron is what it takes to get some 1-on-1 assistance from you I will definitely do it. |
@pro-src awesome! Just pledged. See you in Discord! Can't wait :) |
Sorry to comment on a closed issue, but was there a resolution to this? I am also experiencing the 500 issue |
Hi!
I'm using Cloudscraper version 4.14 on Node version 12.10.0. I'm attempting to access this website, which has a Cloudflare protection page with a captcha.
I can bypass the cloudflare and access the site's homepage/any page, however, after bypassing, I am unable to successfully send a post request. The weirdest part about this is that my code works using the Python version of the module recreating the exact same requests.
When sending the post request, the console sometimes prints "request received invalid json" when debug is on.
In my second request I need a csrf token that gets returned with the first request's (the bypass request) response. Essentially I am trying to create an account on this website by first retrieving the csrf after the initial bypass (which I can do successfully) and then sending a post request with the account information.
Like I said, I can do this successfully in Python which leads me to believe the issue is related to the module's way of handling post requests, but of course I'm probably wrong. This is my code when sending both requests.
I get a 200 on the first request, and a 403 on the 2nd. This is what the server returns on the 403:
{ Response: null, StatusCode: 500, Status: '' }
Hopefully I'm being really stupid and there is a super simple solution to this. And like I said, my Python version is 100% functional doing the exact same request with the same headers and everything.
Thanks, and sorry for any confusion and the long code. I've been looking at this for way longer than I should have and haven't been able to find the solution. Any help is much appreciated.