[SCRAPER] - foodnetwork.com returns 403 due to user-agent string #4024
Comments
i scoured the code a bit and found the firefox user-agent string that mealie is using, and confirmed it is also responsible for throwing the 403 error, whereas the user-agent string from my linux firefox install is getting a 200:
seems like i still run into the problem even when dropping in my updated user-agent string, presumably because of the headers being imported from [...]. i don't have time to keep digging into this, but hopefully i've given enough info to help get someone on their way to fixing it, since this is a major source of recipes and it's a real drag not being able to import them!
It's really tricky to figure out a solution to these kinds of HTTP 403 Forbidden responses, because we can't determine the logic the host sites use to decide that one client (not necessarily one person!) is worth blocking while another is deemed worth a (potentially dynamic, personalized) response.
I have the same issue. Duplicated on demo site. Docker log:
Let me know if you need anything else to help.
I faced this, too, so I swapped out the user agent and that solved my issue. I replaced it in both these files:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Bumping this as it's a genuine issue.
Would using something like fake-useragent be a solution here? That could then be wrapped in a try-except block.
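A minimal sketch of that suggestion, assuming the third-party `fake-useragent` package is installed. The names `pick_user_agent` and `FALLBACK_USER_AGENT` are hypothetical and not part of Mealie or recipe-scrapers; the fallback value is the Firefox string quoted in the issue description.

```python
# Hypothetical fallback: the Firefox string quoted in the issue description.
FALLBACK_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
)


def pick_user_agent() -> str:
    """Return a Firefox user-agent string from fake-useragent, or the fallback."""
    try:
        # Third-party package: pip install fake-useragent
        from fake_useragent import UserAgent

        return UserAgent().firefox
    except Exception:
        # Covers the package being absent as well as any failure inside it.
        return FALLBACK_USER_AGENT


# Headers that could be passed to an HTTP client such as requests.
headers = {"User-Agent": pick_user_agent()}
```

Whether a rotated user agent actually avoids the 403 is untested here; an earlier comment reports that swapping the string alone was not always enough, so this is only a starting point.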
following
First Check
I used the GitHub search to find a similar issue and didn't find it.
I have verified that this issue is not related to the underlying library hhyrsev/recipe-scrapers by 1) checking the debugger and data is returned, 2) verifying that there are errors in the log related to application level code, or 3) verified that the site provides recipe data, or is otherwise supported by hhyrsev/recipe-scrapers
This issue can be replicated on the demo site (https://demo.mealie.io/)
Please provide 1-5 example URLs that are having errors
Upon troubleshooting the underlying recipe-scrapers library in a docker container with an interactive python shell, i identified that it is the request itself that's being denied (using the normal python requests library). Entering the user-agent string from the documentation in the recipe-scrapers readme, i received a 403, while running the same GET request with the following user-agent string succeeded (copied from my Firefox session): User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0
urls that i've tested (have tested more, but these are the last couple):
https://www.foodnetwork.com/recipes/ina-garten/garlic-roasted-potatoes-recipe-1913067
https://www.foodnetwork.com/recipes/ina-garten/1770-house-meatloaf-recipe-2109034
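The comparison described above could be reproduced with something like the sketch below. It assumes the third-party `requests` package; the helper names `build_headers`, `fetch_status`, and `compare_user_agents` are hypothetical, and only the Firefox string quoted above is included. The string that gets rejected is not reproduced here, so add it to the mapping yourself to compare.

```python
# User-agent string quoted in the issue (copied from a Firefox session).
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"


def build_headers(user_agent: str) -> dict:
    """Build request headers that carry only the given user-agent string."""
    return {"User-Agent": user_agent}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send a GET request with the given user agent and return the HTTP status."""
    # Imported lazily so build_headers can be used without requests installed.
    import requests

    response = requests.get(url, headers=build_headers(user_agent), timeout=timeout)
    return response.status_code


def compare_user_agents(url: str, user_agents: dict) -> dict:
    """Map each label in user_agents to the status code the site returns for it."""
    return {
        label: fetch_status(url, user_agent)
        for label, user_agent in user_agents.items()
    }
```

Calling `compare_user_agents(url, {"firefox-128": FIREFOX_UA, "suspect": other_ua})` with one of the URLs above, where `other_ua` is the string under suspicion, would show side by side which strings are accepted. Results may vary by network and over time, since the site's blocking rules are not known.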
Please provide your logs for the Mealie container
docker logs <container-id> > mealie.logs
in an interactive python shell:
Deployment
Docker (Linux)