Support for custom header and cookies for the initial request from kafka_monitor.py feed #182
Cookie support is already provided thanks to the built-in Scrapy cookies middleware. As for custom request methods, the custom scheduler is where you want to look, as that is what translates the incoming objects into Scrapy Requests. I think the scheduler should be able to handle POST requests being yielded from the spider, thanks to Scrapy's request-to-dict serialization methods, but on the initial request that is something that could be improved.

Scrapy Cluster purposefully does not store cookie information in each spider, because any single chain of requests might go to multiple spiders or machines. You would need to customize the setup a bit to pass those cookies through your calls so they are used in subsequent requests. Scrapy Cluster is best suited for large-scale on-demand crawling, and in its current form (because it is distributed) it has some of the limitations and assumptions I noted above.

I am always happy to look at or review a PR if you think it would be worthwhile to add to the project!
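As a rough sketch of that translation step, the request-building code in a custom scheduler could honor optional `method`, `headers`, and `cookies` keys on the incoming feed dict. These keys are hypothetical extensions, not part of the stock Scrapy Cluster feed schema:

```python
# Hypothetical helper for the request-building step of a custom scheduler.
# The 'method', 'headers', and 'cookies' keys are proposed extensions to
# the feed object, not existing Scrapy Cluster fields.
from scrapy.http import Request


def request_from_feed(item):
    return Request(
        url=item['url'],
        method=item.get('method', 'GET'),      # default to a plain GET
        headers=item.get('headers'),           # optional custom headers
        cookies=item.get('cookies'),           # optional preset cookies
        meta={
            'appid': item.get('appid'),
            'crawlid': item.get('crawlid'),
        },
        dont_filter=True,
    )
```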
Working towards it. I got the custom request working with headers and cookies. Now working on shared cookie instances, shared via Redis and separated by crawl/spider IDs.
Below custom cookie middleware worked for me. Not sure if this is the right place to initiate `redis_conn`, as I could not find a way to share it:

```python
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware

class SharedCookiesMiddleware(CookiesMiddleware):
    ...
```
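For reference, here is a minimal sketch of what such a Redis-backed shared-cookies middleware could look like. It assumes one jar per crawl keyed by the `crawlid` meta field; the `REDIS_HOST`/`REDIS_PORT` settings names and the Redis key scheme are assumptions, and since the stdlib cookie jar holds an unpicklable lock, only its internal cookie dict is serialized:

```python
# A minimal sketch, not a drop-in Scrapy Cluster component: cookie jars
# are shared through Redis, keyed by 'crawlid', so any node in the
# cluster can resume a session started on another node.
import pickle

import redis
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware
from scrapy.http.cookies import CookieJar


class SharedCookiesMiddleware(CookiesMiddleware):

    def __init__(self, debug=False, server=None):
        super().__init__(debug)
        self.redis_conn = server

    @classmethod
    def from_crawler(cls, crawler):
        # settings names here are assumptions for this sketch
        server = redis.Redis(
            host=crawler.settings.get('REDIS_HOST', 'localhost'),
            port=crawler.settings.getint('REDIS_PORT', 6379))
        return cls(crawler.settings.getbool('COOKIES_DEBUG'), server)

    def _redis_key(self, request):
        # one shared jar per crawl id
        return 'cookies:{}'.format(request.meta.get('crawlid', 'default'))

    def _load_jar(self, request):
        jar = CookieJar()
        data = self.redis_conn.get(self._redis_key(request))
        if data:
            # restore the raw cookie dict into the underlying stdlib jar;
            # the jar object itself is not picklable, its dict is
            jar.jar._cookies = pickle.loads(data)
        return jar

    def _store_jar(self, request, jar):
        self.redis_conn.set(self._redis_key(request),
                            pickle.dumps(jar.jar._cookies))

    def process_request(self, request, spider):
        if request.meta.get('dont_merge_cookies', False):
            return
        jar = self._load_jar(request)
        # reuse the base class helper to merge request.cookies into the jar
        for cookie in self._get_request_cookies(jar, request):
            jar.set_cookie_if_ok(cookie, request)
        request.headers.pop('Cookie', None)
        jar.add_cookie_header(request)
        self._store_jar(request, jar)

    def process_response(self, request, response, spider):
        if request.meta.get('dont_merge_cookies', False):
            return response
        # capture Set-Cookie headers into the shared jar
        jar = self._load_jar(request)
        jar.extract_cookies(response, request)
        self._store_jar(request, jar)
        return response
```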
Thanks @knirbhay! This is a great start; I can try to incorporate it or just leave it as a standalone file. You can make a PR and I can review it and get it merged in. There are just a couple of things I would like changed, but otherwise it is great work.
Sure, I will also include the custom request with headers and cookies. I have enhanced the Kafka feed API, but I still need to check whether it works with the Scrapy Cluster REST API.
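For illustration, an extended feed object for `kafka_monitor.py` along those lines might look like the following. The `url`/`appid`/`crawlid` fields exist in Scrapy Cluster today; the `method`, `headers`, and `cookies` keys are the proposed additions from this thread, and the values are placeholders:

```python
# Hypothetical extended feed object; 'method', 'headers', and 'cookies'
# are proposed fields, not part of the stock Scrapy Cluster feed schema.
import json

feed_object = {
    "url": "https://xyz.com/test_api/_id",
    "appid": "testapp",
    "crawlid": "abc1234",
    "method": "POST",
    "headers": {"X-Api-Key": "my-key"},
    "cookies": {"session_pref": "preset-value"},
}

# fed to the cluster with something like:
#   python kafka_monitor.py feed '<the JSON printed below>'
print(json.dumps(feed_object))
```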
I needed to request a URL with custom headers and preset cookies. For example:
There is an API at
https://xyz.com/test_api/_id
which returns JSON, and it should be called via a POST request with the API keys in a custom header and a few preset cookies.
How do I get it working with scrapy-cluster?
With plain Scrapy I used to override the `start_requests` method to apply custom headers and cookies.
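In plain Scrapy that override looks roughly like this; the URL, header, and cookie values are placeholders:

```python
# The plain-Scrapy approach referred to above: override start_requests to
# issue the initial POST with a custom header and preset cookies.
import scrapy


class ApiSpider(scrapy.Spider):
    name = 'api_spider'

    def start_requests(self):
        yield scrapy.Request(
            url='https://xyz.com/test_api/_id',
            method='POST',
            headers={'X-Api-Key': 'my-key'},           # custom header
            cookies={'session_pref': 'preset-value'},  # preset cookies
            callback=self.parse_api,
        )

    def parse_api(self, response):
        # the endpoint returns JSON
        yield response.json()
```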
Another problem looks like a cookie jar issue: cookies are stored on one node and cannot be passed to another node. This comes into play when the server uses the Set-Cookie header to store session details.