I ran the Scrapy Cluster spider start command and got the error message below. I have no idea what could be causing it and have been troubleshooting for a while. I also have a few other questions, which are listed after the error message. Thank you!
root@crawler:~/scrapy-cluster/crawler# scrapy runspider crawling/spiders/link_spider.py
2023-09-20 18:02:08,347 [sc-crawler] ERROR: Unable to connect to Kafka in Pipeline due to attempt to connect already-connected SSLSocket!, raising exit flag.
Unhandled error in Deferred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/scrapy/crawler.py", line 245, in crawl
return self._crawl(crawler, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/scrapy/crawler.py", line 249, in _crawl
d = crawler.crawl(*args, **kwargs)
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1905, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1815, in _cancellableInlineCallbacks
_inlineCallbacks(None, gen, status)
--- <exception caught here> ---
File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "/usr/local/lib/python3.10/dist-packages/scrapy/crawler.py", line 134, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python3.10/dist-packages/scrapy/crawler.py", line 148, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/engine.py", line 99, in __init__
self.scraper = Scraper(crawler)
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/scraper.py", line 109, in __init__
self.itemproc: ItemPipelineManager = itemproc_cls.from_crawler(crawler)
File "/usr/local/lib/python3.10/dist-packages/scrapy/middleware.py", line 67, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python3.10/dist-packages/scrapy/middleware.py", line 44, in from_settings
mw = create_instance(mwcls, settings, crawler)
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/misc.py", line 188, in create_instance
instance = objcls.from_crawler(crawler, *args, **kwargs)
File "/root/scrapy-cluster/crawler/crawling/pipelines.py", line 134, in from_crawler
return cls.from_settings(crawler.settings)
File "/root/scrapy-cluster/crawler/crawling/pipelines.py", line 124, in from_settings
sys.exit(1)
builtins.SystemExit: 1
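In case it helps narrow things down, here is the kind of minimal check I would use to see whether the Kafka broker is even reachable outside of Scrapy Cluster. This is only a sketch: it assumes the kafka-python package is installed and that the broker is on localhost:9092, so adjust it to match whatever KAFKA_HOSTS is set to in your settings.

# Minimal Kafka reachability check (assumes kafka-python and a broker on localhost:9092).
from kafka import KafkaProducer
from kafka.errors import NoBrokersAvailable

try:
    # Creating the producer forces a bootstrap connection to the broker.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    print("Broker reachable on localhost:9092")
    producer.close()
except NoBrokersAvailable:
    print("No Kafka broker reachable on localhost:9092")

If this fails as well, the problem would seem to be with the broker or the connection settings rather than with the spider itself.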
Here are the other things I was wondering about Scrapy Cluster:
Does this command automatically start the crawler without anything having to be fed into it?
scrapy runspider crawling/spiders/link_spider.py
If so, is there a starting URL in the settings, and does it branch off from that seed URL to crawl multiple URLs? If you do have to feed a URL in to start it, does it then automatically crawl other URLs from there? Sorry for so many questions, and thank you for your help!
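For what it's worth, my reading of the Scrapy Cluster quickstart is that crawl requests are normally fed in through the Kafka monitor rather than the spider itself, with something like the command below run from the kafka-monitor directory (the URL, appid, and crawlid here are just placeholder values I made up):

python kafka_monitor.py feed '{"url": "http://example.com", "appid": "testapp", "crawlid": "abc123"}'

Please correct me if that is not the intended workflow.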