Not scraping anything #8

Open
nickpezzotti1 opened this issue Mar 4, 2019 · 0 comments
nickpezzotti1 commented Mar 4, 2019

When I run the spider, nothing is scraped. This is what is logged to the terminal (hashtag = happy):

2019-03-04 01:52:07 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: instagram_spider)
2019-03-04 01:52:07 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.9, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 3.7.2 (default, Feb 12 2019, 08:15:36) - [Clang 10.0.0 (clang-1000.11.45.5)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Darwin-18.2.0-x86_64-i386-64bit
2019-03-04 01:52:07 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'instagram_spider', 'FEED_URI': './scraped/%(name)s/%(hashtag)s/%(date)s', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'instagram_spider.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['instagram_spider.spiders']}
2019-03-04 01:52:07 [scrapy.extensions.telnet] INFO: Telnet Password: b592a55ecf71f8ae
2019-03-04 01:52:07 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
Name of the hashtag? happy
2019-03-04 01:52:08 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-03-04 01:52:08 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-03-04 01:52:08 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-03-04 01:52:08 [scrapy.core.engine] INFO: Spider opened
2019-03-04 01:52:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-03-04 01:52:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-03-04 01:52:08 [scrapy.core.engine] INFO: Closing spider (finished)
2019-03-04 01:52:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1,
 'downloader/request_bytes': 226,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 1892,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 3, 4, 1, 52, 8, 467412),
 'log_count/INFO': 9,
 'memusage/max': 50495488,
 'memusage/startup': 50491392,
 'response_received_count': 1,
 'robotstxt/forbidden': 1,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2019, 3, 4, 1, 52, 8, 143112)}
2019-03-04 01:52:08 [scrapy.core.engine] INFO: Spider closed (finished)
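
Reading the stats dump above, the one scheduled request appears to have been dropped before it reached the spider: `robotstxt/forbidden` is 1 and there is one `scrapy.exceptions.IgnoreRequest`, which is what Scrapy's `RobotsTxtMiddleware` raises when `ROBOTSTXT_OBEY` is `True` (as in the overridden settings) and robots.txt disallows the URL. A minimal sketch of that reading, assuming only the stat keys shown in the log plus Scrapy's standard `item_scraped_count` key; the `diagnose` helper is hypothetical and not part of this project:

```python
# Hypothetical helper: guess why a Scrapy run produced no items,
# using only the stats dictionary that Scrapy dumps at shutdown.
IGNORE_KEY = "downloader/exception_type_count/scrapy.exceptions.IgnoreRequest"


def diagnose(stats):
    """Return a short, best-effort explanation for an empty crawl."""
    if stats.get("item_scraped_count", 0) > 0:
        return "items were scraped"
    forbidden = stats.get("robotstxt/forbidden", 0)
    ignored = stats.get(IGNORE_KEY, 0)
    if forbidden > 0 and ignored > 0:
        return "request(s) blocked by robots.txt"
    return "no obvious cause in stats"


# Subset of the stats copied from the log above.
stats = {
    "downloader/exception_count": 1,
    IGNORE_KEY: 1,
    "downloader/request_count": 1,
    "response_received_count": 1,
    "robotstxt/forbidden": 1,
    "robotstxt/request_count": 1,
    "scheduler/dequeued": 1,
    "scheduler/enqueued": 1,
}

print(diagnose(stats))
```

If that reading is right, the only response received was robots.txt itself, and the hashtag page was never fetched. Setting `ROBOTSTXT_OBEY = False` in the project's settings would stop the middleware from filtering the request, though whether that is appropriate here is a call for the maintainers.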