
How to use Scrapy

Environment setup

  1. pipenv install (installs the dependencies listed in the Pipfile)
  2. pipenv shell (activates the project's virtual environment)
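A quick way to confirm the setup worked is a small script run inside `pipenv shell`. The sketch below is hypothetical: the file name `check_env.py` and both helper functions are illustrative, not part of this repository. It uses only the standard library and assumes a modern `venv`/`virtualenv`, where `sys.prefix` differs from `sys.base_prefix` inside an environment.

```python
"""check_env.py: hypothetical sanity check to run inside `pipenv shell`."""
import importlib.util
import sys


def in_virtualenv() -> bool:
    # Inside a modern venv/virtualenv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def has_package(name: str) -> bool:
    # find_spec returns None when no importable top-level module has this name.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("scrapy installed:", has_package("scrapy"))
```

If both lines print `True`, the `scrapy` commands in the following steps should be available in that shell.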

Generate a new spider

  1. Go into the crawler-system directory with cd crawler-system.
  2. Generate a spider template with Scrapy's built-in command: scrapy genspider <spiderName> <targetUrl>

Configuration setup

  1. Go to the directory that contains the settings.py file.
  2. In settings.py you can turn logging, the database, item pipelines, middlewares, and other components on or off (for reference, see pttCrawlerSystem/settings.py).
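The fragment below sketches how such switches typically look in a Scrapy `settings.py`. `LOG_ENABLED`, `LOG_LEVEL`, `ITEM_PIPELINES`, and `DOWNLOADER_MIDDLEWARES` are standard Scrapy settings; the pipeline class path and `DATABASE_ENABLED` are hypothetical stand-ins, not this project's actual values, so check the real pttCrawlerSystem settings file for the names it uses.

```python
# settings.py (fragment): illustrative values, not this project's real ones.

# Logging: set LOG_ENABLED = False to silence Scrapy's log output.
LOG_ENABLED = True
LOG_LEVEL = "INFO"

# Item pipelines: import path -> order (customarily 0-1000, lower runs first).
# Removing or commenting out an entry turns that pipeline off.
ITEM_PIPELINES = {
    # Hypothetical class path, following startproject's naming pattern.
    "pttCrawlerSystem.pipelines.PttcrawlersystemPipeline": 300,
}

# Downloader middlewares: mapping a built-in middleware to None disables it.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

# Hypothetical project-specific switch that a database pipeline could read
# from crawler.settings; Scrapy itself does not define this setting.
DATABASE_ENABLED = False
```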

Develop a spider

How scrapy works

  1. To test your spider, open main.py, add the line cmdline.execute("scrapy crawl <spiderName>".split()), and comment out the other cmdline.execute(...) lines.
  2. Read the official Scrapy documentation.