pipenv install
pipenv shell
- Go into the crawler-system directory:
cd crawler-system
- Generate a spider template with Scrapy's built-in genspider command:
scrapy genspider <spiderName> <targetUrl>
- The directory contains a settings.py file.
- In it you can turn logging, the database, pipelines, middlewares, and other components on or off (ref: pttCrawlerSystem/setting.py).
- Open main.py and add a new line:
cmdline.execute("scrapy crawl <spiderName>".split())
Comment out the other cmdline.execute(...) lines while you test your spider.
- Read the official Scrapy docs to learn more.
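
The settings.py step could look roughly like the fragment below. It is a sketch only: `LOG_ENABLED`, `LOG_LEVEL`, `ITEM_PIPELINES`, and `DOWNLOADER_MIDDLEWARES` are standard Scrapy settings, but the module paths and class names are hypothetical placeholders, not the ones used in this project.

```python
# settings.py (fragment). Paths marked "hypothetical" are placeholders.

LOG_ENABLED = True   # set to False to turn logging off
LOG_LEVEL = "INFO"   # e.g. "DEBUG" while developing a spider

ITEM_PIPELINES = {
    # hypothetical pipeline path; lower numbers run first.
    # Remove or comment out an entry to turn that pipeline off.
    "crawler_system.pipelines.DatabasePipeline": 300,
}

DOWNLOADER_MIDDLEWARES = {
    # hypothetical middleware path; mapping a middleware to None disables it.
    "crawler_system.middlewares.CustomDownloaderMiddleware": None,
}
```

Check the project's own settings file for the real component names before copying any of these entries.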
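
The main.py step can be sketched as follows. This is a minimal illustration, assuming Scrapy is installed and the script runs from inside the project directory; `build_crawl_argv`, `run_spider`, and the spider names are hypothetical and do not come from this project.

```python
# main.py (sketch): run one spider at a time while testing.


def build_crawl_argv(spider_name):
    """Return the argument list for `scrapy crawl <spider_name>`."""
    return f"scrapy crawl {spider_name}".split()


def run_spider(spider_name):
    """Start the crawl. Requires Scrapy and a surrounding Scrapy project."""
    # Imported here so build_crawl_argv can be used without Scrapy installed.
    from scrapy import cmdline

    cmdline.execute(build_crawl_argv(spider_name))


# "my_spider" and "other_spider" are placeholder names.
# Uncomment exactly one call; keep the others commented out while testing.
# run_spider("my_spider")
# run_spider("other_spider")
```

Only one call should be active at a time, because `cmdline.execute` exits the process when the crawl finishes and later calls would never run.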