Source repository: here
Crawling Zhenai.com (珍爱网) with channels
- 【engine】
- 【fetcher】
- 【scheduler】
- 【parser】
- 【Crawler architecture】
+--------------+            +----------------+      response      +-----------------+
|              |  request   |                +------------------->+                 |
|     Seed     +----------->+     Engine     |   requests,items   |     Parser      |
|              |            |                +<-------------------+                 |
+--------------+            +--+------+----+-+                    +-----------------+
                               ^      |    ^
                               |      |    |
                               |      |url |response
                               v      v    |
               +---------------+--+ +-+----+-----------+
               |                  | |                  |
               |    Task Queue    | |     Fetcher      |
               |                  | |                  |
               +------------------+ +------------------+
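
As a rough illustration of the single-task flow in the diagram above, the Go sketch below drains a task queue in one loop: the engine takes a request, the fetcher returns a response, and the parser returns further requests plus items. The names `Request`, `ParseResult`, `Run`, `stubFetch` and the in-memory `pages` map are assumptions made for this example, not code taken from the repository, and the fetcher is a stub so nothing touches the network.

```go
package main

import (
	"fmt"
	"strings"
)

// Request pairs a URL with the parser that understands that page (assumed shape).
type Request struct {
	URL   string
	Parse func(body string) ParseResult
}

// ParseResult is what a parser hands back to the engine: new requests and items.
type ParseResult struct {
	Requests []Request
	Items    []string
}

// pages is a hypothetical in-memory site standing in for real HTTP responses.
var pages = map[string]string{
	"/cities": "/city/a,/city/b",
	"/city/a": "alice",
	"/city/b": "bob",
}

// stubFetch plays the Fetcher role without any network access.
func stubFetch(url string) (string, error) {
	body, ok := pages[url]
	if !ok {
		return "", fmt.Errorf("no such page")
	}
	return body, nil
}

// parseCity turns a city page into a single item.
func parseCity(body string) ParseResult {
	return ParseResult{Items: []string{body}}
}

// parseCityList turns the list page into one request per city.
func parseCityList(body string) ParseResult {
	var res ParseResult
	for _, u := range strings.Split(body, ",") {
		res.Requests = append(res.Requests, Request{URL: u, Parse: parseCity})
	}
	return res
}

// Run is the engine: it drains the task queue one request at a time.
func Run(fetch func(string) (string, error), seeds ...Request) []string {
	queue := append([]Request(nil), seeds...)
	var items []string
	for len(queue) > 0 {
		r := queue[0]
		queue = queue[1:]
		body, err := fetch(r.URL)
		if err != nil {
			fmt.Printf("fetch %s: %v\n", r.URL, err)
			continue
		}
		res := r.Parse(body)
		queue = append(queue, res.Requests...) // new work goes back into the queue
		items = append(items, res.Items...)
	}
	return items
}

func main() {
	fmt.Println(Run(stubFetch, Request{URL: "/cities", Parse: parseCityList}))
	// prints: [alice bob]
}
```

Because everything happens in one loop, a slow fetch stalls the whole crawl, which is presumably what motivates the concurrent variants further down.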

                                                +-------------------------------------------+
                                                |                  Worker                   |
                                                |                                           |
+--------------+            +----------------+  |    response     +-----------------+       |
|              |  request   |                +------------------->+                 |       |
|     Seed     +----------->+     Engine     |  | requests,items  |     Parser      |       |
|              |            |                +<-------------------+                 |       |
+--------------+            +--+------+----+-+  |                 +-----------------+       |
                               ^      |    ^    |                                           |
                               |      |    |    | response    +---------------------+       |
                               v      |    +------------------+                     |       |
            +------------------+--+   |         | url         |       Fetcher       |       |
            |                     |   +---------------------->+                     |       |
            |     Task Queue      |             |             +---------------------+       |
            |                     |             |                                           |
            +---------------------+             +-------------------------------------------+
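
The Worker box in the diagram above groups the Fetcher and the Parser into one unit. A minimal sketch of that grouping, assuming a `worker` function that takes a request and returns the parse result, could look like this. `worker`, `fetch` and `parseNames` are hypothetical names chosen for the example, and `fetch` is a stub rather than a real HTTP call.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Request and ParseResult are assumed shapes, not taken from the repository.
type Request struct {
	URL   string
	Parse func(body string) ParseResult
}

type ParseResult struct {
	Requests []Request
	Items    []string
}

// fetch is a stub Fetcher that only knows a single page.
func fetch(url string) (string, error) {
	if url == "/city/a" {
		return "alice,bob", nil
	}
	return "", errors.New("page not found")
}

// parseNames is a toy Parser: every comma-separated name becomes an item.
func parseNames(body string) ParseResult {
	return ParseResult{Items: strings.Split(body, ",")}
}

// worker is Fetcher + Parser in one step, so the engine only ever
// deals with "request in, parse result out".
func worker(r Request) (ParseResult, error) {
	body, err := fetch(r.URL)
	if err != nil {
		return ParseResult{}, fmt.Errorf("fetch %s: %w", r.URL, err)
	}
	return r.Parse(body), nil
}

func main() {
	res, err := worker(Request{URL: "/city/a", Parse: parseNames})
	fmt.Println(res.Items, err) // prints: [alice bob] <nil>

	_, err = worker(Request{URL: "/missing", Parse: parseNames})
	fmt.Println(err) // prints: fetch /missing: page not found
}
```

Folding both steps into one function is what makes it straightforward to run many workers side by side later on.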

                            +---------------+
                            |    OutPut     |
                            |               |
                            +-------++------+
                                    ^^
                                    ||
                                    ||  Items
                                    ||
+--------------+            +-------++-------+                    +-----------------+
|              |  request   |                |   requests,items   | +-----------------+
|     Seed     +----------->+     Engine     +--------------------+ | +-----------------+
|              |            |                |                    | | |                 |
+--------------+            +--------+-------+                    +-+ |     Worker      |
                                     |                              +-+                 |
                                     | request                        +--+--------------+
                                     |                                   |
                                     |                                   |
                                     v                                   |
                                   +-+-----------------------------------+-+
                                   |                                       |
                                   |               Scheduler               |
                                   |                                       |
                                   +---------------------------------------+
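
One way to read the concurrent diagram above: the engine submits requests to a scheduler, several workers pull from it, and parse results come back on a shared channel. The sketch below is written under that reading. The `Scheduler` here is deliberately just a buffered channel, which is an assumption for brevity, and the `pending` counter exists only so the demo can terminate. All names and the `pages` data are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type Request struct {
	URL   string
	Parse func(body string) ParseResult
}

type ParseResult struct {
	Requests []Request
	Items    []string
}

// pages is a hypothetical in-memory site used instead of HTTP.
var pages = map[string]string{
	"/cities": "/city/a,/city/b,/city/c",
	"/city/a": "alice",
	"/city/b": "bob",
	"/city/c": "carol",
}

func parseCity(body string) ParseResult { return ParseResult{Items: []string{body}} }

func parseCityList(body string) ParseResult {
	var res ParseResult
	for _, u := range strings.Split(body, ",") {
		res.Requests = append(res.Requests, Request{URL: u, Parse: parseCity})
	}
	return res
}

// Scheduler is a stand-in: a buffered channel shared by all workers.
// Submit would block once the buffer is full, so this only suits small demos.
type Scheduler struct{ in chan Request }

func (s *Scheduler) Submit(r Request) { s.in <- r }

// worker fetches (from the stub map) and parses, once per request.
func worker(in <-chan Request, out chan<- ParseResult) {
	for r := range in {
		out <- r.Parse(pages[r.URL])
	}
}

// Run is the engine: it fans requests out to workers and collects items.
func Run(workerCount int, seeds ...Request) []string {
	s := &Scheduler{in: make(chan Request, 16)}
	out := make(chan ParseResult)
	for i := 0; i < workerCount; i++ {
		go worker(s.in, out)
	}
	pending := 0 // requests submitted but not yet answered
	for _, r := range seeds {
		s.Submit(r)
		pending++
	}
	var items []string
	for pending > 0 {
		res := <-out
		pending--
		items = append(items, res.Items...)
		for _, r := range res.Requests {
			s.Submit(r)
			pending++
		}
	}
	close(s.in)        // lets the workers exit
	sort.Strings(items) // worker order is not deterministic
	return items
}

func main() {
	fmt.Println(Run(3, Request{URL: "/cities", Parse: parseCityList}))
	// prints: [alice bob carol]
}
```

With an unbuffered or full channel, the engine can block in `Submit` while every worker is blocked sending its result, and that deadlock risk is what a real scheduler has to remove.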

                                            +-----------------+
                                            | +-----------------+
                                            | | +-----------------+
                                            | | |                 |
         +                                  +-+ |     Worker      |
         | request                            +-+                 |
         |                                      +--+--------------+
         |                                         ^
         |                                         | request
         v                                         |
+--------+---------+     create for     +----------+--------+
|                  |    each request    | +-------------------+
|    Scheduler     +------------------->+ | +-------------------+
|                  |                    | | |                   |
+------------------+                    +-+ |     Goroutine     |
                                          +-+                   |
                                            +-------------------+
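
The diagram above shows the scheduler creating one goroutine per request, and that goroutine then hands the request to a worker. A small sketch of the idea follows, with the assumed names `SimpleScheduler`, `Submit` and `deliverAll`. The point it tries to show is that `Submit` returns immediately even when the shared channel is unbuffered and no worker is listening yet.

```go
package main

import (
	"fmt"
	"sync"
)

type Request struct{ URL string }

// SimpleScheduler (hypothetical name): all workers share one input channel.
type SimpleScheduler struct {
	workerChan chan Request
}

// Submit never blocks the caller. A short-lived goroutine waits
// until some worker is free to receive the request.
func (s *SimpleScheduler) Submit(r Request) {
	go func() { s.workerChan <- r }()
}

// deliverAll submits n requests before any worker exists, then starts
// the workers and reports how many requests they handled.
func deliverAll(n, workers int) int {
	s := &SimpleScheduler{workerChan: make(chan Request)} // unbuffered on purpose
	for i := 0; i < n; i++ {
		s.Submit(Request{URL: fmt.Sprintf("/page/%d", i)})
	}

	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		handled int
	)
	wg.Add(n)
	for w := 0; w < workers; w++ {
		go func() {
			for range s.workerChan {
				mu.Lock()
				handled++
				mu.Unlock()
				wg.Done()
			}
		}()
	}
	wg.Wait()           // every request has been received by a worker
	close(s.workerChan) // all senders are finished, so closing is safe
	return handled
}

func main() {
	fmt.Println(deliverAll(100, 3)) // prints: 100
}
```

The trade-off is that the number of waiting goroutines grows with the backlog and the scheduler has no say in which worker gets which request.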

                     +
                     |
                     | request
                     |
                     v                          +--------------------------------+
       +-------------+-------------+            |                                |
       |                           |            |  +----------+    +----------+  |
       |         Scheduler         +----------->+  | Request  +--->+  Worker  |  |
       |                           |            |  +----------+    +----------+  |
       +-+-----------------------+-+            |                                |
         |                       |              +--------------------------------+
         |                       |
         v                       |
+--------+--------+     +--------+---------+                    +-----------------+
|                 |     |                  |                    | +-----------------+
|  Request Queue  |     |   Worker Queue   +--------------------+ | +-----------------+
|                 |     |                  |                    | | |                 |
+-----------------+     +------------------+                    +-+ |     Worker      |
                                                                  +-+                 |
                                                                    +-----------------+
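
The last diagram pairs a request queue with a worker queue: the scheduler holds waiting requests and idle workers, and dispatches one request to one worker whenever both queues are non-empty. The sketch below is one possible way to express that with channels, not necessarily how the repository does it. `QueuedScheduler`, `WorkerReady`, `crawl` and the `"done "` result string are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

type Request struct{ URL string }

// QueuedScheduler keeps a queue of waiting requests and a queue of idle workers.
type QueuedScheduler struct {
	requestChan chan Request
	workerChan  chan chan Request // each idle worker announces its own channel
}

func NewQueuedScheduler() *QueuedScheduler {
	s := &QueuedScheduler{
		requestChan: make(chan Request),
		workerChan:  make(chan chan Request),
	}
	go s.loop()
	return s
}

func (s *QueuedScheduler) Submit(r Request)           { s.requestChan <- r }
func (s *QueuedScheduler) WorkerReady(w chan Request) { s.workerChan <- w }

func (s *QueuedScheduler) loop() {
	var requestQ []Request
	var workerQ []chan Request
	for {
		// A send on a nil channel blocks forever, so the dispatch case
		// can only be chosen when both queues have a head element.
		var activeRequest Request
		var activeWorker chan Request
		if len(requestQ) > 0 && len(workerQ) > 0 {
			activeRequest, activeWorker = requestQ[0], workerQ[0]
		}
		select {
		case r := <-s.requestChan:
			requestQ = append(requestQ, r)
		case w := <-s.workerChan:
			workerQ = append(workerQ, w)
		case activeWorker <- activeRequest:
			requestQ, workerQ = requestQ[1:], workerQ[1:]
		}
	}
}

// crawl runs the given URLs through the scheduler and returns sorted results.
func crawl(urls []string, workers int) []string {
	s := NewQueuedScheduler()
	out := make(chan string)
	for i := 0; i < workers; i++ {
		in := make(chan Request) // each worker owns its input channel
		go func() {
			for {
				s.WorkerReady(in) // join the worker queue
				r := <-in
				out <- "done " + r.URL // stand-in for fetch + parse
			}
		}()
	}
	for _, u := range urls {
		s.Submit(Request{URL: u})
	}
	results := make([]string, 0, len(urls))
	for range urls {
		results = append(results, <-out)
	}
	sort.Strings(results)
	return results
}

func main() {
	fmt.Println(crawl([]string{"/c", "/a", "/b"}, 2))
	// prints: [done /a done /b done /c]
}
```

Since the scheduler loop is always ready to receive, neither `Submit` nor `WorkerReady` blocks for long, and the queues live in one goroutine so no locking is needed.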