zhengchun edited this page Dec 7, 2017 · 2 revisions

Overview

A Spider is a crawler handler that responds to a received HTTP response and sends Item data to the Item Pipeline.

A Spider Interface:

type Handler interface {
    ServeSpider(chan<- Item, *http.Response)
}

Sample code

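type exampleSpider struct{}

// ServeSpider sends the href of every link in the response to the Item Pipeline.
func (s *exampleSpider) ServeSpider(c chan<- antch.Item, res *http.Response) {
	doc, err := antch.ParseHTML(res)
	if err != nil {
		return // skip responses that cannot be parsed as HTML
	}
	for _, link := range htmlquery.Find(doc, "//a") {
		c <- htmlquery.SelectAttr(link, "href")
	}
}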

htmlquery is an XPath query package for HTML.

Register the spider handler with the crawler:

crawler.Handle("example.com", &exampleSpider{})