Spider
zhengchun edited this page Dec 7, 2017 · 2 revisions
A Spider is a crawler handler that processes a received HTTP response and sends Item data to the Item Pipeline.
A Spider implements the Handler interface:
type Handler interface {
	ServeSpider(chan<- Item, *http.Response)
}
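// exampleSpider sends the href of every <a> element as an Item.
type exampleSpider struct{}

func (s *exampleSpider) ServeSpider(c chan<- antch.Item, res *http.Response) {
	doc, err := antch.ParseHTML(res)
	if err != nil {
		return // response body is not parseable HTML; emit no items
	}
	for _, link := range htmlquery.Find(doc, "//a") {
		c <- htmlquery.SelectAttr(link, "href")
	}
}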
htmlquery is an XPath query package for HTML.
Register the spider handler with the crawler:
crawler.Handle("example.com", &exampleSpider{})
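This page does not describe how the crawler matches the "example.com" pattern against incoming responses. One plausible reading, sketched below as a hypothetical hostMux (not antch's actual code), is routing by the host name of the request that produced each response, using exact matching and dropping responses for unregistered hosts. The names hostMux, echoSpider and dispatch are illustrative only.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// Item and Handler are redeclared locally (assumption: Item is an empty interface).
type Item interface{}

type Handler interface {
	ServeSpider(chan<- Item, *http.Response)
}

// hostMux is a hypothetical router, not antch's implementation: it picks the
// Handler registered for the host name of the request behind each response.
type hostMux struct {
	handlers map[string]Handler
}

// Handle registers h for an exact host name such as "example.com".
func (m *hostMux) Handle(host string, h Handler) {
	if m.handlers == nil {
		m.handlers = make(map[string]Handler)
	}
	m.handlers[host] = h
}

// ServeSpider lets hostMux itself satisfy Handler. Hostname() drops any
// port, and responses for hosts with no registered handler are ignored.
func (m *hostMux) ServeSpider(c chan<- Item, res *http.Response) {
	if res.Request == nil || res.Request.URL == nil {
		return
	}
	if h, ok := m.handlers[res.Request.URL.Hostname()]; ok {
		h.ServeSpider(c, res)
	}
}

// echoSpider is a hypothetical handler that emits the request path.
type echoSpider struct{}

func (echoSpider) ServeSpider(c chan<- Item, res *http.Response) {
	c <- res.Request.URL.Path
}

// dispatch serves one synthetic response for rawURL and collects the Items.
func dispatch(m *hostMux, rawURL string) []Item {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil
	}
	c := make(chan Item, 8) // buffered: the demo handler sends at most one item
	m.ServeSpider(c, &http.Response{Request: &http.Request{URL: u}})
	close(c)
	var items []Item
	for it := range c {
		items = append(items, it)
	}
	return items
}

func main() {
	var mux hostMux
	mux.Handle("example.com", echoSpider{})
	fmt.Println(dispatch(&mux, "http://example.com/page"))
	fmt.Println(dispatch(&mux, "http://other.org/page"))
}
```

Because hostMux implements ServeSpider itself, a router of this kind can be passed anywhere a Handler is expected.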