GitHub - qibinghua/ants-go: open source, distributed, restful crawler engine in golang

ants-go

open source, restful, distributed crawler engine


coming up

Persistence
Dynamic Master

design of ants-go

ants

I wrote a crawler engine named ants in Python, based on scrapy. But a dynamic language can get chaotic, so I started rewriting it in a compiled language.

scrapy

I designed the crawler framework by imitating scrapy: the downloader, the scraper, and the way users write custom spiders, but in a compiled language.

elasticsearch

I designed the distributed architecture by imitating elasticsearch. It inspired me to build an engine for distributed crawling.

requirement

go get github.com/PuerkitoBio/goquery
go get github.com/go-sql-driver/mysql

install

go get github.com/wcong/ants-go
go install github.com/wcong/ants-go

run

cd bin
./ants-go

check cluster status

curl 'http://localhost:8200/cluster'

get all spiders

curl 'http://localhost:8200/spiders'

start a spider

curl 'http://localhost:8200/crawl?spider=spiderName'
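The three curl calls above can also be issued from Go. The following is a minimal sketch, not part of ants-go itself: the helper names endpointURL and fetch are hypothetical, and the only things taken from this README are the default HTTP port 8200, the paths /cluster, /spiders and /crawl, and the spider query parameter. The response format is not documented here, so the body is returned as raw text.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// endpointURL builds the URL for one of the endpoints listed in this README.
// path is "cluster", "spiders" or "crawl"; spider is added as the "spider"
// query parameter only when it is non-empty. (Hypothetical helper.)
func endpointURL(base, path, spider string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/" + path
	if spider != "" {
		q := url.Values{}
		q.Set("spider", spider)
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

// fetch performs a GET and returns the raw response body.
// It needs a running node, so main below does not call it. (Hypothetical helper.)
func fetch(rawURL string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	base := "http://localhost:8200" // default http port from this README
	for _, path := range []string{"cluster", "spiders"} {
		u, err := endpointURL(base, path, "")
		if err != nil {
			fmt.Println("bad url:", err)
			return
		}
		fmt.Println(u)
	}
	u, err := endpointURL(base, "crawl", "spiderName")
	if err != nil {
		fmt.Println("bad url:", err)
		return
	}
	fmt.Println(u)
	// With a node running, fetch(u) would start the spider named spiderName.
}
```

Building the query with url.Values keeps spider names with unusual characters correctly escaped, which a hand-concatenated string would not.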

cluster in one computer

To test a cluster on one computer, run the nodes on different ports in different terminals.

The first node uses the default ports: tcp 8300 and http 8200.

cd bin
./ants-go

The other node sets its own tcp port and http port:

cd bin
./ants-go -tcp 9300 -http 9200
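For readers curious how two such port flags with defaults might be handled, here is a minimal sketch using Go's standard flag package. It is an illustration under assumptions, not the project's actual main.go: the ports type and the parsePorts function are hypothetical, and the help strings are guesses. Only the flag names -tcp and -http and the defaults 8300 and 8200 come from this README.

```go
package main

import (
	"flag"
	"fmt"
)

// ports holds the two listening ports of a node. (Hypothetical type.)
type ports struct {
	TCP  int
	HTTP int
}

// parsePorts parses -tcp and -http from args, falling back to the
// defaults named in this README (tcp 8300, http 8200).
func parsePorts(args []string) (ports, error) {
	fs := flag.NewFlagSet("ants-go", flag.ContinueOnError)
	var p ports
	// The usage strings below are assumptions made for this sketch.
	fs.IntVar(&p.TCP, "tcp", 8300, "tcp port")
	fs.IntVar(&p.HTTP, "http", 8200, "http port")
	if err := fs.Parse(args); err != nil {
		return ports{}, err
	}
	return p, nil
}

func main() {
	// First node: no flags, so the defaults apply.
	first, err := parsePorts(nil)
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	// Second node: same arguments as the command line above.
	second, err := parsePorts([]string{"-tcp", "9300", "-http", "9200"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Printf("first node:  tcp=%d http=%d\n", first.TCP, first.HTTP)
	fmt.Printf("second node: tcp=%d http=%d\n", second.TCP, second.HTTP)
}
```

Giving each node its own pair of ports is what lets both processes bind on the same machine without a conflict.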

flags

There are some flags you can set. Check the help message:

./ants-go -h
./ants-go -help

Customize spider

go to spiders
write your spider, following the example deap_loop_spider.go or the spider page
add your spider to spiderMap, following the example in LoadAllSpiders in load_all_spider.go
install again
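The registration step above follows a common pattern: each spider is stored in a map keyed by its name, so it can later be started by name through the crawl endpoint. The sketch below shows only that pattern. Every type and function in it (Spider, makeExampleSpider, loadAllSpiders) is a hypothetical stand-in, and the real types and signatures in the spiders package may well differ, so treat deap_loop_spider.go and load_all_spider.go as the authority.

```go
package main

import (
	"fmt"
	"sort"
)

// Spider is a hypothetical stand-in for the project's spider type.
type Spider struct {
	Name      string
	StartURLs []string
}

// makeExampleSpider plays the role of a file such as deap_loop_spider.go:
// it builds one configured spider. (Hypothetical constructor.)
func makeExampleSpider() *Spider {
	return &Spider{
		Name:      "example_spider",
		StartURLs: []string{"http://example.com/"},
	}
}

// loadAllSpiders plays the role of LoadAllSpiders: it registers every
// spider in a map keyed by spider name. (Hypothetical signature.)
func loadAllSpiders() map[string]*Spider {
	spiderMap := make(map[string]*Spider)
	s := makeExampleSpider()
	spiderMap[s.Name] = s
	return spiderMap
}

func main() {
	spiderMap := loadAllSpiders()
	// Sort the names so the listing is stable between runs.
	names := make([]string, 0, len(spiderMap))
	for name := range spiderMap {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, spiderMap[name].StartURLs)
	}
}
```

Because Go is compiled, a newly registered spider only becomes available after the binary is rebuilt, which is why the last step is to install again.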

