
go-crawler

A minimalistic, concurrent web crawler written in Go.

Features

  • Concurrent Processing: Configurable number of worker goroutines
  • Graceful Shutdown: Proper cleanup and signal handling
  • Retry Logic: Exponential backoff with configurable retry attempts (see the combined sketch after this list)
  • Configuration Management: Environment variables and command-line flags
  • Error Handling: Detailed error logging
  • Memory Management: Efficient memory usage with proper cleanup
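
The snippet below is a minimal sketch of how the concurrency, graceful-shutdown, and retry features above could fit together in Go. It is illustrative only: the identifiers (fetchWithRetry, maxConcurrent, and so on) and the exact backoff policy are assumptions, not this project's actual code.

package main

import (
	"context"
	"fmt"
	"math"
	"net/http"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// fetchWithRetry retries a GET with exponential backoff:
// baseDelay, 2*baseDelay, 4*baseDelay, ...
func fetchWithRetry(ctx context.Context, client *http.Client, url string, attempts int, baseDelay time.Duration) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		resp, err := client.Do(req)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}
		if err == nil {
			resp.Body.Close()
			err = fmt.Errorf("server error: %s", resp.Status)
		}
		lastErr = err
		// Back off exponentially before the next attempt, unless shutdown was requested.
		select {
		case <-time.After(baseDelay * time.Duration(math.Pow(2, float64(i)))):
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	return nil, lastErr
}

func main() {
	// Cancel the context on SIGINT/SIGTERM so in-flight work can wind down cleanly.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	urls := make(chan string, 1)
	urls <- "https://go.dev/learn/" // seed URL; a real crawler keeps enqueueing discovered links
	close(urls)

	client := &http.Client{Timeout: 30 * time.Second}
	const maxConcurrent = 10 // mirrors --max-concurrent

	var wg sync.WaitGroup
	for i := 0; i < maxConcurrent; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range urls {
				resp, err := fetchWithRetry(ctx, client, url, 3, time.Second)
				if err != nil {
					fmt.Println("fetch failed:", url, err)
					continue
				}
				resp.Body.Close() // a real worker would parse the body and save it to the output dir
			}
		}()
	}
	wg.Wait()
}

Graceful shutdown here means that once a signal arrives, workers finish or abandon their current fetch and wg.Wait returns before the process exits.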

Usage

go run cmd/crawler/main.go \
  --max-count 200 \
  --max-concurrent 20 \
  --url "https://go.dev/learn/" \
  --timeout 60s \
  --output-dir "./tmp"

./crawler --max-count=1000 --url "https://pikabu.ru/" --output-dir "./.tmp"
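
The first command runs the crawler straight from source; the second assumes a prebuilt crawler binary. With the entry point at cmd/crawler/main.go, that binary could be produced with something like the following (the output name is an assumption):

go build -o crawler ./cmd/crawler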

Options

Flag              Environment Variable     Default   Description
--max-count       CRAWLER_MAX_COUNT        100       Maximum pages to crawl
--max-concurrent  CRAWLER_MAX_CONCURRENT   10        Maximum concurrent workers
--url             CRAWLER_URL              ""        Starting URL
--timeout         CRAWLER_TIMEOUT          30s       HTTP request timeout
--retry-attempts  CRAWLER_RETRY_ATTEMPTS   3         Number of retry attempts
--retry-delay     CRAWLER_RETRY_DELAY      1s        Delay between retries
--output-dir      CRAWLER_OUTPUT_DIR       ./.tmp/   Output directory
--log-level       CRAWLER_LOG_LEVEL        info      Log level
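
Every flag has an environment-variable counterpart, so the same run can be configured as, for example, CRAWLER_MAX_COUNT=500 CRAWLER_URL="https://go.dev/learn/" ./crawler. How this project wires flags to environment variables (and which one wins when both are set) is not shown here; a minimal sketch of one common pattern, where the environment variable supplies the default and the flag can override it, looks like this:

package main

import (
	"flag"
	"fmt"
	"os"
)

// envOrDefault returns the environment variable's value if set, otherwise the
// fallback. Hypothetical helper, not part of this project's API.
func envOrDefault(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

func main() {
	// Flags default to the CRAWLER_* value when present, so a command-line
	// value overrides it; the real crawler's precedence may differ.
	url := flag.String("url", envOrDefault("CRAWLER_URL", ""), "Starting URL")
	outputDir := flag.String("output-dir", envOrDefault("CRAWLER_OUTPUT_DIR", "./.tmp/"), "Output directory")
	flag.Parse()

	fmt.Println("url:", *url, "output-dir:", *outputDir)
}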

Future Enhancements

  • Distributed crawling support
  • Advanced filtering and crawling rules (by size, file format)
  • Metrics and monitoring (comprehensive statistics and performance tracking)
  • Handle redirects (one possible approach is sketched below)
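
Redirect handling is on the wish list above, so nothing here exists in the crawler yet. One possible approach in Go is the http.Client CheckRedirect hook, which is called before each redirect is followed and can cap or log the chain; the snippet below is a sketch of that idea:

package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
)

func main() {
	client := &http.Client{
		// CheckRedirect runs before each redirect is followed; returning an
		// error stops the chain. Here we allow at most 5 hops.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 5 {
				return errors.New("too many redirects")
			}
			log.Printf("redirect %d: %s -> %s", len(via), via[len(via)-1].URL, req.URL)
			return nil
		},
	}

	resp, err := client.Get("https://go.dev/learn/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("final URL:", resp.Request.URL, "status:", resp.Status)
}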

