Wee is a powerful and flexible web crawling library based on go-rod, providing a clean interface for writing web crawlers in Go. It offers advanced browser control and customization options.
- Easy-to-use API for web crawling tasks
- Support for headless and headed browser modes
- User mode for connecting to system Chrome browser
- Cookie management and persistence
- Customizable timeouts and error handling
- Humanized and stealth mode options
- Popover handling
- Advanced browser configuration options
- Support for Chrome extensions
- Proxy configuration
go get github.com/coghost/wee
Here's a basic example of using wee to crawl search results from Baidu:
package main
import (
"fmt"
"github.com/coghost/wee"
)
func main() {
bot := wee.NewBotDefault()
defer bot.BlockInCleanUp()
const (
url = `https://www.baidu.com`
input = `input[id="kw"]`
keyword = `golang`
items = `div.result[id] h3>a`
noItems = `div.this_should_not_existed`
)
bot.MustOpen(url)
bot.MustInput(input, keyword, wee.WithSubmit(true))
got := bot.MustAnyElem([]string{items, noItems})
if got == noItems {
fmt.Printf("%s has no results\n", keyword)
return
}
elems, err := bot.Elems(items)
if err != nil {
fmt.Printf("get items failed: %v\n", err)
return
}
for _, elem := range elems {
fmt.Printf("%s - %s\n", elem.MustText(), *elem.MustAttribute("href"))
}
}
Wee provides several ways to create a bot:
// Default bot
bot := wee.NewBotDefault()
// Headless bot
bot := wee.NewBotHeadless()
// Bot for debugging
bot := wee.NewBotForDebug()
// User mode bot (connects to system Chrome)
bot := wee.NewBotUserMode()
You can customize bot behavior using options:
bot := wee.NewBot(
wee.UserAgent("Custom User Agent"),
wee.AcceptLanguage("en-US"),
wee.WithCookies(true),
wee.Humanized(true),
wee.StealthMode(true),
wee.WithPopovers("popover-selector"),
)
Wee supports various ways to manage cookies:
// Use cookies from default location
wee.WithCookies(true)
// Specify cookie folder
wee.WithCookieFolder("/path/to/cookies")
// Use specific cookie file
wee.WithCookieFile("/path/to/cookiefile.json")
// Use cookies from clipboard (Copy as cURL)
wee.CopyAsCURLCookies(clipboardContent)
You can customize error handling behavior:
bot.SetPanicWith(wee.PanicByLogError)
Wee provides powerful options to configure the browser:
launcher, browser := wee.NewBrowser(
wee.BrowserHeadless(false),
wee.BrowserSlowMotion(1000),
wee.BrowserUserDataDir("/path/to/user/data"),
wee.BrowserPaintRects(true),
wee.BrowserProxy("http://proxy.example.com:8080"),
)
launcher, browser := wee.NewUserMode(
wee.BrowserUserDataDir("/path/to/user/data"),
wee.LaunchLeakless(true),
)
launcher := wee.NewLauncher(
wee.BrowserHeadless(true),
wee.BrowserExtensions("/path/to/extension1", "/path/to/extension2"),
wee.BrowserFlags("--disable-gpu", "--no-sandbox"),
)
Wee supports various browser options:
BrowserHeadless(bool)
: Run browser in headless modeBrowserSlowMotion(int)
: Set slow motion delay in millisecondsBrowserUserDataDir(string)
: Set user data directoryBrowserPaintRects(bool)
: Show paint rectangles (useful for debugging)BrowserProxy(string)
: Set proxy serverBrowserExtensions(...string)
: Load Chrome extensionsBrowserFlags(...string)
: Set additional Chrome flagsLaunchLeakless(bool)
: Use leakless mode when launching browserBrowserIncognito(bool)
: Launch browser in incognito modeBrowserIgnoreCertErrors(bool)
: Ignore certificate errors
Check out the examples folder for more detailed examples.
The "context canceled" issue usually occurs when an element is bound with a timeout on bot.Elem(xxx)
. To resolve this, you can use:
elem.CancelTimeout().Timeout(xxx)
This will rewrite the previous timeout value.
Contributions are welcome! Please feel free to submit a Pull Request.