AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.


📖 Overview

AnyCrawl is a high‑performance crawling and scraping toolkit:

  • SERP crawling: multiple search engines, batch‑friendly
  • Web scraping: single‑page content extraction
  • Site crawling: full‑site traversal and collection
  • High performance: multi‑threading / multi‑process
  • Batch tasks: reliable and efficient
  • AI extraction: LLM‑powered structured data (JSON) extraction from pages

LLM‑friendly. Easy to integrate and use.

🚀 Quick Start

📖 See full docs: Docs

📚 Usage Examples

💡 Use the Playground to test APIs and generate code in your preferred language.

If self‑hosting, replace https://api.anycrawl.dev with your own server URL.

Web Scraping (Scrape)

Example

```bash
curl -X POST https://api.anycrawl.dev/v1/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "url": "https://example.com",
  "engine": "cheerio"
}'
```

Parameters

| Parameter | Type | Description | Default |
| --------- | ---- | ----------- | ------- |
| url | string (required) | The URL to be scraped. Must be a valid URL starting with http:// or https:// | - |
| engine | string | Scraping engine to use. Options: cheerio (static HTML parsing, fastest), playwright (JavaScript rendering with modern engine), puppeteer (JavaScript rendering with Chrome) | cheerio |
| proxy | string | Proxy URL for the request. Supports HTTP and SOCKS proxies. Format: http://[username]:[password]@proxy:port | (none) |

More parameters: see Request Parameters.
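From Node.js, the same request can be made with the built‑in fetch API (Node 18+, run as an ES module). This is a minimal sketch against the documented scrape endpoint; reading the key from an ANYCRAWL_API_KEY environment variable is an assumption for illustration.

```typescript
// Minimal scrape request using Node 18+ global fetch (no dependencies).
// Assumes the API key is exported as ANYCRAWL_API_KEY.
const response = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`,
    },
    body: JSON.stringify({
        url: "https://example.com",
        engine: "cheerio", // or "playwright" / "puppeteer" for JS rendering
    }),
});

if (!response.ok) {
    throw new Error(`Scrape failed: ${response.status} ${await response.text()}`);
}
console.log(await response.json()); // inspect the returned page data
```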

LLM Extraction

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer YOUR_ANYCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      }
    }
  }'
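The schema in json_options is standard JSON Schema, so the extracted object can be mirrored as a TypeScript type. A sketch only; where the extracted object sits in the response envelope is not documented here, so log the raw body and adjust.

```typescript
// TypeScript shape implied by the JSON Schema in the request above.
interface CompanyInfo {
    company_mission: string; // "required" in the schema
    is_open_source?: boolean;
    employee_count?: number;
}

const response = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`,
    },
    body: JSON.stringify({
        url: "https://example.com",
        json_options: {
            schema: {
                type: "object",
                properties: {
                    company_mission: { type: "string" },
                    is_open_source: { type: "boolean" },
                    employee_count: { type: "number" },
                },
                required: ["company_mission"],
            },
        },
    }),
});

// Where the CompanyInfo payload lives in the body is an assumption to
// verify against the Request Parameters docs; print it first.
console.log(await response.json());
```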

Site Crawling (Crawl)

Example

```bash
curl -X POST https://api.anycrawl.dev/v1/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "url": "https://example.com",
  "engine": "playwright",
  "max_depth": 2,
  "limit": 10,
  "strategy": "same-domain"
}'
```

Parameters

| Parameter | Type | Description | Default |
| --------- | ---- | ----------- | ------- |
| url | string (required) | Starting URL to crawl | - |
| engine | string | Crawling engine. Options: cheerio, playwright, puppeteer | cheerio |
| max_depth | number | Max depth from the start URL | 10 |
| limit | number | Max number of pages to crawl | 100 |
| strategy | enum | Scope: all, same-domain, same-hostname, same-origin | same-domain |
| include_paths | array | Only crawl paths matching these patterns | (none) |
| exclude_paths | array | Skip paths matching these patterns | (none) |
| scrape_options | object | Per-page scrape options (formats, timeout, JSON extraction, etc.), same as Scrape options | (none) |

More parameters and endpoints: see Request Parameters.
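The equivalent crawl request in TypeScript, again a Node 18+ sketch. Note that a site crawl may hand back a job to poll rather than the pages inline, so treat the logged response as a starting point and consult the endpoint docs.

```typescript
// Start a bounded same-domain crawl; mirrors the curl example above.
const response = await fetch("https://api.anycrawl.dev/v1/crawl", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`,
    },
    body: JSON.stringify({
        url: "https://example.com",
        engine: "playwright",    // JavaScript rendering for dynamic pages
        max_depth: 2,            // follow links at most 2 hops from the start URL
        limit: 10,               // stop after 10 pages
        strategy: "same-domain", // stay on example.com
    }),
});
console.log(await response.json());
```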

Search Engine Results (SERP)

Example

```bash
curl -X POST https://api.anycrawl.dev/v1/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "query": "AnyCrawl",
  "limit": 10,
  "engine": "google",
  "lang": "all"
}'
```

Parameters

| Parameter | Type | Description | Default |
| --------- | ---- | ----------- | ------- |
| query | string (required) | Search query to be executed | - |
| engine | string | Search engine to use. Options: google | google |
| pages | integer | Number of search result pages to retrieve | 1 |
| lang | string | Language code for search results (e.g., 'en', 'zh', 'all') | en-US |

Supported search engines

  • Google
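A matching search call in TypeScript (Node 18+), using only the parameters documented above:

```typescript
// SERP request; "pages" and "lang" follow the parameter table above.
const response = await fetch("https://api.anycrawl.dev/v1/search", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`,
    },
    body: JSON.stringify({
        query: "AnyCrawl",
        engine: "google", // the only documented option at the moment
        pages: 1,         // number of result pages to retrieve
        lang: "all",
    }),
});
console.log(await response.json()); // structured SERP results
```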

❓ FAQ

  1. Can I use proxies? Yes. AnyCrawl ships with a high‑quality default proxy, and you can also configure your own: set the proxy request parameter (per request) or ANYCRAWL_PROXY_URL (self‑hosting). See the sketch after this list.
  2. How do I handle JavaScript‑rendered pages? Use the Playwright or Puppeteer engines.
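The per‑request proxy from FAQ item 1, sketched in TypeScript. The proxy URL is a placeholder; the format follows the proxy parameter documented for the scrape endpoint.

```typescript
// Scrape through a caller-supplied proxy (placeholder host and credentials).
const response = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`,
    },
    body: JSON.stringify({
        url: "https://example.com",
        // Format: http://[username]:[password]@proxy:port (HTTP and SOCKS supported)
        proxy: "http://user:pass@proxy.example.com:8080",
    }),
});
console.log(await response.json());
```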

🤝 Contributing

We welcome contributions! See the Contributing Guide.

📄 License

MIT License — see LICENSE.

🎯 Mission

We build simple, reliable, and scalable tools for the AI ecosystem.


Built with ❤️ by the Any4AI team
