Web crawler to check a few SEO basics.
Use the collected data in your favorite spreadsheet software or retrieve it with your favorite language.
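As an illustration of retrieving the data from a language rather than a spreadsheet, here is a minimal Python sketch. It assumes the export is a comma-separated file with a header row; the function name `load_crawl_rows` and the path in the usage comment are hypothetical, and no particular column names are assumed.

```python
import csv
from pathlib import Path


def load_crawl_rows(csv_path):
    """Read a crawl export into a list of dicts keyed by the CSV header row.

    Assumption: the file is comma-separated and UTF-8 encoded.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Usage (hypothetical path, adjust to where your crawl wrote its data):
# rows = load_crawl_rows("data.csv")
# print(rows[0].keys())  # lists the columns actually present in your export
```

Printing the keys of the first row is a quick way to discover which columns your crawl produced before filtering or aggregating them.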
French documentation available: https://piedweb.com/seo/crawler
Via Packagist
$ composer create-project piedweb/seo-pocket-crawler
$ bin/console crawler:go $start
start Define where the crawl starts. E.g.: https://piedweb.com
You can specify an id from a previous crawl. Other options will then be ignored.
You can use `last` to continue the last crawl (just stopped).
-l, --limit=LIMIT Define the depth limit [default: 5]
-i, --ignore=IGNORE Virtual Robots.txt to respect (can be a string or a URL).
-u, --user-agent=USER-AGENT Define the user-agent used during the crawl. [default: "SEO Pocket Crawler - PiedWeb.com/seo/crawler"]
-w, --wait=WAIT Time to wait between two requests, in microseconds (100000 = 0.1 s). [default: 100000]
-c, --cache-method=CACHE-METHOD Define the cache method to use during the crawl. [default: 2]
-r, --restart=RESTART Restart a previous crawl. Values: 1 = fresh restart, 2 = restart from cache
-h, --help Display this help message
-q, --quiet Do not output any message
-V, --version Display this application version
--ansi Force ANSI output
--no-ansi Disable ANSI output
-n, --no-interaction Do not ask any interactive question
-v|vv|vvv, --verbose Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
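Because `--wait` is expressed in microseconds, it is easy to get the value wrong by a factor of a thousand. The following Python sketch assembles a `crawler:go` command line from the options documented above and converts seconds to microseconds. The helper `build_crawl_command` is hypothetical (it is not part of the crawler), and it only uses flags listed in the help text.

```python
import shlex


def build_crawl_command(start, limit=5, wait_seconds=0.1, ignore=None, user_agent=None):
    """Return the argument list for `bin/console crawler:go` (hypothetical helper)."""
    # --wait expects microseconds: 0.1 s -> 100000
    wait_us = int(round(wait_seconds * 1_000_000))
    command = [
        "bin/console",
        "crawler:go",
        start,
        f"--limit={limit}",
        f"--wait={wait_us}",
    ]
    if ignore is not None:
        command.append(f"--ignore={ignore}")
    if user_agent is not None:
        command.append(f"--user-agent={user_agent}")
    return command


# Prints: bin/console crawler:go https://piedweb.com --limit=3 --wait=500000
print(shlex.join(build_crawl_command("https://piedweb.com", limit=3, wait_seconds=0.5)))
```

The list form can be passed directly to `subprocess.run` from the project directory, which avoids shell quoting issues when the user-agent contains spaces.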
$ bin/console crawler:external $id [--host]
--id
id from a previous crawl
You can use `last` to show external links from the last crawl.
--host -ho
flag to get only the hosts
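To illustrate the difference between listing full external links and listing only their hosts, here is a small Python sketch. It is not the crawler's implementation: the function `external_hosts` is hypothetical, and it assumes a link is "external" when its hostname differs from the crawled site's hostname.

```python
from urllib.parse import urlsplit


def external_hosts(links, crawled_host):
    """Return the sorted, de-duplicated hosts of links pointing outside crawled_host."""
    hosts = set()
    for link in links:
        host = urlsplit(link).hostname  # None for relative links or mailto:
        if host and host != crawled_host:
            hosts.add(host)
    return sorted(hosts)


links = [
    "https://example.org/a",
    "http://example.org/b",
    "https://piedweb.com/seo/crawler",
    "/relative/path",
]
print(external_hosts(links, "piedweb.com"))  # ['example.org']
```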
Updates the previously generated `data.csv`. You can then explore your website with the PoC `pagerank.html` (served over HTTP, for example with `npx http-server -c-1 --port 3000`).
$ bin/console crawler:pagerank $id
--id
id from a previous crawl
You can use `last` to calculate the page rank of the last crawl.
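For readers unfamiliar with the idea behind an internal page rank, the sketch below shows the classic power-iteration formulation in Python. It is a generic illustration, not the calculation performed by `crawler:pagerank`: the damping factor of 0.85, the iteration count, and the handling of pages without outgoing links are all assumptions, and redirects, canonicals and nofollow are ignored.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute a simple PageRank over `links`, a dict mapping page -> list of targets."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    if n == 0:
        return {}

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1 - damping) / n + damping * dangling / n for page in pages}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Tiny site: the home page links to two pages, which both link back.
site = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/"]}
for page, score in sorted(pagerank(site).items(), key=lambda item: -item[1]):
    print(f"{page}\t{score:.3f}")
```

In this toy graph the home page receives links from every other page and therefore ends up with the highest score, while `/a` and `/b` share the same, lower score.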
$ composer test
- Better Links Harvesting and Recording (record context (list, nav, sentence...))
- Transform the PoC (Page Rank Visualizer)
- Complex Page Rank Calculator (with 301, canonical, nofollow, etc.)
Please see contributing
The MIT License (MIT). Please see License File for more information.