WebsiteCrawler is a C# console application that recursively crawls websites, extracts links, and displays the text content of web pages.
- Recursive Crawling: Crawl websites up to a specified depth.
- Link Extraction: Extract and format links from web pages.
- HTML to Text Conversion: Convert HTML content to plain text.
- Custom User-Agent: Mimic a real browser by setting custom headers.
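The feature list does not show how links and text are pulled out of a page. The following is a minimal sketch of the link-extraction and HTML-to-text steps, assuming a regex-based approach. `ExtractLinks` and `HtmlToText` are hypothetical names. The regex stripping is a naive stand-in for the conversion, not how WebsiteCrawler or the Textify package actually implements it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: collect absolute http(s) links from href attributes.
// Pure fragment links such as href="#top" are skipped by the [^"'#] class.
static List<Uri> ExtractLinks(string html, Uri baseUri)
{
    var links = new List<Uri>();
    foreach (Match m in Regex.Matches(html, "href\\s*=\\s*[\"']([^\"'#]+)", RegexOptions.IgnoreCase))
    {
        // Resolve relative hrefs (e.g. "/about") against the page's own URL,
        // and drop non-web schemes such as mailto: or javascript:.
        if (Uri.TryCreate(baseUri, m.Groups[1].Value, out var abs) &&
            (abs.Scheme == Uri.UriSchemeHttp || abs.Scheme == Uri.UriSchemeHttps))
        {
            links.Add(abs);
        }
    }
    return links.Distinct().ToList();
}

// Naive stand-in for HTML-to-text conversion (assumption, not the Textify API):
// remove script/style blocks, strip tags, decode entities, collapse whitespace.
static string HtmlToText(string html)
{
    string noScripts = Regex.Replace(html, @"<(script|style)[^>]*>.*?</\1>", " ",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    string noTags = Regex.Replace(noScripts, "<[^>]+>", " ");
    string decoded = WebUtility.HtmlDecode(noTags);
    return Regex.Replace(decoded, @"\s+", " ").Trim();
}

// Demo on a small inline page.
string page = "<html><head><style>p{}</style></head><body><p>Hello &amp; welcome</p>" +
              "<a href=\"/about\">About</a> <a href='https://example.org/x'>X</a> <a href=\"#top\">Top</a></body></html>";
var baseUri = new Uri("https://www.example.com/");

foreach (var link in ExtractLinks(page, baseUri))
    Console.WriteLine(link);
Console.WriteLine(HtmlToText(page));
```

Regular expressions are brittle on real-world HTML. A dedicated parser or conversion library is the sturdier choice once the crawler goes beyond simple pages.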
To run the crawler:
- Set the starting URL: in the `Program` class, change the `url` constant.
  const string url = "https://www.example.com";
- Set the maximum crawl depth: adjust the `maxDepth` constant.
  const int maxDepth = 1;
- Call the `WebsiteCrawler.Crawl` method:
  await WebsiteCrawler.Crawl(url, maxDepth);
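The README does not show the body of `WebsiteCrawler.Crawl`. The following sketches one way a depth-limited recursive crawl could work. It assumes that depth 0 is the starting page, so `maxDepth = 1` also visits the pages it links to, and that each URL is fetched at most once. `CrawlAsync`, `CreateClient`, and the injected `fetch` delegate are hypothetical names. Injecting the fetcher lets the demo run against an in-memory site instead of the network. `CreateClient` shows how a browser-like User-Agent can be set with `HttpClient`. The exact header string is an illustrative assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical: an HttpClient that presents a browser-like User-Agent.
static HttpClient CreateClient()
{
    var client = new HttpClient();
    client.DefaultRequestHeaders.UserAgent.ParseAdd(
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36");
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("text/html"));
    return client;
}

// Hypothetical depth-limited, depth-first crawl. `fetch` returns a page's HTML;
// `onPage` receives each visited URL, its depth and its HTML.
static async Task CrawlAsync(Uri start, int maxDepth, Func<Uri, Task<string>> fetch,
                             Action<Uri, int, string> onPage)
{
    var visited = new HashSet<Uri>();

    async Task Visit(Uri uri, int depth)
    {
        // Stop past the depth limit and never fetch the same URL twice.
        if (depth > maxDepth || !visited.Add(uri)) return;

        string html;
        try { html = await fetch(uri); }
        catch (HttpRequestException) { return; } // skip pages that fail to load

        onPage(uri, depth, html);

        // Follow every absolute http(s) link found in an href attribute.
        foreach (Match m in Regex.Matches(html, "href\\s*=\\s*[\"']([^\"'#]+)", RegexOptions.IgnoreCase))
        {
            if (Uri.TryCreate(uri, m.Groups[1].Value, out var next) &&
                (next.Scheme == Uri.UriSchemeHttp || next.Scheme == Uri.UriSchemeHttps))
            {
                await Visit(next, depth + 1);
            }
        }
    }

    await Visit(start, 0);
}

// Real use (requires network access):
// using var client = CreateClient();
// await CrawlAsync(new Uri("https://www.example.com"), 1, client.GetStringAsync,
//     (uri, depth, html) => Console.WriteLine($"[{depth}] {uri}"));

// Offline demo against an in-memory "site".
var site = new Dictionary<string, string>
{
    ["https://site.test/"] = "<a href=\"/a\">A</a> <a href=\"/b\">B</a>",
    ["https://site.test/a"] = "<a href=\"/c\">C</a> <a href=\"/\">Home</a>",
    ["https://site.test/b"] = "No links here",
    ["https://site.test/c"] = "Too deep for maxDepth = 1",
};
var crawled = new List<string>();
await CrawlAsync(new Uri("https://site.test/"), 1,
    uri => Task.FromResult(site[uri.ToString()]),
    (uri, depth, html) => crawled.Add($"{depth} {uri}"));
crawled.ForEach(Console.WriteLine);
```

With `maxDepth = 1`, the demo records the start page and its two direct links. `/c` sits at depth 2 and is never fetched, and the link back to `/` is skipped. Depth-first recursion keeps the sketch short. A queue-based breadth-first traversal is the usual alternative when pages should be visited level by level.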
The project requires:
- .NET SDK
- Textify NuGet package