Load data from Apify Website Content Crawler.
Apify is a web scraping and data extraction platform that provides an app store with more than a thousand ready-made cloud tools called Actors.
The Website Content Crawler Actor can deeply crawl websites, clean their HTML by removing cookie modals, footers, and navigation, and then transform the HTML into Markdown. This Markdown can then be stored in a vector database for semantic search or Retrieval-Augmented Generation (RAG).
Apify Website Content Crawler Node
- (Optional) Connect Text Splitter.
- Connect Apify API (create a new credential with your Apify API token).
- Input one or more URLs (separated by commas) where the crawler will start, e.g. https://docs.flowiseai.com/
- Select the crawler type. Refer to the Website Content Crawler documentation for more information.
- (Optional) Specify additional parameters such as maximum crawling depth and the maximum number of pages to crawl.
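For readers who want to see how these form values might translate into an Actor input, here is a minimal sketch. It is not the node's actual implementation. The helper `build_run_input` is hypothetical, and the field names `startUrls`, `crawlerType`, `maxCrawlDepth`, and `maxCrawlPages` are assumptions about the Actor's input schema that should be checked against the Website Content Crawler documentation.

```python
from typing import Any, Optional


def build_run_input(
    urls: str,
    crawler_type: str,
    max_crawl_depth: Optional[int] = None,
    max_crawl_pages: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: turn the node's form values into an Actor input dict."""
    # Split the comma-separated URL string, dropping whitespace and empty entries.
    start_urls = [{"url": u.strip()} for u in urls.split(",") if u.strip()]
    if not start_urls:
        raise ValueError("at least one start URL is required")

    # Field names below are assumed from the Actor's input schema.
    run_input: dict[str, Any] = {
        "startUrls": start_urls,
        "crawlerType": crawler_type,
    }
    # Optional limits are only included when the user set them.
    if max_crawl_depth is not None:
        run_input["maxCrawlDepth"] = max_crawl_depth
    if max_crawl_pages is not None:
        run_input["maxCrawlPages"] = max_crawl_pages
    return run_input


# "cheerio" is an assumed example value for the crawler type.
run_input = build_run_input("https://docs.flowiseai.com/", "cheerio", max_crawl_pages=10)
```

With the `apify-client` Python package, a dictionary like this would typically be passed as `run_input` to `ApifyClient(token).actor("apify/website-content-crawler").call(...)`, where `token` is your Apify API token.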
Loads website content as a Document.
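As a rough illustration of what "loading as a Document" could look like, the sketch below maps crawled items to simple document objects. The `Document` dataclass is a hypothetical stand-in for the framework's own type, and the item fields `url`, `text`, and `markdown` are assumptions about the crawler's dataset output rather than something this page confirms.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for a framework Document type."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def items_to_documents(items: list[dict[str, Any]]) -> list[Document]:
    """Convert crawled dataset items into Documents, skipping empty pages."""
    docs: list[Document] = []
    for item in items:
        # Prefer Markdown when present, fall back to plain text (assumed fields).
        content = item.get("markdown") or item.get("text") or ""
        if not content:
            continue
        # Keep the page URL so downstream retrieval can cite its source.
        docs.append(Document(page_content=content, metadata={"source": item.get("url", "")}))
    return docs
```

Storing the source URL in the metadata is a design choice that makes it possible to show where an answer came from when the documents are later used for semantic search or RAG.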