A web-crawler that creates a semantic graph of a website

Hawzen/Gopher-Crawler

gopher-crawler

What

A web-crawler that creates a semantic graph of a website

Demo video: Screen.Recording.2023-08-25.at.11.25.12.PM.mov

How

  • Use Go to crawl the target URL
    • Visit this for more info
  • Use Dgraph to store the data
    • Visit this for more info
  • Use Flask to analyze the data
    • Visit this for more info
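
To make the three stages above more concrete, here is a minimal Python sketch of the kind of record that could flow from the crawler, through the analyzer, into Dgraph. The field names (`url`, `title`, `summary`, `keywords`, `related_pages`, `is_crawled`) come from the query in the Run section. The `PageRecord` class, the `blank_node` and `to_dgraph_json` helpers, and the choice to store `is_crawled` as a boolean are assumptions made for illustration, not the project's actual code.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """Hypothetical in-memory form of one crawled and analyzed page."""
    url: str
    title: str = ""
    summary: str = ""
    keywords: list[str] = field(default_factory=list)
    related_urls: list[str] = field(default_factory=list)
    is_crawled: bool = False  # assumption: stored as a boolean predicate


def blank_node(url: str) -> str:
    """Derive a stable blank-node label so one URL maps to one node per mutation."""
    return "_:p" + hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]


def to_dgraph_json(pages: list[PageRecord]) -> dict:
    """Build a JSON mutation body in Dgraph's {"set": [...]} form."""
    nodes = []
    for page in pages:
        nodes.append({
            "uid": blank_node(page.url),
            "url": page.url,
            "title": page.title,
            "summary": page.summary,
            "keywords": page.keywords,
            "is_crawled": page.is_crawled,
            # Each related page becomes an edge to another node keyed by its URL.
            "related_pages": [
                {"uid": blank_node(u), "url": u} for u in page.related_urls
            ],
        })
    return {"set": nodes}
```

For example, `json.dumps(to_dgraph_json([PageRecord(url="https://example.com/", related_urls=["https://example.com/about"])]))` yields a body with one page node and one `related_pages` edge. Deriving the blank-node label from the URL means a page that appears both as a source and as a related page resolves to the same node within a single mutation.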

Run

Need

  • docker
  • go
  • python

Command

docker run --name dgraph -d -p "8181:8080" -p "9080:9080" -v dgraph-data:/dgraph dgraph/standalone:latest
docker run --name ratel  -d -p "8000:8000"  dgraph/ratel:latest
cd analyzer
pip install -r requirements.txt
python server.py & # Or open in a new terminal, without the & at the end
cd ../crawler
go run main.go <target_url>
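  • Browse http://localhost:8000/ (the Ratel UI started above) and connect it to Dgraph at http://localhost:8181, the host port mapped in the first command
  • Execute the following query to see the graph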
{
  Page(func: eq(is_crawled, "true")) {
    title
    summary
    keywords
    related_pages {
      url
    }
  }

  Domain(func: has(name)) {
    name
  }
}
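
The same kind of query can also be sent without Ratel. The sketch below is one possible way to do that from Python using only the standard library: it posts a query to Dgraph's HTTP `/query` endpoint on host port 8181 (as mapped in the `docker run` command above) and flattens the result into `(source, target)` edges. The function names are hypothetical, the extra `url` field requested at the `Page` level is an assumption about the schema, and the `application/dql` content type applies to recent Dgraph releases (older ones may expect `application/graphql+-` instead).

```python
import json
import urllib.request

# Host port 8181 is mapped to Dgraph's HTTP port 8080 by the docker command.
DGRAPH_QUERY_URL = "http://localhost:8181/query"

# Assumption: Page nodes carry a `url` predicate, as their related pages do.
PAGES_QUERY = """
{
  Page(func: eq(is_crawled, "true")) {
    url
    title
    related_pages {
      url
    }
  }
}
"""


def build_query_request(query: str, endpoint: str = DGRAPH_QUERY_URL) -> urllib.request.Request:
    """Prepare a POST request carrying a DQL query in the body."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={"Content-Type": "application/dql"},
        method="POST",
    )


def run_query(query: str, endpoint: str = DGRAPH_QUERY_URL) -> dict:
    """Send the query to a running Dgraph instance and decode the JSON reply."""
    with urllib.request.urlopen(build_query_request(query, endpoint), timeout=10) as resp:
        return json.load(resp)


def edges_from_response(response: dict) -> list[tuple[str, str]]:
    """Flatten the Page block of a response into (source_url, related_url) pairs."""
    edges = []
    for page in (response.get("data") or {}).get("Page", []):
        source = page.get("url") or page.get("title", "")
        for related in page.get("related_pages", []):
            edges.append((source, related["url"]))
    return edges
```

With the containers running and a crawl finished, `edges_from_response(run_query(PAGES_QUERY))` would return the page-to-page links as a plain list, which is convenient for feeding into another graph tool.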
