hollingsworthd/ScreenSlicer: Automatic, zero-config web scraping. Written in Java, with no dependency on Java EE or app servers; the web scraper has a RESTful/JSON API. Currently unmaintained.

ScreenSlicer will be re-worked to be a library easily incorporated into other applications. No maintenance will be done to the codebase in its current form.

Licensed under the Apache License v2.0 (details).

Download | Getting Started | API Docs | Build Guide | Also see: jBrowserDriver

Overview

ScreenSlicer is a web scraper that requires no configuration. It automatically queries search engines and extracts the results, optionally including the HTML at each result's URL. Using neural nets and tuned heuristics, ScreenSlicer finds a search box, enters a query, extracts the results, and pages forward through them. In addition to keyword searches, structured form queries are supported. AJAX sites work just as well as static HTML ones, and sites that require authentication (username/password) are also supported.

Clustering is built in. Each request accepts the IP addresses of ScreenSlicer servers (see the API Docs), and requests are balanced in a queue, with work diverted away from busy servers. Messages between servers are encrypted without needing SSL: just share an identical copy of the screenslicer.config file with each installation (this file is auto-generated at the root of the installation directory on first launch).
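To make "diverted away from busy servers" concrete, here is a minimal Java sketch of least-busy selection over a list of server IPs. It is an illustration only, not ScreenSlicer's actual balancing code: the class name `LeastBusySketch`, its methods, and the placeholder addresses are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tracks in-flight requests per server and always
// hands out the server with the fewest. Not taken from ScreenSlicer.
final class LeastBusySketch {
    // Insertion-ordered so that ties go to the server listed first.
    private final Map<String, Integer> inFlight = new LinkedHashMap<>();

    LeastBusySketch(List<String> serverIps) {
        for (String ip : serverIps) {
            inFlight.put(ip, 0);
        }
    }

    // Picks the least-busy server and counts one more request against it.
    synchronized String acquire() {
        String best = null;
        for (Map.Entry<String, Integer> e : inFlight.entrySet()) {
            if (best == null || e.getValue() < inFlight.get(best)) {
                best = e.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no servers configured");
        }
        inFlight.merge(best, 1, Integer::sum);
        return best;
    }

    // Marks one request on the given server as finished.
    synchronized void release(String ip) {
        inFlight.computeIfPresent(ip, (k, v) -> Math.max(0, v - 1));
    }

    public static void main(String[] args) {
        // Placeholder addresses from the documentation range 192.0.2.0/24.
        LeastBusySketch pool = new LeastBusySketch(List.of("192.0.2.10", "192.0.2.11"));
        String first = pool.acquire();
        String second = pool.acquire();
        System.out.println(first + " then " + second);
        pool.release(first);
    }
}
```

A caller would wrap each scrape request in `acquire()` and `release()` so that a server still working on earlier requests is passed over in favor of an idle one.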

Proxying is supported, and a proxy server can be specified on each request. SOCKS 5, SOCKS 4, HTTP, and SSL proxies can be used. The default proxy is a SOCKS 5 server running at 127.0.0.1:9050 (the standard connection for tor-socks, which the installation instructions have you install). Proxies that require a username and password are also supported.
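For readers who want to test that the default SOCKS endpoint is reachable from their own Java code, the sketch below builds the equivalent `java.net.Proxy` with the standard library. It does not use ScreenSlicer's API (see the API Docs for how a proxy is passed on a request), and the class name `DefaultProxySketch` is an assumption made for this example.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper: describes the same endpoint that the README names as
// ScreenSlicer's default proxy, using only java.net classes.
final class DefaultProxySketch {
    static final String HOST = "127.0.0.1";
    static final int PORT = 9050; // conventional Tor SOCKS port

    static Proxy defaultSocksProxy() {
        // createUnresolved avoids a DNS lookup when the proxy object is built.
        // Proxy.Type.SOCKS covers both SOCKS 4 and SOCKS 5 in the JDK.
        return new Proxy(Proxy.Type.SOCKS, InetSocketAddress.createUnresolved(HOST, PORT));
    }

    public static void main(String[] args) {
        Proxy proxy = defaultSocksProxy();
        System.out.println(proxy.type() + " proxy at " + proxy.address());
    }
}
```

The returned object can be handed to `URL.openConnection(Proxy)` or `new Socket(Proxy)` to route a connection through the local Tor SOCKS listener, assuming one is running.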

