Alive is a simple URL verifier that updates triples based on URL availability.
The BCube triple store contains triples describing datasets and web services; this tool updates the status of their URLs. If the URL for a subject ?subject is alive, the tool sends an INSERT request to the BCube triple store as follows:
INSERT
{
  ?subject prov:atTime "$TIME"^^xsd:date .
  ?subject http:statusCodeValue "$HTTP_RESPONSE"^^xsd:integer .
}
WHERE
{
  ?subject vcard:hasURL "$URL" .
}
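As a rough illustration, the INSERT statement above could be filled in from Python with the standard library's string.Template. This is only a sketch: the helper name build_insert is hypothetical, the prefixes (prov, http, vcard, xsd) are assumed to be declared elsewhere, and the way alive.py actually builds its request may differ.

```python
from datetime import date
from string import Template

# Template mirroring the INSERT statement in this README.
# Assumption: the prov/http/vcard/xsd prefixes are declared by the caller.
INSERT_TEMPLATE = Template(
    'INSERT\n'
    '{\n'
    '  ?subject prov:atTime "$TIME"^^xsd:date .\n'
    '  ?subject http:statusCodeValue "$HTTP_RESPONSE"^^xsd:integer .\n'
    '}\n'
    'WHERE\n'
    '{\n'
    '  ?subject vcard:hasURL "$URL" .\n'
    '}'
)


def build_insert(url, status_code, checked_on=None):
    """Return the SPARQL update text for one checked URL (hypothetical helper)."""
    checked_on = checked_on or date.today()
    # Escape backslashes and double quotes so the URL stays a valid literal.
    safe_url = url.replace("\\", "\\\\").replace('"', '\\"')
    return INSERT_TEMPLATE.substitute(
        TIME=checked_on.isoformat(),
        HTTP_RESPONSE=int(status_code),
        URL=safe_url,
    )


if __name__ == "__main__":
    print(build_insert("http://example.org/data", 200))
```

Keeping the query text in one template makes it easy to compare against the statement documented here when either one changes.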
This way we can create semantic queries like
SELECT *
WHERE {
?subject vcard:hasURL ?base_url .
?subject http:statusCodeValue ?code .
?subject prov:atTime ?lastTimeChecked .
FILTER regex(?base_url, "NASA", "i")
FILTER (?code = 200)
}
which returns every subject whose URL contains the word NASA (case-insensitive) and whose last recorded response was HTTP 200 OK.
pip install -r requirements.txt
Using a virtual environment is highly recommended.
Running the tests
$PATH/TO/ALIVE/nosetests
python $PATH/TO/alive.py -s [http://BCUBE-ENDPOINT] -w [thread-number] -t [timeout for a URL] -v [verbose]
example:
python app/alive -a http://rest-endpoint/graph/dev -w 8 -t 2
This queries the http://rest-endpoint/graph/dev endpoint (graph dev) for its URLs and uses up to 8 workers to request HTTP responses from those URLs. Each request has a maximum timeout of 2 seconds.
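The worker and timeout behaviour described above can be sketched with the standard library alone. This is not the code in alive.py: the function names fetch_status and check_urls are hypothetical, and recording a timed-out or unreachable URL as 500 is an assumption inferred from this README's remarks about timeouts.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def fetch_status(url, timeout):
    """Return the HTTP status code for url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # Assumption: timeouts, connection failures and malformed URLs
        # are all recorded as 500.
        return 500


def check_urls(urls, workers=8, timeout=2, fetch=fetch_status):
    """Check urls concurrently and return a {url: status_code} dict."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its code.
        codes = pool.map(lambda u: fetch(u, timeout), urls)
        return dict(zip(urls, codes))
```

The fetch parameter exists only so the pool logic can be exercised without network access, for example by passing a stub that returns fixed status codes.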
- NOTE: The timeout parameter can affect how many HTTP 200 responses we get. If it is set too low, we'll get a lot of 500s due to slow remote servers and/or poor local machine performance. 2 seconds worked fine on a 2-core laptop with low bandwidth.
- Create a profiler script
- Dockerize the app