Configuration

Scott Behrens edited this page Jun 16, 2015 · 11 revisions

The following configuration files are located in the root of the Sketchy repository.

config-default.py is the default configuration used by Sketchy.

config-test.py is the configuration file used for the nose tests (located in tests.py).

Default Configuration File

import os

_basedir = os.path.abspath(os.path.dirname(__file__))

DEBUG = True

# Database setup
SQLALCHEMY_DATABASE_URI = 'sqlite:////tmp/sketchy-db.db'

# Set scheme and hostname:port of your server.
# Alternatively, you can export the 'host' variable on your system to set the
# host and port.
# If you are using Nginx with SSL, change the scheme to https.
BASE_URL = 'http://%s' % os.getenv('host', '127.0.0.1:8000')

# Broker configuration information, currently only supporting Redis
CELERY_BROKER_URL = 'redis://localhost:6379'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'

# Local Screenshot storage
LOCAL_STORAGE_FOLDER = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'files')

# Maximum time to wait for PhantomJS to generate a screenshot
PHANTOMJS_TIMEOUT = 35

# Maximum number of Celery Job retries on failure
MAX_RETRIES = 1

# Seconds to sleep before retrying the task
COOLDOWN = 5

# Path to PhantomJS
PHANTOMJS = '/usr/local/bin/phantomjs'

# S3 Specific configurations
# This will store your sketches, scrapes, and html in an S3 bucket
USE_S3 = os.getenv('use_s3', 'False').lower() == 'true'
S3_BUCKET_PREFIX = os.getenv('bucket_prefix', '')
S3_LINK_EXPIRATION = 6000000
S3_BUCKET_REGION_NAME = os.getenv('bucket_region_name', 'us-east-1')

# Token Auth Setup
REQUIRE_AUTH = False
AUTH_TOKEN = os.getenv('auth_token', 'test')

# Log file configuration (currently only logs errors)
SKETCHY_LOG_FILE = "sketchy.log"

# Perform SSL host validation (set to False if you want to scrape/screenshot sketchy websites)
SSL_HOST_VALIDATION = False

# Enable this option to screenshot webpages that generate 4xx or 5xx HTTP error codes
CAPTURE_ERRORS = True
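
To make the environment override used by BASE_URL in the listing above concrete, here is a minimal sketch of how the 'host' lookup resolves. The build_base_url helper, its scheme parameter, and the example hostname are illustrative assumptions, not part of Sketchy.

```python
import os


def build_base_url(environ=None, scheme="http"):
    """Mirror the BASE_URL line: use the 'host' variable, else 127.0.0.1:8000."""
    environ = os.environ if environ is None else environ
    return "%s://%s" % (scheme, environ.get("host", "127.0.0.1:8000"))


# No 'host' exported: falls back to the default host and port.
print(build_base_url({}))
# 'host' exported and Nginx terminating SSL: switch the scheme to https.
print(build_base_url({"host": "sketchy.example.com:443"}, scheme="https"))
```

Passing the environment as a dictionary keeps the sketch testable without touching the real process environment.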

Database Setup

Sketchy uses SQLAlchemy for object-relational mapping. The database URI can be supplied through the sketchy_db environment variable, which is read by the following directive in config-default.py:

# Database setup
SQLALCHEMY_DATABASE_URI = os.getenv('sketchy_db', 'sqlite:////tmp/sketchy-db.db')

The config-default.py file is currently set to use SQLite as the DBMS. I suggest you use a more robust database management system, such as MySQL.

An example MySQL string for Amazon RDS may look like the following:

# Database setup
SQLALCHEMY_DATABASE_URI = 'mysql://sketchydb:[email protected]:3306/sketchy'

If you change your database URI, you will need to recreate the DB tables by running:

python manage.py create_db
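
Database passwords often contain characters such as '@' or '/' that would break a URI if pasted in directly. The following is a minimal sketch of assembling a MySQL URI with percent-encoded credentials. The mysql_uri helper and every value passed to it are placeholders made up for illustration.

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, database, port=3306):
    """Assemble a SQLAlchemy-style MySQL URI, percent-encoding the credentials."""
    return "mysql://%s:%s@%s:%d/%s" % (
        quote_plus(user), quote_plus(password), host, port, database)


# Placeholder values only; substitute your own endpoint and credentials.
uri = mysql_uri("sketchydb", "p@ss/word", "db.example.com", "sketchy")
print(uri)
```

The resulting string can be exported as the sketchy_db environment variable before running the create_db command.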

Broker Configuration

Celery needs a broker to be configured. You need to specify the broker URL as well as the result backend (which stores the results of tasks run by Celery). The following example uses Redis as both broker and result backend:

# Broker configuration information, currently only supporting Redis
CELERY_BROKER_URL='redis://localhost:6379'
CELERY_RESULT_BACKEND='redis://localhost:6379'

Alternatively, you can use another broker such as ActiveMQ.
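
A mistyped broker URL tends to surface only when a worker fails to connect. As a rough sketch, the URL can be checked up front with the standard library. The describe_redis_url helper is hypothetical, and treating a missing path as database 0 is an assumption based on the usual Redis default.

```python
from urllib.parse import urlparse


def describe_redis_url(url):
    """Split a redis:// URL into host, port, and database number."""
    parts = urlparse(url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL, got %r" % url)
    db = int(parts.path.lstrip("/") or 0)  # no path: assume database 0
    return parts.hostname, parts.port or 6379, db


print(describe_redis_url("redis://localhost:6379"))
print(describe_redis_url("redis://localhost:6379/1"))
```

A trailing database number, as in the second call, is one way to keep broker messages and task results in separate Redis databases.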

PhantomJS Configuration

You can configure the maximum time, in seconds, to wait for PhantomJS to generate a screenshot using the PHANTOMJS_TIMEOUT setting. An example configuration for PhantomJS is below:

# Maximum time to wait for PhantomJS to generate a screenshot
PHANTOMJS_TIMEOUT = 35
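
To illustrate what a timeout like this does in practice, here is a minimal sketch that runs a child process and gives up once the limit passes. It is not Sketchy's actual PhantomJS invocation: the run_with_timeout helper is hypothetical, and a sleeping Python process stands in for a slow PhantomJS run.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd; return True if it finished in time, False if it was killed."""
    try:
        subprocess.run(cmd, timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False


# Stand-in for a slow PhantomJS run: a child process that sleeps for 5 seconds.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, timeout=0.5))
```

A larger timeout gives slow pages more time to render, at the cost of tying up a worker for longer on pages that never finish loading.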

Celery Configuration

You can configure the number of times Celery retries a failed task using the MAX_RETRIES setting. You can also specify how many seconds to wait between retries with the COOLDOWN setting. Some example configuration options for Celery are below:

# Maximum number of Celery task retries on failure
MAX_RETRIES = 1
# Seconds to sleep before retrying the task
COOLDOWN = 5
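
The interaction between the two settings can be sketched in plain Python. This is a simplified illustration, not Celery's implementation and not Sketchy's task code: run_with_retries and flaky are hypothetical names, and the sketch assumes MAX_RETRIES counts retries after the first attempt, so MAX_RETRIES = 1 allows two attempts in total.

```python
import time


def run_with_retries(task, max_retries=1, cooldown=5, sleep=time.sleep):
    """Call task(); on failure wait `cooldown` seconds, retrying up to max_retries times."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > max_retries:
                raise  # retries exhausted: surface the last error
            sleep(cooldown)


calls = []


def flaky():
    """Fail on the first call, succeed on the second."""
    calls.append(1)
    if len(calls) < 2:
        raise RuntimeError("capture failed")
    return "ok"


# The sleep function is replaced with a no-op so the example runs instantly.
print(run_with_retries(flaky, max_retries=1, cooldown=5, sleep=lambda s: None))
```

With these values, a task that keeps failing is reported as failed about COOLDOWN seconds after its first attempt, plus the time the attempts themselves take.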

S3 Configuration

S3 can be configured to store captures. You need to specify a bucket, the region your S3 storage is set up in, and an expiration time for the links that are generated. These can optionally be set as environment variables. The following is an example configuration for S3:

USE_S3 = os.getenv('use_s3', 'True').lower() == 'true'
S3_BUCKET_PREFIX = os.getenv('bucket_prefix', 'mytestbucket.foobar.net')
S3_LINK_EXPIRATION = 6000000
S3_BUCKET_REGION_NAME = os.getenv('bucket_region_name', 'us-east-1')

Note: Celery needs to have AWS session keys exported for this to work. This should be set up automatically if deploying on an AWS instance.
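
The USE_S3 line only treats the string 'true' (in any letter case) as enabled, which is easy to get wrong when exporting the variable. The sketch below reproduces that parsing in a hypothetical env_flag helper and converts the expiration value into a readable duration, assuming S3_LINK_EXPIRATION is expressed in seconds (the source does not state the unit).

```python
import os
from datetime import timedelta


def env_flag(name, default="False", environ=None):
    """Return True only when the variable equals 'true' (case-insensitive)."""
    environ = os.environ if environ is None else environ
    return environ.get(name, default).lower() == "true"


print(env_flag("use_s3", environ={"use_s3": "TRUE"}))
print(env_flag("use_s3", environ={"use_s3": "1"}))  # '1' and 'yes' do not count
print(env_flag("use_s3", environ={}))

# Assumption: the expiration is a number of seconds.
print(timedelta(seconds=6000000))
```

Under that assumption, 6000000 seconds is a little over 69 days.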

Token Auth Setup

Token authentication can be configured optionally for all API requests. If REQUIRE_AUTH is set to True, all requests to Sketchy will require the token header specified.

# Token Auth Setup
REQUIRE_AUTH = False
AUTH_TOKEN = 'your_token_here_and_what_not'

The header you need to send is Token. More information can be found in API Authentication.
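
As a client-side illustration, the following sketch builds a request carrying the Token header using only the standard library. The endpoint path, the build_authenticated_request helper, and the assumption that the header value is the bare token are all illustrative; check the API Authentication page for the exact format.

```python
from urllib.request import Request


def build_authenticated_request(url, token):
    """Attach the Token header that is required when REQUIRE_AUTH is True."""
    return Request(url, headers={"Token": token})


# Hypothetical endpoint and token; replace both with your own values.
req = build_authenticated_request(
    "http://127.0.0.1:8000/api/v1.0/capture", "your_token_here_and_what_not")
print(req.header_items())
```

The request object is only constructed here; passing it to urllib.request.urlopen would actually send it.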

HTTP Request Settings

These settings control how HTTP requests are handled within Sketchy. SSL_HOST_VALIDATION controls whether Sketchy validates SSL certificates; set it to False to capture screenshots from websites that have invalid certificates. CAPTURE_ERRORS can be used to screenshot webpages that return 4xx or 5xx HTTP status codes.

# Perform SSL host validation (set to False if you want to scrape/screenshot sketchy websites)
SSL_HOST_VALIDATION = False
# Enable this option to screenshot webpages that generate 4xx or 5xx HTTP error codes
CAPTURE_ERRORS = True
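
One way these two flags could translate into behavior is sketched below. This is an assumption about the logic, not Sketchy's actual code: both helper names are hypothetical. The --ignore-ssl-errors option itself is a standard PhantomJS command-line switch.

```python
def phantomjs_ssl_args(ssl_host_validation):
    """Return extra PhantomJS CLI arguments for the given validation setting."""
    # With validation disabled, tell PhantomJS to ignore certificate errors.
    return [] if ssl_host_validation else ["--ignore-ssl-errors=true"]


def should_capture(status_code, capture_errors):
    """Decide whether a response with this HTTP status should be screenshotted."""
    is_error = 400 <= status_code <= 599
    return capture_errors or not is_error


print(phantomjs_ssl_args(False))
print(should_capture(404, capture_errors=True))
print(should_capture(404, capture_errors=False))
```

Disabling validation is convenient for capturing suspicious sites, but it also means a man-in-the-middle on the network path would go unnoticed.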