AWS Batch

This guide covers deploying a Funnel server that leverages DynamoDB for storage
and AWS Batch for task execution.

Setup

Get started by creating a compute environment, job queue and job definition using either
the Funnel CLI or the AWS Batch web console. To manage the permissions of instanced
AWS Batch jobs, create a new IAM role. For the Funnel configuration outlined
in this document, this role will need to provide read and write access to both S3 and DynamoDB.
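As a sketch, the role's policy might look like the following. This is illustrative and
deliberately broad; in production, narrow the actions and Resource ARNs to the specific
buckets and tables Funnel uses.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:Query"],
      "Resource": "*"
    }
  ]
}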

Note: We recommend creating the Job Definition with Funnel by running: funnel aws batch create-job-definition.
Funnel expects the Job Definition to start a Funnel worker process with a specific configuration.
Only advanced users should consider making any substantial changes to this Job Definition.

Create Resources With AWS

AWS Batch tasks, by default, launch the ECS-optimized AMI, which includes
an 8GB volume for the operating system and a 22GB volume for Docker image and metadata
storage. The default Docker configuration allocates up to 10GB of this storage to
each container instance. Read more about the default AMI. Due to these limitations, we
recommend creating a custom AMI. Because AWS Batch has the same requirements for your
AMI as Amazon ECS, use the default Amazon ECS-optimized Amazon Linux AMI as a base and
change it to better suit your tasks.

Amazon provides a quick start guide with more information here.
Funnel provides a utility to create the resources you will need to get up and running.
You will need to specify the AWS region in which to create these resources using the --region flag.

Note: this command assumes your environment contains your AWS credentials. These
can be configured with the aws configure command.

$ funnel aws batch create-all-resources

Create a compute environment, job queue and job definition in a specified region

Usage:
  funnel aws batch create-all-resources [flags]

Flags:
      --ComputeEnv.InstanceTypes strings      The instance types that may be launched. You can also choose optimal to pick instance types on the fly that match the demand of your job queues. (default [optimal])
      --ComputeEnv.MaxVCPUs int               The maximum number of EC2 vCPUs that an environment can reach. (default 256)
      --ComputeEnv.MinVCPUs int               The minimum number of EC2 vCPUs that an environment should maintain. (default 0)
      --ComputeEnv.SecurityGroupIds strings   The EC2 security groups that are associated with instances launched in the compute environment. If none are specified all security groups will be used.
      --ComputeEnv.Subnets strings            The VPC subnets into which the compute resources are launched. If none are specified all subnets will be used.
      --ComputeEnv.Name string                The name of the compute environment. (default "funnel-compute-environment")
      --JobDef.Image string                   The docker image used to start a container. (default "docker.io/ohsucompbio/funnel:latest")
      --JobDef.JobRoleArn string              The Amazon Resource Name (ARN) of the IAM role that the container can assume for AWS permissions. A role will be created if not provided.
      --JobDef.MemoryMiB int                  The hard limit (in MiB) of memory to present to the container. (default 128)
      --JobDef.Name string                    The name of the job definition. (default "funnel-job-def")
      --JobDef.VCPUs int                      The number of vCPUs reserved for the container. (default 1)
      --JobQueue.Name string                  The name of the job queue. (default "funnel-job-queue")
      --JobQueue.Priority int                 The priority of the job queue. Priority is determined in descending order. (default 1)
      --config string                         Funnel configuration file
  -h, --help                                  help for create-all-resources
      --region string                         Region in which to create the Batch resources.

This command will create a compute environment, job queue, IAM role and job definition.

Configuring the Funnel Server

Below is an example of the configuration you would need for the server after running
funnel aws batch create-all-resources --region us-west-2. Note that the Key
and Secret fields are left blank in the configuration of the components. This is because
Funnel will, by default, try to automatically load credentials from the environment.
Alternatively, you may explicitly set the credentials in the config.

# Activate the Funnel scheduler.
Compute: manual

Scheduler:
  # How often to run a scheduler iteration.
  # In nanoseconds.
  ScheduleRate: 1000000000 # 1 second

  # In nanoseconds.
  NodeInitTimeout: 300000000000 # 5 minutes

Node:
  # If empty, a node ID will be automatically generated using the hostname.
  ID: ""

  # If the node has been idle for longer than the timeout, it will shut down.
  # -1 means there is no timeout. 0 means timeout immediately after the first task.
  Timeout: -1

  # A Node will automatically try to detect what resources are available to it.
  # Defining Resources in the Node configuration overrides this behavior.
  Resources:
    # CPUs available.
    # Cpus: 0
    # RAM available, in GB.
    # RamGb: 0.0
    # Disk space available, in GB.
    # DiskGb: 0.0

  # For low-level tuning.
  # How often to sync with the Funnel server.
  # In nanoseconds.
  UpdateRate: 5000000000 # 5 seconds

Logger:
  # Logging levels: debug, info, error
  Level: info
  # Write logs to this path. If empty, logs are written to stderr.
  OutputFile: ""
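
Since this deployment stores task data in DynamoDB, the server config also needs a
database block. A sketch, using the same region as the resources created above:

Database: dynamodb

DynamoDB:
  # Basename to use for dynamodb tables
  TableBasename: "funnel"
  # AWS region
  Region: "us-west-2"
  # AWS Access key ID
  Key: ""
  # AWS Secret Access Key
  Secret: ""

Note that DynamoDB does not store scheduler data, so the node scheduler won't work
with it; in this deployment, AWS Batch handles compute scheduling instead.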

Grid Engine

Funnel can be configured to submit workers to Grid Engine by making calls to qsub.
The Funnel server process needs to run on the same machine as the Grid Engine master.
Configure Funnel to use Grid Engine by including the following config:
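
For example (the submit-script template below is abbreviated; add directives for RAM,
disk, and other resources as your site requires):

Compute: gridengine

GridEngine: |
  #!/bin/bash
  #$ -N {{.TaskId}}
  #$ -o {{.WorkDir}}/funnel-stdout
  #$ -e {{.WorkDir}}/funnel-stderr
  {{if ne .Cpus 0 -}}
  {{printf "#$ -pe mpi %d" .Cpus}}
  {{- end}}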

Embedded

By default, Funnel uses an embedded database named BoltDB to store task
and scheduler data. This is great for development and a simple server without
external dependencies, but it doesn’t scale well to larger clusters.
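
The available config is a single path setting:

Database: boltdb

BoltDB:
  # Path to database file
  Path: ./funnel-work-dir/funnel.db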

Google Cloud Datastore

Funnel supports storing tasks (but not scheduler data) in Google Cloud Datastore.

This implementation currently doesn’t work with Appengine, since Appengine places
special requirements on the context of requests and requires a separate library.

Two entity types are used, “Task” and “TaskPart” (for larger pieces of task content,
such as stdout/err logs).

Funnel will, by default, try to automatically load credentials from the environment.
Alternatively, you may explicitly set the credentials in the config.

Config:

Database: datastore

Datastore:
  Project: ""
  # Path to account credentials file.
  # Optional. If possible, credentials will be automatically discovered
  # from the environment.
  CredentialsFile: ""
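
With a config in hand, start the server against it. Assuming the file is saved as
funnel.config.yml (the filename here is illustrative):

$ funnel server run --config funnel.config.yml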
Google Storage

Funnel supports using Google Storage (GS) for file storage.

The Google storage client is enabled by default, and will try to automatically
load credentials from the environment. Alternatively, you
may explicitly set the credentials in the worker config:

GoogleStorage:
  Disabled: false
  # Path to account credentials file.
  AccountFile: ""
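
An example task using GS URLs (the bucket and object names are illustrative):

{
  "name": "Hello world",
  "inputs": [{
    "url": "gs://funnel-bucket/hello.txt",
    "path": "/inputs/hello.txt"
  }],
  "outputs": [{
    "url": "gs://funnel-bucket/output.txt",
    "path": "/outputs/hello-out.txt"
  }]
}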

HTTP(S)

Funnel supports downloading files from public URLs via GET requests. No authentication
mechanism is supported. This backend can be used to fetch objects from cloud storage
providers exposed using presigned URLs.

The HTTP storage client is enabled by default, but may be explicitly disabled in the
worker config:

HTTPStorage:
  Disabled: false
  # Timeout for http(s) GET requests.
  # In nanoseconds.
  Timeout: 60000000000 # 60 seconds
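
An example task fetching a public URL (the domain is a placeholder):

{
  "name": "Hello world",
  "inputs": [{
    "url": "http://example.com/hello.txt",
    "path": "/inputs/hello.txt"
  }]
}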

Local

Funnel supports using the local filesystem for file storage.

Funnel limits which directories may be accessed, by default only allowing directories
under the current working directory of the Funnel worker.

Config:

LocalStorage:
  # Whitelist of local directory paths which Funnel is allowed to access.
  AllowedDirs:
    - ./
    - /path/to/allowed/dir
    - ...etc

Example task

Files must be absolute paths in file:///path/to/file.txt URL form.
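
For instance (the paths are illustrative):

{
  "name": "Hello world",
  "inputs": [{
    "url": "file:///path/to/funnel-data/hello.txt",
    "path": "/inputs/hello.txt"
  }]
}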

File hard linking behavior

For efficiency, Funnel will attempt not to copy the input files, instead trying to
create a hard link to the source file. In some cases this isn’t possible. For example,
if the source file is on a network file system mount (e.g. NFS) but the Funnel worker’s
working directory is on the local scratch disk, a hard link would cross a file system
boundary, which is not possible. In this case, Funnel will copy the file.

File ownership behavior

One difficult area of files and Docker containers is file owner/group management.
If a Docker container runs as root, it’s likely that the file will end up being owned
by root on the host system. In this case, some step (Funnel or another task) will
likely fail to access it. This is a tricky problem with no good solution yet.
See issue 66.

Amazon S3

Funnel supports using AWS S3 for file storage.

The S3 storage client is enabled by default, and will try to automatically
load credentials from the environment. Alternatively, you
may explicitly set the credentials in the worker config:

AmazonS3:
  Disabled: false
  # The maximum number of times that a request will be retried for failures.
  MaxRetries: 10
  Key: ""
  Secret: ""
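
An example task using S3 URLs (the bucket name is illustrative):

{
  "name": "Hello world",
  "inputs": [{
    "url": "s3://funnel-bucket/hello.txt",
    "path": "/inputs/hello.txt"
  }],
  "outputs": [{
    "url": "s3://funnel-bucket/output.txt",
    "path": "/outputs/hello-out.txt"
  }]
}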

Other S3 API Providers

Funnel also supports using non-Amazon S3 API providers (Ceph,
Cleversafe, Minio, etc.) for file storage.

These other S3 storage clients are NOT enabled by default.
You must configure them.
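
One possible sketch of such a config, assuming a GenericS3 list with Endpoint, Key and
Secret fields (these field names are an assumption; consult the reference config for
your Funnel version):

GenericS3:
  - Disabled: false
    Endpoint: ""
    Key: ""
    Secret: ""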

OpenStack Swift

Funnel supports using OpenStack Swift for file storage.

The Swift storage client is enabled by default, and will try to automatically
load credentials from the environment. Alternatively, you
may explicitly set the credentials in the worker config:

Swift:
  Disabled: false
  UserName: ""
  Password: ""
  AuthURL: ""
  TenantName: ""
  TenantID: ""
  RegionName: ""
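
An example task using Swift URLs (the container name is illustrative):

{
  "name": "Hello world",
  "inputs": [{
    "url": "swift://funnel-bucket/hello.txt",
    "path": "/inputs/hello.txt"
  }],
  "outputs": [{
    "url": "swift://funnel-bucket/output.txt",
    "path": "/outputs/hello-out.txt"
  }]
}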

Task views

The “BASIC” view doesn’t include some fields, such as stdout/err logs, because these
fields may be potentially large. In order to get everything, use the “FULL” view:
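
For example, via the HTTP API, assuming the server's default HTTP port (8000); the
task ID here is a placeholder:

$ curl http://localhost:8000/v1/tasks/b85khc2rl6qkqbi5b29g?view=FULL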

0.5.0

Config

- The configuration file structure has been refactored to simplify, remove large duplicated blocks and deep nesting. Most structures live at the root level now.
- Added some basic config validation to catch misspelled or unknown fields, a common source of issues.
- Added most config values to the available CLI flags.

Failure tolerance

- Added retries with exponential backoff and jitter to database and RPC clients.

Databases

- Added Google Cloud Datastore database backend.

Events

- Added “task created” event type.
- Added full support for writing events to Kafka.
- Added storage of system log events.

Web dashboard

- Display per-task system logs.
- Tweaked display of large text fields such as stdout/err and input content.

Storage

- Ensure all file handles are closed consistently.
- Added retries to Swift storage.
- Produce warning on empty directory download.
- Better defaults for chunk size in Swift backend.
- Added generic S3 (i.e. not Amazon S3) storage backend/client, based on the Minio client library.
- Also added the ability to configure/enable multiple, separate S3 backends simultaneously.
- Added HTTP storage backend, which currently supports read-only operations (write/put is not supported).

- Removed the autoscaler code. This code was getting old and outdated, and nobody seemed to be using it. A fresh version will be rewritten in the future.
- Lots of other bugfixes.

0.4.1

Date: Nov 16, 2017