yoshimotob edited this page Jan 7, 2014 · 10 revisions

Variables in a workflow

Drake lets you define and use variables throughout your workflow.

Here's a simple example of defining and then using a variable:

MYVAR=some_value

out.txt <- in.txt
  echo $[MYVAR]

The first line defines the value of MYVAR. The $[MYVAR] syntax tells Drake to substitute the value of MYVAR before interpreting the rest of the step.

There is also a conditional definition, :=, for variables. For example:

MYVAR:=some_value

The := operator tells Drake to only set the value of MYVAR to some_value if MYVAR is not already defined. (We'll talk later about ways to define variables outside of the workflow.)

Here's another example, demonstrating a simple profiles concept:

PROFILE:=default_profile
%include $[PROFILE]

out.txt <- $[INFILE]
  echo $[MYVAR]

The above example assumes we've organized a set of variables into profile files, such as a file called default_profile. The workflow uses %include to pull that file in, thereby allowing that file to define its custom values for our variables. Our workflow then makes use of those variables, such as INFILE and MYVAR.

An example default_profile file might look like this:

INFILE=in.json
MYVAR=some_other_value

This approach is handy when a set of variables in your workflow needs to change based on well-known scenarios. You can organize each scenario into its own profile, then simply specify the appropriate profile before you run the workflow. (See below for details on specifying variables outside the workflow.)

Defining variables outside the workflow

There are two ways to pass variables to a Drake workflow:

Way #1: Define environment variables before running the workflow

When Drake runs a workflow, it will pull in variable values from the environment. For example, if you've defined the environment variable PROFILE like this...

export PROFILE=profile_A

... your workflow can refer to that variable as $[PROFILE]. You just need to be sure you've done the export before you run your workflow.
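The export matters because a plain shell assignment is not passed on to child processes. The sketch below does not invoke Drake; it uses sh -c as a stand-in for any child process, and the helper name child_sees is made up for illustration.

```shell
#!/bin/sh
# Stand-in for any child process (such as drake): prints PROFILE as the child sees it.
child_sees() {
  sh -c 'echo "${PROFILE:-<unset>}"'
}

unset PROFILE
PROFILE=profile_A        # plain assignment: visible only in this shell
without_export=$(child_sees)

export PROFILE           # now passed on to child processes
with_export=$(child_sees)

echo "without export: $without_export"
echo "with export:    $with_export"
```

Only the exported variable reaches the child process, which is why the export has to happen before the workflow is run.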

Way #2: Set variables via the --vars command line option

You can set variables when you run drake on the command line by using the --vars switch, followed by a comma-delimited list of name=value pairs. For example:

drake --vars "PROFILE=profile_A,MYVAR=my-value,ANOTHERVAR=aValue" -w mywork.Drakefile

This approach takes precedence over any variables set up in the shell environment, but is superseded by any variable definitions within the workflow itself.

This approach is severely limited by syntax considerations. Variable values cannot contain characters that would confuse command-line parsing, such as an = symbol, and a value containing a comma must be enclosed in double quotes (escaped so that they survive the shell). For example:

drake --vars "PROFILE=profile_A,MYVAR=\"my,value\",ANOTHERVAR=aValue" -w mywork.Drakefile
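To see why the inner quotes are escaped, it can help to look at the argument after the shell has processed it. This small sketch uses only standard shell behaviour and does not call Drake; show_arg is a hypothetical helper that prints its first argument exactly as a program would receive it.

```shell
#!/bin/sh
# Prints the single argument exactly as a program would receive it.
show_arg() {
  printf '%s\n' "$1"
}

# Same quoting as the --vars example above: \" keeps a literal double quote.
received=$(show_arg "PROFILE=profile_A,MYVAR=\"my,value\",ANOTHERVAR=aValue")
echo "$received"
```

The shell strips the outer double quotes and the backslashes, so the inner double quotes around my,value are what remain in the string handed to the program.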

Using the $ syntax in shell commands

Before running shell steps, Drake makes sure to load the shell environment with any workflow variables that have been defined. This means your shell commands can refer to those variables with the traditional $ shell syntax. For example:

MYVAR=some_value

out.txt <-
  echo $MYVAR > $OUTPUT

The key difference here is that Drake does not perform the substitution. Drake only prepares the shell environment with MYVAR. When the echo statement runs, it runs in the shell literally as echo $MYVAR, and it is the shell that does the substitution.
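The same division of labour can be reproduced without Drake. In this sketch the command text is kept literal and handed to a child shell whose environment carries MYVAR; sh -c is only a stand-in here, since how Drake actually launches the shell is not covered on this page.

```shell
#!/bin/sh
# The command text stays literal (single quotes); nothing is substituted yet.
step_command='echo $MYVAR'

# Run it in a child shell whose environment carries MYVAR.
# The child shell, not the caller, expands $MYVAR.
result=$(MYVAR=some_value sh -c "$step_command")

echo "command text: $step_command"
echo "shell output: $result"
```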

Invoke a step via a variable name

You can invoke a step in your workflow with a pre-assigned variable as the step's output. For example, let's say you have the following Drakefile:

my_data="hdfs:/path/to/data"

$[my_data] <-
     echo "some output" > $OUTPUT

If you then want to run the step with output to hdfs:/path/to/data, you can run...

drake $my_data

... which will figure out the step you want to run by evaluating the variable inside of the Drakefile.

Requiring variables to be defined

If Drake encounters a variable reference such as $[MYVAR] in the workflow, that variable must be defined by that point in the workflow or else Drake will error out with a message telling you that MYVAR is not defined.

Drake does not provide any checking for variables you refer to in your shell commands using the $ syntax. If you refer to a variable this way and it is not defined when the workflow runs, the behaviour depends on the shell.
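As an illustration of what "depends on the shell" can mean, the sketch below shows three standard POSIX shell behaviours for an unset variable. It does not involve Drake, and whether to add guards like these to a step's commands is left to you.

```shell
#!/bin/sh
unset MYVAR

# Default behaviour: an unset variable silently expands to an empty string.
lenient=$(sh -c 'echo "value=[$MYVAR]"')

# With `set -u`, referencing an unset variable is an error (non-zero exit).
if sh -c 'set -u; echo "$MYVAR"' 2>/dev/null; then
  strict="ran"
else
  strict="failed"
fi

# `${MYVAR:?message}` fails for this one variable only.
if sh -c ': "${MYVAR:?MYVAR is not defined}"' 2>/dev/null; then
  guarded="ran"
else
  guarded="failed"
fi

echo "default: $lenient"
echo "set -u:  $strict"
echo "guard:   $guarded"
```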

If you want to make a variable definition optional for the user, it's handy to use the := operator to conditionally define the variable early in the workflow, setting it to the preferred default. This way, the user can override the default only if they need to.

Precedence and order

Variable definition precedence, from lowest to highest (each source overrides the ones listed before it):

  1. environment
  2. --vars
  3. inline with the workflow

Variables can be re-defined in the workflow. For example:

;; some steps. MYVAR will be loaded from the environment, or
;; the --vars option, or undefined

MYVAR=value1

;; some more steps; MYVAR will be value1

MYVAR=value2

;; some more steps; MYVAR will be value2 ...
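The ordering can also be pictured as a small function. This is a hypothetical model written for illustration, not Drake's implementation: resolve_var is a made-up name, and an empty string stands in for "not defined by that source".

```shell
#!/bin/sh
# Hypothetical model of the precedence list; this is not Drake's own code.
# Arguments: value from the environment, value from --vars, value set inline
# in the workflow. An empty string means "not defined by that source".
resolve_var() {
  result=$1                               # 1. environment (lowest)
  if [ -n "$2" ]; then result=$2; fi      # 2. --vars overrides the environment
  if [ -n "$3" ]; then result=$3; fi      # 3. inline definition overrides both
  echo "$result"
}

resolve_var "from_env" "" ""                      # prints from_env
resolve_var "from_env" "from_vars" ""             # prints from_vars
resolve_var "from_env" "from_vars" "from_inline"  # prints from_inline
```

In this picture, a := definition corresponds to applying the inline value only when neither of the first two sources has defined the variable.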

Further reading

Please see the "Variables" section of the official spec for more details on the use of variables with Drake.