Variables
Drake lets you define and use variables throughout your workflow.
Here's a simple example of defining and then using a variable:
MYVAR=some_value
out.txt <- in.txt
  echo $[MYVAR]
The first line defines the value of MYVAR. The $[MYVAR] syntax tells Drake to substitute the value of MYVAR before interpreting the rest of the step.
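Here is a toy Python model of that substitution step. It is not Drake's implementation. The regular expression for variable names and the substitute function are assumptions made for illustration. It replaces each $[NAME] with the variable's value. It also refuses to continue when a referenced variable is undefined, which matches how Drake treats undefined $[NAME] references.

```python
import re

# Assumption: variable names consist of letters, digits and underscores.
VAR_REF = re.compile(r"\$\[([A-Za-z_][A-Za-z0-9_]*)\]")


def substitute(text, variables):
    """Replace every $[NAME] in text with variables[NAME] (toy model, not Drake's code)."""

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            # Drake reports an undefined $[NAME] reference as an error.
            raise NameError(f"{name} is not defined")
        return variables[name]

    return VAR_REF.sub(lookup, text)


if __name__ == "__main__":
    print(substitute("echo $[MYVAR]", {"MYVAR": "some_value"}))  # echo some_value
```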
There is also a conditional definition operator, :=, for variables. For example:
MYVAR:=some_value
The :=
operator tells Drake to only set the value of MYVAR to some_value if MYVAR is not already defined. (We'll talk later about ways to define variables outside of the workflow.)
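Consider a toy Python model in which a dictionary holds the variables defined so far. This is an assumption for illustration, not how Drake stores them. In that model, := behaves like dict.setdefault, while a plain = behaves like ordinary assignment. OTHERVAR is a hypothetical name.

```python
# Toy model: a dict stands in for the variables defined so far.
# Hypothetical: MYVAR was already defined earlier (e.g. outside the workflow).
variables = {"MYVAR": "defined_earlier"}

# MYVAR:=some_value -> MYVAR is already defined, so nothing changes.
variables.setdefault("MYVAR", "some_value")

# OTHERVAR:=some_value -> OTHERVAR is not defined yet, so it is set.
variables.setdefault("OTHERVAR", "some_value")

print(variables)  # {'MYVAR': 'defined_earlier', 'OTHERVAR': 'some_value'}
```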
Here's another example, demonstrating a simple profiles concept:
PROFILE:=default_profile
%include $[PROFILE]
out.txt <- $[INFILE]
  echo $[MYVAR]
The above example assumes we've organized a set of variables into profile files, such as a file called default_profile. The workflow uses %include to pull that file in, thereby allowing that file to define its custom values for our variables. Our workflow then makes use of those variables, such as INFILE and MYVAR.
An example default_profile file might look like this:
INFILE=in.json
MYVAR=some_other_value
This approach is handy when the variables in your workflow need to change based on well-known scenarios. You can organize each scenario into its own profile, then simply specify the appropriate profile before you run the workflow. (See below for details on specifying variables outside the workflow.)
There are two ways to pass variables to a Drake workflow:
When Drake runs a workflow, it pulls in variable values from the environment. For example, if you've defined the environment variable PROFILE like this...
export PROFILE=profile_A
... your workflow can refer to that variable as $[PROFILE]. You just need to be sure you've done the export before you run your workflow.
You can set variables when you run drake on the command line by using the --vars switch, then supplying a comma-delimited list of variable names and values separated by =. For example:
drake --vars "PROFILE=profile_A,MYVAR=my-value,ANOTHERVAR=aValue" -w mywork.Drakefile
This approach takes precedence over any variables set up in the shell environment, but is superseded by any variable definitions within the workflow itself.
This approach is severely limited by syntax considerations. Your variable values cannot contain characters that would confuse the command-line parsing, such as an = symbol. Variable values that contain a comma must be wrapped in double quotes, escaped with backslashes inside the outer quoted string. For example:
drake --vars "PROFILE=profile_A,MYVAR=\"my,value\",ANOTHERVAR=aValue" -w mywork.Drakefile
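The quoting rule can be illustrated with a small Python parser. It works on the string Drake receives after the shell has removed the outer quotes and backslashes: PROFILE=profile_A,MYVAR="my,value",ANOTHERVAR=aValue. The parse_vars function is hypothetical, and Drake's real parser may differ in details. It only shows why the quotes matter: a comma inside double quotes stays part of the value instead of separating two pairs.

```python
def parse_vars(arg):
    """Split a --vars style string into a dict (hypothetical sketch, not Drake's parser).

    Commas separate NAME=value pairs unless they appear inside double quotes;
    the quote characters themselves are dropped.
    """
    pairs, current, in_quotes = [], [], False
    for ch in arg:
        if ch == '"':
            in_quotes = not in_quotes  # toggle quoting; the quote is not kept
        elif ch == "," and not in_quotes:
            pairs.append("".join(current))  # an unquoted comma ends a pair
            current = []
        else:
            current.append(ch)
    pairs.append("".join(current))

    result = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected NAME=value, got {pair!r}")
        result[name] = value
    return result


if __name__ == "__main__":
    # The argument as Drake sees it once the shell has stripped the outer quotes and backslashes.
    print(parse_vars('PROFILE=profile_A,MYVAR="my,value",ANOTHERVAR=aValue'))
    # {'PROFILE': 'profile_A', 'MYVAR': 'my,value', 'ANOTHERVAR': 'aValue'}
```

Without the quotes, MYVAR=my,value would be split into MYVAR=my and a stray value fragment, which this sketch rejects with a ValueError.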
Before running shell steps, Drake makes sure to load the shell environment with any workflow variables that have been defined. This means your shell commands can refer to those variables with the traditional $ shell syntax. For example:
MYVAR=some_value
out.txt <-
  echo $MYVAR > $OUTPUT
A big difference here is that Drake is not doing the substituting. Drake is only prepping the shell environment with MYVAR. When the echo statement runs, it literally runs on the shell as echo $MYVAR, and it's the shell that does the substitution.
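The same division of labour can be reproduced with Python's subprocess module. This is a sketch of the idea, not Drake's code. It assumes a POSIX system with /bin/sh. It also copies the current environment before adding the workflow variable, which is an assumption about how the step environment is assembled. Python only places MYVAR in the environment. The literal text echo "$MYVAR" is handed to the shell unchanged, and the shell performs the expansion.

```python
import os
import subprocess

# Hypothetical stand-in for what happens before a shell step runs:
# start from the current environment and add the workflow variables.
workflow_vars = {"MYVAR": "some_value"}
step_env = {**os.environ, **workflow_vars}

# The command text is not rewritten here; /bin/sh expands $MYVAR itself.
result = subprocess.run(
    ["/bin/sh", "-c", 'echo "$MYVAR"'],
    env=step_env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # some_value
```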
You can invoke a step in your workflow with a pre-assigned variable as the step's output. For example, let's say you have the following Drakefile:
my_data="hdfs:/path/to/data"
$[my_data] <-
  echo "some output" > $OUTPUT
If you then want to run the step with output to hdfs:/path/to/data, you can run...
drake '$my_data'
... which will figure out the step you want to run by evaluating the variable inside of the Drakefile. (The single quotes keep your shell from expanding $my_data itself before Drake sees it.)
If Drake encounters a variable reference such as $[MYVAR] in the workflow, that variable must be defined by that point in the workflow, or else Drake will error out with a message telling you that MYVAR is not defined.
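Drake does not provide any checking for variables you refer to in your shell commands using the $ syntax. If you refer to a variable in such a way and it is not defined when the workflow runs, the behaviour depends on the shell and its options. By default, a POSIX shell substitutes an empty string. A shell running with set -u reports an error instead.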
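The following Python sketch shows both outcomes. It assumes a POSIX /bin/sh, and UNDEFINED_VAR is a hypothetical name. By default, the unset variable silently expands to an empty string. With set -u, the shell reports an error and exits with a non-zero status before echo runs.

```python
import subprocess

# A minimal environment in which the hypothetical UNDEFINED_VAR is not set.
env = {"PATH": "/usr/bin:/bin"}

# Default behaviour: the unset variable expands to an empty string.
lenient = subprocess.run(
    ["/bin/sh", "-c", 'echo "[$UNDEFINED_VAR]"'],
    env=env, capture_output=True, text=True,
)
print(lenient.returncode, lenient.stdout.strip())  # 0 []

# With set -u, expanding an unset variable is an error and the shell exits non-zero.
strict = subprocess.run(
    ["/bin/sh", "-c", 'set -u; echo "[$UNDEFINED_VAR]"'],
    env=env, capture_output=True, text=True,
)
print(strict.returncode != 0)  # True
```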
If you want to make a variable definition optional for the user, it's handy to use the := operator to conditionally define the variable early in the workflow, setting it to the preferred default. This way, the user can override the default only if they need to.
Variable definition precedence, from lowest to highest:
- environment
- --vars
- inline with the workflow
Variables can be re-defined in the workflow. For example:
;; some steps. MYVAR will be loaded from the environment, or
;; the --vars option, or undefined
MYVAR=value1
;; some more steps; MYVAR will be value1
MYVAR=value2
;; some more steps; MYVAR will be value2 ...
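The precedence rules and redefinition can be summarised with a toy Python model. The resolve function and its arguments are hypothetical, not Drake's code. Environment values are loaded first, and --vars values overwrite them. Inline definitions are then applied in workflow order, each one overwriting whatever came before. The returned snapshots record the value visible to the steps in each stretch of the workflow.

```python
def resolve(environment, cli_vars, definitions):
    """Toy model of variable precedence and redefinition (illustration only).

    environment: values from the shell environment (lowest precedence)
    cli_vars:    values from --vars (override the environment)
    definitions: inline (NAME, value) definitions in workflow order (highest)
    Returns a snapshot of the variables before the first inline definition
    and one after each definition.
    """
    variables = dict(environment)
    variables.update(cli_vars)  # --vars beats the environment
    snapshots = [dict(variables)]
    for name, value in definitions:
        variables[name] = value  # inline definitions beat both, and later ones win
        snapshots.append(dict(variables))
    return snapshots


if __name__ == "__main__":
    snapshots = resolve(
        environment={"MYVAR": "from_env"},
        cli_vars={"MYVAR": "from_vars"},
        definitions=[("MYVAR", "value1"), ("MYVAR", "value2")],
    )
    print([s["MYVAR"] for s in snapshots])  # ['from_vars', 'value1', 'value2']
```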
Please see the "Variables" section of the official spec for more details on the use of variables with Drake.