utils

Introduction

The utils library houses modules that simplify the experimental process: file input / output, running experiments, and parsing program output.

File Input / Output

A few basic file I/O functions are available:

`read_file(file_name)`	Read the contents of a text file into an array of strings.
`write_file(file_name, contents)`	Write a string (or alternatively an array of strings) to a text file.
`load_CSV(filename, delimiter = ',')`	Load a delimiter-separated-value file into a 2d array of strings. Note: The delimiter argument is optional.
`save_CSV(data, filename, delimiter = ',')`	Save a 2d array of items as a delimiter-separated-value file. Note: The delimiter argument is optional, and the data items will be converted to strings.
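
To make the string conversion concrete, here is a minimal stand-in sketch of the load / save behaviour described above. The names `save_dsv_sketch` and `load_dsv_sketch` are hypothetical and not part of krrt, and the sketch assumes a naive split on the delimiter with no quoting support, which may differ from what the library does.

```python
import os
import tempfile


def save_dsv_sketch(data, filename, delimiter=','):
    """Hypothetical stand-in: write a 2d array, converting each item to a string."""
    with open(filename, 'w') as f:
        for row in data:
            f.write(delimiter.join(str(item) for item in row) + '\n')


def load_dsv_sketch(filename, delimiter=','):
    """Hypothetical stand-in: read a file into a 2d array of strings."""
    with open(filename) as f:
        return [line.rstrip('\n').split(delimiter) for line in f]


path = os.path.join(tempfile.mkdtemp(), 'data.tsv')
save_dsv_sketch([['n', 'time'], [10, 0.5]], path, delimiter='\t')
rows = load_dsv_sketch(path, delimiter='\t')
# Numbers come back as strings; convert them yourself if needed.
```

The point of the sketch is that a save followed by a load does not preserve types, so numeric columns need an explicit `int(...)` or `float(...)` after loading.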

Additionally, the following function can be used to obtain a list of files in a directory (useful when running experiments with a benchmark set of examples):

get_file_list(dir_name, forbidden_list = None, match_list = None): Returns a list of files in the given directory subject to constraints.
dir_name: The path of the directory to locate files in.
forbidden_list: List of strings that, when matched to a filename, cause the file to be ignored. e.g. ['.svn', 'extra-directory', '.o', ...]
match_list: List of strings that the found files should have in their name. e.g. ['.foo', 'problem', ...]
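
The two constraint lists can be pictured with the following stand-in sketch. The function `list_files_sketch` is hypothetical and is not the krrt implementation; it assumes plain substring tests against the full path, and it assumes that every string in match_list must be present, since the description above does not say whether "all" or "any" is intended.

```python
import os
import tempfile


def list_files_sketch(dir_name, forbidden_list=None, match_list=None):
    """Hypothetical stand-in for the filtering described above (not krrt code)."""
    forbidden_list = forbidden_list or []
    match_list = match_list or []
    found = []
    for root, _dirs, files in os.walk(dir_name):
        for name in files:
            path = os.path.join(root, name)
            # Assumption: plain substring tests against the full path.
            if any(bad in path for bad in forbidden_list):
                continue
            # Assumption: every string in match_list must be present.
            if all(good in path for good in match_list):
                found.append(path)
    return sorted(found)


demo_dir = tempfile.mkdtemp()
for name in ['data1.csv', 'data2.csv', 'readme.txt']:
    open(os.path.join(demo_dir, name), 'w').close()

csv_files = list_files_sketch(demo_dir, match_list=['.csv'])
no_readme = list_files_sketch(demo_dir, forbidden_list=['readme.txt'])
```

In this small directory both calls select the same two files, which is why either constraint can be used to pick out a benchmark set.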

Example

Say you have a directory foo/ containing the files data1.csv, data2.csv, data3.csv, and readme.txt. Suppose you want to read in each of the comma-separated files, write it back out as tab-separated values, and display the first 4 lines of readme.txt. The following code would achieve this:

from krrt.utils import read_file, load_CSV, save_CSV, get_file_list

#--- Load and print the first 4 lines of the readme.txt
readme_lines = read_file('foo/readme.txt')
print(readme_lines[:4])

#--- Locate all of the csv files
file_list = get_file_list('foo', forbidden_list = ['readme.txt'])
# Note: We could have used match_list=['.csv'] instead

#-- Iterate over each file
for file_name in file_list:
    #- Load the file as comma separated data
    data = load_CSV(file_name)

    #- Replace the .csv extension with .tsv
    new_file_name = file_name[:-4] + '.tsv'

    #- Write the file as tab separated data
    save_CSV(data, new_file_name, delimiter = "\t")

Running an Experiment

There is one main function used to simplify the setup of experimental evaluation: run_experiment. The function has a number of arguments, most of which are optional.

Arguments

base_directory: The base directory that the experiments should be run from. (default: ".")
base_command: The base command to be executed. This argument is mandatory.
single_arguments: A dictionary where the key is the name of an argument list (which is not included in the command), and the value is a list of arguments that should be used. For example if one (and only one) of flagA, flagB, and flagC should be included as a command-line option, then the key/value pair 'flags': ['flagA', 'flagB', 'flagC'] should be in the single_arguments dictionary. (default: None)
parameters: A dictionary where each key is a command-line option name and each value is a list of values to use for that option. For example, if the software being tested has -input <filename> as a command-line option, then the dictionary would have an entry with the key '-input' and a list of input files as its value. (default: None)
time_limit: The number of seconds the software should be permitted to run. (default: 15)
memory_limit: The number of megabytes the software should be limited to. (default: -1 (i.e. unlimited))
results_dir: Directory to store the output of each program execution. (default: "results")
progress_file: The file that should contain text indicating the progress of the experiment as a percentage. If None is passed in, standard output is used. (default: "/dev/null")
processors: The number of cores to be used simultaneously. (default: 1)
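
One way to picture how single_arguments and parameters combine is the sketch below. It assumes that one command is run for every combination of choices, which the descriptions above suggest but do not state outright. The helper `expand_commands_sketch` is hypothetical and not part of krrt, and the ordering and formatting of the generated commands are illustrative only.

```python
from itertools import product


def expand_commands_sketch(base_command, single_arguments=None, parameters=None):
    """Hypothetical helper (not part of krrt): one command per combination."""
    single_arguments = single_arguments or {}
    parameters = parameters or {}
    arg_names = sorted(single_arguments)
    param_names = sorted(parameters)
    # One list of choices per argument group, then one per parameter.
    choices = [single_arguments[n] for n in arg_names]
    choices += [parameters[n] for n in param_names]

    commands = []
    for combo in product(*choices):
        parts = [base_command]
        # Argument-group names are never emitted, only the chosen value.
        parts += [a for a in combo[:len(arg_names)] if a != '']
        # Parameters are emitted as "<key> <value>" pairs.
        for key, value in zip(param_names, combo[len(arg_names):]):
            parts += [key, str(value)]
        commands.append(' '.join(parts))
    return commands


commands = expand_commands_sketch(
    './command do_stuff',
    single_arguments={'light_switch': ['-on', '-off']},
    parameters={'-parameter_1': [5, 25, 100]},
)
# 2 argument choices x 3 parameter values = 6 commands
```

Under this assumption the number of runs grows multiplicatively with each list, so it is worth estimating the total before setting a long time_limit.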

Results

The run_experiment function returns a ResultSet object, which captures the information needed to filter results by parameter or argument. It has the following methods and attributes.

ResultSet

`res_set.size`	The number of results contained.
`res_set.get_ids()`	Returns a list of keys that can be used to select specific results.
`res_set[id]`	Returns a Result object associated with id.
`res_set.add_result(res)`	Adds a result object res to the ResultSet object.
`res_set.filter_parameter(param, value)`	Returns a ResultSet with only the results that match the param / value pair specified.
`res_set.filter_argument(arg, value)`	Returns a ResultSet with only the results that match the argument / value pair specified.
`res_set.filter(func)`	Returns a ResultSet with only the results for which the user-defined function func returns True.

Note: The parameter and argument filter functions are just syntactic sugar for the generic filter function.
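
The "syntactic sugar" relationship can be sketched with two small stand-in classes. `ResultSketch` and `ResultSetSketch` are hypothetical and only mimic the attributes listed in this section; they are not the krrt classes, and the internal dictionary is an assumption.

```python
class ResultSketch(object):
    """Hypothetical stand-in for a Result (only the fields used here)."""
    def __init__(self, id, single_args, parameters, timed_out=False):
        self.id = id
        self.single_args = single_args
        self.parameters = parameters
        self.timed_out = timed_out


class ResultSetSketch(object):
    """Hypothetical stand-in for a ResultSet, keyed by result id."""
    def __init__(self):
        self.results = {}

    @property
    def size(self):
        return len(self.results)

    def add_result(self, res):
        self.results[res.id] = res

    def get_ids(self):
        return sorted(self.results)

    def __getitem__(self, id):
        return self.results[id]

    def filter(self, func):
        # Keep only the results for which func(result) is true.
        subset = ResultSetSketch()
        for res in self.results.values():
            if func(res):
                subset.add_result(res)
        return subset

    def filter_parameter(self, param, value):
        # Sugar: a parameter filter is a generic filter with a fixed test.
        return self.filter(lambda res: res.parameters[param] == value)

    def filter_argument(self, arg, value):
        return self.filter(lambda res: res.single_args[arg] == value)


rs = ResultSetSketch()
rs.add_result(ResultSketch(0, {'flytype': '-superfly'}, {'-parameter_1': 5}))
rs.add_result(ResultSketch(1, {'flytype': ''}, {'-parameter_1': 5}, timed_out=True))
rs.add_result(ResultSketch(2, {'flytype': ''}, {'-parameter_1': 25}))

p5 = rs.filter_parameter('-parameter_1', 5)
finished = rs.filter(lambda res: not res.timed_out)
```

Because each filter returns another result set, filters can be chained, for example a parameter filter followed by a generic filter on timed_out.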

Result

The Result object contains information corresponding to a single run of your experiment. Specifically it has the following attributes:

`result.id`	The id of the run (typically a number).
`result.command`	The full command executed.
`result.output_file`	The absolute path to the output captured from the command.
`result.single_args`	A dictionary mapping argument names to the value for this run.
`result.parameters`	A dictionary mapping parameter names to their setting for this run.
`result.runtime`	The runtime for this command to complete.
`result.timed_out`	A boolean value indicating whether or not this command timed out.

Example

from krrt.utils import run_experiment

# Run your program with different parameters, command-line arguments, etc
results = run_experiment(
    base_directory = '/path/to/command/',
    base_command = './command do_stuff',
    single_arguments = {
        'light_switch': ['-on', '-off'],
        'args': ['-arg1', '-arg2', '-arg3'],
        'flytype': ['-superfly', '']
      },
    parameters = {
        '-parameter_1': [5, 25, 100],
        '-parameter_2': [5, 25, 100],
        '-parameter_3': [.1, .25, .35]
      },
    time_limit = 900, # 15-minute time limit (900 seconds)
    memory_limit = 1000, # 1 GB memory limit (1000 MB)
    results_dir = "results",
    progress_file = None, # Print the progress to stdout
    processors = 8 # You've got 8 cores, right?
)

# (for whatever reason) Find all of the runs that had -superfly as an argument
superfly_results = results.filter_argument('flytype', '-superfly')

# Partition the results that didn't time out into lists depending on -parameter_1
good_results = results.filter(lambda result: not result.timed_out)

p1_results = {}

for res_id in good_results.get_ids():
    result = good_results[res_id]
    p1_results.setdefault(result.parameters['-parameter_1'], []).append(result)

# p1_results is now a dict with the keys '5', '25', and '100' and a list of
#  results corresponding to those values for -parameter_1

Parsing Output

The following functions are available for common parsing tasks that you may want to perform when building your experimental framework.

get_value

The get_value(file_name, regex, value_type = float) function is used to retrieve a single value from an output file.

Arguments

file_name: Path of the output file.
regex: Regex string that is used to match for the value. (e.g. .*size:(\d+).*)
value_type: (optional) Parameter to specify the type of the value (e.g. int)

Example

from krrt.utils import get_value

#--- Get the runtime from the file 'output' that is of the form "runtime:3.02sec"
runtime = get_value('output', r'.*runtime:([0-9]+\.?[0-9]*)sec.*', float)
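
For readers who want to see the idea without krrt installed, the following stand-in sketch shows one plausible reading of get_value. The function `get_value_sketch` is hypothetical; it assumes the regex is matched against one line at a time with `re.match`, that the first captured group is converted with value_type, and that None is returned when nothing matches. The library may behave differently on each of these points.

```python
import os
import re
import tempfile


def get_value_sketch(file_name, regex, value_type=float):
    """Hypothetical stand-in (not krrt code): first captured group, converted."""
    pattern = re.compile(regex)
    with open(file_name) as f:
        for line in f:
            # Assumption: the regex is matched against one line at a time.
            found = pattern.match(line)
            if found:
                return value_type(found.group(1))
    # Assumption: None signals that no line matched.
    return None


path = os.path.join(tempfile.mkdtemp(), 'output')
with open(path, 'w') as f:
    f.write('solving...\nsize:42\nruntime:3.02sec\n')

size = get_value_sketch(path, r'.*size:(\d+).*', int)
runtime = get_value_sketch(path, r'.*runtime:([0-9]+\.?[0-9]*)sec.*')
```

Writing the patterns as raw strings (the `r'...'` prefix) keeps backslashes such as `\d` and `\.` from being interpreted as string escapes.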

match_value

The match_value(file_name, regex) function is used to check whether a regex matches anywhere in a file.

Arguments

file_name: Path of the output file.
regex: Regex string that is used to match for the value. (e.g. .*Timeout.*)

Example

from krrt.utils import match_value

#--- Check if the file 'output' has the string "Timeout" inside of it.
timed_out = match_value('output', '.*Timeout.*')
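
A stand-in sketch of the same check using only the standard library is shown here. The function `match_value_sketch` is hypothetical and assumes that each line is tested separately with `re.match`; that assumption is consistent with the `.*` wrapped around the patterns in this section, but it is not confirmed by the source.

```python
import os
import re
import tempfile


def match_value_sketch(file_name, regex):
    """Hypothetical stand-in (not krrt code): True if any line matches."""
    pattern = re.compile(regex)
    with open(file_name) as f:
        # Assumption: each line is tested separately with re.match.
        return any(pattern.match(line) for line in f)


path = os.path.join(tempfile.mkdtemp(), 'output')
with open(path, 'w') as f:
    f.write('step 1 done\nError: Timeout after 900s\n')

timed_out = match_value_sketch(path, r'.*Timeout.*')
crashed = match_value_sketch(path, r'.*Segmentation fault.*')
```

Since `re.match` only matches at the start of a string, the leading `.*` is what lets the pattern find text in the middle of a line under this assumption.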

get_lines

The get_lines(file_name, lower_bound = None, upper_bound = None) function is used to retrieve a contiguous sequence of lines from a file, delimited by the lines that surround the targeted text. The bounding lines themselves are not included. If lower_bound is not supplied, all lines from the start of the file are included; similarly, if upper_bound is not supplied, all lines through the end of the file are included.

Arguments

file_name: Path of the output file.
lower_bound: (optional) Parameter for indicating the lower bounding line to match on.
upper_bound: (optional) Parameter for indicating the upper bounding line to match on.

Example

from krrt.utils import get_lines

#--- Get the lines of the output file between the lines "start_results" and "end_results"
result_lines = get_lines('output', lower_bound = 'start_results', upper_bound = 'end_results')
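
The non-inclusive bounds can be illustrated with the stand-in sketch below. The function `get_lines_sketch` is hypothetical, and it assumes that a bound matches a line when it occurs in it as a substring; the source does not say whether the library uses substring, exact, or regex matching.

```python
import os
import tempfile


def get_lines_sketch(file_name, lower_bound=None, upper_bound=None):
    """Hypothetical stand-in (not krrt code): lines strictly between two markers."""
    with open(file_name) as f:
        lines = [line.rstrip('\n') for line in f]

    # With no lower bound, collection starts at the first line.
    collecting = lower_bound is None
    selected = []
    for line in lines:
        if not collecting:
            # Assumption: a bound matches when it occurs as a substring.
            if lower_bound in line:
                collecting = True
            continue
        if upper_bound is not None and upper_bound in line:
            break
        selected.append(line)
    return selected


path = os.path.join(tempfile.mkdtemp(), 'output')
with open(path, 'w') as f:
    f.write('header\nstart_results\na=1\nb=2\nend_results\nfooter\n')

between = get_lines_sketch(path, 'start_results', 'end_results')
from_top = get_lines_sketch(path, upper_bound='start_results')
```

Here `between` holds only the two result lines, and `from_top` holds only the header line, since neither marker line is kept.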

Misc

Additionally, the utils package provides the following function:

get_opts(): Returns a tuple (opts, flags) of command line parameters, where:
opts: Dictionary of options where the key is of the form -<option> and the value is just a string.
flags: List of strings that weren't part of an -<option> <value> pair.
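
The (opts, flags) split can be sketched as follows. The function `parse_opts_sketch` is hypothetical and takes the argument list explicitly so that it can be tried in isolation; get_opts() itself takes no arguments and presumably reads the process command line, which is an assumption. The pairing rule (any token starting with '-' consumes the next token as its value) is also an assumption.

```python
def parse_opts_sketch(argv):
    """Hypothetical stand-in (not krrt code): split argv into (opts, flags)."""
    opts = {}
    flags = []
    i = 0
    while i < len(argv):
        token = argv[i]
        # Assumption: a '-' token followed by another token forms a pair.
        if token.startswith('-') and i + 1 < len(argv):
            opts[token] = argv[i + 1]
            i += 2
        else:
            flags.append(token)
            i += 1
    return opts, flags


opts, flags = parse_opts_sketch(['-input', 'problem.txt', 'verbose', '-limit', '60'])
```

Option values stay as strings, matching the description of opts above, so numeric options such as '-limit' need an explicit conversion.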
