Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fio: Add flags to capture latency, bandwidth and IOPS logs #1205

Merged
merged 1 commit into from
Nov 21, 2016

Conversation

meteorfox
Copy link
Contributor

@meteorfox meteorfox commented Nov 17, 2016

Add the ability to optionally collect the latency, bandwidth and IOPS logs
for a fio job. This also includes the flag fio_log_avg_msec that averages
the log entries over the given interval.

These log files produced by fio consist of the current format:

time, rate/latency, data direction, block size

where:

Time: in milliseconds.
Rate/latency: For bandwidth, this is in KB/sec. For latency, it's microseconds.
Data direction: 0 is read, 1 is write.
Block size: block size in bytes.

Since PKB needs to pull the logs from the VMs, we could not simply use the --fio_parameters flag. This PR add the ability to fetch those logs.

The generated logs are copied from the VMs into the local machine under the generated temporary directory /tmp/pkb/runs/<run_uri>.

Using these logs more detail graphs such as histograms, or time series can be produced. Example
image

Note: The graph above was generated using R and ggplot2 as follows:

pkb_iops <- read.csv(file = 'Downloads/pkb_fio_avg_iops.log',header = FALSE)
pkb_iops <- data.frame(time_ms = pkb_iops$V1, iops = pkb_iops$V2, dir = factor(pkb_iops$V3, levels=c(0, 1), labels=c("read", "write")), blocksize = pkb_iops$V4)
p <- ggplot(data=pkb_iops, aes(x=time_ms, y=iops, group=dir, color=dir)) + geom_line() + geom_point() + ylim(0, 500) + ggtitle("Random Read 4K")

Add the ability to optionally collect the latency, bandwidth and IOPS logs
for a fio job. This is also include the flag fio_log_avg_msec that averages
the log entries over the given interval.
@meteorfox
Copy link
Contributor Author

FYI... This has not been tested on Windows. Can someone verify if it works on Windows?

@meteorfox
Copy link
Contributor Author

@noahl PTAL, or please recommend someone that might be able to review this. Thanks!

@noahl
Copy link
Contributor

noahl commented Nov 18, 2016

@meteorfox great patch! Unfortunately I am moving on from spending most of my time on PKB. I think @tedsta is the right person to review this now.

@@ -460,6 +489,9 @@ def Run(benchmark_spec):
stdout, stderr = vm.RobustRemoteCommand(fio_command, should_log=True)
samples = fio.ParseResults(job_file_string, json.loads(stdout))

if collect_logs:
vm.PullFile(vm_util.GetTempDir(), '%s_*.log' % PKB_FIO_LOG_FILE_NAME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little weird to be getting the tmp dir from two different sources. Suggest using vm_util.VM_TMP_DIR as above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I'll make the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While going over your suggestion, I remembered now why I didn't use vm_util.VM_TMP_DIR. I want to store the collected logs under /tmp/pkb/runs/<run_uri>/ so they are grouped with the results for that run, vm_util.VM_TMP_DIR is the path to just /tmp/pkb/ directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything passed to fio telling to put the logs there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Currently, it defaults to the working directory of where fio is called from. Then vm.PullFile(...) essentially does an scp for files ending with _*.log from fio's working directory to the local directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh I see.

'Whether to collect a bandwidth log of the fio jobs.')
flags.DEFINE_boolean('fio_iops_log', False,
'Whether to collect an IOPS log of the fio jobs.')
flags.DEFINE_integer('fio_log_avg_msec', 1000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fio's --log_avg_msec defaults to 0. Why do we default to 1000?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, it will log data for every IO completion which can create a huge file. I thought 1 second, was a sensible default, so it aggregates the metrics per 1 second interval.

If you feel, we should use the same as fio as to not "surprise" the user, I can make the change.

(FLAGS.fio_iops_log, '--write_iops_log=%(filename)s',),
(collect_logs, '--log_avg_msec=%(interval)d',)]
fio_command_flags = ' '.join([flag for given, flag in fio_log_flags if given])
return fio_command_flags % {'filename': PKB_FIO_LOG_FILE_NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want all these logs written to the same file? FIO can have concurrent jobs, so does FIO ensure that writes to any one of these logs is the only write to these logs occurring at any given time? (atomic)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see below that these are supposed to be different file names. After staring at this code for several minutes it still looks like they all get the same file name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the argument we pass to the --write_(iops|lat|bw)_log flags, it's used as a prefix for the log name.

For each of those, fio will create the log files using this format:

<name>_<log_type>.<job_number>.log

Some examples:

pkb_fio_bw.1.log - Bandwidth log for job 1
pkb_fio_clat.1.log - Completed IO latency log for job 1
pkb_fio_slat.1.log - Submitted IO latency log for job 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok thanks for clarifying

@tedsta tedsta merged commit 41ea18b into GoogleCloudPlatform:master Nov 21, 2016
@meteorfox meteorfox deleted the capture_fio_io_logs branch November 22, 2016 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants