Storm Benchmark

How do we measure storm performance

The benchmark set contains 10 workloads, which fall into two categories. The first category, "simple resource benchmarks", tests how Storm performs when a particular resource is under pressure. The second category measures how Storm performs in typical real-life use cases.

Simple resource benchmarks:
- wordcount, CPU sensitive
- sol, network sensitive
- rollingsort, memory sensitive
Typical use-case benchmarks:
- rollingcount
- trident
- uniquevisitor
- pageview
- grep
- dataclean
- drpc

In real-life use cases, Kafka is often used for data ingestion. To account for that, most use-case benchmarks read data from Kafka, and they can be grouped by the corresponding data generator:

data generated by FileReadKafkaProducer
- grep
- trident
data generated by PageViewKafkaProducer
- dataclean
- drpc
- pageview
- uniquevisitor

The data generators are provided with the benchmarks and are themselves Storm applications.

How to use

We assume a Storm cluster is already set up locally.

Build.

First, build storm-benchmark.

  git clone https://github.com/manuzhang/storm-benchmark.git
  cd storm-benchmark
  mvn package
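
The run commands below refer to ${VERSION}, which depends on the build. One way to avoid typing it by hand is to locate the built jar with a glob. This is a minimal sketch, assuming the build leaves a storm-benchmark-*-jar-with-dependencies.jar under target/; find_benchmark_jar is a hypothetical helper name and is not part of stormbench.

```shell
# Hypothetical helper: print the path of the first benchmark jar found
# in the given directory (default: ./target); fail if none exists.
find_benchmark_jar() {
  dir="${1:-./target}"
  for jar in "$dir"/storm-benchmark-*-jar-with-dependencies.jar; do
    # When the glob matches nothing, $jar is the literal pattern,
    # so the -f check also covers the "no jar" case.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no benchmark jar found in $dir; did 'mvn package' succeed?" >&2
  return 1
}
```

For example, `JAR=$(find_benchmark_jar)` stores the path, which can then be passed to the -jar option.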

Run. We use SOL as an example.

  bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/sol.yaml -c topology.workers=2 storm.benchmark.tools.Runner storm.benchmark.benchmarks.SOL

-storm gives the path to the storm command that stormbench should use
-jar sets the benchmark jar, built with all its dependencies included
-conf lets the user provide a yaml conf file like storm/conf/storm.yaml. Conf files for the existing benchmarks are already provided in the storm-benchmark/conf folder
-c lets the user set a conf value on the command line without modifying conf files every time
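
Because -c overrides a conf value per invocation, it is convenient for trying several settings in a row. The sketch below only prints the SOL command for a given worker count rather than running it, so the result can be inspected first. print_sol_command is a hypothetical helper, and STORM_HOME and VERSION are assumed to be set in the environment.

```shell
# Hypothetical dry-run helper: print the stormbench command for SOL
# with the given number of workers (default 2) instead of executing it.
print_sol_command() {
  workers="${1:-2}"
  # printf reuses the '%s ' format for every argument, so the words
  # come out space-separated on a single line.
  printf '%s ' \
    bin/stormbench \
    -storm "${STORM_HOME}/bin/storm" \
    -jar "./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar" \
    -conf ./conf/sol.yaml \
    -c "topology.workers=${workers}" \
    storm.benchmark.tools.Runner \
    storm.benchmark.benchmarks.SOL
  printf '\n'
}
```

A loop such as `for w in 1 2 4; do print_sol_command "$w"; done` then lists one command per worker count, each using the same conf file.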

Check. The benchmark results are stored under the path set by the METRICS_PATH config (default: reports). They contain throughput and latency data for the whole cluster.

The result of SOL consists of two files:

1. `SOL_metrics_1402148415021.csv`. Performance data.
2. `SOL_metrics_1402148415021.yaml`. The config used to run this test.
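
After several runs the reports directory holds one such pair per run. The sketch below picks the most recent CSV for a benchmark. It assumes that the numeric suffix in the file name is a timestamp that grows with each run, which the example names suggest but which is not stated here; latest_metrics_csv is a hypothetical helper name.

```shell
# Hypothetical helper: print the newest "<name>_metrics_<number>.csv"
# in a reports directory, judged by the number in the file name.
latest_metrics_csv() {
  name="$1"
  dir="${2:-reports}"
  [ -d "$dir" ] || return 1
  # Keep only the numeric suffix of matching CSV files, then take the largest.
  ts=$(ls "$dir" \
    | sed -n -E "s/^${name}_metrics_([0-9]+)\.csv$/\1/p" \
    | sort -n \
    | tail -n 1)
  [ -n "$ts" ] || return 1
  printf '%s\n' "$dir/${name}_metrics_${ts}.csv"
}
```

With `csv=$(latest_metrics_csv SOL)`, the config used for the same run is the file next to it, `${csv%.csv}.yaml`.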

How to run a benchmark ingesting data from Kafka

We assume Storm and Kafka have been set up locally. (There is no need to create the Kafka topic beforehand; it can be auto-created when the producer sends messages to Kafka.) We also assume Storm Benchmark has been built successfully.

Here's how we run uniquevisitor, for instance.

Launch PageViewKafkaProducer.

  bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/pageview_producer.yaml storm.benchmark.tools.Runner storm.benchmark.tools.producer.kafka.PageViewKafkaProducer

Launch UniqueVisitor.

  bin/stormbench -storm ${STORM_HOME}/bin/storm -jar ./target/storm-benchmark-${VERSION}-jar-with-dependencies.jar -conf ./conf/uniquevisitor.yaml storm.benchmark.tools.Runner storm.benchmark.benchmarks.UniqueVisitor

Then we can check the metrics data as described in the previous section.

Support

Please contact:

Manu Zhang: tianlun.zhang@intel.com
Sean Zhong: xiang.zhong@intel.com

Acknowledgement

We use the SOL benchmark code (https://github.com/yahoo/storm-perf-test) from Yahoo. Thanks.
