Getting Started

Note: this is for HiBench 5.0

System setup.

(1) Set up the JDK, Hadoop-YARN, and Spark runtime environments properly.

(2) For HiBench V4.0 and later, Python 2.x (>= 2.6) is required.

(3) Download or check out the HiBench benchmark suite.

(4) Build HiBench with Maven. Please specify the Spark version and the MapReduce version. For example, for Spark 1.5 and MR2, run:
```
  cd src
  mvn clean package -D spark1.5 -D MR2
```
Optionally, you can run <HiBench_Root>/bin/build-all.sh to build HiBench for all known Spark and MR versions.

HiBench Configurations.

At a minimum, create and edit conf/99-user_defined_properties.conf:
```
   cd conf
   cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf
```
Make sure the following properties are set:
```
   hibench.hadoop.home      The Hadoop installation location
   hibench.spark.home       The Spark installation location
   hibench.hdfs.master      HDFS master
   hibench.spark.master     Spark master
```
Note: For YARN mode, set hibench.spark.master to yarn-client. (yarn-cluster is not supported yet)
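A quick way to confirm the four properties above before running anything is to scan the conf file for them. The following is only a sketch: the function name `missing_hibench_props` is made up for illustration, and it assumes each property is written as a key followed by whitespace and a value on one line, with `#` starting a comment line.

```shell
# Print every required HiBench property that has no value in a conf file.
# Assumption: properties are written as "<key> <value>", separated by
# whitespace, one per line, and lines starting with "#" are comments.
missing_hibench_props() {
  local conf_file="$1" key
  for key in hibench.hadoop.home hibench.spark.home \
             hibench.hdfs.master hibench.spark.master; do
    # The key must be the first field and be followed by at least one more field.
    if ! awk -v k="$key" '$1 == k && NF >= 2 { found = 1 } END { exit !found }' "$conf_file"; then
      echo "$key"
    fi
  done
}
```

Calling `missing_hibench_props conf/99-user_defined_properties.conf` from <HiBench_Root> prints one missing key per line and prints nothing when all four are set.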

If the Spark and Hadoop versions are not auto-probed correctly, set hibench.hadoop.executable, hibench.hadoop.version and hibench.spark.version in 99-user_defined_properties.conf.

To run HiBench on HDP, set hibench.hadoop.mapreduce.home to the MapReduce home, which is normally "/usr/hdp/current/hadoop-mapreduce-client". Also set hibench.hadoop.release to "hdp".

Run. For example, to run the single workload wordcount on Spark:
```
    workloads/wordcount/prepare/prepare.sh
    workloads/wordcount/spark/scala/bin/run.sh
```
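The two steps above can be wrapped in a small helper so that the run step is skipped when data preparation fails. This is a sketch rather than part of HiBench: the function name and the `HIBENCH_ROOT` variable are hypothetical, and it assumes other workloads follow the same directory layout as wordcount. After sourcing it, `run_spark_scala_workload wordcount` performs both steps.

```shell
# Prepare input data for one workload, then run it on Spark (Scala API).
# HIBENCH_ROOT is a hypothetical variable for this sketch; it defaults to ".".
run_spark_scala_workload() {
  local workload="$1" root="${HIBENCH_ROOT:-.}"
  # "&&" skips run.sh when prepare.sh fails, so a broken prepare step
  # never leads to a benchmark run on missing input data.
  "$root/workloads/$workload/prepare/prepare.sh" &&
    "$root/workloads/$workload/spark/scala/bin/run.sh"
}
```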
You can also try <HiBench_Root>/bin/run-all.sh to run all workloads. Note: The same configuration may not work for all workloads.

View the report:

Go to <HiBench_Root>/report to check the final report:
- report/hibench.report: Overall report about all workloads.
- report/<workload>/<language APIs>/bench.log: Raw logs on client side.
- report/<workload>/<language APIs>/monitor.html: System utilization monitor results.
- report/<workload>/<language APIs>/conf/<workload>.conf: Generated environment variable configurations for this workload.
- report/<workload>/<language APIs>/conf/sparkbench/<workload>/sparkbench.conf: Generated configuration for this workload, which is used for mapping to environment variables.
- report/<workload>/<language APIs>/conf/sparkbench/<workload>/spark.conf: Generated configuration for Spark.

[Optional] Execute <HiBench_Root>/bin/report_gen_plot.py report/hibench.report to generate report figures.

Note: report_gen_plot.py requires Python 2.x and python-matplotlib.
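To see how the report paths listed above expand for a concrete run, the placeholders can be filled in with a small helper. This is an illustrative sketch: `report_files` is a made-up name, and the value passed for `<language APIs>` is an assumption (here `spark/scala`, mirroring the run script path used earlier).

```shell
# Print the report files listed above for one workload and language API.
# Arguments: <workload> <language API path>; paths are relative to <HiBench_Root>.
report_files() {
  local workload="$1" api="$2"
  local base="report/$workload/$api"
  printf '%s\n' \
    "report/hibench.report" \
    "$base/bench.log" \
    "$base/monitor.html" \
    "$base/conf/$workload.conf" \
    "$base/conf/sparkbench/$workload/sparkbench.conf" \
    "$base/conf/sparkbench/$workload/spark.conf"
}
```

For example, `report_files wordcount spark/scala | xargs ls -l`, run from <HiBench_Root>, shows which of these files a run actually produced.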
