
Getting Started

Note: this is for HiBench 5.0

  1. System setup.

    (1) Set up the JDK, Hadoop-YARN and Spark runtime environments properly.

    (2) For HiBench V4.0 and later, Python 2.x (>= 2.6) is required.

    (3) Download or check out the HiBench benchmark suite.
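
    For example, a checkout from the project's GitHub repository (the URL and tag below are commonly used defaults; adjust them to the mirror and release you actually want):

      git clone https://github.com/intel-hadoop/HiBench.git
      cd HiBench
      git checkout HiBench-5.0    # example tag name; run `git tag` to list the releases in your clone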

    (4) Build HiBench with Maven. Please specify the Spark version and MapReduce version. For example, for Spark 1.5 and MR2, run

      cd src
      mvn clean package -D spark1.5 -D MR2
    

    Optionally you can run <HiBench_Root>/bin/build-all.sh to build HiBench for all known Spark and MR versions.
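
    For instance, building against a different Spark profile follows the same pattern as above; the profile name here is an assumption, so check src/pom.xml in your checkout for the profiles that actually exist:

      cd src
      mvn clean package -D spark1.4 -D MR2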

  2. HiBench Configurations.

    At minimum, create and edit conf/99-user_defined_properties.conf:

       cd conf
       cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf
    

    Make sure the properties below have been set:

       hibench.hadoop.home      The Hadoop installation location
       hibench.spark.home       The Spark installation location
       hibench.hdfs.master      The HDFS master
       hibench.spark.master     The Spark master
    

    Note: For YARN mode, set hibench.spark.master to yarn-client. (yarn-cluster is not supported yet)
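
    As an illustration, a filled-in properties file for a YARN deployment might look like the following (the paths and host name are placeholders for your own cluster):

       hibench.hadoop.home      /usr/local/hadoop
       hibench.spark.home       /usr/local/spark
       hibench.hdfs.master      hdfs://namenode:8020
       hibench.spark.master     yarn-client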

    If the Spark and Hadoop versions are not auto-probed correctly, please set hibench.hadoop.executable, hibench.hadoop.version and hibench.spark.version in 99-user_defined_properties.conf.
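
    A minimal sketch of those overrides (the values shown are assumptions; the template's comments list the exact strings accepted for the two version properties):

       hibench.hadoop.executable    /usr/local/hadoop/bin/hadoop
       hibench.hadoop.version       hadoop2
       hibench.spark.version        spark1.5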

    To run HiBench on HDP, please set hibench.hadoop.mapreduce.home to the MapReduce home, which is normally "/usr/hdp/current/hadoop-mapreduce-client". Also set hibench.hadoop.release to "hdp".
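
    For example, the two HDP-related properties from the paragraph above would look like this in 99-user_defined_properties.conf:

       hibench.hadoop.mapreduce.home    /usr/hdp/current/hadoop-mapreduce-client
       hibench.hadoop.release           hdp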

  3. Run. For example, to run the single workload wordcount on Spark:

        workloads/wordcount/prepare/prepare.sh
        workloads/wordcount/spark/scala/bin/run.sh
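
    Other workloads follow the same layout; for example, terasort on Spark would be prepared and run the same way (assuming the workload is present in your build):

        workloads/terasort/prepare/prepare.sh
        workloads/terasort/spark/scala/bin/run.sh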
    

    You can also try <HiBench_Root>/bin/run-all.sh to run all workloads. Note: The same configuration may not work for all workloads.

  4. View the report:

    Go to <HiBench_Root>/report to check the final report:

    • report/hibench.report: Overall report about all workloads.
    • report/<workload>/<language APIs>/bench.log: Raw logs on client side.
    • report/<workload>/<language APIs>/monitor.html: System utilization monitor results.
    • report/<workload>/<language APIs>/conf/<workload>.conf: Generated environment variable configurations for this workload.
    • report/<workload>/<language APIs>/conf/sparkbench/<workload>/sparkbench.conf: Generated configuration for this workload, which is used for mapping to environment variables.
    • report/<workload>/<language APIs>/conf/sparkbench/<workload>/spark.conf: Generated configuration for Spark.
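
    Since the overall report is a plain-text table, a quick way to check the results from a shell is simply:

        cat report/hibench.report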

    [Optional] Execute <HiBench_Root>/bin/report_gen_plot.py report/hibench.report to generate report figures.

    Note: report_gen_plot.py requires Python 2.x and python-matplotlib.

