Building Apache Flume

aborkar-ibm edited this page Sep 14, 2022 · 14 revisions

The instructions provided below specify the steps to build Apache Flume 1.8 on Linux on IBM Z for the following distributions:

  • RHEL (7.5, 7.6, 7.7, 8.0, 8.1)
  • Ubuntu (16.04, 18.04, 19.10)

General Notes:

  • When following the steps below, use a standard-permission user unless otherwise specified. 90MB of local disk space is required.
  • These instructions refer to a directory /<source_root>/. This is a temporary writable directory that you can place anywhere you like.
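A minimal sketch of preparing that directory and checking the 90MB requirement is shown here. The variable name SOURCE_ROOT and its default location are assumptions made for illustration; substitute whatever path you use for /<source_root>/.

```shell
# Hypothetical location for /<source_root>/; any writable directory works.
SOURCE_ROOT="${SOURCE_ROOT:-${TMPDIR:-/tmp}/flume_build}"
mkdir -p "$SOURCE_ROOT"

# Free space in MB on the filesystem that holds SOURCE_ROOT.
# df -Pk prints POSIX-format output in 1 KB blocks; column 4 is "Available".
free_mb=$(df -Pk "$SOURCE_ROOT" | awk 'NR==2 {print int($4/1024)}')

if [ "$free_mb" -lt 90 ]; then
  echo "Only ${free_mb} MB free in $SOURCE_ROOT; at least 90 MB is required." >&2
else
  echo "OK: ${free_mb} MB free in $SOURCE_ROOT"
fi
```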

Step 1: Building Apache Flume

1.1. Install Dependencies

  • RHEL (7.5, 7.6, 7.7, 8.0, 8.1)

    • With IBM SDK:

      • RHEL (7.x, 8.1)

        sudo yum install tar wget git telnet java-1.8.0-ibm java-1.8.0-ibm-devel
      • RHEL 8.0

        sudo yum install -y tar wget gzip
        • Download the IBM Java 8 SDK binary from IBM Java 8 and follow the instructions given at the link.
    • With OpenJDK:

      sudo yum install tar wget git telnet java-1.8.0-openjdk java-1.8.0-openjdk-devel 
    • With AdoptOpenJDK (OpenJDK8 with Eclipse OpenJ9)

      sudo yum install tar wget git telnet  
      • Download and install AdoptOpenJDK (OpenJDK8 with Eclipse OpenJ9) from here.
  • Ubuntu (16.04, 18.04, 19.10)

    • With IBM SDK:

      sudo apt-get update
      sudo apt-get install  maven protobuf-compiler tar wget git telnet
      • To install IBM Java 8, download the IBM Java 8 SDK binary from IBM Java 8 and follow the instructions given at the link.
    • With OpenJDK:

      sudo apt-get update
      sudo apt-get install  maven protobuf-compiler tar wget git telnet openjdk-8-jdk
    • With AdoptOpenJDK (OpenJDK8 with Eclipse OpenJ9)

      sudo apt-get update
      sudo apt-get install  maven protobuf-compiler tar wget git telnet
      • Download and install AdoptOpenJDK (OpenJDK8 with Eclipse OpenJ9) from here.

1.2. Build and install Protobuf (Only For RHEL) by following the instructions here.

1.3. Install Maven (Only for RHEL)

cd /<source_root>/
wget https://archive.apache.org/dist/maven/maven-3/3.5.2/binaries/apache-maven-3.5.2-bin.tar.gz
tar -xvzf apache-maven-3.5.2-bin.tar.gz

1.4. Set Environment Variables

export JAVA_HOME=<path_to_java_installation>
export M2_HOME=/usr/share/maven                    # Ubuntu only
export M2_HOME=/<source_root>/apache-maven-3.5.2   # RHEL only
export PATH=$JAVA_HOME/bin:$PATH:$M2_HOME/bin
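Before moving on, it can help to confirm that the tools installed in the earlier steps are actually visible on PATH. The helper below is a sketch only; the function name check_tool is hypothetical and not part of any of the tools involved.

```shell
# check_tool NAME: report whether NAME resolves on the current PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: NOT found on PATH" >&2
    return 1
  fi
}

# Tools used by the remaining steps; a missing one is reported, not fatal.
for tool in java mvn protoc git; do
  check_tool "$tool" || true
done
```

If java or mvn is reported as missing, recheck the JAVA_HOME and M2_HOME values set above.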

1.5. Check out the Source

cd /<source_root>/
git clone https://github.com/apache/flume.git flume
cd flume
git checkout flume-1.8

Apache Flume build requires more memory than the default configuration.

export MAVEN_OPTS="-Xms1024m -Xmx1024m -XX:MaxPermSize=1024m"

Increase the maximum number of files that can be open at once in the current shell session by running ulimit -n 4096.
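The open-file limit can be raised conditionally, so that a session that already has a higher limit is left alone. This is a sketch under the assumption that the shell's hard limit permits 4096; if it does not, the limit has to be raised by an administrator.

```shell
wanted=4096
current=$(ulimit -n)

if [ "$current" = "unlimited" ] || [ "$current" -ge "$wanted" ]; then
  echo "Open-file limit already sufficient: $current"
elif ulimit -n "$wanted" 2>/dev/null; then
  echo "Open-file limit raised from $current to $(ulimit -n)"
else
  # Raising the limit fails when the hard limit is below the requested value.
  echo "Could not raise the limit above $current; check the hard limit with: ulimit -Hn" >&2
fi
```

The setting applies only to the current shell and its child processes, so run the Maven build from the same session.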

1.6. Build the code

  • Build the code without running the tests. Use mvn install, not mvn package, because Flume is a multi-module project and the Flume project deploys Jenkins SNAPSHOT jars daily.
    mvn install -DskipTests -Drat.numUnapprovedLicenses=100

Step 2: Run Tests (Optional)

mvn install -fn -Drat.numUnapprovedLicenses=100

Note:
1. Failures in the following modules can be ignored, as they also occur on x86:
  • Flume Kafka Source
  • NG Core
  • NG Morphline Solr Sink
  • Flume NG Integration Tests

2. The NG file-based channel module may fail intermittently; it passes on rerun.

3. The NG Spillable Memory channel module may also fail; it passes after changing the Surefire version to 2.21.0 in pom.xml.

4. NG Sinks modules may fail with the error "failure to login". To resolve this issue, set:
export HADOOP_USER_NAME="hadoop"
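For the intermittent failure mentioned in note 2, a small retry wrapper can save manual reruns. The function below is a generic sketch, and the name retry is hypothetical. The module path in the commented usage line is an assumption about the Flume source layout and should be checked against your checkout.

```shell
# retry MAX CMD...: run CMD up to MAX times, stopping at the first success.
retry() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $attempt of $max failed: $*" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (assumed module path; -pl limits the Maven build to one module):
# retry 3 mvn install -pl flume-ng-channels/flume-file-channel -Drat.numUnapprovedLicenses=100
```

Building a single module this way relies on the other modules already being installed in the local Maven repository by the build in step 1.6.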

Step 3: Verify the build

3.1. Create example.conf to set up a single node data streaming server

cd /<source_root>/flume/flume-ng-dist/target/apache-flume-1.8.0-bin/apache-flume-1.8.0-bin
vi example.conf

Input the following content to example.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
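As an alternative to editing the file in vi, the same content can be written non-interactively with a here-document, which is convenient in scripts. This is a sketch; the CONF_FILE variable is introduced here for illustration and defaults to example.conf in the current directory.

```shell
# Hypothetical variable; defaults to the file name used in step 3.1.
CONF_FILE="${CONF_FILE:-example.conf}"

# The quoted 'EOF' delimiter prevents the shell from expanding anything in the body.
cat > "$CONF_FILE" <<'EOF'
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

echo "Wrote $CONF_FILE"
```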

3.2. Run Flume NG

bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

3.3. Verify

Open another session and input the following command:

telnet 127.0.0.1 44444

This starts an interactive session. Typing "Hello World" in that session streams the message to the server console.

Reference

Flume User Guide: https://flume.apache.org/FlumeUserGuide.html

Apache in Confluence Community: https://cwiki.apache.org/confluence/display/FLUME/Getting+Started
