Installing and Running
To benefit from the native libraries, you need a compatible CPU/OS and GPU. The CPU native libraries were built for 64-bit IA (Intel Architecture) machines using the Intel compiler on Windows, Linux and Mac. To use GPU acceleration you need an NVIDIA GPU with the Fermi or a later architecture (GTX 500 series or higher) that supports CUDA. A number of advanced kernels only run on Kepler (GTX 600 or higher). We have tested BIDMach on:
- Windows 7, 8, 8.1 and 10 64-bit
- Red Hat Enterprise Linux 6.2, Amazon Linux, and Ubuntu 14 and 16
- Mac OS 10.5 - 10.12
Make sure a supported version of CUDA is installed on the machine (currently 9.2), and that the CUDA runtime libraries are in the library path:
- LD_LIBRARY_PATH on Linux
- %PATH% on Windows
- On Mac OSX, link ~/Library/Java/Extensions to point to the CUDA library path
To build the libraries, you also need the CUDA compiler nvcc in your $PATH. $PATH should contain the CUDA binary directory for the version of CUDA you plan to use, e.g. /usr/local/cuda-9.2/bin. To check this, do
nvcc --version
which will print out the CUDA version for the compiler.
To use GPU-accelerated deep learning kernels, you should install cuDNN 7.6 for CUDA 9.2.
You will need a Java 8 JDK, either the Oracle JDK or OpenJDK on Linux.
Note that you don't need Scala installed on your machine. Maven installs the dependencies in BIDMach's libraries, and the bidmach script loads IScala.
BIDMach is built with Maven 3.x.
Due to the additional security features on recent versions of Mac OS, it's no longer possible to pass the path to the CUDA libraries through DYLD_LIBRARY_PATH. The library path is hard-wired and cannot be changed. The easiest way to add the libraries is to make a symlink from one of the (unused) library extension directories, i.e.:
mkdir ~/Library/Java
ln -s /usr/local/cuda/lib ~/Library/Java/Extensions
You can install the latest snapshot of BIDMach by doing:
git clone https://github.com/BIDData/BIDMach
cd BIDMach
mvn clean install
You can build incrementally without the "clean", but you should include it whenever you pull from the repository, since it removes old jars. Make sure that your $PATH includes the bin directory of the CUDA version you want to use. Then you can start BIDMach with
./bidmach
which will start a BIDMach interpreter. If you have IPython/Jupyter installed, you can also run BIDMach inside a notebook by doing:
./bidmach notebook
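Once the interpreter (or a notebook kernel) is up, a quick sanity check confirms that the native libraries and GPU were found. The snippet below is a minimal sketch: it assumes the standard BIDMat imports that the bidmach script preloads, and that Mat.hasCUDA reports the number of GPUs detected while GPUmem reports memory statistics for the current GPU (check the docs for your version if these differ).
Mat.hasCUDA              // number of CUDA GPUs detected; 0 means CPU-only mode
val a = rand(1000, 1000) // a random matrix in CPU memory
val g = GMat(a)          // copy it to the current GPU (needs a working CUDA install)
val p = g * g            // a matrix multiply executed on the GPU
GPUmem                   // memory statistics for the current GPU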
The BIDMach jar can be built and run with several versions of CUDA under Maven. First download the latest native and dependent libraries with
git clone https://github.com/BIDData/BIDMach
cd BIDMach
Make sure you have Maven version 3.0 or later. You can build BIDMach for a specific CUDA version by setting the corresponding properties inside the pom file. Edit it for the BIDMat and CUDA/JCUDA version you want to use. Use the default value for CUDA 8.0. For older versions, set the version in pom.xml to "1.1.0-cuda7.0" or "1.1.0-cuda7.5". Then run
mvn clean install
To start an interpreter, do either:
./bidmach
or
mvn scala:console
NOTE: There are problems with the Scala Maven plugin and Cygwin, and the last command probably won't work from a Cygwin terminal. If you're on Windows, open a Windows command shell and run the same command.
NOTE: It's possible that the latest BIDMach source snapshot is incompatible with the last BIDMat release on Bintray, which Maven uses, leading to the compiler errors above. To fix this, go to a suitable source directory and do
git clone https://github.com/BIDData/BIDMat
cd BIDMat
mvn install
which will install an updated copy of the main BIDMat jar in your local Maven repository.
You can modify the source code to customize your own version of BIDMach, although of course it's better to include an unmodified BIDMach dependency in your Maven projects. The appropriate dependency code is in bidmach.pom. It will look something like this:
<repositories>
  <repository>
    <id>bintray-biddata-BIDData</id>
    <name>bintray</name>
    <url>http://dl.bintray.com/biddata/BIDData</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>BIDMach</groupId>
    <artifactId>BIDMach</artifactId>
    <version>2.0.10-cuda8.0beta</version>
    <type>pom</type>
  </dependency>
</dependencies>
Maven is the easiest tool to use when working with releases, and it will correctly download the right native libraries for your platform. We include build files for Scala's sbt (simple build tool), but you will first have to compile with Maven in order to pull the correct library dependencies, i.e. do
mvn clean install
Then, from the BIDMach directory, do
./sbt package
You can then run this package with "./sbt console".
The executable bundles for BIDMach are available from the main project blog site. The only prerequisite for the full bundles is a Java 7 or Java 8 runtime. The "thin" bundles also require an installation of the CUDA SDK for GPU use.
To install, unzip the bundle (on Windows) or unpack it on Linux (use tar xvzf <from_file>). The main directory contains an executable called "bidmach", which can be run from either a native shell on Linux or a Cygwin shell on Windows. If you don't have Cygwin installed, you can still call ".\bidmach" from a Windows command shell, and it will invoke a Windows command script, bidmach.cmd.
BIDMach is designed to use coarse parallelism over multiple GPUs when they are available. The various flavors of the "learnPar" method in the BIDMach.Learner class take care of the book-keeping. You can also control which GPU is used by default to run non-parallel routines with
setGPU(i)
where i is an integer index for the GPU you want to use.
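For example, here is a minimal sketch (assuming the usual BIDMach interpreter imports) of routing subsequent single-GPU work to a particular device:
setGPU(1)                       // make GPU 1 the current/default device
val g = GMat(rand(4096, 4096))  // this matrix is allocated on GPU 1
GPUmem                          // reports memory for GPU 1, the current device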
CUDA numbers the available GPUs from 0,...,k. It does not distinguish them based on performance. This causes problems, e.g., if you have installed a high-end GPU for compute work but still have a CUDA-capable low-end graphics card in the computer. You can, however, control which GPUs are visible to CUDA with the shell command:
export CUDA_VISIBLE_DEVICES=1,2
which specifies that CUDA should only use GPU numbers 1 and 2 (not zero). Do this before you start BIDMach. You will have to determine which GPU is which. If you have an up-to-date NVIDIA driver installed, you can query the devices with the <NVIDIA_DRIVER_DIR>\NVSMI\nvidia-smi command on Windows, or with nvidia-smi (usually in /usr/bin) on Linux. UPDATE: nvidia-smi seems to give inconsistent device numbers, so you may have to check the GPU identity from within BIDMach. Calling "GPUmem" will tell you the amount of memory on the current GPU, which is usually a giveaway as to which model it is.
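Since the device numbering can be confusing, one way to identify the GPUs is from inside BIDMach itself. The loop below is a sketch that assumes Mat.hasCUDA holds the number of visible devices and that GPUmem returns (fraction free, free bytes, total bytes); both are worth checking against your BIDMat version.
for (i <- 0 until Mat.hasCUDA) {
  setGPU(i)                                 // make GPU i the current device
  println("GPU %d: %s" format (i, GPUmem))  // total memory is usually enough to tell the model
}
setGPU(0)                                   // switch back to the first visible device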
CUDA works seamlessly from remote machines in Linux. Using CUDA remotely with Windows can be more challenging. High-end GPUs (the Tesla series) run in "TCC" mode in Windows, which means that they run outside the graphics system. Other GPUs can only run in "WDDM" mode, which means that they use graphics drivers and are part of the graphics system. Remote Desktop disables access to the graphics system on a remote machine, which means that you cannot use CUDA with a commodity GPU on a machine you are accessing through Remote Desktop. This is not a problem with Tesla devices. You can use another remote desktop technology, in particular most flavors of VNC (Virtual Network Computing), to access a commodity GPU. We use RealVNC for our own work on remote machines.
Finally, commodity NVIDIA GPUs do an energy-saving shutdown if they are not connected to a monitor, independent of whether a compute process is trying to use them. This is true for Windows, Linux and Mac OS. So commodity devices must always be connected to a physical monitor, or to a dummy load, to be used by CUDA. You can build a dummy load with a few pennies of hardware; directions are available online.
We are moving to Maven as the preferred development/deployment tool. Use the commands under "Installing with Maven" above to set up the build directory.
BIDMach includes some native CPU and GPU code in BIDMach/jni/src. It's integral to Random Forests, Gibbs LDA, GLM and Neural Networks on the GPU. To compile it you will need the matching CUDA SDK (currently 7.0 or 7.5) and a compiler supported by CUDA. These are listed on the CUDA SDK site. There is a simple configure script in that directory, and in most cases you can do:
./configure
make
make installcudalib
make install places a shared library in the src/main/resources/lib subdirectory of your BIDMach tree. If this doesn't work, check the configure script for a missing or erroneous shell variable.
From there, you can include that native code by running either sbt or Maven. For sbt, simply do:
./sbt package
which will include the native libs in the BIDMach jar that it creates. For Maven, you have to indicate with some properties that you want to include the local native libs rather than those from the remote repository:
mvn -Dcpu -Dgpu install
There are two native libraries, one for CPU code and one for GPU code. If you only need to build one of them, use only the corresponding flag in the command above.
We can't provide much support for developing your own GPU code, but it is not difficult. You can see from BIDMach's own kernels that it involves a CUDA C function in a .cu file in BIDMach/jni/src, a JNI wrapper in BIDMACH_CUMACH.cpp, and a native function declaration in Java in BIDMach/src/main/java/edu/berkeley/bid/CUMACH.java. Arrays are passed from Java to C using JCuda's Pointer class.
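As a rough illustration of that pattern (not BIDMach's actual API; the object and method names below are hypothetical), the JVM side only needs a native declaration whose arguments are JCuda Pointers to device memory:
import jcuda.Pointer

// Hypothetical native declaration, mirroring the style used in CUMACH.java.
// The real declarations live in edu.berkeley.bid.CUMACH and are backed by a
// JNI wrapper in BIDMACH_CUMACH.cpp, which launches the kernel defined in the
// corresponding .cu file under BIDMach/jni/src.
object MyKernels {
  @native def myKernel(a: Pointer, b: Pointer, n: Int): Int
}
At a call site, the jcuda.Pointer held by a GPU matrix is handed directly to the native method, and the JNI wrapper forwards it to the kernel launch.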
BIDMach includes a number of CPU-accelerated routines that rely on Intel's compiler and the Intel Math Kernel Library (MKL). While it would be great to use free compilers, unfortunately only Intel's build toolchain includes acceleration of matrix operations, transcendental functions, random number generation, sparse matrix operations, and portable threading (OpenMP), all of which are critical for performance. You therefore need a version of, e.g., Intel's "C++ Composer XE" for your platform. Then do:
> ./configure
> make
> make installcpulib
make install just copies the libraries into BIDMat/src/main/resources/lib. We rarely add to these, so you'll probably find the pre-compiled libs in the bundles for BIDMach are enough. If you are going to build both CPU and GPU libraries, do:
> ./configure
> make
> make install
BIDMach includes a few C routines for pre-processing large data files without having to start a Scala instance. C text processing is also typically faster than Java/Scala, which makes a big difference on large data. The C source files are in BIDMach/src/main/C/newparse. Their use is discussed in the Data Wrangling section of the docs. To compile, you should be able to do:
> ./configure
> make
> make install
If you are running Cygwin on a Windows machine, configure will by default prepare for compilation using gcc. To use a Visual Studio compiler instead, do ./configure win. make install just copies the executables into BIDMach/cbin.
Eclipse has good Scala integration, and you can download a bundle with Eclipse and the Scala plugin preinstalled. You should receive .project and .classpath files when you fetch BIDMach from GitHub, and these define the main project settings you need for Eclipse. But you will also need to connect Eclipse to the native libraries. There are two ways to do this. First, you can set the "VM arguments" in your run configuration to something like this:
-Xmx12G -Xms128M -Dfile.encoding=UTF-8 -Djava.library.path=c:/code/BIDMach/lib;C:/PROGRA~1/NVIDIA~2/CUDA/v7.5/bin
The java.library.path entry should point to both the lib subdirectory of BIDMach and the CUDA shared library path.
Second, you can add the shared library paths to the "Java Build Path" for the project. Under the project settings, select "Java Build Path", then "Source", and expand either the java or scala package. You will see an entry for "Native Library Location". Point it to either the BIDMach lib directory or the CUDA bin directory. Do the same for the other package, so that between the two packages both library paths are covered.
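Either way, a quick check that the native library path actually reached the JVM is to print the property from the BIDMach console or a Scala program run inside Eclipse (nothing BIDMach-specific is assumed here):
// Should list both the BIDMach lib directory and the CUDA shared library directory
println(System.getProperty("java.library.path"))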
BIDMach uses UTF-8 encoding in order to be able to use math characters as operators. You have to set UTF-8 encoding for editors in Eclipse, which is under Window->Preferences->General->Workspace->Text Encoding. You will also need UTF-8 support in your sbt build files; it is already there in the build files that come with the BIDMach distribution.
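As an illustration of why the encoding matters, BIDMat defines Unicode aliases for several operators. The exact operator set varies by version, so treat the specific symbols below as assumptions and check the BIDMat documentation; ∘, for instance, is used here for element-wise multiplication.
val a = rand(4, 4)
val b = rand(4, 4)
val c = a ∘ b   // element-wise (Hadamard) product, assumed equivalent to a *@ b
val d = a * b   // ordinary matrix multiply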
Not all fonts include UTF-8 math characters or print well in command-line windows. We have found the DejaVu fonts to be very good in both respects, and we strongly recommend you use them; they are freely available for download. We use them in Eclipse, in Cygwin and PuTTY command windows, and in other editors (Emacs).