CAM2DistributedBackend

A distributed back-end for the CAM2 project using PySpark and HDFS.

Architecture

>> A visual presentation of the architecture, alongside the accompanying projects, is available here.

The distributed back-end uses Apache Spark for processing and the Apache Hadoop Distributed File System (HDFS) for storage. On one side, the Spark Standalone cluster consists of one master for coordination and several slaves that carry out the actual computation. On the other side, the HDFS cluster consists of one namenode that stores metadata and several datanodes that store the actual data.

The recommended cluster setup is to run the Spark master daemon and the HDFS namenode daemon on one node (manager node). Similarly, it is recommended to run the Spark slave daemon and the HDFS datanode daemon on the other nodes (worker nodes).
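As a sketch of how this layout maps onto the stock daemon scripts that ship with Spark 2.x and Hadoop 2.x (illustrative only; the CAM2StartManager and CAM2StartWorker commands described below wrap this for you, and spark://manager_host:7077 is the default Spark master URL with a placeholder host):

```shell
# Print which stock daemon-start scripts belong on each node type.
daemons_for() {
  case "$1" in
    manager)  # Spark master (coordination) + HDFS namenode (metadata)
      echo 'start-master.sh'
      echo 'hadoop-daemon.sh start namenode'
      ;;
    worker)   # Spark slave (computation) + HDFS datanode (storage)
      echo 'start-slave.sh spark://manager_host:7077'
      echo 'hadoop-daemon.sh start datanode'
      ;;
  esac
}

daemons_for manager
daemons_for worker
```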

Requirements

Requisite software

Java

  • Download and extract a release (tested with Java 8).

Apache Hadoop

  • Make sure the requisite software is installed. For example, on Debian-based Linux, issue the command:
    sudo apt install ssh pdsh
  • Download and extract a release (tested with 2.7.4).
  • Prepare its configuration in the etc/hadoop/ directory as follows:
    • In etc/hadoop/hadoop-env.sh find the line export JAVA_HOME=${JAVA_HOME} and change it to point to where Java resides.
    • In etc/hadoop/core-site.xml add the following property to the configuration tag:
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode_url:9000</value>
      </property>
      where namenode_url is the namenode host name or IP (must be reachable from the datanodes).
    • In etc/hadoop/hdfs-site.xml add the following property to the configuration tag:
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
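Put together, minimal versions of the two files might look as follows (namenode_url stays a placeholder for your namenode host; note that a dfs.replication of 1 stores each block on a single datanode, so there is no redundancy even on a multi-node cluster):

```xml
<!-- etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode_url:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```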

For more details, check the docs.

Apache Spark

  • Download and extract a release (tested with 2.2.0).

CAM2Environment

Create a file ~/CAM2Environment and use it to set the environment variables JAVA_HOME, HADOOP_HOME and SPARK_HOME to point to where each piece of software resides:

JAVA_HOME=/path/to/java
HADOOP_HOME=/path/to/hadoop
SPARK_HOME=/path/to/spark

Alternatively, set the environment variables manually and make sure they are available for the upcoming scripts.
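As a sketch of what such a file looks like in use, the variables can be picked up by sourcing it with auto-export enabled (a temporary path is used here so the example does not touch a real ~/CAM2Environment):

```shell
# Write an example environment file (the real one lives at ~/CAM2Environment).
env_file=$(mktemp)
cat > "$env_file" <<'EOF'
JAVA_HOME=/path/to/java
HADOOP_HOME=/path/to/hadoop
SPARK_HOME=/path/to/spark
EOF

# 'set -a' exports every variable assigned while the file is sourced.
set -a
. "$env_file"
set +a

rm "$env_file"
echo "SPARK_HOME is $SPARK_HOME"
```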

Installation

Using pip:

pip install git+https://github.com/muhammad-alaref/CAM2DistributedBackend

Start the cluster

On the manager node:

CAM2StartManager

On the worker nodes:

CAM2StartWorker manager_host maximum_concurrent_tasks

where manager_host is the manager host name or IP (must be reachable from the workers) and maximum_concurrent_tasks is the maximum number of tasks (cameras) assigned to this worker concurrently.

Stop the cluster

On the worker nodes:

CAM2StopWorker

On the manager node:

CAM2StopManager

>> Note the reversed order.

Local setup

Issue the manager and worker commands above on the same machine, with manager_host=localhost.

Usage

The recommended way is to use the RESTful API project.
Alternatively, the CAM2DistributedBackend command can be used on any node in the cluster (preferably the manager node); its parameters are explained by CAM2DistributedBackend --help.
