This is a virtual machine sandbox image for practicing and learning Big Data and Data Science applications.
Running Big Data applications (Spark / Cassandra / Hadoop) can be convoluted because of all their dependencies, and it can be even more of a hassle on Windows. We hope this VM sandbox will make things easier.
Elephant Scale teaches Big Data & AI / Data Science classes. This sandbox is a replica of our virtualized environment.
Check out our training classes in Big Data and Data Science.
Currently, an OVA-based virtual machine image is available. Docker images are coming 'soon'.
Note : These are LARGE downloads (10G+ in size). Download when you have good bandwidth.
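Because the image is 10G+ in size, it can help to confirm free disk space before starting the download. The following is a minimal sketch and not part of the sandbox: `has_room` and `REQUIRED_BYTES` are hypothetical names, and reading "10G+" as roughly 10 GiB is an assumption. Importing the image will likely need additional space beyond the download itself.

```python
import shutil

# Assumption: "10G+" is treated as about 10 GiB; adjust to the actual file size.
REQUIRED_BYTES = 10 * 1024 ** 3


def has_room(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    free = shutil.disk_usage(path).free
    return free >= required


if __name__ == "__main__":
    # Check the current directory; pass your download folder instead if it differs.
    if has_room("."):
        print("Enough free space for the download.")
    else:
        print("Less than 10 GiB free here; free up space or pick another disk.")
```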
- Latest version : v6
- Release date : 2017-11-12
- Download link
- For older versions, see the changelog
- You need a virtual machine 'player'. Any of these would work:
- Download the latest sandbox image
- Double-click the OVA file to open it.
Login : student
password : bigdata123
See intro lab for a screencast.
Connectivity:
- Use VM GUI : when you open this OVA file in a VM environment you will be logged into the Ubuntu desktop
- SSH via port 22
- from the host machine (note that this command uses port 2222 on localhost):
$ ssh -l student -p 2222 localhost
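If the SSH command does not connect, a quick reachability check can tell whether anything is listening on the port at all. This is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper name, and the host and port (localhost, 2222) are taken from the command above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open("localhost", 2222):
        print("Port 2222 is reachable; try: ssh -l student -p 2222 localhost")
    else:
        print("Port 2222 is not reachable; check that the VM is running.")
```

A successful check only shows that something accepts TCP connections on that port; it does not verify the login itself.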
This VM is tested with the following Big Data stack:
- Spark v2.x and v1.6
- BigDL 0.3+
- Cassandra v3.x
- Kafka v0.10
- Storm v1.x
- Zookeeper v3.4.8
If you are enrolled in our classes, you will get a lab bundle. You can also run any open source labs.
Check out our Sandbox channel for more videos.
- Based on Ubuntu 16.04 LTS
- Most software is in /usr/local/apps (also ~/apps)
- Java / Scala
- Metrics
- Python
- Python 3.6
- Anaconda v4.3.1
- Jupyter
- Editors :
- IDEs
- Eclipse Neon - ~/apps/eclipse/java-neon/eclipse/eclipse
- IntelliJ Community Edition - ~/apps/idea/bin/idea.sh
- Big Data applications supported:
- Spark
- BigDL
- Cassandra
- Kafka
- Storm
- Zookeeper
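To see what is actually installed once you are logged in, you can list the application directories mentioned above. This is a minimal sketch: `list_apps` and `APP_DIRS` are hypothetical names, and the only paths assumed are `/usr/local/apps` and `~/apps` from the list above. Directories that do not exist are skipped.

```python
import os

# Locations named in the list above; nothing else about the layout is assumed.
APP_DIRS = ["/usr/local/apps", os.path.expanduser("~/apps")]


def list_apps(directories=APP_DIRS):
    """Map each existing directory to the sorted names of its entries."""
    found = {}
    for directory in directories:
        if os.path.isdir(directory):
            found[directory] = sorted(os.listdir(directory))
    return found


if __name__ == "__main__":
    for directory, names in list_apps().items():
        print(directory)
        for name in names:
            print("  " + name)
```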
See the version history in the changelog.
We welcome your feedback about the sandbox.
- send an email to [email protected]
- or open an issue on the GitHub page