markthebault/importCSVSparkCassandra

Import CSV files with Spark

This short example shows how easy it is to import CSV files from your AWS S3 buckets into Cassandra using Spark.
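For orientation, here is a minimal sketch of what such an import job can look like on Spark 1.6. It is not the code shipped in this repository. It assumes the spark-csv package and the DataStax spark-cassandra-connector are on the classpath, that the CSV header names match the column names of the target table, and it hard-codes placeholder values (host, keys, bucket, file, keyspace, table) that the real application reads from its configuration file. The object name ImportSketch is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

// Hypothetical sketch: all literal values below are placeholders.
object ImportSketch {
  def main(args: Array[String]): Unit = {
    // Tell the Cassandra connector where the cluster lives.
    val conf = new SparkConf()
      .setAppName("ImportCSV")
      .set("spark.cassandra.connection.host", "127.0.0.1")
      .set("spark.cassandra.connection.port", "9042")
    val sc = new SparkContext(conf)

    // Credentials for the s3n:// filesystem.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    val sqlContext = new SQLContext(sc)

    // Read the CSV file from S3, using the first line as column names.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("s3n://YOUR_BUCKET/YOUR_FILE.csv")

    // Append the rows to an existing Cassandra table.
    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "your_keyspace", "table" -> "your_table"))
      .mode(SaveMode.Append)
      .save()

    sc.stop()
  }
}
```

The target keyspace and table must already exist in Cassandra; this sketch does not create them.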

Setting up your application

Clone this repository:

git clone http://gitlab.ippon.fr/mthebault/simplecsvexportspark.git

Then open the file 'src/main/ressources/project.conf' and change your settings.

You need to change the following values:

  • Cassandra
    • host
    • port
    • keyspace
    • table
  • AWS
    • accessKey
    • secretKey
    • bucket
    • fileName
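As an illustration of how these values might be laid out and read, the sketch below parses an inline HOCON snippet with the Typesafe Config library. The exact key names and nesting of project.conf are an assumption based on the list above, not copied from the repository, and SettingsSketch is a hypothetical name. Check the actual file before relying on these paths.

```scala
import com.typesafe.config.ConfigFactory

// Hypothetical sketch: the layout below is assumed, not taken from project.conf.
object SettingsSketch {
  val sample: String =
    """
      |cassandra {
      |  host = "127.0.0.1"
      |  port = 9042
      |  keyspace = "my_keyspace"
      |  table = "my_table"
      |}
      |aws {
      |  accessKey = "YOUR_ACCESS_KEY"
      |  secretKey = "YOUR_SECRET_KEY"
      |  bucket = "YOUR_BUCKET"
      |  fileName = "data.csv"
      |}
    """.stripMargin

  // Parse the snippet; a real application would load the file from the classpath instead.
  private val config = ConfigFactory.parseString(sample)

  val cassandraHost: String = config.getString("cassandra.host")
  val cassandraPort: Int = config.getInt("cassandra.port")
  val keyspace: String = config.getString("cassandra.keyspace")
  val table: String = config.getString("cassandra.table")

  // Build the S3 path of the CSV file from the bucket and file name.
  val csvPath: String =
    s"s3n://${config.getString("aws.bucket")}/${config.getString("aws.fileName")}"

  def main(args: Array[String]): Unit =
    println(s"Importing $csvPath into $keyspace.$table at $cassandraHost:$cassandraPort")
}
```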

Build a JAR

To build the JAR of your application, run:

sbt clean assembly

Deploy the JAR on a Spark cluster

To deploy a JAR on a Spark cluster, make sure that port 7077 (the default port of the Spark standalone master) is reachable from the outside. Then push the JAR to a public S3 bucket:

aws s3 cp ./target/scala-2.10/ImportCSV.jar s3://YOUR_BUCKET/ImportCSV.jar

Once that is done, run the spark-submit command as follows:

$SPARK_HOME/bin/spark-submit \
	--verbose \
	--master spark://IP_SPARK_MASTER:PORT \
	--deploy-mode cluster \
	--driver-class-path /spark/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar \
	--class Application \
	https://s3-eu-west-1.amazonaws.com/YOUR_BUCKET/ImportCSV.jar

Note: Here I am using a public S3 bucket for the JARs. If you want to use a private bucket, you can use a link of the following form:

http://AWS_S3_ACCESS_KEY:AWS_S3_SECRET_KEY@YOUR_BUCKET/ImportCSV.jar

Keep in mind that the access key and secret key in this HTTP link have to be URL-encoded, because secret keys often contain characters such as '/' or '+'. An online URL encoder can do this for you.
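If you would rather not paste a secret key into a website, the encoding can also be done locally. The following is a minimal sketch using only the JDK's java.net.URLEncoder. The object name EncodeCredentials and the sample keys are made up for illustration, and the link layout simply mirrors the form quoted in the note.

```scala
import java.net.URLEncoder

// Hypothetical helper: percent-encodes credentials before embedding them in a link.
object EncodeCredentials {
  // URLEncoder targets form encoding and emits "+" for spaces,
  // so spaces are rewritten to "%20" afterwards.
  def encode(component: String): String =
    URLEncoder.encode(component, "UTF-8").replace("+", "%20")

  def main(args: Array[String]): Unit = {
    val accessKey = "AKIAEXAMPLE"  // made-up sample value
    val secretKey = "abc/def+ghi"  // made-up sample value
    println(s"http://${encode(accessKey)}:${encode(secretKey)}@YOUR_BUCKET/ImportCSV.jar")
  }
}
```

With the sample values above, the secret key is emitted as abc%2Fdef%2Bghi.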

Next

If you want to contribute to this project, feel free to do so. If you spot a mistake, please open an issue.
