
A Spark program to read from and write to ADLS (Azure Data Lake Store)

Here is how you can use Livy to submit jobs that process data in ADLS.

The flow is:

job submission via REST API --> Livy --> spark-dispatcher --> Mesos --> Spark job (read/write from ADLS)

After establishing an SSH tunnel to the Livy endpoint:

With credentials hard-coded in the job:

curl -v -H "Content-Type: application/json" -X POST -d '{ "file": "https:///adls-spark-examples-0.0.1-SNAPSHOT-jar-with-dependencies.jar", "className":"TestSparkWordCount" }' 'http://localhost/service/livy-spark2/batches'
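For reference, here is a minimal sketch of what a job like TestSparkWordCount might look like with the OAuth2 settings hard-coded. It assumes the standard hadoop-azure-datalake (adl://) configuration keys; the account name, client id, secret, and tenant URL are placeholders, not values from this repo, and the body is illustrative rather than the repo's actual code.

```scala
import org.apache.spark.sql.SparkSession

object TestSparkWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TestSparkWordCount").getOrCreate()

    // ADLS Gen1 OAuth2 client-credential settings, hard-coded (placeholders).
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
    hadoopConf.set("dfs.adls.oauth2.client.id", "<client-id>")
    hadoopConf.set("dfs.adls.oauth2.credential", "<client-secret>")
    hadoopConf.set("dfs.adls.oauth2.refresh.url",
      "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    // Read a text file from ADLS, count words, write the result back.
    val lines  = spark.sparkContext.textFile("adl://<account>.azuredatalakestore.net/input/words.txt")
    val counts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("adl://<account>.azuredatalakestore.net/output/wordcount")

    spark.stop()
  }
}
```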

With credentials passed as job arguments:

curl -v -H "Content-Type: application/json" -X POST -d '{ "file": "https://myjars.blob.core.windows.net/myjars/adls-spark-examples-0.0.1-SNAPSHOT-jar-with-dependencies.jar", "className":"TestHiveTable", "args": ["val for credential", "val for clientid", "https://login.microsoftonline.com/blah-blah/oauth2/token"] }' 'http://localhost/service/livy-spark2/batches'
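And a sketch of the parameterized variant, reading the same settings from the Livy "args" array in the order used by the curl call above (credential, client id, OAuth2 refresh URL). The class name matches the one in the curl call, but the ADLS path and Hive table name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object TestHiveTable {
  def main(args: Array[String]): Unit = {
    // args(0) = credential, args(1) = client id, args(2) = OAuth2 token refresh URL
    val Array(credential, clientId, refreshUrl) = args

    val spark = SparkSession.builder()
      .appName("TestHiveTable")
      .enableHiveSupport()
      .getOrCreate()

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
    hadoopConf.set("dfs.adls.oauth2.client.id", clientId)
    hadoopConf.set("dfs.adls.oauth2.credential", credential)
    hadoopConf.set("dfs.adls.oauth2.refresh.url", refreshUrl)

    // Read from ADLS and expose the data as a Hive table (placeholder names).
    val df = spark.read.text("adl://<account>.azuredatalakestore.net/input/words.txt")
    df.write.mode("overwrite").saveAsTable("adls_words")

    spark.stop()
  }
}
```

Keeping the secrets out of the jar means the same artifact can be reused across environments; the values travel only in the Livy batch request.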

(The ADLS jars are checked into src/main/resources because Maven Central does not publish them for Hadoop 2.x.)
