Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion conf/zeppelin-site.xml.template
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@

<property>
<name>zeppelin.interpreters</name>
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.jdbc.JDBCInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter,org.apache.zeppelin.scalding.ScaldingInterpreter,org.apache.zeppelin.tachyon.TachyonInterpreter</value>
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.angular.AngularInterpreter,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,org.apache.zeppelin.tajo.TajoInterpreter,org.apache.zeppelin.flink.FlinkInterpreter,org.apache.zeppelin.lens.LensInterpreter,org.apache.zeppelin.ignite.IgniteInterpreter,org.apache.zeppelin.ignite.IgniteSqlInterpreter,org.apache.zeppelin.cassandra.CassandraInterpreter,org.apache.zeppelin.geode.GeodeOqlInterpreter,org.apache.zeppelin.postgresql.PostgreSqlInterpreter,org.apache.zeppelin.jdbc.JDBCInterpreter,org.apache.zeppelin.springxd.SpringXdStreamInterpreter,org.apache.zeppelin.springxd.SpringXdJobInterpreter,org.apache.zeppelin.phoenix.PhoenixInterpreter,org.apache.zeppelin.kylin.KylinInterpreter,org.apache.zeppelin.elasticsearch.ElasticsearchInterpreter,org.apache.zeppelin.scalding.ScaldingInterpreter,org.apache.zeppelin.tachyon.TachyonInterpreter</value>
<description>Comma separated interpreter configurations. First interpreter become a default</description>
</property>

Expand Down
1 change: 1 addition & 0 deletions docs/_includes/themes/zeppelin/_navigation.html
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@
<li><a href="{{BASE_PATH}}/interpreter/spark.html">Spark</a></li>
<li><a href="{{BASE_PATH}}/interpreter/tachyon.html">Tachyon</a></li>
<li><a href="{{BASE_PATH}}/pleasecontribute.html">Tajo</a></li>
<li><a href="{{BASE_PATH}}/interpreter/springxd.html">SpringXD</a></li>
</ul>
</li>
<li>
Expand Down
150 changes: 150 additions & 0 deletions docs/interpreter/springxd.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
---
layout: page
title: "SpringXD Interpreter"
description: ""
group: manual
---
{% include JB/setup %}


## SpringXD Interpreter for Apache Zeppelin

### Overview
[Spring XD](http://docs.spring.io/spring-xd/docs/current/reference/html/#_reference_guide) is a unified, distributed, and extensible service for data ingestion, real time analytics, batch processing, and data export. SpringXD helps to build big-data applications that integrate many disparate systems into one cohesive solution across a range of use-cases.
SpringXD supports two processing abstractions: (1) [Streams](http://docs.spring.io/spring-xd/docs/current/reference/html/#streams) for event driven data, and (2) [Jobs](http://docs.spring.io/spring-xd/docs/current/reference/html/#jobs) like MR, PIG, Hive, Cascading, SQL and so on, for batch data types. Spring XD tries to blur the boundaries between the two using a channel abstraction, so that, for example, a stream can trigger a batch job, and a batch job can send events and, in turn, trigger other streams. An expressive DSL is provided for defining streams and jobs:

```
myStream = http | filter --expression=payload='foo' | log
```

Zeppelin provides interpreters for SpringXD `Streams` and `Jobs`:

<table class="table-configuration">
<tr>
<th>Name</th>
<th>Class</th>
<th>Description</th>
</tr>
<tr>
<td>%xd.stream</td>
<td>SpringXdStreamInterpreter</td>
<td>Provides an environment to create SpringXD Streams - defines the ingestion of event driven data from a <code>source</code> to a <code>sink</code> that passes through any number of <code>processors</code>. </td>
</tr>
<tr>
<td>%xd.job</td>
<td>SpringXdJobInterpreter</td>
<td>Provides an environment to create SpringXD Jobs - combine interactions with standard enterprise systems (e.g. RDBMS) as well as Hadoop operations (e.g. MapReduce, HDFS, Pig, Hive or HBase). </td>
</tr>
</table>

> Note: A single Zeppelin `paragraph` can contain multiple `streams` or multiple `jobs`, but it can not mix-up jobs and streams in the same paragraph!

<br/>
[<img align="right" src="http://img.youtube.com/vi/uSUBe8Hgk8w/0.jpg" alt="zeppelin-view" hspace="10" width="250"></img>](https://www.youtube.com/watch?v=uSUBe8Hgk8w)

Following [video tutorial](https://www.youtube.com/watch?v=uSUBe8Hgk8w) illustrates some of the features provided by the `SpringXD Interpreter`. It shows how to build SpringXD pipeline to ingest a real-time Tweet stream into Geode, HDFS and HAWQ backends.

### Create Interpreter

By default Zeppelin creates one `SpringXD` interpreter instance that communicates with the backend SpringXD system

To create new SpringXD instance open the `Interpreter` section and click the `+Create` button. Pick a `Name` of your choice and from the `Interpreter` drop-down select `xd`. Then follow the configuration instructions and `Save` the new instance.

> Note: The `Name` of the instance is used only to distinct the instances while binding them to the `Notebook`. The `Name` is irrelevant inside the `Notebook`. In the `Notebook` you must use `%xd.stream` or `%xd.job` tag.

### Bind to Notebook
In the `Notebook` click on the `settings` icon in the top right corner. Then select/deselect the XD interpreter to be bound with the `Notebook`.

### Configuration
You can modify the configuration of the SpringXD from the `Interpreter` section. The SpringXD interpreter express the following properties:

<table class="table-configuration">
<tr>
<th>Property Name</th>
<th>Description</th>
<th>Default Value</th>
</tr>
<tr>
<td>springxd.url</td>
<td>The URL for SpringXD REST API. </td>
<td>http://localhost:9393</td>
</tr>
</table>

### How to use
Following sections will explain how to create and destroy Streams and Jobs inside the Zeppelin paragraphs.
> Tip: Use `CTRL + .` for SpringXD auto-completion.


#### XD Streams

* Start the paragraphs with the full `%xd.stream` prefix tag to identifie the SpringXD Streams interpreter.
* Use `Ctrl+.` for auto-completion.
* In the paragraph define one or more stream definitions - each on a separate line.
* Every stream definition must have name. Follow the convention: `stream name = stream definition`. Streams without names are ignored. The stream definition follows the Spring XD DSL: [DSL Guide](http://docs.spring.io/spring-xd/docs/current/reference/html/#dsl-guide).
* Every time a Paragraph is Run it destroys any previous streams created in the paragraph. The previous streams are destroyed even if their names have been changed or removed.
* Paragraph `Run` command will `Create` and automatically `Deploy` all streams defined in the Paragraph.
* If one stream fails to create or to deploy then all streams in the paragraph are destroyed.
* Zeppelin returns after the streams deployment (e.g. Zeppelin goes into Finished state).
* When streams have successfully been deployed the result contains a `button` that lists the just deployed streams and status `DEPLOYED`.
* To destroy streams in a paragraph press the `Annular Button` in the paragraph result section. The button state will switch from `DEPLOYED` to `DESTROYED`.
* Streams can refer other streams or jobs in any paragraph and even different notebooks.

###### Stream Examples:
Following example creates a stream that ingests a live tweeter stream and stores it into HDFS files:

```
%xd.stream
tweets = twittersearch --query=Zeppelin --outputType=application/json | hdfs
```

Another example is to fork the above `tweets` stream, extract the tweets's IDs and use the SpringXD [Counter Sink](http://docs.spring.io/spring-xd/docs/current/reference/html/#counters-and-gauges) to count the number for tweets:

```
%xd.stream
tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter --name=tweetCounter
```

#### XD Jobs

* Start the paragraphs with the full `%xd.job` prefix tag.
* Use `Ctrl+.` for auto-completion.
* In the paragraph define one or more Job definitions - each on a separate line.
* Jobs must have names. Follow the convention: `job name = job definition`. Streams without names are ignored. The job definition follows the Spring XD DSL for job definition.
* Every new Paragraph `Run` will destroy any previous running jobs, created in the same paragraph. The jobs are destroyed even if their names have been changed or removed.
* A Paragraph Run will `Create`, automatically `Deploy` and `Start` the jobs defined in the Paragraph.
* If a job deployment fail then all jobs in the paragraph are destroyed.
* Zeppelin returns after the jobs have started (e.g. Zeppelin goes into `Finished` state).
* When job have successfully been deployed the result contains a button that lists the just deployed jobs and status `DEPLOYED`.
* To destroy all jobs in the paragraph, press the `Annular button` in the result section. Button state should switch from `DEPLOYED` to `DESTROYED`.
* Jobs can refer other streams or jobs in any paragraph and even different notebooks.

###### Job Examples:
Example module which loads [CSV files into a JDBC table](http://docs.spring.io/spring-xd/docs/current/reference/html/#file-jdbc) using a single batch job.

```
%xd.job
myJob = filejdbc --resources=file:///mycsvdir/*.csv --names=forename,surname,address --url=jdbc:mysql://localhost:3306/myTable --username=jdbcUser --password=jdbcPassword --tableName=people --initializeDatabase=true
```
The job should be defined with the resources parameter defining the files which should be loaded. It also requires a names parameter (for the CSV field names) and these should match the database column names into which the data should be stored. You can either pre-create the database table or the module will create it for you if you use --initializeDatabase=true when the job is created. The table initialization is configured in a similar way to the JDBC sink and uses the same parameters. The default table name is the job name and can be customized by setting the tableName parameter. As an example, if you run the command

Another example illustrates how a Spark Application can be deployed and launched from Spring XD as a batch job. SparkTasklet submits the Spark application into Spark cluster manager using org.apache.spark.deploy.SparkSubmit:

```
%xd.job
SparkPiExample = sparkapp --appJar=<the location of spark-examples-1.2.1 jar> --name=MyApp --master=<spark master url or local> --mainClass=org.apache.spark.examples.SparkPi
```

#### Apply Zeppelin Dynamic Forms

You can leverage [Zeppelin Dynamic Form](../manual/dynamicform.html) inside your stream and job definitions. You can use both the `text input` and `select form` parameterization features. For example to parameterize the search parameter in the twitter search stream :

```
%xd.stream

tweets = twittersearch --query=${MySearchParameter=DefaultSearchKeyword} --outputType=application/json | hdfs
```


### Auto-completion
The SpringXD Interpreter provides a comprehensive auto-completion for `Job` and `Stream` DSL definitions. On `Ctrl+.` it lists the most relevant suggestions in a pop-up window. Implementation leverages [SpringXD's Completion API](http://docs.spring.io/spring-xd/docs/current/reference/html/#completions) and provides completion experience similar to the SpringXD Shell.
1 change: 1 addition & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,7 @@
<module>phoenix</module>
<module>postgresql</module>
<module>jdbc</module>
<module>springxd</module>
<module>tajo</module>
<module>flink</module>
<module>ignite</module>
Expand Down
152 changes: 152 additions & 0 deletions springxd/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<artifactId>zeppelin</artifactId>
<groupId>org.apache.zeppelin</groupId>
<version>0.6.0-incubating-SNAPSHOT</version>
</parent>

<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-springxd</artifactId>
<packaging>jar</packaging>
<version>0.6.0-incubating-SNAPSHOT</version>

<dependencies>
<dependency>
<groupId>org.apache.zeppelin</groupId>
<artifactId>zeppelin-interpreter</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.springframework.xd</groupId>
<artifactId>spring-xd-rest-client</artifactId>
<version>1.0.1.RELEASE</version>
</dependency>

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</dependency>

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</dependency>

<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-all</artifactId>
<version>1.10.12</version>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.7</version>
<configuration>
<skip>true</skip>
</configuration>
</plugin>

<plugin>
<artifactId>maven-enforcer-plugin</artifactId>
<version>1.3.1</version>
<executions>
<execution>
<id>enforce</id>
<phase>none</phase>
</execution>
</executions>
</plugin>

<plugin>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.8</version>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/../../interpreter/springxd</outputDirectory>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer>
<includeScope>runtime</includeScope>
</configuration>
</execution>
<execution>
<id>copy-artifact</id>
<phase>package</phase>
<goals>
<goal>copy</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/../../interpreter/springxd</outputDirectory>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer>
<includeScope>runtime</includeScope>
<artifactItems>
<artifactItem>
<groupId>${project.groupId}</groupId>
<artifactId>${project.artifactId}</artifactId>
<version>${project.version}</version>
<type>${project.packaging}</type>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

<repositories>
<repository>
<id>spring-release</id>
<name>Spring Releases</name>
<url>http://repo.spring.io/libs-release</url>
</repository>
</repositories>
<name>Zeppelin: SpringXD interpreter</name>
</project>
Loading