Binary file added assets/themes/zeppelin/img/screenshots/maps.png
109 changes: 109 additions & 0 deletions docs/development/datavalidation.md
@@ -0,0 +1,109 @@
---
layout: page
title: "Data Validation Service"
description: "Data Validation Service"
group: development
---

## Data Validation

Data validation is the process of ensuring that data in Zeppelin is clean, correct and conforms to the data schema model. It provides a well-defined rule set for fitness and consistency checking of the data behind Zeppelin charts.

#### Where is the data validator used in Zeppelin?

The data validator is used in Zeppelin before drawing charts or analyzing data.

### Why is the data validator used?

Before a dataset is visualized in a chart, it needs to be validated against the data model schema for that particular chart type.
This is because different chart types have different data models. For example, Pie charts, Bar charts and Area charts have a label and a number. Scatter charts and Bubble charts have at minimum two numbers, for the x axis and the y axis, in their data models.
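
The difference between chart data models can be sketched as one list of expected column types per chart family. The schema names and the `matchesSchema` helper below are hypothetical and chosen for illustration only; they are not taken from the Zeppelin source.

```javascript
// Hypothetical schemas: one list of expected column types per chart family.
var exampleSchemas = {
  labelValue: { type: ['string', 'number'] }, // pie, bar, area
  xy: { type: ['number', 'number'] }          // scatter, bubble (minimum)
};

// Returns true when the record has enough columns and every
// column has the type the schema expects.
function matchesSchema(record, schema) {
  if (record.length < schema.type.length) {
    return false;
  }
  return schema.type.every(function(expected, i) {
    return typeof record[i] === expected;
  });
}
```

A record such as `['apples', 3]` fits the label/value schema but not the x/y schema, which is why the same dataset can be valid for one chart type and invalid for another.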

### Why is the data validator important?

When a user requests a visualization of a dataset, the data validation service runs through the dataset and checks whether it is valid against the data schema. If validation fails, it returns a message stating which record does not match the schema. The user therefore gets a more accurate visualization and can make better decisions. Researchers and data analysts can also use it to verify that a dataset is clean and that preprocessing was done correctly.
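
As a rough sketch of how a mismatching record could be reported, the hypothetical helper below walks the dataset and returns a message naming the first record and column that fail the type check. The function name and message wording are assumptions, not the Zeppelin implementation.

```javascript
// Hypothetical helper: checks each record against a list of expected types
// and reports the first record that does not match.
function validateDataset(rows, types) {
  for (var r = 0; r < rows.length; r++) {
    for (var c = 0; c < types.length; c++) {
      if (typeof rows[r][c] !== types[c]) {
        return {
          error: true,
          msg: 'record ' + r + ', column ' + c + ': expected ' + types[c] +
               ' but got ' + typeof rows[r][c]
        };
      }
    }
  }
  return { error: false, msg: 'dataset is valid' };
}
```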

### How is Data Validation done?

Data Validation consists of a service, factories and configs. It is exposed as an Angular service. The data validation factory, which is extendable, contains the functional implementation. Schemas are defined as constants in the config. Basic data type validation is included by default.

Developers can introduce new data validation factories for their chart types by extending the data validator factory. If a new chart uses the same data schema as an existing chart, the existing data validators can be reused.
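
The idea of extending a base validator can be illustrated with plain JavaScript prototypes. This is only a sketch: the real Zeppelin code is an Angular factory, and the names `BaseValidator`, `PositiveValueValidator`, `checkTypes` and `checkPositive` are hypothetical.

```javascript
// Hypothetical base validator: stores the schema and accumulates messages.
function BaseValidator(schema) {
  this.schema = schema;
  this.error = false;
  this.msg = '';
}

// Basic data type check shared by every validator.
BaseValidator.prototype.checkTypes = function(record) {
  var types = this.schema.type;
  for (var i = 0; i < types.length; i++) {
    if (typeof record[i] !== types[i]) {
      this.error = true;
      this.msg += 'column ' + i + ' is not a ' + types[i] + ' | ';
    }
  }
};

// Hypothetical chart-specific validator: reuses the base type check
// and adds one rule of its own.
function PositiveValueValidator(schema) {
  BaseValidator.call(this, schema);
}
PositiveValueValidator.prototype = Object.create(BaseValidator.prototype);
PositiveValueValidator.prototype.constructor = PositiveValueValidator;

PositiveValueValidator.prototype.checkPositive = function(record) {
  if (record[1] < 0) {
    this.error = true;
    this.msg += 'value ' + record[1] + ' is negative | ';
  }
};
```

A chart that shares the label/value schema but has no extra rule could use the base validator directly, which mirrors the reuse described above.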

### How to use existing Data Validation services
Zeppelin Data Validation is exposed as a service in the Zeppelin web application. Call it with the dataset as a parameter.

`dataValidatorSrv.<dataModelValidateName>(data);`

This returns a message object of the following form:

```javascript
{
  'error': false,                // true when validation failed, false otherwise
  'msg': 'notification message'  // holds the error message when 'error' is true
}
```
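
A caller might use the returned object to decide whether to draw the chart at all. The helper below is a hypothetical sketch; `drawIfValid` and its `draw` callback are not part of Zeppelin, and `result` is assumed to have the `error`/`msg` shape shown above.

```javascript
// Hypothetical helper: runs `draw` only when validation passed.
function drawIfValid(result, draw) {
  if (result.error) {
    return 'Validation failed: ' + result.msg;
  }
  draw();
  return 'Validation passed: ' + result.msg;
}
```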

<br />
### How to Add New Data Validation Schema

Data Validation is implemented using a factory model. A customized Data Validation factory can therefore be created by extending `DataValidator` (zeppelin-web/src/components/data-validator/data-validator-factory.js).

The data model schema is configured in `dataModelSchemas`.

```javascript
'MapSchema': {
  type: ['string', 'string', 'number', 'number', 'number']
}
```
If validation beyond data types is needed, a function for validating each record can be introduced. Range and constraint validation, code and cross-reference validation, or structured validation can be added to the Data Validation factory in the same way.

<br />
### How to Expose New Data Validation Schema in Service
After a new data validation factory is added, it needs to be exposed in `dataValidatorSrv` (zeppelin-web/src/components/data-validator/data-validator-service.js).

```javascript
this.validateMapData = function(data) {
  var mapValidator = mapdataValidator;
  doBasicCheck(mapValidator, data);
  // any custom validation can be called here
  return buildMsg(mapValidator);
};
```

### Adding new Data Range Validation

Data Range Validation matters for some datasets. For example, a geographic information dataset contains geographic coordinates: latitude ranges from -90° to +90° and longitude ranges from -180° to +180°. Every latitude and longitude value must fall inside its range. You can therefore define the range in the schema and a range validation function in the factory, as below.

Adding a range to `MapSchema`

```javascript
'MapSchema': {
  type: ['string', 'string', 'number', 'number', 'number'],
  range: {
    latitude: {
      low: -90,
      high: 90
    },
    longitude: {
      low: -180,
      high: 180
    }
  }
}
```

Validating latitude in `mapdataValidator` factory

```javascript
// Latitude ranges from -90° to +90°, bounds included.
// msg and errorStatus are assumed to be defined in the enclosing factory scope.
function latitudeValidator(record, schema) {
  var latitude = parseFloat(record);
  if (schema.latitude.low <= latitude && latitude <= schema.latitude.high) {
    msg += 'latitudes are ok | ';
  } else {
    msg += 'Latitude ' + record + ' is not in range | ';
    errorStatus = true;
  }
}
```
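
The same pattern carries over to longitude. The sketch below is a hypothetical, self-contained variant that returns its result instead of updating shared variables; `checkRange` and `longitudeRange` are illustrative names, the range object is assumed to have the `low`/`high` shape used in `MapSchema`, and the bounds are treated as inclusive.

```javascript
// Hypothetical range check; `range` is assumed to look like
// { low: <number>, high: <number> }, as in the MapSchema range entries.
function checkRange(name, value, range) {
  var parsed = parseFloat(value);
  // Non-numeric input parses to NaN and is reported as out of range.
  if (!isNaN(parsed) && range.low <= parsed && parsed <= range.high) {
    return { error: false, msg: name + ' is ok' };
  }
  return { error: true, msg: name + ' ' + value + ' is not in range' };
}

var longitudeRange = { low: -180, high: 180 };
```

Calling `checkRange('Longitude', '181', longitudeRange)` reports an error, while a value of `-180` passes because the bounds are inclusive.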

A few other sample validators can be found in the zeppelin-web/src/components/data-validator/ directory.
64 changes: 64 additions & 0 deletions docs/development/introducingchartlibrary.md
@@ -0,0 +1,64 @@
---
layout: page
title: "Introducing New Chart Library"
description: "Introducing New Chart Library"
group: development
---
{% include JB/setup %}

### Why are charts important in Zeppelin?
Zeppelin is mostly used for data analysis and visualization. The types of charts needed differ depending on user requirements and datasets, so Zeppelin lets users add different chart libraries and chart types.

<br />
### Add New Chart Library
When a JS chart library other than D3 (nvd3), which is included in Zeppelin, is needed, add the new library to zeppelin-web by adding its name to zeppelin-web/bower.json.

e.g. adding map visualization to Zeppelin using Leaflet: add the following entry under `dependencies`.

```
"leaflet": "~0.7.3"
```

<br />
### Add New Chart Type

First, add a button to view the new chart. Append the following lines to paragraph.html (zeppelin-web/src/app/notebook/paragraph/paragraph.html), adjusting them for the chart you use.

```xml
<button type="button" class="btn btn-default btn-sm"
        ng-class="{'active': isGraphMode('mapChart')}"
        ng-click="setGraphMode('mapChart', true)"><i class="fa fa-globe"></i>
</button>
```

Once this is added, the Zeppelin user will see a new chart button in the button group, as follows.

<div class="row">
<div class="col-md-8">
<img class="img-responsive" src="./../../assets/themes/zeppelin/img/screenshots/new_map_button.png" />
</div>
</div>

Next, define the chart area of the new chart type.
To define the chart view of the new chart type, add the following lines to paragraph.html.

```html
<div ng-if="getGraphMode()=='mapChart'"
     id="p{{paragraph.id}}_mapChart">
  <leaflet></leaflet>
</div>
```

Different charts have different attributes and features. To handle them for the new chart type, map those behaviours and features in the `setGraphMode()` function in paragraph.controller.js as follows.

```javascript
if (!type || type === 'mapChart') {
  // set up the new chart type
}
```
The current dataset can be retrieved from `$scope.paragraph.result` inside the `setGraphMode()` function.

<br />
### Best practices for setting up a new chart

A separate function can be used to set up each new chart type. That function can then be called inside the `setMapChart()` function.
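
As an illustration of keeping the setup logic in its own function, the hypothetical helper below converts dataset rows into plain marker descriptions. It assumes each row is an array of the form `[country, name, latitude, longitude, altitude]`; the actual shape of `$scope.paragraph.result` may differ, and `toMarkers` is not part of Zeppelin.

```javascript
// Hypothetical helper: turns rows shaped like
// [country, name, latitude, longitude, altitude] into marker descriptions.
function toMarkers(rows) {
  return rows.map(function(row) {
    return {
      label: row[1] + ', ' + row[0],
      lat: row[2],
      lng: row[3]
    };
  });
}
```

Each resulting object could then be handed to the map library, for example by creating a Leaflet marker with `L.marker([m.lat, m.lng])`. Keeping the conversion separate from the drawing code makes it easy to test without a browser.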
7 changes: 4 additions & 3 deletions docs/index.md
@@ -14,6 +14,7 @@ group: nav-right
### Tutorial

* [Tutorial](./tutorial/tutorial.html)
* [Tutorial with Map Visualization](./tutorial/tutorialonmapvisualization.html)

### Interpreter

@@ -44,6 +45,6 @@ group: nav-right
* [Writing Zeppelin Interpreter](./development/writingzeppelininterpreter.html)
* [How to contribute (code)](./development/howtocontribute.html)
* [How to contribute (website)](./development/howtocontributewebsite.html)



* [Data Validation](./development/datavalidation.html)
* [Introducing New Chart Library](./development/introducingchartlibrary.html)
* [How to contribute](./development/howtocontribute.html)
84 changes: 84 additions & 0 deletions docs/tutorial/tutorialonmapvisualization.md
@@ -0,0 +1,84 @@
---
layout: page
title: "Tutorial with Map Visualization"
description: "Tutorial with Map Visualization"
group: tutorial
---

## Tutorial with Map Visualization

Zeppelin uses Leaflet, an open source, mobile-friendly interactive map library.

Before starting the tutorial you need a dataset with geographical information. The dataset should contain location coordinates, that is, longitude and latitude. An online CSV file is used in the following steps.

```scala
import org.apache.commons.io.IOUtils
import java.net.URL
import java.nio.charset.Charset


// load map data
val myMapText = sc.parallelize(
  IOUtils.toString(
    new URL("https://mysite/data.csv"),
    Charset.forName("utf8")).split("\n"))
```

<br />
#### Refine Data

Next, to transform the data from CSV format into an RDD of `Map` objects, run the following script. It removes the CSV header row using the `filter` function.

```scala
case class Map(Country: String, Name: String, lat: Float, lan: Float, Altitude: Float)

// split each line into columns, drop the header row, and build Map objects
val myMap = myMapText.map(s => s.split(",")).filter(s => s(0) != "Country").map(
  s => Map(s(0),
    s(1),
    s(2).toFloat,
    s(3).toFloat,
    s(4).toFloat
  )
)

// Below line works only in spark 1.3.0.
// For spark 1.1.x and spark 1.2.x,
// use myMap.registerTempTable("myMap") instead.
myMap.toDF().registerTempTable("myMap")
```

<br />
#### Data Retrieval and Data Validation

Here is how the dataset looks when viewed as a table:

<div class="row">
<div class="col-md-12">
<img class="img-responsive" src="./../../assets/themes/zeppelin/img/screenshots/map_dataset.png" />
</div>
</div>


The dataset can be validated by calling `dataValidatorSrv`, which validates longitude and latitude. If any record is out of range, it points out the record ID and record value with a meaningful error message.

```javascript
var msg = dataValidatorSrv.validateMapData(data);
```
Now the data distribution can be viewed on a geographical map, as below.

```sql
%sql
select * from myMap
where Country = "${Country=United States}"
```

```sql
%sql
select * from myMap
where Altitude > ${Altitude=300}
```
<div class="row">
<div class="col-md-12">
<img class="img-responsive" src="./../../assets/themes/zeppelin/img/screenshots/maps.png" />
</div>
</div>