diff --git a/assets/themes/zeppelin/img/screenshots/map_dataset.png b/assets/themes/zeppelin/img/screenshots/map_dataset.png
new file mode 100644
index 00000000000..ff38dbc7b8b
Binary files /dev/null and b/assets/themes/zeppelin/img/screenshots/map_dataset.png differ
diff --git a/assets/themes/zeppelin/img/screenshots/maps.png b/assets/themes/zeppelin/img/screenshots/maps.png
new file mode 100644
index 00000000000..412069358bd
Binary files /dev/null and b/assets/themes/zeppelin/img/screenshots/maps.png differ
diff --git a/assets/themes/zeppelin/img/screenshots/new_map_button.png b/assets/themes/zeppelin/img/screenshots/new_map_button.png
new file mode 100644
index 00000000000..d411af61a4a
Binary files /dev/null and b/assets/themes/zeppelin/img/screenshots/new_map_button.png differ
diff --git a/docs/development/datavalidation.md b/docs/development/datavalidation.md
new file mode 100644
index 00000000000..5e23932f548
--- /dev/null
+++ b/docs/development/datavalidation.md
@@ -0,0 +1,109 @@
+---
+layout: page
+title: "Data Validation Service"
+description: "Data Validation Service"
+group: development
+---
+
+## Data Validation
+
+Data validation is the process of ensuring that data in Zeppelin is clean, correct, and conforms to the data model schema. It provides a well-defined rule set for fitness and consistency checking of the data behind Zeppelin charts.
+
+### Where is the data validator used in Zeppelin?
+
+The data validator is used in Zeppelin before drawing charts or analyzing data.
+
+### Why is the data validator used?
+
+When drawing charts, you can check whether a dataset conforms to the data model schema. For example, before a dataset is visualized in a chart, it needs to be validated against the data model schema for that chart type.
+This is because different chart types have different data models. For example, Pie charts, Bar charts and Area charts have a label and a number. 
Scatter charts and Bubble charts have at least two numbers, for the x axis and the y axis, in their data models.
+
+### Why is the data validator important?
+
+When a user requests a visualization of a dataset, the data validation service runs through the dataset and checks whether it is valid against the data schema. If validation fails, it returns a message stating which record does not match the data schema. The user thus gets a more accurate visualization and can make better decisions. Researchers and data analysts can also use it to verify that the dataset is clean and that preprocessing was done correctly.
+
+### How is Data Validation done?
+
+Data Validation consists of a service, factories and configs. Data Validation is exposed as an Angular service. The data validation factory, which is extendable, contains the functional implementation. Schemas are defined as constants in the config. Basic data type validation is included by default.
+
+Developers can introduce new data validation factories for their chart types by extending the data validator factory. If a new chart uses the same data schema as an existing chart, the existing data validators can be reused.
+
+### How to use existing Data Validation services
+Zeppelin Data Validation is exposed as a service in the Zeppelin web application. It can be called with the dataset passed as a parameter.
+
+`dataValidatorSrv.(data);`
+
+This will return a message as below
+
+```javascript
+{
+  'error': true / false,
+  'msg': 'error msg / notification msg'
+}
+```
+
+
+### How to Add New Data Validation Schema
+
+Data Validation is implemented using a factory model. Therefore a customized Data Validation factory can be created by extending `DataValidator` (zeppelin-web/src/components/data-validator/data-validator-factory.js).
+
+The data model schema can be configured in 'dataModelSchemas'.
+
+```javascript
+'MapSchema': {
+  type: ['string', 'string', 'number', 'number', 'number']
+}
+```
+If more than data type validation is needed, a function for validating the record can be introduced. Range and constraint validation, code and cross-reference validation, or structured validation can be added to the Data Validation factory when needed.
+
+
+### How to Expose New Data Validation Schema in Service
+After a new data validation factory is added, it needs to be exposed in `dataValidatorSrv` (zeppelin-web/src/components/data-validator/data-validator-service.js).
+
+```javascript
+this.validateMapData = function(data) {
+  var mapValidator = mapdataValidator;
+  doBasicCheck(mapValidator, data);
+  // any custom validation can be called here
+  return buildMsg(mapValidator);
+};
+```
+
+### Adding new Data Range Validation
+
+Data Range Validation is important for some datasets. For example, a geographic information dataset contains geographic coordinates, with latitude ranging from 0° to (+/–)90° and longitude ranging from 0° to (+/–)180°. All latitude and longitude values must be inside these ranges. Therefore you can define the range in the schema and a range validation function in the factory as below.
+
+Adding range for `MapSchema`
+
+```javascript
+'MapSchema': {
+  type: ['string', 'string', 'number', 'number', 'number'],
+  range: {
+    latitude: {
+      low: -90,
+      high: 90
+    },
+    longitude: {
+      low: -180,
+      high: 180
+    }
+  }
+}
+```
+
+Validating latitude in `mapdataValidator` factory
+
+```javascript
+// Latitude measurements range from 0° to (+/–)90°; both bounds are valid values.
+function latitudeValidator(record, schema) {
+  var latitude = parseFloat(record);
+  if (schema.latitude.low <= latitude && latitude <= schema.latitude.high) {
+    msg += 'latitudes are ok | ';
+  } else {
+    msg += 'Latitude ' + record + ' is not in range | ';
+    errorStatus = true;
+  }
+}
+```
+
+A few other sample validators can be found in the zeppelin-web/src/components/data-validator/ directory.
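The range validation described in datavalidation.md can be sketched outside Angular as follows. This is a minimal, hypothetical sketch and not the zeppelin-web implementation: `validateRange` and `validateMapRecord` are illustrative names, and the column positions of latitude and longitude (2 and 3) are assumptions. Unlike the factory example, it returns the message instead of appending to shared `msg` and `errorStatus` variables.

```javascript
// Hypothetical, framework-free sketch of range validation for map records.
// `mapSchema` mirrors the `MapSchema` from the docs; the function names
// below are illustrative and not part of zeppelin-web.
var mapSchema = {
  type: ['string', 'string', 'number', 'number', 'number'],
  range: {
    latitude: { low: -90, high: 90 },
    longitude: { low: -180, high: 180 }
  }
};

// Returns an error message when `value` is not a number inside the
// inclusive range, or null when it is in range.
function validateRange(name, value, range) {
  var parsed = parseFloat(value);
  if (isNaN(parsed) || parsed < range.low || parsed > range.high) {
    return name + ' ' + value + ' is not in range';
  }
  return null;
}

// Checks one record. Assumption: latitude is column 2, longitude is column 3.
function validateMapRecord(record, schema) {
  var errors = [
    validateRange('Latitude', record[2], schema.range.latitude),
    validateRange('Longitude', record[3], schema.range.longitude)
  ].filter(function (e) { return e !== null; });
  // Same message shape as the service: an error flag plus a text message.
  return { error: errors.length > 0, msg: errors.join(' | ') };
}
```

Calling `validateMapRecord(['US', 'New York', '95', '-74.0', '10'], mapSchema)` reports the latitude as out of range, while the boundary values ±90 and ±180 are accepted.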
diff --git a/docs/development/introducingchartlibrary.md b/docs/development/introducingchartlibrary.md
new file mode 100644
index 00000000000..6dd27afe970
--- /dev/null
+++ b/docs/development/introducingchartlibrary.md
@@ -0,0 +1,64 @@
+---
+layout: page
+title: "Introducing New Chart Library"
+description: "Introducing New Chart Library"
+group: development
+---
+{% include JB/setup %}
+
+### Why are charts important in Zeppelin?
+Zeppelin is mostly used for data analysis and visualization. Depending on user requirements and datasets, the types of charts needed can differ. So Zeppelin lets users add different chart libraries and chart types.
+
+
+### Add New Chart Library
+When a JS chart library other than D3 (nvd3), which is included in Zeppelin, is needed, add the new library to zeppelin-web by adding its name to zeppelin-web/bower.json.
+
+For example, to add map visualization to Zeppelin using leaflet:
+
+```
+"leaflet": "~0.7.3" for dependencies
+```
+
+
+### Add New Chart Type
+
+First, add a button to view the new chart. Append the following lines to paragraph.html (zeppelin-web/src/app/notebook/paragraph/paragraph.html), depending on the chart you use.
+
+```xml
+
+```
+
+After the addition succeeds, the Zeppelin user will see a new chart button in the button group, as follows.
+
+
+
+ +
+
+ +Defining the chart area of the new chart type. +To define the chart view of the new chart type add the following lines to paragraph.html + +```html +
+ +
+```
+
+Different charts have different attributes and features. To handle the features of the new chart type, map those behaviours and features in the function `setGraphMode()` in the file paragraph.controller.js as follows.
+
+```javascript
+if (!type || type === 'mapChart') {
+  // set up the new chart type
+}
+```
+The current dataset can be retrieved through `$scope.paragraph.result` inside the `setGraphMode()` function.
+
+
+### Best Practices for setting up a new chart
+
+A separate function can be used to set up the new chart type. That function can then be called inside the `setMapChart()` function.
diff --git a/docs/index.md b/docs/index.md
index 5481b2490c5..8f457732a5b 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -14,6 +14,7 @@ group: nav-right
 ### Tutorial
 
 * [Tutorial](./tutorial/tutorial.html)
+* [Tutorial with Map Visualization](./tutorial/tutorialonmapvisualization.html)
 
 ### Interpreter
 
@@ -44,6 +45,6 @@ group: nav-right
 * [Writing Zeppelin Interpreter](./development/writingzeppelininterpreter.html)
 * [How to contribute (code)](./development/howtocontribute.html)
 * [How to contribute (website)](./development/howtocontributewebsite.html)
-
-
-
+* [Data Validation](./development/datavalidation.html)
+* [Introducing New Chart Library](./development/introducingchartlibrary.html)
+* [How to contribute](./development/howtocontribute.html)
diff --git a/docs/tutorial/tutorialonmapvisualization.md b/docs/tutorial/tutorialonmapvisualization.md
new file mode 100644
index 00000000000..44741d6bc02
--- /dev/null
+++ b/docs/tutorial/tutorialonmapvisualization.md
@@ -0,0 +1,84 @@
+---
+layout: page
+title: "Tutorial with Map Visualization"
+description: "Tutorial with Map Visualization"
+group: tutorial
+---
+
+## Tutorial with Map Visualization
+
+Zeppelin uses leaflet, an open source, mobile-friendly interactive map library.
+
+Before starting the tutorial you will need a dataset with geographical information. The dataset should contain location coordinates, that is, longitude and latitude. Here an online csv file is used for the next steps.
+
+```scala
+import org.apache.commons.io.IOUtils
+import java.net.URL
+import java.nio.charset.Charset
+
+
+// load map data
+val myMapText = sc.parallelize(
+  IOUtils.toString(
+    new URL("https://mysite/data.csv"),
+    Charset.forName("utf8")).split("\n"))
+```
+
+
+#### Refine Data
+
+Next, to transform the data from csv format into an RDD of Map objects, run the following script. It removes the csv header row using the filter function.
+
+```scala
+case class Map(Country:String, Name:String, lat : Float, lan : Float, Altitude : Float)
+
+val myMap = myMapText.map(s=>s.split(",")).filter(s=>s(0)!="Country").map(
+  s=>Map(s(0),
+    s(1),
+    s(2).toFloat,
+    s(3).toFloat,
+    s(4).toFloat
+  )
+)
+
+// Below line works only in spark 1.3.0.
+// For spark 1.1.x and spark 1.2.x,
+// use myMap.registerTempTable("myMap") instead.
+myMap.toDF().registerTempTable("myMap")
+```
+
+
+#### Data Retrieval and Data Validation + +Here is how the dataset is viewed as a table + +
+
+ +
+
+
+
+The dataset can be validated by calling `dataValidatorSrv`. It will validate longitude and latitude. If any record is out of range, it will point out the recordId and record value with a meaningful error message.
+
+```javascript
+var msg = dataValidatorSrv.validateMapData(data);
+```
+Now the data distribution can be viewed on a geographical map as below.
+
+```sql
+%sql
+select * from myMap
+where Country = "${Country=United States}"
+```
+
+```sql
+%sql
+select * from myMap
+where Altitude > ${Altitude=300}
+```
+
+ +
+
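The basic data type check that the tutorial's validation step relies on can be sketched as follows. This is a hypothetical illustration, not the `dataValidatorSrv` code: `matchesType` and `checkTypes` are made-up names, and the sketch assumes rows arrive as arrays of cells in the column order of the `MapSchema` type list.

```javascript
// Hypothetical sketch: check each row against a list of expected column types.
// `mapTypes` mirrors the `type` list of `MapSchema`; function names are illustrative.
var mapTypes = ['string', 'string', 'number', 'number', 'number'];

// A cell counts as a 'number' when it is a finite number or a numeric string.
function matchesType(cell, expected) {
  if (expected === 'number') {
    var isNumberLike = typeof cell === 'number' || typeof cell === 'string';
    return isNumberLike && String(cell).trim() !== '' && isFinite(Number(cell));
  }
  return typeof cell === expected;
}

// Returns { error, msg } describing the first mismatching record, if any.
function checkTypes(rows, types) {
  for (var i = 0; i < rows.length; i++) {
    if (rows[i].length !== types.length) {
      return {
        error: true,
        msg: 'Record ' + i + ' has ' + rows[i].length + ' columns, expected ' + types.length
      };
    }
    for (var j = 0; j < types.length; j++) {
      if (!matchesType(rows[i][j], types[j])) {
        return { error: true, msg: 'Record ' + i + ', column ' + j + ' is not a ' + types[j] };
      }
    }
  }
  return { error: false, msg: 'data is valid' };
}
```

Stopping at the first mismatch keeps the message short; a variant that collects every mismatching record would be equally reasonable.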