diff --git a/notebook/Zeppelin Tutorial/Using Mahout_2BYEZ5EVK.zpln b/notebook/Miscellaneous Tutorial/Using Mahout_2BYEZ5EVK.zpln similarity index 100% rename from notebook/Zeppelin Tutorial/Using Mahout_2BYEZ5EVK.zpln rename to notebook/Miscellaneous Tutorial/Using Mahout_2BYEZ5EVK.zpln diff --git a/notebook/Zeppelin Tutorial/Using Pig for querying data_2C57UKYWR.zpln b/notebook/Miscellaneous Tutorial/Using Pig for querying data_2C57UKYWR.zpln similarity index 100% rename from notebook/Zeppelin Tutorial/Using Pig for querying data_2C57UKYWR.zpln rename to notebook/Miscellaneous Tutorial/Using Pig for querying data_2C57UKYWR.zpln diff --git a/notebook/Python Tutorial/IPython Basic_2EYDJKFFY.zpln b/notebook/Python Tutorial/1. IPython Basic_2EYDJKFFY.zpln similarity index 100% rename from notebook/Python Tutorial/IPython Basic_2EYDJKFFY.zpln rename to notebook/Python Tutorial/1. IPython Basic_2EYDJKFFY.zpln diff --git a/notebook/Python Tutorial/IPython Visualization Tutorial_2F1S9ZY8Z.zpln b/notebook/Python Tutorial/2. IPython Visualization Tutorial_2F1S9ZY8Z.zpln similarity index 100% rename from notebook/Python Tutorial/IPython Visualization Tutorial_2F1S9ZY8Z.zpln rename to notebook/Python Tutorial/2. IPython Visualization Tutorial_2F1S9ZY8Z.zpln diff --git a/notebook/Python Tutorial/Keras Binary Classification (IMDB)_2F2AVWJ77.zpln b/notebook/Python Tutorial/3. Keras Binary Classification (IMDB)_2F2AVWJ77.zpln similarity index 100% rename from notebook/Python Tutorial/Keras Binary Classification (IMDB)_2F2AVWJ77.zpln rename to notebook/Python Tutorial/3. Keras Binary Classification (IMDB)_2F2AVWJ77.zpln diff --git a/notebook/Python Tutorial/Matplotlib (Python, PySpark)_2C2AUG798.zpln b/notebook/Python Tutorial/4. Matplotlib (Python, PySpark)_2C2AUG798.zpln similarity index 100% rename from notebook/Python Tutorial/Matplotlib (Python, PySpark)_2C2AUG798.zpln rename to notebook/Python Tutorial/4. Matplotlib (Python, PySpark)_2C2AUG798.zpln diff --git a/notebook/R Tutorial/R Basics_2BWJFTXKJ.zpln b/notebook/R Tutorial/1. R Basics_2BWJFTXKJ.zpln similarity index 100% rename from notebook/R Tutorial/R Basics_2BWJFTXKJ.zpln rename to notebook/R Tutorial/1. R Basics_2BWJFTXKJ.zpln diff --git a/notebook/R Tutorial/Shiny App_2EZ66TM57.zpln b/notebook/R Tutorial/2. Shiny App_2EZ66TM57.zpln similarity index 100% rename from notebook/R Tutorial/Shiny App_2EZ66TM57.zpln rename to notebook/R Tutorial/2. Shiny App_2EZ66TM57.zpln diff --git a/notebook/Spark Tutorial/1. Spark Interpreter Introduction_2F8KN6TKK.zpln b/notebook/Spark Tutorial/1. Spark Interpreter Introduction_2F8KN6TKK.zpln new file mode 100644 index 00000000000..28dd67cce5c --- /dev/null +++ b/notebook/Spark Tutorial/1. Spark Interpreter Introduction_2F8KN6TKK.zpln @@ -0,0 +1,508 @@ +{ + "paragraphs": [ + { + "title": "", + "text": "%md\n\n# Introduction\n\nThis tutorial shows how to use the Spark Interpreter in Zeppelin.\n\n1. Specify `SPARK_HOME` in the interpreter setting. If you don\u0027t specify `SPARK_HOME`, Zeppelin will use the embedded spark, which can only run in local mode, and some advanced features may not work in this embedded spark.\n2. Specify `master` for the spark execution mode.\n * `local[*]` - Driver and Executor would both run on the same host as the zeppelin server. It is only for testing and POC, not for production.
\n * `yarn-client` - Driver would run on the same host as the zeppelin server, which increases memory pressure on the zeppelin server machine.\n * `yarn-cluster` - Driver would run on a remote node of the yarn cluster; it is supported only in 0.8.0 and later. yarn-cluster is preferred over yarn-client as it mitigates the memory pressure on the zeppelin server.\n * `standalone` - Just specify master to be the spark master address. e.g. spark://HOST:PORT\n * `mesos` - This mode is not sufficiently tested in Zeppelin, so you may hit weird issues when using it.\n3. Create a different spark interpreter for each spark version. If you want to use different spark versions in the same Zeppelin instance, you can create a separate spark interpreter for each spark version. For each interpreter, you need to set its `SPARK_HOME` properly so that it points to the correct spark distribution. e.g. You can use the default spark interpreter named `spark` for spark 2.4 and create another spark interpreter named `spark3` for spark 3.0\n", + "user": "anonymous", + "dateUpdated": "2020-05-04 13:44:39.482", + "config": { + "tableHide": false, + "editorSetting": { + "language": "markdown", + "editOnDblClick": true, + "completionKey": "TAB", + "completionSupport": false + }, + "colWidth": 12.0, + "editorMode": "ace/mode/markdown", + "fontSize": 9.0, + "editorHide": true, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eThis tutorial shows how to use the Spark Interpreter in Zeppelin.\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eSpecify \u003ccode\u003eSPARK_HOME\u003c/code\u003e in the interpreter setting. If you don\u0026rsquo;t specify \u003ccode\u003eSPARK_HOME\u003c/code\u003e, Zeppelin will use the embedded spark, which can only run in local mode, and some advanced features may not work in this embedded spark.\u003c/li\u003e\n\u003cli\u003eSpecify \u003ccode\u003emaster\u003c/code\u003e for the spark execution mode.\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003elocal[*]\u003c/code\u003e - Driver and Executor would both run on the same host as the zeppelin server. It is only for testing and POC, not for production.\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003eyarn-client\u003c/code\u003e - Driver would run on the same host as the zeppelin server, which increases memory pressure on the zeppelin server machine.\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003eyarn-cluster\u003c/code\u003e - Driver would run on a remote node of the yarn cluster; it is supported only in 0.8.0 and later. yarn-cluster is preferred over yarn-client as it mitigates the memory pressure on the zeppelin server.\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003estandalone\u003c/code\u003e - Just specify master to be the spark master address. e.g. \u003ca href\u003d\"spark://HOST:PORT\"\u003espark://HOST:PORT\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003emesos\u003c/code\u003e - This mode is not sufficiently tested in Zeppelin, so you may hit weird issues when using it.\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003eCreate a different spark interpreter for each spark version. If you want to use different spark versions in the same Zeppelin instance, you can create a separate spark interpreter for each spark version.
For each interpreter, you need to set its \u003ccode\u003eSPARK_HOME\u003c/code\u003e properly so that it points to the correct spark distribution. e.g. You can use the default spark interpreter named \u003ccode\u003espark\u003c/code\u003e for spark 2.4 and create another spark interpreter named \u003ccode\u003espark3\u003c/code\u003e for spark 3.0\u003c/li\u003e\n\u003c/ol\u003e\n\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762311_275695566", + "id": "20180530-211919_1936070943", + "dateCreated": "2020-04-30 10:46:02.311", + "dateStarted": "2020-05-04 13:44:39.482", + "dateFinished": "2020-05-04 13:44:39.508", + "status": "FINISHED" + }, + { + "title": "Use Generic Inline Configuration instead of Interpreter Setting", + "text": "%md\n\nCustomizing your spark interpreter is indispensable for Zeppelin Notebook. E.g. you may want to add third party jars, change the execution mode, change the number of executors or their memory, etc. You can check this link for all the available [spark configuration](http://spark.apache.org/docs/latest/configuration.html)\nAlthough you can customize these in the interpreter setting, it is recommended to do it via the generic inline configuration. Because the interpreter setting is shared globally, it is intended to be managed by the admin, not by users. Users are recommended to customize the spark interpreter via the generic inline configuration `%spark.conf`\n\nThe following is an example of how to customize your spark interpreter. Note that you have to run this paragraph first, before the spark interpreter process is launched, because these customizations won\u0027t take effect after the spark interpreter process is launched.", + "user": "anonymous", + "dateUpdated": "2020-05-04 13:45:44.204", + "config": { + "tableHide": false, + "editorSetting": { + "language": "markdown", + "editOnDblClick": true, + "completionKey": "TAB", + "completionSupport": false + }, + "colWidth": 12.0, + "editorMode": "ace/mode/markdown", + "fontSize": 9.0, + "editorHide": true, + "title": true, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eCustomizing your spark interpreter is indispensable for Zeppelin Notebook. E.g. you may want to add third party jars, change the execution mode, change the number of executors or their memory, etc. You can check this link for all the available \u003ca href\u003d\"http://spark.apache.org/docs/latest/configuration.html\"\u003espark configuration\u003c/a\u003e\u003cbr /\u003e\nAlthough you can customize these in the interpreter setting, it is recommended to do it via the generic inline configuration. Because the interpreter setting is shared globally, it is intended to be managed by the admin, not by users. Users are recommended to customize the spark interpreter via the generic inline configuration \u003ccode\u003e%spark.conf\u003c/code\u003e\u003c/p\u003e\n\u003cp\u003eThe following is an example of how to customize your spark interpreter. Note that you have to run this paragraph first, before the spark interpreter process is launched,
because these customizations won\u0026rsquo;t take effect after the spark interpreter process is launched.\u003c/p\u003e\n\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762316_737450410", + "id": "20180531-100923_1307061430", + "dateCreated": "2020-04-30 10:46:02.316", + "dateStarted": "2020-05-04 13:45:44.204", + "dateFinished": "2020-05-04 13:45:44.221", + "status": "FINISHED" + }, + { + "title": "Generic Inline Configuration", + "text": "%spark.conf\n\nSPARK_HOME \u003cPATH_TO_SPARK_HOME\u003e\n\n# set driver memory to 8g\nspark.driver.memory 8g\n\n# set executor number to be 6\nspark.executor.instances 6\n\n# set executor memory to 4g\nspark.executor.memory 4g\n\n# Any other spark properties can be set here. Here\u0027s the available spark configuration you can set. (http://spark.apache.org/docs/latest/configuration.html)\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 10:56:30.840", + "config": { + "editorSetting": { + "language": "text", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "colWidth": 12.0, + "editorMode": "ace/mode/text", + "fontSize": 9.0, + "results": {}, + "enabled": true, + "title": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762316_1311021507", + "id": "20180531-101615_648039641", + "dateCreated": "2020-04-30 10:46:02.316", + "status": "READY" + }, + { + "title": "Use Third Party Library", + "text": "%md\n\nThere\u0027re 2 ways to add third party libraries.\n\n* `Generic Inline Configuration` It is the recommended way to add third party jars/packages. Use `spark.jars` for adding local jar files and `spark.jars.packages` for adding packages\n* `Interpreter Setting` You can also configure `spark.jars` and `spark.jars.packages` in the interpreter setting, but adding third party libraries is usually application specific. It is recommended to use `Generic Inline Configuration` so that users can see clearly what dependencies this note needs, and it is also easy to rerun this note in another environment. Otherwise you would need to create many interpreters for notes with different dependencies.\n\nThe following is an example where we use the package `com.databricks:spark-avro_2.11:4.0.0` for reading avro data.\n1. First we specify it in `%spark.conf`\n2. Then we can use it in the next paragraph\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 10:59:35.270", + "config": { + "tableHide": false, + "editorSetting": { + "language": "markdown", + "editOnDblClick": true, + "completionKey": "TAB", + "completionSupport": false + }, + "colWidth": 12.0, + "editorMode": "ace/mode/markdown", + "fontSize": 9.0, + "editorHide": true, + "title": true, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;re 2 ways to add third party libraries.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003eGeneric Inline Configuration\u003c/code\u003e It is the recommended way to add third party jars/packages.
Use \u003ccode\u003espark.jars\u003c/code\u003e for adding local jar files and \u003ccode\u003espark.jars.packages\u003c/code\u003e for adding packages\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003eInterpreter Setting\u003c/code\u003e You can also configure \u003ccode\u003espark.jars\u003c/code\u003e and \u003ccode\u003espark.jars.packages\u003c/code\u003e in the interpreter setting, but adding third party libraries is usually application specific. It is recommended to use \u003ccode\u003eGeneric Inline Configuration\u003c/code\u003e so that users can see clearly what dependencies this note needs, and it is also easy to rerun this note in another environment. Otherwise you would need to create many interpreters for notes with different dependencies.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe following is an example where we use the package \u003ccode\u003ecom.databricks:spark-avro_2.11:4.0.0\u003c/code\u003e for reading avro data.\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eFirst we specify it in \u003ccode\u003e%spark.conf\u003c/code\u003e\u003c/li\u003e\n\u003cli\u003eThen we can use it in the next paragraph\u003c/li\u003e\n\u003c/ol\u003e\n\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762323_1299339607", + "id": "20180530-212309_72587811", + "dateCreated": "2020-04-30 10:46:02.323", + "dateStarted": "2020-04-30 10:59:35.271", + "dateFinished": "2020-04-30 10:59:35.282", + "status": "FINISHED" + }, + { + "title": "", + "text": "%spark.conf\n\n# Must set SPARK_HOME for this example, because it won\u0027t work in Zeppelin\u0027s embedded spark mode. The embedded spark mode doesn\u0027t \n# use spark-submit to launch the spark interpreter, so spark.jars and spark.jars.packages won\u0027t take effect.
\nSPARK_HOME \u003cPATH_TO_SPARK_HOME\u003e\n\n# set execution mode\nmaster yarn-client\n\n# spark.jars can be used for adding any local jar files into spark interpreter\n# spark.jars \u003cpath_to_local_jar\u003e\n\n# spark.jars.packages can be used for adding packages into spark interpreter\n# The following is to add avro into your spark interpreter\nspark.jars.packages com.databricks:spark-avro_2.11:4.0.0\n\n\n\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 11:01:36.681", + "config": { + "editorSetting": { + "language": "text", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "colWidth": 6.0, + "editorMode": "ace/mode/text", + "fontSize": 9.0, + "results": {}, + "enabled": true, + "editorHide": false + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762323_630094194", + "id": "20180530-222209_612020876", + "dateCreated": "2020-04-30 10:46:02.324", + "status": "READY" + }, + { + "title": "", + "text": "%spark\n\nimport com.databricks.spark.avro._\n\nval df \u003d spark.read.format(\"com.databricks.spark.avro\").load(\"users.avro\")\ndf.printSchema\n\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 10:46:02.324", + "config": { + "editorSetting": { + "language": "scala", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "colWidth": 6.0, + "editorMode": "ace/mode/scala", + "fontSize": 9.0, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "TEXT", + "data": "import com.databricks.spark.avro._\ndf: org.apache.spark.sql.DataFrame \u003d [name: string, favorite_color: string ... 1 more field]\n+------+--------------+----------------+\n| name|favorite_color|favorite_numbers|\n+------+--------------+----------------+\n|Alyssa| null| [3, 9, 15, 20]|\n| Ben| red| []|\n+------+--------------+----------------+\n\n" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762324_60233930", + "id": "20180530-222838_1995256600", + "dateCreated": "2020-04-30 10:46:02.324", + "status": "READY" + }, + { + "title": "Enable Hive", + "text": "%md\n\nIf you want to work with hive tables, you need to enable hive via the following 2 steps:\n\n1. Set `zeppelin.spark.useHiveContext` to `true`\n2. Put `hive-site.xml` under `SPARK_CONF_DIR` (By default it is the conf folder of `SPARK_HOME`). \n\n**Note**, you can only enable hive when specifying `SPARK_HOME` explicitly.
It doesn\u0027t work with zeppelin\u0027s embedded spark.\n\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 10:46:02.324", + "config": { + "editorSetting": { + "language": "scala", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "colWidth": 12.0, + "editorMode": "ace/mode/scala", + "fontSize": 9.0, + "editorHide": true, + "title": true, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eIf you want to work with hive tables, you need to enable hive via the following 2 steps:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eSet \u003ccode\u003ezeppelin.spark.useHiveContext\u003c/code\u003e to \u003ccode\u003etrue\u003c/code\u003e\u003c/li\u003e\n \u003cli\u003ePut \u003ccode\u003ehive-site.xml\u003c/code\u003e under \u003ccode\u003eSPARK_CONF_DIR\u003c/code\u003e (By default it is the conf folder of \u003ccode\u003eSPARK_HOME\u003c/code\u003e).\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003e\u003cstrong\u003eNote\u003c/strong\u003e, you can only enable hive when specifying \u003ccode\u003eSPARK_HOME\u003c/code\u003e explicitly. It doesn\u0026rsquo;t work with zeppelin\u0026rsquo;s embedded spark.\u003c/p\u003e\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762324_975709991", + "id": "20180601-095002_1719356880", + "dateCreated": "2020-04-30 10:46:02.324", + "status": "READY" + }, + { + "title": "Code Completion in Scala", + "text": "%md\n\nSpark interpreter provides a code completion feature. As long as you type `tab`, code completion will start to work and provide you with a list of candidates. Here\u0027s one screenshot of how it works. \n\n**Note**, code completion only works after the spark interpreter is launched. So it will not work when you type code in the first paragraph, as the spark interpreter is not launched yet. For me, usually I will run one simple snippet such as `sc.version` to launch the spark interpreter, then type my code to leverage the code completion of the spark interpreter.\n\n![code_completion](https://user-images.githubusercontent.com/164491/40758276-1ab2783e-64bf-11e8-9c1e-d132455234b3.gif)", + "user": "anonymous", + "dateUpdated": "2020-04-30 11:03:03.127", + "config": { + "tableHide": false, + "editorSetting": { + "language": "markdown", + "editOnDblClick": true, + "completionKey": "TAB", + "completionSupport": false + }, + "colWidth": 12.0, + "editorMode": "ace/mode/markdown", + "fontSize": 9.0, + "editorHide": true, + "title": true, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eSpark interpreter provides a code completion feature. As long as you type \u003ccode\u003etab\u003c/code\u003e, code completion will start to work and provide you with a list of candidates. Here\u0026rsquo;s one screenshot of how it works.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNote\u003c/strong\u003e, code completion only works after the spark interpreter is launched. So it will not work when you type code in the first paragraph, as the spark interpreter is not launched yet.
For me, usually I will run one simple snippet such as \u003ccode\u003esc.version\u003c/code\u003e to launch the spark interpreter, then type my code to leverage the code completion of the spark interpreter.\u003c/p\u003e\n\u003cp\u003e\u003cimg src\u003d\"https://user-images.githubusercontent.com/164491/40758276-1ab2783e-64bf-11e8-9c1e-d132455234b3.gif\" alt\u003d\"code_completion\" /\u003e\u003c/p\u003e\n\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762324_1893956125", + "id": "20180531-095404_2000387113", + "dateCreated": "2020-04-30 10:46:02.324", + "dateStarted": "2020-04-30 11:03:03.136", + "dateFinished": "2020-04-30 11:03:03.147", + "status": "FINISHED" + }, + { + "title": "PySpark", + "text": "%md\n\nFor using PySpark, you need to do some additional pyspark configuration besides the spark configuration mentioned above. The most important property you need to set is the python path for both driver and executor. If you hit the following error, it means the python on your driver is mismatched with that of the executors. In this case you need to check the 2 properties: `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON`. (You can use `spark.pyspark.python` and `spark.pyspark.driver.python` instead if you are using spark 2.1.0 or later)\n\n```\nPy4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.\n: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, localhost, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):\n File \"/Users/jzhang/Java/lib/spark-2.3.0-bin-hadoop2.7/python/pyspark/worker.py\", line 175, in main\n (\"%d.%d\" % sys.version_info[:2], version))\nException: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.\n```\n\nAlso it is better to specify them in the `generic inline configuration` like the following paragraph.\n\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 11:04:18.086", + "config": { + "tableHide": false, + "editorSetting": { + "language": "markdown", + "editOnDblClick": true, + "completionKey": "TAB", + "completionSupport": false + }, + "colWidth": 12.0, + "editorMode": "ace/mode/markdown", + "fontSize": 9.0, + "editorHide": true, + "title": true, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eFor using PySpark, you need to do some additional pyspark configuration besides the spark configuration mentioned above. The most important property you need to set is the python path for both driver and executor. If you hit the following error, it means the python on your driver is mismatched with that of the executors. In this case you need to check the 2 properties: \u003ccode\u003ePYSPARK_PYTHON\u003c/code\u003e and \u003ccode\u003ePYSPARK_DRIVER_PYTHON\u003c/code\u003e.
(You can use \u003ccode\u003espark.pyspark.python\u003c/code\u003e and \u003ccode\u003espark.pyspark.driver.python\u003c/code\u003e instead if you are using spark 2.1.0 or later)\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003ePy4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.\n: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, localhost, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):\n File \u0026quot;/Users/jzhang/Java/lib/spark-2.3.0-bin-hadoop2.7/python/pyspark/worker.py\u0026quot;, line 175, in main\n (\u0026quot;%d.%d\u0026quot; % sys.version_info[:2], version))\nException: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eAlso it is better to specify them in the \u003ccode\u003egeneric inline configuration\u003c/code\u003e like the following paragraph.\u003c/p\u003e\n\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762324_1658396672", + "id": "20180531-104119_406393728", + "dateCreated": "2020-04-30 10:46:02.324", + "dateStarted": "2020-04-30 11:04:18.087", + "dateFinished": "2020-04-30 11:04:18.098", + "status": "FINISHED" + }, + { + "title": "", + "text": "%spark.conf\n\n# If your python path on the driver and executors is the same, then you only need to set PYSPARK_PYTHON\nPYSPARK_PYTHON \u003cpython_path\u003e\nspark.pyspark.python \u003cpython_path\u003e\n\n# You need to set PYSPARK_DRIVER_PYTHON as well if your python path on the driver is different from that on the executors.\nPYSPARK_DRIVER_PYTHON \u003cpython_path\u003e\nspark.pyspark.driver.python \u003cpython_path\u003e\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 11:04:52.984", + "config": { + "editorSetting": { + "language": "text", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "colWidth": 12.0, + "editorMode": "ace/mode/text", + "fontSize": 9.0, + "editorHide": false, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762324_496073034", + "id": "20180531-110822_21877516", + "dateCreated": "2020-04-30 10:46:02.324", + "status": "READY" + }, + { + "title": "Use IPython", + "text": "%md\n\nStarting from Zeppelin 0.8.0, `ipython` is integrated into Zeppelin. `PySparkInterpreter` (`%spark.pyspark`) will use `ipython` if it is available. It is recommended to use the `ipython` interpreter as it provides more powerful features than the old PythonInterpreter. Spark creates a new interpreter called `IPySparkInterpreter` (`%spark.ipyspark`) which uses IPython underneath. You can use all the `ipython` features in this IPySparkInterpreter. There\u0027s one ipython tutorial note in Zeppelin which you can refer to for more details.\n\n`spark.pyspark` will try to use `ipython` if it is available, and will fall back to the old PySpark implementation if `ipython` is not available.
But you can always use `ipython` via `%spark.ipyspark`.\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 11:10:07.426", + "config": { + "tableHide": false, + "editorSetting": { + "language": "markdown", + "editOnDblClick": true, + "completionKey": "TAB", + "completionSupport": false + }, + "colWidth": 12.0, + "editorMode": "ace/mode/markdown", + "fontSize": 9.0, + "editorHide": true, + "title": true, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eStarting from Zeppelin 0.8.0, \u003ccode\u003eipython\u003c/code\u003e is integrated into Zeppelin. \u003ccode\u003ePySparkInterpreter\u003c/code\u003e (\u003ccode\u003e%spark.pyspark\u003c/code\u003e) will use \u003ccode\u003eipython\u003c/code\u003e if it is available. It is recommended to use the \u003ccode\u003eipython\u003c/code\u003e interpreter as it provides more powerful features than the old PythonInterpreter. Spark creates a new interpreter called \u003ccode\u003eIPySparkInterpreter\u003c/code\u003e (\u003ccode\u003e%spark.ipyspark\u003c/code\u003e) which uses IPython underneath. You can use all the \u003ccode\u003eipython\u003c/code\u003e features in this IPySparkInterpreter. There\u0026rsquo;s one ipython tutorial note in Zeppelin which you can refer to for more details.\u003c/p\u003e\n\u003cp\u003e\u003ccode\u003espark.pyspark\u003c/code\u003e will try to use \u003ccode\u003eipython\u003c/code\u003e if it is available, and will fall back to the old PySpark implementation if \u003ccode\u003eipython\u003c/code\u003e is not available. But you can always use \u003ccode\u003eipython\u003c/code\u003e via \u003ccode\u003e%spark.ipyspark\u003c/code\u003e.\u003c/p\u003e\n\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762324_1303560480", + "id": "20180531-104646_1689036640", + "dateCreated": "2020-04-30 10:46:02.324", + "dateStarted": "2020-04-30 11:10:07.428", + "dateFinished": "2020-04-30 11:10:07.436", + "status": "FINISHED" + }, + { + "title": "Enable Impersonation", + "text": "%md\n\nBy default, all the spark interpreters will run as the user who launches the zeppelin server. This is OK for a single user, but exposes potential issues for multiple user scenarios. For multiple user scenarios, it is better to enable impersonation for the Spark Interpreter in yarn mode.\nThere are 3 steps you need to do to enable impersonation.\n\n1. Enable it in the Spark Interpreter Setting. You have to choose Isolated Per User mode, and then click the impersonation option as in the following screenshot. ![screen shot 2018-05-31 at 1 35 34 pm](https://user-images.githubusercontent.com/164491/40763519-b76fa56c-64d7-11e8-9d49-53928a04ba5d.png)\n2. Add the following configuration in the core-site.xml of your hadoop cluster, and then restart the hadoop cluster (restart hdfs and yarn). \u003cuser_name\u003e is the user who launches the zeppelin server. \n\n```xml\n\u003cproperty\u003e\n \u003cname\u003ehadoop.proxyuser.\u003cuser_name\u003e.groups\u003c/name\u003e\n \u003cvalue\u003e*\u003c/value\u003e\n\u003c/property\u003e\n\n\u003cproperty\u003e\n \u003cname\u003ehadoop.proxyuser.\u003cuser_name\u003e.hosts\u003c/name\u003e\n \u003cvalue\u003e*\u003c/value\u003e\n\u003c/property\u003e\n```\n3. Create the user home folder on hdfs and also set the right permission on this folder.
Spark will use this home folder as the staging directory, which is used to upload spark jars and other dependencies needed by the yarn containers. Here\u0027s a sample output of the command `hadoop fs -ls /user`\n```\ndrwxr-xr-x - user1 supergroup 0 2018-05-31 13:41 /user/user1\ndrwxr-xr-x - user2 supergroup 0 2017-01-10 12:31 /user/user2\n```\nYou can use the following commands to create the home folder for `user1` and also set the proper permission.\n```\nhadoop fs -mkdir /user/user1\nhadoop fs -chown user1 /user/user1\n```\n\nAfter all these steps, impersonation should work in the yarn web ui. E.g. in the following screenshot, we can see that the yarn app runs as user `user1` instead of the user who runs the zeppelin server.\n\n![screen shot 2018-05-31 at 1 47 05 pm](https://user-images.githubusercontent.com/164491/40763896-330dc8f6-64d9-11e8-9737-92d8371e85ae.png)", + "user": "anonymous", + "dateUpdated": "2020-04-30 11:11:51.383", + "config": { + "tableHide": false, + "editorSetting": { + "language": "markdown", + "editOnDblClick": true, + "completionKey": "TAB", + "completionSupport": false + }, + "colWidth": 12.0, + "editorMode": "ace/mode/markdown", + "fontSize": 9.0, + "editorHide": true, + "title": true, + "results": {}, + "enabled": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eBy default, all the spark interpreters will run as the user who launches the zeppelin server. This is OK for a single user, but exposes potential issues for multiple user scenarios. For multiple user scenarios, it is better to enable impersonation for the Spark Interpreter in yarn mode.\u003cbr /\u003e\nThere are 3 steps you need to do to enable impersonation.\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eEnable it in the Spark Interpreter Setting. You have to choose Isolated Per User mode, and then click the impersonation option as in the following screenshot. \u003cimg src\u003d\"https://user-images.githubusercontent.com/164491/40763519-b76fa56c-64d7-11e8-9d49-53928a04ba5d.png\" alt\u003d\"screen shot 2018-05-31 at 1 35 34 pm\" /\u003e\u003c/li\u003e\n\u003cli\u003eAdd the following configuration in the core-site.xml of your hadoop cluster, and then restart the hadoop cluster (restart hdfs and yarn). \u0026lt;user_name\u0026gt; is the user who launches the zeppelin server.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cpre\u003e\u003ccode class\u003d\"language-xml\"\u003e\u0026lt;property\u0026gt;\n \u0026lt;name\u0026gt;hadoop.proxyuser.\u0026lt;user_name\u0026gt;.groups\u0026lt;/name\u0026gt;\n \u0026lt;value\u0026gt;*\u0026lt;/value\u0026gt;\n\u0026lt;/property\u0026gt;\n\n\u0026lt;property\u0026gt;\n \u0026lt;name\u0026gt;hadoop.proxyuser.\u0026lt;user_name\u0026gt;.hosts\u0026lt;/name\u0026gt;\n \u0026lt;value\u0026gt;*\u0026lt;/value\u0026gt;\n\u0026lt;/property\u0026gt;\n\u003c/code\u003e\u003c/pre\u003e\n\u003col start\u003d\"3\"\u003e\n\u003cli\u003eCreate the user home folder on hdfs and also set the right permission on this folder. Spark will use this home folder as the staging directory, which is used to upload spark jars and other dependencies needed by the yarn containers.
Here\u0026rsquo;s a sample output of the command \u003ccode\u003ehadoop fs -ls /user\u003c/code\u003e\u003c/li\u003e\n\u003c/ol\u003e\n\u003cpre\u003e\u003ccode\u003edrwxr-xr-x - user1 supergroup 0 2018-05-31 13:41 /user/user1\ndrwxr-xr-x - user2 supergroup 0 2017-01-10 12:31 /user/user2\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eYou can use the following commands to create the home folder for \u003ccode\u003euser1\u003c/code\u003e and also set the proper permission.\u003c/p\u003e\n\u003cpre\u003e\u003ccode\u003ehadoop fs -mkdir /user/user1\nhadoop fs -chown user1 /user/user1\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eAfter all these steps, impersonation should work in the yarn web ui. E.g. in the following screenshot, we can see that the yarn app runs as user \u003ccode\u003euser1\u003c/code\u003e instead of the user who runs the zeppelin server.\u003c/p\u003e\n\u003cp\u003e\u003cimg src\u003d\"https://user-images.githubusercontent.com/164491/40763896-330dc8f6-64d9-11e8-9737-92d8371e85ae.png\" alt\u003d\"screen shot 2018-05-31 at 1 47 05 pm\" /\u003e\u003c/p\u003e\n\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762324_1470430553", + "id": "20180531-105943_1008146830", + "dateCreated": "2020-04-30 10:46:02.324", + "dateStarted": "2020-04-30 11:11:51.385", + "dateFinished": "2020-04-30 11:11:51.395", + "status": "FINISHED" + }, + { + "title": "", + "text": "%spark\n", + "user": "anonymous", + "dateUpdated": "2020-04-30 10:46:02.325", + "config": {}, + "settings": { + "params": {}, + "forms": {} + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588214762325_1205048464", + "id": "20180531-134529_63265354", + "dateCreated": "2020-04-30 10:46:02.325", + "status": "READY" + } + ], + "name": "1. Spark Interpreter Introduction", + "id": "2F8KN6TKK", + "defaultInterpreterGroup": "spark", + "version": "0.9.0-SNAPSHOT", + "noteParams": {}, + "noteForms": {}, + "angularObjects": {}, + "config": { + "isZeppelinNotebookCronEnable": false + }, + "info": {} +} \ No newline at end of file diff --git a/notebook/Spark Tutorial/Spark Basic Features_2A94M5J1Z.zpln b/notebook/Spark Tutorial/2. Spark Basic Features_2A94M5J1Z.zpln similarity index 92% rename from notebook/Spark Tutorial/Spark Basic Features_2A94M5J1Z.zpln rename to notebook/Spark Tutorial/2. Spark Basic Features_2A94M5J1Z.zpln index a6d29dab931..b801f1d87ae 100644 --- a/notebook/Spark Tutorial/Spark Basic Features_2A94M5J1Z.zpln +++ b/notebook/Spark Tutorial/2.
Spark Basic Features_2A94M5J1Z.zpln @@ -54,7 +54,7 @@ "title": "Load data into table", "text": "import org.apache.commons.io.IOUtils\nimport java.net.URL\nimport java.nio.charset.Charset\n\n// Zeppelin creates and injects sc (SparkContext) and sqlContext (HiveContext or SqlContext)\n// So you don\u0027t need create them manually\n\n// load bank data\nval bankText \u003d sc.parallelize(\n IOUtils.toString(\n new URL(\"https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv\"),\n Charset.forName(\"utf8\")).split(\"\\n\"))\n\ncase class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)\n\nval bank \u003d bankText.map(s \u003d\u003e s.split(\";\")).filter(s \u003d\u003e s(0) !\u003d \"\\\"age\\\"\").map(\n s \u003d\u003e Bank(s(0).toInt, \n s(1).replaceAll(\"\\\"\", \"\"),\n s(2).replaceAll(\"\\\"\", \"\"),\n s(3).replaceAll(\"\\\"\", \"\"),\n s(5).replaceAll(\"\\\"\", \"\").toInt\n )\n).toDF()\nbank.registerTempTable(\"bank\")", "user": "anonymous", - "dateUpdated": "2020-01-21 22:58:52.064", + "dateUpdated": "2020-05-08 11:18:36.766", "config": { "colWidth": 12.0, "title": true, @@ -95,14 +95,14 @@ "jobName": "paragraph_1423500779206_-1502780787", "id": "20150210-015259_1403135953", "dateCreated": "2015-02-10 01:52:59.000", - "dateStarted": "2020-01-21 22:58:52.084", - "dateFinished": "2020-01-21 22:59:18.740", + "dateStarted": "2020-05-08 11:18:36.791", + "dateFinished": "2020-05-08 11:19:58.268", "status": "FINISHED" }, { "text": "%sql \nselect age, count(1) value\nfrom bank \nwhere age \u003c 30 \ngroup by age \norder by age", "user": "anonymous", - "dateUpdated": "2020-01-19 16:58:04.490", + "dateUpdated": "2020-05-04 23:34:43.954", "config": { "colWidth": 4.0, "results": [ @@ -142,7 +142,9 @@ "enabled": true, "editorSetting": { "language": "sql", - "editOnDblClick": false + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true }, "editorMode": "ace/mode/sql", "fontSize": 9.0 @@ -165,14 +167,14 @@ "jobName": "paragraph_1423500782552_-1439281894", "id": "20150210-015302_1492795503", "dateCreated": "2015-02-10 01:53:02.000", - "dateStarted": "2016-12-17 15:30:13.000", - "dateFinished": "2016-12-17 15:31:04.000", + "dateStarted": "2020-05-04 23:34:43.959", + "dateFinished": "2020-05-04 23:34:52.126", "status": "FINISHED" }, { "text": "%sql \nselect age, count(1) value \nfrom bank \nwhere age \u003c ${maxAge\u003d30} \ngroup by age \norder by age", "user": "anonymous", - "dateUpdated": "2020-01-19 16:58:04.541", + "dateUpdated": "2020-05-04 23:34:45.514", "config": { "colWidth": 4.0, "results": [ @@ -212,7 +214,9 @@ "enabled": true, "editorSetting": { "language": "sql", - "editOnDblClick": false + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true }, "editorMode": "ace/mode/sql", "fontSize": 9.0 @@ -230,28 +234,19 @@ } } }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TABLE", - "data": "age\tvalue\n19\t4\n20\t3\n21\t7\n22\t9\n23\t20\n24\t24\n25\t44\n26\t77\n27\t94\n28\t103\n29\t97\n30\t150\n31\t199\n32\t224\n33\t186\n34\t231\n" - } - ] - }, "apps": [], "progressUpdateIntervalMs": 500, "jobName": "paragraph_1423720444030_-1424110477", "id": "20150212-145404_867439529", "dateCreated": "2015-02-12 14:54:04.000", - "dateStarted": "2016-12-17 15:30:58.000", - "dateFinished": "2016-12-17 15:31:07.000", + "dateStarted": "2020-05-04 23:34:45.520", + "dateFinished": "2020-05-04 23:34:54.074", "status": "FINISHED" }, { "text": "%sql \nselect age, count(1) value \nfrom bank \nwhere 
marital\u003d\"${marital\u003dsingle,single|divorced|married}\" \ngroup by age \norder by age", "user": "anonymous", - "dateUpdated": "2020-01-19 16:58:04.590", + "dateUpdated": "2020-05-04 23:34:47.079", "config": { "colWidth": 4.0, "results": [ @@ -291,7 +286,9 @@ "enabled": true, "editorSetting": { "language": "sql", - "editOnDblClick": false + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true }, "editorMode": "ace/mode/sql", "fontSize": 9.0, @@ -335,8 +332,8 @@ "jobName": "paragraph_1423836262027_-210588283", "id": "20150213-230422_1600658137", "dateCreated": "2015-02-13 23:04:22.000", - "dateStarted": "2016-12-17 15:31:05.000", - "dateFinished": "2016-12-17 15:31:09.000", + "dateStarted": "2020-05-04 23:34:52.255", + "dateFinished": "2020-05-04 23:34:55.739", "status": "FINISHED" }, { @@ -445,19 +442,15 @@ "status": "READY" } ], - "name": "Basic Features (Spark)", + "name": "2. Spark Basic Features", "id": "2A94M5J1Z", "defaultInterpreterGroup": "spark", - "permissions": {}, "noteParams": {}, "noteForms": {}, - "angularObjects": { - "2C73DY9P9:shared_process": [] - }, + "angularObjects": {}, "config": { "looknfeel": "default", - "isZeppelinNotebookCronEnable": true + "isZeppelinNotebookCronEnable": false }, - "info": {}, - "path": "/Spark Tutorial/Basic Features (Spark)" + "info": {} } \ No newline at end of file diff --git a/notebook/Spark Tutorial/Spark SQL (PySpark)_2EWM84JXA.zpln b/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln similarity index 100% rename from notebook/Spark Tutorial/Spark SQL (PySpark)_2EWM84JXA.zpln rename to notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln diff --git a/notebook/Spark Tutorial/Spark SQL (Scala)_2EYUV26VR.zpln b/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln similarity index 100% rename from notebook/Spark Tutorial/Spark SQL (Scala)_2EYUV26VR.zpln rename to notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln diff --git a/notebook/Spark Tutorial/Spark MlLib_2EZFM3GJA.zpln b/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln similarity index 100% rename from notebook/Spark Tutorial/Spark MlLib_2EZFM3GJA.zpln rename to notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln diff --git a/notebook/Spark Tutorial/SparkR Basics_2BWJFTXKM.zpln b/notebook/Spark Tutorial/5. SparkR Basics_2BWJFTXKM.zpln similarity index 100% rename from notebook/Spark Tutorial/SparkR Basics_2BWJFTXKM.zpln rename to notebook/Spark Tutorial/5. SparkR Basics_2BWJFTXKM.zpln diff --git a/notebook/Spark Tutorial/SparkR Shiny App_2F1CHQ4TT.zpln b/notebook/Spark Tutorial/6. SparkR Shiny App_2F1CHQ4TT.zpln similarity index 100% rename from notebook/Spark Tutorial/SparkR Shiny App_2F1CHQ4TT.zpln rename to notebook/Spark Tutorial/6. SparkR Shiny App_2F1CHQ4TT.zpln diff --git a/notebook/Spark Tutorial/7. Spark Delta Lake Tutorial_2F8VDBMMT.zpln b/notebook/Spark Tutorial/7. Spark Delta Lake Tutorial_2F8VDBMMT.zpln new file mode 100644 index 00000000000..97f011c3463 --- /dev/null +++ b/notebook/Spark Tutorial/7. Spark Delta Lake Tutorial_2F8VDBMMT.zpln @@ -0,0 +1,311 @@ +{ + "paragraphs": [ + { + "text": "%md\n\n# Introduction\n\nThis is a tutorial for using spark [delta lake](https://delta.io/) in Zeppelin. 
You need to run the following paragraph first to load the delta package.\n\n", + "user": "anonymous", + "dateUpdated": "2020-05-04 14:11:57.999", + "config": { + "colWidth": 12.0, + "fontSize": 9.0, + "enabled": true, + "results": {}, + "editorSetting": { + "language": "markdown", + "editOnDblClick": true, + "completionKey": "TAB", + "completionSupport": false + }, + "editorMode": "ace/mode/markdown", + "editorHide": true, + "tableHide": false + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "HTML", + "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eThis is a tutorial for using spark \u003ca href\u003d\"https://delta.io/\"\u003edelta lake\u003c/a\u003e in Zeppelin. You need to run the following paragraph first to load the delta package.\u003c/p\u003e\n\n\u003c/div\u003e" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588572279774_1507831415", + "id": "paragraph_1588572279774_1507831415", + "dateCreated": "2020-05-04 14:04:39.775", + "dateStarted": "2020-05-04 14:11:57.999", + "dateFinished": "2020-05-04 14:11:58.021", + "status": "FINISHED" + }, + { + "text": "%spark.conf\n\nspark.jars.packages io.delta:delta-core_2.11:0.6.0", + "user": "anonymous", + "dateUpdated": "2020-05-04 14:12:12.254", + "config": { + "colWidth": 12.0, + "fontSize": 9.0, + "enabled": true, + "results": {}, + "editorSetting": { + "language": "text", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "editorMode": "ace/mode/text" + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588147206215_1200788867", + "id": "paragraph_1588147206215_1200788867", + "dateCreated": "2020-04-29 16:00:06.215", + "dateStarted": "2020-04-29 16:10:33.429", + "dateFinished": "2020-04-29 16:10:33.434", + "status": "FINISHED" + }, + { + "title": "Create a table", + "text": "%spark\n\nval data \u003d spark.range(0, 5)\ndata.write.format(\"delta\").save(\"/tmp/delta-table\")\n", + "user": "anonymous", + "dateUpdated": "2020-04-29 16:13:31.957", + "config": { + "colWidth": 6.0, + "fontSize": 9.0, + "enabled": true, + "results": {}, + "editorSetting": { + "language": "scala", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "editorMode": "ace/mode/scala", + "title": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "TEXT", + "data": "\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.spark.sql.Dataset[Long]\u001b[0m \u003d [id: bigint]\n" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588147833426_1914590471", + "id": "paragraph_1588147833426_1914590471", + "dateCreated": "2020-04-29 16:10:33.426", + "dateStarted": "2020-04-29 16:11:45.197", + "dateFinished": "2020-04-29 16:11:49.694", + "status": "FINISHED" + }, + { + "title": "Read a table", + "text": "%spark\n\nval df \u003d spark.read.format(\"delta\").load(\"/tmp/delta-table\")\ndf.show()", + "user": "anonymous", + "dateUpdated": "2020-04-29 16:13:35.297", + "config": { + "colWidth": 6.0, + "fontSize": 9.0, + "enabled": true, + "results": {}, + "editorSetting": { + "language": "scala", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "editorMode": "ace/mode/scala",
+ "title": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "TEXT", + "data": "+---+\n| id|\n+---+\n| 0|\n| 3|\n| 1|\n| 2|\n| 4|\n+---+\n\n\u001b[1m\u001b[34mdf\u001b[0m: \u001b[1m\u001b[32morg.apache.spark.sql.DataFrame\u001b[0m \u003d [id: bigint]\n" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588147853461_1624743216", + "id": "paragraph_1588147853461_1624743216", + "dateCreated": "2020-04-29 16:10:53.462", + "dateStarted": "2020-04-29 16:11:55.302", + "dateFinished": "2020-04-29 16:11:56.658", + "status": "FINISHED" + }, + { + "title": "Overwrite", + "text": "%spark\n\nval data \u003d spark.range(5, 10)\ndata.write.format(\"delta\").mode(\"overwrite\").save(\"/tmp/delta-table\")\ndf.show()", + "user": "anonymous", + "dateUpdated": "2020-04-29 16:14:41.855", + "config": { + "colWidth": 6.0, + "fontSize": 9.0, + "enabled": true, + "results": {}, + "editorSetting": { + "language": "scala", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "editorMode": "ace/mode/scala", + "title": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "TEXT", + "data": "+---+\n| id|\n+---+\n| 5|\n| 6|\n| 7|\n| 9|\n| 8|\n+---+\n\n\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.spark.sql.Dataset[Long]\u001b[0m \u003d [id: bigint]\n" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588148062120_1790808564", + "id": "paragraph_1588148062120_1790808564", + "dateCreated": "2020-04-29 16:14:22.120", + "dateStarted": "2020-04-29 16:14:41.863", + "dateFinished": "2020-04-29 16:14:45.093", + "status": "FINISHED" + }, + { + "title": "Conditional update without overwrite", + "text": "%spark\n\nimport io.delta.tables._\nimport org.apache.spark.sql.functions._\n\nval deltaTable \u003d DeltaTable.forPath(\"/tmp/delta-table\")\n\n// Update every even value by adding 100 to it\ndeltaTable.update(\n condition \u003d expr(\"id % 2 \u003d\u003d 0\"),\n set \u003d Map(\"id\" -\u003e expr(\"id + 100\")))\n\n// Delete every even value\ndeltaTable.delete(condition \u003d expr(\"id % 2 \u003d\u003d 0\"))\n\n// Upsert (merge) new data\nval newData \u003d spark.range(0, 20).toDF\n\ndeltaTable.as(\"oldData\")\n .merge(\n newData.as(\"newData\"),\n \"oldData.id \u003d newData.id\")\n .whenMatched\n .update(Map(\"id\" -\u003e col(\"newData.id\")))\n .whenNotMatched\n .insert(Map(\"id\" -\u003e col(\"newData.id\")))\n .execute()\n\ndeltaTable.toDF.show()", + "user": "anonymous", + "dateUpdated": "2020-04-29 16:15:33.129", + "config": { + "colWidth": 6.0, + "fontSize": 9.0, + "enabled": true, + "results": {}, + "editorSetting": { + "language": "scala", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "editorMode": "ace/mode/scala", + "title": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "TEXT", + "data": "+---+\n| id|\n+---+\n| 15|\n| 16|\n| 1|\n| 18|\n| 14|\n| 4|\n| 8|\n| 17|\n| 0|\n| 10|\n| 6|\n| 2|\n| 3|\n| 13|\n| 5|\n| 12|\n| 19|\n| 7|\n| 9|\n| 11|\n+---+\n\nimport io.delta.tables._\nimport org.apache.spark.sql.functions._\n\u001b[1m\u001b[34mdeltaTable\u001b[0m: \u001b[1m\u001b[32mio.delta.tables.DeltaTable\u001b[0m \u003d io.delta.tables.DeltaTable@355329ee\n\u001b[1m\u001b[34mnewData\u001b[0m: 
\u001b[1m\u001b[32morg.apache.spark.sql.DataFrame\u001b[0m \u003d [id: bigint]\n" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588147954117_626957150", + "id": "paragraph_1588147954117_626957150", + "dateCreated": "2020-04-29 16:12:34.117", + "dateStarted": "2020-04-29 16:15:33.132", + "dateFinished": "2020-04-29 16:15:48.086", + "status": "FINISHED" + }, + { + "title": "Read older versions of data using time travel", + "text": "%spark\n\nval df \u003d spark.read.format(\"delta\").option(\"versionAsOf\", 0).load(\"/tmp/delta-table\")\ndf.show()", + "user": "anonymous", + "dateUpdated": "2020-04-29 16:16:04.935", + "config": { + "colWidth": 6.0, + "fontSize": 9.0, + "enabled": true, + "results": {}, + "editorSetting": { + "language": "scala", + "editOnDblClick": false, + "completionKey": "TAB", + "completionSupport": true + }, + "editorMode": "ace/mode/scala", + "title": true + }, + "settings": { + "params": {}, + "forms": {} + }, + "results": { + "code": "SUCCESS", + "msg": [ + { + "type": "TEXT", + "data": "+---+\n| id|\n+---+\n| 0|\n| 3|\n| 1|\n| 2|\n| 4|\n+---+\n\n\u001b[1m\u001b[34mdf\u001b[0m: \u001b[1m\u001b[32morg.apache.spark.sql.DataFrame\u001b[0m \u003d [id: bigint]\n" + } + ] + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588148133131_1770029903", + "id": "paragraph_1588148133131_1770029903", + "dateCreated": "2020-04-29 16:15:33.131", + "dateStarted": "2020-04-29 16:16:04.937", + "dateFinished": "2020-04-29 16:16:08.415", + "status": "FINISHED" + }, + { + "text": "%spark\n", + "user": "anonymous", + "dateUpdated": "2020-04-29 16:18:21.603", + "config": {}, + "settings": { + "params": {}, + "forms": {} + }, + "apps": [], + "progressUpdateIntervalMs": 500, + "jobName": "paragraph_1588148301603_1997345504", + "id": "paragraph_1588148301603_1997345504", + "dateCreated": "2020-04-29 16:18:21.603", + "status": "READY" + } + ], + "name": "7. Spark Delta Lake Tutorial", + "id": "2F8VDBMMT", + "defaultInterpreterGroup": "spark", + "version": "0.9.0-SNAPSHOT", + "noteParams": {}, + "noteForms": {}, + "angularObjects": {}, + "config": { + "isZeppelinNotebookCronEnable": false + }, + "info": {} +} \ No newline at end of file diff --git a/notebook/~Trash/Zeppelin Tutorial/Flink Batch Tutorial_2EN1E1ATY.zpln b/notebook/~Trash/Zeppelin Tutorial/Flink Batch Tutorial_2EN1E1ATY.zpln deleted file mode 100644 index 018089e341c..00000000000 --- a/notebook/~Trash/Zeppelin Tutorial/Flink Batch Tutorial_2EN1E1ATY.zpln +++ /dev/null @@ -1,602 +0,0 @@ -{ - "paragraphs": [ - { - "title": "Introduction", - "text": "%md\n\nThis is a tutorial note for Flink batch scenario (`To be noticed, you need to use flink 1.9 or afterwards`), . You can run flink scala api via `%flink` and run flink batch sql via `%flink.bsql`. \nThis note use flink\u0027s DataSet api to demonstrate flink\u0027s batch capablity.
DataSet is only supported by flink planner, so here we have to specify the planner `zeppelin.flink.planner` as `flink`, otherwise it would use blink planner by default which doesn\u0027t support DataSet api.\n", - "user": "anonymous", - "dateUpdated": "2019-10-08 15:15:54.910", - "config": { - "tableHide": false, - "editorSetting": { - "language": "markdown", - "editOnDblClick": true, - "completionKey": "TAB", - "completionSupport": false - }, - "colWidth": 12.0, - "editorMode": "ace/mode/markdown", - "fontSize": 9.0, - "editorHide": true, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": {}, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "HTML", - "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial note for Flink batch scenario (\u003ccode\u003eTo be noticed, you need to use flink 1.9 or afterwards\u003c/code\u003e), . You can run flink scala api via \u003ccode\u003e%flink\u003c/code\u003e and run flink batch sql via \u003ccode\u003e%flink.bsql\u003c/code\u003e.\u003cbr/\u003eThis note use flink\u0026rsquo;s DataSet api to demonstrate flink\u0026rsquo;s batch capablity. DataSet is only supported by flink planner, so here we have to specify the planner \u003ccode\u003ezeppelin.flink.planner\u003c/code\u003e as \u003ccode\u003eflink\u003c/code\u003e, otherwise it would use blink planner by default which doesn\u0026rsquo;t support DataSet api.\u003c/p\u003e\n\u003c/div\u003e" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569489641095_-188362229", - "id": "paragraph_1547794482637_957545547", - "dateCreated": "2019-09-26 17:20:41.095", - "dateStarted": "2019-10-08 15:15:54.910", - "dateFinished": "2019-10-08 15:15:54.923", - "status": "FINISHED" - }, - { - "title": "Configure Flink Interpreter", - "text": "%flink.conf\n\nFLINK_HOME \u003cFLINK_INSTALLATION\u003e\n# DataSet is only supported in flink planner, so here we use flink planner. 
By default it is blink planner\nzeppelin.flink.planner flink\n\n\n", - "user": "anonymous", - "dateUpdated": "2019-10-11 10:37:39.948", - "config": { - "editorSetting": { - "language": "text", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 12.0, - "editorMode": "ace/mode/text", - "fontSize": 9.0, - "editorHide": false, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": {}, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569489641097_1856810430", - "id": "paragraph_1546565092490_1952685806", - "dateCreated": "2019-09-26 17:20:41.097", - "dateStarted": "2019-10-11 10:00:42.031", - "dateFinished": "2019-10-11 10:00:42.052", - "status": "FINISHED" - }, - { - "text": "%sh\n\ncd /tmp\nwget https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv\n", - "user": "anonymous", - "dateUpdated": "2019-10-08 15:16:45.127", - "config": { - "runOnSelectionChange": true, - "title": false, - "checkEmpty": true, - "colWidth": 12.0, - "fontSize": 9.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "sh", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": false - }, - "editorMode": "ace/mode/sh" - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "--2019-10-08 15:16:46-- https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv\nResolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.120.2\nConnecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.120.2|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 461474 (451K) [application/octet-stream]\nSaving to: \u0027bank.csv\u0027\n\n 0K .......... .......... .......... .......... .......... 11% 91.0K 4s\n 50K .......... .......... .......... .......... .......... 22% 568K 2s\n 100K .......... .......... .......... .......... .......... 33% 372K 2s\n 150K .......... .......... .......... .......... .......... 44% 176K 1s\n 200K .......... .......... .......... .......... .......... 55% 626K 1s\n 250K .......... .......... .......... .......... .......... 66% 326K 1s\n 300K .......... .......... .......... .......... .......... 77% 433K 0s\n 350K .......... .......... .......... .......... .......... 88% 77.0K 0s\n 400K .......... .......... .......... .......... .......... 
99% 77.4M 0s\n 450K 100% 13.2K\u003d2.1s\n\n2019-10-08 15:16:50 (219 KB/s) - \u0027bank.csv\u0027 saved [461474/461474]\n\n" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569489915993_-1377803690", - "id": "paragraph_1569489915993_-1377803690", - "dateCreated": "2019-09-26 17:25:15.993", - "dateStarted": "2019-10-08 15:16:45.132", - "dateFinished": "2019-10-08 15:16:50.387", - "status": "FINISHED" - }, - { - "title": "Load Bank Data", - "text": "%flink\n\nval bankText \u003d benv.readTextFile(\"/tmp/bank.csv\")\nval bank \u003d bankText.map(s \u003d\u003e s.split(\";\")).filter(s \u003d\u003e s(0) !\u003d \"\\\"age\\\"\").map(\n s \u003d\u003e (s(0).toInt,\n s(1).replaceAll(\"\\\"\", \"\"),\n s(2).replaceAll(\"\\\"\", \"\"),\n s(3).replaceAll(\"\\\"\", \"\"),\n s(5).replaceAll(\"\\\"\", \"\").toInt\n )\n )\n\nbtenv.registerDataSet(\"bank\", bank, \u0027age, \u0027job, \u0027marital, \u0027education, \u0027balance)\n\n\n", - "user": "anonymous", - "dateUpdated": "2019-10-11 10:01:44.897", - "config": { - "editorSetting": { - "language": "scala", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 12.0, - "editorMode": "ace/mode/scala", - "fontSize": 9.0, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": { - "0": { - "graph": { - "mode": "table", - "height": 111.0, - "optionOpen": false - } - } - }, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "\u001b[1m\u001b[34mbankText\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.scala.DataSet[String]\u001b[0m \u003d org.apache.flink.api.scala.DataSet@683ccfd\n\u001b[1m\u001b[34mbank\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.scala.DataSet[(Int, String, String, String, Int)]\u001b[0m \u003d org.apache.flink.api.scala.DataSet@35e6c08f\n" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569489641099_-2030350664", - "id": "paragraph_1546584347815_1533642635", - "dateCreated": "2019-09-26 17:20:41.099", - "dateStarted": "2019-10-11 10:01:44.904", - "dateFinished": "2019-10-11 10:02:12.333", - "status": "FINISHED" - }, - { - "text": "%flink.bsql\n\ndescribe bank\n\n\n\n\n\n\n\n\n", - "user": "anonymous", - "dateUpdated": "2019-10-11 10:07:51.800", - "config": { - "editorSetting": { - "language": "sql", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 6.0, - "editorMode": "ace/mode/sql", - "fontSize": 9.0, - "runOnSelectionChange": true, - "title": false, - "checkEmpty": true, - "results": { - "0": { - "graph": { - "mode": "table", - "height": 300.0, - "optionOpen": false, - "setting": { - "table": { - "tableGridState": {}, - "tableColumnTypeState": { - "names": { - "table": "string" - }, - "updated": false - }, - "tableOptionSpecHash": "[{\"name\":\"useFilter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable filter for columns\"},{\"name\":\"showPagination\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable pagination for better navigation\"},{\"name\":\"showAggregationFooter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable a footer for displaying aggregated values\"}]", - "tableOptionValue": { - "useFilter": false, - "showPagination": false, - "showAggregationFooter": false - }, - 
"updated": false, - "initialized": false - } - }, - "commonSetting": {} - } - } - }, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "Column\tType\nOptional[age]\tOptional[INT]\nOptional[job]\tOptional[STRING]\nOptional[marital]\tOptional[STRING]\nOptional[education]\tOptional[STRING]\nOptional[balance]\tOptional[INT]\n" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491358206_1385745589", - "id": "paragraph_1569491358206_1385745589", - "dateCreated": "2019-09-26 17:49:18.206", - "dateStarted": "2019-10-11 10:07:51.808", - "dateFinished": "2019-10-11 10:07:52.076", - "status": "FINISHED" - }, - { - "text": "%flink.bsql\n\nselect age, count(1) as v\nfrom bank \nwhere age \u003c 30 \ngroup by age \norder by age", - "user": "anonymous", - "dateUpdated": "2019-10-11 10:07:53.645", - "config": { - "editorSetting": { - "language": "sql", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 6.0, - "editorMode": "ace/mode/sql", - "fontSize": 9.0, - "runOnSelectionChange": true, - "title": false, - "checkEmpty": true, - "results": { - "0": { - "graph": { - "mode": "table", - "height": 300.0, - "optionOpen": false, - "setting": { - "table": { - "tableGridState": {}, - "tableColumnTypeState": { - "names": { - "age": "string", - "v": "string" - }, - "updated": false - }, - "tableOptionSpecHash": "[{\"name\":\"useFilter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable filter for columns\"},{\"name\":\"showPagination\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable pagination for better navigation\"},{\"name\":\"showAggregationFooter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable a footer for displaying aggregated values\"}]", - "tableOptionValue": { - "useFilter": false, - "showPagination": false, - "showAggregationFooter": false - }, - "updated": false, - "initialized": false - } - }, - "commonSetting": {} - } - } - }, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TABLE", - "data": "age\tv\n19\t4\n20\t3\n21\t7\n22\t9\n23\t20\n24\t24\n25\t44\n26\t77\n27\t94\n28\t103\n29\t97\n" - }, - { - "type": "TEXT", - "data": "" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491511585_-677666348", - "id": "paragraph_1569491511585_-677666348", - "dateCreated": "2019-09-26 17:51:51.585", - "dateStarted": "2019-10-11 10:07:53.654", - "dateFinished": "2019-10-11 10:08:10.439", - "status": "FINISHED" - }, - { - "text": "%flink.bsql\n\nselect age, count(1) as v \nfrom bank \nwhere age \u003c ${maxAge\u003d30} \ngroup by age \norder by age", - "user": "anonymous", - "dateUpdated": "2019-10-11 10:08:13.910", - "config": { - "editorSetting": { - "language": "sql", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 6.0, - "editorMode": "ace/mode/sql", - "fontSize": 9.0, - "runOnSelectionChange": true, - "title": false, - "checkEmpty": true, - "results": { - "0": { - "graph": { - "mode": "pieChart", - "height": 300.0, - "optionOpen": false, - "setting": { - "table": { - "tableGridState": {}, - "tableColumnTypeState": { - "names": { - "age": "string", - "v": "string" - }, - "updated": false - }, - 
"tableOptionSpecHash": "[{\"name\":\"useFilter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable filter for columns\"},{\"name\":\"showPagination\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable pagination for better navigation\"},{\"name\":\"showAggregationFooter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable a footer for displaying aggregated values\"}]", - "tableOptionValue": { - "useFilter": false, - "showPagination": false, - "showAggregationFooter": false - }, - "updated": false, - "initialized": false - }, - "multiBarChart": { - "rotate": { - "degree": "-45" - }, - "xLabelStatus": "default" - } - }, - "commonSetting": {}, - "keys": [ - { - "name": "age", - "index": 0.0, - "aggr": "sum" - } - ], - "groups": [], - "values": [ - { - "name": "v", - "index": 1.0, - "aggr": "sum" - } - ] - }, - "helium": {} - } - }, - "enabled": true - }, - "settings": { - "params": { - "maxAge": "40" - }, - "forms": { - "maxAge": { - "type": "TextBox", - "name": "maxAge", - "defaultValue": "30", - "hidden": false - } - } - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TABLE", - "data": "age\tv\n19\t4\n20\t3\n21\t7\n22\t9\n23\t20\n24\t24\n25\t44\n26\t77\n27\t94\n28\t103\n29\t97\n30\t150\n31\t199\n32\t224\n33\t186\n34\t231\n35\t180\n36\t188\n37\t161\n38\t159\n39\t130\n" - }, - { - "type": "TEXT", - "data": "" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569489641100_115468969", - "id": "paragraph_1546592465802_-1181051373", - "dateCreated": "2019-09-26 17:20:41.100", - "dateStarted": "2019-10-11 10:08:13.923", - "dateFinished": "2019-10-11 10:08:14.876", - "status": "FINISHED" - }, - { - "text": "%flink.bsql\n\nselect age, count(1) as v \nfrom bank \nwhere marital\u003d\u0027${marital\u003dsingle,single|divorced|married}\u0027 \ngroup by age \norder by age", - "user": "anonymous", - "dateUpdated": "2019-10-11 10:37:59.493", - "config": { - "editorSetting": { - "language": "sql", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 6.0, - "editorMode": "ace/mode/sql", - "fontSize": 9.0, - "runOnSelectionChange": true, - "title": false, - "checkEmpty": true, - "results": { - "0": { - "graph": { - "mode": "multiBarChart", - "height": 300.0, - "optionOpen": false, - "setting": { - "table": { - "tableGridState": {}, - "tableColumnTypeState": { - "names": { - "age": "string", - "v": "string" - }, - "updated": false - }, - "tableOptionSpecHash": "[{\"name\":\"useFilter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable filter for columns\"},{\"name\":\"showPagination\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable pagination for better navigation\"},{\"name\":\"showAggregationFooter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable a footer for displaying aggregated values\"}]", - "tableOptionValue": { - "useFilter": false, - "showPagination": false, - "showAggregationFooter": false - }, - "updated": false, - "initialized": false - }, - "multiBarChart": { - "rotate": { - "degree": "-45" - }, - "xLabelStatus": "default" - }, - "lineChart": { - "rotate": { - "degree": "-45" - }, - "xLabelStatus": "default" - }, - "stackedAreaChart": { - "rotate": { - "degree": "-45" - }, - "xLabelStatus": "default" - } - }, - 
"commonSetting": {}, - "keys": [ - { - "name": "age", - "index": 0.0, - "aggr": "sum" - } - ], - "groups": [], - "values": [ - { - "name": "v", - "index": 1.0, - "aggr": "sum" - } - ] - }, - "helium": {} - } - }, - "enabled": true - }, - "settings": { - "params": { - "marital": "married" - }, - "forms": { - "marital": { - "type": "Select", - "options": [ - { - "value": "single" - }, - { - "value": "divorced" - }, - { - "value": "married" - } - ], - "name": "marital", - "defaultValue": "single", - "hidden": false - } - } - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TABLE", - "data": "age\tv\n23\t3\n24\t11\n25\t11\n26\t18\n27\t26\n28\t23\n29\t37\n30\t56\n31\t104\n32\t105\n33\t103\n34\t142\n35\t109\n36\t117\n37\t100\n38\t99\n39\t88\n40\t105\n41\t97\n42\t91\n43\t79\n44\t68\n45\t76\n46\t82\n47\t78\n48\t91\n49\t87\n50\t74\n51\t63\n52\t66\n53\t75\n54\t56\n55\t68\n56\t50\n57\t78\n58\t67\n59\t56\n60\t36\n61\t15\n62\t5\n63\t7\n64\t6\n65\t4\n66\t7\n67\t5\n68\t1\n69\t5\n70\t5\n71\t5\n72\t4\n73\t6\n74\t2\n75\t3\n76\t1\n77\t5\n78\t2\n79\t3\n80\t6\n81\t1\n83\t2\n86\t1\n87\t1\n" - }, - { - "type": "TEXT", - "data": "" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569489641100_-1622522691", - "id": "paragraph_1546592478596_-1766740165", - "dateCreated": "2019-09-26 17:20:41.100", - "dateStarted": "2019-10-11 10:08:16.423", - "dateFinished": "2019-10-11 10:08:17.295", - "status": "FINISHED" - }, - { - "text": "\n", - "user": "anonymous", - "dateUpdated": "2019-09-26 17:54:48.292", - "config": { - "colWidth": 12.0, - "fontSize": 9.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "scala", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "editorMode": "ace/mode/scala" - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569489641101_-1182720873", - "id": "paragraph_1553093710610_-1734599499", - "dateCreated": "2019-09-26 17:20:41.101", - "status": "READY" - } - ], - "name": "Flink Batch Tutorial", - "id": "2EN1E1ATY", - "defaultInterpreterGroup": "spark", - "version": "0.9.0-SNAPSHOT", - "permissions": {}, - "noteParams": {}, - "noteForms": {}, - "angularObjects": {}, - "config": { - "isZeppelinNotebookCronEnable": false - }, - "info": {}, - "path": "/Flink Batch Tutorial" -} \ No newline at end of file diff --git a/notebook/~Trash/Zeppelin Tutorial/Flink Stream Tutorial_2ER62Y5VJ.zpln b/notebook/~Trash/Zeppelin Tutorial/Flink Stream Tutorial_2ER62Y5VJ.zpln deleted file mode 100644 index b8f365809f2..00000000000 --- a/notebook/~Trash/Zeppelin Tutorial/Flink Stream Tutorial_2ER62Y5VJ.zpln +++ /dev/null @@ -1,427 +0,0 @@ -{ - "paragraphs": [ - { - "title": "Introduction", - "text": "%md\n\nThis is a tutorial note for Flink Streaming application. You can run flink scala api via `%flink` and run flink stream sql via `%flink.ssql`. We provide 2 examples in this tutorial:\n\n* Classical word count example via Flink streaming\n* We simulate a web log stream source, and then query and visualize this streaming data in Zeppelin.\n\n\nFor now, the capability of supporting flink streaming job is very limited in Zeppelin. 
For example, canceling a job is not supported; this capability depends on Flink itself, and the Flink community is working on it for downstream projects like Zeppelin.\n", - "user": "anonymous", - "dateUpdated": "2019-10-08 15:18:50.038", - "config": { - "tableHide": false, - "editorSetting": { - "language": "markdown", - "editOnDblClick": true, - "completionKey": "TAB", - "completionSupport": false - }, - "colWidth": 12.0, - "editorMode": "ace/mode/markdown", - "fontSize": 9.0, - "editorHide": false, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": {}, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "HTML", - "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial note for Flink Streaming application. You can run flink scala api via \u003ccode\u003e%flink\u003c/code\u003e and run flink stream sql via \u003ccode\u003e%flink.ssql\u003c/code\u003e. We provide 2 examples in this tutorial:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eClassical word count example via Flink streaming\u003c/li\u003e\n \u003cli\u003eWe simulate a web log stream source, and then query and visualize this streaming data in Zeppelin.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eFor now, the capability of supporting flink streaming job is very limited in Zeppelin. For example, canceling a job is not supported; this capability depends on Flink itself, and the Flink community is working on it for downstream projects like Zeppelin.\u003c/p\u003e\n\u003c/div\u003e" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491705460_-1023073277", - "id": "paragraph_1548052720723_1943177100", - "dateCreated": "2019-09-26 17:55:05.460", - "dateStarted": "2019-10-08 15:18:47.711", - "dateFinished": "2019-10-08 15:18:47.724", - "status": "FINISHED" - }, - { - "title": "Configure Flink Interpreter", - "text": "%flink.conf\n\nFLINK_HOME \u003cFLINK_INSTALLATION\u003e\n# Use the blink planner for the flink streaming scenario\nzeppelin.flink.planner blink\n", - "user": "anonymous", - "dateUpdated": "2019-10-11 10:38:09.734", - "config": { - "editorSetting": { - "language": "text", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 12.0, - "editorMode": "ace/mode/text", - "fontSize": 9.0, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": {}, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491705461_327101264", - "id": "paragraph_1546571299955_275296580", - "dateCreated": "2019-09-26 17:55:05.461", - "dateStarted": "2019-10-08 15:21:48.153", - "dateFinished": "2019-10-08 15:21:48.165", - "status": "FINISHED" - }, - { - "title": "Stream WordCount", - "text": "%flink\n\nval data \u003d senv.fromElements(\"hello world\", \"hello flink\", \"hello hadoop\")\ndata.flatMap(line \u003d\u003e line.split(\"\\\\s\"))\n .map(w \u003d\u003e (w, 1))\n .keyBy(0)\n .sum(1)\n .print\n\nsenv.execute()\n", - "user": "anonymous", - "dateUpdated": "2019-10-08 15:19:07.936", - "config": { - "editorSetting": { - "language": "scala", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 12.0, - "editorMode": "ace/mode/scala", - "fontSize": 9.0, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true,
- "results": {}, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.scala.DataStream[String]\u001b[0m \u003d org.apache.flink.streaming.api.scala.DataStream@5a099566\n\u001b[1m\u001b[34mres0\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.datastream.DataStreamSink[(String, Int)]\u001b[0m \u003d org.apache.flink.streaming.api.datastream.DataStreamSink@58805cef\n(hello,1)\n(world,1)\n(hello,2)\n(flink,1)\n(hello,3)\n(hadoop,1)\n\u001b[1m\u001b[34mres1\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.common.JobExecutionResult\u001b[0m \u003d org.apache.flink.api.common.JobExecutionResult@74b3b79d\n" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491705461_377981606", - "id": "paragraph_1546571324670_-435705916", - "dateCreated": "2019-09-26 17:55:05.461", - "dateStarted": "2019-10-08 15:19:07.941", - "dateFinished": "2019-10-08 15:19:23.511", - "status": "FINISHED" - }, - { - "title": "Register a Stream DataSource to simulate web log", - "text": "%flink\n\nimport org.apache.flink.streaming.api.functions.source.SourceFunction\nimport org.apache.flink.table.api.TableEnvironment\nimport org.apache.flink.streaming.api.TimeCharacteristic\nimport org.apache.flink.streaming.api.checkpoint.ListCheckpointed\nimport java.util.Collections\nimport scala.collection.JavaConversions._\n\nsenv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)\nsenv.enableCheckpointing(1000)\n\nval data \u003d senv.addSource(new SourceFunction[(Long, String)] with ListCheckpointed[java.lang.Long] {\n\n val pages \u003d Seq(\"home\", \"search\", \"search\", \"product\", \"product\", \"product\")\n var count: Long \u003d 0\n // startTime is 2018/1/1\n var startTime: Long \u003d new java.util.Date(2018 - 1900,0,1).getTime\n var sleepInterval \u003d 100\n \n override def run(ctx: SourceFunction.SourceContext[(Long, String)]): Unit \u003d {\n val lock \u003d ctx.getCheckpointLock\n \n while (count \u003c 10000000) {\n lock.synchronized({\n ctx.collect((startTime + count * sleepInterval, pages(count.toInt % pages.size)))\n count +\u003d 1\n Thread.sleep(sleepInterval)\n })\n }\n }\n\n override def cancel(): Unit \u003d {\n\n }\n\n override def snapshotState(checkpointId: Long, timestamp: Long): java.util.List[java.lang.Long] \u003d {\n Collections.singletonList(count)\n }\n\n override def restoreState(state: java.util.List[java.lang.Long]): Unit \u003d {\n state.foreach(s \u003d\u003e count \u003d s)\n }\n\n}).assignAscendingTimestamps(_._1)\n\nstenv.registerDataStream(\"log\", data, \u0027time, \u0027url, \u0027rowtime.rowtime)\n", - "user": "anonymous", - "dateUpdated": "2019-10-08 15:19:31.503", - "config": { - "editorSetting": { - "language": "scala", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 12.0, - "editorMode": "ace/mode/scala", - "fontSize": 9.0, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": {}, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "import org.apache.flink.streaming.api.functions.source.SourceFunction\nimport org.apache.flink.table.api.TableEnvironment\nimport org.apache.flink.streaming.api.TimeCharacteristic\nimport 
org.apache.flink.streaming.api.checkpoint.ListCheckpointed\nimport java.util.Collections\nimport scala.collection.JavaConversions._\n\u001b[1m\u001b[34mres4\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.scala.StreamExecutionEnvironment\u001b[0m \u003d org.apache.flink.streaming.api.scala.StreamExecutionEnvironment@7e0e86c0\n\u001b[33mwarning: \u001b[0mthere was one deprecation warning; re-run with -deprecation for details\n\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.scala.DataStream[(Long, String)]\u001b[0m \u003d org.apache.flink.streaming.api.scala.DataStream@361e2c6e\n" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491705461_-1986381705", - "id": "paragraph_1546571333074_1869171983", - "dateCreated": "2019-09-26 17:55:05.461", - "dateStarted": "2019-10-08 15:19:31.507", - "dateFinished": "2019-10-08 15:19:34.040", - "status": "FINISHED" - }, - { - "title": "Total Page View", - "text": "%flink.ssql(type\u003dsingle, parallelism\u003d1, refreshInterval\u003d3000, template\u003d\u003ch1\u003e{1}\u003c/h1\u003e until \u003ch2\u003e{0}\u003c/h2\u003e, enableSavePoint\u003dfalse, runWithSavePoint\u003dfalse)\n\nselect max(rowtime), count(1) from log\n", - "user": "anonymous", - "dateUpdated": "2019-10-08 15:19:41.998", - "config": { - "savepointPath": "file:/tmp/save_point/savepoint-13e681-f552013d6184", - "editorSetting": { - "language": "sql", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 6.0, - "editorMode": "ace/mode/sql", - "fontSize": 9.0, - "editorHide": false, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": { - "0": { - "graph": { - "mode": "table", - "height": 141.0, - "optionOpen": false - } - } - }, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491705461_616808409", - "id": "paragraph_1546571459644_596843735", - "dateCreated": "2019-09-26 17:55:05.461", - "dateStarted": "2019-10-08 15:19:42.003", - "dateFinished": "2019-09-26 17:57:24.042", - "status": "ABORT" - }, - { - "title": "Page View by Page", - "text": "%flink.ssql(type\u003dretract, refreshInterval\u003d2000, parallelism\u003d1, enableSavePoint\u003dfalse, runWithSavePoint\u003dfalse)\n\nselect url, count(1) as pv from log group by url", - "user": "anonymous", - "dateUpdated": "2019-10-08 15:20:02.708", - "config": { - "savepointPath": "file:/flink/save_point/savepoint-e4f781-996290516ef1", - "editorSetting": { - "language": "sql", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 6.0, - "editorMode": "ace/mode/sql", - "fontSize": 9.0, - "editorHide": false, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": { - "0": { - "graph": { - "mode": "multiBarChart", - "height": 198.0, - "optionOpen": false, - "setting": { - "table": { - "tableGridState": {}, - "tableColumnTypeState": { - "names": { - "url": "string", - "pv": "string" - }, - "updated": false - }, - "tableOptionSpecHash": "[{\"name\":\"useFilter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable filter for columns\"},{\"name\":\"showPagination\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable pagination for better 
navigation\"},{\"name\":\"showAggregationFooter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable a footer for displaying aggregated values\"}]", - "tableOptionValue": { - "useFilter": false, - "showPagination": false, - "showAggregationFooter": false - }, - "updated": false, - "initialized": false - }, - "multiBarChart": { - "xLabelStatus": "default", - "rotate": { - "degree": "-45" - } - } - }, - "commonSetting": {}, - "keys": [], - "groups": [], - "values": [] - }, - "helium": {} - }, - "1": { - "graph": { - "mode": "table", - "height": 86.0, - "optionOpen": false - } - } - }, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491705462_1478852202", - "id": "paragraph_1546571485092_-716357716", - "dateCreated": "2019-09-26 17:55:05.462", - "dateStarted": "2019-09-26 17:56:45.502", - "dateFinished": "2019-09-26 17:57:24.042", - "status": "ABORT" - }, - { - "title": "Page View Per Page for each 5 Seconds", - "text": "%flink.ssql(type\u003dts, parallelism\u003d1, refreshInterval\u003d2000, enableSavePoint\u003dfalse, runWithSavePoint\u003dfalse, threshold\u003d60000)\n\nselect TUMBLE_START(rowtime, INTERVAL \u00275\u0027 SECOND) as start_time, url, count(1) as pv from log group by TUMBLE(rowtime, INTERVAL \u00275\u0027 SECOND), url", - "user": "anonymous", - "dateUpdated": "2019-10-11 10:38:03.669", - "config": { - "editorSetting": { - "language": "sql", - "editOnDblClick": false, - "completionKey": "TAB", - "completionSupport": true - }, - "colWidth": 12.0, - "editorMode": "ace/mode/sql", - "fontSize": 9.0, - "title": false, - "runOnSelectionChange": true, - "checkEmpty": true, - "results": { - "0": { - "graph": { - "mode": "lineChart", - "height": 300.0, - "optionOpen": false, - "setting": { - "table": { - "tableGridState": {}, - "tableColumnTypeState": { - "names": { - "start_time": "string", - "url": "string", - "pv": "string" - }, - "updated": false - }, - "tableOptionSpecHash": "[{\"name\":\"useFilter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable filter for columns\"},{\"name\":\"showPagination\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable pagination for better navigation\"},{\"name\":\"showAggregationFooter\",\"valueType\":\"boolean\",\"defaultValue\":false,\"widget\":\"checkbox\",\"description\":\"Enable a footer for displaying aggregated values\"}]", - "tableOptionValue": { - "useFilter": false, - "showPagination": false, - "showAggregationFooter": false - }, - "updated": false, - "initialized": false - }, - "lineChart": { - "rotate": { - "degree": "-45" - }, - "xLabelStatus": "rotate" - } - }, - "commonSetting": {}, - "keys": [ - { - "name": "start_time", - "index": 0.0, - "aggr": "sum" - } - ], - "groups": [ - { - "name": "url", - "index": 1.0, - "aggr": "sum" - } - ], - "values": [ - { - "name": "pv", - "index": 2.0, - "aggr": "sum" - } - ] - }, - "helium": {} - } - }, - "enabled": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TABLE", - "data": "start_time\turl\tpv\n2018-01-01 00:00:00.0\thome\t9\n2018-01-01 00:00:00.0\tsearch\t17\n2018-01-01 00:00:00.0\tproduct\t24\n2018-01-01 00:00:05.0\tsearch\t17\n2018-01-01 00:00:05.0\thome\t8\n2018-01-01 00:00:05.0\tproduct\t25\n" - } - ] - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": 
"paragraph_1569491705462_-861601454", - "id": "paragraph_1546571542468_1571709353", - "dateCreated": "2019-09-26 17:55:05.462", - "status": "READY" - }, - { - "text": "%flink.ssql\n", - "user": "anonymous", - "dateUpdated": "2019-09-26 17:55:05.462", - "config": {}, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "progressUpdateIntervalMs": 500, - "jobName": "paragraph_1569491705462_-1279435256", - "id": "paragraph_1546571603746_-749250139", - "dateCreated": "2019-09-26 17:55:05.462", - "status": "READY" - } - ], - "name": "Flink Stream Tutorial", - "id": "2ER62Y5VJ", - "defaultInterpreterGroup": "spark", - "version": "0.9.0-SNAPSHOT", - "permissions": {}, - "noteParams": {}, - "noteForms": {}, - "angularObjects": {}, - "config": { - "isZeppelinNotebookCronEnable": false - }, - "info": {}, - "path": "/Flink Stream Tutorial" -} \ No newline at end of file diff --git a/notebook/~Trash/Zeppelin Tutorial/Using Flink for batch processing_2C35YU814.zpln b/notebook/~Trash/Zeppelin Tutorial/Using Flink for batch processing_2C35YU814.zpln deleted file mode 100644 index 357271a9a60..00000000000 --- a/notebook/~Trash/Zeppelin Tutorial/Using Flink for batch processing_2C35YU814.zpln +++ /dev/null @@ -1,806 +0,0 @@ -{ - "paragraphs": [ - { - "text": "%md\n### Intro\nThis notebook is an example of how to use **Apache Flink** for processing simple data sets. We will take an open airline data set from [stat-computing.org](http://stat-computing.org) and find out who was the most popular carrier during 1998-2000 years. Next we will build a chart that shows flights distribution by months and look how it changes from year to year. We will use Zeppelin `%table` display system to build charts.", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 11:55:42 AM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "markdown", - "editOnDblClick": true - }, - "editorMode": "ace/mode/markdown", - "editorHide": true, - "tableHide": false - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952101049_-1120777567", - "id": "20170109-115501_192763014", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "HTML", - "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch3\u003eIntro\u003c/h3\u003e\n\u003cp\u003eThis notebook is an example of how to use \u003cstrong\u003eApache Flink\u003c/strong\u003e for processing simple data sets. We will take an open airline data set from \u003ca href\u003d\"http://stat-computing.org\"\u003estat-computing.org\u003c/a\u003e and find out who was the most popular carrier during 1998-2000 years. Next we will build a chart that shows flights distribution by months and look how it changes from year to year. We will use Zeppelin \u003ccode\u003e%table\u003c/code\u003e display system to build charts.\u003c/p\u003e\n\u003c/div\u003e" - } - ] - }, - "dateCreated": "Jan 9, 2017 11:55:01 AM", - "dateStarted": "Jan 9, 2017 11:55:42 AM", - "dateFinished": "Jan 9, 2017 11:55:44 AM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%md\n### Getting the data\nFirst we need to download and unpack the data. We will get three big data sets with flight details (one pack for each year) and a small one with carriers names. In total we will get for about 1,5 GB of data. To be able to process such amount of data it is recommended to increase `shell.command.timeout.millisecs` value in `%sh` interpreter settings up to several minutes. 
You can find interpreters configuration by clicking on `Interpreter` in a drop-down menu from the top right corner of the Zeppelin web-ui.", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 11:56:08 AM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "editorHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952142017_284386712", - "id": "20170109-115542_1487437739", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "HTML", - "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch3\u003eGetting the data\u003c/h3\u003e\n\u003cp\u003eFirst we need to download and unpack the data. We will get three big data sets with flight details (one pack for each year) and a small one with carriers names. In total we will get for about 1,5 GB of data. To be able to process such amount of data it is recommended to increase \u003ccode\u003eshell.command.timeout.millisecs\u003c/code\u003e value in \u003ccode\u003e%sh\u003c/code\u003e interpreter settings up to several minutes. You can find interpreters configuration by clicking on \u003ccode\u003eInterpreter\u003c/code\u003e in a drop-down menu from the top right corner of the Zeppelin web-ui.\u003c/p\u003e\n\u003c/div\u003e" - } - ] - }, - "dateCreated": "Jan 9, 2017 11:55:42 AM", - "dateStarted": "Jan 9, 2017 11:56:07 AM", - "dateFinished": "Jan 9, 2017 11:56:07 AM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%sh\n\nrm /tmp/flights98.csv.bz2\ncurl -o /tmp/flights98.csv.bz2 \"http://stat-computing.org/dataexpo/2009/1998.csv.bz2\"\nrm /tmp/flights98.csv\nbzip2 -d /tmp/flights98.csv.bz2\nchmod 666 /tmp/flights98.csv", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 11:59:02 AM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "sh", - "editOnDblClick": false - }, - "editorMode": "ace/mode/sh", - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952167547_-566831096", - "id": "20170109-115607_1634441713", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "rm: cannot remove \u0027/tmp/flights98.csv.bz2\u0027: No such file or directory\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 73.1M 0 64295 0 0 51646 0 0:24:44 0:00:01 0:24:43 51642\r 0 73.1M 0 358k 0 0 160k 0 0:07:47 0:00:02 0:07:45 160k\r 1 73.1M 1 1209k 0 0 373k 0 0:03:20 0:00:03 0:03:17 373k\r 4 73.1M 4 3204k 0 0 773k 0 0:01:36 0:00:04 0:01:32 773k\r 7 73.1M 7 5508k 0 0 1071k 0 0:01:09 0:00:05 0:01:04 1145k\r 10 73.1M 10 7875k 0 0 1280k 0 0:00:58 0:00:06 0:00:52 1592k\r 13 73.1M 13 10.1M 0 0 1458k 0 0:00:51 0:00:07 0:00:44 2049k\r 17 73.1M 17 12.7M 0 0 1608k 0 0:00:46 0:00:08 0:00:38 2422k\r 20 73.1M 20 14.9M 0 0 1671k 0 0:00:44 0:00:09 0:00:35 2413k\r 23 73.1M 23 17.1M 0 0 1728k 0 0:00:43 0:00:10 0:00:33 2403k\r 26 73.1M 26 19.4M 0 0 1787k 0 0:00:41 0:00:11 0:00:30 2411k\r 29 73.1M 29 21.7M 0 0 1837k 0 0:00:40 0:00:12 0:00:28 2379k\r 32 73.1M 32 24.1M 0 0 1879k 0 0:00:39 0:00:13 0:00:26 2322k\r 36 73.1M 36 26.4M 0 0 1916k 0 0:00:39 0:00:14 0:00:25 2365k\r 39 73.1M 39 28.5M 0 0 1930k 0 0:00:38 0:00:15 0:00:23 2341k\r 41 73.1M 41 30.6M 0 0 1943k 0 
0:00:38 0:00:16 0:00:22 2292k\r 44 73.1M 44 32.6M 0 0 1947k 0 0:00:38 0:00:17 0:00:21 2215k\r 47 73.1M 47 34.6M 0 0 1952k 0 0:00:38 0:00:18 0:00:20 2145k\r 50 73.1M 50 36.6M 0 0 1960k 0 0:00:38 0:00:19 0:00:19 2082k\r 52 73.1M 52 38.3M 0 0 1947k 0 0:00:38 0:00:20 0:00:18 1998k\r 55 73.1M 55 40.4M 0 0 1956k 0 0:00:38 0:00:21 0:00:17 1996k\r 57 73.1M 57 42.2M 0 0 1951k 0 0:00:38 0:00:22 0:00:16 1965k\r 60 73.1M 60 44.0M 0 0 1948k 0 0:00:38 0:00:23 0:00:15 1932k\r 62 73.1M 62 45.4M 0 0 1927k 0 0:00:38 0:00:24 0:00:14 1803k\r 63 73.1M 63 46.5M 0 0 1896k 0 0:00:39 0:00:25 0:00:14 1688k\r 65 73.1M 65 47.7M 0 0 1868k 0 0:00:40 0:00:26 0:00:14 1496k\r 66 73.1M 66 48.8M 0 0 1843k 0 0:00:40 0:00:27 0:00:13 1363k\r 68 73.1M 68 50.0M 0 0 1820k 0 0:00:41 0:00:28 0:00:13 1227k\r 69 73.1M 69 51.1M 0 0 1786k 0 0:00:41 0:00:29 0:00:12 1126k\r 71 73.1M 71 52.0M 0 0 1769k 0 0:00:42 0:00:30 0:00:12 1131k\r 72 73.1M 72 53.0M 0 0 1744k 0 0:00:42 0:00:31 0:00:11 1098k\r 73 73.1M 73 54.0M 0 0 1723k 0 0:00:43 0:00:32 0:00:11 1070k\r 75 73.1M 75 55.1M 0 0 1702k 0 0:00:43 0:00:33 0:00:10 1040k\r 76 73.1M 76 56.0M 0 0 1681k 0 0:00:44 0:00:34 0:00:10 1048k\r 77 73.1M 77 56.9M 0 0 1659k 0 0:00:45 0:00:35 0:00:10 993k\r 79 73.1M 79 57.8M 0 0 1638k 0 0:00:45 0:00:36 0:00:09 972k\r 80 73.1M 80 58.7M 0 0 1618k 0 0:00:46 0:00:37 0:00:09 946k\r 81 73.1M 81 59.6M 0 0 1600k 0 0:00:46 0:00:38 0:00:08 921k\r 82 73.1M 82 60.5M 0 0 1582k 0 0:00:47 0:00:39 0:00:08 906k\r 83 73.1M 83 61.4M 0 0 1566k 0 0:00:47 0:00:40 0:00:07 917k\r 85 73.1M 85 62.1M 0 0 1546k 0 0:00:48 0:00:41 0:00:07 887k\r 86 73.1M 86 63.0M 0 0 1532k 0 0:00:48 0:00:42 0:00:06 892k\r 87 73.1M 87 63.9M 0 0 1517k 0 0:00:49 0:00:43 0:00:06 882k\r 88 73.1M 88 64.8M 0 0 1503k 0 0:00:49 0:00:44 0:00:05 878k\r 89 73.1M 89 65.6M 0 0 1489k 0 0:00:50 0:00:45 0:00:05 872k\r 91 73.1M 91 66.5M 0 0 1477k 0 0:00:50 0:00:46 0:00:04 904k\r 92 73.1M 92 67.4M 0 0 1465k 0 0:00:51 0:00:47 0:00:04 897k\r 93 73.1M 93 68.2M 0 0 1451k 0 0:00:51 0:00:48 0:00:03 889k\r 94 73.1M 94 69.2M 0 0 1441k 0 0:00:51 0:00:49 0:00:02 897k\r 95 73.1M 95 70.1M 0 0 1430k 0 0:00:52 0:00:50 0:00:02 904k\r 97 73.1M 97 71.0M 0 0 1421k 0 0:00:52 0:00:51 0:00:01 910k\r 98 73.1M 98 71.9M 0 0 1413k 0 0:00:52 0:00:52 --:--:-- 923k\r 99 73.1M 99 72.8M 0 0 1403k 0 0:00:53 0:00:53 --:--:-- 941k\r100 73.1M 100 73.1M 0 0 1401k 0 0:00:53 0:00:53 --:--:-- 941k\nrm: cannot remove \u0027/tmp/flights98.csv\u0027: No such file or directory\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 11:56:07 AM", - "dateStarted": "Jan 9, 2017 11:57:37 AM", - "dateFinished": "Jan 9, 2017 11:58:50 AM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%sh\n\nrm /tmp/flights99.csv.bz2\ncurl -o /tmp/flights99.csv.bz2 \"http://stat-computing.org/dataexpo/2009/1999.csv.bz2\"\nrm /tmp/flights99.csv\nbzip2 -d /tmp/flights99.csv.bz2\nchmod 666 /tmp/flights99.csv", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 11:59:59 AM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "sh", - "editOnDblClick": false - }, - "editorMode": "ace/mode/sh", - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952257873_-1874269156", - "id": "20170109-115737_1346880844", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "rm: cannot remove \u0027/tmp/flights99.csv.bz2\u0027: No such file or directory\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total 
Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 75.7M 0 5520 0 0 9851 0 2:14:25 --:--:-- 2:14:25 9839\r 0 75.7M 0 88819 0 0 64302 0 0:20:35 0:00:01 0:20:34 64268\r 0 75.7M 0 181k 0 0 25316 0 0:52:18 0:00:07 0:52:11 25316\r 0 75.7M 0 548k 0 0 67331 0 0:19:39 0:00:08 0:19:31 67327\r 1 75.7M 1 817k 0 0 89344 0 0:14:49 0:00:09 0:14:40 89337\r 1 75.7M 1 1042k 0 0 100k 0 0:12:54 0:00:10 0:12:44 105k\r 3 75.7M 3 2461k 0 0 218k 0 0:05:55 0:00:11 0:05:44 239k\r 6 75.7M 6 5069k 0 0 412k 0 0:03:08 0:00:12 0:02:56 985k\r 11 75.7M 11 9165k 0 0 690k 0 0:01:52 0:00:13 0:01:39 1744k\r 14 75.7M 14 11.2M 0 0 796k 0 0:01:37 0:00:14 0:01:23 2109k\r 19 75.7M 19 14.8M 0 0 995k 0 0:01:17 0:00:15 0:01:02 2910k\r 24 75.7M 24 18.6M 0 0 1174k 0 0:01:06 0:00:16 0:00:50 3331k\r 29 75.7M 29 22.5M 0 0 1338k 0 0:00:57 0:00:17 0:00:40 3613k\r 35 75.7M 35 26.5M 0 0 1486k 0 0:00:52 0:00:18 0:00:34 3603k\r 40 75.7M 40 30.3M 0 0 1610k 0 0:00:48 0:00:19 0:00:29 4025k\r 45 75.7M 45 34.2M 0 0 1731k 0 0:00:44 0:00:20 0:00:24 3980k\r 50 75.7M 50 38.2M 0 0 1840k 0 0:00:42 0:00:21 0:00:21 4011k\r 55 75.7M 55 42.2M 0 0 1940k 0 0:00:39 0:00:22 0:00:17 4020k\r 60 75.7M 60 46.2M 0 0 2032k 0 0:00:38 0:00:23 0:00:15 4026k\r 65 75.7M 65 49.9M 0 0 2106k 0 0:00:36 0:00:24 0:00:12 4017k\r 70 75.7M 70 53.5M 0 0 2169k 0 0:00:35 0:00:25 0:00:10 3945k\r 75 75.7M 75 57.2M 0 0 2229k 0 0:00:34 0:00:26 0:00:08 3884k\r 80 75.7M 80 61.1M 0 0 2293k 0 0:00:33 0:00:27 0:00:06 3868k\r 86 75.7M 86 65.5M 0 0 2372k 0 0:00:32 0:00:28 0:00:04 3956k\r 92 75.7M 92 70.4M 0 0 2464k 0 0:00:31 0:00:29 0:00:02 4200k\r100 75.7M 100 75.7M 0 0 2565k 0 0:00:30 0:00:30 --:--:-- 4585k\nrm: cannot remove \u0027/tmp/flights99.csv\u0027: No such file or directory\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 11:57:37 AM", - "dateStarted": "Jan 9, 2017 11:59:04 AM", - "dateFinished": "Jan 9, 2017 11:59:53 AM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%sh\n\nrm /tmp/flights00.csv.bz2\ncurl -o /tmp/flights00.csv.bz2 \"http://stat-computing.org/dataexpo/2009/2000.csv.bz2\"\nrm /tmp/flights00.csv\nbzip2 -d /tmp/flights00.csv.bz2\nchmod 666 /tmp/flights00.csv", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:01:42 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "sh", - "editOnDblClick": false - }, - "editorMode": "ace/mode/sh", - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952312038_-1315320949", - "id": "20170109-115832_608069986", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "rm: cannot remove \u0027/tmp/flights00.csv.bz2\u0027: No such file or directory\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0\r 0 78.7M 0 5520 0 0 3016 0 7:36:06 0:00:01 7:36:05 3014\r 0 78.7M 0 39987 0 0 15337 0 1:29:41 0:00:02 1:29:39 15332\r 0 78.7M 0 87755 0 0 24531 0 0:56:04 0:00:03 0:56:01 24526\r 0 78.7M 0 157k 0 0 33950 0 0:40:31 0:00:04 0:40:27 33944\r 0 78.7M 0 221k 0 0 40878 0 0:33:39 0:00:05 0:33:34 53734\r 0 78.7M 0 308k 0 0 47250 0 0:29:06 0:00:06 0:29:00 63943\r 0 78.7M 0 398k 0 0 52806 0 0:26:03 0:00:07 0:25:56 71903\r 0 78.7M 0 437k 0 0 36667 0 0:37:31 0:00:12 0:37:19 41697\r 0 78.7M 0 703k 0 0 57158 0 0:24:04 0:00:12 0:23:52 71137\r 1 78.7M 1 851k 0 0 64259 0 0:21:24 0:00:13 0:21:11 80471\r 1 78.7M 1 1171k 0 0 
82442 0 0:16:41 0:00:14 0:16:27 109k\r 1 78.7M 1 1546k 0 0 79861 0 0:17:13 0:00:19 0:16:54 97134\r 3 78.7M 3 3181k 0 0 154k 0 0:08:41 0:00:20 0:08:21 327k\r 4 78.7M 4 3466k 0 0 160k 0 0:08:21 0:00:21 0:08:00 308k\r 4 78.7M 4 3565k 0 0 136k 0 0:09:50 0:00:26 0:09:24 216k\r 8 78.7M 8 7196k 0 0 270k 0 0:04:57 0:00:26 0:04:31 501k\r 10 78.7M 10 8459k 0 0 307k 0 0:04:22 0:00:27 0:03:55 894k\r 11 78.7M 11 9386k 0 0 327k 0 0:04:06 0:00:28 0:03:38 768k\r 15 78.7M 15 11.9M 0 0 413k 0 0:03:14 0:00:29 0:02:45 1093k\r 18 78.7M 18 14.5M 0 0 487k 0 0:02:45 0:00:30 0:02:15 2553k\r 22 78.7M 22 17.7M 0 0 574k 0 0:02:20 0:00:31 0:01:49 2195k\r 25 78.7M 25 19.9M 0 0 626k 0 0:02:08 0:00:32 0:01:36 2375k\r 28 78.7M 28 22.1M 0 0 676k 0 0:01:59 0:00:33 0:01:26 2726k\r 31 78.7M 31 24.7M 0 0 734k 0 0:01:49 0:00:34 0:01:15 2643k\r 34 78.7M 34 27.3M 0 0 789k 0 0:01:42 0:00:35 0:01:07 2638k\r 38 78.7M 38 30.0M 0 0 841k 0 0:01:35 0:00:36 0:00:59 2513k\r 40 78.7M 40 32.1M 0 0 874k 0 0:01:32 0:00:37 0:00:55 2457k\r 43 78.7M 43 34.1M 0 0 906k 0 0:01:28 0:00:38 0:00:50 2445k\r 45 78.7M 45 35.7M 0 0 925k 0 0:01:27 0:00:39 0:00:48 2250k\r 47 78.7M 47 37.4M 0 0 946k 0 0:01:25 0:00:40 0:00:45 2062k\r 49 78.7M 49 39.3M 0 0 968k 0 0:01:23 0:00:41 0:00:42 1907k\r 52 78.7M 52 41.0M 0 0 987k 0 0:01:21 0:00:42 0:00:39 1859k\r 54 78.7M 54 42.5M 0 0 1000k 0 0:01:20 0:00:43 0:00:37 1729k\r 55 78.7M 55 43.9M 0 0 1008k 0 0:01:19 0:00:44 0:00:35 1651k\r 57 78.7M 57 45.4M 0 0 1020k 0 0:01:18 0:00:45 0:00:33 1625k\r 59 78.7M 59 46.6M 0 0 1027k 0 0:01:18 0:00:46 0:00:32 1512k\r 60 78.7M 60 47.7M 0 0 1027k 0 0:01:18 0:00:47 0:00:31 1376k\r 61 78.7M 61 48.6M 0 0 1024k 0 0:01:18 0:00:48 0:00:30 1236k\r 62 78.7M 62 49.5M 0 0 1020k 0 0:01:18 0:00:49 0:00:29 1125k\r 64 78.7M 64 50.4M 0 0 1021k 0 0:01:18 0:00:50 0:00:28 1027k\r 65 78.7M 65 51.3M 0 0 1018k 0 0:01:19 0:00:51 0:00:28 941k\r 66 78.7M 66 52.1M 0 0 1016k 0 0:01:19 0:00:52 0:00:27 910k\r 67 78.7M 67 53.0M 0 0 1014k 0 0:01:19 0:00:53 0:00:26 909k\r 68 78.7M 68 53.7M 0 0 1006k 0 0:01:20 0:00:54 0:00:26 868k\r 69 78.7M 69 54.6M 0 0 1006k 0 0:01:20 0:00:55 0:00:25 858k\r 70 78.7M 70 55.3M 0 0 1002k 0 0:01:20 0:00:56 0:00:24 831k\r 71 78.7M 71 56.1M 0 0 998k 0 0:01:20 0:00:57 0:00:23 807k\r 72 78.7M 72 56.9M 0 0 994k 0 0:01:21 0:00:58 0:00:23 787k\r 73 78.7M 73 57.6M 0 0 991k 0 0:01:21 0:00:59 0:00:22 823k\r 74 78.7M 74 58.4M 0 0 988k 0 0:01:21 0:01:00 0:00:21 784k\r 75 78.7M 75 59.2M 0 0 985k 0 0:01:21 0:01:01 0:00:20 791k\r 76 78.7M 76 60.0M 0 0 982k 0 0:01:22 0:01:02 0:00:20 797k\r 77 78.7M 77 60.8M 0 0 980k 0 0:01:22 0:01:03 0:00:19 808k\r 78 78.7M 78 61.6M 0 0 977k 0 0:01:22 0:01:04 0:00:18 812k\r 79 78.7M 79 62.4M 0 0 975k 0 0:01:22 0:01:05 0:00:17 824k\r 80 78.7M 80 63.4M 0 0 976k 0 0:01:22 0:01:06 0:00:16 870k\r 82 78.7M 82 64.9M 0 0 984k 0 0:01:21 0:01:07 0:00:14 1006k\r 85 78.7M 85 66.9M 0 0 1000k 0 0:01:20 0:01:08 0:00:12 1254k\r 88 78.7M 88 69.4M 0 0 1022k 0 0:01:18 0:01:09 0:00:09 1602k\r 92 78.7M 92 72.5M 0 0 1053k 0 0:01:16 0:01:10 0:00:06 2064k\r 96 78.7M 96 76.1M 0 0 1089k 0 0:01:13 0:01:11 0:00:02 2600k\r100 78.7M 100 78.7M 0 0 1116k 0 0:01:12 0:01:12 --:--:-- 3022k\nrm: cannot remove \u0027/tmp/flights00.csv\u0027: No such file or directory\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 11:58:32 AM", - "dateStarted": "Jan 9, 2017 12:00:01 PM", - "dateFinished": "Jan 9, 2017 12:01:34 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%sh\n\nrm /tmp/carriers.csv\ncurl -o /tmp/carriers.csv 
\"http://stat-computing.org/dataexpo/2009/carriers.csv\"\nchmod 666 /tmp/carriers.csv", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:01:48 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "sh", - "editOnDblClick": false - }, - "editorMode": "ace/mode/sh", - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952329229_2136292082", - "id": "20170109-115849_1794095031", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "rm: cannot remove \u0027/tmp/carriers.csv\u0027: No such file or directory\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 9 43758 9 4140 0 0 7588 0 0:00:05 --:--:-- 0:00:05 7582\r100 43758 100 43758 0 0 46357 0 --:--:-- --:--:-- --:--:-- 46353\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 11:58:49 AM", - "dateStarted": "Jan 9, 2017 12:01:44 PM", - "dateFinished": "Jan 9, 2017 12:01:45 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%md\n### Preparing the data\nThe `flights\u003cYY\u003e.csv` contains various data but we only need the information about the year, the month and the carrier who served the flight. Let\u0027s retrieve this information and create `DataSets`.", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:01:51 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "markdown", - "editOnDblClick": true - }, - "editorMode": "ace/mode/markdown", - "editorHide": true, - "tableHide": false - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952363836_-1769111757", - "id": "20170109-115923_963126574", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "HTML", - "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch3\u003ePreparing the data\u003c/h3\u003e\n\u003cp\u003eThe \u003ccode\u003eflights\u0026lt;YY\u0026gt;.csv\u003c/code\u003e contains various data but we only need the information about the year, the month and the carrier who served the flight. 
Let\u0026rsquo;s retrieve this information and create \u003ccode\u003eDataSets\u003c/code\u003e.\u003c/p\u003e\n\u003c/div\u003e" - } - ] - }, - "dateCreated": "Jan 9, 2017 11:59:23 AM", - "dateStarted": "Jan 9, 2017 12:01:51 PM", - "dateFinished": "Jan 9, 2017 12:01:53 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%flink\n\ncase class Flight(year: Int, month: Int, carrierCode: String)\ncase class Carrier(code: String, name: String)\n\nval flights98 \u003d benv.readCsvFile[Flight](\"/tmp/flights98.csv\", ignoreFirstLine \u003d true, includedFields \u003d Array(0, 1, 8))\nval flights99 \u003d benv.readCsvFile[Flight](\"/tmp/flights99.csv\", ignoreFirstLine \u003d true, includedFields \u003d Array(0, 1, 8))\nval flights00 \u003d benv.readCsvFile[Flight](\"/tmp/flights00.csv\", ignoreFirstLine \u003d true, includedFields \u003d Array(0, 1, 8))\nval flights \u003d flights98.union(flights99).union(flights00)\nval carriers \u003d benv.readCsvFile[Carrier](\"/tmp/carriers.csv\", ignoreFirstLine \u003d true, quoteCharacter \u003d \u0027\"\u0027)", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:02:38 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "lineNumbers": true, - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952511284_-589624871", - "id": "20170109-120151_872852428", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "defined class Flight\ndefined class Carrier\nflights98: org.apache.flink.api.scala.DataSet[Flight] \u003d org.apache.flink.api.scala.DataSet@7cd81fd5\nflights99: org.apache.flink.api.scala.DataSet[Flight] \u003d org.apache.flink.api.scala.DataSet@58242e79\nflights00: org.apache.flink.api.scala.DataSet[Flight] \u003d org.apache.flink.api.scala.DataSet@13f866c0\nflights: org.apache.flink.api.scala.DataSet[Flight] \u003d org.apache.flink.api.scala.DataSet@2aad2530\ncarriers: org.apache.flink.api.scala.DataSet[Carrier] \u003d org.apache.flink.api.scala.DataSet@148c977b\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:01:51 PM", - "dateStarted": "Jan 9, 2017 12:02:10 PM", - "dateFinished": "Jan 9, 2017 12:02:29 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%md\n### Choosing the carrier\nNow we will search for the most popular carrier during the whole time period.", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:03:08 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "markdown", - "editOnDblClick": true - }, - "editorMode": "ace/mode/markdown", - "editorHide": true, - "tableHide": false - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952530113_212237809", - "id": "20170109-120210_773710997", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "HTML", - "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch3\u003eChoosing the carrier\u003c/h3\u003e\n\u003cp\u003eNow we will search for the most popular carrier during the whole time period.\u003c/p\u003e\n\u003c/div\u003e" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:02:10 PM", - "dateStarted": "Jan 9, 2017 12:03:08 PM", - "dateFinished": "Jan 9, 2017 12:03:08 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%flink\n\nimport 
org.apache.flink.api.common.operators.Order\nimport org.apache.flink.api.java.aggregation.Aggregations\n\ncase class CarrierFlightsCount(carrierCode: String, count: Int)\ncase class CountByMonth(month: Int, count: Int)\n\nval carriersFlights \u003d flights\n .map(f \u003d\u003e CarrierFlightsCount(f.carrierCode, 1))\n .groupBy(\"carrierCode\")\n .sum(\"count\")\n\nval maxFlights \u003d carriersFlights\n .aggregate(Aggregations.MAX, \"count\")\n\nval bestCarrier \u003d carriersFlights\n .join(maxFlights)\n .where(\"count\")\n .equalTo(\"count\")\n .map(_._1)\n \nval carrierName \u003d bestCarrier\n .join(carriers)\n .where(\"carrierCode\")\n .equalTo(\"code\")\n .map(_._2.name)\n .collect\n .head", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:04:04 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "lineNumbers": true, - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952588708_-1770095793", - "id": "20170109-120308_1328511597", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "import org.apache.flink.api.common.operators.Order\nimport org.apache.flink.api.java.aggregation.Aggregations\ndefined class CarrierFlightsCount\ndefined class CountByMonth\ncarriersFlights: org.apache.flink.api.scala.AggregateDataSet[CarrierFlightsCount] \u003d org.apache.flink.api.scala.AggregateDataSet@2c59be0b\nmaxFlights: org.apache.flink.api.scala.AggregateDataSet[CarrierFlightsCount] \u003d org.apache.flink.api.scala.AggregateDataSet@53e5fad9\nbestCarrier: org.apache.flink.api.scala.DataSet[CarrierFlightsCount] \u003d org.apache.flink.api.scala.DataSet@64b7b1b3\ncarrierName: String \u003d Delta Air Lines Inc.\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:03:08 PM", - "dateStarted": "Jan 9, 2017 12:03:41 PM", - "dateFinished": "Jan 9, 2017 12:03:58 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%flink\n\nprintln(s\"\"\"The most popular carrier is:\n$carrierName\n\"\"\")", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:09:18 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "lineNumbers": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952621624_-1222400539", - "id": "20170109-120341_952212268", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "The most popular carrier is:\nDelta Air Lines Inc.\n\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:03:41 PM", - "dateStarted": "Jan 9, 2017 12:04:09 PM", - "dateFinished": "Jan 9, 2017 12:04:10 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%md\n### Calculating flights\nThe last step is to filter **Delta Air Lines** flights and group them by months.", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:04:26 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "markdown", - "editOnDblClick": true - }, - "editorMode": "ace/mode/markdown", - "editorHide": true, - "tableHide": false - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952649646_-1553253944", - "id": "20170109-120409_2003276881", - "results": { - "code": 
"SUCCESS", - "msg": [ - { - "type": "HTML", - "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch3\u003eCalculating flights\u003c/h3\u003e\n\u003cp\u003eThe last step is to filter \u003cstrong\u003eDelta Air Lines\u003c/strong\u003e flights and group them by months.\u003c/p\u003e\n\u003c/div\u003e" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:04:09 PM", - "dateStarted": "Jan 9, 2017 12:04:26 PM", - "dateFinished": "Jan 9, 2017 12:04:26 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "title": "flights grouping", - "text": "%flink\n\ndef countFlightsPerMonth(flights: DataSet[Flight],\n carrier: DataSet[CarrierFlightsCount]) \u003d {\n val carrierFlights \u003d flights\n .join(carrier)\n .where(\"carrierCode\")\n .equalTo(\"carrierCode\")\n .map(_._1)\n \n carrierFlights\n .map(flight \u003d\u003e CountByMonth(flight.month, 1))\n .groupBy(\"month\")\n .sum(\"count\")\n .sortPartition(\"month\", Order.ASCENDING)\n}\n\nval bestCarrierFlights_98 \u003d countFlightsPerMonth(flights98, bestCarrier)\nval bestCarrierFlights_99 \u003d countFlightsPerMonth(flights99, bestCarrier)\nval bestCarrierFlights_00 \u003d countFlightsPerMonth(flights00, bestCarrier)", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:05:06 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "lineNumbers": true, - "title": true, - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952665972_667547355", - "id": "20170109-120425_2018337048", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "countFlightsPerMonth: (flights: org.apache.flink.api.scala.DataSet[Flight], carrier: org.apache.flink.api.scala.DataSet[CarrierFlightsCount])org.apache.flink.api.scala.DataSet[CountByMonth]\nbestCarrierFlights_98: org.apache.flink.api.scala.DataSet[CountByMonth] \u003d org.apache.flink.api.scala.PartitionSortedDataSet@2aa64309\nbestCarrierFlights_99: org.apache.flink.api.scala.DataSet[CountByMonth] \u003d org.apache.flink.api.scala.PartitionSortedDataSet@35fe60c4\nbestCarrierFlights_00: org.apache.flink.api.scala.DataSet[CountByMonth] \u003d org.apache.flink.api.scala.PartitionSortedDataSet@4621410f\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:04:25 PM", - "dateStarted": "Jan 9, 2017 12:04:50 PM", - "dateFinished": "Jan 9, 2017 12:04:51 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "title": "making a results table", - "text": "%flink\n\ndef monthAsString(month: Int): String \u003d {\n month match {\n case 1 \u003d\u003e \"Jan\"\n case 2 \u003d\u003e \"Feb\"\n case 3 \u003d\u003e \"Mar\"\n case 4 \u003d\u003e \"Apr\"\n case 5 \u003d\u003e \"May\"\n case 6 \u003d\u003e \"Jun\"\n case 7 \u003d\u003e \"Jul\"\n case 8 \u003d\u003e \"Aug\"\n case 9 \u003d\u003e \"Sept\"\n case 10 \u003d\u003e \"Oct\"\n case 11 \u003d\u003e \"Nov\"\n case 12 \u003d\u003e \"Dec\"\n }\n}\n\n// We should put all the results into a common DataFrame\n// to show them in a common picture\nval bestCarrierFlights \u003d bestCarrierFlights_98\n .join(bestCarrierFlights_99)\n .where(\"month\")\n .equalTo(\"month\")\n .map(tuple \u003d\u003e (tuple._1.month, tuple._1.count, tuple._2.count))\n .join(bestCarrierFlights_00)\n .where(0)\n .equalTo(\"month\")\n .map(tuple \u003d\u003e (tuple._1._1, tuple._1._2, tuple._1._3, tuple._2.count))\n .collect\n \nvar flightsByMonthTable 
\u003d s\"Month\\t1998\\t1999\\t2000\\n\"\nbestCarrierFlights.foreach(data \u003d\u003e flightsByMonthTable +\u003d s\"${monthAsString(data._1)}\\t${data._2}\\t${data._3}\\t${data._4}\\n\")", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:06:03 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "lineNumbers": true, - "title": true, - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952690164_-1061667443", - "id": "20170109-120450_1574916350", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "monthAsString: (month: Int)String\nbestCarrierFlights: Seq[(Int, Int, Int, Int)] \u003d Buffer((1,78523,77745,78055), (2,71101,70498,71090), (3,78906,77812,78453), (4,75726,75343,75247), (5,77937,77226,76797), (6,75432,75840,74846), (7,77521,77264,75776), (8,78104,78141,77654), (9,74840,75067,73696), (10,76145,77829,77425), (11,73552,74411,73659), (12,77308,76954,75331))\nflightsByMonthTable: String \u003d \n\"Month\t1998\t1999\t2000\n\"\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:04:50 PM", - "dateStarted": "Jan 9, 2017 12:05:24 PM", - "dateFinished": "Jan 9, 2017 12:05:59 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "title": "\"Delta Air Lines\" flights count by months", - "text": "%flink\n\nprintln(s\"\"\"%table\n$flightsByMonthTable\n\"\"\")", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:06:17 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": { - "0": { - "graph": { - "mode": "lineChart", - "height": 300.0, - "optionOpen": false, - "setting": { - "lineChart": {} - }, - "commonSetting": {}, - "keys": [ - { - "name": "Month", - "index": 0.0, - "aggr": "sum" - } - ], - "groups": [], - "values": [ - { - "name": "1998", - "index": 1.0, - "aggr": "sum" - }, - { - "name": "1999", - "index": 2.0, - "aggr": "sum" - }, - { - "name": "2000", - "index": 3.0, - "aggr": "sum" - } - ] - }, - "helium": {} - } - }, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "title": true, - "lineNumbers": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952724460_191505697", - "id": "20170109-120524_2037622815", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TABLE", - "data": "Month\t1998\t1999\t2000\nJan\t78523\t77745\t78055\nFeb\t71101\t70498\t71090\nMar\t78906\t77812\t78453\nApr\t75726\t75343\t75247\nMay\t77937\t77226\t76797\nJun\t75432\t75840\t74846\nJul\t77521\t77264\t75776\nAug\t78104\t78141\t77654\nSept\t74840\t75067\t73696\nOct\t76145\t77829\t77425\nNov\t73552\t74411\t73659\nDec\t77308\t76954\t75331\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:05:24 PM", - "dateStarted": "Jan 9, 2017 12:06:07 PM", - "dateFinished": "Jan 9, 2017 12:06:08 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%md\n### Results\nLooking at this chart we can say that February is the most unpopular month, but this is only because it has less days (28 or 29) than the other months (30 or 31). 
To get a fairer picture, we should calculate the average flight count per day for each month.", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:06:34 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "markdown", - "editOnDblClick": true - }, - "editorMode": "ace/mode/markdown", - "editorHide": true, - "tableHide": false - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952767719_-1010557136", - "id": "20170109-120607_67673280", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "HTML", - "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch3\u003eResults\u003c/h3\u003e\n\u003cp\u003eLooking at this chart, we can say that February is the least popular month, but this is only because it has fewer days (28 or 29) than the other months (30 or 31). To get a fairer picture, we should calculate the average flight count per day for each month.\u003c/p\u003e\n\u003c/div\u003e" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:06:07 PM", - "dateStarted": "Jan 9, 2017 12:06:34 PM", - "dateFinished": "Jan 9, 2017 12:06:34 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%flink\n\ndef daysInMonth(month: Int, year: Int): Int \u003d {\n // Simplified leap-year rule (divisible by 4); exact for 1998-2000.\n month match {\n case 1 \u003d\u003e 31\n case 2 \u003d\u003e if (year % 4 \u003d\u003d 0) {\n 29\n } else {\n 28\n }\n case 3 \u003d\u003e 31\n case 4 \u003d\u003e 30\n case 5 \u003d\u003e 31\n case 6 \u003d\u003e 30\n case 7 \u003d\u003e 31\n case 8 \u003d\u003e 31\n case 9 \u003d\u003e 30\n case 10 \u003d\u003e 31\n case 11 \u003d\u003e 30\n case 12 \u003d\u003e 31\n }\n}\n\n\nvar flightsByDayTable \u003d s\"Month\\t1998\\t1999\\t2000\\n\"\n\nbestCarrierFlights.foreach(data \u003d\u003e flightsByDayTable +\u003d s\"${monthAsString(data._1)}\\t${data._2/daysInMonth(data._1,1998)}\\t${data._3/daysInMonth(data._1,1999)}\\t${data._4/daysInMonth(data._1,2000)}\\n\")", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:06:58 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": {}, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "lineNumbers": true, - "tableHide": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952794097_-785833130", - "id": "20170109-120634_492170963", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TEXT", - "data": "daysInMonth: (month: Int, year: Int)Int\nflightsByDayTable: String \u003d \n\"Month\t1998\t1999\t2000\n\"\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:06:34 PM", - "dateStarted": "Jan 9, 2017 12:06:53 PM", - "dateFinished": "Jan 9, 2017 12:06:53 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "title": "\"Delta Air Lines\" average flights per day", - "text": "%flink\n\nprintln(s\"\"\"%table\n$flightsByDayTable\n\"\"\")", - "user": "anonymous", - "dateUpdated": "Jan 9, 2017 12:10:56 PM", - "config": { - "colWidth": 12.0, - "enabled": true, - "results": { - "0": { - "graph": { - "mode": "lineChart", - "height": 300.0, - "optionOpen": false, - "setting": { - "lineChart": {} - }, - "commonSetting": {}, - "keys": [ - { - "name": "Month", - "index": 0.0, - "aggr": "sum" - } - ], - "groups": [], - "values": [ - { - "name": "1998", - "index": 1.0, - "aggr": "sum" - }, - { - "name": "1999", - "index": 2.0, - "aggr": "sum" - }, - { - "name": "2000", - "index": 3.0, - "aggr": "sum" - } - ] - }, - "helium": 
{} - } - }, - "editorSetting": { - "language": "scala", - "editOnDblClick": false - }, - "editorMode": "ace/mode/scala", - "title": true, - "lineNumbers": true - }, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952813391_1847418990", - "id": "20170109-120653_1870236569", - "results": { - "code": "SUCCESS", - "msg": [ - { - "type": "TABLE", - "data": "Month\t1998\t1999\t2000\nJan\t2533\t2507\t2517\nFeb\t2539\t2517\t2451\nMar\t2545\t2510\t2530\nApr\t2524\t2511\t2508\nMay\t2514\t2491\t2477\nJun\t2514\t2528\t2494\nJul\t2500\t2492\t2444\nAug\t2519\t2520\t2504\nSept\t2494\t2502\t2456\nOct\t2456\t2510\t2497\nNov\t2451\t2480\t2455\nDec\t2493\t2482\t2430\n" - } - ] - }, - "dateCreated": "Jan 9, 2017 12:06:53 PM", - "dateStarted": "Jan 9, 2017 12:07:22 PM", - "dateFinished": "Jan 9, 2017 12:07:23 PM", - "status": "FINISHED", - "progressUpdateIntervalMs": 500 - }, - { - "text": "%flink\n", - "dateUpdated": "Jan 9, 2017 12:07:22 PM", - "config": {}, - "settings": { - "params": {}, - "forms": {} - }, - "apps": [], - "jobName": "paragraph_1483952842919_587228425", - "id": "20170109-120722_939892827", - "dateCreated": "Jan 9, 2017 12:07:22 PM", - "status": "READY", - "progressUpdateIntervalMs": 500 - } - ], - "name": "Using Flink for batch processing", - "id": "2C35YU814", - "angularObjects": { - "2C4PVECE6:shared_process": [], - "2C4US9MUF:shared_process": [], - "2C4FYNB4G:shared_process": [], - "2C4GX28KP:shared_process": [], - "2C648AXXN:shared_process": [], - "2C3MSEJ2F:shared_process": [], - "2C6F2N6BT:shared_process": [], - "2C3US2RTN:shared_process": [], - "2C3TYMD6K:shared_process": [], - "2C3FDPZRX:shared_process": [], - "2C5TEARYX:shared_process": [], - "2C5D6NSNG:shared_process": [], - "2C6FVVEAD:shared_process": [], - "2C582KNWG:shared_process": [], - "2C6ZMVGM7:shared_process": [], - "2C6UYQG8R:shared_process": [], - "2C666VZT2:shared_process": [], - "2C4JRCY3K:shared_process": [], - "2C64W5T9D:shared_process": [] - }, - "config": { - "looknfeel": "default" - }, - "info": {} -}
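A side note on the daysInMonth helper in the deleted notebook above: it applies only the divisible-by-four leap-year rule, which is exact for 1998-2000 but would misclassify century years such as 1900 or 2100. The following is a minimal standalone Scala sketch, not part of the notebook: the object name PerDayAverages and the restriction to three sample months are illustrative only, with counts taken from the 1998 column of the notebook's result table. It performs the same per-day normalization using java.time.YearMonth, which implements the full Gregorian rule.

import java.time.{Month, YearMonth}

object PerDayAverages {
  // (month, flight count) pairs for 1998, taken from the notebook's result table.
  val flights1998 = Seq(1 -> 78523, 2 -> 71101, 3 -> 78906)

  def main(args: Array[String]): Unit = {
    flights1998.foreach { case (monthNumber, count) =>
      // lengthOfMonth applies the full Gregorian leap-year rule,
      // including the century exception the notebook's helper omits.
      val days = YearMonth.of(1998, monthNumber).lengthOfMonth
      println(f"${Month.of(monthNumber)}%-10s ${count.toDouble / days}%.0f flights/day")
    }
  }
}

For January 1998 this gives 78523 / 31 = 2533 flights per day, which matches the per-day table recorded in the notebook output above.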