diff --git a/Working with Git/Working with Git.ipynb b/Working with Git/Working with Git.ipynb index c622766..ba4d2d1 100644 --- a/Working with Git/Working with Git.ipynb +++ b/Working with Git/Working with Git.ipynb @@ -1,288 +1,274 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "38d31fbc-6666-4495-a2b1-d716ffe24329", - "metadata": { - "collapsed": false, - "name": "cell1" - }, - "source": [ - "In this example, we will demonstrate how you can easily go from prototyping for development purposes to production with Git integration.\n", - "\n", - "We will show an example of a simple data pipeline with one query. By changing the `MODE` variable to `DEV` or `PROD` with different warehouse and schema configurations.\n", - "\n", - "For `DEV`, we will be using an extra small warehouse on a sample of the TPCH data.\n", - "For `PROD`, we will be using a large warehouse on a sample of the TPCH data that is 100X the size of the DEV sample." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3775908f-ca36-4846-8f38-5adca39217f2", - "metadata": { - "codeCollapsed": false, - "collapsed": false, - "language": "python", - "name": "cell2" - }, - "outputs": [], - "source": [ - "MODE = \"DEV\" # Parameter to control whether to run in DEV or PROD mode\n", - "\n", - "if MODE == \"DEV\":\n", - " # For development, use XSMALL warehouse on TPCH data with scale factor of 1\n", - " warehouse_name = \"GIT_EXAMPLE_DEV_WH\"\n", - " schema_name = \"TPCH_SF1\"\n", - " size = 'XSMALL'\n", - "elif MODE == \"PROD\": \n", - " # For production, use LARGE warehouse on TPCH data with scale factor of 100\n", - " warehouse_name = \"GIT_EXAMPLE_PROD_WH\"\n", - " schema_name = \"TPCH_SF100\"\n", - " size = 'LARGE'" - ] - }, - { - "cell_type": "markdown", - "id": "01bd1a4d-1715-4c10-8fdc-08be7b115be5", - "metadata": { - "name": "cell3" - }, - "source": [ - "Let's create and use a warehouse with the specified name and size." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "55bb9c45-e1e4-49ba-a7db-e5eb671ad13a", - "metadata": { - "codeCollapsed": false, - "collapsed": false, - "language": "sql", - "name": "cell4" - }, - "outputs": [], - "source": [ - "-- Create warehouse with specified name and size\n", - "CREATE OR REPLACE WAREHOUSE {{warehouse_name}} WITH WAREHOUSE_SIZE= {{size}};" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2b1f4b91-7988-432b-afe1-cb599eea5cc6", - "metadata": { - "collapsed": false, - "language": "sql", - "name": "cell5" - }, - "outputs": [], - "source": [ - "-- Use specified warehouse for subsequent query\n", - "USE WAREHOUSE {{warehouse_name}};" - ] - }, - { - "cell_type": "markdown", - "id": "f330162f-b59e-467d-bc4e-5c297993c4ee", - "metadata": { - "collapsed": false, - "name": "cell6" - }, - "source": [ - "Use the TPC-H Sample dataset with differing scale factor. \n", - "- Note: Sample data sets are provided in a database named SNOWFLAKE_SAMPLE_DATA that has been shared with your account from the Snowflake SFC_SAMPLES account. If you do not see the database, you can create it yourself. Refer to [Using the Sample Database](https://docs.snowflake.com/en/user-guide/sample-data-using)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "edb15abf-6061-4e29-9d45-85b0cc806e71", - "metadata": { - "codeCollapsed": false, - "collapsed": false, - "language": "sql", - "name": "cell7" - }, - "outputs": [], - "source": [ - "USE SCHEMA SNOWFLAKE_SAMPLE_DATA.{{schema_name}}; " - ] - }, - { - "cell_type": "markdown", - "id": "024892ff-b2df-4a4d-9308-1760751b4dae", - "metadata": { - "collapsed": false, - "name": "cell8" - }, - "source": [ - "Check out the number of rows in the `LINEITEM` table." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e73a5b30-fdcc-4dd6-9619-f19a5c31e769", - "metadata": { - "codeCollapsed": false, - "collapsed": false, - "language": "sql", - "name": "cell9" - }, - "outputs": [], - "source": [ - "SELECT COUNT(*) FROM LINEITEM;" - ] - }, - { - "cell_type": "markdown", - "id": "115c9b33-f508-4385-806d-20bada66fe18", - "metadata": { - "collapsed": false, - "name": "cell10" - }, - "source": [ - "Now let's run a query on this dataset:\n", - "- The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS, and listed in ascending order of RETURNFLAG and LINESTATUS. A count of the number of line items in each group is included." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8d50cbf4-0c8d-4950-86cb-114990437ac9", - "metadata": { - "codeCollapsed": false, - "collapsed": false, - "language": "sql", - "name": "cell11" - }, - "outputs": [], - "source": [ - "select\n", - " l_returnflag,\n", - " l_linestatus,\n", - " sum(l_quantity) as sum_qty,\n", - " sum(l_extendedprice) as sum_base_price,\n", - " sum(l_extendedprice * (1-l_discount)) as sum_disc_price,\n", - " sum(l_extendedprice * (1-l_discount) * (1+l_tax)) as sum_charge,\n", - " avg(l_quantity) as avg_qty,\n", - " avg(l_extendedprice) as avg_price,\n", - " avg(l_discount) as avg_disc,\n", - " count(*) as count_order\n", - " from\n", - " lineitem\n", - " where\n", - " l_shipdate <= dateadd(day, -90, to_date('1998-12-01'))\n", - " group by\n", - " l_returnflag,\n", - " l_linestatus\n", - " order by\n", - " l_returnflag,\n", - " l_linestatus;" - ] - }, - { - "cell_type": "markdown", - "id": "170637df-6e8b-498a-8f2a-fda1a41c21ca", - "metadata": { - "collapsed": false, - "name": "cell12" - }, - "source": [ - "Using the cell referencing, we get the query ID and history of the query we just ran." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c49eb85b-6956-4da6-949f-1939c6a1dcc4", - "metadata": { - "codeCollapsed": false, - "collapsed": false, - "language": "python", - "name": "cell13" - }, - "outputs": [], - "source": [ - "# Get query ID of the referenced cell\n", - "query_id = cell11.result_scan_sql().split(\"'\")[1]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "dfd22f9f-44ef-4a3f-99e6-7c774b02eea7", - "metadata": { - "codeCollapsed": false, - "collapsed": false, - "language": "sql", - "name": "cell14" - }, - "outputs": [], - "source": [ - "select * from table(information_schema.query_history_by_warehouse('{{warehouse_name}}')) \n", - "where query_id = '{{query_id}}';" - ] - }, - { - "cell_type": "markdown", - "id": "ef4d7fcb-9729-4409-8bce-7a7081b98e87", - "metadata": { - "name": "cell15" - }, - "source": [ - "Finally, we compile all of this information into a report to document the run information." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9b718981-9577-4996-b212-0cf7ffb4f23b", - "metadata": { - "codeCollapsed": false, - "collapsed": false, - "language": "python", - "name": "cell16" - }, - "outputs": [], - "source": [ - "import streamlit as st\n", - "from datetime import datetime\n", - "st.header(f\"[{MODE}] Run Report\")\n", - "st.markdown(f\"Generated on: {datetime.now()}\")\n", - "\n", - "st.markdown(f\"### System Information\")\n", - "# Print session information\n", - "from snowflake.snowpark.context import get_active_session\n", - "session = get_active_session()\n", - "# Add a query tag to the session. This helps with troubleshooting and performance monitoring.\n", - "session.query_tag = {\"origin\":\"sf_sit-is\", \n", - " \"name\":\"notebook_demo_pack\", \n", - " \"version\":{\"major\":1, \"minor\":0},\n", - " \"attributes\":{\"is_quickstart\":1, \"source\":\"notebook\", \"vignette\":\"working_with_git\"}}\n", - "st.markdown(f\"**Database:** {session.get_current_database()[1:-1]}\")\n", - "st.markdown(f\"**Schema:** {session.get_current_schema()[1:-1]}\")\n", - "st.markdown(f\"**Warehouse:** {session.get_current_warehouse()[1:-1]}\")\n", - "\n", - "st.markdown(f\"### Query Information\")\n", - "# Print session information\n", - "st.markdown(f\"**Query ID:** {query_id}\")\n", - "result_info = cell14.to_pandas()\n", - "st.markdown(\"**Query Text:**\")\n", - "st.code(result_info[\"QUERY_TEXT\"].values[0],language='sql',line_numbers=True)\n", - "st.markdown(\"**Runtime information:**\")\n", - "st.dataframe(result_info[['START_TIME','END_TIME','TOTAL_ELAPSED_TIME']])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Streamlit Notebook", - "name": "streamlit" - } + "metadata": { + "kernelspec": { + "display_name": "Streamlit Notebook", + "name": "streamlit" }, - "nbformat": 4, - "nbformat_minor": 5 -} + "lastEditStatus": { + "notebookId": "x4uoonl6smacl4lne3qv", + "authorId": "1973481707579", + "authorName": "SHASHANKSNOWFLAKE", + "authorEmail": "shashank.khs183@gmail.com", + "sessionId": "3efdc76b-33dd-4097-8449-582cf15ba97e", + "lastEditTime": 1739053715974 + } + }, + "nbformat_minor": 5, + "nbformat": 4, + "cells": [ + { + "cell_type": "markdown", + "id": "38d31fbc-6666-4495-a2b1-d716ffe24329", + "metadata": { + "collapsed": false, + "name": "cell1" + }, + "source": [ + "In this example, we will demonstrate how you can easily go from prototyping for development purposes to production with Git integration.\n", + "\n", + "We will show an example of a simple data pipeline with one query. By changing the `MODE` variable to `DEV` or `PROD` with different warehouse and schema configurations.\n", + "\n", + "For `DEV`, we will be using an extra small warehouse on a sample of the TPCH data.\n", + "For `PROD`, we will be using a large warehouse on a sample of the TPCH data that is 100X the size of the DEV sample." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3775908f-ca36-4846-8f38-5adca39217f2", + "metadata": { + "codeCollapsed": false, + "collapsed": false, + "language": "python", + "name": "cell2" + }, + "outputs": [], + "source": [ + "MODE = \"DEV\" # Parameter to control whether to run in DEV or PROD mode\n", + "\n", + "if MODE == \"DEV\":\n", + " # For development, use XSMALL warehouse on TPCH data with scale factor of 1\n", + " warehouse_name = \"GIT_EXAMPLE_DEV_WH\"\n", + " schema_name = \"TPCH_SF1\"\n", + " size = 'XSMALL'\n", + "elif MODE == \"PROD\": \n", + " # For production, use LARGE warehouse on TPCH data with scale factor of 100\n", + " warehouse_name = \"GIT_EXAMPLE_PROD_WH\"\n", + " schema_name = \"TPCH_SF100\"\n", + " size = 'LARGE'" + ] + }, + { + "cell_type": "markdown", + "id": "01bd1a4d-1715-4c10-8fdc-08be7b115be5", + "metadata": { + "name": "cell3" + }, + "source": [ + "Let's create and use a warehouse with the specified name and size." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55bb9c45-e1e4-49ba-a7db-e5eb671ad13a", + "metadata": { + "codeCollapsed": false, + "collapsed": false, + "language": "sql", + "name": "cell4" + }, + "outputs": [], + "source": [ + "-- Create warehouse with specified name and size\n", + "CREATE OR REPLACE WAREHOUSE {{warehouse_name}} WITH WAREHOUSE_SIZE= {{size}};" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2b1f4b91-7988-432b-afe1-cb599eea5cc6", + "metadata": { + "collapsed": false, + "language": "sql", + "name": "cell5" + }, + "outputs": [], + "source": [ + "-- Use specified warehouse for subsequent query\n", + "USE WAREHOUSE {{warehouse_name}};" + ] + }, + { + "cell_type": "markdown", + "id": "f330162f-b59e-467d-bc4e-5c297993c4ee", + "metadata": { + "collapsed": false, + "name": "cell6" + }, + "source": [ + "Use the TPC-H Sample dataset with differing scale factor. \n", + "- Note: Sample data sets are provided in a database named SNOWFLAKE_SAMPLE_DATA that has been shared with your account from the Snowflake SFC_SAMPLES account. If you do not see the database, you can create it yourself. Refer to [Using the Sample Database](https://docs.snowflake.com/en/user-guide/sample-data-using)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "edb15abf-6061-4e29-9d45-85b0cc806e71", + "metadata": { + "codeCollapsed": false, + "collapsed": false, + "language": "sql", + "name": "cell7" + }, + "outputs": [], + "source": [ + "USE SCHEMA SNOWFLAKE_SAMPLE_DATA.{{schema_name}}; " + ] + }, + { + "cell_type": "markdown", + "id": "024892ff-b2df-4a4d-9308-1760751b4dae", + "metadata": { + "collapsed": false, + "name": "cell8" + }, + "source": [ + "Check out the number of rows in the `LINEITEM` table." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e73a5b30-fdcc-4dd6-9619-f19a5c31e769", + "metadata": { + "codeCollapsed": false, + "collapsed": false, + "language": "sql", + "name": "cell9" + }, + "outputs": [], + "source": [ + "SELECT COUNT(*) FROM LINEITEM;" + ] + }, + { + "cell_type": "markdown", + "id": "115c9b33-f508-4385-806d-20bada66fe18", + "metadata": { + "collapsed": false, + "name": "cell10" + }, + "source": [ + "Now let's run a query on this dataset:\n", + "- The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS, and listed in ascending order of RETURNFLAG and LINESTATUS. A count of the number of line items in each group is included." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d50cbf4-0c8d-4950-86cb-114990437ac9", + "metadata": { + "codeCollapsed": false, + "collapsed": false, + "language": "sql", + "name": "cell11" + }, + "outputs": [], + "source": "select\n l_returnflag,\n l_linestatus,\n sum(l_quantity) as sum_qty,\n sum(l_extendedprice) as sum_base_price,\n sum(l_extendedprice * (1-l_discount)) as sum_disc_price,\n sum(l_extendedprice * (1-l_discount) * (1+l_tax)) as sum_charge,\n avg(l_quantity) as avg_qty,\n avg(l_extendedprice) as avg_price,\n avg(l_discount) as avg_disc,\n count(*) as count_order\n from\n lineitem\n group by\n l_returnflag,\n l_linestatus\n order by\n l_returnflag,\n l_linestatus;" + }, + { + "cell_type": "markdown", + "id": "170637df-6e8b-498a-8f2a-fda1a41c21ca", + "metadata": { + "collapsed": false, + "name": "cell12" + }, + "source": [ + "Using the cell referencing, we get the query ID and history of the query we just ran." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c49eb85b-6956-4da6-949f-1939c6a1dcc4", + "metadata": { + "codeCollapsed": false, + "collapsed": false, + "language": "python", + "name": "cell13" + }, + "outputs": [], + "source": [ + "# Get query ID of the referenced cell\n", + "query_id = cell11.result_scan_sql().split(\"'\")[1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dfd22f9f-44ef-4a3f-99e6-7c774b02eea7", + "metadata": { + "codeCollapsed": false, + "collapsed": false, + "language": "sql", + "name": "cell14" + }, + "outputs": [], + "source": [ + "select * from table(information_schema.query_history_by_warehouse('{{warehouse_name}}')) \n", + "where query_id = '{{query_id}}';" + ] + }, + { + "cell_type": "markdown", + "id": "ef4d7fcb-9729-4409-8bce-7a7081b98e87", + "metadata": { + "name": "cell15" + }, + "source": [ + "Finally, we compile all of this information into a report to document the run information." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b718981-9577-4996-b212-0cf7ffb4f23b", + "metadata": { + "codeCollapsed": false, + "collapsed": false, + "language": "python", + "name": "cell16" + }, + "outputs": [], + "source": [ + "import streamlit as st\n", + "from datetime import datetime\n", + "st.header(f\"[{MODE}] Run Report\")\n", + "st.markdown(f\"Generated on: {datetime.now()}\")\n", + "\n", + "st.markdown(f\"### System Information\")\n", + "# Print session information\n", + "from snowflake.snowpark.context import get_active_session\n", + "session = get_active_session()\n", + "# Add a query tag to the session. This helps with troubleshooting and performance monitoring.\n", + "session.query_tag = {\"origin\":\"sf_sit-is\", \n", + " \"name\":\"notebook_demo_pack\", \n", + " \"version\":{\"major\":1, \"minor\":0},\n", + " \"attributes\":{\"is_quickstart\":1, \"source\":\"notebook\", \"vignette\":\"working_with_git\"}}\n", + "st.markdown(f\"**Database:** {session.get_current_database()[1:-1]}\")\n", + "st.markdown(f\"**Schema:** {session.get_current_schema()[1:-1]}\")\n", + "st.markdown(f\"**Warehouse:** {session.get_current_warehouse()[1:-1]}\")\n", + "\n", + "st.markdown(f\"### Query Information\")\n", + "# Print session information\n", + "st.markdown(f\"**Query ID:** {query_id}\")\n", + "result_info = cell14.to_pandas()\n", + "st.markdown(\"**Query Text:**\")\n", + "st.code(result_info[\"QUERY_TEXT\"].values[0],language='sql',line_numbers=True)\n", + "st.markdown(\"**Runtime information:**\")\n", + "st.dataframe(result_info[['START_TIME','END_TIME','TOTAL_ELAPSED_TIME']])" + ] + } + ] +} \ No newline at end of file