This repository was archived by the owner on Feb 6, 2024. It is now read-only.

Conversation

@samredai
Contributor

@samredai samredai commented Apr 25, 2022

This adds a Spark Quickstart page that effectively replaces the Spark Getting-Started page in the docs site.

This quickstart uses the tabulario/spark-iceberg docker image, and all code snippets aim to be directly copy-pasteable and to run successfully in a fresh `docker-compose up`. Furthermore, all code snippets are provided with tabs showing the equivalent logic in SQL, Scala, or Python.

This is dependent on PR #73 and is part of the broader initiative outlined in this issue in the iceberg repo.

Note: I first added the current getting-started page as-is in one commit so that the diff is visible in the next commit, 7d643ba.

UPDATED

### Creating a table

To create your first Iceberg table in Spark, run a [`CREATE TABLE`](../spark-ddl#create-table) command. In the following example, we'll create a table named `prod.nyc.taxis`, where `prod` is the catalog name, `nyc` is the schema name, and `taxis` is the table name.
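A minimal sketch of what that command might look like in Spark SQL; the column list here is purely illustrative and not part of the page:

```sql
-- Hypothetical columns for illustration only.
CREATE TABLE prod.nyc.taxis (
  vendor_id bigint,
  trip_id bigint,
  trip_distance float,
  fare_amount double,
  store_and_fwd_flag string
) USING iceberg;
```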
Contributor

Should we use `test` or a different catalog name? This isn't a prod catalog, so it seems odd to call it `prod`.

Contributor Author

Thanks for noticing this. I've changed the catalog name everywhere to `demo`, which is what we use in the docker image and seems to be sticking as the catalog name we use in examples elsewhere.

@samredai
Contributor Author

Thanks @rdblue! Sorry for taking a while to address the comments here. I have some more structural theme changes that I'm getting in now, and then this should be fully ready for review and merge.

@samredai samredai marked this pull request as ready for review May 17, 2022 22:37
@samredai
Contributor Author

This is ready for review. I've added only the Quickstarts menu item in the top navbar. The Concepts section is hidden until we have some content to fill that out.

quickstart.mp4

@samredai
Contributor Author

Rebased and squashed all commits! (now that PR #73 has been merged)

{{% codetabs "AddIcebergToSpark" %}}
{{% addtab "SparkSQL" checked %}}
{{% addtab "SparkShell" %}}
{{% addtab "PySpark" %}}
Contributor

Would it be a good idea to use tabs for Spark version instead?

Contributor Author

Could this potentially be confusing for someone trying out Iceberg for the first time? The quickstart begins with a docker image, which we'd have to keep in sync with the examples here. If we also included examples for other Spark versions, someone trying this out would have to worry about which version of Spark is in the example image to make sure they're using the correct code snippets.

@samredai
Contributor Author

This is ready for another review. I've rebased this to use the new iceberg-theme and also included a "quickstart" shortcode that renders a "More Quickstarts" dropdown menu at the top of each quickstart guide. The shortcode also excludes the current quickstart page you're on from the dropdown, which means that until we have a second quickstart, the dropdown will be empty. Here's a video showing what this looks like; I also added a handful of entries to the quickstart menu to show what the dropdown looks like.

quickstart-cards.mp4

The fastest way to get started is to use a docker-compose file that uses the [tabulario/spark-iceberg](https://hub.docker.com/r/tabulario/spark-iceberg) image,
which contains a local Spark cluster with a configured Iceberg catalog. To use this, you'll need to install the [Docker CLI](https://docs.docker.com/get-docker/) as well as the [Docker Compose CLI](https://github.com/docker/compose-cli/blob/main/INSTALL.md).

Once you have those, save the yaml below into a file named `docker-compose.yml`:
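A minimal sketch of the shape such a file can take; the `spark-iceberg` container name matches the `docker exec` commands used later in the guide, while the port mapping is an assumption rather than the actual file shipped with the image:

```yaml
version: "3"

services:
  spark-iceberg:
    image: tabulario/spark-iceberg
    container_name: spark-iceberg
    ports:
      - 8888:8888   # assumed notebook port; the real compose file may define more services and ports
```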
Contributor

Later, we should consider adding a Quickstart folder in the Iceberg repo. We could have a tree like this:

  quickstart/
  |- README.md
  |- spark/
  |  |- README.md
  |  |- quickstart.ipynb
  |  `- docker-compose.yml
  `- flink/
     |- README.md
     |- quickstart.ipynb
     `- docker-compose.yml

- [Adding A Catalog](#adding-a-catalog)
- [Next Steps](#next-steps)

### Docker-Compose
Contributor

Later, we should consider keeping the Spark shell instructions, in case anyone already has Spark.
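A hedged sketch of what such Spark shell instructions could look like, along the lines of the existing getting-started page; the runtime version coordinates and the `local` catalog name are illustrative only:

```sh
# Illustrative coordinates: pick the iceberg-spark-runtime artifact matching your Spark/Scala version.
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.2 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.type=hadoop \
  --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
```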

- "quickstarts"
- "getting-started"
disableSidebar: true
disableToc: true
Contributor

Why disabled?

Contributor Author

Since we include a TOC at the top of the quickstart, it felt odd seeing both. Would you rather have the fixed TOC on the right and remove it from the intro section?

Contributor

I'd probably include it eventually, but that would happen when there's more content.

{{% /tabcontent %}}
{{% /codetabs %}}
{{< hint info >}}
You can also launch a notebook server by running `docker exec -it spark-iceberg notebook`.
Contributor

How about "If you prefer a PySpark notebook, you can start a notebook server by ..."? That way it is clear that the notebooks will be PySpark.

Contributor Author

The notebooks can actually be Python or Java (both kernels are available), although I know it's less likely that Java devs would want to use the Java kernel to run a Spark app. I've been meaning to add a Scala kernel as well.

Contributor Author

You can then run any of the following commands to start a Spark session.

{{% codetabs "LaunchSparkClient" %}}
{{% addtab "SparkSQL" checked %}}
Contributor

Is there a way to change all of the tabs at once, depending on what is checked for any of them? That would be really helpful.

Contributor Author

I was able to do this with a small amount of JavaScript and by introducing a concept of "groups" to the shortcodes behind this. It's described in the README that I add in PR #110, and the logic is added to iceberg-theme.js.

{{% /tabcontent %}}
{{% /codetabs %}}

### Adding Iceberg to Spark
Contributor

This doesn't fit with the flow of the quickstart page. It goes directly from an example of reading to a shell command to start Spark.

I think it would help to have more content explaining what is happening and giving context. This should also have an outline that is more clear, with higher-level sections like "Interacting with tables" where creating, reading, and writing sections will go.

I think this content should be in "Next steps" because it covers how to get Iceberg installed outside of the quickstart. Maybe that should be a Spark page and not a quickstart page, so Next steps just links to it?

Contributor Author

Agreed, the section definitely breaks the flow of the guide. I moved it to the end and now the order looks like:

  • Docker-Compose
  • Creating a table
  • Writing Data to a Table
  • Reading Data from a Table
  • Adding A Catalog
  • Next Steps

This pull request was closed.
