Add Spark and Iceberg Quickstart and quickstarts shortcode #110

samredai · 2022-07-06T16:24:14Z

This is a rebased version of PR #75. Here is the description from that PR:

This adds a Spark Quickstart page that effectively replaces the Spark Getting-Started page in the docs site.

This quickstart uses the tabulario/spark-iceberg Docker image, and all code snippets aim to be directly copy-pastable and to run successfully after a fresh `docker-compose up`. Furthermore, all code snippets are provided with tabs to see the equivalent logic in SQL, Scala, or Python.

rdblue · 2022-07-07T22:34:46Z

landing-page/content/common/spark-quickstart.md

+## Spark and Iceberg Quickstart
+
+This guide will get you up and running with an Iceberg and Spark environment, including sample code to
+highlight some powerful features. You can learn more about Iceberg's Spark runtime by checking out the [Spark](../docs/latest/spark-ddl/) section.


Is this link correct? It goes to spark-ddl.

rdblue · 2022-07-07T22:40:06Z

landing-page/content/common/spark-quickstart.md

+
+Iceberg catalogs support the full range of SQL DDL commands, including:
+
+* [`CREATE TABLE ... PARTITIONED BY`](../spark-ddl#create-table)


It would be nice to have a short explanation of what these are used for, like "CREATE TABLE ... PARTITIONED BY to index data for queries" and "ALTER TABLE to update table schemas or other config"
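
As an illustration of the kind of one-line explanations suggested above, here is a minimal sketch that pairs each DDL command with its purpose. The table name `demo.nyc.taxis`, its columns, and the property value are assumptions chosen for the example and do not come from the PR; `run_all` is a hypothetical helper, not part of Iceberg or Spark.

```python
# Hypothetical table name and columns, chosen only for illustration.
TABLE = "demo.nyc.taxis"

# Each entry pairs a short purpose note with a Spark SQL statement.
DDL_EXAMPLES = [
    (
        "CREATE TABLE ... PARTITIONED BY: organize data so queries can skip files",
        f"CREATE TABLE {TABLE} (vendor_id bigint, trip_id bigint, fare double) "
        "USING iceberg PARTITIONED BY (vendor_id)",
    ),
    (
        "ALTER TABLE ... ADD COLUMN: update the table schema",
        f"ALTER TABLE {TABLE} ADD COLUMN tip double",
    ),
    (
        "ALTER TABLE ... SET TBLPROPERTIES: update other table config",
        f"ALTER TABLE {TABLE} SET TBLPROPERTIES "
        "('read.split.target-size'='268435456')",
    ),
]


def run_all(execute):
    """Pass each statement to `execute`, printing its purpose first."""
    for purpose, statement in DDL_EXAMPLES:
        print(f"-- {purpose}")
        execute(statement)


if __name__ == "__main__":
    # Without a Spark session, just print the statements.
    run_all(print)
```

In a live session, `run_all(spark.sql)` would execute the statements instead of printing them, assuming `spark` is a `SparkSession` configured with an Iceberg catalog named `demo`.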

rdblue · 2022-07-07T22:44:57Z

landing-page/content/common/spark-quickstart.md

+
+
+{{< hint info >}}
+If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE demo;`


This says the catalog is called "demo" but it was introduced as "local" in the paragraph above. I think it should be "local" since the rest of the guide uses a pre-configured "demo" catalog.

rdblue · 2022-07-07T22:47:41Z

landing-page/content/common/spark-quickstart.md

+
+#### Adding Iceberg to Spark
+
+If you already have a Spark environment, you can add Iceberg, using the `--packages` option.


For a real deployment, you'd want to add the runtime Jar to your Spark install's jars folder. I think this should say that first (and link to the 3.2 Jar) and then add that you can add Iceberg to a single Spark session using the --packages option.

Also, this should point the reader to catalogs, since that's how you would add a catalog.
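
To make the single-session option concrete, here is a rough sketch of how a `spark-sql` invocation with `--packages` and a catalog definition could be assembled. The helper `spark_sql_command`, the version `0.13.2`, and the `warehouse` path are assumptions for illustration; the catalog name `local` follows the review comments above, and the configuration keys are the ones Iceberg's Spark documentation uses for a Hadoop-type catalog.

```python
import shlex


def spark_sql_command(iceberg_version, catalog="local", warehouse="warehouse"):
    """Build a spark-sql argument list that adds Iceberg for one session.

    `iceberg_version` and `warehouse` are illustrative inputs; adjust the
    runtime artifact to match your Spark and Scala versions.
    """
    runtime = (
        "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:" + iceberg_version
    )
    prefix = f"spark.sql.catalog.{catalog}"
    return [
        "spark-sql",
        # Downloads the runtime for this session only.
        "--packages", runtime,
        "--conf",
        "spark.sql.extensions="
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers an Iceberg catalog under the chosen name.
        "--conf", f"{prefix}=org.apache.iceberg.spark.SparkCatalog",
        "--conf", f"{prefix}.type=hadoop",
        "--conf", f"{prefix}.warehouse={warehouse}",
    ]


if __name__ == "__main__":
    # Print a shell-safe command line; the version here is an assumption.
    print(shlex.join(spark_sql_command("0.13.2")))
```

For a real deployment, as the comment notes, the runtime JAR would instead live in the Spark install's `jars` folder, and only the `--conf` catalog settings would remain.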

rdblue · 2022-07-07T22:49:16Z

landing-page/content/common/spark-quickstart.md

+- [Adding A Catalog](#adding-a-catalog)
+- [Next Steps](#next-steps)
+
+### Docker-Compose


Can we put this whole thing in a tab? It would be great to have one option for Spark CLI (e.g., you already have Spark installed) and another tab for this.

rdblue · 2022-07-07T22:49:55Z

landing-page/content/common/spark-quickstart.md

+The notebook server will be available at [http://localhost:8888](http://localhost:8888)
+{{< /hint >}}
+
+### Creating a table


I think this and the next few sections should be in a separate higher-level section for interacting with tables. There is a good chance that this is the content that people want to skip to.

rdblue · 2022-07-07T22:51:32Z

landing-page/content/common/spark-quickstart.md

+
+### Writing Data to a Table
+
+Once your table is created, you can insert records.


These sections are the core of the Quickstart guide, but they have hardly any content. I think this quickstart needs to focus on what you can do with Iceberg and how, and less on getting set up. Setup is necessary, but shouldn't be the focus.

rdblue · 2022-07-07T22:52:36Z

I think that we can improve a few things on this Quickstart, but the structure is basically there, so we can go ahead and make the majority of the changes and come back to improve it over time.

samredai added 2 commits July 5, 2022 10:39

Add Spark and Iceberg Quickstart and quickstarts shortcode

811eeb4

Add coordinate codetab switching and README for iceberg-theme

d245741

samredai mentioned this pull request Jul 7, 2022

Spark quickstart page #75

Closed

rdblue reviewed Jul 7, 2022

rdblue merged commit 9fb9e35 into apache:main Jul 7, 2022

