Add Spark and Iceberg Quickstart and quickstarts shortcode #110
| ## Spark and Iceberg Quickstart | ||
|
|
||
| This guide will get you up and running with an Iceberg and Spark environment, including sample code to | ||
| highlight some powerful features. You can learn more about Iceberg's Spark runtime by checking out the [Spark](../docs/latest/spark-ddl/) section. |
Is this link correct? It goes to spark-ddl.
|
|
||
| Iceberg catalogs support the full range of SQL DDL commands, including: | ||
|
|
||
| * [`CREATE TABLE ... PARTITIONED BY`](../spark-ddl#create-table) |
It would be nice to have a short explanation of what these are used for, like "`CREATE TABLE ... PARTITIONED BY` to index data for queries" and "`ALTER TABLE` to update table schemas or other config".
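As an illustration of the suggestion above, here is a minimal sketch that pairs each DDL command with a one-line purpose and a sample statement. The table `local.db.events`, its columns, and the helpers `describe_ddl` and `run_ddl` are hypothetical and not part of the PR; the statements follow Iceberg's Spark DDL syntax but have not been run against the quickstart environment.

```python
# Hypothetical examples: the table name and columns are illustrative only.
DDL_EXAMPLES = [
    (
        "CREATE TABLE ... PARTITIONED BY",
        "index data for queries",
        "CREATE TABLE local.db.events (id BIGINT, ts TIMESTAMP, category STRING) "
        "USING iceberg PARTITIONED BY (days(ts), category)",
    ),
    (
        "ALTER TABLE",
        "update table schemas or other config",
        "ALTER TABLE local.db.events ADD COLUMNS (source STRING)",
    ),
]


def describe_ddl(examples=DDL_EXAMPLES):
    """Render one markdown bullet per command, with its purpose."""
    return "\n".join(
        f"* `{command}` to {purpose}" for command, purpose, _sql in examples
    )


def run_ddl(spark, examples=DDL_EXAMPLES):
    """Execute each statement on an existing SparkSession with Iceberg configured."""
    for _command, _purpose, sql in examples:
        spark.sql(sql)


print(describe_ddl())
```

`run_ddl` assumes the caller already has a session whose `local` catalog points at Iceberg; `describe_ddl` only formats the bullets and needs no Spark install.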
|
|
||
|
|
||
| {{< hint info >}} | ||
| If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE demo;` |
This says the catalog is called "demo" but it was introduced as "local" in the paragraph above. I think it should be "local" since the rest of the guide uses a pre-configured "demo" catalog.
|
|
||
| #### Adding Iceberg to Spark | ||
|
|
||
| If you already have a Spark environment, you can add Iceberg, using the `--packages` option. |
For a real deployment, you'd want to add the runtime JAR to your Spark install's `jars` folder. I think this should say that first (and link to the 3.2 JAR) and then add that you can add Iceberg to a single Spark session using the `--packages` option.
Also, this should point the reader to the catalogs documentation, since that's how you would add a catalog.
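To make the single-session option concrete, here is a rough sketch that assembles a `spark-sql` invocation with `--packages` and the settings for a Hadoop-type catalog. The package version, the catalog name `local`, the warehouse path, and the helper names are assumptions for illustration; the configuration keys are the ones Iceberg documents for `SparkCatalog`.

```python
import shlex

# Assumed coordinates for the Spark 3.2 / Scala 2.12 runtime; adjust the version as needed.
ICEBERG_PACKAGE = "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1"


def catalog_conf(name, warehouse):
    """Spark properties that register an Iceberg catalog backed by a Hadoop warehouse."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


def spark_sql_command(name, warehouse, package=ICEBERG_PACKAGE):
    """Build the argument list for a one-off spark-sql session with Iceberg attached."""
    args = ["spark-sql", "--packages", package]
    for key, value in catalog_conf(name, warehouse).items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical catalog name and warehouse path.
cmd = spark_sql_command("local", "/tmp/warehouse")
print(shlex.join(cmd))
```

For a permanent install, the same properties from `catalog_conf` could go into `spark-defaults.conf` alongside the runtime JAR in the `jars` folder, which is the ordering the comment above recommends.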
| - [Adding A Catalog](#adding-a-catalog) | ||
| - [Next Steps](#next-steps) | ||
|
|
||
| ### Docker-Compose |
Can we put this whole thing in a tab? It would be great to have one option for Spark CLI (e.g., you already have Spark installed) and another tab for this.
| The notebook server will be available at [http://localhost:8888](http://localhost:8888) | ||
| {{< /hint >}} | ||
|
|
||
| ### Creating a table |
I think this and the next few sections should be in a separate higher-level section for interacting with tables. There is a good chance that this is the content that people want to skip to.
|
|
||
| ### Writing Data to a Table | ||
|
|
||
| Once your table is created, you can insert records. |
These sections are the core of the Quickstart guide, but they have hardly any content. I think this quickstart needs to focus on what you can do with Iceberg and how, and less on getting set up. Setup is necessary, but shouldn't be the focus.
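As one possible shape for the fuller content requested here, this sketch shows a create, insert, and query sequence driven from PySpark. The table `demo.db.events`, its columns, the sample rows, and the `run_statements` helper are hypothetical; it assumes a `SparkSession` whose `demo` catalog is already configured for Iceberg, as in the guide's pre-configured environment.

```python
# Hypothetical table in the pre-configured "demo" catalog.
TABLE = "demo.db.events"

STATEMENTS = [
    # Create the table if it is not there yet.
    f"CREATE TABLE IF NOT EXISTS {TABLE} (id BIGINT, category STRING) USING iceberg",
    # Insert a couple of sample records.
    f"INSERT INTO {TABLE} VALUES (1, 'a'), (2, 'b')",
    # Read the data back with a simple aggregation.
    f"SELECT category, count(*) AS n FROM {TABLE} GROUP BY category",
]


def run_statements(spark, statements=STATEMENTS):
    """Run each statement in order and return the result of the last one."""
    result = None
    for sql in statements:
        result = spark.sql(sql)
    return result
```

In a notebook, `run_statements(spark).show()` would print the grouped counts, assuming `spark` is the session provided by the environment.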
|
I think we can improve a few things in this Quickstart, but the structure is basically there, so we can go ahead, make the majority of the changes now, and come back to improve it over time.
This is a rebased version of PR #75. Here is the description from that PR: