Spark quickstart page #75
### Creating a table

To create your first Iceberg table in Spark, run a [`CREATE TABLE`](../spark-ddl#create-table) command. In the following example, we'll create a table
named `prod.nyc.taxis`, where `prod` is the catalog name, `nyc` is the schema name, and `taxis` is the table name.
Should we use test or a different catalog name? This isn't a prod catalog, so it seems odd to call it prod.
Thanks for noticing this. I've changed the catalog name everywhere to `demo`, which is what we use in the Docker image and seems to be becoming the standard catalog name in examples elsewhere.
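To make the naming concrete with the renamed catalog, here is a minimal Python sketch (not part of the PR) that splits `demo.nyc.taxis` into catalog, schema, and table, and builds a matching `CREATE TABLE` statement. The helpers `split_identifier` and `create_table_sql` are hypothetical, the columns are an illustrative assumption rather than the quickstart's actual schema, and backtick-quoted identifiers are not handled.

```python
from typing import Dict, Tuple


def split_identifier(name: str) -> Tuple[str, str, str]:
    """Split an unquoted `catalog.schema.table` identifier into its three parts."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    catalog, schema, table = parts
    return catalog, schema, table


def create_table_sql(name: str, columns: Dict[str, str]) -> str:
    """Build a CREATE TABLE statement for an Iceberg table (hypothetical helper)."""
    split_identifier(name)  # reject names that are not three-part identifiers
    column_defs = ",\n  ".join(f"{col} {col_type}" for col, col_type in columns.items())
    return f"CREATE TABLE {name} (\n  {column_defs}\n) USING iceberg"


if __name__ == "__main__":
    print(split_identifier("demo.nyc.taxis"))
    # Illustrative columns only; in a PySpark session the resulting string
    # could be passed to spark.sql(...).
    print(create_table_sql("demo.nyc.taxis", {"vendor_id": "bigint", "trip_id": "bigint"}))
```

The `USING iceberg` clause follows the form used in the Iceberg Spark DDL docs; with an Iceberg catalog such as `demo`, it is accepted but not strictly required.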
Thanks @rdblue! Sorry for taking a while to address the comments here. I have some more structural theme changes that I'm getting in now; after that, this should be fully ready for review and merge.
This is ready for review. I've added only the Quickstarts menu item in the top navbar. The Concepts section is hidden until we have some content to fill it out.
quickstart.mp4
Rebased and squashed all commits (now that PR #73 has been merged)!
{{% codetabs "AddIcebergToSpark" %}}
{{% addtab "SparkSQL" checked %}}
{{% addtab "SparkShell" %}}
{{% addtab "PySpark" %}}
Would it be a good idea to use tabs for Spark version instead?
Could this be confusing for someone trying out Iceberg for the first time? The quickstart begins with a Docker image, which we'd have to keep in sync with the examples here. If we also include examples for other Spark versions, a newcomer would have to check which Spark version the example image includes to make sure they're using the correct code snippets.
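The version coupling discussed here also shows up outside the Docker image, because the Iceberg runtime artifact name encodes the Spark and Scala versions. As a rough sketch, the Python below assembles `spark-sql` arguments that load the Iceberg runtime and register a Hadoop-type catalog named `demo`. The helper names, the warehouse path, and the `VERSION` placeholder in the package coordinate are assumptions; the artifact that matches a given Spark build should be taken from the Iceberg docs.

```python
import shlex
from typing import Dict, List

ICEBERG_EXTENSIONS = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
SPARK_CATALOG = "org.apache.iceberg.spark.SparkCatalog"


def iceberg_catalog_conf(catalog: str, warehouse: str) -> Dict[str, str]:
    """Spark properties that register a Hadoop-type Iceberg catalog (sketch)."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": ICEBERG_EXTENSIONS,
        prefix: SPARK_CATALOG,
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


def spark_sql_args(runtime_package: str, conf: Dict[str, str]) -> List[str]:
    """Build a spark-sql argv that pulls the Iceberg runtime and applies `conf`."""
    args = ["spark-sql", "--packages", runtime_package]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder coordinate: substitute the runtime built for your Spark/Scala versions.
    package = "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:VERSION"
    conf = iceberg_catalog_conf("demo", "/tmp/warehouse")
    print(shlex.join(spark_sql_args(package, conf)))
```

Keeping the catalog properties in one dictionary means the same settings could be reused for `spark-shell` or `pyspark`, which accept the same `--packages` and `--conf` options.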
This is ready for another review. I've rebased this to use the new quickstart-cards.mp4
The fastest way to get started is to use a docker-compose file that uses the [tabulario/spark-iceberg](https://hub.docker.com/r/tabulario/spark-iceberg) image,
which contains a local Spark cluster with a configured Iceberg catalog. To use this, you'll need to install the [Docker CLI](https://docs.docker.com/get-docker/) as well as the [Docker Compose CLI](https://github.com/docker/compose-cli/blob/main/INSTALL.md).

Once you have those, save the yaml below into a file named `docker-compose.yml`:
Later, we should consider adding a Quickstart folder in the Iceberg repo. We could have a tree like this:
quickstart/
|- README.md
|- spark/
| |- README.md
| |- quickstart.ipynb
| `- docker-compose.yml
`- flink/
   |- README.md
   |- quickstart.ipynb
   `- docker-compose.yml
- [Adding A Catalog](#adding-a-catalog)
- [Next Steps](#next-steps)

### Docker-Compose
Later, we should consider keeping the Spark shell instructions, in case anyone already has Spark.
- "quickstarts"
- "getting-started"
disableSidebar: true
disableToc: true
Why disabled?
Since we include a TOC at the top of the quickstart, it felt odd seeing both. Would you rather have the fixed TOC on the right and remove it from the intro section?
I'd probably include it eventually, but that would happen when there's more content.
{{% /tabcontent %}}
{{% /codetabs %}}
{{< hint info >}}
You can also launch a notebook server by running `docker exec -it spark-iceberg notebook`.
How about "If you prefer a PySpark notebook, you can start a notebook server by ..."? That way it is clear that the notebooks will be PySpark.
The notebooks can actually be Python or Java (both kernels are available), although I know Java developers are less likely to want the Java kernel for running a Spark app. I've been meaning to add a Scala kernel as well.
You can then run any of the following commands to start a Spark session.

{{% codetabs "LaunchSparkClient" %}}
{{% addtab "SparkSQL" checked %}}
Is there a way to change all of the tabs at once, depending on what is checked for any of them? That would be really helpful.
{{% /tabcontent %}}
{{% /codetabs %}}
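For reference, here is a minimal Python sketch of how each launch tab could map to a command. It assumes the container is named `spark-iceberg` (as in the notebook example) and that the image puts Spark's standard `spark-sql`, `spark-shell`, and `pyspark` launch scripts on its `PATH`. `CLIENTS` and `launch_command` are hypothetical names used only for illustration.

```python
import shlex
from typing import List

# Tab label -> Spark launch script (assumed to be on the image's PATH).
CLIENTS = {
    "SparkSQL": "spark-sql",
    "SparkShell": "spark-shell",
    "PySpark": "pyspark",
}


def launch_command(tab: str, container: str = "spark-iceberg") -> List[str]:
    """Return the argv that starts the Spark client matching a code tab."""
    try:
        client = CLIENTS[tab]
    except KeyError:
        raise ValueError(f"unknown tab {tab!r}; expected one of {sorted(CLIENTS)}") from None
    # -it keeps an interactive terminal attached to the client's REPL.
    return ["docker", "exec", "-it", container, client]


if __name__ == "__main__":
    for tab in CLIENTS:
        print(shlex.join(launch_command(tab)))
```

Running the module prints one `docker exec -it spark-iceberg ...` line per tab, so the same lookup table could also drive which tab content is shown.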

### Adding Iceberg to Spark
This doesn't fit with the flow of the quickstart page. It goes directly from an example of reading to a shell command to start Spark.
I think it would help to have more content explaining what is happening and giving context. This should also have a clearer outline, with higher-level sections like "Interacting with tables" where the creating, reading, and writing sections will go.
I think this content should be in "Next steps" because it covers how to get Iceberg installed outside of the quickstart. Maybe that should be a Spark page and not a quickstart page, so Next steps just links to it?
Agreed, the section definitely breaks the flow of the guide. I moved it to the end and now the order looks like:
- Docker-Compose
- Creating a table
- Writing Data to a Table
- Reading Data from a Table
- Adding A Catalog
- Next Steps
This adds a Spark Quickstart page that effectively replaces the Spark Getting-Started page in the docs site.
This quickstart uses the tabulario/spark-iceberg docker image, and all code snippets aim to be directly copy-pastable and to run successfully after a fresh `docker-compose up`. Furthermore, all code snippets are provided with tabs showing the equivalent logic in SQL, Scala, or Python. This is dependent on PR #73 and is part of the broader initiative outlined in this issue in the iceberg repo.
Note: I first added the current getting-started page as-is in its own commit so that the diff is visible in the next commit, 7d643ba.
UPDATED