This repository was archived by the owner on Feb 6, 2024. It is now read-only.

samredai commented Jul 6, 2022

This is a rebased version of PR #75. Here is the PR comment from that PR:

This adds a Spark Quickstart page that effectively replaces the Spark Getting-Started page in the docs site.

This quickstart uses the `tabulario/spark-iceberg` Docker image, and all code snippets aim to be directly copy-pastable and to run successfully after a fresh `docker-compose up`. Furthermore, all code snippets are provided with tabs to see the equivalent logic in SQL, Scala, or Python.

@samredai samredai mentioned this pull request Jul 7, 2022
## Spark and Iceberg Quickstart

This guide will get you up and running with an Iceberg and Spark environment, including sample code to
highlight some powerful features. You can learn more about Iceberg's Spark runtime by checking out the [Spark](../docs/latest/spark-ddl/) section.

Is this link correct? It goes to spark-ddl.


Iceberg catalogs support the full range of SQL DDL commands, including:

* [`CREATE TABLE ... PARTITIONED BY`](../spark-ddl#create-table)

It would be nice to have a short explanation of what these are used for, like "CREATE TABLE ... PARTITIONED BY to index data for queries" and "ALTER TABLE to update table schemas or other config"



{{< hint info >}}
If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE demo;`

This says the catalog is called "demo" but it was introduced as "local" in the paragraph above. I think it should be "local" since the rest of the guide uses a pre-configured "demo" catalog.


#### Adding Iceberg to Spark

If you already have a Spark environment, you can add Iceberg, using the `--packages` option.

For a real deployment, you'd want to add the runtime Jar to your Spark install's jars folder. I think this should say that first (and link to the 3.2 Jar) and then add that you can add Iceberg to a single Spark session using the --packages option.

Also, this should point the reader to catalogs, since that's how you would add a catalog.

- [Adding A Catalog](#adding-a-catalog)
- [Next Steps](#next-steps)

### Docker-Compose

Can we put this whole thing in a tab? It would be great to have one option for Spark CLI (e.g., you already have Spark installed) and another tab for this.

The notebook server will be available at [http://localhost:8888](http://localhost:8888)
{{< /hint >}}

### Creating a table

I think this and the next few sections should be in a separate higher-level section for interacting with tables. There is a good chance that this is the content that people want to skip to.


### Writing Data to a Table

Once your table is created, you can insert records.

These sections are the core of the Quickstart guide, but they have hardly any content. I think this quickstart needs to focus on what you can do with Iceberg and how, and less on getting set up. Setup is necessary, but shouldn't be the focus.

rdblue merged commit 9fb9e35 into apache:main Jul 7, 2022

rdblue commented Jul 7, 2022

I think that we can improve a few things on this Quickstart, but the structure is basically there, so we can go ahead and make the majority of the changes and come back to improve it over time.
