Skip to content

Docs: Clarify merge-on-read modes in docs for spark#4223

Closed
ajantha-bhat wants to merge 1 commit intoapache:masterfrom
ajantha-bhat:mkdocs
Closed

Docs: Clarify merge-on-read modes in docs for spark#4223
ajantha-bhat wants to merge 1 commit intoapache:masterfrom
ajantha-bhat:mkdocs

Conversation

@ajantha-bhat
Copy link
Copy Markdown
Member

Currently iceberg-spark documentation doesn't talk about merge-on-read mode. Hence the PR.

@github-actions github-actions bot added the docs label Feb 25, 2022
@ajantha-bhat ajantha-bhat changed the title Clarify merge-on-read modes in docs for spark Docs: Clarify merge-on-read modes in docs for spark Feb 25, 2022
!!! Note
By default Spark uses copy-on-write merge mode.
With spark-3.2 and onwards, iceberg supports merge-on-read mode.
To use merge-on-read merge mode, need to set the table property `write.merge.mode` to "merge-on-read".
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add that format-version needs to be v2?

Also nit: I think 'need to' is redundant, removing it seems more consistent with the rest of docs


Only one record in the source data can update any given row of the target table, or else an error will be thrown.

!!! Note
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the three !!! renders anything? (I dont see it when viewing the file as md). Should we just use a normal sub-header here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes this to render in a text box. I would prefer having a full section on the copy-on-write vs merge-on-read distinction and then refer to it from the individual command sections.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a PR for that already,
#3432


!!! Note
By default Spark uses copy-on-write merge mode.
With spark-3.2 and onwards, iceberg supports merge-on-read mode.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what everyone thinks, but seems there's not much context. Should we explain what it means (that it writes v2 delete files)?

@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Feb 25, 2022

@samredai FYI

Copy link
Copy Markdown
Contributor

@samredai samredai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I had one grammar nit

!!! Note
By default Spark uses copy-on-write merge mode.
With spark-3.2 and onwards, iceberg supports merge-on-read mode.
To use merge-on-read merge mode, need to set the table property `write.merge.mode` to "merge-on-read".
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar nit: "need to set the table property" can just say "set the table property". Same comment for the other note boxes.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@samredai
Copy link
Copy Markdown
Contributor

samredai commented Feb 28, 2022

These properties like write.merge.mode also need to be added to the table configuration page right?

Nvm, it's already added but just not deployed yet!

@ajantha-bhat
Copy link
Copy Markdown
Member Author

@samredai , @rdblue , @szehon-ho : This is commonly asked question in the Iceberg slack workspace. Today also someone asking this.
So, I feel we should merge this and update the site docs and provide more context if required after #3432 is merged.

@samredai
Copy link
Copy Markdown
Contributor

I feel we should merge this and update the site docs and provide more context if required after #3432 is merged

+1

Copy link
Copy Markdown
Member

@liuml07 liuml07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks clear and simple. Thanks


!!! Note
By default Spark uses copy-on-write update mode.
With spark-3.2 and onwards, iceberg supports merge-on-read mode.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/iceberg/Iceberg/

s/merge-on-read mode/merge-on-read update mode/

Same other places.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. Thanks

@ajantha-bhat
Copy link
Copy Markdown
Member Author

closing as stale. Might take another shot at it in the future based on the need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants