Iceberg reuse information from table handle#14079

Merged
findepi merged 4 commits intotrinodb:masterfrom
findinpath:iceberg-reuse-information-from-table-handle
Sep 19, 2022

@findinpath
Contributor

Description

Instead of loading the table from the catalog, reuse as much as possible of the information already carried in the IcebergTableHandle.

Non-technical explanation

Internal optimizations

Release notes

(x) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Sep 9, 2022
@findinpath findinpath force-pushed the iceberg-reuse-information-from-table-handle branch from 11e63d1 to a31ce06 Compare September 9, 2022 20:10
@findepi
Member

findepi commented Sep 12, 2022

Iceberg tests are failing

@findinpath findinpath force-pushed the iceberg-reuse-information-from-table-handle branch from a31ce06 to b08b9da Compare September 12, 2022 11:10
@findinpath
Contributor Author

Related discussion:

#14076 (comment)

@findinpath findinpath force-pushed the iceberg-reuse-information-from-table-handle branch from 42d1bdd to ad2752e Compare September 14, 2022 11:51
  IcebergTableHandle table = (IcebergTableHandle) tableHandle;
- Table icebergTable = catalog.loadTable(session, table.getSchemaTableName());
- return getWriteLayout(icebergTable.schema(), icebergTable.spec(), false);
+ Schema schema = SchemaParser.fromJson(table.getTableSchemaJson());
Member

SchemaParser.fromJson isn't very expensive because it's cached.
If it were not cached, it could be expensive.

catalog.loadTable isn't very expensive because it's cached.
If it were not cached, we could have query consistency issues (for example, one query accessing the same table twice and reading different versions).

How do we assess which one is actually better?
It sounds like in either case we rely on some (hidden) caching taking place.
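
To make the "hidden caching" point concrete, here is a minimal sketch of a parser that memoizes by input string, so repeated parsing of the same serialized schema costs one real parse. Everything in it (CachedSchemaParserSketch, ParsedSchema, fromText, the comma-separated format) is a hypothetical stand-in for illustration and is not the Iceberg or Trino implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a schema parser with a memoizing cache.
// None of these names come from Iceberg or Trino.
final class CachedSchemaParserSketch {
    // Stand-in for a parsed schema: an ordered list of column names.
    static final class ParsedSchema {
        final List<String> columns;

        ParsedSchema(List<String> columns) {
            this.columns = columns;
        }
    }

    // Cache keyed by the serialized schema text.
    private static final Map<String, ParsedSchema> CACHE = new ConcurrentHashMap<>();

    // Counts how many times the expensive parse actually ran.
    static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    // Returns the cached result when the same text was parsed before.
    static ParsedSchema fromText(String schemaText) {
        return CACHE.computeIfAbsent(schemaText, CachedSchemaParserSketch::parse);
    }

    // The "expensive" step; here it only splits a comma-separated string.
    private static ParsedSchema parse(String schemaText) {
        PARSE_COUNT.incrementAndGet();
        return new ParsedSchema(List.of(schemaText.split(",")));
    }

    public static void main(String[] args) {
        ParsedSchema first = fromText("id,name");
        ParsedSchema second = fromText("id,name");
        System.out.println("same instance: " + (first == second));
        System.out.println("real parses: " + PARSE_COUNT.get());
    }
}
```

In this sketch the caller cannot tell from the call site that fromText is cheap, which is the sense in which the caching is hidden.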

Member

@alexjo2144 alexjo2144 Sep 19, 2022

Personally, I find SchemaParser.fromJson(tableHandle.getSchemaJson()) less hidden than the loadTable option. The fact that it's fast is hidden but the fact that it's the correct schema is obvious. To me the loadTable option needs an extra step to reason that it is correct.

Member

Okay, but don't we rely on caching catalog.loadTable anyway?
E.g. we want to read a consistent table version in the self-join query case.
So this is not only about schemas (which we can and do carry in the table handle), but also about other state.

I tend to agree that if we carry info in a table handle, this is "the version of information" to be used.
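
The consistency argument in this thread can be sketched as follows: a handle that captures the schema when the table is resolved keeps returning that version, while an uncached catalog lookup can observe a later commit. The types below (Catalog, TableHandle, resolve) are hypothetical stand-ins written for this illustration and do not mirror the actual Trino or Iceberg classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a handle pins the schema version seen at resolution time.
final class HandleSnapshotSketch {
    // Stand-in catalog whose tables can change between lookups (no caching).
    static final class Catalog {
        private final Map<String, String> schemaJsonByTable = new HashMap<>();

        void commit(String table, String schemaJson) {
            schemaJsonByTable.put(table, schemaJson);
        }

        String loadSchemaJson(String table) {
            return schemaJsonByTable.get(table);
        }
    }

    // Stand-in handle: immutable, carries the schema it was created with.
    static final class TableHandle {
        final String tableName;
        final String schemaJson;

        TableHandle(String tableName, String schemaJson) {
            this.tableName = tableName;
            this.schemaJson = schemaJson;
        }
    }

    // Resolves the table once and packs the schema into the handle.
    static TableHandle resolve(Catalog catalog, String table) {
        return new TableHandle(table, catalog.loadSchemaJson(table));
    }

    public static void main(String[] args) {
        Catalog catalog = new Catalog();
        catalog.commit("orders", "{\"v\":1}");
        TableHandle handle = resolve(catalog, "orders");

        // Another writer changes the table after the handle was created.
        catalog.commit("orders", "{\"v\":2}");

        System.out.println("from handle:  " + handle.schemaJson);
        System.out.println("from catalog: " + catalog.loadSchemaJson("orders"));
    }
}
```

Every later use of the handle sees the schema it was created with, so correctness does not depend on the catalog lookup being cached.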

@findepi findepi merged commit 6acfd82 into trinodb:master Sep 19, 2022
@github-actions github-actions bot added this to the 397 milestone Sep 19, 2022
@findinpath findinpath self-assigned this Sep 20, 2022