Skip to content

Remove SerializedPage#10241

Merged
martint merged 1 commit intotrinodb:masterfrom
arhimondr:remove-serialized-page
Jan 4, 2022
Merged

Remove SerializedPage#10241
martint merged 1 commit intotrinodb:masterfrom
arhimondr:remove-serialized-page

Conversation

@arhimondr
Copy link
Copy Markdown
Contributor

SerializedPage is a weird abstraction. Although it is already serialized
it must be serialized itself to be transferred over the wire.

In the pipelined execution the additional serialization is happening behind
the scenes when the SerializedPage is being written to an OutputStream
directly (field by field). Thus it technically happens at no extra cost.

However with fault tolerant execution serialized pages have to be handed
over to the external exchange as a Slice (as the external exchange
interface is agnostic of the data type being exchange). To create a
Slice a second explicit round of serialization is needed that has a cost
of one extra memory copy.

Removing the SerializedPage abstraction doesn't really have any
significant downsides, as the exchange pipeline is already pretty
agnostioc of what is being exchanged. The only situation when the exchange
pipeline has to understand the content is when it logs the number of rows
exchanged. This statistic is somehow questionable, as the number of rows
exchanged can be visible at the ExchangeOperator / TaskOutputOperator
level.

@cla-bot cla-bot bot added the cla-signed label Dec 9, 2021
@arhimondr arhimondr force-pushed the remove-serialized-page branch from 7d4ed36 to 3d28d82 Compare December 9, 2021 15:46
@findepi
Copy link
Copy Markdown
Member

findepi commented Dec 10, 2021

Removing the SerializedPage abstraction doesn't really have any
significant downsides,

Can you elaborate more how meta-information from SerializedPage is going to be handled now. like pageCodecMarkers?
I think this can be the reason why we have SerializedPage abstraction.

@findepi
Copy link
Copy Markdown
Member

findepi commented Dec 10, 2021

cc @losipiuk

@arhimondr
Copy link
Copy Markdown
Contributor Author

@findepi

Can you elaborate more how meta-information from SerializedPage is going to be handled now. like pageCodecMarkers?
I think this can be the reason why we have SerializedPage abstraction.

Instead of storing this information in a wrapper class (SerializePage) this information can be stored directly in the Slice that contains the serialized page itself.

@arhimondr arhimondr force-pushed the remove-serialized-page branch from 3d28d82 to 9742e6b Compare December 22, 2021 20:23
@arhimondr arhimondr changed the title [WIP] Remove SerializedPage Remove SerializedPage Dec 22, 2021
@arhimondr arhimondr requested a review from linzebing December 22, 2021 20:23
@arhimondr
Copy link
Copy Markdown
Contributor Author

Ready for review

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the variable name. It is understood that they are serialized pages since they are represented as Slice

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I would vote for renaming (but I do not feel strongly). With more percise variable name it is nicer to read to me and does not require to look at variable type.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find reading code with longer and verbose variable names harder (especially when the type name is repeated in the variable name — Hungarian notation?), so I try to avoid them unless it’s not immediately obvious from context what they are, or if the meaning is ambiguous due to other conflicting or similar names

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see your point. Since the Slice type represents a "blob" it already suggests that this is rather "something" that is serialized. And to clarify what is "something" naming variables as "page / pages" should be sufficient. Let me change the naming back then.

Copy link
Copy Markdown
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

SerializedPage is a weird abstraction. Although it is already serialized
it must be serialized itself to be transferred over the wire.

In the pipelined execution the additional serialization is happening behind
the scenes when the SerializedPage is being written to an OutputStream
directly (field by field). Thus it technically happens at no extra cost.

However with fault tolerant execution serialized pages have to be handed
over to the external exchange as a Slice (as the external exchange
interface is agnostic of the data type being exchange). To create a
Slice a second explicit round of serialization is needed that has a cost
of one extra memory copy.

Removing the SerializedPage abstraction doesn't really have any
significant downsides, as the exchange pipeline is already pretty
agnostioc of what is being exchanged. The only situation when the exchange
pipeline has to understand the content is when it logs the number of rows
exchanged. This statistic is somehow questionable, as the number of rows
exchanged can be visible at the ExchangeOperator / TaskOutputOperator
level.
@arhimondr arhimondr force-pushed the remove-serialized-page branch from 9742e6b to 0c7e000 Compare January 3, 2022 17:36
@arhimondr
Copy link
Copy Markdown
Contributor Author

arhimondr commented Jan 3, 2022

Updated (revert changes to variable names)

@martint martint merged commit fb073f9 into trinodb:master Jan 4, 2022
@github-actions github-actions bot added this to the 368 milestone Jan 4, 2022
@arhimondr arhimondr deleted the remove-serialized-page branch January 4, 2022 19:55
@arhimondr arhimondr mentioned this pull request Jan 4, 2022
31 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants