Force-pushed from 7d4ed36 to 3d28d82.
Can you elaborate more on how meta-information from
cc @losipiuk
Instead of storing this information in a wrapper class (
Force-pushed from 3d28d82 to 9742e6b.
Ready for review
No need to change the variable name. It is understood that they are serialized pages since they are represented as Slice
Actually, I would vote for renaming (but I do not feel strongly). With a more precise variable name the code is nicer to read, and I do not have to look at the variable type.
I find reading code with longer and verbose variable names harder (especially when the type name is repeated in the variable name — Hungarian notation?), so I try to avoid them unless it’s not immediately obvious from context what they are, or if the meaning is ambiguous due to other conflicting or similar names
Oh, I see your point. Since the Slice type represents a "blob", it already suggests that this is "something" that is serialized. To clarify what that "something" is, naming the variables "page" / "pages" should be sufficient. Let me change the naming back then.
core/trino-main/src/test/java/io/trino/operator/TestExchangeClient.java
core/trino-main/src/main/java/io/trino/execution/buffer/PageCodecMarker.java
SerializedPage is a weird abstraction. Although it is already serialized, it must itself be serialized to be transferred over the wire.

In pipelined execution, the additional serialization happens behind the scenes when the SerializedPage is written to an OutputStream directly (field by field), so it technically comes at no extra cost. With fault-tolerant execution, however, serialized pages have to be handed over to the external exchange as a Slice (as the external exchange interface is agnostic of the data type being exchanged). Creating a Slice requires a second, explicit round of serialization that costs one extra memory copy.

Removing the SerializedPage abstraction doesn't have any significant downsides, as the exchange pipeline is already largely agnostic of what is being exchanged. The only situation in which the exchange pipeline has to understand the content is when it logs the number of rows exchanged. This statistic is somewhat questionable, as the number of rows exchanged can be seen at the ExchangeOperator / TaskOutputOperator level.
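
To make the extra copy concrete, here is a rough sketch in plain Java. It is not the code in this PR: `WrappedPage`, `toBlob`, `readPositionCount` and the 8-byte header layout are all hypothetical names and assumptions chosen for illustration, and `byte[]` stands in for `Slice`. It shows a wrapper-style page being flattened into a single blob (the second serialization round), and how a row count could still be read from the blob's header without deserializing the payload.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not Trino's actual classes or wire format.
class FlatPageSketch {
    // Assumed header layout: 4-byte position (row) count, then 4-byte payload length.
    static final int HEADER_SIZE = Integer.BYTES * 2;

    // Wrapper-style page: metadata is kept in fields next to the serialized payload.
    static final class WrappedPage {
        final int positionCount;
        final byte[] payload;

        WrappedPage(int positionCount, byte[] payload) {
            this.positionCount = positionCount;
            this.payload = payload;
        }
    }

    // Handing a WrappedPage to a byte-oriented exchange needs a second
    // serialization step that copies the payload into a new buffer.
    static byte[] toBlob(WrappedPage page) {
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_SIZE + page.payload.length);
        buffer.putInt(page.positionCount);
        buffer.putInt(page.payload.length);
        buffer.put(page.payload); // the extra memory copy
        return buffer.array();
    }

    // With the header stored inside the blob, the row count can be read
    // at a fixed offset without touching the payload.
    static int readPositionCount(byte[] blob) {
        return ByteBuffer.wrap(blob).getInt(0);
    }

    public static void main(String[] args) {
        WrappedPage page = new WrappedPage(3, new byte[] {1, 2, 3, 4, 5});
        byte[] blob = toBlob(page);
        System.out.println("rows = " + readPositionCount(blob));
        System.out.println("blob size = " + blob.length);
    }
}
```

If the serializer wrote the header and payload into one buffer from the start, `toBlob` and its copy would not be needed, which is the point of dropping the wrapper.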
Force-pushed from 9742e6b to 0c7e000.
Updated (revert changes to variable names)