-
Notifications
You must be signed in to change notification settings - Fork 14
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes, but overall looks good.
if (stats_materialize_fn_) { | ||
stats_materialize_fn_(chunk_stats_); | ||
StatsMaterializeFn().swap(stats_materialize_fn_); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be some logging or fatal error for failure mode here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect any logging and exceptions to be in the materializer because that's where synchronization happens. But I don't expect stats computation ever fail because we can always use the default stats.
@@ -82,7 +80,6 @@ class FragmentInfo { | |||
mutable size_t numTuples; | |||
mutable ChunkMetadataMap chunkMetadataMap; | |||
mutable bool synthesizedNumTuplesIsValid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to remove this -- the invalidate calls were used for update/delete, which is all gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it's not used anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I was wrong. It is removed in later patches when we don't store result sets there anymore.
TableFragmentsInfo synthesize_table_info(hdk::ResultSetTableTokenPtr token) { | ||
std::vector<FragmentInfo> result; | ||
bool non_empty = false; | ||
for (int frag_id = 0; frag_id < token->resultSetCount(); ++frag_id) { | ||
for (int frag_id = 0; frag_id < (int)token->resultSetCount(); ++frag_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have always preferred static_cast
, but I wouldn't change it if that's the only change you need to make in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Signed-off-by: ienkovich <[email protected]>
1bc6350
to
854d8d1
Compare
This PR adds lazy chunk stats to ChunkMetadata.
Currently, we may have lazily computed
ChunkMetadataMap
inFragmentInfo
when it holds aResultSet
. Here I move laziness intoChunkMetadata
to allow more granular stats computation (i.e. compute stats for some columns only) and usestd::function
instead ofResultSet
to potentially utilize it later forArrowStorage
. Also, only stats are computed lazily now, not the wholeChunkMetadata
. I'm not sure if we need laziness fornumElements
andnumBytes
. Computing the number of rows for ResultSet can take some time, but it is inevitable anyway so we can do it on import.