This repository was archived by the owner on May 9, 2024. It is now read-only.

Add ResultSetRegistry storage [2/N] #348

Merged 1 commit on Apr 5, 2023

ienkovich
Contributor

This PR adds lazy chunk stats to ChunkMetadata.

Currently, FragmentInfo may hold a lazily computed ChunkMetadataMap when it holds a ResultSet. Here I move the laziness into ChunkMetadata to allow more granular stats computation (i.e., computing stats for only some columns) and use std::function instead of ResultSet so that it can potentially be reused later for ArrowStorage. Also, only the stats are now computed lazily, not the whole ChunkMetadata. I'm not sure whether we need laziness for numElements and numBytes: computing the number of rows for a ResultSet can take some time, but it is unavoidable anyway, so we can do it on import.
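
For readers following the description, here is a minimal sketch of what per-chunk lazy stats behind a std::function could look like. It is not the code from this PR: `LazyChunkMetadata`, the simplified `ChunkStats` fields, and `lazy_stats_demo` are hypothetical names chosen for illustration, and the sketch assumes synchronization is handled elsewhere (for example, inside the materializer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for the real stats structure.
struct ChunkStats {
  int64_t min = 0;
  int64_t max = 0;
  bool has_nulls = false;
};

// Hypothetical sketch: numElements/numBytes are eager, only stats are lazy.
class LazyChunkMetadata {
 public:
  using StatsMaterializeFn = std::function<void(ChunkStats&)>;

  LazyChunkMetadata(size_t num_elements, size_t num_bytes, StatsMaterializeFn fn)
      : num_elements_(num_elements)
      , num_bytes_(num_bytes)
      , stats_materialize_fn_(std::move(fn)) {}

  size_t numElements() const { return num_elements_; }
  size_t numBytes() const { return num_bytes_; }

  // Runs the materializer on first access, then drops it so that any
  // resources it captured are released. Not thread-safe by itself.
  const ChunkStats& chunkStats() const {
    if (stats_materialize_fn_) {
      stats_materialize_fn_(chunk_stats_);
      StatsMaterializeFn().swap(stats_materialize_fn_);
    }
    return chunk_stats_;
  }

 private:
  size_t num_elements_;
  size_t num_bytes_;
  mutable ChunkStats chunk_stats_;
  mutable StatsMaterializeFn stats_materialize_fn_;
};

// Usage check: returns how many times the materializer actually ran.
int lazy_stats_demo() {
  int calls = 0;
  LazyChunkMetadata meta(3, 24, [&calls](ChunkStats& stats) {
    ++calls;
    stats.min = 1;
    stats.max = 9;
  });
  assert(calls == 0);  // nothing is computed until stats are requested
  const ChunkStats& first = meta.chunkStats();
  assert(first.min == 1 && first.max == 9);
  meta.chunkStats();  // second access reuses the cached stats
  return calls;
}
```

Because each chunk carries its own callable, stats for one column can be materialized without touching the others, which is the granularity the description refers to.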

Contributor

@alexbaden alexbaden left a comment

A few notes, but overall looks good.

if (stats_materialize_fn_) {
  stats_materialize_fn_(chunk_stats_);
  StatsMaterializeFn().swap(stats_materialize_fn_);
}
Contributor

Should there be some logging or fatal error for failure mode here?

Contributor Author

I'd expect any logging and exceptions to be in the materializer because that's where synchronization happens. But I don't expect stats computation to ever fail, because we can always fall back to the default stats.
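
One way to read this reply is that the materializer itself catches failures, logs them, and leaves default stats in place. The following is only a sketch of that idea, not code from the PR: `with_default_on_failure`, `fallback_demo`, and the choice of "widest range, may contain nulls" as the default stats are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <functional>
#include <iostream>
#include <limits>
#include <stdexcept>
#include <utility>

// Hypothetical stats structure. The defaults are assumed to be the most
// conservative values: full value range and "may contain nulls".
struct ChunkStats {
  int64_t min = std::numeric_limits<int64_t>::min();
  int64_t max = std::numeric_limits<int64_t>::max();
  bool has_nulls = true;
};

using StatsMaterializeFn = std::function<void(ChunkStats&)>;

// Wraps a materializer so that a failure is logged and replaced by the
// default stats instead of propagating to the caller.
StatsMaterializeFn with_default_on_failure(StatsMaterializeFn compute) {
  return [compute = std::move(compute)](ChunkStats& stats) {
    try {
      compute(stats);
    } catch (const std::exception& e) {
      std::cerr << "Stats computation failed, using default stats: " << e.what()
                << '\n';
      stats = ChunkStats{};
    }
  };
}

// Usage check: returns true if a throwing materializer leaves default stats.
bool fallback_demo() {
  ChunkStats stats;
  stats.min = 5;
  stats.max = 6;
  stats.has_nulls = false;
  StatsMaterializeFn fn = with_default_on_failure(
      [](ChunkStats&) { throw std::runtime_error("simulated failure"); });
  fn(stats);
  return stats.min == std::numeric_limits<int64_t>::min() &&
         stats.max == std::numeric_limits<int64_t>::max() && stats.has_nulls;
}
```

With a wrapper like this, the call site in ChunkMetadata can stay free of error handling, which matches the division of responsibility described in the reply.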

@@ -82,7 +80,6 @@ class FragmentInfo {
mutable size_t numTuples;
mutable ChunkMetadataMap chunkMetadataMap;
mutable bool synthesizedNumTuplesIsValid;
Contributor

We might be able to remove this -- the invalidate calls were used for update/delete, which is all gone.

Contributor Author

You are right, it's not used anymore.

Contributor Author

No, I was wrong. It is removed in later patches when we don't store result sets there anymore.

 TableFragmentsInfo synthesize_table_info(hdk::ResultSetTableTokenPtr token) {
   std::vector<FragmentInfo> result;
   bool non_empty = false;
-  for (int frag_id = 0; frag_id < token->resultSetCount(); ++frag_id) {
+  for (int frag_id = 0; frag_id < (int)token->resultSetCount(); ++frag_id) {
Contributor

I have always preferred static_cast, but I wouldn't change it if that's the only change you need to make in this PR.

Contributor Author

Fixed
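
As an illustration of the static_cast form suggested in this thread, here is a small self-contained sketch. `FakeToken` and `count_fragments` are hypothetical stand-ins, not types from the repository; the only assumption carried over is that `resultSetCount()` returns an unsigned size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a result set token with an unsigned count.
struct FakeToken {
  std::vector<int> result_sets;
  size_t resultSetCount() const { return result_sets.size(); }
};

// Converts the unsigned count once with static_cast, so the loop compares
// two signed ints and the conversion is easy to spot when reading the code.
int count_fragments(const FakeToken& token) {
  int visited = 0;
  const int frag_count = static_cast<int>(token.resultSetCount());
  for (int frag_id = 0; frag_id < frag_count; ++frag_id) {
    ++visited;
  }
  return visited;
}
```

Unlike a C-style cast, static_cast cannot silently drop const or reinterpret unrelated pointer types, which is a common reason reviewers prefer it.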

@ienkovich ienkovich force-pushed the ienkovich/rs-registry-02 branch from 1bc6350 to 854d8d1 Compare April 4, 2023 22:08
@ienkovich ienkovich merged commit 442a1a8 into main Apr 5, 2023
@ienkovich ienkovich deleted the ienkovich/rs-registry-02 branch April 5, 2023 16:58