Add ability to plan based on chunk table AM #7284

erimatnor · 2024-09-19T19:21:17Z

A chunk can use different table access methods. To support this more easily, cache the AM oid in the chunk struct. This allows identifying the access method quickly, e.g., at planning time.

Disable-check: force-changelog-file
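
As a rough illustration of the idea in this description, the following self-contained C sketch caches the access method OID in a chunk struct when the chunk is built, so later checks compare a struct field rather than repeating a catalog lookup. Everything here is a hypothetical stand-in: `SketchChunk`, `sketch_lookup_relam`, `sketch_chunk_build`, `sketch_chunk_uses_am`, and the OID values are invented for the example and are not the TimescaleDB or PostgreSQL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the PostgreSQL Oid type and InvalidOid. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical chunk struct: only the fields needed for the sketch. */
typedef struct SketchChunk
{
    Oid table_id; /* relation OID of the chunk */
    Oid amoid;    /* cached table access method OID */
} SketchChunk;

/* Counts simulated catalog lookups so the saving is observable. */
static int sketch_catalog_lookups = 0;

/* Made-up catalog: relation OID -> access method OID. */
static const struct { Oid relid; Oid amoid; } sketch_catalog[] = {
    { 1001, 2 },
    { 1002, 9001 },
};

/* Stand-in for a catalog lookup of a relation's access method. */
static Oid
sketch_lookup_relam(Oid relid)
{
    sketch_catalog_lookups++;
    for (size_t i = 0; i < sizeof(sketch_catalog) / sizeof(sketch_catalog[0]); i++)
        if (sketch_catalog[i].relid == relid)
            return sketch_catalog[i].amoid;
    return InvalidOid;
}

/* Build a chunk and cache its access method OID once. */
static SketchChunk
sketch_chunk_build(Oid relid)
{
    SketchChunk chunk = { .table_id = relid, .amoid = sketch_lookup_relam(relid) };
    return chunk;
}

/* Planning-time check that reads only the cached field. */
static bool
sketch_chunk_uses_am(const SketchChunk *chunk, Oid amoid)
{
    assert(chunk != NULL);
    return amoid != InvalidOid && chunk->amoid == amoid;
}
```

In this sketch, any number of calls to `sketch_chunk_uses_am` after `sketch_chunk_build` leaves `sketch_catalog_lookups` at one per chunk.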

codecov · 2024-09-19T19:31:41Z

Codecov Report

Attention: Patch coverage is 90.47619% with 6 lines in your changes missing coverage. Please review.

Project coverage is 92.23%. Comparing base (59f50f2) to head (069bd67).
Report is 351 commits behind head on main.

Files with missing lines	Patch %	Lines
src/utils.c	88.00%	3 Missing ⚠️
tsl/src/planner.c	84.61%	2 Missing ⚠️
src/planner/planner.h	88.88%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #7284       +/-   ##
===========================================
+ Coverage   80.06%   92.23%   +12.17%     
===========================================
  Files         190      205       +15     
  Lines       37181    38471     +1290     
  Branches     9450     9977      +527     
===========================================
+ Hits        29770    35485     +5715     
+ Misses       2997     2983       -14     
+ Partials     4414        3     -4411

src/chunk.c

akuzm · 2024-09-19T22:24:44Z

I think we can merge much more parts of the TAM planning in this PR in the following way:

1. Port the check for hyperstore Oid as is, just make it `get_am_oid("hyperstore", /* missing_ok = */ true)`, so that it works even w/o the TAM.
2. Would be good to cache the TAM oid the same way as extension oid.
3. Port the `ts_relation_uses_hyperstore(Oid relid)` as is. The cached TAM oid will be InvalidOid until we implement the actual TAM, so the function will return false.
4. Port the code that uses this function where it makes sense. I think it's OK if some of it becomes dead code, because we'll be testing it in the TAM PR anyway.

This way we can actually test the place where I saw the regression in this PR, which was my original motivation to suggest splitting it. And also remove much more changes from the TAM PR. What do you think?
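
The steps in this comment can be sketched in plain C with mocked catalog functions. This is only an illustration under stated assumptions: `mock_get_am_oid` stands in for PostgreSQL's `get_am_oid`, `mock_relation_am` and the OID values are invented, and `sketch_relation_uses_hyperstore` mirrors the idea of `ts_relation_uses_hyperstore(Oid relid)` rather than its actual body. The sketch also assumes the check treats an invalid cached OID as "no relation uses it", which is what makes the function return false until the access method exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-ins for the PostgreSQL Oid type and InvalidOid. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Made-up list of installed access methods; "hyperstore" is absent. */
static const struct { const char *name; Oid oid; } mock_ams[] = {
    { "heap", 2 },
};

/* Stand-in for get_am_oid(): returns InvalidOid when missing_ok is set. */
static Oid
mock_get_am_oid(const char *amname, bool missing_ok)
{
    for (size_t i = 0; i < sizeof(mock_ams) / sizeof(mock_ams[0]); i++)
        if (strcmp(mock_ams[i].name, amname) == 0)
            return mock_ams[i].oid;
    if (!missing_ok)
        abort(); /* the real function raises an error here */
    return InvalidOid;
}

/* Hypothetical per-backend cache, in the spirit of caching the extension oid. */
static Oid cached_hyperstore_amoid = InvalidOid;
static bool cached_hyperstore_amoid_loaded = false;
static int am_lookups = 0;

/* Look the access method up once, then serve the cached value. */
static Oid
sketch_hyperstore_amoid(void)
{
    if (!cached_hyperstore_amoid_loaded)
    {
        cached_hyperstore_amoid = mock_get_am_oid("hyperstore", /* missing_ok = */ true);
        cached_hyperstore_amoid_loaded = true;
        am_lookups++;
    }
    return cached_hyperstore_amoid;
}

/* Invented mapping from relation OID to its access method OID. */
static Oid
mock_relation_am(Oid relid)
{
    return relid == 1001 ? 2 : InvalidOid;
}

static bool
sketch_relation_uses_hyperstore(Oid relid)
{
    Oid amoid = sketch_hyperstore_amoid();

    /* Guard: without it, two InvalidOid values would compare equal. */
    if (amoid == InvalidOid)
        return false;
    return mock_relation_am(relid) == amoid;
}
```

Because the mocked catalog has no "hyperstore" entry, every call to `sketch_relation_uses_hyperstore` returns false, and the name lookup happens only on the first call.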

mkindahl

Pretty straightforward. See no problems with this, it should only improve the situation, so approving, but I wonder if it makes any difference at all.

Potential risk is that the cache is invalidated during execution of the query, but this should not happen since it would require at least a ShareUpdateExclusiveLock, so should be properly serialized.

The safest is to use the PostgreSQL cache each time, which does the right thing regardless of situation, at the risk of wasting some cycles if you fetch the chunk frequently. However, chunk look-ups should not be frequent, so this should not be a big problem, so I wonder if this commit makes any difference at all for anything but very specific benchmarks with cold caches.
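
To make the trade-off in this comment concrete, here is a small C sketch contrasting a cached value that is reset by an invalidation callback with a catalog read on every access. It is a toy under loud assumptions: `toy_catalog_amoid`, `toy_cached_amoid`, `toy_invalidate`, and `toy_change_am` are hypothetical names, and PostgreSQL's real cache invalidation machinery is not modelled beyond a single validity flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins for the PostgreSQL Oid type and InvalidOid. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Toy "catalog" state: the access method of one relation. */
static Oid toy_catalog_value = 2;
static int toy_catalog_reads = 0;

/* Always-correct path: read the catalog on every call. */
static Oid
toy_catalog_amoid(void)
{
    toy_catalog_reads++;
    return toy_catalog_value;
}

/* Cached path: the value is trusted until an invalidation arrives. */
static Oid toy_cache = InvalidOid;
static bool toy_cache_valid = false;

static Oid
toy_cached_amoid(void)
{
    if (!toy_cache_valid)
    {
        toy_cache = toy_catalog_amoid();
        toy_cache_valid = true;
    }
    return toy_cache;
}

/* Stand-in for an invalidation callback fired after a catalog change. */
static void
toy_invalidate(void)
{
    toy_cache_valid = false;
}

/* Simulates DDL that changes the access method and sends an invalidation. */
static void
toy_change_am(Oid new_amoid)
{
    toy_catalog_value = new_amoid;
    toy_invalidate();
}
```

The cached path is correct here only because every change goes through `toy_change_am`, which always invalidates. That is the toy counterpart of the argument that a real change would be serialized by a lock.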

akuzm · 2024-09-20T10:59:34Z

> I wonder if it makes any difference at all.

Of course it makes a difference, the reason we're doing this is that I found a 4% regression on a particular query. In general, the hypertable expansion is one of the most performance-critical areas we have, so your intuition that an extra catalog lookup there costs nothing is wrong. We've spent a lot of effort on optimizing it in every possible way, including reducing the catalog lookups, and before that a random select * would take a second to plan. So we should make the new code reasonably efficient as well, by using the existing caching infrastructure.

erimatnor · 2024-09-20T11:19:44Z

> I think we can merge much more parts of the TAM planning in this PR in the following way:
>
> 1. Port the check for hyperstore Oid as is, just make it `get_am_oid("hyperstore", /* missing_ok = */ true)`, so that it works even w/o the TAM
> 2. Would be good to cache the TAM oid the same way as extension oid.
> 3. Port the `ts_relation_uses_hyperstore(Oid relid)` as is. The cached TAM oid will be InvalidOid until we implement the actual TAM, so the function will return false.
> 4. Port the code that uses this function where it makes sense. I think it's OK if some of it becomes dead code, because we'll be testing it in the TAM PR anyway.
>
> This way we can actually test the place where I saw the regression in this PR, which was my original motivation to suggest splitting it. And also remove much more changes from the TAM PR. What do you think?

I added the planner hook code, and also did some refactoring there because the checks got a little unwieldy. Let me know what you think. I don't think it should hurt anything performance-wise, but we'll check with tsbench.

I am not too fond of having unused skeleton code without anything that is there to use it. But the code is there now anyway so we can assess any impact as you wished.

mkindahl · 2024-09-20T11:44:08Z

> > I wonder if it makes any difference at all.
>
> Of course it makes a difference, the reason we're doing this is that I found a 4% regression on a particular query.

It would be interesting to understand what query that is and the variance in execution time for that query.

> In general, the hypertable expansion is one of the most performance-critical areas we have, so your intuition that an extra catalog lookup there costs nothing is wrong.

Hypertable expansion is done just once for each query, so this makes me wonder what kind of query you are running?

How many catalog lookups are done for the query?

Do you have a lot of chunks in the query?

What is the execution time of the query?

> We've spent a lot of effort on optimizing it in every possible way, including reducing the catalog lookups, and before that a random select * would take a second to plan. So we should make the new code reasonably efficient as well, by using the existing caching infrastructure.

As I said in the review, I see no problems in merging this since I think it is at best an improvement of execution time and at worst makes no difference. Not sure what you're objecting to here.

src/chunk.c

tsl/src/planner.c

akuzm

Thanks for the changes, I've redone the benchmark manually and now the performance is the same as on the main branch (ordered append planning suite, e.g. `SELECT * FROM ht_chunk_1k ORDER BY time DESC LIMIT 1;`).

erimatnor · 2024-09-20T16:25:27Z

> Thanks for the changes, I've redone the benchmark manually and now the performance is the same as on main branch (ordered append planning suite, e.g. `SELECT * FROM ht_chunk_1k ORDER BY time DESC LIMIT 1;`)

Ok, that's good.

@akuzm FYI, I made a small additional change to src/import/allpaths.c to apply the same changes there as in other places. It should not change the performance profile, but you might want to take a quick look at it.

erimatnor requested review from mkindahl, antekresic and akuzm September 19, 2024 19:21

akuzm reviewed Sep 19, 2024

src/chunk.c (outdated)

mkindahl approved these changes Sep 20, 2024

erimatnor force-pushed the cache-am-in-chunk branch 2 times, most recently from 3c4aef4 to 74fb72c Compare September 20, 2024 10:43

erimatnor force-pushed the cache-am-in-chunk branch 2 times, most recently from 6242428 to 3bd5ba9 Compare September 20, 2024 11:13

erimatnor force-pushed the cache-am-in-chunk branch from 3bd5ba9 to 498b721 Compare September 20, 2024 11:26

erimatnor force-pushed the cache-am-in-chunk branch from 498b721 to 31ae80f Compare September 20, 2024 11:58

erimatnor changed the title from "Cache table AM in Chunk struct" to "Add ability to plan based on chunk table AM" Sep 20, 2024

erimatnor requested a review from akuzm September 20, 2024 11:59

akuzm reviewed Sep 20, 2024

src/chunk.c (outdated)

akuzm reviewed Sep 20, 2024

tsl/src/planner.c (outdated)

erimatnor force-pushed the cache-am-in-chunk branch from 31ae80f to 505feb0 Compare September 20, 2024 12:31

akuzm approved these changes Sep 20, 2024

Add ability to plan based on chunk table AM

069bd67

erimatnor force-pushed the cache-am-in-chunk branch from 505feb0 to 069bd67 Compare September 20, 2024 16:23

erimatnor merged commit ca0a62b into timescale:main Sep 20, 2024
38 checks passed

akuzm mentioned this pull request Sep 25, 2024

Table access method for compressed hypertables #7104

Open
