Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to plan based on chunk table AM #7284

Merged
merged 1 commit into from
Sep 20, 2024

Conversation

erimatnor
Copy link
Contributor

@erimatnor erimatnor commented Sep 19, 2024

A chunk can use different table access methods so to support this more easily, cache the AM oid in the chunk struct. This allows identifying the access method quickly at, e.g., planning time.

Disable-check: force-changelog-file

Copy link

codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 90.47619% with 6 lines in your changes missing coverage. Please review.

Project coverage is 92.23%. Comparing base (59f50f2) to head (069bd67).
Report is 351 commits behind head on main.

Files with missing lines Patch % Lines
src/utils.c 88.00% 3 Missing ⚠️
tsl/src/planner.c 84.61% 2 Missing ⚠️
src/planner/planner.h 88.88% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #7284       +/-   ##
===========================================
+ Coverage   80.06%   92.23%   +12.17%     
===========================================
  Files         190      205       +15     
  Lines       37181    38471     +1290     
  Branches     9450     9977      +527     
===========================================
+ Hits        29770    35485     +5715     
+ Misses       2997     2983       -14     
+ Partials     4414        3     -4411     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/chunk.c Outdated Show resolved Hide resolved
@akuzm
Copy link
Member

akuzm commented Sep 19, 2024

I think we can merge much more parts of the TAM planning in this PR in the following way:

  1. Port the check for hyperstore Oid as is, just make it get_am_oid("hyperstore", /* missing_ok = */ true), so that it works even w/o the TAM
  2. Would be good to cache the TAM oid the same way as extension oid.
  3. Port the ts_relation_uses_hyperstore(Oid relid) as is. The cached TAM oid will be InvalidOid until we implement the acutal TAM, so the function will return false.
  4. Port the code that uses this function where it makes sense. I think it's OK if some of it becomes dead code, because we'll be testing it in the TAM PR anyway.

This way we can actually test the place where I saw the regression in this PR, which was my original motivation to suggest splitting it. And also remove much more changes from the TAM PR. What do you think?

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty straightforward. See no problems with this, it should only improve the situation, so approving, but I wonder if it makes any difference at all.

Potential risk is that the cache is invalidated during execution of the query, but this should not happen since it would require at least a ShareUpdateExclusiveLock, so should be properly serialized.

The safest is to use the PostgreSQL cache each time, which does the right thing regardless of situation, at the risk of wasting some cycles if you fetch the chunk frequently. However, chunk look-ups should not be frequent, so this should not be a big problem, so I wonder if this commit makes any difference at all for anything but very specific benchmarks with cold caches.

@erimatnor erimatnor force-pushed the cache-am-in-chunk branch 2 times, most recently from 3c4aef4 to 74fb72c Compare September 20, 2024 10:43
@akuzm
Copy link
Member

akuzm commented Sep 20, 2024

I wonder if it makes any difference at all.

Of course it makes a difference, the reason we're doing this is that I found a 4% regression on a particular query. In general, the hypertable expansion is one of the most performance critical areas we have, so your intuition that an extra catalog lookup there costs nothing is wrong. We've spend a lot of effort on optimizing it in every possible way, including reducing the catalog lookups, and before that a random select * would plan for a second. So we should make the new code reasonably efficient as well, by using the existing caching infrastructure.

@erimatnor erimatnor force-pushed the cache-am-in-chunk branch 2 times, most recently from 6242428 to 3bd5ba9 Compare September 20, 2024 11:13
@erimatnor
Copy link
Contributor Author

I think we can merge much more parts of the TAM planning in this PR in the following way:

1. Port the check for hyperstore Oid as is, just make it `get_am_oid("hyperstore", /* missing_ok = */ true)`, so that it works even w/o the TAM

2. Would be good to cache the TAM oid the same way as extension oid.

3. Port the `ts_relation_uses_hyperstore(Oid relid)` as is. The cached TAM oid will be InvalidOid until we implement the acutal TAM, so the function will return false.

4. Port the code that uses this function where it makes sense. I think it's OK if some of it becomes dead code, because we'll be testing it in the TAM PR anyway.

This way we can actually test the place where I saw the regression in this PR, which was my original motivation to suggest splitting it. And also remove much more changes from the TAM PR. What do you think?

I added the planner hook code, and also did some refactoring there because the checks got a little unwieldy. Let me know what you think. I don't think it should hurt anything performance wise but we'll check with tsbench.

I am not too fond of having unused skeleton code without anything that is there to use it. But the code is there now anyway so we can assess any impact as you wished.

@mkindahl
Copy link
Contributor

I wonder if it makes any difference at all.

Of course it makes a difference, the reason we're doing this is that I found a 4% regression on a particular query.

It would be interesting to understand what query that is and the variance in execution time for that query.

In general, the hypertable expansion is one of the most performance critical areas we have, so your intuition that an extra catalog lookup there costs nothing is wrong.

Hypertable expansion is done just once for each query, so this makes me wonder what kind of query you are running?

How many catalog lookups are done for the query?

Do you have a lot of chunks in the query?

What is the execution time of the query?

We've spend a lot of effort on optimizing it in every possible way, including reducing the catalog lookups, and before that a random select * would plan for a second. So we should make the new code reasonably efficient as well, by using the existing caching infrastructure.

As I said in the review, I see no problems in merging this since I think it is at best an improvement of execution time and at worst makes no difference. Not sure what you're objecting to here.

@erimatnor erimatnor changed the title Cache table AM in Chunk struct Add ability to plan based on chunk table AM Sep 20, 2024
src/chunk.c Outdated Show resolved Hide resolved
tsl/src/planner.c Outdated Show resolved Hide resolved
Copy link
Member

@akuzm akuzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, I've redone the benchmark manually and now the performance is the same as on main branch (ordered append planning suite, e.g. SELECT * FROM ht_chunk_1k ORDER BY time DESC LIMIT 1;)

A chunk can use different table access methods so to support this more
easily, cache the AM oid in the chunk struct. This allows identifying
the access method quickly at, e.g., planning time.
@erimatnor
Copy link
Contributor Author

Thanks for the changes, I've redone the benchmark manually and now the performance is the same as on main branch (ordered append planning suite, e.g. SELECT * FROM ht_chunk_1k ORDER BY time DESC LIMIT 1;)

Ok, that's good.

@akuzm FYI, I did a small additional change to src/import/allpaths.c to apply the same changes there as in other places. It should not change the performance profile, but you might want to just throw a quick eye at that.

@erimatnor erimatnor merged commit ca0a62b into timescale:main Sep 20, 2024
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants