Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression with Parallel hints combined with removal of indexes in Index hints #164

Closed
michaelpq opened this issue Nov 6, 2023 · 11 comments
Assignees

Comments

@michaelpq
Copy link
Collaborator

michaelpq commented Nov 6, 2023

A user has reported a regression in the latest release series regarding subject, related to commit 684986a (applies to all stable branches). Imagine two simple tables:

CREATE TABLE bookings (
    book_ref varchar NOT NULL,
    book_date timestamp with time zone NOT NULL,
    total_amount numeric NOT NULL 
); 
CREATE INDEX ON bookings(total_amount);
CREATE UNIQUE INDEX ON bookings(book_ref) INCLUDE (book_date);
ALTER TABLE bookings ADD CONSTRAINT bookings_pkey PRIMARY KEY
    USING INDEX bookings_book_ref_book_date_idx;
ANALYZE bookings;
                                                                                                                                                                           
CREATE TABLE bookings_tmp
    AS SELECT * FROM bookings ORDER BY book_ref;
ALTER TABLE bookings_tmp ADD PRIMARY KEY(book_ref);
ANALYZE bookings_tmp;

Now here is an exotic case:

/*+
    IndexScan(a bookings_total_amount_idx)
    IndexScan(b bookings_tmp_not_exist)
    Parallel(a 4 hard)
    Parallel(b 2 hard)
*/                                                                                                                                                                                                           
EXPLAIN(costs false) SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;

Under 1.4.1, we are getting this plan:

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Nested Loop
   Join Filter: (a.book_ref = b.book_ref)
   ->  Gather
         Workers Planned: 4
         ->  Parallel Index Scan using bookings_total_amount_idx on bookings a
               Index Cond: (total_amount > 28000.00)
   ->  Gather
         Workers Planned: 2
         ->  Parallel Seq Scan on bookings_tmp b
(9 rows)

I think that this is not correct, for two reasons:

  • IndexScan(b bookings_tmp_not_exist) is listed as used, but it does not make sense because it lists an index that does not exist.
  • Why is the sequential scan enforced under a gather? Shouldn't this use an Index Scan under the gather based on the pkey of bookings_tmp? I get that both behavior are debatable, but this looks strange for me as historical behavior.

Now, under 1.4.2, we get this plan:

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Nested Loop
   Join Filter: (a.book_ref = b.book_ref)
   ->  Gather
         Workers Planned: 4
         ->  Parallel Index Scan using bookings_total_amount_idx on bookings a
               Index Cond: (total_amount > 28000.00)
   ->  Seq Scan on bookings_tmp b
(7 rows)

This is better in the fact that IndexScan(b bookings_tmp_not_exist) is marked as unused. Still, this is worse in the fact that the Parallel hint with the number of workers gets entirely ignored, leading to a regression.

I've been considering a couple of options about what to tweak in the case where an index restriction is applied, but my mind comes down to a few conclusions:

  • For now, I would like to choose a revert of 684986a & friends on stable branches, getting back to the previous-still-somewhat-broken behavior.
  • Add a few more regression tests based on what I saw reported back to me here. I'm OK to send a patch about that.
  • @samimseih has mentioned me a potential solution to enforce only a subset of the ENABLE masks for the GUCs controlling if some scans should be used or not, rather than bypass the whole. This slightly impacts the existing plans, so perhaps a revert on HEAD of the original patch makes sense anyway. Or folks here are OK to let the new behavior on HEAD? I mean, it correctly discards indexes, so removing what looks like an improvement in hint detection would be sad, and it has the merit of letting the change brew longer with the release of this project for 17~.
@samimseih
Copy link

Thanks for the report @michaelpq

After we spoke offline, I spent time looking further into this. What I observed is the follows:

  1. skipping over setup_scan_method_enforcement https://github.com/ossc-db/pg_hint_plan/blob/master/pg_hint_plan.c#L3955-L3956 does not prevent the setting up of parallel enforcement https://github.com/ossc-db/pg_hint_plan/blob/master/pg_hint_plan.c#L3986 I confirmed with some additional debugging statements.

  2. Also, Testing the repro case on 1.4.1, I discovered some very inconsistent/odd parallel enforcement behavior that 684986a exposes in 1.4.2, but I do not think is the cause of. See the cases below.

postgres=# \dx
                    List of installed extensions
     Name     | Version |   Schema   |         Description          
--------------+---------+------------+------------------------------
 pg_hint_plan | 1.4.1   | hint_plan  | 
 plpgsql      | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size=0;                                                                               
set max_parallel_workers_per_gather To 4;

I have a repro of the test case mentioned in the thread.

postgres=# /*+                            
    IndexScan(a bookings_total_amount_idx)
    IndexScan(b bookings_tmp_not_exist)
    Parallel(a 4 hard)
    Parallel(b 2 hard)
*/                                                                                                                                                                                                           
EXPLAIN(costs false) SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Nested Loop
   Join Filter: ((a.book_ref)::text = (b.book_ref)::text)
   ->  Gather
         Workers Planned: 4
         ->  Parallel Index Scan using bookings_total_amount_idx on bookings a
               Index Cond: (total_amount > 28000.00)
   ->  Gather
         Workers Planned: 2
         ->  Parallel Seq Scan on bookings_tmp b
(9 rows)

Here is where things get interesting. If only the Parallel hints are supplied, no parallel hint is enforced.

postgres=# /*+                            
    Parallel(a 4 hard)
    Parallel(b 2 hard)
*/

EXPLAIN(costs false) SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;

                        QUERY PLAN                        
----------------------------------------------------------
 Nested Loop
   Join Filter: ((a.book_ref)::text = (b.book_ref)::text)
   ->  Seq Scan on bookings a
         Filter: (total_amount > 28000.00)
   ->  Seq Scan on bookings_tmp b
(5 rows)

I can force the parallel plan by keeping only index scans enabled. But, notice it's not a Parallel Seq scan on "b", but rather a Parallel Index Scan.

set enable_mergejoin = off;
set enable_seqscan = off;
set enable_hashjoin = off;
SET
SET
SET
postgres=# /*+                            
    Parallel(a 6 hard)
    Parallel(b 2 hard)
*/                                                                                                                                                                                                           
EXPLAIN(costs false) SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Nested Loop
   Join Filter: ((a.book_ref)::text = (b.book_ref)::text)
   ->  Gather
         Workers Planned: 6
         ->  Parallel Index Scan using bookings_pkey on bookings a
               Filter: (total_amount > 28000.00)
   ->  Gather
         Workers Planned: 2
         ->  Parallel Index Scan using bookings_tmp_pkey on bookings_tmp b
(9 rows)

It is clear that parallel enforcement, including the "hard" option, is not showing the expected results even in 1.4.1

thoughts?

Regards,

Sami

@samimseih
Copy link

For now, I would like to choose a revert of 684986a & friends on stable branches, getting back to the previous-still-somewhat-broken behavior.

With the above said, I do agree to revert the patch on stable branches, since the existing behavior, while not great, is what the users expect.

@michaelpq
Copy link
Collaborator Author

It is clear that parallel enforcement, including the "hard" option, is not showing the expected results even in 1.4.1.

Yeah. The fact that we would completely discard the Parallel hints when mixing more than one table sounds strange to me.

At this stage, I think that we'd better document all that in the shape of tests where one could look at a regression.diffs to see the difference generated when tweaking the code, or development is not going to scale well. I'm OK if we add comments like "strangely, this ignores both Parallel hints, perhaps these should be pushed under two Gather nodes". Or stuff like that.

I'll go put the stable branches in their previous stable with a revert, as I doubt that we've appreciated all the plan manipulations possible yet. Changing that in a major release is less or an issue to me. At this stage I'd rather prioritize plan stability to not impact users across these minor updated.

@samimseih
Copy link

I agree. I'll submit a patch for Parallel Hint tests ( including those mentioned above ).

michaelpq added a commit that referenced this issue Nov 7, 2023
This reverts commit a3646e1, that we have found to do more weird
plan manipulations by ignoring in some cases Parallel hints when they
should not.

The behavior before this commit has also some historical weirdness, but
impacting the stability of plans is not a good idea in stable branches,
so for now this change is reverted.  Additional regression tests will be
added on HEAD to track all the behaviors around parallel hints that have
been seen, while keeping the new plan changes only for 17~ and new major
release integration.

Per discussion on #164.

Backpatch-through: 13
michaelpq added a commit that referenced this issue Nov 7, 2023
This reverts commit a3646e1, that we have found to do more weird
plan manipulations by ignoring in some cases Parallel hints when they
should not.

The behavior before this commit has also some historical weirdness, but
impacting the stability of plans is not a good idea in stable branches,
so for now this change is reverted.  Additional regression tests will be
added on HEAD to track all the behaviors around parallel hints that have
been seen, while keeping the new plan changes only for 17~ and new major
release integration.

Per discussion on #164.

Backpatch-through: 13
michaelpq added a commit that referenced this issue Nov 7, 2023
This reverts commit a3646e1, that we have found to do more weird
plan manipulations by ignoring in some cases Parallel hints when they
should not.

The behavior before this commit has also some historical weirdness, but
impacting the stability of plans is not a good idea in stable branches,
so for now this change is reverted.  Additional regression tests will be
added on HEAD to track all the behaviors around parallel hints that have
been seen, while keeping the new plan changes only for 17~ and new major
release integration.

Per discussion on #164.

Backpatch-through: 13
michaelpq added a commit that referenced this issue Nov 7, 2023
This reverts commit a3646e1, that we have found to do more weird
plan manipulations by ignoring in some cases Parallel hints when they
should not.

The behavior before this commit has also some historical weirdness, but
impacting the stability of plans is not a good idea in stable branches,
so for now this change is reverted.  Additional regression tests will be
added on HEAD to track all the behaviors around parallel hints that have
been seen, while keeping the new plan changes only for 17~ and new major
release integration.

Per discussion on #164.

Backpatch-through: 13
@michaelpq michaelpq self-assigned this Nov 7, 2023
@samimseih
Copy link

samimseih commented Nov 10, 2023

Attached are tests that cover several of the gaps discovered in this discussion:

1/ Ensure that Parallel enforcement does not break when a non-existent index is used in an IndexScan hint.
2/ Demonstrate that Parallel hints cannot be enforced on empty tables.

0001-v1-Add-additional-parallel-hint-regression-tests.patch

@michaelpq
Copy link
Collaborator Author

Thanks for the patch.

The patch you are proposing for the additional tests has no explanation about what's being done, like the interactions between the non-existing indexes and the hints. It may be good to document now with some big XXX that some of these plans are weird because of YYY.

I am not sure that it is a good idea to use and empty the existing tables t5 and t6 for these new tests. Perhaps splitting that into a new file would make more sense?

@samimseih
Copy link

The patch you are proposing for the additional tests has no explanation about what's being done,

I hesitated doing that as it's not the norm to add much documentation to the tests, but that is not a good reason not to. I will add a better description.

I am not sure that it is a good idea to use and empty the existing tables t5 and t6 for these new tests. Perhaps splitting that into a new file would make more sense?

I am creating new tables for these tests in the sql/ut-init.sql file. I did not think it was worth adding a new init and test files for these few tests.

+CREATE TABLE s1.t5 (LIKE s1.t1 INCLUDING ALL);
+CREATE TABLE s1.t6 (LIKE s1.t1 INCLUDING ALL);

@samimseih
Copy link

After second thought, showing that the non-existing index does not prevent parallel hint enforcement is not necessary. So, the test now shows the important part. For empty tables, parallel hints can only be enforced on index scans and not sequential scans.

Attached is the revised test.

Regards,

Sami
0001-Add-Parallel-hint-tests-for-empty-tables.patch

@michaelpq
Copy link
Collaborator Author

a3646e1 still exists on HEAD and now PG17. Now is the time to do a re-evaluation of what has been committed, knowing the reasons why the patch had to be backed out of past stable branches.

@samimseih
Copy link

samimseih commented Jul 23, 2024

Looking at the discussion above, 684986a exposed pre-existing odd behavior with parallel enforcement. Especially, if there are no rows in a table, parallel sequential scans cannot be enforced. This scenario is covered by the suggested test case [1]. Do you think there is value in these test?

In the case mentioned at the top of the thread, 1.4.1 is able to perform a parallel sequential scan on an empty table by "accident". Because an IndexScan hint is used on bookings_tmp_not_exist, the sequential scan is disabled by pg_hint_plan ( enable_seqscan = off ). However, because bookings_tmp_not_exist is not an existing index, this leaves Postgres with only the sequential scan access method. The use of parallel with the sequential scan is surprising. I was not able to reproduce the same plan by forcing parallelism.

set max_parallel_workers_per_gather To 4;  
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size=0;                                                                             
                                                                                                                                                                                                
EXPLAIN SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;

I think 684986a is a good improvement and I am not sure if the pre-existing behavior is the behavior that we want to maintain for the next major version release.

[1] https://github.com/ossc-db/pg_hint_plan/files/13387722/0001-Add-Parallel-hint-tests-for-empty-tables.patch

michaelpq added a commit that referenced this issue Aug 20, 2024
As a result of the discussion with Sami Imseih, this commit adds tests
showing that for empty tables, only parallel index scans can be enforced
and not sequential scans.  This is an old, historical, rather weird
behavior, that may not be worth changing in the long run as empry tables
are not something users rely a lot when using pg_hint_plan, but let's
add a test tracking if any future change impacts the plans generated.

Author: Sami Imseih
Backpatch-through: 17

Per discussion on issue #164.
michaelpq added a commit that referenced this issue Aug 20, 2024
As a result of the discussion with Sami Imseih, this commit adds tests
showing that for empty tables, only parallel index scans can be enforced
and not sequential scans.  This is an old, historical, rather weird
behavior, that may not be worth changing in the long run as empry tables
are not something users rely a lot when using pg_hint_plan, but let's
add a test tracking if any future change impacts the plans generated.

Author: Sami Imseih
Backpatch-through: 17

Per discussion on issue #164.
@michaelpq
Copy link
Collaborator Author

Looking at the discussion above, 684986a exposed pre-existing odd behavior with parallel enforcement. Especially, if there are no rows in a table, parallel sequential scans cannot be enforced. This scenario is covered by the suggested test case [1]. Do you think there is value in these test?

I'm sold to it. It's really a weird behavior, THB, but we've had that for a long time and nobody has complained about it, either. There is a good argument for changing that in the long run, perhaps, and make sure that the workers triggered work the same way without the need for the NoSeqScan hints. In any case, such a test will be able to track if the behavior changes. I've done that with e92d75f, and now let's close this issue.

I think 684986a is a good improvement and I am not sure if the pre-existing behavior is the behavior that we want to maintain for the next major version release.

I'm OK to treat that as a new feature in PG17. We'll see based on the feedback, and impacting plans after a major version upgrade stresses me less than for a minor upgrade. Postgres itself has breakages every year after its majors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants