WIP Fix issue #7674 about UPDATE SET(..), with indirection #7675

c2main · 2024-08-23T14:46:19Z

Issue reported to Data Bene support. See #7674

The fix looks good, at least it breaks nothing and the bug not visible.

I am unsure to add all regressions tests where I did, maybe their own file?

Also the PR is not yet ready regarding citus_indent, but if you can already provide some feedback.

Queries of the form: ``` UPDATE ... SET (a, b) = (SELECT '1', '2') UPDATE ... SET (b, a) = (SELECT '2', '1') ``` Should do the same thing, but currently the order of the attributes in `SET (...)` as rewriten for pushdown is only based on the physical ordering of the attributes in the relation. This leads to several subtle problems including situation where a DROPped then reADDed attributes will change its placement in the attribute list. There are maybe more tests to add in other situation where a SET (MULTIEXPR) is possible, though UPDATE form is pretty unique as alternatives are not supported by citus: `(INSERT .. ON CONFLICT SET (a,b).....`

ruleutils in Citus is based on PostgreSQL source code, but in PostgreSQL ruleutils is not used at the planner stage. For instance, it is assumed after parser that targetList are ordered as they were read, but it's not true after rewriter, the resulting rewrite tree is then provided to planner (and citus), but the ordering of the list is not granted anymore. It's similar to others previous issues reported and still open, as well as to other bugfixes/improvment over time, the most noticable being the ProcessIndirection() which is for domain and similar. However, the implications of this bug are huge for users of `UPDATE SET (...)`: 1. if you used to order by columns order, you're maybe safe: `SET (col1, col2, col3, ...)` 2. if you used not to order by column order: `SET (col2, col1, col3, ...)` then you probably found a problem, or you have one. Note about 1. that despite appearance and your QA, you are at risk: if physical columns ordering is changed (for example after DROPping/ADDing some), the same query which use to apparently works well will silently update other columns... As it is this code is not optimized for performance, not sure it'll be needed.

c2main · 2024-11-19T20:01:01Z

@naisila just a message to help you find this PR when you'll get a chance to look at it...

naisila

#5692 is also trying to fix quite a similar problem. Reviewers had objected to that PR because we want to avoid modifying ruleutils as much as possible and try to fix such issues through the distributed planner. Have you tried to investigate this issue through the lens of the distributed planner hook?

c2main · 2024-11-21T09:04:56Z

#5692 is also trying to fix quite a similar problem. Reviewers had objected to that PR because we want to avoid modifying ruleutils as much as possible and try to fix such issues through the distributed planner. Have you tried to investigate this issue through the lens of the distributed planner hook?

Thank you for the feedback. No I didn't look at distributed planner hook, I will evaluate this approach.

c2main · 2024-11-21T12:27:45Z

#5692 is also trying to fix quite a similar problem. Reviewers had objected to that PR because we want to avoid modifying ruleutils as much as possible and try to fix such issues through the distributed planner. Have you tried to investigate this issue through the lens of the distributed planner hook?

Thank you for the feedback. No I didn't look at distributed planner hook, I will evaluate this approach.

I just looked:

apparently by doing at this level, we'll have to test sooner for the conditions, thus impacting all queries (a bit like HasUnresolvedExternParamsWalker() is doing).
And It's not clear to me yet if patching just a single "planner" will be enough...

I understand the motivation to not touch too much ruleutils, but also my understanding is that ruleutils has been imported precisely to achieve a goal which is not achieved because it contains several limitation due to reasons well explained in Onur's PR (tree rewritten and optimized by postgres before being in the hands of citus).

By touching only distributed planner I fear that the bug will just reappear in some corner case situation, but I am still exploring, sharing here as I'm sure those points have been discussed in the past but I didn't search history...

c2main · 2024-11-21T18:29:32Z

pg_update_query_def..() functions are only internal functions to ruleutils, the entry point for citus is pg_get_query_def.
And this last one is handling cases and all up to the function I patched.

If the patch is elsewhere then I suppose it will be required to do essentially what pg_query_def() is doing and more or less extract this code (and there is more than one function involved here) to citus_ruleutils.c or similar.
Even if writing a very precise walker I think there is also a performance impact which might appear if we need to control on each iteration, or have the code do its own recursion.

IMHO, this is a "codepath" expecting to diverge a lot from upstream because they deserve distinct purpose, for many cases the solution is just to not allow that in Citus and maybe it's an alternative to consider here. Like some other similar queries.

Maybe I missed alternatives ?

naisila · 2024-11-22T07:46:02Z

Thanks for your investigations. The team will look at this in more detail as soon as we get a chance.

This reverts commit 73f4f67.

klando · 2024-11-22T21:39:12Z

Thanks for your investigations. The team will look at this in more detail as soon as we get a chance.

I've tried to patch in lower layer, it's possible, but less convenient and more error prone.
So I've pushed up in distribute_planner hook function, only when needsDistributedPlanning is true.

If it works for you this way I can squash the commits and maybe improve names/comments ?

It'll probably be interesting to add more test related to indirection here.

We may need to reorder parts of the planner tree we are receiving in dsitributed_planner hook because it has been optimized by PostgreSQL. However we only need to rewrite tree which will be used to produce a query string, otherwise it's pointless. Proceed very early in the planner hook, which is probably a good place to pull up some similar fixes applied here and there. See the exported function from citus_ruleutils.c: void RebuildParserTreeFromPlannerTree(Query *query) Added a new include in citus_ruleutils.c: miscadmin.h from PostgreSQL, because it contains the check_stack_depth() function which is a good idea to use as we don't know the size of the query..

c2main · 2024-12-02T10:37:09Z

Thanks for your investigations. The team will look at this in more detail as soon as we get a chance.

I've tried to patch in lower layer, it's possible, but less convenient and more error prone. So I've pushed up in distribute_planner hook function, only when needsDistributedPlanning is true.

If it works for you this way I can squash the commits and maybe improve names/comments ?

@naisila @onurctirtir is it aligned with what you had in mind ?

c2main · 2024-12-13T06:29:42Z

Thanks for your investigations. The team will look at this in more detail as soon as we get a chance.

I've tried to patch in lower layer, it's possible, but less convenient and more error prone. So I've pushed up in distribute_planner hook function, only when needsDistributedPlanning is true.
If it works for you this way I can squash the commits and maybe improve names/comments ?

@naisila @onurctirtir is it aligned with what you had in mind ?

This PR has been open for several month already, it fixes a transparent bug with data alteration. What is required to increase priority or getting some more attention ?

colm-mchugh · 2024-12-20T12:10:23Z

@c2main Thanks for your work on this, we appreciate your reworking the fix to apply it in the distributed planner hook. We would like the fix to be detected and applied at a lower level than in the current PR, for example in CreateModifyPlan() (or at a lower level, in RouterJob()). If this proves to be infeasible then we'd prefer to go with the first approach of applying the fix in citus_ruleutils.c. Thanks again for your efforts, we can work with you in progressing this.

klando · 2024-12-23T15:02:48Z

@c2main Thanks for your work on this, we appreciate your reworking the fix to apply it in the distributed planner hook. We would like the fix to be detected and applied at a lower level than in the current PR, for example in CreateModifyPlan() (or at a lower level, in RouterJob()). If this proves to be infeasible then we'd prefer to go with the first approach of applying the fix in citus_ruleutils.c. Thanks again for your efforts, we can work with you in progressing this.

When patching in CreateModifyPlan() the following query on a distributed table is failing (only this one), apparently it does not enter this function at all (I've not checked why):

with qq3 as (
    update test_dist_indirection
    SET (col_text, col_bool)
      = (SELECT 'full', true)
    where id = 3
    returning *
),
qq4 as (
    update test_dist_indirection
    SET (col_text, col_bool)
      = (SELECT 'fully', true)
    where id = 4
    returning *
)
select * from qq3 union all select * from qq4;

However, adding my rewrite function just before PlanRouterQuery() in RouterJob() does the job.

Maybe it's possible to push down a bit more in PlanRouterQuery() but then there are those if/else cases to manage precisely:

if (fastPathRouterQuery)
...
else
...
if (isMultiShardQuery)
...
else
...

Is it worth pursuing in this direction ?

Meantime, I have update PR with the Routerjob() patch.

codecov · 2024-12-23T15:19:18Z

Codecov Report

Attention: Patch coverage is 27.02703% with 54 lines in your changes missing coverage. Please review.

Project coverage is 49.10%. Comparing base (248ff5d) to head (0b434dc).
Report is 4 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (248ff5d) and HEAD (0b434dc). Click for more details.

HEAD has 78 uploads less than BASE

Flag BASE (248ff5d) HEAD (0b434dc)

15_regress_check-pytest 1 0

14_regress_check-pytest 1 0

16_regress_check-pytest 1 0

_upgrade 18 2

15_regress_check-follower-cluster 1 0

14_regress_check-columnar-isolation 1 0

15_regress_check-columnar-isolation 1 0

16_regress_check-query-generator 1 0

16_regress_check-columnar-isolation 1 0

14_16_upgrade 1 0

15_regress_check-enterprise-isolation-logicalrep-2 1 0

15_regress_check-enterprise-isolation-logicalrep-3 1 0

14_regress_check-columnar 1 0

14_regress_check-enterprise-isolation-logicalrep-3 1 0

14_regress_check-follower-cluster 1 0

16_regress_check-follower-cluster 1 0

16_regress_check-enterprise-failure 1 0

15_regress_check-enterprise-failure 1 0

14_regress_check-enterprise-failure 1 0

14_regress_check-query-generator 1 0

14_15_upgrade 1 0

15_16_upgrade 1 0

14_regress_check-enterprise-isolation-logicalrep-2 1 0

16_regress_check-enterprise-isolation-logicalrep-3 1 0

16_regress_check-enterprise-isolation-logicalrep-2 1 0

14_regress_check-enterprise 1 0

15_regress_check-vanilla 1 0

15_regress_check-columnar 1 0

14_regress_check-split 1 0

16_regress_check-enterprise 1 0

15_regress_check-enterprise-isolation-logicalrep-1 1 0

15_regress_check-multi-mx 1 0

16_regress_check-multi-mx 1 0

16_regress_check-failure 1 0

14_regress_check-multi-mx 1 0

14_regress_check-failure 1 0

15_cdc_installcheck 1 0

15_regress_check-failure 1 0

15_regress_check-operations 1 0

14_regress_check-operations 1 0

16_regress_check-operations 1 0

15_regress_check-query-generator 1 0

16_regress_check-columnar 1 0

14_regress_check-vanilla 1 0

14_regress_check-enterprise-isolation 1 0

16_regress_check-split 1 0

15_regress_check-split 1 0

15_regress_check-enterprise 1 0

16_regress_check-vanilla 1 0

16_regress_check-enterprise-isolation 1 0

15_regress_check-enterprise-isolation 1 0

16_regress_check-isolation 1 0

14_regress_check-isolation 1 0

15_regress_check-multi 1 0

15_regress_check-isolation 1 0

16_regress_check-multi 1 0

16_regress_check-enterprise-isolation-logicalrep-1 1 0

16_cdc_installcheck 1 0

15_regress_check-multi-1 1 0

16_regress_check-multi-1 1 0

14_regress_check-multi-1 1 0

14_regress_check-enterprise-isolation-logicalrep-1 1 0

14_regress_check-multi 1 0

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #7675       +/-   ##
===========================================
- Coverage   89.69%   49.10%   -40.60%     
===========================================
  Files         283      283               
  Lines       60510    59770      -740     
  Branches     7541     7367      -174     
===========================================
- Hits        54276    29349    -24927     
- Misses       4079    27794    +23715     
- Partials     2155     2627      +472

colm-mchugh · 2024-12-24T01:40:00Z

However, adding my rewrite function just before PlanRouterQuery() in RouterJob() does the job.

Nice.

Maybe it's possible to push down a bit more in PlanRouterQuery() but then there are those if/else cases to manage precisely:
if (fastPathRouterQuery)
...
else
...
if (isMultiShardQuery)
...
else
...
Is it worth pursuing in this direction ?

Yes, but if it proves too cumbersome because of the different query types it's probably not worth pursuing. For example, if special case code is required for each query type and PlanRouterQuery() gets cluttered.. but please investigate at least, we'll leave it to your judgement if its worth adjusting the PR as a consequence.

c2main · 2024-12-24T11:07:10Z

However, adding my rewrite function just before PlanRouterQuery() in RouterJob() does the job.

Nice.
Maybe it's possible to push down a bit more in PlanRouterQuery() but then there are those if/else cases to manage precisely:
if (fastPathRouterQuery)
...
else
...
if (isMultiShardQuery)
...
else
...
Is it worth pursuing in this direction ?
Yes, but if it proves too cumbersome because of the different query types it's probably not worth pursuing. For example, if special case code is required for each query type and PlanRouterQuery() gets cluttered.. but please investigate at least, we'll leave it to your judgement if its worth adjusting the PR as a consequence.

Pushing down a little bit more, 2 call places are required:

diff --git a/src/backend/distributed/planner/multi_router_planner.c b/src/backend/distributed/planner/multi_router_planner.c
index 84352415f..36aaeeecd 100644
--- a/src/backend/distributed/planner/multi_router_planner.c
+++ b/src/backend/distributed/planner/multi_router_planner.c
@@ -1869,13 +1869,6 @@ RouterJob(Query *originalQuery, PlannerRestrictionContext *plannerRestrictionCon
        }
        else
        {
-               /*
-                * We may need to reorder parts of the planner tree we are receiving here.
-                * We expect to produce an SQL query text but our tree has been optimized by
-                * PostgreSL rewriter already...
-                * FIXME is there conditions to reduce the number of calls ?
-                */
-               RebuildParserTreeFromPlannerTree(originalQuery);
                (*planningError) = PlanRouterQuery(originalQuery, plannerRestrictionContext,
                                                                                   &placementList, &shardId, &relationShardList,
                                                                                   &prunedShardIntervalListList,
@@ -2364,6 +2357,8 @@ PlanRouterQuery(Query *originalQuery,
 
                if (!IsMergeQuery(originalQuery))
                {
+                       /* OK */
+                       RebuildParserTreeFromPlannerTree(originalQuery);
                        planningError = ModifyQuerySupported(originalQuery, originalQuery,
                                                                                                 isMultiShardQuery,
                                                                                                 plannerRestrictionContext);
@@ -2450,6 +2445,8 @@ PlanRouterQuery(Query *originalQuery,
        bool isUpdateOrDelete = UpdateOrDeleteOrMergeQuery(originalQuery);
        if (!(isUpdateOrDelete && RequiresCoordinatorEvaluation(originalQuery)))
        {
+               /* OK */
+               RebuildParserTreeFromPlannerTree(originalQuery);
                UpdateRelationToShardNames((Node *) originalQuery, *relationShardList);
        }

the first call before ModifyQuerySupported() is not "enough".
it's counterintuitive (for me) to have to call before UpdateRelationToShardNames() (I tried to push down into this one but it leads to more calls to RebuildParserTreeFromPlannerTree()). I was wondering if maybe some function names were misleading or just facing a side effect, both case it looks a bit wrong to go deeper into this direction: eventually we'll get back to ruleutils to only be called when appropriate.

At the moment, the RebuildParserTreeFromPlannerTree can be called several times, the most complex being the failing query mentioned earlier (with double CTE on UPDATE, followed by a SELECT on the CTEs), it outlines that each CTE + the SELECT are handled (3 calls to rewrite completely the query).

So what are we trying to solve here ?

less calls to rewrite: have to be done very early on a larger query tree
less calls to rewrite: have to be done very late only when rewriting query with ruleutils and actively entering the path of interest.
in-between: try to figure where there are few calls, and not called when not required, and not recalled for nothing (while pushing down, I also got cases with 5 "rewrites" instead of the current 3 for the "complex query").

There is very probably a design choice to be made, if possible not by me but Citus committers on what contract Citus want to have with parser tree vs rewritter tree vs planner tree:

fix ruleutils as we go to circumvent the side effects of the rewriter on the planner tree received
assume the planner tree is not good for all situation, and fix early in the process in order to reverse planner tree as near as possible to the parser tree (before "rewrite").
patch here and there and assume associated maintenance cost.

My understanding is that the current contract is the 3rd (which include a bit of contract 1 and contract 2). My preferences in design is 2. (it may allow to handle more queries in Citus), then the 1. for workarounds (and evaluate fixes upstream).

From a PostgreSQL point of view, we also have problem around this topic (to show VIEW definition IIRC, and for ANALYZE text output). Having the parser tree available would be very handy. The other way around is maybe to push-back ruleutils fixes to PostgreSQL, if at all possible, to reduce the diff with upstream.

c2main mentioned this pull request Aug 26, 2024

bad planner tree sent to executor in some MULTIEXPR scenario #7676

Open

c2main added 2 commits November 19, 2024 20:27

c2main force-pushed the fix-update-multi-attr-select branch from 099eab0 to 73f4f67 Compare November 19, 2024 19:44

naisila reviewed Nov 21, 2024

View reviewed changes

Revert "Fix issue citusdata#7674 related to Indirection in UPDATE SET()"

b0de950

This reverts commit 73f4f67.

c2main added 2 commits November 22, 2024 22:44

Add more testing for UPDATE with indirection

af39a27

It'll probably be interesting to add more test related to indirection here.

c2main force-pushed the fix-update-multi-attr-select branch from 6d69c20 to 1cf93c1 Compare November 22, 2024 21:44

Move the patch a bit later in RouterJob()

0b434dc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP Fix issue #7674 about UPDATE SET(..), with indirection #7675

WIP Fix issue #7674 about UPDATE SET(..), with indirection #7675

c2main commented Aug 23, 2024 •

edited

Loading

c2main commented Nov 19, 2024

naisila left a comment •

edited

Loading

c2main commented Nov 21, 2024

c2main commented Nov 21, 2024

c2main commented Nov 21, 2024

naisila commented Nov 22, 2024

klando commented Nov 22, 2024

c2main commented Dec 2, 2024

c2main commented Dec 13, 2024

colm-mchugh commented Dec 20, 2024

klando commented Dec 23, 2024

codecov bot commented Dec 23, 2024

colm-mchugh commented Dec 24, 2024

c2main commented Dec 24, 2024 •

edited

Loading

WIP Fix issue #7674 about UPDATE SET(..), with indirection #7675

Are you sure you want to change the base?

WIP Fix issue #7674 about UPDATE SET(..), with indirection #7675

Conversation

c2main commented Aug 23, 2024 • edited Loading

c2main commented Nov 19, 2024

naisila left a comment • edited Loading

Choose a reason for hiding this comment

c2main commented Nov 21, 2024

c2main commented Nov 21, 2024

c2main commented Nov 21, 2024

naisila commented Nov 22, 2024

klando commented Nov 22, 2024

c2main commented Dec 2, 2024

c2main commented Dec 13, 2024

colm-mchugh commented Dec 20, 2024

klando commented Dec 23, 2024

codecov bot commented Dec 23, 2024

Codecov Report

colm-mchugh commented Dec 24, 2024

c2main commented Dec 24, 2024 • edited Loading

c2main commented Aug 23, 2024 •

edited

Loading

naisila left a comment •

edited

Loading

c2main commented Dec 24, 2024 •

edited

Loading