Skip to content

Fix partial left join update when columns have default values or join table is in where clause#2207

Merged
max-hoffman merged 8 commits intodolthub:mainfrom
BarakatX2:fix-partial-left-join-update
Jan 23, 2024
Merged

Fix partial left join update when columns have default values or join table is in where clause#2207
max-hoffman merged 8 commits intodolthub:mainfrom
BarakatX2:fix-partial-left-join-update

Conversation

@BarakatX2
Copy link
Contributor

I ran into an issue where I was getting a panic when in update queries using left joins with at least some rows being null. The cases where I could confirm this happens is when the query planner wraps the JoinNode in Project or Filter or both. The logic that is supposed to skip null rows for update fails to skip when the node is not directly a JoinNode. I don't know if there are other node types that should be handled the same way in the toJoinNode function I added. This fix resolved the issues I was having.

Side note:
Another related issue I ran into while working on the tests for this was the same type of error happening when the join table is backed by IndexedTableAccess. It seems to set the root join filter to true, with a nested node that has the actual filter. The true filter causes the update skip logic to allow updates on every attempted join, even when null. This was happening for a bit but it stopped while I was messing with the test data so I couldn't keep diagnosing it.

@timsehn
Copy link
Contributor

timsehn commented Dec 20, 2023

@max-hoffman will look at this today.

@max-hoffman
Copy link
Contributor

@BarakatX2 I've spent a few hours looking into this to understand how update joins work and I'm seeing a variety of problems with the underlying code, not caused your PR:

  • Filters aren't pushed to the correct leaf relations for UPDATE_JOIN, producing incorrect result sets (we can't do an IS NULL filter on a join RHS after a left join)
  • I'm not sure we have great projections, having, and aggregations treatment for update joins either
  • It seems like we execute the join twice (after filters, etc) I think as an attempt to true-up invalid result sets. You noticed that this is a problem for join types where we eliminate the filter b/c it is often redundant
  • The selectivity edit you add works for 2 tables but won't work for 3, I don't think we support 3+ table update joins yet though

I think this PR does fix some queries, we could probably expand the whitelist to Limit/Offset/Sort/Distinct/Having/Grouping/Window if we wanted to be safe.

I'm more worried about all of the related problems with how update joins are executed though. This would be a good one to be on @jycor 's radar just because of how many problems I'm seeing.

I'm going to get someone else to take a look, see if I am missing anything. Otherwise my suggested edits would be 1) expand whitelist, 2) more queries that touch the whitelist, and then 3) probably put the tests in script_queries.go because of the specificity of the setup to the test queries.

}
}

func toJoinNode(node sql.Node) *plan.JoinNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use switch n := node.(type) instead of having to recast every time
Example:

func toJoinNode(node sql.Node) *plan.JoinNode {
	switch n := node.(type) {
	case *plan.JoinNode:
		return n
	case *plan.Filter:
		return toJoinNode(n.Child)
	case *plan.Project:
		return toJoinNode(n.Child)
	default:
		return nil
	}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this is what I've changed it to with all the other node types mentioned above added. I will hopefully be able to push the updates sometime this week.

* An update query contains a left join where only some rows join
* The update query contains a where clause on the join table and/or joins a table with default values
@BarakatX2 BarakatX2 force-pushed the fix-partial-left-join-update branch from 45f8981 to e7e0a1a Compare January 19, 2024 04:41
@BarakatX2
Copy link
Contributor Author

I've pushed the updated test and the additional node traversal. Of the nodes handled by toJoinNode, I was only able to make the test hit:

  • TopN from the order by with Limit
  • Sort from order by without Limit
  • Project from the aliasing
  • Limit

I was able to hit plan.Filter before, but couldn't figure it out after making the new test. I did run into the IndexedTableAccess problem again while trying to figure that out, so I added the fix for that. If anyone has suggestions for changing the query or adding more queries to test the other node types, please let me know.

@max-hoffman max-hoffman merged commit 9424ebd into dolthub:main Jan 23, 2024
@max-hoffman
Copy link
Contributor

Thank you for taking the time to test and contribute to GMS @BarakatX2! We appreciate your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants