-
-
Notifications
You must be signed in to change notification settings - Fork 515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[kvexec] merge join #8561
base: main
Are you sure you want to change the base?
[kvexec] merge join #8561
Conversation
#benchmark |
@max-hoffman DOLT
|
@coffeegoddd DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@coffeegoddd DOLT
|
#benchmark |
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using opposite logic is more readable:
https://github.com/dolthub/dolt/compare/max/kv-merge-join...james/refactor?expand=1
While goto
s aren't the best, I think it's still pretty understandable, so LGTM
@max-hoffman DOLT
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not as bad as you built it up to be, generally not too hard to understand.
I think readability would be improved by making the loops explicit, keeping goto statements for true jumps rather than "go to beginning of this loop"
if ita, ok := getIta(n.Right()); ok && len(r) == 0 && simpleLookupExpressions(ita.Expressions()) { | ||
if _, _, dstIter, _, dstTags, dstFilter, err := getSourceKv(ctx, n.Right(), false); err == nil && dstIter != nil { | ||
if srcMap, srcIter, _, srcSchema, srcTags, srcFilter, err := getSourceKv(ctx, n.Left(), true); err == nil && srcSchema != nil { | ||
if _, _, _, dstIter, _, _, dstTags, dstFilter, err := getSourceKv(ctx, n.Right(), false); err == nil && dstIter != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One easy readability win here would be to return a struct or an interface from this method. 3 return values is usually fine, 4 is pushing it, this is front page of reddit level bad
@@ -263,74 +305,79 @@ func getPhysicalColCount(schemas []schema.Schema, splits []int, projections []ui | |||
// getSourceKv extracts prolly table and index specific structures needed | |||
// to implement a lookup join. We return either |srcIter| or |dstIter| | |||
// depending on whether |isSrc| is true. | |||
func getSourceKv(ctx *sql.Context, n sql.Node, isSrc bool) (prolly.Map, prolly.MapIter, index.SecondaryLookupIterGen, schema.Schema, []uint64, sql.Expression, error) { | |||
func getSourceKv(ctx *sql.Context, n sql.Node, isSrc bool) (prolly.Map, prolly.Map, prolly.MapIter, index.SecondaryLookupIterGen, schema.Schema, schema.Schema, []uint64, sql.Expression, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Def model this after getMergeKv
Probably want to return a pointer to a struct and an err
|
||
func getMergeKv(ctx *sql.Context, n sql.Node) (mergeState, error) { | ||
ms := mergeState{} | ||
//var secMap prolly.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to remove these unused vars
type coveringNormalizer func(val.Tuple) (val.Tuple, val.Tuple, error) | ||
|
||
type mergeState struct { | ||
idxMap prolly.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth giving these fields comments, probably the type as well
return ms, err | ||
} | ||
|
||
priMap := durable.ProllyMapFromIndex(priIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming there's an LD check somewhere above this that will prevent this breaking on old repos?
I'm fine breaking old repos but it shouldn't panic
return candidate, true, nil | ||
} | ||
|
||
func (l *mergeJoinKvIter) buildCandidate(ctx *sql.Context, leftKey, leftVal, rightKey, rightVal val.Tuple) (sql.Row, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably just buildRow?
goto match | ||
} | ||
|
||
incr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just call this compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only exits from this block are to the next block (matchBuf
) or to exhaustLeft
, otherwise it loops.
exhaustLeft
is terminal and is better expressed as a method (see below), so this whole block would be better expressed as a for {
loop with a break
statement when it's done
goto matchBuf | ||
} | ||
|
||
match: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here: most of the goto
statements are just loop statements, and of the 2 that aren't one is to the terminal exhaustLeft
. Write as a loop, makes the intent clearer
goto exhaustLeft | ||
} | ||
|
||
func (l *mergeJoinKvIter) tryReturn(ctx *sql.Context, leftKey, leftVal, rightKey, rightVal val.Tuple) (sql.Row, bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably call this buildResultRow
, add a comment about the boolean return value
|
||
func schemaIsCovering(sch schema.Schema, projections []uint64) bool { | ||
cols := sch.GetAllCols() | ||
if len(projections) > cols.Size() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this about? Could use a comment
This isn't the best perf win on linux, but it counteracts the
sql.Row
interface PR which otherwise would swing merge join +30% in the wrong direction.TODO: