Fix schema for call to hash.HashOf() in HashLookups#3038
Conversation
|
Can you give more context about what this means? Was there an issue caused by this? Are there regression tests? What goes wrong when the schema is used here? |
hash.HashOf() in HashLookups
enginetest/queries/join_queries.go
Outdated
| }, | ||
| }, | ||
| { | ||
| // Since hash.HashOf takes in a sql.Schema to convert and hash keys, |
There was a problem hiding this comment.
This comment is confusing without context. This is a regression test but it doesn't describe what issue was being fixes.
There was a problem hiding this comment.
The issue is that we were passing in the wrong schema; now, we're passing in the right one.
sql/plan/hash_lookup.go
Outdated
| var sch sql.Schema | ||
| if tup, isTup := e.(*expression.Tuple); isTup { | ||
| for _, expr := range tup.Children() { | ||
| sch = append(sch, &sql.Column{Type: expr.Type()}) |
There was a problem hiding this comment.
I'm worried about the performance implications of creating a schema every time we call GetHashKey. And it's not clear what was wrong with using the HashLookup's precomputed schema: does the value of row have a different schema based on which side of the join we're computing the hash for?
There was a problem hiding this comment.
I can pass in nil for schema instead or just store it once we create the key.
HashLookup's "precomputed" schema is the schema of the join not the key.
There was a problem hiding this comment.
We shouldn't pass in nil because we need to convert the keys to the same type before computing the hash. Otherwise if the tables being joined have different-but-convertible types for the join column, we might not detect that two rows are equal because they'll have different hashes.
It looks like in the proposed change, we're passing in a nil schema if the expression isn't a tuple. That's probably incorrect. I'm surprised that it's not breaking any tests.
We should precompute the schema of the key column(s) and store it in the HashLookup. We should also add a test where the tables being joined have different types that would produce different hashes, to make sure we handle that correctly.
enginetest/queries/join_queries.go
Outdated
| { | ||
| // Since hash.HashOf takes in a sql.Schema to convert and hash keys, | ||
| // we need to pass in the right keys. | ||
| Name: "HashLookups regression test", |
There was a problem hiding this comment.
Can we split this into two different tests?
The incorrect schema was used in the
hash.HashOf()function.n.Schema()is the schema of the entire JoinNode; we just needed the schema of the key.'Test bump: https://github.com/dolthub/ld/pull/20634