perf: keyword lookups in the tokenizer #7606
Conversation
Signed-off-by: Vicent Marti <vmg@strn.cat>
I'm loving this.
go/vt/sqlparser/keywords_test.go
    if !ok {
        t.Fatalf("keyword %q failed to match", kw.name)
    }
    if lookup != kw.id {
        t.Fatalf("keyword %q matched to %d (expected %d)", kw.name, lookup, kw.id)
    }
nit: use `require` over `t.Fatal`
go/vt/sqlparser/parse_test.go
    if err != nil {
        t.Errorf(" Error: %v", err)
        t.Fatal(err)
    }
nit: use require.NoError(t, err)
    if err != nil {
        t.Error(scanner.Text())
        t.Errorf(" Error: %v", err)
        t.Errorf("failed to parse %q: %v", query, err)
    }
nit: use require.NoError(t, err)
    if err != nil {
        b.Fatal(err)
    }
nit: use require.NoError(t, err)
    if err != nil {
        b.Fatal(err)
    }
    if err != nil {
        b.Fatal(err)
    }
I love how entertaining AND descriptive your PRs are @vmg :)
Signed-off-by: Vicent Marti <vmg@strn.cat>
@harshit-gangal: I've added Ready to merge. 👌
What about quicktest? Does it add overhead?
@deepthi I haven't tested it, I just noticed the
Description
Happy Thursday everyone, this week we're bringing `sqlparser` performance improvements. I had a chance to sit with @frouioui and look at some of the profiles that we're now acquiring from his Are We Fast Yet (TM) work. There was nothing glaringly obvious that would provide massive optimization gains (as one would expect at this point, Vitess is quite optimized already), but for the normal request lifecycle, all of the `sqlparser` operations on the AST are always quite hot and, most importantly for our goals, CPU-bound.

Let's start squeezing some blood out of this stone: this particular PR comes from allocation benchmarks. The code in our SQL tokenizer that processes SQL keywords allocates so much memory that it shows up as a hotspot in a CPU profiler, and very clearly as an allocation hotspot in the memory profiler.
Why does it do all these allocations? Well, right now it copies the current token it's processing into a temporary buffer (this is the buffer that gets returned to the caller), and then it makes yet another copy of that buffer to lowercase it so it can be looked up in our keywords table (people following along at home will surely remember that SQL keywords are case insensitive).
Let's improve this with some very classical Compiler Theory approaches: instead of using a hash table to look up keywords, use a perfect hash table (a perfect hash table is a minimal hash table where lookups cannot collide -- it is measurably faster than a normal hash table, even the one built into the Go runtime). And since we now have a perfect hash table, we control the hashing algorithm used for lookups... so we can switch the algorithm to perform the hashing case-insensitively. This makes it so we don't have to create lowercase copies of all keywords!
Results are :gucci: in the most realistic parse benchmarks. The pathological benchmarks do not regress.
Related Issue(s)
Checklist
Deployment Notes
Impacted Areas in Vitess
Components that this PR will affect: