Merged. daeho-ro (Contributor) approved these changes on Mar 3, 2026.
release notes
`table import -u` to skip missing table columns and match CSV warning behavior. Fix dolthub/dolt#10589.
`dolt table import -u` no longer fails immediately when the destination table has additional columns missing from the file.
Cache remote DoltDB instances across pushes, use parented commits with bounded depth for incremental git deltas, write table files as single blobs instead of split .records/.tail intermediates, and run periodic git gc to repack cache repos.
When `StatementBegin` encounters an error (e.g., table not found in the staging root), it stores the error in `wtu.err` but leaves `tableWriter` as nil. The `Update` and `Delete` methods were dereferencing `tableWriter` before checking whether it was nil, causing a panic.
This fix adds an early return to check for errors from `StatementBegin` before attempting to use `tableWriter`, preventing the nil pointer dereference.
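The guard pattern can be sketched as follows; the types here are simplified stand-ins for the real writer types in go-mysql-server, not the actual implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// tableWriter is a stand-in for the engine's table writer interface.
type tableWriter interface {
	Update(old, new []any) error
}

// writeTableUpdater mirrors the shape described above: StatementBegin
// may fail, recording the error and leaving the writer nil.
type writeTableUpdater struct {
	writer tableWriter
	err    error
}

func (wtu *writeTableUpdater) StatementBegin() {
	// Simulate "table not found in staging root".
	wtu.err = errors.New("table not found in staging root")
	wtu.writer = nil
}

// Update checks the stored error before touching the writer. Without
// the early return, wtu.writer.Update would panic on a nil writer.
func (wtu *writeTableUpdater) Update(old, new []any) error {
	if wtu.err != nil {
		return wtu.err
	}
	return wtu.writer.Update(old, new)
}

func main() {
	wtu := &writeTableUpdater{}
	wtu.StatementBegin()
	fmt.Println(wtu.Update(nil, nil)) // stored error, not a panic
}
```

The same early-return check applies symmetrically to `Delete`.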
go-mysql-server
Keep `plan.Limit` for doltgres `generate_series`.
PR TopN optimizations dolthub/go-mysql-server#3432 replaced `plan.Limit` nodes with `plan.TopN`, because they are unnecessary for anything in dolt and gms. However, because of the way `generate_series` works with `LIMIT`, we actually need to keep the `plan.Limit` at the top. This should have minimal impact on performance.
/guard on new databases to Dolt, and adds `sql.ErrWrongDBName` and `sql.SQLError` to automatically add other error metadata. Parent: dolthub/dolt#10431
`greatest` and `least` functions. Fixes dolthub/dolt#10562
Summary
Add `sql.NumberType` for system numeric system-variable types: `SystemBoolType`, `systemIntType`, `systemUintType`, `systemDoubleType`. `sql.IsNumberType` recognizes system numeric types (`AlignBuckets` path).
Testing: `go test ./sql/types ./sql/stats`, `go test ./sql/... -run TestDoesNotExist -tags=gms_pure_go`
Related
Allow `ALTER TABLE DROP CONSTRAINT` to remove unique indexes. When adding a unique constraint to a table, it wasn't possible to remove it using `ALTER TABLE DROP CONSTRAINT`. This change allows unique indexes to be removed with this syntax. Related to: Fix ALTER TABLE ... DROP CONSTRAINT for UNIQUE constraints dolthub/doltgresql#2359
`sql.StringType` interface for `systemStringType`. Fixes dolthub/dolt#10534
part of dolthub/dolt#10535
fixes dolthub/dolt#10527
Using the index of the first found `.` rune as an offset to get an index column name was causing column names to not be correctly matched when the table name contained periods. Using the length of the table name as an offset avoids this issue and is also slightly more performant.
In a join node, the `scopeLen` variable is the total number of columns that represent values from outer scopes. Previously, we set this value equal to the number of columns in the immediate parent scope. When there are multiple nested scopes, this value is incorrect.
This PR adds an example of when this matters: a non-lateral join in multiple nested scopes would return an incorrect number of columns. A projection above this join would then index into this returned row, resulting in an out-of-bounds error.
`TopN` optimizations. Changes:
-- Replace `LIMIT(PROJECT(SORT(...)))` with `PROJECT(TOPN(...))`.
-- For `TopN(Limit = 1)`, use `topRowIter` to avoid interface conversions and extra mallocs.
Benchmarks: TopN test dolthub/dolt#10522 (comment)
Vector indexes require NOT NULL columns, similar to spatial indexes. Previously, creating a vector index on a nullable column was silently accepted, which could cause runtime panics if any of the elements were NULL.
This adds validation in both the analyzer (CREATE TABLE) and DDL execution (CREATE INDEX / ALTER TABLE) paths, returning a clear error message.
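The validation amounts to a nullability check over the indexed columns. A minimal sketch, with `column` and `validateVectorIndexColumns` as hypothetical stand-ins for the engine's real schema types:

```go
package main

import "fmt"

// column is a minimal stand-in for the engine's column metadata.
type column struct {
	Name     string
	Nullable bool
}

// validateVectorIndexColumns mirrors the validation described above:
// every column participating in a vector index must be NOT NULL,
// since NULL elements could otherwise panic at runtime.
func validateVectorIndexColumns(cols []column) error {
	for _, c := range cols {
		if c.Nullable {
			return fmt.Errorf("vector index column %q must be declared NOT NULL", c.Name)
		}
	}
	return nil
}

func main() {
	bad := []column{{Name: "embedding", Nullable: true}}
	good := []column{{Name: "embedding", Nullable: false}}
	fmt.Println(validateVectorIndexColumns(bad))  // clear error, not a panic
	fmt.Println(validateVectorIndexColumns(good)) // <nil>
}
```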
There were several problems with the previous join iterator implementation:
The behavior with subqueries is the main motivation for this PR.
Previously, in order to expose values from outer scopes to iterators, we would dynamically inject PrependNodes into subquery build plans. These nodes would insert values into the beginning of returned rows, allowing parent iterators to read them and use them in expressions. To compensate for this, we would also inject StripRowNodes into joins. StripRowNodes are the opposite of PrependNodes, removing columns from their iterators.
This logic was incredibly difficult to reason about correctly:
-- Lateral joins would re-include all the values from the outermost scope in the next scope, effectively doubling the number of columns with each nesting level.
-- StripRowNodes would only be generated based on the innermost scope, resulting in some injected values not being removed.
-- StripRowNodes would be generated under each join node, including between join nodes in a multi-table join. Join nodes would thus need to re-insert these values in order to compensate... but were expected to not re-insert columns corresponding to values defined by a parent join node, except for lateral joins... reasoning about this correctly quickly becomes untenable.
Ultimately, there's no reason why join nodes can't handle this directly. And removing the StripRowNodes type and replacing it with logic in the join iterators actually makes the logic much more consistent: Parent iterators should assume that all child iterators contain prepended values for outer scopes, and values determined by the node's schema, and nothing else. And the parent iterator returns rows that also have this property.
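That invariant can be illustrated with a small sketch; `combineJoinRows` is a hypothetical helper, not the actual join iterator code:

```go
package main

import "fmt"

// combineJoinRows illustrates the invariant described above: every
// child iterator's rows begin with the same outer-scope prefix of
// length scopeLen, followed by that node's own schema columns. The
// joined row is the full left row plus the right row minus its
// duplicated prefix -- no PrependNode/StripRowNode bookkeeping needed.
func combineJoinRows(scopeLen int, leftRow, rightRow []any) []any {
	row := make([]any, 0, len(leftRow)+len(rightRow)-scopeLen)
	row = append(row, leftRow...)
	row = append(row, rightRow[scopeLen:]...)
	return row
}

func main() {
	left := []any{"outer1", "l1", "l2"} // outer prefix + left schema
	right := []any{"outer1", "r1"}      // same prefix + right schema
	fmt.Println(combineJoinRows(1, left, right)) // [outer1 l1 l2 r1]
}
```

The result itself has the same shape (outer prefix, then the join's schema), so parent iterators can rely on the property recursively.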
Wrapping subquery nodes in a `Limit` node before checking if it was a `SubqueryAlias` was causing us to not find `SubqueryAlias` nodes. This was noticed when investigating dolthub/dolt#10472. We should probably be inspecting the whole node instead of simply checking the node type, but this is a quick fix. Filed dolthub/dolt#10494 to add better test coverage and address TODOs.
fixes dolthub/dolt#10472
All RecursiveCTEs are wrapped by a SubqueryAlias node. We currently don't push filters through a RecursiveCTE so pushing the filter below the SubqueryAlias just causes it to be awkwardly sandwiched in between and might make some queries less optimal if the filter contains outerscope columns since the SQA gets marked as uncacheable. We probably can push filters through a RecursiveCTE in some cases, but we should spend more time thinking about what that means (see dolthub/dolt#10490).
Also contains some refactors that I noticed along the way.
skipped tests will be fixed in #3427
related to dolthub/dolt#10472
Scope length should only be set when assigning indexes if the scope length is not zero. The scope length set during join planning doesn't actually refer to the correct scope length, and if it was actually supposed to be zero, it was never correctly set back to zero, causing a panic.
fixes dolthub/dolt#10435
Disables merge and range heap joins between text and number type columns
`ColumnId`s for `EmptyTable`. Fixes dolthub/dolt#10434.
`EmptyTable` implements `TableIdNode`, so it was using `Columns()` to get the `ColumnId`s. `EmptyTable.WithColumns()` is only ever called for testing purposes; as a result, the `ColSet` returned is empty. This causes the column-to-`ColumnId` mapping to be incorrectly offset, leading to the wrong index id being assigned.
This fix adds a case for `EmptyTable` in `columnIdsForNode` to add placeholder `ColumnId` values so the mappings are correctly aligned. I considered setting the actual `ColSet` for `EmptyTable` but there's actually not a good way to do that. Regardless, the index id will be set either using the name of the column or using the Projector node that wraps the `EmptyTable`.
Similar to `SetOp`, `EmptyTable` probably shouldn't be a `TableIdNode` (see dolthub/dolt#10443). Fixes dolthub/dolt#10304
Despite what the comment said, it's not safe to join remaining tables with CrossJoins during `buildSingleLookupPlan`. It is only safe to do so if every filter has been successfully matched to `currentlyJoinedTables`. Otherwise, we end up dropping filters.
For example, we could have a query like `select * from A, B inner join C on B.c0 <=> C.c0`, where table A has a primary key and tables B and C are keyless. `columnKey` matches A's primary key column, and A would be added to `currentlyJoinedTables`. Since the only filter references B and C, and neither is part of `currentlyJoinedTables`, nothing is ever added to `joinCandidates`. However, it's unsafe to join all the tables with CrossJoins because we still need to account for the filter on B and C.
Fixes dolthub/dolt#10284
part of dolthub/dolt#10340
benchmarks
Replace `Time.Sub` call in `TIMESTAMPDIFF` with `microsecondsDiff`. Fixes dolthub/dolt#10397.
`Time.Sub` doesn't work for times with a difference greater than 9,223,372,036,854,775,807 (2^63 - 1) nanoseconds, or ~292.47 years. This is because `Time.Sub` returns a `Duration`, which is really an `int64` representing nanoseconds. MySQL only stores time precision to the microsecond, so we actually don't care about the difference in nanoseconds.
However, there's no easy way to directly expose the number of microseconds or seconds since epoch using the public functions for `Time` -- this is because seconds since epoch are encoded differently with different epochs depending on whether the time is monotonic or not (Jan 1, 1885 UTC or Jan 1, 0001 UTC). `Time.Sub` uses `Time.sec` to normalize `Time` objects to seconds since the Jan 1, 0001 UTC epoch. But `Time.sec` isn't public, so we can't call it ourselves. And `Time.Second` and `Time.Nanosecond` only give the second and nanosecond portion of a wall time, not the seconds/nanoseconds since an epoch. However, `Time.UnixMicro` does give us the microseconds since the Unix epoch (January 1, 1970 UTC)...by calling `Time.sec` and then converting that to Unix time.
So `microsecondsDiff` calculates the difference in microseconds between two `Time` objects, getting their microsecond values by calling `Time.UnixMicro` on both of them. This isn't the most efficient but it's the best we can do with public functions.
`month` and `quarter` for timestampdiff based on date and clock time values. Fixes dolthub/dolt#10393
Refactors logic for `year` into a separate function `monthDiff` so that it can be reused for `month` and `quarter`.
This adds a `sql.StatementRunner` to all hooks, as they may want to execute logic depending on the hook statement. For example, cascade table deletions could literally just run `DROP` on the cascading items when dropping a table rather than trying to manually craft nodes which are subject to change over time.
`year` for `timestampdiff` based on date and clock time values. Fixes dolthub/dolt#10390
The previous calculation incorrectly assumed every year has 365 days (doesn't account for leap years) and every month has 30 days (doesn't account for over half the months). The offset from this wasn't noticeable with smaller time differences but became apparent with larger ones. This still needs to be fixed for `month` and `quarter` (#10393). Fixes dolthub/dolt#10385
3404: Bump google.golang.org/protobuf from 1.28.1 to 1.33.0

Bumps google.golang.org/protobuf from 1.28.1 to 1.33.0.
The analyzer tries to improve query plans by moving filter expressions to be directly above the table that they reference. Previously, this had a limitation in that it would treat references to aliases in Project nodes as opaque: if the alias expression references a table, the analysis wouldn't consider the filter to reference that table.
As a result, it wasn't possible to push a filter into multiple subqueries unless a previous optimization eliminated the Project node.
This PR enhances the analysis with the following steps:
Reasoning about the safety is tricky here. We should replace a GetField with the underlying expression if and only if we're actually moving the Filter beneath the Project.
The main concerns would be:
In practice I don't think the first concern can happen because it would require that the filter is getting pushed to some nameable-but-not-opaque node between the Projects, which I don't think exists.
The second concern requires that the project aliases an expression that doesn't reference any tables but can't be replaced with its underlying expression in a filter, and I don't think that's possible either.
Makes it so `create index` queries with an expression argument will produce a warning instead of an error.
Avoid converting `float64` to `float64`. We save on conversion costs by avoiding a call to convert a `float64` to a `float64`. Unfortunately this has little to no impact on any of our benchmarks because `groupbyIter` uses concurrency. Benchmarks: group by sum test dolthub/dolt#10359 (comment)
Reduce `strings.ToLower` calls in `gatherTableAlias`. Benchmarks: tolower bump dolthub/dolt#10355 (comment)
Add a `transform.InspectWithOpaque` function.
This changes `transform.Inspect` to not apply the helper function on children of `sql.OpaqueNode`s. Additionally, it adds a `transform.InspectWithOpaque` that does. This is to match `transform.Node` and `transform.NodeWithOpaque`. There are still some inconsistencies between the different transform helper functions:
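The two traversal behaviors can be sketched with a toy node type. The real `transform` package operates on `sql.Node`; `node`, `inspect`, and `inspectWithOpaque` here are illustrative stand-ins:

```go
package main

import "fmt"

// node stands in for sql.Node; opaque marks nodes (like subquery
// aliases) whose children plain Inspect should not descend into.
type node struct {
	name     string
	opaque   bool
	children []*node
}

// inspect runs f on n but, mirroring the new transform.Inspect
// behavior, does not descend into the children of opaque nodes.
func inspect(n *node, f func(*node)) {
	f(n)
	if n.opaque {
		return
	}
	for _, c := range n.children {
		inspect(c, f)
	}
}

// inspectWithOpaque mirrors transform.InspectWithOpaque: it descends
// through opaque nodes as well.
func inspectWithOpaque(n *node, f func(*node)) {
	f(n)
	for _, c := range n.children {
		inspectWithOpaque(c, f)
	}
}

func main() {
	tree := &node{name: "project", children: []*node{
		{name: "subqueryAlias", opaque: true, children: []*node{{name: "innerScan"}}},
	}}
	shallow, deep := 0, 0
	inspect(tree, func(*node) { shallow++ })
	inspectWithOpaque(tree, func(*node) { deep++ })
	fmt.Println(shallow, deep) // 2 3: only the -WithOpaque walk sees innerScan
}
```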
Remove `convertLeftAndRight`. This PR removes `c.convertLeftAndRight`, which avoids calls to `c.Left().Type()` and `c.Right().Type()`. Not entirely sure why receiver methods would impact performance this much, but benchmarks say so.
Benchmarks: test revert compare dolthub/dolt#10342 (comment)
part of dolthub/dolt#10284
part of dolthub/dolt#10335
We attempt to push filter expressions deeper into the tree so that they can reduce the size of intermediate iterators. Ideally, we want to push filters to directly above the data source that they reference.
Previously, we only pushed filters if they only referenced a single table, since pushing a filter that referenced multiple tables could potentially move the filter to a location where one of the referenced tables is no longer in scope. However, if the extra table references refer to a table in an outer scope or lateral scope, pushing the filter is completely safe. GetFields that reference an outer or lateral scope can be effectively treated as literals for the purpose of this optimization.
This PR changes `getFiltersByTable`, a function that maps tables onto the filters that reference those tables. Previously it would ignore filters that reference multiple tables. Now, it allows those filters provided that the extra references are to outer/lateral scopes.
This improves many of the plan tests:
A small number of neutral / slightly negative changes:
Previously, when building scopes for subqueries that appear inside joins, we would only track a single parent join node. If the subquery had multiple join parents, we would only be able to resolve references to the innermost subquery. This inhibits the optimizations we can perform.
This PR uses a custom tree walker to track a list of parent join nodes, and includes an example of a query that was not previously possible to optimize.
The previous implementation of the `applyIndexesFromOuterScopes` optimization uses the `getTablesByName` function to map table name references onto tables. But this function only locates ResolvedTables, and other optimizations rely on this behavior.
I've split `getTablesByName` into two different functions: `getResolvedTablesByName`, which has the original behavior, and `getNamedChildren`, which takes a parent node and identifies all nameable nodes in its children, including both ResolvedTables and TableAliases.
Fixes dolthub/dolt#10311
branched from #3381
Enums of different types join based on their string value, not their underlying int value. Ensures join correctness for enums of different types by doing the following:
part of dolthub/dolt#10311
This change allows avoiding unnecessary type conversions when filtering by index and also simplifies some range filter expressions.
This causes foreign keys, particularly enums and sets, to be compared based on their internal values, instead of based on the generalized converted type.
This PR enhances the "applyIndexesFromOuterScope" analysis pass to transform filters on tablescans into indexed table lookups, when the table's column is being compared with a column visible from a lateral join.
An example of a query that can be optimized with this change:
`select x, u from xy, lateral (select * from uv where y = u) uv;`
Users don't often write lateral joins, but the engine can transform `WHERE EXISTS` expressions into lateral joins, and this lets us optimize those too. This is a partial fix for dolthub/dolt#10297
When parsing a subquery alias, we create a new column id for each column in the SQA schema. The scope mapping is a dictionary on the SQA node that maps those column ids onto the expressions within the subquery that determine their values, and is used in some optimizations. For example, in order to push a filter into a subquery, we need to use the scope mapping to replace any GetFields that were pointing to the SQA with the expressions those fields map to. If for whatever reason the SQA doesn't have a scope mapping, we can't perform that optimization.
We parse views by recursively calling the parser on the view definition. This works but it means that the original parser doesn't have any references to the expressions inside the view, which prevents us from creating the scope mapping.
This PR attempts to fix this. Instead of defining the SQA columns in the original parser (where we no longer have access to the view's scope), we now create the columns while parsing the view, and attach them to the scope object for the view definition. Then we store that scope in a field on the Builder, so that the original parser can copy them into its own scope.
This feels hacky, but was the best way I could think of to generate the scope mappings and ensure they're visible outside the view.
Add an `IsExpr` case to `replaceVariablesInExpr`. Stored procedures containing `IS` expressions that referenced a procedure variable were breaking.
`NATURAL FULL JOIN`. Fixes dolthub/dolt#10268
Part of dolthub/dolt#10295
Relies on dolthub/vitess#447 and adds test
Wrap `errgroup.Group.Go()` calls for consistent panic recovery. Each spawned goroutine needs a panic recovery handler to prevent an unexpected panic from crashing the entire Go process. This change introduces a helper function that wraps goroutine creation through `errgroup.Group` and installs a panic recovery handler, and updates existing code to use it. Related PRs:
`LoadOnly` context. Fixes dolthub/dolt#10287
fixes dolthub/dolt#10288
This PR adds a `LoadOnly` flag to a trigger context so that the body of a CreateTrigger statement is not unnecessarily parsed. This avoids parsing the trigger body every time an `insert`, `update`, or `delete` is called and only parses it when the trigger is actually relevant to the event. This also avoids parsing the trigger body when `show trigger` is called.
This PR also rearranges some things in `buildCreateTrigger` to be more performant.
`drop view`. Fixes dolthub/dolt#10201
`abs`. Fixes dolthub/dolt#10171
fixes dolthub/dolt#10270
Add a case for bool types and a default case that tries to convert the value to Float64.
Spawning a new goroutine prevented Doltgres' panic handler from catching the panic. This was discovered by Doltgres regression tests crashing from the unhandled panic.
fixes dolthub/dolt#10258
An empty right iterator in an exists iterator should be treated the same as an EOF. Previously, we were treating an empty right iterator as if it would be returning a single nil row, but this is wrong. An empty right iterator would imply an empty set, which is not the same as a single nil row.
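The corrected semantics can be sketched with a toy iterator; `sliceIter` and `existsMatch` are illustrative stand-ins for the real `sql.RowIter` machinery:

```go
package main

import (
	"fmt"
	"io"
)

// sliceIter is a minimal row iterator; an exhausted iterator returns
// io.EOF, the convention sql.RowIter uses.
type sliceIter struct {
	rows [][]any
	pos  int
}

func (s *sliceIter) Next() ([]any, error) {
	if s.pos >= len(s.rows) {
		return nil, io.EOF
	}
	r := s.rows[s.pos]
	s.pos++
	return r, nil
}

// existsMatch sketches the fix described above: an empty right
// iterator is an empty set, so EXISTS is false; it must not be
// treated as producing a single nil row.
func existsMatch(right *sliceIter) (bool, error) {
	if _, err := right.Next(); err != nil {
		if err == io.EOF {
			return false, nil // empty set, same as EOF
		}
		return false, err
	}
	return true, nil
}

func main() {
	empty := &sliceIter{}
	ok, _ := existsMatch(empty)
	fmt.Println(ok) // false: empty right side, not a single nil row
}
```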
Fewer `time.Time` library calls. Instead of calling `Year()`, `Month()`, `Day()`, `Hour()`, `Minute()`, and `Second()` individually, just use the `Date()` and `Clock()` methods. Benchmarks: faster date parse bump dolthub/dolt#10249 (comment)
Simplify `AND` and `OR` filters. Fixes dolthub/dolt#10243.
Previously, if one of the children of an `AND` or `OR` expression was a literal that evaluated to `false` or `true` (respectively), we would return the child. However, this caused a typing issue since `AND` and `OR` are booleans while their children can be of any type. Take the expression `19 or 's'` for example; the left child `19` evaluates to `true`, making the expression true every time, but returning the left child `19` as a simplification would be incorrect since we would want the expression to simplify to a boolean `true`, not the number 19.
Now, instead of returning the child that determines the result of an `AND` or `OR` expression, we return the boolean value that the express...
View the full release notes at https://github.com/dolthub/dolt/releases/tag/v1.83.1.