Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexes on virtual generated columns generate incorrect results. #8276

Closed
nicktobey opened this issue Aug 15, 2024 · 3 comments · Fixed by dolthub/go-mysql-server#2641
Closed
Labels
bug Something isn't working correctness We don't return the same result as MySQL sql Issue with SQL

Comments

@nicktobey
Copy link
Contributor

main*> create table t1 (a int primary key, b int as (a + 1) virtual);
main*> insert into t1(a) values (1), (2);
Query OK, 2 rows affected (0.00 sec)
main*> select * from t1;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
| 2 | 3 |
+---+---+
2 rows in set (0.00 sec)

main*> select * from t1 where b = 2 order by a;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row in set (0.00 sec)

main*> create index i1 on t1(b);
main*> select * from t1 where b = 2 order by a;
Empty set (0.00 sec)

main*> describe select * from t1 where b = 2 order by a;
+-----------------------------+
| plan                        |
+-----------------------------+
| Sort(t1.a ASC)              |
|  └─ IndexedTableAccess(t1)  |
|      ├─ index: [t1.b]       |
|      └─ filters: [{[2, 2]}] |
+-----------------------------+
4 rows in set (0.00 sec)

After adding the index, the same select query returns a different, incorrect result.

@nicktobey
Copy link
Contributor Author

This only happens on Dolt. Running GMS with its in-memory storage returns the correct result.

@timsehn timsehn added sql Issue with SQL correctness We don't return the same result as MySQL bug Something isn't working labels Aug 20, 2024
@nicktobey
Copy link
Contributor Author

Inspecting the secondary index, it appears to be incorrect.

In this specific example, the secondary index looks like this:

SerialMessage {

    { key: , 01000000 value:  }
    { key: , 02000000 value:  }
}

The first key column is b which appears to have been given a NULL value.

Inserting new rows and updating existing rows causes the correct values to be inserted into the index. It's only creating the index that gives the wrong result.

What's interesting is that virtual columns in the table can interfere with the creation of indexes on non-virtual columns:

create table t6(pk int primary key, b int generated always as (pk+1), c int);
insert into t6(pk, c) values (1, 10), (2, 20);
create index c6 on t6(c);

Results in c6 looking like this:

SerialMessage {

    { key: , 01000000 value:  }
    { key: , 02000000 value:  }
}

However, adding an index on b appears to fix it.

My theory to what's going on here is that since virtual columns don't appear in the primary index, they create an off-by-one error when creating a secondary index. For instance, in the above example, c is the third column in the table, so when creating a secondary index, we look for the third column in the primary... but there are only two columns in the primary, so we read past the end of the tuple, which returns NULL.

@nicktobey
Copy link
Contributor Author

There's a couple things going on here.

  1. As seen in the function indexCreateRequiresBuild in ddl_iters.go, creating an index on a virtual column always results in a table rebuild. This feels unnecessary.
func indexCreateRequiresBuild(n *plan.AlterIndex) bool {
	return n.Constraint == sql.IndexConstraint_Unique || indexOnVirtualColumn(n.Columns, n.TargetSchema())
}
  1. According to the docstring for IndexBuildingTable, tables which support indexes on virtual columns must implement this interface. AlterableDoltTable does not, which causes a different code path for rebuilding the table.
// IndexBuildingTable is an optional extension to IndexAlterableTable that supports the engine's assistance in building
// a newly created index, or rebuilding an existing one. This interface is non-optional for tables that wish to create
// indexes on virtual columns, as the engine must provide a value for these columns.
  1. The code path that gets used to rebuild the table ends up taking an Inserter and a RowIter from the underlying AlterableDoltTable. This seems like it should work, except that the RowIter doesn't take into account any projections from virtual columns. It's unclear if it's intended to or not. This could be a bug, or it could be intended and changing it would break other things.

  2. This would have been caught sooner, except that prollyRowIter, the struct for iterating over the rows in a prolly tree, appears to have it's rowLen field incorrectly set, matching the total number of projections in the table schema (which in the case of tables with virtual columns, is greater than the actual number of rows in the physical table.) This causes the extra rows to be seen as nil.

(2) seems like vestigal code and an outdated docstring, the IndexBuildingTable interface isn't actually used in Dolt.

Mostly likely what's going on here is that (3) wasn't written with virtual columns in mind. There's a mismatch going on here where the RowIter is returning the underlying physical rows but the Inserter expects the projected rows, including virtual columns. This should be a mismatch between the row lengths, but (4) results in the missing columns being treated as nil instead.

We could likely get around this specific issue by not rebuilding the table, but this could still result in incorrect behavior is the table gets rebuilt for other reasons.

But we shouldn't ignore (1) either, because needlessly rebuilding the table is slow and expensive. We should fix both issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctness We don't return the same result as MySQL sql Issue with SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants