Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle indexes on virtual columns #2641

Merged
merged 7 commits into from
Aug 27, 2024
Merged

Conversation

nicktobey
Copy link
Contributor

Fixes dolthub/dolt#8276

Lots of small behaviors around virtual columns were not working correctly:

  • Adding an index on a virtual column triggered a table rebuild even when this wasn't necessary
  • Rebuilding a table that contained virtual columns could lead to incorrect results
  • Inserting into a table with a virtual column could update indexes incorrectly
  • Adding a generated column to the start of a table could lead to incorrect results

This PR adds tests for these cases and fixes them by tweaking the logic for projections on tables with generated columns.

@nicktobey
Copy link
Contributor Author

Notably, there's a comment in the previous logic for adding columns with default/generated values:

// Alter the new default if it refers to other columns. The column indexes computed during analysis refer to the
// column indexes in the new result schema, which is not what we want here: we want the positions in the old
// (current) schema, since that is what we'll be evaluating when we rewrite the table.

However, it appears that in fact the exact opposite is true: the column indexes computed during analysis of an ADD COLUMN statement refer to the original schema, but rewriting the table requires transforming the expression to use indexes for the new schema.

@nicktobey nicktobey force-pushed the nicktobey/indexed-virtual branch from f74a998 to 0f9211a Compare August 27, 2024 01:35
@nicktobey nicktobey force-pushed the nicktobey/indexed-virtual branch from 0f9211a to 5bea118 Compare August 27, 2024 01:35
@nicktobey nicktobey requested a review from zachmu August 27, 2024 17:23
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fixes

},
},
{
Name: "creating unique index on stored generated column",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test for enforcing uniqueness?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

{
Query: "select * from t1 where b = 2 order by a",
Expected: []sql.Row{{1, float64(2)}},
Skip: true, // https://github.com/dolthub/dolt/issues/8276
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still be skipped?

},
},
{
Name: "creating unique index on virtual generated column",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check uniqueness is enforced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -276,6 +276,14 @@ func (td *TableData) numRows(ctx *sql.Context) (uint64, error) {
// throws an error if any two or more rows share the same |cols| values.
func (td *TableData) errIfDuplicateEntryExist(cols []string, idxName string) error {
columnMapping, err := td.columnIndexes(cols)

// We currently skip validating duplicates on unique virtual columns.
for _, i := range columnMapping {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a skipped test assertion for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and added comments.

Unfortunately the assertion will get skipped for both GMS and Dolt, but I don't think we have a way to just disable it for GMS.

@nicktobey nicktobey merged commit 65d7991 into main Aug 27, 2024
7 of 8 checks passed
@nicktobey nicktobey deleted the nicktobey/indexed-virtual branch August 27, 2024 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Indexes on virtual generated columns generate incorrect results.
2 participants