Conversation

@nastra
Contributor

@nastra nastra commented Aug 18, 2022

fixes #5442

@nastra nastra marked this pull request as draft August 18, 2022 09:40
@github-actions github-actions bot added the core label Aug 18, 2022
// if history is [(t1, s1), (t2, s2), (t3, s3)] and s2 is removed, the history cannot be
// [(t1, s1), (t3, s3)] because it appears that s3 was current during the time between t2
// and t3 when in fact s2 was the current snapshot.
newSnapshotLog.clear();
Contributor

This problem was introduced in #3664. The original code that this comment applies to cleared the log whenever it encountered an entry for a snapshot that had been expired and removed. That is the correct behavior.

The problem is that I combined that behavior with the check to suppress intermediate snapshots in a transaction. We should suppress those snapshots in the log, but we don't need to clear the log. Here's what I think should happen instead:

        if (snapshotsById.containsKey(snapshotId)) {
          if (!intermediateSnapshotIds.contains(snapshotId)) {
            // copy the log entries that are still valid
            newSnapshotLog.add(logEntry);
          }
        } else {
          // any invalid entry causes the history before it to be removed. otherwise, there could be
          // history gaps that cause time-travel queries to produce incorrect results. for example,
          // if history is [(t1, s1), (t2, s2), (t3, s3)] and s2 is removed, the history cannot be
          // [(t1, s1), (t3, s3)] because it appears that s3 was current during the time between t2
          // and t3 when in fact s2 was the current snapshot.
          newSnapshotLog.clear();
        }
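
To make the intended behavior concrete, here is a minimal, self-contained sketch of the loop above. It is not the Iceberg code: `SnapshotLogSketch`, `LogEntry`, `rebuildLog`, and the `liveSnapshotIds` parameter (standing in for the keys of `snapshotsById`) are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical, not Iceberg APIs.
final class SnapshotLogSketch {

  // Stand-in for a history entry: (timestamp, snapshot id).
  static final class LogEntry {
    final long timestampMillis;
    final long snapshotId;

    LogEntry(long timestampMillis, long snapshotId) {
      this.timestampMillis = timestampMillis;
      this.snapshotId = snapshotId;
    }

    @Override
    public String toString() {
      return "(t" + timestampMillis + ", s" + snapshotId + ")";
    }
  }

  // Rebuilds the log: suppress intermediate snapshots, but clear the
  // accumulated history only when an entry's snapshot no longer exists.
  static List<LogEntry> rebuildLog(
      List<LogEntry> oldLog, Set<Long> liveSnapshotIds, Set<Long> intermediateSnapshotIds) {
    List<LogEntry> newLog = new ArrayList<>();
    for (LogEntry entry : oldLog) {
      if (liveSnapshotIds.contains(entry.snapshotId)) {
        if (!intermediateSnapshotIds.contains(entry.snapshotId)) {
          // copy the log entries that are still valid
          newLog.add(entry);
        }
      } else {
        // a removed snapshot invalidates all history before it
        newLog.clear();
      }
    }
    return newLog;
  }

  public static void main(String[] args) {
    List<LogEntry> log =
        List.of(new LogEntry(1, 1), new LogEntry(2, 2), new LogEntry(3, 3));

    // s2 is an intermediate snapshot of a transaction: it is skipped, history is kept.
    System.out.println(rebuildLog(log, Set.of(1L, 2L, 3L), Set.of(2L))); // [(t1, s1), (t3, s3)]

    // s2 was expired and removed: everything before s3 is dropped.
    System.out.println(rebuildLog(log, Set.of(1L, 3L), Set.of())); // [(t3, s3)]
  }
}
```

The two cases differ because an intermediate snapshot is still present in the metadata and is only hidden from the log, whereas a removed snapshot leaves a gap that would make time-travel results wrong.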

Contributor

💯 This looks correct to me.

Contributor Author

makes total sense, thanks

@nastra nastra force-pushed the snapshot-log-history branch from eadf13b to 61da89b Compare August 18, 2022 17:24
@nastra nastra marked this pull request as ready for review August 18, 2022 17:25
Contributor

@kbendick kbendick left a comment

This LGTM once tests pass.

Considering that this change reverts an earlier refactor, I don't think that additional manual testing is necessarily needed.

files(FILE_A, FILE_B),
statuses(Status.ADDED, Status.ADDED));

org.assertj.core.api.Assertions.assertThat(table.history()).containsAll(initialHistory);
Contributor

You might consider asserting on the entire history that should be there, which is the initial history plus the one entry added by txn.commitTransaction().
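
As a rough illustration of why the stricter check matters, the sketch below uses plain Java lists rather than the Iceberg test fixtures. `HistoryAssertionSketch`, `expectedHistory`, and the string entries are hypothetical stand-ins. In the real test, AssertJ's `containsExactlyElementsOf` would be one way to express the same comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and string entries are hypothetical stand-ins.
final class HistoryAssertionSketch {

  // Expected history after the commit: every initial entry, in order,
  // followed by exactly one new entry for the committed transaction.
  static List<String> expectedHistory(List<String> initialHistory, String postCommitEntry) {
    List<String> expected = new ArrayList<>(initialHistory);
    expected.add(postCommitEntry);
    return expected;
  }

  public static void main(String[] args) {
    List<String> initial = List.of("s1", "s2");

    // A history that wrongly includes an intermediate snapshot.
    List<String> leaky = List.of("s1", "s2", "intermediate", "s3");

    // The containsAll-style check does not notice the extra entry.
    System.out.println(leaky.containsAll(initial)); // true

    // Comparing against the full expected history does.
    System.out.println(leaky.equals(expectedHistory(initial, "s3"))); // false
  }
}
```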

@rdblue rdblue added this to the Iceberg 0.14.1 Release milestone Aug 18, 2022
@rdblue rdblue merged commit 585fd0c into apache:master Aug 18, 2022
@rdblue
Contributor

rdblue commented Aug 18, 2022

Thanks for fixing this, @nastra!

Development

Successfully merging this pull request may close these issues.

Transaction with multiple statements clear table history

3 participants