RFC: Clean up transaction oracle as we go #1198
Updated from e8b5093 to e8eba91.
This is great work, @damz. I've added some comments to help me better understand the code.
// A commit at the read timestamp is expected.
// But, any commit after the read timestamp should cause a conflict.
if ts, has := o.commits[ro]; has && ts > txn.readTs {
	return true
if committedTxn.ts <= txn.readTs {
Please add a comment here: if committedTxn.ts is less than or equal to txn.readTs, then committedTxn finished before the current transaction started, so we don't need to check it for conflicts.
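The skip described above can be illustrated with a small standalone sketch. The committedTxn type mirrors the per-transaction record this PR introduces, but the conflictKeys field name, the free-function signature, and the use of plain uint64 fingerprints are simplifying assumptions, not Badger's actual API.

```go
package main

// committedTxn is a simplified stand-in for the per-transaction record:
// a commit timestamp plus the fingerprints of the keys it wrote.
// The conflictKeys name is an assumption made for this sketch.
type committedTxn struct {
	ts           uint64
	conflictKeys map[uint64]struct{}
}

// hasConflict reports whether any fingerprint in reads was written by a
// transaction that committed after readTs.
func hasConflict(committed []committedTxn, readTs uint64, reads []uint64) bool {
	for _, c := range committed {
		// A commit at or before readTs was already visible when the
		// reading transaction started, so it cannot conflict with it.
		if c.ts <= readTs {
			continue
		}
		for _, r := range reads {
			if _, ok := c.conflictKeys[r]; ok {
				return true
			}
		}
	}
	return false
}
```

With one commit at ts 5 writing key 42, a transaction that read key 42 at readTs 5 does not conflict, while one that read it at readTs 4 does.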
@@ -184,12 +177,50 @@ func (o *oracle) newCommitTs(txn *Txn) uint64 {
		ts = txn.commitTs
	}

	for _, w := range txn.writes {
		o.commits[w] = ts // Update the commitTs.
	if ts > o.lastCleanupTs {
Do we need this check? I'm wondering in which case it could be false.
In non-managed mode, ts will always be greater than o.lastCleanupTs, since timestamps always increase.
In managed mode, the user could accidentally pass an incorrect txn.commitTs. In that case we should complain about it.
I think we should remove the if and add:
y.AssertTruef(ts > o.lastCleanupTs, "ts: %d should be greater than lastCleanupTs: %d", ts, o.lastCleanupTs)
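A minimal sketch of this suggestion, assuming a trimmed-down oracle. assertTruef is a local stand-in for y.AssertTruef so the snippet compiles on its own; recordCommit and the commitTs slice are hypothetical names used only for illustration.

```go
package main

import "fmt"

// assertTruef is a local stand-in for y.AssertTruef from Badger's y
// package: it panics with a formatted message when b is false.
func assertTruef(b bool, format string, args ...interface{}) {
	if !b {
		panic(fmt.Sprintf(format, args...))
	}
}

// oracle keeps only the fields this sketch needs (hypothetical).
type oracle struct {
	lastCleanupTs uint64
	commitTs      []uint64 // commit timestamps recorded so far
}

// recordCommit replaces the `if ts > o.lastCleanupTs` guard with an
// assertion: a commit timestamp that does not move past the last cleanup
// point signals a bug, or a bad txn.commitTs in managed mode.
func (o *oracle) recordCommit(ts uint64) {
	assertTruef(ts > o.lastCleanupTs,
		"ts: %d should be greater than lastCleanupTs: %d", ts, o.lastCleanupTs)
	o.commitTs = append(o.commitTs, ts)
}
```

Failing loudly here, rather than silently skipping the bookkeeping, surfaces the broken invariant at the point where it is introduced.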
You are right that this is suspicious-looking. I think I was under the (mistaken) assumption that there could be a case where the commit timestamp would not increase, but obviously that would break assumptions all over the place.
Yes. If the new timestamp is smaller than the previous one, it would mess up the lookups. We assume that newer values, with higher timestamps, are found at the upper levels of the LSM tree (level 0, then level 1, and so on).
Let's just complain to the user in that case. It could mean that something is seriously wrong with Badger or with whoever is using Badger. We shouldn't quietly continue here.
reads  []uint64 // contains fingerprints of keys read.
writes []uint64 // contains fingerprints of keys written.
update bool     // update is used to conditionally keep track of reads.
reads  []uint64 // contains fingerprints of keys read.
reads is a slice here, which means that if we keep reading the same key again and again, its fingerprint is appended to reads every time, which could cause an OOM error. This can be fixed separately; I know it's not introduced in this PR.
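One way to address this separately, sketched under assumptions: keep a set of fingerprints already recorded, so repeated reads of the same key are appended to reads only once. The txn type and this addReadKey signature (taking a precomputed fingerprint) are hypothetical, trimmed-down stand-ins; as the update field's comment says, reads are tracked only for update transactions.

```go
package main

// txn is a hypothetical, trimmed-down transaction that keeps only read
// tracking. readSet remembers which fingerprints are already in reads.
type txn struct {
	update  bool
	reads   []uint64
	readSet map[uint64]struct{}
}

// addReadKey records a read fingerprint at most once per transaction.
func (t *txn) addReadKey(fp uint64) {
	if !t.update {
		return // reads are only tracked for update transactions
	}
	if t.readSet == nil {
		t.readSet = make(map[uint64]struct{})
	}
	if _, seen := t.readSet[fp]; seen {
		return
	}
	t.readSet[fp] = struct{}{}
	t.reads = append(t.reads, fp)
}
```

A set alone would also work; keeping the slice alongside it simply leaves the existing reads field intact for the conflict check.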
@@ -51,17 +49,21 @@ type oracle struct {
	readMark *y.WaterMark // Used by DB.

	// commits stores a key fingerprint and latest commit counter for it.
This comment needs to be updated.
@@ -172,6 +160,11 @@ func (o *oracle) newCommitTs(txn *Txn) uint64 {
		return 0
	}

	if !o.isManaged {
		o.doneRead(txn)
I think we can remove the o.doneRead call from here, because txn.Discard() will be called for every transaction and the Discard method calls o.doneRead(...).
I think we can also get rid of the doneRead field in the txn struct. The only reason it was needed was that we were calling o.doneRead() in multiple places.
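The doneRead field mentioned above acts as a guard that makes releasing the read timestamp idempotent. A minimal sketch of that guard, assuming a hypothetical waterMark stand-in for y.WaterMark that only counts how often each timestamp is marked done:

```go
package main

// waterMark is a hypothetical stand-in for y.WaterMark that only counts
// how many times each read timestamp has been marked done.
type waterMark struct{ doneCount map[uint64]int }

func (w *waterMark) Done(ts uint64) { w.doneCount[ts]++ }

// txn keeps only the fields this sketch needs.
type txn struct {
	readTs   uint64
	doneRead bool // set once the read timestamp has been released
}

type oracle struct{ readMark *waterMark }

// doneRead releases t's read timestamp at most once, so it is safe to call
// from more than one place (for example newCommitTs and Discard).
func (o *oracle) doneRead(t *txn) {
	if !t.doneRead {
		t.doneRead = true
		o.readMark.Done(t.readTs)
	}
}
```

If doneRead were only ever called from a single place, the guard field would no longer be needed.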
The idea is to take advantage of the fact that we have already acquired the lock, and do the cleanup here. For that to work, we need to mark the transaction as done reading first.
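A toy model of this ordering, under loudly simplified assumptions: pending stands in for y.WaterMark, plain commit timestamps stand in for the committed-transaction records, and the conflict check (which would run first inside the same critical section) is omitted. All names here are hypothetical.

```go
package main

import "sync"

// oracle is a toy model, not Badger's type.
type oracle struct {
	sync.Mutex
	nextTs    uint64         // next commit timestamp to hand out
	pending   map[uint64]int // read timestamp -> transactions still reading
	committed []uint64       // commit timestamps still tracked
}

func newOracle() *oracle {
	return &oracle{nextTs: 1, pending: make(map[uint64]int)}
}

// begin registers a reader and returns its read timestamp.
func (o *oracle) begin() uint64 {
	o.Lock()
	defer o.Unlock()
	readTs := o.nextTs - 1
	o.pending[readTs]++
	return readTs
}

// doneRead removes one reader at readTs. Caller holds the lock.
func (o *oracle) doneRead(readTs uint64) {
	o.pending[readTs]--
	if o.pending[readTs] == 0 {
		delete(o.pending, readTs)
	}
}

// cleanup drops commits at or below the oldest read timestamp still in
// flight (or the one a new transaction would get): no current or future
// reader can conflict with them. Caller holds the lock.
func (o *oracle) cleanup() {
	safeTs := o.nextTs - 1
	for ts := range o.pending {
		if ts < safeTs {
			safeTs = ts
		}
	}
	var kept []uint64
	for _, ts := range o.committed {
		if ts > safeTs {
			kept = append(kept, ts)
		}
	}
	o.committed = kept
}

// commit marks the transaction as done reading *before* cleaning up, so
// its own read timestamp no longer holds the cleanup back.
func (o *oracle) commit(readTs uint64) uint64 {
	o.Lock()
	defer o.Unlock()
	o.doneRead(readTs)
	o.cleanup()
	ts := o.nextTs
	o.nextTs++
	o.committed = append(o.committed, ts)
	return ts
}
```

For example, if A and B both start at read timestamp 0, B commits at ts 1, and C then starts at read timestamp 1, commit 1 can be dropped when A commits, because C already sees it. Swapping doneRead and cleanup would keep A's read timestamp 0 pending and retain commit 1.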
	maxReadTs = o.readMark.DoneUntil()
}

if maxReadTs <= o.lastCleanupTs {
I'm curious how we could end up in this condition. o.lastCleanupTs should always be less than maxReadTs.
The < part is just defensive programming. I can replace it with an assert if you prefer.
The == part is an optimization: do not run the cleanup if maxReadTs (the read timestamp of the oldest transaction that is still in flight) has not increased.
> The < part is just defensive programming. I can replace that with an assert if you prefer.

I understand your point, but I think we should complain here with a y.Assert.

> The == part is an optimization: do not run clean up if the maxReadTs (which is the read timestamp of the oldest transaction that is still in flight) has not increased.

Oh, yes. That makes sense. Thanks.
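The outcome of this thread (assert on <, return early on ==) can be sketched as follows. maxReadTs is passed in as a stand-in for o.readMark.DoneUntil(), a local panic stands in for y.AssertTrue, and the committedTxn layout is a simplified assumption.

```go
package main

// committedTxn mirrors the per-transaction record discussed in this PR;
// conflictKeys (a set of key fingerprints) is a simplified assumption.
type committedTxn struct {
	ts           uint64
	conflictKeys map[uint64]struct{}
}

// oracle keeps only the fields the cleanup touches.
type oracle struct {
	committedTxns []committedTxn
	lastCleanupTs uint64
}

// cleanupCommittedTransactions drops commits that no in-flight transaction
// can conflict with. The caller is assumed to hold the oracle lock.
func (o *oracle) cleanupCommittedTransactions(maxReadTs uint64) {
	// "<" should be impossible: complain loudly instead of continuing.
	if maxReadTs < o.lastCleanupTs {
		panic("maxReadTs is behind lastCleanupTs")
	}
	// "==": the oldest in-flight read timestamp has not advanced, so
	// there is nothing new to drop.
	if maxReadTs == o.lastCleanupTs {
		return
	}
	o.lastCleanupTs = maxReadTs

	// Keep only commits newer than maxReadTs, filtering in place.
	kept := o.committedTxns[:0]
	for _, c := range o.committedTxns {
		if c.ts > maxReadTs {
			kept = append(kept, c)
		}
	}
	// Zero the unused tail so the dropped maps are no longer reachable
	// through the shared backing array.
	for i := len(kept); i < len(o.committedTxns); i++ {
		o.committedTxns[i] = committedTxn{}
	}
	o.committedTxns = kept
}
```

Zeroing the tail matters because filtering in place reuses the backing array; without it, the dropped entries' maps would stay reachable until those slots are overwritten.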
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue was marked as stale and no activity has occurred since then, so it will now be closed. Please reopen it if the issue is still relevant.
#1275 contains an updated version of this PR.
In the current implementation, if at least one write transaction is always open, the memory usage of the transaction oracle is unbounded. This is relatively easy to hit when batch-importing data: if more than one WriteBatch is active during the import, the transaction oracle is never cleaned up.
This is an RFC on an approach to fix this. The core idea is to:
- split the oracle.commits map into one map per previously committed transaction (this allows Go to release memory sooner than when performing deletes on a single map);
- take advantage of the lock already held in oracle.newCommitTs to do the cleanup.
I am assuming here that the number of committed-but-still-tracked transactions is small, which makes an implementation based on a simple slice reasonable. If that's not the case, we will need some form of sorted data structure (e.g. a B-tree) here.
Comments welcome.
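The write side of the proposed structure can be sketched as follows, under the RFC's assumption that only a few committed transactions are tracked at a time. recordCommit and trackedKeys are hypothetical names, and fingerprints are plain uint64 values.

```go
package main

// committedTxn holds one transaction's own map of written-key
// fingerprints, instead of entries in a single shared map.
type committedTxn struct {
	ts           uint64
	conflictKeys map[uint64]struct{}
}

// oracle tracks committed transactions in a plain slice.
type oracle struct {
	committedTxns []committedTxn
}

// recordCommit builds a map for this transaction alone. Once cleanup drops
// the entry and nothing else references it, the whole map can be garbage
// collected at once, rather than leaving deleted keys behind in one
// long-lived map.
func (o *oracle) recordCommit(ts uint64, writes []uint64) {
	keys := make(map[uint64]struct{}, len(writes))
	for _, w := range writes {
		keys[w] = struct{}{}
	}
	o.committedTxns = append(o.committedTxns, committedTxn{ts: ts, conflictKeys: keys})
}

// trackedKeys counts fingerprints across all tracked commits, a rough
// proxy for the oracle's memory footprint.
func (o *oracle) trackedKeys() int {
	n := 0
	for _, c := range o.committedTxns {
		n += len(c.conflictKeys)
	}
	return n
}
```

A conflict check then scans this slice linearly, which is the cost the RFC accepts while the number of tracked transactions stays small.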