
RFC: Clean up transaction oracle as we go #1198

Closed
damz wants to merge 1 commit into dgraph-io:master from damz:pr/oracle-cleanup

Conversation

Contributor

@damz damz commented Jan 15, 2020

In the current implementation, if you always have at least one write transaction open, the memory usage of the transaction oracle is unbounded. This is actually relatively easy to hit when batch-importing data: if you have more than one WriteBatch active during the import, the transaction oracle will never be cleaned up.

This is an RFC on an approach to fix this. The core idea is to:

  • Avoid increasing contention on read-only transactions: only clean up the transaction oracle when write transactions are committed, even though technically we could free memory sooner;
  • Split the big oracle.commits map into one map per previously committed transaction (this allows Go to release memory sooner than performing deletes on a single large map);
  • Take advantage of the fact that we have already acquired the oracle lock in oracle.newCommitTs to do the cleanup.

I am assuming here that the number of committed-but-still-tracked transactions is small, which makes an implementation based on a simple slice reasonable. If that's not the case, we will need some form of sorted data structure (e.g. a B-tree) here.
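To make the shape of the proposal concrete, here is a rough sketch of the data structure and cleanup step this implies. The names (committedTxn, conflictKeys, cleanupCommittedTransactions) are illustrative, not the exact code in this PR, and managed mode is elided:

    // Illustrative sketch; field and helper names are hypothetical.
    type committedTxn struct {
        ts uint64
        // conflictKeys holds the fingerprints of keys written by this
        // transaction. Keeping one small map per committed transaction lets
        // Go free the whole map once the entry is dropped, instead of
        // deleting keys one by one from a single ever-growing map.
        conflictKeys map[uint64]struct{}
    }

    type oracle struct {
        // ... existing fields elided ...
        readMark      *y.WaterMark   // existing field, tracks read timestamps
        committedTxns []committedTxn // ordered by commit timestamp
        lastCleanupTs uint64
    }

    // cleanupCommittedTransactions drops entries that no in-flight
    // transaction can conflict with anymore. It is called from newCommitTs,
    // which already holds the oracle lock.
    func (o *oracle) cleanupCommittedTransactions() {
        maxReadTs := o.readMark.DoneUntil() // all reads <= this are done
        if maxReadTs <= o.lastCleanupTs {
            return // nothing new to clean up
        }
        o.lastCleanupTs = maxReadTs

        tmp := o.committedTxns[:0]
        for _, txn := range o.committedTxns {
            if txn.ts > maxReadTs {
                tmp = append(tmp, txn)
            }
        }
        o.committedTxns = tmp
    }

Conflict detection then only has to walk the (assumed small) committedTxns slice and compare entries with ts > txn.readTs against the keys the transaction read.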

Comments welcome.



@damz damz requested a review from a team January 15, 2020 22:02
@damz damz force-pushed the pr/oracle-cleanup branch from e8b5093 to e8eba91 on January 16, 2020 04:01
coveralls commented Jan 16, 2020

Coverage decreased (-0.1%) to 69.874% when pulling e8eba91 on damz:pr/oracle-cleanup into 3747be5 on dgraph-io:master.

Contributor

@jarifibrahim jarifibrahim left a comment

This is great work, @damz. I've added some comments to better understand the code.

// A commit at the read timestamp is expected.
// But, any commit after the read timestamp should cause a conflict.
if ts, has := o.commits[ro]; has && ts > txn.readTs {
return true
if committedTxn.ts <= txn.readTs {
Contributor

Please add a comment here.

If committedTxn.ts is less than txn.readTs, the committedTxn finished before the current transaction started, so we don't need to check for a conflict in that case.
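For illustration, the conflict check with the requested comment could read roughly like this (a sketch reusing the illustrative committedTxns slice and conflictKeys map from above, not the exact diff):

    for _, committedTxn := range o.committedTxns {
        // If committedTxn.ts <= txn.readTs, that transaction finished
        // before the current transaction started, so it cannot conflict
        // with anything this transaction has read.
        if committedTxn.ts <= txn.readTs {
            continue
        }
        for _, ro := range txn.reads {
            if _, has := committedTxn.conflictKeys[ro]; has {
                return true // a key we read was committed after our readTs
            }
        }
    }
    return false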

@@ -184,12 +177,50 @@ func (o *oracle) newCommitTs(txn *Txn) uint64 {
ts = txn.commitTs
}

for _, w := range txn.writes {
o.commits[w] = ts // Update the commitTs.
if ts > o.lastCleanupTs {
Contributor

Do we need this check? I'm wondering in what case this check can be false.

  • In non-managed mode, ts will always be greater than o.lastCleanupTs since we always get an increasing ts.
  • In managed mode, the user could accidentally give an incorrect txn.commitTs. In that case we should complain about it.

I think we should remove the if and add y.AssertTruef(ts > o.lastCleanupTs, "ts: %d should not be less than lastCleanupTs: %d", ts, o.lastCleanupTs)
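In other words, the suggestion is something like the following sketch (y.AssertTruef is badger's existing assertion helper; the message text here is illustrative):

    // Replace the conditional with an assertion: a non-increasing commit
    // timestamp indicates a bug, so fail loudly instead of silently skipping.
    y.AssertTruef(ts > o.lastCleanupTs,
        "ts: %d should not be <= lastCleanupTs: %d", ts, o.lastCleanupTs)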

Contributor Author

You are right that this is suspicious-looking. I think I was under the (mistaken) assumption that there could be a case where the commit timestamp would not increase, but obviously that would break assumptions all over the place.

Contributor

Yes. If the new timestamp is smaller than the previous one, it would mess up the look-ups. We assume that newer values will be at a higher level (level 0, level 1, etc.) with higher timestamps.

Let's just complain to the user in that case. This could potentially mean there's something seriously wrong with badger or with whoever is using badger. We shouldn't quietly continue here.

reads []uint64 // contains fingerprints of keys read.
writes []uint64 // contains fingerprints of keys written.
update bool // update is used to conditionally keep track of reads.
reads []uint64 // contains fingerprints of keys read.
Contributor

reads is a slice here, which means that if we keep reading the same key again and again, it will be appended to the reads list each time, which could cause an OOM error. This can be fixed separately; I know it's not being introduced in this PR.
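A minimal sketch of the behaviour being described; the method and field names here are assumptions for illustration, not a quote of badger's code:

    func (txn *Txn) addReadKey(key []byte) {
        if txn.update {
            fp := fingerprint(key) // some 64-bit hash of the key (hypothetical helper)
            // Every read appends, even if the same key was read before, so a
            // loop that keeps reading one key grows txn.reads without bound.
            txn.reads = append(txn.reads, fp)
        }
    }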

@@ -51,17 +49,21 @@ type oracle struct {
readMark *y.WaterMark // Used by DB.

// commits stores a key fingerprint and latest commit counter for it.
Contributor

This comment needs to be updated.

@@ -172,6 +160,11 @@ func (o *oracle) newCommitTs(txn *Txn) uint64 {
return 0
}

if !o.isManaged {
o.doneRead(txn)
Contributor

I think we can remove the o.doneRead call from here because txn.Discard() will be called for every transaction and the discard method will call o.doneRead(...).

I think we can also get rid of the doneRead variable from the txn struct. The only reason it was needed was that we were calling o.doneRead() in multiple places.

Contributor Author

The idea is to take advantage of the fact that we have already acquired the lock to do the cleanup. For that to work, we need to mark the transaction as done reading first.
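A rough sketch of the ordering being described, reusing the illustrative cleanup helper from the earlier sketch (structure approximate, not the exact diff):

    if !o.isManaged {
        // Mark the read side of this transaction as done first, so that
        // readMark.DoneUntil() can advance past txn.readTs ...
        o.doneRead(txn)
        // ... and then reuse the oracle lock we already hold to drop
        // committed transactions that no live reader can conflict with.
        o.cleanupCommittedTransactions()
    }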

maxReadTs = o.readMark.DoneUntil()
}

if maxReadTs <= o.lastCleanupTs {
Contributor

I'm curious how we could end up in this condition. o.lastCleanupTs should always be less than the maxReadTs.

Contributor Author

The < part is just defensive programming. I can replace that with an assert if you prefer.

The == part is an optimization: do not run cleanup if maxReadTs (which is the read timestamp of the oldest transaction that is still in flight) has not increased.

Contributor

> The < part is just defensive programming. I can replace that with an assert if you prefer.

I understand your point, but I think we should complain here with a y.Assert.

> The == part is an optimization: do not run cleanup if maxReadTs (which is the read timestamp of the oldest transaction that is still in flight) has not increased.

Oh, yes. That makes sense. Thanks.
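Putting both points together, the guard could look roughly like this (a sketch with the illustrative names used earlier; y.AssertTruef is the assertion helper already used in badger):

    maxReadTs := o.readMark.DoneUntil()
    // The strictly-less case would indicate a bug, so complain loudly.
    y.AssertTruef(maxReadTs >= o.lastCleanupTs,
        "maxReadTs: %d is less than lastCleanupTs: %d", maxReadTs, o.lastCleanupTs)
    // The equality case is only an optimization: if the oldest in-flight
    // read has not advanced since the last cleanup, there is nothing new
    // to free, so skip the scan.
    if maxReadTs == o.lastCleanupTs {
        return
    }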


stale bot commented Mar 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Mar 1, 2020
@jarifibrahim jarifibrahim removed the status/stale The issue hasn't had activity for a while and it's marked for closing. label Mar 6, 2020
muXxer added 3 commits to muXxer/badger that referenced this pull request Mar 24, 2020

stale bot commented Mar 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Mar 31, 2020

stale bot commented Apr 7, 2020

This issue was marked as stale and no activity has occurred since then, therefore it will now be closed. Please, reopen if the issue is still relevant.

@stale stale bot closed this Apr 7, 2020
@jarifibrahim
Contributor

#1275 contains an updated version of this PR.

Labels
status/stale The issue hasn't had activity for a while and it's marked for closing.
3 participants