Conversation
@jmozah If this PR is ready for review, please elaborate in the PR description on what exactly the bug is and how it is fixed.
```go
// add new entry to gc index ONLY if it is not present in pinIndex
ok, err := db.pinIndex.Has(item)
if err != nil {
	fmt.Println("mode_get: Not adding in gcIndex", hex.EncodeToString(item.Address))
```
```go
// add new entry to gc index ONLY if it is not present in pinIndex
ok, err := db.pinIndex.Has(item)
if err != nil {
	fmt.Println("mode_put: Not adding in gcIndex", hex.EncodeToString(item.Address))
```
```go
ok, err := db.pinIndex.Has(item)
if err != nil {
	fmt.Println("mode_set: Not adding in gcIndex", hex.EncodeToString(item.Address))
```
```go
// Add in gcIndex only if this chunk is not pinned
ok, err := db.pinIndex.Has(item)
if err != nil {
	fmt.Println("mode_set: Not adding in gcIndex", hex.EncodeToString(item.Address))
```
```go
yes, err := db.pinIndex.Has(item)
if err == nil && yes {
	fmt.Println("GCing pinned item ", hex.EncodeToString(item.Address))
}
```
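The guard under discussion can be sketched in isolation. This is a hedged stand-in, not the localstore code: plain maps replace the real shed indexes, and `addToGCIfNotPinned` is a hypothetical helper name illustrating the check this PR adds in four places.

```go
package main

import "fmt"

// Maps stand in for the real leveldb-backed shed indexes that
// db.pinIndex.Has(item) queries in localstore.
type index map[string]bool

// addToGCIfNotPinned sketches the guard this PR adds: a chunk
// enters gcIndex only when it is absent from pinIndex.
func addToGCIfNotPinned(addr string, pinIndex, gcIndex index) bool {
	if pinIndex[addr] {
		// pinned chunks must never become GC candidates
		return false
	}
	gcIndex[addr] = true
	return true
}

func main() {
	pin := index{"aa": true}
	gc := index{}
	fmt.Println(addToGCIfNotPinned("aa", pin, gc)) // false: pinned, kept out of gc
	fmt.Println(addToGCIfNotPinned("bb", pin, gc)) // true: unpinned, added to gc
}
```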
This can be removed as it is used only to print a line.
Update: Just now @santicomp2014 confirmed that this works in their setup.
```go
pinnedChunks := make([]swarm.Address, 0)
```

```go
// upload random chunks above db capacity to see if chunks are still pinned
for i := 0; i < 2000; i++ {
```
if the capacity is 100, why do you need to insert 6000? and every time the CI runs?
That's the test case... The issue occurs when we bombard with more chunks than dbCapacity, more than once. 2000 is a big number for CI; maybe we can reduce it to 200.
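The scenario this test exercises can be modeled with a toy store. Everything here is an illustrative stand-in (a `toyStore` with a capacity cap and oldest-unpinned eviction), not the real localstore GC, which is batch- and access-order based; the point is only that filling well past capacity, repeatedly, must not evict a pinned chunk.

```go
package main

import "fmt"

// toyStore models the test scenario: when over capacity, it evicts
// the oldest unpinned chunk. Names and eviction policy are invented
// for illustration, not taken from localstore.
type toyStore struct {
	capacity int
	order    []string // insertion order of addresses
	present  map[string]bool
	pinned   map[string]bool
}

func (s *toyStore) put(addr string, pin bool) {
	s.present[addr] = true
	if pin {
		s.pinned[addr] = true
	}
	s.order = append(s.order, addr)
	s.collect()
}

// collect evicts the oldest unpinned chunks until within capacity.
func (s *toyStore) collect() {
	live := len(s.present)
	for i := 0; live > s.capacity && i < len(s.order); i++ {
		addr := s.order[i]
		if s.present[addr] && !s.pinned[addr] {
			delete(s.present, addr)
			live--
		}
	}
}

func main() {
	s := &toyStore{capacity: 100, present: map[string]bool{}, pinned: map[string]bool{}}
	s.put("pinned-chunk", true)
	// bombard with far more chunks than capacity, as the test does
	for i := 0; i < 2000; i++ {
		s.put(fmt.Sprintf("chunk-%d", i), false)
	}
	fmt.Println(s.present["pinned-chunk"]) // true: the pinned chunk survives
}
```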
```go
}
gotChunk, err := db.Get(context.Background(), storage.ModeGetRequest, swarm.NewAddress(outItem.Address))
if err != nil {
	fmt.Println("Pinned chunk missing ", addr)
```
Can you please remove all `fmt.Println` references? If there's a reason to annotate an error, please use `t.Fatalf("something has occurred: %v", err)`.
```go
}
```

```go
func addRandomChunks(t *testing.T, count int, db *DB) {
```
why don't you use this in all of the tests above?
Because the other tests don't need this.
I'm seeing a lot of Puts and Sets above this; if this test helper had a few parameters, it could save you all of the manual Puts and Sets.
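A parameterized helper along the lines the reviewer suggests might look like this sketch. `fakeDB` and the signature are hypothetical; the real version would take `*DB` and call `db.Put(ctx, storage.ModePutUpload, ch)` and, when pinning, `db.Set(ctx, storage.ModeSetPin, ch.Address())`.

```go
package main

import (
	"fmt"
	"math/rand"
)

// fakeDB stands in for *localstore.DB for illustration only.
type fakeDB struct {
	stored map[string]bool
	pinned map[string]bool
}

// addRandomChunks sketches one helper covering the repeated manual
// Put/Set sequences, with a flag for whether the chunks are pinned.
func addRandomChunks(db *fakeDB, count int, pin bool) []string {
	addrs := make([]string, 0, count)
	for i := 0; i < count; i++ {
		b := make([]byte, 4)
		rand.Read(b)
		addr := fmt.Sprintf("%x", b)
		db.stored[addr] = true // stands in for db.Put(ModePutUpload)
		if pin {
			db.pinned[addr] = true // stands in for db.Set(ModeSetPin)
		}
		addrs = append(addrs, addr)
	}
	return addrs
}

func main() {
	db := &fakeDB{stored: map[string]bool{}, pinned: map[string]bool{}}
	addrs := addRandomChunks(db, 3, true)
	fmt.Println(len(addrs), len(db.pinned))
}
```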
You don't need to please me, @jmozah. I want to know what I'm approving. Having a broad problem space that is difficult to define makes such PRs very problematic. Please note that you're now adding extra DB operations (extra `pinIndex` lookups) for every operation that we're doing.

It is also unclear to me how it is that chunks that are pinned are not cleared out of the gc index when the exclude trigger is hit within the garbage collection run. If I were to put my money on something, it would be that something there is broken.
That was a joke. :-)
That's not true, sorry. If you don't understand the fix, I can shepherd you through the problem and the fix.
Actually they will get cleared from `gcIndex`.
@jmozah I understand the fix perfectly fine. You're claiming that the chunks should not be in the garbage collection index if they are in the pin index, but that is not correct according to the old design. That is why the gcExcludeIndex index was introduced, so that you could do a gcIndex ¬ gcExcludeIndex and then GC. I accept the fact that there might be some data race or some other problem, and that theoretically some chunks might not be in the exclude index in the first place, which might cause them to get GC'd. But the nature of the fix does not expose where this consistency problem in localstore is.
The changes that you've done in this PR basically mean that gcExcludeIndex is no longer necessary. You can try to remove it completely and see if the problem persists, since in that case we would be doing a significant amount of redundant operations.
Did you check that the chunks that are missing also exist in the exclude gc index? Their presence in the pin index before the fix is one thing, but their protection from being garbage collected is secured with their presence in the exclude index.
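The old design described above, collecting gcIndex ¬ gcExcludeIndex, can be sketched with maps standing in for the shed indexes; `gcCandidates` is an illustrative name, not a real localstore function.

```go
package main

import (
	"fmt"
	"sort"
)

// gcCandidates sketches the old design: garbage collection iterates
// gcIndex but skips anything present in gcExcludeIndex, i.e. it
// collects the set difference gcIndex \ gcExcludeIndex.
func gcCandidates(gcIndex, gcExcludeIndex map[string]bool) []string {
	var out []string
	for addr := range gcIndex {
		if gcExcludeIndex[addr] {
			continue // protected (e.g. pinned): excluded from this GC run
		}
		out = append(out, addr)
	}
	sort.Strings(out) // deterministic order for display
	return out
}

func main() {
	gc := map[string]bool{"aa": true, "bb": true, "cc": true}
	excl := map[string]bool{"bb": true}
	fmt.Println(gcCandidates(gc, excl)) // [aa cc]
}
```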
Another question about the tests that you've introduced here: do they fail persistently without the fix? Do they flake? If yes, at what ratio?
```go
_, err = db.Get(context.Background(), storage.ModeGetRequest, rch.Address())
if err != nil {
	fmt.Println("Pinned chunk missing ", rch.Address())
```
Again and again we ask to remove fmt.Println from the codebase. Can we please agree that you review your own code and get rid of such trivial mistakes that consistently come up in every review?
Oops... my bad. Will remove it.
No. That is how we designed it. I would be very happy if you could help me remember what the design was. You can look at the original places where we did this; it was left out in a few other places, which is biting us now.
As said above,
They always pass; as I said, I could not reproduce this in the unit test cases here. The same scenario creates the issue in @santicomp2014's setup, so I added it as a precaution.
So then the problem is clear, no? If pinning the chunk is a two-phased operation, then there will always be a possibility for a data race where a GC runs between the upload phase and the pinning phase. Solving this thus must be an atomic operation, where the chunk upload API can take a header saying to pin the chunk immediately on upload.

And to emphasize: there is still a possibility for a data race even with this fix, where the chunk is added to the gc index without it being in the pin index. This can happen when the chunk is accessed before it is pinned (you are relying on the fact that it is in the pin index when you do the check).
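The atomic upload-and-pin idea from the comment above can be sketched as a single locked operation. `store` and `putAndPin` are hypothetical names, not the proposed localstore API; the point is that storage and pin-index insertion happen under one lock, so no GC run can observe a stored-but-not-yet-pinned window.

```go
package main

import (
	"fmt"
	"sync"
)

// store is an illustrative stand-in for localstore state.
type store struct {
	mu       sync.Mutex
	chunks   map[string][]byte
	pinIndex map[string]bool
	gcIndex  map[string]bool
}

// putAndPin stores a chunk and, in the same critical section,
// either pins it or registers it as a GC candidate.
func (s *store) putAndPin(addr string, data []byte, pin bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.chunks[addr] = data
	if pin {
		s.pinIndex[addr] = true
	} else {
		s.gcIndex[addr] = true // only unpinned chunks become GC candidates
	}
}

func main() {
	s := &store{chunks: map[string][]byte{}, pinIndex: map[string]bool{}, gcIndex: map[string]bool{}}
	s.putAndPin("aa", []byte{1}, true)
	fmt.Println(s.pinIndex["aa"], s.gcIndex["aa"]) // true false
}
```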
I'm also willing to bet a few bucks that the problem is still there, and that it will show itself again when you try to upload and pin a large collection (say, a large directory with a lot of files, which will take a bit longer to traverse for pinning).
Pinning is always a two-phased operation. You can upload using one API and later call another API to pin. That was the design at the time (although I wouldn't do it that way if I were doing it in bee). That does not mean there will be a race condition. It means that there can be chunks that are going to be GC'd, and we need to remove them from gcIndex.
That's precisely the reason for gcExcludeIndex. You still don't seem to get the sole reason for gcExcludeIndex's existence.
Can you explain the data race? Where is the data race? Without knowing where the data race is, how can you claim that this narrows the possibility?
I am also of the same impression that the problem might still be there. Until we know the issue exactly, we cannot say for sure that it is fixed. Nor am I claiming that this will fix it 100%. This is a guard fix that logically needs to be there as per the design, and it was missing; this PR adds it. Got it?
Problem: Sometimes a pinned chunk gets GC'd and is not available in localstore. This was not reproducible on demand. It occurred once in a FairData setup and now in @santicomp2014's setup in global pinning.

Fix: Even though we don't know the exact scenario in which this happens, a code walkthrough found that we are not checking for `pinIndex` presence when a chunk is added to `gcIndex`. We found 4 such places that were missed and fix them in this PR.