BlockService: AddBlock & AddBlocks write twice to datastore when used with Bitswap #7956
Comments
This is why we end up calling But I'd really like to get rid of the second write as it's kind of racy and we can end up calling:
But yeah. I think the concern is that if we don't write the block there, the caller may also not have written the block yet and then we'll think we have the block when we don't. |
Is this something that gets resolved automatically when we finally convert the blockstore to be keyed on multihashes? Is this important enough to bother worrying about before we do that conversion? |
Is this something that gets resolved automatically when we finally convert the blockstore to be keyed on multihashes?
No, unfortunately.
Is this important enough to bother worrying about before we do that conversion?
Probably not, but it probably won't conflict either.
From what I'm reading here the issue boils down to the undocumented According to that answer either As for the original performance consideration I think we need to clarify what we're trying to avoid: "write twice to datastore" is not the same as calling both Blockstore's |
(A case might be argued in the |
2022-03-11 conversation: this is a performance issue. We need to do some more triage to understand whether we're doubling our writes to disk. It's less bad if we're only doing two checks. @schomatis's analysis to date is that we're just doing two checks, which lowers the priority.
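To make the "two checks" versus "two writes" distinction concrete, here is a minimal sketch. It assumes a blockstore whose `Put` checks `Has` before writing; `checkedStore` and `putTwice` are hypothetical names for illustration, not the go-ipfs-blockstore API.

```go
package main

import "fmt"

// checkedStore is a hypothetical in-memory blockstore stand-in that
// counts existence checks and actual writes separately.
type checkedStore struct {
	data   map[string][]byte
	checks int
	writes int
}

// Has reports whether the key is already stored and counts the check.
func (s *checkedStore) Has(key string) bool {
	s.checks++
	_, ok := s.data[key]
	return ok
}

// Put writes only when the key is absent (assumption: the real
// blockstore behaves similarly), so a repeated Put costs a check
// but not a second write.
func (s *checkedStore) Put(key string, val []byte) {
	if s.Has(key) {
		return
	}
	s.data[key] = val
	s.writes++
}

// putTwice stores the same block twice and returns the counters.
func putTwice() (checks, writes int) {
	s := &checkedStore{data: map[string][]byte{}}
	s.Put("cid-1", []byte("hello"))
	s.Put("cid-1", []byte("hello"))
	return s.checks, s.writes
}

func main() {
	checks, writes := putTwice()
	fmt.Printf("checks=%d writes=%d\n", checks, writes)
}
```

Under that assumption the second call is cheaper than a write, but it is still not free when `Has` has to hit the disk.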
Next step: need more information on which method to fix. Someone will need to dig into the code more and make the plan. |
Complementary food for thought: why does an exchange (bitswap) write blocks on disk to begin with? It looks to me that this is the root cause of that problem. Why is that not done only in the blockservice? What if you just want to retrieve blocks without persisting them? I came to this issue for a different reason, that seems to tie to the problem you all describe. My problem is that I'm trying to write a custom data pipeline where I re-route blocks pinned to a CAR file. In that context I need to re-route the blocks pulled from the exchange, but once bitswap is constructed in go-ipfs I don't have access to the blockstore and I can't override that. It would be much simpler and more composable if the exchange would just exchange and not bother with writing on disk. |
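A rough sketch of the separation argued for above: the exchange only retrieves, and the caller decides where blocks go, if anywhere. All names here (`Fetcher`, `Sink`, `mapFetcher`, `memorySink`, `fetchAndRoute`) are hypothetical and are not the bitswap or blockservice API.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a simplified block: a key plus raw data.
type Block struct {
	Key  string
	Data []byte
}

// Fetcher is a hypothetical exchange that only retrieves blocks
// and never persists them.
type Fetcher interface {
	GetBlock(key string) (Block, error)
}

// mapFetcher simulates a remote source backed by a map.
type mapFetcher map[string][]byte

func (m mapFetcher) GetBlock(key string) (Block, error) {
	data, ok := m[key]
	if !ok {
		return Block{}, errors.New("block not found: " + key)
	}
	return Block{Key: key, Data: data}, nil
}

// Sink is wherever the caller chooses to route blocks
// (a datastore, a CAR file writer, and so on).
type Sink interface {
	Put(b Block) error
}

// memorySink records the blocks it receives.
type memorySink struct{ blocks []Block }

func (s *memorySink) Put(b Block) error {
	s.blocks = append(s.blocks, b)
	return nil
}

// fetchAndRoute retrieves a block and hands it to sink only if the
// caller supplied one; a nil sink means "retrieve without persisting".
func fetchAndRoute(f Fetcher, sink Sink, key string) (Block, error) {
	b, err := f.GetBlock(key)
	if err != nil {
		return Block{}, err
	}
	if sink != nil {
		if err := sink.Put(b); err != nil {
			return Block{}, err
		}
	}
	return b, nil
}

func main() {
	remote := mapFetcher{"cid-1": []byte("hello")}
	sink := &memorySink{}
	if _, err := fetchAndRoute(remote, sink, "cid-1"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("blocks routed to sink:", len(sink.blocks))
}
```

With this shape, re-routing retrieved blocks to a CAR file would only mean supplying a different `Sink`, without touching the exchange.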
@MichaelMure I've been deep into bitswap's internals lately and AFAICT it's because it wants to cache blocks you just downloaded.
Sure, but why would that be a bitswap responsibility? The write happen in
Also, this comment is pretty telling: https://github.com/ipfs/go-bitswap/blob/84973686518be4831ee86e7b6b3f1b9834d0ce97/bitswap.go#L475-L478 |
It's software, we can make stuff up. It just looks like an architectural choice that stuck.
Actually I would go even further. The only other place the blockstore is used in bitswap is to record stats (are the incoming blocks new or already present locally?). I'd be very tempted to remove that and drop the blockstore entirely from bitswap.
Any rough timeframe on this? Please ping me on that PR. |
A month ? Two maybe ? |
That's a bit long for my need. I might patch bitswap and blockservice to remove the blockstore in bitswap. Would that be of interest for a PR? Edit: on the client part of the code |
Yes if you do that before me 100% send a PR. thx |
This leaves the responsibility and choice to do so to the caller, typically go-blockservice.
This has several benefits:
- untangle the code
- allow using an exchange as pure block retrieval
- avoid double add
Close ipfs/kubo#7956
@Jorropo see ipfs/go-bitswap#571 and ipfs/go-blockservice#92. I didn't remove the stats that use the blockstore, I'll let you judge if that makes sense during your client/server split.
This commit was moved from ipfs/go-bitswap@a052ec9
Version information:
current
Description:
Calling AddBlock or AddBlocks on the block service appears to write to the data store twice. The reason is that the blockservice calls Put here:
https://github.com/ipfs/go-blockservice/blob/master/blockservice.go#L146
And then calls exchange.HasBlock here:
https://github.com/ipfs/go-blockservice/blob/master/blockservice.go#L153
HasBlock eventually calls PutMany here:
https://github.com/ipfs/go-bitswap/blob/master/bitswap.go#L372
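The call chain described above can be sketched in isolation. This is a simplified model, not the real go-blockservice or go-bitswap code: `countingStore`, `exchange`, `blockService` and `writesForOneAdd` are hypothetical, and the store is assumed to write unconditionally on every `Put`.

```go
package main

import "fmt"

// countingStore is a hypothetical in-memory blockstore stand-in that
// counts every write it performs.
type countingStore struct {
	data   map[string][]byte
	writes int
}

func newCountingStore() *countingStore {
	return &countingStore{data: make(map[string][]byte)}
}

// Put writes unconditionally (assumption), so each call is one write.
func (s *countingStore) Put(key string, val []byte) {
	s.data[key] = val
	s.writes++
}

// PutMany writes each block through Put.
func (s *countingStore) PutMany(blocks map[string][]byte) {
	for k, v := range blocks {
		s.Put(k, v)
	}
}

// exchange models the behaviour described in the issue: HasBlock
// ends up persisting the announced block via PutMany.
type exchange struct{ store *countingStore }

func (e *exchange) HasBlock(key string, val []byte) {
	e.store.PutMany(map[string][]byte{key: val})
}

// blockService models AddBlock: Put first, then notify the exchange.
type blockService struct {
	store *countingStore
	ex    *exchange
}

func (bs *blockService) AddBlock(key string, val []byte) {
	bs.store.Put(key, val)   // first write
	bs.ex.HasBlock(key, val) // leads to a second write of the same block
}

// writesForOneAdd adds a single block and returns the write count.
func writesForOneAdd() int {
	store := newCountingStore()
	svc := &blockService{store: store, ex: &exchange{store: store}}
	svc.AddBlock("cid-1", []byte("hello"))
	return store.writes
}

func main() {
	fmt.Println("writes for one AddBlock:", writesForOneAdd())
}
```

In this model one `AddBlock` produces two writes because both layers share the same store; whether the real datastore is hit twice depends on whether its `Put` skips blocks it already has.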
I produced a test to reproduce the issue (it's pretty ugly but should demonstrate that I'm not just reading it wrong): https://github.com/ipfs/go-blockservice/blob/test/prove_double_write/test/blocks_test.go#L108
I believe you could fix it by changing https://github.com/ipfs/go-bitswap/blob/master/bitswap.go#L371 from:
to
but I dunno what side effects that might cause.
I just was reading through the code trying to figure some stuff out about designing new ipld-prime interfaces and I saw that. I dunno what the penalty is for double writing but when adding a big DAG I can imagine it might be large.