-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat:splitstore:Configure max space used by hotstore and GC makes best effort to respect #10391
Conversation
e6814fb
to
a994153
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmmm, ok, looks sane.
Potential concern: can we end up continuously moving gc in every compaction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General idea makes a ton of sense. Couple of comments
Would be great to hook-up those sizes / size estimates to the metrics endpoint somehow - https://github.com/filecoin-project/lotus/blob/master/metrics/metrics.go#L160
log.Warnf("Attention! Estimated moving GC size %d is not within safety buffer %d of target max %d, performing aggressive online GC to attempt to bring hotstore size down safely", copySizeApprox, targetBuffer, s.cfg.HotstoreMaxSpaceTarget) | ||
log.Warn("If problem continues you can 1) temporarily allocate more disk space to hotstore and 2) reflect in HotstoreMaxSpaceTarget OR trigger manual move with `lotus chain prune hot-moving`") | ||
log.Warn("If problem continues and you do not have any more disk space you can run continue to manually trigger online GC at aggressive thresholds (< 0.01) with `lotus chain prune hot`") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth using the alert system to raise a (generally much more visible / loud) alert.
See e.g. https://github.com/filecoin-project/lotus/blob/master/node/modules/alerts.go
(alerts are for things which tell the user that something either is, or is about to be severely broken, and they need to take action)
Reasonable concern, this is definitely possible. In my opinion this is more of a feature than a bug however. Misconfigurations are certainly possible but I would say we don't expose any worse footgun than allowing FullGCFreq = 1 with current config. What I've come to believe is there are only two real problems users face with splitstore: disk overflow and chain out of sync. The only risk with too much moving GC when it is done without overflowing disk is that it gets the chain out of sync. But I think we can address chain out of sync additionally in #10392 or an extension after fixing a solution to disk overflow. |
This is how I configure my experimental splitstore nodes, anything besides "do moving GC every time" makes no sense from an operational perspective: you want your dataset to always be as small as possible. |
While I agree that "do moving GC every time" often makes sense from an operational perspective I think this is overstated. Moving GC is objectively more expensive than online in terms of memory, compute and disk io. Paying the cost on every compaction to keep datastore down is likely often a good idea but keeping more disk around and then doing this in batches is a reasonable point in the tradeoff space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, 2 small nits, but other than that looks good.
If it's not perfect bugs can be fixed in the RC process
Thanks for the make gen assist @magik6k !! |
Related Issues
#10388
Proposed Changes
Additional Info
Checklist
Before you mark the PR ready for review, please make sure that:
<PR type>: <area>: <change being made>
fix: mempool: Introduce a cache for valid signatures
PR type
: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, testarea
, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps