Avoid deletion of blocks which are not shipped #3346

codesome · 2020-10-14T11:58:31Z

This change applies to blocks storage ingesters.

Which issue(s) this PR fixes:
Fixes #2868

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

codesome · 2020-10-14T12:02:58Z

pkg/ingester/ingester_v2.go

+		// If there is any issue with the shipper, we should be conservative and not delete anything.
+		level.Error(util.Logger).Log("msg", "failed to read shipper meta during deletion of blocks", "user", u.userID, "err", err)
+		return nil


Should we have a metric for this to alert on? Because if this issue persists, queries could skip some data and the disk can run out of space.

In our alerts we do alert on disk running out of space. I'd say that + error message is enough.
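
For readers following the thread, the conservative behaviour in the snippet above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `shipperMeta` and `safeToDelete` are hypothetical names, and the JSON layout (a `version` number plus an `uploaded` list of block IDs) is an assumption about the shipper meta file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shipperMeta holds only the fields this sketch needs.
// The layout is an assumption, not a copy of the real shipper type.
type shipperMeta struct {
	Version  int      `json:"version"`
	Uploaded []string `json:"uploaded"`
}

// safeToDelete (hypothetical) returns the candidate blocks that the shipper
// meta lists as uploaded. If the meta cannot be parsed, it logs the error and
// returns nothing, so no block is deleted.
func safeToDelete(candidates []string, rawMeta []byte) []string {
	var meta shipperMeta
	if err := json.Unmarshal(rawMeta, &meta); err != nil {
		fmt.Println("failed to read shipper meta, deleting nothing:", err)
		return nil
	}

	uploaded := make(map[string]struct{}, len(meta.Uploaded))
	for _, id := range meta.Uploaded {
		uploaded[id] = struct{}{}
	}

	var deletable []string
	for _, id := range candidates {
		if _, ok := uploaded[id]; ok {
			deletable = append(deletable, id)
		}
	}
	return deletable
}

func main() {
	candidates := []string{"block-1", "block-2"}

	// Readable meta: only the uploaded candidate may be deleted.
	fmt.Println(safeToDelete(candidates, []byte(`{"version":1,"uploaded":["block-1"]}`)))

	// Unreadable meta: nothing is deleted.
	fmt.Println(len(safeToDelete(candidates, []byte(`not json`))))
}
```

The trade-off discussed in this thread follows directly: while the meta stays unreadable, blocks accumulate on disk, which is why the disk-space alert plus the error log is considered sufficient.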

roidelapluie · 2020-10-14T16:44:56Z

Can we invert this and treat the blocks in shipperMeta.Uploaded as deletable? Or would that be 'too aggressive'?

pracucci

Thanks @codesome for working on this! A couple of things:

How difficult would it be to add a unit test for this?
I was checking in one of our production clusters if thanos.shipper.json actually contains all shipped blocks and I've found an ingester with some blocks missing in the shipper meta file (but successfully shipped to the bucket). I'm investigating the root cause, but this could be a blocker for this PR.

pracucci · 2020-10-15T10:24:32Z

I was checking in one of our production clusters if thanos.shipper.json actually contains all shipped blocks and I've found an ingester with some blocks missing in the shipper meta file (but successfully shipped to the bucket). I'm investigating the root cause, but this could be a blocker for this PR.

This could fix it: thanos-io/thanos#3321

codesome · 2020-10-15T12:17:39Z

Can we invert this and treat the blocks in shipperMeta.Uploaded as deletable? Or would that be 'too aggressive'?

Not super sure about how querying is set up, but queriers depend on ingesters for some data for up to some hours, and there might be gaps in queries if we aggressively delete blocks like that.

codesome · 2020-10-15T12:18:25Z

How difficult would it be to add a unit test for this?

I will check; it might need some mocking of the shipper.

pracucci · 2020-10-15T12:35:11Z

Can we invert this and treat the blocks in shipperMeta.Uploaded as deletable? Or would that be 'too aggressive'?

Not super sure about how querying is set up, but queriers depend on ingesters for some data for up to some hours, and there might be gaps in queries if we aggressively delete blocks like that.

We should never delete a block until the (configured) retention period is reached. What we're doing in this PR is adding an extra protection: a block past retention is still not deleted until it has been shipped (e.g. when shipping is failing because of authentication or networking issues).
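
The two conditions described above can be sketched as a simple filter. This is only an illustration under stated assumptions, not the PR's implementation: `deletableBlocks` is a hypothetical helper, block IDs are plain strings, and the caller is assumed to have already worked out which blocks are past the retention period.

```go
package main

import "fmt"

// deletableBlocks (hypothetical) keeps a block only if it satisfies both
// conditions: it is past retention (it appears in pastRetention) and it has
// been shipped (it appears in shipped).
func deletableBlocks(pastRetention []string, shipped map[string]bool) []string {
	deletable := make([]string, 0, len(pastRetention))
	for _, id := range pastRetention {
		if shipped[id] {
			deletable = append(deletable, id)
		}
	}
	return deletable
}

func main() {
	// block-a: past retention and shipped, so it can be deleted.
	// block-b: past retention but not shipped, so it is kept.
	// block-c: shipped but still within retention, so it is kept.
	pastRetention := []string{"block-a", "block-b"}
	shipped := map[string]bool{"block-a": true, "block-c": true}

	fmt.Println(deletableBlocks(pastRetention, shipped)) // prints [block-a]
}
```

Here `block-c` shows why being shipped alone is not enough: deleting it early could leave the query gaps mentioned earlier in the thread.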

pracucci · 2020-10-15T12:36:17Z

How difficult would it be to add a unit test for this?

I will check; it might need some mocking of the shipper.

A couple of things:

In unit tests we already have a shipperMock
If you run the ingester with the local filesystem as backend, you should be able to actually run the real shipper

codesome · 2020-10-19T14:18:14Z

@pracucci I have added a unit test now. I think we have to first get thanos-io/thanos#3321 vendored and then go with this, right?

pracucci · 2020-10-19T16:55:35Z

@pracucci I have added a unit test now. I think we have to first get thanos-io/thanos#3321 vendored and then go with this, right?

@codesome Correct. I opened a PR #3363 for the Thanos upgrade.

pracucci

LGTM, thanks! But let's wait for #3363 before merging.

pstibrany

LGTM, left non-blocking nits.

pracucci

Thanos upgrade PR has been merged, so we can merge this PR too.

Avoid deletion of blocks which are not shipped

229799c

Signed-off-by: Ganesh Vernekar <[email protected]>

pull-request-size bot added the size/M label Oct 14, 2020

Add CHANGELOG entry

cb82275

Signed-off-by: Ganesh Vernekar <[email protected]>

Ganesh Vernekar added 2 commits October 14, 2020 18:32

Use the DefaultBlocksToDelete from TSDB

c46440d

Signed-off-by: Ganesh Vernekar <[email protected]>

Fix lint and tests

aaad629

Signed-off-by: Ganesh Vernekar <[email protected]>

pracucci reviewed Oct 15, 2020


Add unit test and fix review comments

c2f7744

Signed-off-by: Ganesh Vernekar <[email protected]>

Merge remote-tracking branch 'upstream/master' into dont-delete-unshi…

2bc28f3

…pped-blocks Signed-off-by: Ganesh Vernekar <[email protected]>

pull-request-size bot added size/L and removed size/M labels Oct 19, 2020

pracucci approved these changes Oct 19, 2020

pracucci requested a review from pstibrany October 19, 2020 17:02

pstibrany approved these changes Oct 20, 2020

Fix review comments

8de34e2

Signed-off-by: Ganesh Vernekar <[email protected]>

pracucci approved these changes Oct 20, 2020

pracucci merged commit 40d8240 into cortexproject:master Oct 20, 2020
