Close TSDB and delete local data when TSDB is idle for a long time. #3491
Very good job! I did a first pass review and I will do a more accurate review once initial feedback has been discussed and/or addressed. I would also mark this feature as experimental in the "v1 guarantees" doc page: WDYT?
pkg/ingester/ingester_v2.go
I just realised this check has a problem: the lastUpdate is not correct after an ingester restart. An option could be checking the timestamp against max(lastUpdate, TSDB max time).
When TSDB is opened, we set lastUpdate to current time to avoid premature idle flush and close. This will only be a problem if ingesters never stay up for longer than configured idle timeout.
Instead of current time, we can set it to last sample time from TSDB when opening, as you suggest.
> When TSDB is opened, we set lastUpdate to current time to avoid premature idle flush and close.

Then we're protected 👍

> Instead of current time, we can set it to last sample time from TSDB when opening, as you suggest.

Could make sense.
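For reference, the check discussed in this thread, comparing against max(lastUpdate, TSDB max time), could be sketched as follows. This is only an illustration under stated assumptions: the function names, the constant, and the use of Unix-millisecond integers are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// minuteMs is one minute in milliseconds. All timestamps in this sketch are
// Unix milliseconds (an assumption made for illustration).
const minuteMs int64 = 60 * 1000

// effectiveLastUpdate returns max(lastUpdate, TSDB max time).
// Hypothetical helper, not the PR's code.
func effectiveLastUpdate(lastUpdateMs, tsdbMaxTimeMs int64) int64 {
	if tsdbMaxTimeMs > lastUpdateMs {
		return tsdbMaxTimeMs
	}
	return lastUpdateMs
}

// isIdle reports whether nothing has been appended for at least idleTimeoutMs,
// judged by the later of the two timestamps.
func isIdle(nowMs, lastUpdateMs, tsdbMaxTimeMs, idleTimeoutMs int64) bool {
	return nowMs-effectiveLastUpdate(lastUpdateMs, tsdbMaxTimeMs) >= idleTimeoutMs
}

func main() {
	now := int64(1600000000000)
	timeout := 60 * minuteMs

	// After a restart lastUpdate is unset (0), but the TSDB holds a sample
	// from 10 minutes ago, so the TSDB is not treated as idle.
	fmt.Println(isIdle(now, 0, now-10*minuteMs, timeout))

	// Both timestamps are older than the timeout, so the TSDB is idle.
	fmt.Println(isIdle(now, now-120*minuteMs, now-90*minuteMs, timeout))
}
```

Setting lastUpdate to the current time when the TSDB is opened, as described above, avoids the premature close as well; the max() variant only changes which timestamp the timeout is measured from.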
LGTM, thanks!
pkg/ingester/ingester_v2.go
I think it would be great to have a couple of metrics (counters) for successful and failed closing of idle TSDBs. WDYT?
I've added metrics for the number of idle checks and for the various check results. PTAL
LGTM
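As a rough illustration of per-result counters for idle checks, here is a standard-library sketch. It is a stand-in for a labelled metrics counter, not the metrics added in the PR; the type, method names, and result labels are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// resultCounter counts idle-check outcomes keyed by a result label.
// Hypothetical type used only for this sketch.
type resultCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newResultCounter() *resultCounter {
	return &resultCounter{counts: make(map[string]int)}
}

// inc records one idle check that ended with the given result.
func (c *resultCounter) inc(result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[result]++
}

// get returns how many checks ended with the given result.
func (c *resultCounter) get(result string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[result]
}

func main() {
	c := newResultCounter()
	// The result labels below are made up for the example.
	for _, r := range []string{"active", "closed", "closed", "close_failed"} {
		c.inc(r)
	}
	fmt.Println("closed:", c.get("closed"), "failed:", c.get("close_failed"))
}
```

Counting every check result, rather than only successes and failures, makes it possible to see why idle TSDBs are not being closed.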
Signed-off-by: Peter Štibraný <[email protected]>
Thanks @pstibrany for addressing my comments. LGTM 👍
What this PR does: This PR adds the ability for the ingester to close an idle TSDB and delete its local data when the TSDB has been idle for a long time (no data is appended to it). Closing and deletion only happen if the head is empty (possibly thanks to a previous forced compaction triggered by the TSDB being idle) and all blocks have been shipped.
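The conditions described above can be sketched as a small decision function. This is a minimal sketch, assuming a simplified view of the TSDB state; the struct, field names, and result values are hypothetical and do not come from the PR's code.

```go
package main

import "fmt"

// tsdbState is a simplified, hypothetical snapshot of a user's TSDB.
type tsdbState struct {
	idle            bool // no data appended for longer than the idle timeout
	headSeries      int  // number of series still held in the head
	unshippedBlocks int  // local blocks that have not been shipped yet
}

// closeResult names the outcome of one idle check (hypothetical labels).
type closeResult string

const (
	resultActive       closeResult = "active"
	resultHeadNotEmpty closeResult = "head_not_empty"
	resultUnshipped    closeResult = "unshipped_blocks"
	resultClose        closeResult = "close_and_delete"
)

// checkIdleClose returns resultClose only when the TSDB is idle, its head is
// empty, and every block has been shipped; otherwise it names the reason.
func checkIdleClose(s tsdbState) closeResult {
	switch {
	case !s.idle:
		return resultActive
	case s.headSeries > 0:
		return resultHeadNotEmpty
	case s.unshippedBlocks > 0:
		return resultUnshipped
	default:
		return resultClose
	}
}

func main() {
	states := []tsdbState{
		{idle: false, headSeries: 10},
		{idle: true, headSeries: 10},
		{idle: true, unshippedBlocks: 1},
		{idle: true},
	}
	for _, s := range states {
		fmt.Printf("%+v -> %s\n", s, checkIdleClose(s))
	}
}
```

Returning a named result for each blocked case keeps the "do not close" reasons explicit, which also makes them easy to count.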
This PR also takes care of hiding user metrics, while keeping the registry to avoid breaking the counters over all users. Support for metrics removal has been extracted to a separate PR, #3511.
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]