Description
As per the faster recovery benefits discussed here, it is desirable to find the right translog size for your particular setup. However, unless I am looking in the wrong places, I haven't been able to find a good way to work out the right translog size from the information that is currently exposed. I think it would be helpful if the translog stats were extended to also expose the age of the oldest entry, to aid with monitoring and sizing the translog (index.translog.retention.size).
For example, suppose I typically expect my nodes to go down for ~30 minutes (e.g. for maintenance, upgrades, or patches). I would want to make sure that my translog is sized to hold operations for at least that long (and probably longer, as a buffer).
If this feature were present, I might see that my shards' translogs are sized at 512mb and that the oldest entry is 20 minutes old. I might then conclude that doubling the translog size would retain about 40 minutes of operations, covering my expected downtime.
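The reasoning above is a linear extrapolation: if a given amount of translog covers a given age, then covering a longer window needs proportionally more. A minimal sketch of that arithmetic, assuming a roughly constant write rate; the function name and the optional `buffer_factor` are hypothetical, not part of Elasticsearch:

```python
import math


def required_retention_bytes(observed_bytes: int,
                             observed_age_minutes: float,
                             target_minutes: float,
                             buffer_factor: float = 1.0) -> int:
    """Estimate the retention size needed to cover target_minutes of operations.

    Hypothetical helper. Assumes a roughly constant write rate, so the bytes
    held in the translog grow linearly with the age of the oldest entry.
    """
    if observed_age_minutes <= 0:
        raise ValueError("observed_age_minutes must be positive")
    # Multiply before dividing to keep the intermediate values exact for
    # typical integer inputs.
    needed = observed_bytes * target_minutes * buffer_factor / observed_age_minutes
    return math.ceil(needed)


MIB = 1024 ** 2
# 512mb currently covers 20 minutes; we want 40 minutes of coverage.
print(required_retention_bytes(512 * MIB, 20, 40) // MIB)  # 1024
```

With a 30-minute outage and a 25% buffer (`buffer_factor=1.25`), the same 512mb / 20-minute observation suggests about 960mb. The estimate is only as good as the constant-rate assumption, which is why tracking the oldest-entry age over time would matter.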
I realize that for the scenario I have described, you could do something like setting index.translog.retention.age=40m and index.translog.retention.size=$big_number, but it would be nice to take the guesswork out of the equation (what is $big_number?). With this stat, you could monitor it over time and alarm on it.
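A sketch of the alarm such a stat would enable: flag any shard whose oldest retained translog entry is younger than the downtime you need to survive. The per-shard `oldest_entry_age_ms` values are hypothetical (this is the stat being requested), and how they are collected is left out:

```python
from typing import Dict, List


def shards_below_retention_target(oldest_entry_age_ms: Dict[str, int],
                                  target_ms: int) -> List[str]:
    """Return shards whose translog covers less than target_ms of history.

    oldest_entry_age_ms maps a shard label to the (hypothetical) age of its
    oldest translog entry in milliseconds.
    """
    return sorted(shard for shard, age in oldest_entry_age_ms.items()
                  if age < target_ms)


MINUTE_MS = 60_000
ages = {"logs[0]": 20 * MINUTE_MS, "logs[1]": 50 * MINUTE_MS}
print(shards_below_retention_target(ages, 40 * MINUTE_MS))  # ['logs[0]']
```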