Redis brain storage of incident data still isn't working #13
Just found another issue with the
I think I have a full-on fix for #14 ready to go - doing final testing now. If it works, this issue won't be needed anymore, as the cleanup will be handled differently.
fixes victorops#14, closes victorops#13

- Added new environment variable (`HUBOT_VICTOROPS_KEEP`) to control how long incidents are stored in the brain
- Moved incident list to `_private.VO_INCIDENTS`, instead of as children of `_private`
- Now using `.getTime()` to compare timestamps for incident expiry
- Now using timestamps within the JSON data for each alert, rather than maintaining a separate list (`VO_INCIDENT_KEYS`)
- Bot will log how many days incidents are to be kept on initialization
This one's going to be long, but hopefully, I've got a solution. I was digging around in the depths of redis to figure out how best to store control call data, and I discovered that there are a couple of problems with the current handling of the incident list in the redis brain.
When I looked at the redis data block, I saw the following:
(Except there were lots more incidents tracked.) So this means that hubot doesn't use redis at the top level, but rather creates a JSON block stored as the value for the `hubot:storage` key, and `@robot.brain.set` is setting values under `_private` within that JSON block.

Notice the odd thing there? `VO_INCIDENT_KEYS` is blank. That's not a redaction, it's the observed behavior. And it stayed that way. I ran `redis-cli monitor` to keep an eye on it, and through every update, that value stayed a blank list. Commenting out the call to `cleanupBrain` "fixed" that, so I figured I'd do some more digging into that method to see what else might be odd there.

Here's the current implementation (in case it gets lost to the sands of time):
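The block itself didn't survive the copy here, but from the behavior described below it was shaped roughly like the following JavaScript sketch. This is a hypothetical reconstruction, not the verbatim CoffeeScript source; the `item.timestamp` field and the filter shape are inferred from the discussion.

```javascript
// Hypothetical reconstruction of the buggy filter, NOT the verbatim
// source. Intent: keep only incidents less than a day old.
function cleanupBrain(incidentKeys) {
  return incidentKeys.filter(
    // getDate() yields a day-of-month integer (1-31); comparing that
    // against a Date object coerces the Date to epoch milliseconds,
    // so this condition is false for every item and the whole list is
    // emptied on every call.
    (item) => new Date(item.timestamp).getDate() + 1 >= new Date()
  );
}
```

The left side of the comparison is at most 32, while `new Date()` coerces to a number on the order of 10^12 in a relational comparison, which is why the list always comes back empty.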
The `new Date(item.timestamp)` call creates a date object, which the `getDate()` method converts to an integer representing the day of the month (1-31) (reference). This means a couple of things:

1. `new Date` on the other side of the comparison is being compared to a raw integer. This isn't going to go well: in a numeric comparison, a Date object coerces to epoch milliseconds, so the integer is always less than the Date. Because of this, every time `cleanupBrain()` is called, the entirety of `VO_INCIDENT_KEYS` gets filtered out.
2. Even `new Date().getDate()` on the other side wouldn't work. The `.getDate()` on the left side means that the +1 isn't adding a day, it's adding a day-of-month, so any items created on the 31st of a given month will never be removed (there will never be a 32nd of the month), and objects created on the 30th won't qualify for removal until the 31st of the following month.

The fix for this is to use the `.getTime()` method. This converts the date to an epoch time in milliseconds instead, so instead of adding 1 to stand in for 24 hours, we add 86400000 (the number of milliseconds in a day). This works out to something like:

The second issue I found is one that we actually just removed the code for, without actually fixing it. It used to look like this:
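That earlier block is also missing from this copy; from the `#@robot.brain.remove item.name` line discussed next, it was roughly the following shape. This is a JavaScript rendering of the CoffeeScript for illustration only, with variable names assumed rather than copied from the source.

```javascript
// Hypothetical rendering of the earlier cleanup code, with the remove
// call commented out as described -- NOT the verbatim source.
function cleanupBrain(robot, incidentKeys) {
  return incidentKeys.filter((item) => {
    const keep = new Date(item.timestamp).getDate() + 1 >= new Date();
    if (!keep) {
      // robot.brain.remove(item.name)
      // ^ the commented-out line: without it, the full alert JSON
      //   stored under _private is never deleted.
    }
    return keep;
  });
}
```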
Now that I've done some more digging into what's going on in the redis brain, I can see what was going on in the `#@robot.brain.remove item.name` line. Up in `receiveWS`, under the `TIMELINE_LIST_REPLY_MESSAGE` block, the incident is being written in two places:

... and ...
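Those two write sites didn't survive the copy either; the pattern described works out roughly to the following JavaScript sketch, with a minimal in-memory stand-in for hubot's brain. Field names like `INCIDENT_NAME` are assumptions, not taken from the source.

```javascript
// Minimal in-memory stand-in for hubot's redis-backed brain.
const brain = {
  data: {},
  get(key) { return this.data[key]; },
  set(key, value) { this.data[key] = value; },
};

const alert = { INCIDENT_NAME: "INCIDENT-42", timestamp: Date.now() };

// First write: the full alert JSON lands directly under _private,
// keyed by the incident name.
brain.set(alert.INCIDENT_NAME, JSON.stringify(alert));

// Second write: the incident name is appended to the separate
// VO_INCIDENT_KEYS list at the same level.
const keys = brain.get("VO_INCIDENT_KEYS") || [];
keys.push({ name: alert.INCIDENT_NAME, timestamp: alert.timestamp });
brain.set("VO_INCIDENT_KEYS", keys);
```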
The first call, above, adds the full alert JSON directly under `_private`, while the second call manipulates the list stored (at the same level) as `VO_INCIDENT_KEYS`.

This means that, by dropping the `@robot.brain.remove item.name` line, we rendered the bot incapable of cleaning up the full JSON of any alert it has received, while the list that was intended to be used to prune the larger body of data is being constantly purged (meaning that, short of manual manipulation, there is no way to remove those entries). As a result, the redis brain has the potential to get quite large over time, as no old entries will ever be removed.

Based on this, it looks like we really do need the `@robot.brain.remove` line, so I uncommented it, and now that we've got everything else going for us, I think we're in good shape.

I'm working on testing this at the moment to make sure that it really does work now, before I submit a PR. Tomorrow morning I'll be able to change the 'timeout' value from 1 day down to a few hours to see whether the test alert I just created gets dropped from the brain.
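Putting the pieces together, the corrected cleanup (the `.getTime()` comparison plus the restored remove call) can be sketched as follows. This is an illustrative JavaScript sketch, not the merged code: the `keepDays` parameter stands in for the `HUBOT_VICTOROPS_KEEP` idea from the commit summary, and the brain API is simplified.

```javascript
const DAY_MS = 86400000; // milliseconds in a day

// Sketch of the corrected cleanup: epoch-millisecond comparison via
// getTime(), and the restored remove call so the full alert JSON is
// actually deleted along with its key-list entry.
function cleanupBrain(brain, incidentKeys, keepDays) {
  const cutoff = new Date().getTime() - keepDays * DAY_MS;
  return incidentKeys.filter((item) => {
    const keep = new Date(item.timestamp).getTime() >= cutoff;
    if (!keep) {
      brain.remove(item.name); // drop the full JSON under _private too
    }
    return keep;
  });
}
```

With both sides of the comparison in epoch milliseconds, adding `DAY_MS` (or multiplying by `keepDays`) really does mean "one day later", and expired entries are removed from both places they were written.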