-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Worker lambda metrics #734
Conversation
- also add type dimension for harvester metric
cc @michaelwmcnamara @jacobwinch This is a first attempt at adding embedded metrics to the worker lambdas. The format of the The current state of this PR doesn't attempt to abstract the common functionality yet because I was faced with a strange error:
Did you come across this situation when creating your original PR? EDIT: I've tested again this morning with the same code and I now see the logs in kibana and the metrics in aws so maybe there was a temporal issue in the elk stack somewhere (or I was being impatient) |
"Timestamp" -> end.toEpochMilli, | ||
"CloudWatchMetrics" -> List(Map( | ||
"Namespace" -> s"Notifications/${env.stage}/workers", | ||
"Dimensions" -> List(List("platform")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@waisingyiu @frankie297
Dimensions allow us to view metrics in greater granularity. For the sender lambdas, I wonder whether it could be useful to see the metric by breakingNews/other? E.g. we could get metrics:
ios + breakingNews
ios + other
android + breakingNews
android + other
Or, maybe just knowing the processing time for android vs ios will be enough. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be useful when we need to look at the performance of a particular breaking news notification? We may show metrics for breaking news only, and we could associate the metric with a particular breaking news notification based on the time. Please ignore me if cloudwatch metric should not be used for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @waisingyiu :) If I recall correctly, providing the notificationId as a metric dimension would incur additional cost (ref: https://docs.google.com/document/d/10AnOZ4MLjuTO7mXaySoVO2SwmroLmns4gGmsfY7egmw/edit#heading=h.ajh00ba9hm68). I think if we needed to analyse the performance of a specific notificationId we'd need to rely on our logs+kibana dashboard (I could've misinterpreted this though!).
For the metrics, I was hoping to generate a minimum (static) subset of dimensions that would allow us to analyse performance/trends. Based on your response I think there would be value in providing an additional dimension for the sender lambda metrics:
platform
: ios or androidtype
: breakingNews or other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we needed to analyse the performance of a specific notificationId we'd need to rely on our logs+kibana dashboard (I could've misinterpreted this though!).
I think this is the correct approach 👍
I also agree that adding platform
and type
as dimensions would give us some extra benefits without increasing costs too much.
I don't remember facing this problem. We'd expect all Lambda logs to show up in ELK pretty quickly, certainly within a minute or two.
Let's keep an eye on things when this is merged and investigate further if it happens again! |
"harvester.notificationProcessingEndTime.string" -> end.toString, | ||
), "Finished processing notification event") | ||
) | ||
records.foreach { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment the harvester batchSize is 1. I think if we increased this then logging in the finally
block won't provide us with accurate metrics. Not something that I think I need to address now, just a consideration for the future
I think for now I'd like to:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your great work! I have just two queries:
-
Will the
app notifications
dashboard in ELK continue to work after this PR? -
I notice that the
startTime
come from the attributessentTimestamp
of the event record rather than being set as the start of the function. Does AWS populate this attribute, or the service upstream? What time exactly is it?
Happy to approve. Thanks David.
Thanks! The About the Hope that makes sense! |
What does this change?
This adds metrics for the worker lambdas (the harvester and the senders) to give us a longer-term view of the performance of these services.
The change has been deployed to CODE and the metrics have been added to the notifications dashboard:
Harvester:
^ dimension of
type
allows us to view processingTime metric separately forbreakingNews
notifications andother
types.Sender lambdas:
^ dimensions of
platform
andtype
allow us to view processingTime metrics for ios and android, separated further bybreakingNews
notifications orother
types.