Trace invalidation from start to finish #710
Conversation
View Vercel preview at instant-www-js-e2e-tracer-jsv.vercel.app.
Nice!
(when (instance? Throwable ret)
  (throw ret)))))
p (promise)]
(tracer/with-span! {:name "ws/send-json!"
Tracing the full time it takes to send the json to the client. Looking at honeycomb, the send lock is almost never locked when we get here.
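In case it helps to see the whole shape at once, the pattern from the diff looks roughly like this (a sketch; `send-json-async!`, `ws-conn`, and `json-str` are hypothetical stand-ins for the real send call):

```clojure
;; Sketch of the pattern in the diff: keep the span open until the async
;; send finishes, by blocking on a promise that the completion callback
;; delivers. `send-json-async!` is a hypothetical stand-in.
(let [p (promise)]
  (tracer/with-span! {:name "ws/send-json!"}
    (send-json-async! ws-conn json-str
                      ;; callback delivers :ok on success, the Throwable on error
                      (fn [ret] (deliver p ret)))
    (let [ret @p]
      ;; surface async failures on the calling thread so the span records them
      (when (instance? Throwable ret)
        (throw ret)))))
```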
Ah, perhaps it's because with grouped-queue, many of our messages are serialized per session. I wonder, if we get something like set-presence and transact at the same time, whether the lock will start to come into effect.
I think the reason we rarely wait on the lock is that calling Websockets/sendText is very quick--undertow is probably just putting a message onto a queue. All of the waiting happens between when we call sendText and when the complete callback is called.
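For reference, the Undertow boundary being described looks roughly like this (assumed interop usage, not code from this PR):

```clojure
(import '(io.undertow.websockets.core WebSockets WebSocketCallback WebSocketChannel))

;; WebSockets/sendText returns as soon as the message is queued; the
;; callback fires when the write actually completes, which is where the
;; waiting happens.
(defn send-text! [^WebSocketChannel channel ^String msg p]
  (WebSockets/sendText msg channel
                       (reify WebSocketCallback
                         (complete [_ _ch _ctx]
                           (deliver p :ok))
                         (onError [_ _ch _ctx t]
                           (deliver p t)))))
```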
Right now we don't have a great way of seeing the full lifecycle of a transaction, from when the user calls db.transact to the point where the clients subscribed to that query re-render with new data. Our best metric right now is manually visiting the examples page and checking/unchecking tasks.

This PR introduces end-to-end tracing that will allow us to track from when transact! finishes to when we finish delivering the final websocket :refresh message for the transaction. It's not quite the full latency that the user sees, but it's close.

We use the transaction id to tie all of the spans to a single trace. It's one thing that we can track across multiple machines and know it will be the same on all machines. We generate our own parent trace-id and span-id from the tx-id and then create parent spans with that trace-id. That allows us to have spans from different machines all share the same parent.
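As a sketch of what deriving the ids from the tx-id could look like (hypothetical helper names; the sizes come from the W3C trace-context format, where trace ids are 16 bytes and span ids 8):

```clojure
;; Hypothetical sketch: derive deterministic ids from the tx-id so every
;; machine computes the same parent trace without coordinating.
(defn tx-id->trace-id [tx-id]
  ;; 32 hex chars, zero-padded
  (format "%032x" tx-id))

(defn tx-id->parent-span-id [tx-id]
  ;; 16 hex chars, zero-padded
  (format "%016x" tx-id))

(comment
  (tx-id->trace-id 12345)        ;; => "00000000000000000000000000003039"
  (tx-id->parent-span-id 12345)) ;; => "0000000000003039"
```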
It produces a trace diagram that looks like this:
All of the spans are just single points because the parent spans have no way of knowing when the child spans complete.
We only log one out of every 10,000 transactions (configurable via instant-config), and we add an extra :entropy attr that will encourage refinery to forward our spans.
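A sketch of what that sampling decision could look like (hypothetical names; the real rate comes from instant-config):

```clojure
;; Hypothetical sketch: trace ~1 in 10,000 transactions, and attach a
;; high-cardinality :entropy attribute so refinery's sampler is more
;; likely to keep and forward the trace.
(def sample-rate 10000) ; stand-in for the instant-config value

(defn trace-tx? [tx-id]
  (zero? (mod tx-id sample-rate)))

(defn tx-trace-attrs [tx-id]
  (when (trace-tx? tx-id)
    {:tx-id   tx-id
     :entropy (rand-int Integer/MAX_VALUE)}))
```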