
Trace invalidation from start to finish #710

Merged: dwwoelfel merged 8 commits into main from e2e-tracer on Jan 14, 2025

Conversation

dwwoelfel (Contributor)

Right now we don't have a great way of seeing the full lifecycle of a transaction, from when the user calls db.transact to the point where the clients subscribed to the affected queries re-render with new data. Our best metric is manually visiting the examples page and checking/unchecking tasks.

This PR introduces end-to-end tracing that will allow us to track from when transact! finishes to when we finish delivering the final websocket :refresh message for the transaction. It's not quite the full latency that the user sees, but it's close.

We use the transaction id to tie all of the spans into a single trace. The tx-id is the one value we can track across multiple machines and know will be identical everywhere. We generate our own parent trace-id and span-id from the tx-id and then create parent spans with that trace-id, which lets spans emitted by different machines all share the same parent.
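As a rough sketch of that idea (the helper names and the zero-padding scheme below are illustrative assumptions, not necessarily what this PR ships): every machine that handles the transaction sees the same tx-id, so each one can derive identical ids locally with no coordination.

```clojure
;; Illustrative only: derive deterministic, OpenTelemetry-sized ids from the
;; tx-id. Trace ids are 32 hex chars (128 bits) and span ids are 16 hex chars
;; (64 bits), so zero-padding the tx-id gives every machine the same parent
;; trace-id and span-id for a given transaction.
(defn tx-id->trace-id [tx-id]
  (format "%032x" (long tx-id)))

(defn tx-id->parent-span-id [tx-id]
  (format "%016x" (long tx-id)))
```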

It produces a trace diagram that looks like this:

[image: trace diagram]

All of the spans are just single points because the parent spans have no way of knowing when the child spans complete.

We only log one out of every 10,000 transactions (configurable via instant-config), and we add an extra :entropy attr to encourage Refinery to forward our spans.
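A minimal sketch of what that could look like (the `sample-rate` var and helper names are placeholders; the real sample rate is read from instant-config):

```clojure
;; Illustrative only: trace roughly 1 in 10,000 transactions and tag the
;; spans with a random :entropy attribute.
(def sample-rate 10000)

(defn trace-tx? [tx-id]
  (zero? (mod tx-id sample-rate)))

(defn with-entropy [attrs]
  ;; A random, high-cardinality attribute makes each sampled trace look
  ;; unique to Refinery, encouraging its sampler to forward the spans.
  (assoc attrs :entropy (rand-int Integer/MAX_VALUE)))
```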


View Vercel preview at instant-www-js-e2e-tracer-jsv.vercel.app.

@stopachka (Contributor) left a comment

Nice!

```clojure
(when (instance? Throwable ret)
  (throw ret)))))
p (promise)]
(tracer/with-span! {:name "ws/send-json!"
```
dwwoelfel (Contributor, Author)

Tracing the full time it takes to send the JSON to the client. Looking at Honeycomb, the send lock is almost never locked when we get here.

stopachka (Contributor)

Ah, perhaps it's because with grouped-queue, many of our messages are serialized per session. I wonder whether the lock will start to come into effect if we get something like set-presence and transact at the same time.

dwwoelfel (Contributor, Author)

I think the reason we rarely wait on the lock is that calling WebSockets/sendText is very quick; Undertow is probably just putting the message onto a queue. All of the waiting happens between when we call sendText and when the complete callback is called.
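A minimal sketch of the send path being discussed, assuming Undertow's WebSockets/sendText with a completion callback and the tracer/with-span! macro from the excerpt above (the namespace, tracer alias, and timeout are assumptions, not the PR's exact code):

```clojure
(ns example.ws-send
  (:require [instant.util.tracer :as tracer]) ;; assumed alias for the tracer ns
  (:import (io.undertow.websockets.core WebSocketCallback WebSocketChannel WebSockets)))

(defn send-json! [^WebSocketChannel ws-channel ^String json-str]
  (let [p (promise)]
    (tracer/with-span! {:name "ws/send-json!"}
      ;; sendText returns almost immediately (Undertow queues the frame), so
      ;; the span mostly measures the gap between this call and the
      ;; completion callback firing.
      (WebSockets/sendText json-str
                           ws-channel
                           (reify WebSocketCallback
                             (complete [_ _channel _ctx]
                               (deliver p :sent))
                             (onError [_ _channel _ctx throwable]
                               (deliver p throwable))))
      (let [ret (deref p 5000 ::timeout)] ;; 5s timeout is an arbitrary choice
        (when (instance? Throwable ret)
          (throw ret))))))
```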

dwwoelfel merged commit 1f61651 into main on Jan 14, 2025. 27 checks passed.
dwwoelfel deleted the e2e-tracer branch on January 14, 2025 at 20:27.