
Access pattern observation in keyspace ("pagetrace") #10275

Open
jcsp opened this issue Jan 3, 2025 · 7 comments
Assignees
Labels
a/observability Area: related to observability c/storage/pageserver Component: storage: pageserver t/incident Issue type: incident in our service

Comments


jcsp commented Jan 3, 2025

In INC-362 we saw strong signals that the client (compute) was getting something wrong with caching: we suspect it is re-requesting the same data repeatedly, but can't prove it.

To diagnose issues like this, we need the ability to get a raw dump of the keys touched by getpage requests.

Candidate implementations:

  • Something that does a tcpdump and parses it
  • Something built into the pageserver that dumps out data for a specific tenant into some local file over some time period, for later retrieval and analysis.
@jcsp jcsp changed the title Access pattern observation in keyspace Access pattern observation in keyspace ("pagetrace") Jan 3, 2025
erikgrinaker commented

In past systems, we had an API endpoint that let us temporarily enable and output debug/trace logging at runtime for specific source code files, with regex filtering. So we could e.g. enable trace logging for the getpage handler and regex-filter by tenant/shard to dump keys for 30 seconds.

Might be a simple and general solution, if our logging/tracing library supports it.


jcsp commented Jan 3, 2025

Yeah, this should evolve into something with an API for toggling tracing per tenant (we may even have an issue for that somewhere). However, because we use Grafana for logs, and that doesn't cope well with passing around big dumps, if we want a dump of something like 100K keys to then visualize somehow, we'll probably need to output those some other way (or embrace some other system for recording results that works better than Loki).


jcsp commented Jan 3, 2025

Aside: my favorite one of these was EMC Isilon, where you could subscribe to performance metrics on a particular directory in a filesystem. Good times.


erikgrinaker commented Jan 3, 2025

Yeah, these debug events would be emitted via the API endpoint response as a stream, not via the regular log sink.

@erikgrinaker erikgrinaker self-assigned this Jan 6, 2025
@erikgrinaker erikgrinaker added c/storage/pageserver Component: storage: pageserver a/observability Area: related to observability t/incident Issue type: incident in our service labels Jan 6, 2025

erikgrinaker commented Jan 6, 2025

The tracing crate does indeed allow arbitrary subscriptions to the event stream. I propose we add an API route /trace which subscribes to the event stream and writes matching events to the response body. Example parameters:

  • level: log level to emit (default DEBUG?).
  • seconds: number of seconds to dump events for (default 30).
  • regex: regular expression filter.
  • file: filter events by source code file path.
  • field[<name>]: span field filter (e.g. field[tenant_id]=foo).

Wdyt?


jcsp commented Jan 6, 2025

I'm a little anxious about using trace+regex here: the overhead could be substantial, and we'll probably be using this in situations where we already have a performance problem.

I was thinking of something designed for minimum cost, like:

  • A piece of state on Tenant/Timeline that controls whether to trace (so for a non-traced tenant the overhead is just one load and one branch).
  • Record and output a very dense binary structure for getpage requests (e.g. a stream of records consisting of a 16-byte key, an 8-byte timestamp, and a 4-byte runtime).
  • A cap on the recording buffer to bound how much memory this can eat; e.g. 32 MB is enough to record about 1 million requests with such a dense encoding.
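The dense format sketched above works out to 28 bytes per record, so a 32 MB cap holds roughly 1.2 million records. A minimal stdlib-only sketch of such an encoding and a capped buffer might look like this (names and field meanings are illustrative, not actual pageserver code):

```rust
// Hypothetical layout: 16-byte key + 8-byte timestamp + 4-byte runtime.
const RECORD_SIZE: usize = 16 + 8 + 4; // 28 bytes per record
const BUFFER_CAP: usize = 32 * 1024 * 1024; // the 32 MB cap from the proposal

struct TraceRecord {
    key: [u8; 16],     // page key (assumed opaque here)
    timestamp_us: u64, // e.g. microseconds since trace start
    runtime_us: u32,   // request runtime
}

impl TraceRecord {
    fn encode(&self) -> [u8; RECORD_SIZE] {
        let mut buf = [0u8; RECORD_SIZE];
        buf[..16].copy_from_slice(&self.key);
        buf[16..24].copy_from_slice(&self.timestamp_us.to_be_bytes());
        buf[24..28].copy_from_slice(&self.runtime_us.to_be_bytes());
        buf
    }
}

/// Bounded trace buffer: appends records until the memory cap is hit.
struct TraceBuffer {
    data: Vec<u8>,
}

impl TraceBuffer {
    fn new() -> Self {
        Self { data: Vec::new() }
    }

    /// Returns false (dropping the record) once the cap would be exceeded.
    fn push(&mut self, rec: &TraceRecord) -> bool {
        if self.data.len() + RECORD_SIZE > BUFFER_CAP {
            return false;
        }
        self.data.extend_from_slice(&rec.encode());
        true
    }
}

fn main() {
    let mut buf = TraceBuffer::new();
    let rec = TraceRecord { key: [0xAB; 16], timestamp_us: 1_000, runtime_us: 250 };
    assert!(buf.push(&rec));
    // 32 MiB / 28 bytes = 1,198,372 records, matching the ~1 million estimate.
    println!("capacity in records: {}", BUFFER_CAP / RECORD_SIZE);
}
```

Fixed-width big-endian fields keep records trivially seekable (offset = index × 28) for later offline analysis.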

erikgrinaker commented

Discussed offline. The performance risks of a generalized tracing endpoint appear too big for us to ship something to production for debugging in a matter of days. We'll do the simple, performant thing for now: add an API endpoint that registers a fixed-size channel for a timeline, and emits compact binary data to the client via HTTP.
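The key property of a fixed-size channel here is that the request path never blocks on a slow trace consumer: the handler try-sends and simply drops records when the channel is full. A stdlib-only sketch of that behavior, assuming a hypothetical 28-byte record type (this is an illustration of the backpressure semantics, not the pageserver implementation):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // Fixed-size channel registered for a timeline; bounded to 2 records
    // here just to demonstrate the drop-when-full behavior.
    let (tx, rx) = sync_channel::<[u8; 28]>(2);

    // Producer side (the getpage handler): never blocks.
    for i in 0u8..4 {
        match tx.try_send([i; 28]) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => {
                // Channel full: drop the record rather than stall the request.
            }
            Err(TrySendError::Disconnected(_)) => break, // client went away
        }
    }

    // Consumer side (the HTTP response body): drains whatever made it through.
    drop(tx);
    let received: Vec<[u8; 28]> = rx.iter().collect();
    assert_eq!(received.len(), 2); // capacity 2, so two records survived
}
```

In production the consumer would stream the raw bytes out as the HTTP response body, and dropped records would ideally be counted so the analysis knows the trace is incomplete.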
