Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving performance of NewJWTSVID API #5801

Open
sorindumitru opened this issue Jan 21, 2025 · 0 comments
Open

Improving performance of NewJWTSVID API #5801

sorindumitru opened this issue Jan 21, 2025 · 0 comments
Labels
priority/backlog Issue is approved and in the backlog

Comments

@sorindumitru
Copy link
Collaborator

sorindumitru commented Jan 21, 2025

When an agent wants to fetch a JWT-SVID for a workload it uses the agent.v1.NewJWTSVID API. This does the following:

  • Fetches all authorized entries for the agent, which ends up cloning protobuf structures for them (see Prevent mutations by entry cache callers #3215 for the reason for that).
  • Put all of them in the map.
  • Verify if the entry id the agent requested is part of that map.

Most of this work is unnecessary. We don't need all authorized entries for the decission, we just need the entry if the agent is authorized for it.

I've done a small POC for fetching only the authorized entry id and this looks to behave better. I get spire-server CPU usage to drop by ~50%. Flame graphs also show that more time is spent doing useful work, signing SVIDs.

before:
Image
after:
Image

Fetching X509-SVIDs does the same things but there it makes more sense to do the pre-processing since it needs to sign a batch of SVIDs.

There's probably a bunch of improvement that can be made here:

  • Instead of getting all the authorized entries, only get the entries that the agent has requested. This could work for both JWT-SVID and X509-SVID
  • Instead of storing protobuf messages, store an intermediary type or at least return an intermediary type. should be faster than cloning protobuf messages. It might end up being a bigger change since we need to modify multiple places to use the new type.

Doing both will probably give us the best performance improvement, but even with just the first option we get a good improvement.

Happy to work on this if the request makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/backlog Issue is approved and in the backlog
Projects
None yet
Development

No branches or pull requests

2 participants