Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add service name and version to User-Agent header #509

Closed
thomasklinger1234 opened this issue Aug 4, 2021 · 14 comments · Fixed by #514
Closed

Add service name and version to User-Agent header #509

thomasklinger1234 opened this issue Aug 4, 2021 · 14 comments · Fixed by #514

Comments

@thomasklinger1234
Copy link

Is your feature request related to a problem?

We are providing APM servers for multiple teams as a service internally. When analyzing the request logs from our APM servers, we cannot identify clients/agents easily to troubleshoot the individual APM agent. We only get two informations:

  • User Agent
  • IP Address

which are hard to use when performing analytics without access to the IP ranges.

It would be super helpful to optionally be able to customize the user agent header that the agent is sending (currently elasticapm-java/1.23.0 or similar).

Describe the solution you'd like

Add a new optional property to enable customization of the user agent. When sending data to APM server, the agent will now use the header elasticapm-java/1.23.0/{my-identifier}. The optional part could be either inferred automatically, e.g. by using an existing property like service_name or be explicitly set in the configuration file with the property reporter_user_agent.

Describe alternatives you've considered

None.

Additional context

Here is a screenshot of our analytics tools that we use for getting insights into the requests that we receive at the APM server. As you can see, we can only distinguish on the IP range which is not useful as our clients use NAT gateways and other proxies which makes the IP not helpful.

Screenshot 2021-08-04 at 17 30 17

@thomasklinger1234
Copy link
Author

@eyalkoren
Copy link
Contributor

Sounds reasonable and quite easy to do, only thing to consider is potential abuse.
Do you use only Java agents?

@thomasklinger1234
Copy link
Author

Sounds reasonable and quite easy to do, only thing to consider is potential abuse.

Could you be so kind as to explain the abuse potential? The security implications would also be interesting to us when we are instructing our clients to use this feature.

Do you use only Java agents?

Mostly Java. Which is why I opened the feature request here at first. We also have go, node.js and Python agents running.

@eyalkoren
Copy link
Contributor

Could you be so kind as to explain the abuse potential? The security implications would also be interesting to us when we are instructing our clients to use this feature.

I can't see such, but it's the one thing that comes in mind that worth consideration.

Mostly Java. Which is why I opened the feature request here at first. We also have go, node.js and Python agents running.

Right, in which case we may consider moving it to the https://github.com/elastic/apm repo.
@felixbarny WDYT?

@felixbarny
Copy link
Member

It seems useful to me, agree that it should be something that is specified consistently across agents. I think it is fine, though, if not all agents implement that feature.

@thomasklinger1234 is it important that the identifier is configurable? Or is it enough for the agent to just add the service name?

Or do you need to be able to identify a particular instance of an agent? If yes, what would be the best identifier for that? The hostname? The service.node.name? The ephemeral id (UUID) of the agent?

@felixbarny
Copy link
Member

Maybe a good default could be elasticapm-$language/$agent.version $service.name/$service.version.

@thomasklinger1234
Copy link
Author

It seems useful to me, agree that it should be something that is specified consistently across agents. I think it is fine, though, if not all agents implement that feature.

@thomasklinger1234 is it important that the identifier is configurable? Or is it enough for the agent to just add the service name?

Or do you need to be able to identify a particular instance of an agent? If yes, what would be the best identifier for that? The hostname? The service.node.name? The ephemeral id (UUID) of the agent?

@felixbarny For starters, I would be totally happy if the service name or any "human-readable" identifier would be sent that tells us what client sends the data. For our particular use case, the hostname or the UUID would not be useful because reverse-engineering that without knowlege about our client's environment is infeasible.

Your suggestion elasticapm-$language/$agent.version $service.name/$service.version sounds awesome!

@thomasklinger1234
Copy link
Author

Hey @felixbarny, any news on this? I may be also able to implement this and create a PR if needed :)

@felixbarny
Copy link
Member

Hey @thomasklinger1234, no news here. The next step would be to create a cross-agent spec for that, to make the header consistent across all agents.

This is the file that needs to be modified to describe the new behavior: https://github.com/elastic/apm/blob/master/specs/agents/transport.md

Also, there were some discussions in the past about the user-agent: #88

@felixbarny felixbarny changed the title Make agent user-agent customizable Add service name and version to User-Agent header Sep 21, 2021
@felixbarny felixbarny transferred this issue from elastic/apm-agent-java Sep 21, 2021
@alex-fedotyev
Copy link

Hi @thomasklinger1234 !

Since there were no alternatives proposed, I wonder what you think about following approach.
Today when transaction, span, error or metric documents are going through APM server, we already save several fields which could be useful for your use case.

  • service.name
  • service.version
  • agent.name
  • agent.version
  • observer.name (APM server instance)
  • observer.host (APM server host)

And several others, like host.ip or container.id, etc.

This could allow putting together dashboards to analyze which services are sending more transactions by volume as well as analyzing individual load by various APM servers (in case you provision separate APM server for separate teams).

Below is a sample dashboard I created using Kibana Lens.
image

What do you think?

@SylvainJuge
Copy link
Member

Hi @thomasklinger1234 ,

I've just opened a PR to implement the shared specification on all agents : #514

Once this is properly reviewed and merged, we can start doing the implementation on the Java agent side.

@thomasklinger1234
Copy link
Author

@SylvainJuge @alex-fedotyev thanks for your efforts, I have been busy the last time and could not respond so sorry for that. Awesome work from both of you, that looks very helpful for us. Excited to see the PR being merged and the implementation as well :)

@trentm
Copy link
Member

trentm commented Oct 27, 2021

Just mentioning a couple possible concerns. I don't think we need to deal with these in the spec, at least not right now.

Need we guard against non-ascii chars in service.version? The APM intake spec requires that service.name matches ^[a-zA-Z0-9 _-]+$, so that is fine. The spec says nothing about the allowed range of chars in service.version. Technically that could make it possible for a weird service.version to break the RFC spec for User-Agent: https://datatracker.ietf.org/doc/html/rfc7231#appendix-D My poor read of the ABNF production User-Agent = product *( RWS ( product / comment ) ) is that things only get problematic if service.version includes parentheses, \, and likely gets "interesting" if it includes non-ASCII characters. I suppose we are also relying on header encoding in each of the languages to guard against newline characters in service.version playing games with header injection attacks.

Another potential concern is length. Both service.name and service.version can be up to 1024 chars in length. That could potentially have impact on the size of the header block in requests.

@trentm
Copy link
Member

trentm commented Oct 28, 2021

Yah, I will probably do some sanitization of serviceVersion for the Node.js APM agent. With:

const apm = require('.').start({
  serviceName: 'foo',
  serviceVersion: 'bar\nblah'
})

apm.startTransaction('t')
apm.endTransaction()

I get this error from Node's http lib when attempting the intake request:

% node foo.js
node:_http_outgoing:570
    throw new ERR_INVALID_CHAR('header content', name);
    ^

TypeError [ERR_INVALID_CHAR]: Invalid character in header content ["User-Agent"]
    at ClientRequest.setHeader (node:_http_outgoing:579:3)
    at new ClientRequest (node:_http_client:256:14)
    at request (node:http:96:10)
    at Client.get [as _transportGet] (node:http:107:15)
    at Client._pollConfig (/Users/trentm/el/apm-nodejs-http-client2/index.js:307:20)
    at new Client (/Users/trentm/el/apm-nodejs-http-client2/index.js:164:10)
    at Config.httpTransport [as transport] (/Users/trentm/el/apm-agent-nodejs13/lib/config.js:282:25)
    at Agent.start (/Users/trentm/el/apm-agent-nodejs13/lib/agent.js:256:32)
    at Object.<anonymous> (/Users/trentm/el/apm-agent-nodejs13/foo.js:1:26)
    at Module._compile (node:internal/modules/cjs/loader:1101:14) {
  code: 'ERR_INVALID_CHAR'
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants