ISSUE-2674 Strip sensitive data from the url #6417
…ttributesExtractor
Hey @MALPI ,
hey @mateuszrzeszutek, thanks for reaching out. We would need to sign a Company CLA. Could you provide us a PDF or something like that?
I honestly have no idea where to get it from. Maybe you can try the "please submit a support request ticket" link in the CLA bot message above, or ask around in the CNCF #opentelemetry Slack.
I am trying to sort this out with the legal department, @mateuszrzeszutek. Is there anything that I can improve on the PR itself in the meantime?
...va/io/opentelemetry/instrumentation/api/instrumenter/http/HttpClientAttributesExtractor.java
    return url;
  }
  // replace username & password
  return url.replaceFirst("(?<=\\/\\/)(.+)(?=@)", "username:password");
Regex pattern matching is rather expensive; even more so when you're not using a precompiled `Pattern`.
How about using `indexOf()` to find out the positions of `//` and `@` instead?
Also, you could just simply remove the credentials (together with the `@` character); this would be more in line with the spec:
> `http.url` MUST NOT contain credentials passed via URL in form of `https://username:password@www.example.com/`. In such case the attribute's value should be `https://www.example.com/`.
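The two options raised in this comment can be put side by side in a minimal sketch. The class and method names (`CredentialStripSketch`, `stripWithRegex`, `stripWithIndexOf`) are hypothetical and not taken from the PR, the regex is a simplified stand-in for the one under review, and both methods remove the credentials together with `@` as the spec quote suggests. The `indexOf` variant is deliberately naive: it does not check whether `@` sits in the path or query.

```java
import java.util.regex.Pattern;

final class CredentialStripSketch {

  // Compiled once and reused; String.replaceFirst(regex, ...) recompiles the regex on every call.
  private static final Pattern USER_INFO = Pattern.compile("(?<=//)[^/?#@]*@");

  // Regex variant: drops "user:password@" when it directly follows "//".
  static String stripWithRegex(String url) {
    return USER_INFO.matcher(url).replaceFirst("");
  }

  // indexOf variant: no regex engine involved, only two linear scans.
  // Naive on purpose: any '@' after "//" is treated as the credentials delimiter.
  static String stripWithIndexOf(String url) {
    int slashes = url.indexOf("//");
    int at = url.indexOf('@');
    if (slashes == -1 || at < slashes) {
      return url; // no "//" or no '@' after it: nothing to strip
    }
    return url.substring(0, slashes + 2) + url.substring(at + 1);
  }
}
```

For a URL without credentials, `stripWithIndexOf` returns the input string itself, so the common case allocates nothing.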
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java Co-authored-by: Mateusz Rzeszutek <[email protected]>
… potential performance issues.
… contains an at character.
  if(url.contains("@")) {
    return url.substring(0, url.indexOf("//")+2) + url.substring(url.indexOf("@")+1);
Nit: no need to look up `@` twice:
- if(url.contains("@")) {
-   return url.substring(0, url.indexOf("//")+2) + url.substring(url.indexOf("@")+1);
+ int atIndex = url.indexOf("@");
+ if (atIndex != -1) {
+   return url.substring(0, url.indexOf("//")+2) + url.substring(atIndex+1);
@MALPI can you run |
… run indexOf twice
LGTM 👍
Thanks @MALPI !
  int atIndex = url.indexOf("@");
  // replace username & password
  if (atIndex != -1) {
    return url.substring(0, url.indexOf("//") + 2) + url.substring(atIndex + 1);
What if the url is `http://somehost?foo=b@r` or `http://somehost/p@ge`? I believe you should check that `@` occurs before any of `/?#` and strip only then. Secondly, can we be certain that the url actually contains `//`?
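Both concerns can be illustrated with a small sketch: locate the end of the authority (the first `/`, `?` or `#` after `//`) and treat an `@` as the credentials delimiter only if it occurs before that point. `AuthorityBoundedStrip` and `stripCredentials` are hypothetical names, not the PR's code, and the sketch assumes that a URL without `scheme://` should be returned unchanged and that credentials are removed rather than replaced.

```java
final class AuthorityBoundedStrip {

  static String stripCredentials(String url) {
    int schemeEnd = url.indexOf(':');
    // Assumption: without "scheme://" there is no authority, hence nothing to strip.
    if (schemeEnd == -1 || !url.startsWith("//", schemeEnd + 1)) {
      return url;
    }
    int authorityStart = schemeEnd + 3;

    // The authority ends at the first '/', '?' or '#', or at the end of the string.
    int authorityEnd = url.length();
    for (char delimiter : new char[] {'/', '?', '#'}) {
      int i = url.indexOf(delimiter, authorityStart);
      if (i != -1 && i < authorityEnd) {
        authorityEnd = i;
      }
    }

    // Only an '@' inside the authority separates credentials from the host.
    int at = url.lastIndexOf('@', authorityEnd - 1);
    if (at < authorityStart) {
      return url;
    }
    return url.substring(0, authorityStart) + url.substring(at + 1);
  }
}
```

With this bound in place, `http://somehost?foo=b@r` and `http://somehost/p@ge` pass through untouched, while `https://username:password@www.example.com/` loses its credentials.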
Good point, will add the check for query parameters. I know that basic auth works with FTP as well, but wouldn't that require a protocol prefix too? I could add a check for `//` before `@`?
hey @MALPI, here's a similar url parser I wrote recently to pull out the "authority" part of a url, in case it helps with inspiration: https://github.com/microsoft/ApplicationInsights-Java/blob/026888fcaa5ba2e0d6a15c06bb68a8862d8852bd/agent/azure-monitor-exporter/src/main/java/com/azure/monitor/opentelemetry/exporter/implementation/utils/UrlParser.java#L10-L16
Lauri pointed out a scenario that needs to be implemented
@laurit @mateuszrzeszutek @trask Can you have another look?
...o/opentelemetry/instrumentation/api/instrumenter/http/HttpClientAttributesExtractorTest.java
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java Co-authored-by: Trask Stalnaker <[email protected]>
hey @trask @mateuszrzeszutek anything else?
  }

  int atIndex = url.lastIndexOf("@", index - 1);
  int questionMarkIndex = url.indexOf("?");
Is `questionMarkIndex` really needed?
  int questionMarkIndex = url.indexOf("?");

  // '@' char is present and is not a query param
  if (atIndex == -1 || (questionMarkIndex != -1 && atIndex > index)) {
I think you should also check whether `@` is the last character and, if it is, return the url as is. Try with `https://github.com@`: `java.net.URI` parses it so that `github.com@` is the authority, not userinfo.
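The `java.net.URI` behaviour described here can be checked with a short sketch. `TrailingAtDemo` and its helper methods are hypothetical names introduced only for illustration; the calls used (`URI.create`, `getAuthority`, `getUserInfo`) are standard JDK API.

```java
import java.net.URI;

final class TrailingAtDemo {

  // The authority component as java.net.URI parses it, or null if absent.
  static String authorityOf(String url) {
    return URI.create(url).getAuthority();
  }

  // The user-info component as java.net.URI parses it, or null if absent.
  static String userInfoOf(String url) {
    return URI.create(url).getUserInfo();
  }

  public static void main(String[] args) {
    String[] samples = {"https://user:secret@github.com", "https://github.com@"};
    for (String url : samples) {
      System.out.println(
          url + " -> authority=" + authorityOf(url) + ", userInfo=" + userInfoOf(url));
    }
  }
}
```

For the second sample the user-info comes back as `null` while the authority keeps the trailing `@`, which is why stripping everything up to `@` would wrongly turn `https://github.com@` into `https://`.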
  int questionMarkIndex = url.indexOf("?");

  // '@' char is present and is not a query param
  if (atIndex == -1 || (questionMarkIndex != -1 && atIndex > index)) {
`atIndex > index` will always be false, because `lastIndexOf("@", index - 1)` never returns a position past `index - 1`. As Lauri mentioned, you don't need that `questionMarkIndex` variable.
Sorry, I was on vacation, I didn't check GH last week at all.
…is last value of string instead.
All good. Hope you enjoyed your vacation. To be honest, this PR got slightly out of hand and I am wondering if it was really the smartest decision to change the implementation from the regex I had in the beginning to this. We now have to exclude all the cases where an
Unfortunately, regex patterns can be quite slow; and since this code runs on the hot path of every HTTP client call, it needs to be as fast as possible and create as little garbage as possible. While manually iterating over the string can be inconvenient, it's faster.
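As a rough illustration of that trade-off, here is one way the manual iteration could look: a single pass over the authority that remembers the last `@`, allocates a new string only when something is actually stripped, and leaves a trailing `@` alone as discussed earlier in the thread. `SinglePassStrip` is a hypothetical name and this is not the merged implementation; removing the credentials outright, and treating an `@` at the end of the authority (not only at the end of the whole URL) as "leave untouched", are assumptions of this sketch.

```java
final class SinglePassStrip {

  static String stripCredentials(String url) {
    int schemeEnd = url.indexOf(':');
    // Assumption: only "scheme://..." URLs carry an authority worth inspecting.
    if (schemeEnd == -1 || !url.startsWith("//", schemeEnd + 1)) {
      return url;
    }
    int authorityStart = schemeEnd + 3;

    // One pass: stop at the first path/query/fragment delimiter, remember the last '@'.
    int lastAt = -1;
    int i = authorityStart;
    for (; i < url.length(); i++) {
      char c = url.charAt(i);
      if (c == '/' || c == '?' || c == '#') {
        break;
      }
      if (c == '@') {
        lastAt = i;
      }
    }

    // i is now the end of the authority. Return the same instance (no garbage) when
    // there is no '@', or when '@' is the last authority character (https://github.com@).
    if (lastAt == -1 || lastAt == i - 1) {
      return url;
    }
    return url.substring(0, authorityStart) + url.substring(lastAt + 1);
  }
}
```

Returning the original `String` instance in the no-op case matters on a hot path, since most URLs carry no credentials at all.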
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java Co-authored-by: Mateusz Rzeszutek <[email protected]>
# Conflicts: # instrumentation-api-semconv/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/http/HttpClientAttributesExtractor.java
Thanks @MALPI ! 🚀
Fixes open-telemetry#2674 by replacing basic auth information as part of the URL with `username:password`. Co-authored-by: Malte <[email protected]> Co-authored-by: Mateusz Rzeszutek <[email protected]> Co-authored-by: Trask Stalnaker <[email protected]>
Fixes #2674 by replacing basic auth information as part of the URL with `username:password`.