Instrumentation raises uncaught exception on non utf-8 request header sequences #1478
Comments
Hi! value.decode("latin1")
'Mozilla/5.0 (Linux; Android 8.0.0; Moto Z² FORCE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.126 Mobile Safari/537.36 OPR/72.5.3767.693' User-agent header value: |
@srikanthccv Hi! |
I may be wrong, but I think the encoding assumption was made based on the ASGI specification. |
@srikanthccv hi! Any news? |
I couldn't find anything related to the encoding assumption I mentioned previously. I am not an expert with encoding; how does |
User agent header has a substring Examples:
>>> b"Moto Z\xb2".decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 6: invalid start byte
>>> b"Moto Z\xb2".decode("latin1")
'Moto Z²'
I also found this one on Stack Overflow and it works too.
>>> b"Moto Z\xb2".decode("unicode-escape")
'Moto Z²' |
I think that, regardless of the byte stream encoding, if the requests didn't blow up without this middleware, then they shouldn't blow up with it. I don't think it's correct for this middleware to enforce a stricter header rule, even if the existing one is more permissive than it should be. Would it be sensible to either not decode the headers, or try-catch the decode and pass through, or try more encodings? |
This is one of the general principles of the project. However, I would recommend fixing the encoding issue. I am fine with either approach: (1) catch and log the exception, or (2) try encoding(s) that do not produce issues. |
So maybe use "unicode-escape" as a fallback? |
@srikanthccv ping |
That sounds good to me. We can probably have both fallback and try-catch pass through. |
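The combination agreed on above (fallback encoding plus try-catch pass-through) could look roughly like the sketch below. This is only an illustration, not the code from the eventual PR: `decode_header_value` is a hypothetical helper name, and returning `None` to mean "skip this header" is an assumption about how the caller would pass through.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def decode_header_value(raw: bytes) -> Optional[str]:
    """Hypothetical helper: decode a raw header value without ever raising.

    Tries UTF-8 first, then the "unicode-escape" fallback suggested in this
    thread. Returns None when neither works, so the caller can skip the
    header instead of failing the request.
    """
    for encoding in ("utf-8", "unicode-escape"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Pass-through: log and give up on this header only.
    logger.warning("Could not decode header value %r; skipping it", raw)
    return None


print(decode_header_value(b"Moto Z\xb2"))
```

One caveat with this choice: "unicode-escape" also interprets backslash sequences in the bytes, so a fallback such as "latin1", which maps every byte to a character and never fails, may be the safer option.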
Great! I will make a PR this week. |
Is this PR stalled? Is there anything that someone (or I) can do to move this PR forward? |
Hi, @thomasleveil! Yes, it is 😞 If somebody can improve this PR, that would be great! |
@nkhitrov I tried your branch and, while tox fails in its current state, if you rebase your branch on origin/master, then |
I ran across similar encoding issues previously, and in my search for answers found a relevant S.O. question, which I answered once I'd figured things out: https://stackoverflow.com/a/2090224/253599 One of the linked references there specifically mentions that the default HTTP encoding according to RFC2616 was meant to be
In practice, I guess this might all boil down to "if the content encoding is ambiguous, refuse the temptation to guess". Is it feasible to fail gracefully when the |
@tysonclugg RFC 2616, section 3.7.1 applies to the body of HTTP messages. My understanding is that the HTTP header format is described in RFC 2616, section 4.2 Message Headers :
RFC 822, section 3.1.2 STRUCTURE OF HEADER FIELDS :
|
@thomasleveil I think you missed the point. There is likely a bug in some other software that results in the condition that causes the error reported here (ie: the data wasn't encoded correctly). But that condition should not result in OpenTelemetry taking out the ASGI service in question, regardless of how many RFCs have been violated. Our task is to instrument the service in question, even when it does the wrong thing. Some would even argue that we are required to do this at all times, especially when things go wrong. That will likely mean we need to fail gracefully, rather than blowing up when dealing with incorrectly encoded data. Have we considered using |
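For illustration only (this is not necessarily what the comment above was about to suggest), Python's `bytes.decode` accepts an `errors` argument that is one way to fail gracefully on incorrectly encoded data. A minimal sketch using the byte sequence from this thread:

```python
raw = b"Moto Z\xb2"  # byte sequence from the failing User-Agent header

# Substitute U+FFFD (the replacement character) for undecodable bytes.
replaced = raw.decode("utf-8", errors="replace")

# Keep undecodable bytes visible as a literal backslash escape.
escaped = raw.decode("utf-8", errors="backslashreplace")

# Smuggle undecodable bytes through as lone surrogates so the original
# bytes can be recovered later with the same error handler.
round_trip = raw.decode("utf-8", errors="surrogateescape")
recovered = round_trip.encode("utf-8", errors="surrogateescape")

print(repr(replaced), repr(escaped), recovered == raw)
```

All three handlers avoid the `UnicodeDecodeError`; they differ in whether the original bytes stay recoverable.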
Describe your environment
Poetry (version 1.2.2)
Python 3.10.4
opentelemetry-instrumentation-fastapi = ">=0.32b0,<1.0dev"
Steps to reproduce
Send an HTTP request that contains non-UTF-8 sequences in its request headers to a Python ASGI application instrumented with opentelemetry-instrumentation-fastapi.
What is the expected behavior?
The REST request does not fail without the instrumentation, and thus should not fail with the instrumentation either.
What is the actual behavior?
Additional context
Issue happens in this line, which assumes utf-8 encoding:
opentelemetry-python-contrib/instrumentation/opentelemetry-instrumentation-asgi/src/opentelemetry/instrumentation/asgi/__init__.py
Line 346 in bc57cc0
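The failure mode can be sketched without a running server. ASGI passes headers as pairs of byte strings, so any code that decodes them as UTF-8 raises on a byte such as 0xb2. The snippet below is a hypothetical stand-in, not the library's actual code: `collect_headers_strict` and the shortened header value are assumptions made for illustration.

```python
# Minimal ASGI-style scope with a header value that is not valid UTF-8.
scope = {
    "type": "http",
    "headers": [
        (b"user-agent", b"Mozilla/5.0 (Linux; Android 8.0.0; Moto Z\xb2 FORCE)"),
    ],
}


def collect_headers_strict(scope):
    """Hypothetical stand-in for header collection that assumes UTF-8."""
    return {
        name.decode("utf8"): value.decode("utf8")
        for name, value in scope["headers"]
    }


failure = None
try:
    collect_headers_strict(scope)
except UnicodeDecodeError as exc:
    # Without a guard, this exception would propagate out of the middleware.
    failure = exc

print(type(failure).__name__)
```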