Instrumentation raises uncaught exception on non utf-8 request header sequences #1478

marianogappa · 2022-12-07T14:23:28Z

Describe your environment
Poetry (version 1.2.2)
Python 3.10.4
opentelemetry-instrumentation-fastapi = ">=0.32b0,<1.0dev"

Steps to reproduce
Send an HTTP request to a Python ASGI application with opentelemetry-instrumentation-fastapi instrumentation, which contains non utf-8 sequences in its request headers.

What is the expected behavior?
REST request does not fail without the instrumentation, and thus should not fail with the instrumentation either.

What is the actual behavior?

ERROR 2022-12-07 00:47:10,938 run_asgi Exception in ASGI application
 
Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 560, in __call__
    collect_custom_request_headers_attributes(scope)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 345, in collect_custom_request_headers_attributes
    headers = {
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 346, in <dictcomp>
    _key.decode("utf8"): _value.decode("utf8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 0: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 372, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/routing.py", line 443, in handle
    await self.app(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/root/.cache/pypoetry/virtualenvs/--REDACTED--/lib/python3.10/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 584, in __call__
    duration = max(round((default_timer() - start) * 1000), 0)
UnboundLocalError: local variable 'start' referenced before assignment

Additional context
Issue happens in this line, which assumes utf-8 encoding:

opentelemetry-python-contrib/instrumentation/opentelemetry-instrumentation-asgi/src/opentelemetry/instrumentation/asgi/__init__.py

Line 346 in bc57cc0

_key.decode("utf8"): _value.decode("utf8")

The text was updated successfully, but these errors were encountered:

nkhitrov · 2023-01-17T02:24:43Z

Hi!
I have the same error. Maybe should add fallback to decode with latin1?

value.decode("latin1")
'Mozilla/5.0 (Linux; Android 8.0.0; Moto Z² FORCE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.126 Mobile Safari/537.36 OPR/72.5.3767.693'

User-agent header value: b'Mozilla/5.0 (Linux; Android 8.0.0; Moto Z\xb2 FORCE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.126 Mobile Safari/537.36 OPR/72.5.3767.69342'

nkhitrov · 2023-02-13T14:21:02Z

@srikanthccv Hi!
What you think about fallback to latin1 encoding? I can create PR with fix

srikanthccv · 2023-02-13T15:05:42Z

I maybe wrong, but I think the encoding assumption was made based on ASGI specification.

nkhitrov · 2023-02-13T15:25:53Z

I did not find any useful information about encoding in the asgi spec. But I found usage of latin-1 for headers in uvicorn and starlette

nkhitrov · 2023-03-01T14:18:23Z

@srikanthccv hi! Any news?

srikanthccv · 2023-03-02T02:00:09Z

I couldn't find anything related to encoding assumption I mentioned previously. I am not not an expert with encoding; how does latin-1 fix the encoding issues?

nkhitrov · 2023-03-02T13:09:25Z

User agent header has a substring Z². Web frameworks can use utf-8 and latin-1 as I know. Symbol ² (\xb2 as bytes) can not be decoded with utf-8. So the first thing I decided to do was to check latin-1 encoding.

Examples:

>>> b"Moto Z\xb2".decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 6: invalid start byte

>>>b"Moto Z\xb2".decode("latin1")
'Moto Z²'

I also found this one on stackoverflow and it works too.

b"Moto Z\xb2".decode("unicode-escape")
'Moto Z²'

marianogappa · 2023-03-02T13:32:24Z

I think regardless of the byte stream encoding, if the requests didn't blow up without this middleware, then they shouldn't blow up with it. I don't think it's correct that this middleware enforces a stricter header rule, even if the existing one is more permissive than it should.

Would it be sensible to either not decode them, or try-catch the decode and pass-through, or try more encodings?

srikanthccv · 2023-03-02T14:10:54Z

try-catch the decode and pass-through

This is one of the general principles of the project. However, I would recommend fixing the encoding issue.

I am fine with either approach 1. catch and log exception 2. encoding(s) that do not produce issues.

nkhitrov · 2023-03-02T15:23:58Z

encoding(s) that do not produce issues.

So may be use "unicode-escape" as fallback?

nkhitrov · 2023-03-06T12:40:04Z

@srikanthccv ping

srikanthccv · 2023-03-06T14:56:48Z

That sounds good to me. We can probably have both fallback and try-catch pass through.

nkhitrov · 2023-03-06T14:59:29Z

Great! I will make PR this week

thomasleveil · 2023-07-10T18:14:18Z

Is this PR stalled ? Is there anything one (or me) can do to improve the state of this PR ?

nkhitrov · 2023-07-12T16:20:32Z

Hi, @thomasleveil !

Yes, it is 😞
I don't have time to deal with tox issues right now. When I try to run tests, it starts putting dependencies and goes into an infinite loop (maybe not infinite, but it happens for a very long time)

If somebody can improve this PR, it will be great!

thomasleveil · 2023-07-12T17:31:59Z

@nkhitrov I tried your branch and while in its current state, tox fails, if you rebase your branch on origin/master, then tox -e py311-test-instrumentation-asgi pass successfully

see open-telemetry#1478 open-telemetry#1814

tysonclugg · 2023-07-24T04:15:09Z

I run across similar encoding issues previously, and in my search for answers found a relevant S.O. question, which I answered once I'd figured things out: https://stackoverflow.com/a/2090224/253599

From one of the linked references there is specific mention that the default HTTP encoding according to RFC2616 was meant to be ISO-8859-1, not UTF-8:

The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default character encoding when the "charset" parameter is absent from the "Content-Type" header field. In practice, this recommendation has proved useless because some servers don't allow a "charset" parameter to be sent, and others may not be configured to send the parameter. Therefore, user agents must not assume any default value for the "charset" parameter.

In practice, I guess this might all boil down to "if the content encoding is ambiguous, refuse the temptation to guess".

Is it feasible to fail gracefully when the Content-Type header is missing (or just plain wrong) and handle random bytes? Would doing so make the path forward for this issue clear?

thomasleveil · 2023-07-25T13:11:33Z

@tysonclugg RFC 2616, section 3.7.1 applies to body of HTTP messages. My undestanding is that HTTP header format is describled in RFC 2616, section 4.2 Message Headers :

HTTP header fields, which include general-header (section 4.5),
request-header (section 5.3), response-header (section 6.2), and
entity-header (section 7.1) fields, follow the same generic format as
that given in Section 3.1 of RFC 822 [9].

RFC 822, section 3.1.2 STRUCTURE OF HEADER FIELDS :

Once a field has been unfolded, it may be viewed as being com-
posed of a field-name followed by a colon (":"), followed by a
field-body, and terminated by a carriage-return/line-feed.
The field-name must be composed of printable ASCII characters
(i.e., characters that have values between 33. and 126.,
decimal, except colon). The field-body may be composed of any
ASCII characters, except CR or LF.

tysonclugg · 2023-07-27T12:58:29Z

@thomasleveil I think you missed the point.

There is likely a bug in some other software that results in the condition that causes the error reported here (ie: the data wasn't encoded correctly). But that condition should not result in OpenTelemetry taking out the ASGI service in question, regardless of how many RFCs have been violated.

Our task is to instrument the service in question, even when it does the wrong thing. Some would even argue that we are required to do this at all times, especially when things go wrong. That will likely mean we need to fail gracefully, rather than blowing up when dealing with incorrectly encoded data.

Have we considered using _value.decode("utf-8", errors="surrogateescape")?

marianogappa added the bug Something isn't working label Dec 7, 2022

srikanthccv added help wanted Extra attention is needed instrumentation labels Dec 7, 2022

ncbraner mentioned this issue Dec 19, 2022

Update keys() to be able to iterate over a dict by updating the loop … #1492

Open

2 tasks

mike-baker-on added a commit to mike-baker-on/opentelemetry-python-contrib that referenced this issue Jan 6, 2023

Fix UnboundLocalError in asgi handler (open-telemetry#1478)

d48c7f2

mike-baker-on mentioned this issue Jan 6, 2023

Fix UnboundLocalError in asgi handler #1564

Draft

4 tasks

nkhitrov mentioned this issue Mar 7, 2023

WIP: Add latin-1 fallback decoding for asgi headers #1713

Draft

11 tasks

srikanthccv mentioned this issue May 20, 2023

[asgi] Headers with special characters cause the request to fail #1814

Closed

thomasleveil mentioned this issue Jul 10, 2023

[ASGI] UnboundLocalError local variable 'start' referenced before assignment #1883

Closed

thomasleveil added a commit to thomasleveil/opentelemetry-python-contrib that referenced this issue Jul 12, 2023

[asgi] add test for header decoding bug

4200318

see open-telemetry#1478 open-telemetry#1814

emdneto mentioned this issue Aug 30, 2024

Add fallback decoding for asgi headers #2837

Merged

6 tasks

xrmx closed this as completed in #2837 Sep 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instrumentation raises uncaught exception on non utf-8 request header sequences #1478

Instrumentation raises uncaught exception on non utf-8 request header sequences #1478

marianogappa commented Dec 7, 2022

nkhitrov commented Jan 17, 2023

nkhitrov commented Feb 13, 2023

srikanthccv commented Feb 13, 2023

nkhitrov commented Feb 13, 2023

nkhitrov commented Mar 1, 2023

srikanthccv commented Mar 2, 2023

nkhitrov commented Mar 2, 2023

marianogappa commented Mar 2, 2023

srikanthccv commented Mar 2, 2023

nkhitrov commented Mar 2, 2023

nkhitrov commented Mar 6, 2023

srikanthccv commented Mar 6, 2023

nkhitrov commented Mar 6, 2023

thomasleveil commented Jul 10, 2023

nkhitrov commented Jul 12, 2023

thomasleveil commented Jul 12, 2023

tysonclugg commented Jul 24, 2023

thomasleveil commented Jul 25, 2023

tysonclugg commented Jul 27, 2023 •

edited

Loading

Instrumentation raises uncaught exception on non utf-8 request header sequences #1478

Instrumentation raises uncaught exception on non utf-8 request header sequences #1478

Comments

marianogappa commented Dec 7, 2022

nkhitrov commented Jan 17, 2023

nkhitrov commented Feb 13, 2023

srikanthccv commented Feb 13, 2023

nkhitrov commented Feb 13, 2023

nkhitrov commented Mar 1, 2023

srikanthccv commented Mar 2, 2023

nkhitrov commented Mar 2, 2023

marianogappa commented Mar 2, 2023

srikanthccv commented Mar 2, 2023

nkhitrov commented Mar 2, 2023

nkhitrov commented Mar 6, 2023

srikanthccv commented Mar 6, 2023

nkhitrov commented Mar 6, 2023

thomasleveil commented Jul 10, 2023

nkhitrov commented Jul 12, 2023

thomasleveil commented Jul 12, 2023

tysonclugg commented Jul 24, 2023

thomasleveil commented Jul 25, 2023

tysonclugg commented Jul 27, 2023 • edited Loading

tysonclugg commented Jul 27, 2023 •

edited

Loading