Bug: Pydantic types that cannot be instantiated - deserialization #2334

Closed
1 of 4 tasks
geeshta opened this issue Sep 22, 2023 · 4 comments
Labels
  • Bug 🐛: This is something that is not working as expected
  • (De)serialization: This is related to our Serialization (msgspec)
  • Help Wanted 🆘: This is good for people to work on

Comments

@geeshta
Contributor

geeshta commented Sep 22, 2023

Description

When deserializing an object, Litestar checks whether a given value matches a given target_type. Additional type_decoders can be provided to convert the value produced by deserialization to the expected type.

The problem is that, when using Pydantic models and PydanticDTOs, Pydantic has some types that cannot be instantiated and are not meant to be used anywhere other than in a model declaration, for example EmailStr.

When deserializing an object like:

{
  "email": "[email protected]"
}

Using the DTO given in the MCVE, the expected type of email at runtime is EmailStr (see screenshot). After deserialization msgspec returns a str, so a decoder is needed. However, there is no way to instantiate EmailStr; it is only meant to be used as a type annotation on a Pydantic model.
cast doesn't change the type at runtime; it is only useful for type checkers. So such a decoder is impossible.
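To illustrate the point about cast, here is a minimal sketch using only the standard library. EmailLike is a hypothetical stand-in for an annotation-only type such as EmailStr; it is not the real Pydantic class.

```python
from typing import cast


class EmailLike:
    """Hypothetical stand-in for an annotation-only type such as EmailStr."""


value = "user@example.com"
casted = cast(EmailLike, value)

# cast() hands back its argument unchanged: no conversion, no runtime check.
print(casted is value)                # True
print(type(casted).__name__)          # str
print(isinstance(casted, EmailLike))  # False
```

So a decoder built on cast still returns a plain str, which is why it cannot satisfy a check against the annotated type.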

The solution should be to change the target_type to str in the default_deserializer function (line 91 of litestar/serialization/msgspec_hooks.py) and consequently in the call to msgspec.json.decode at line 187 of the same file. Pydantic itself treats these values as a str after validation is performed.

This should be handled by the PydanticDTO. There are also more types in Pydantic that cannot be instantiated, like ImportString and maybe others. The target type for these should always be their naive counterpart.
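A rough sketch of what "naive counterpart" could mean in practice: a lookup that swaps an annotation-only type for the plain type the decoder can actually produce. The names EmailStrStandIn, NAIVE_COUNTERPARTS and downtype are all hypothetical and exist only for this illustration; this is not Litestar's implementation.

```python
from typing import Any, Dict


class EmailStrStandIn:
    """Hypothetical stand-in for pydantic's EmailStr (annotation-only)."""


# Assumed mapping from annotation-only types to the plain type a decoder
# can produce. The single entry is illustrative, not an exhaustive list.
NAIVE_COUNTERPARTS: Dict[Any, Any] = {
    EmailStrStandIn: str,
}


def downtype(annotation: Any) -> Any:
    """Return the naive counterpart of ``annotation``, or it unchanged."""
    return NAIVE_COUNTERPARTS.get(annotation, annotation)


print(downtype(EmailStrStandIn))  # <class 'str'>
print(downtype(int))              # <class 'int'>
```

With a mapping like this applied before decoding, the decoder would only ever be asked for types it can build, and the stricter validation would be left to the model.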

URL to code causing the issue

No response

MCVE

import logging

from litestar import Litestar, post
from litestar.contrib.pydantic import PydanticDTO
from litestar.dto import DTOConfig
from pydantic import BaseModel, EmailStr

from typing import cast


class EmailModel(BaseModel):
    email: EmailStr


class EmailDTO(PydanticDTO[EmailModel]):
    config = DTOConfig()


@post("/email", dto=EmailDTO)
async def accept_email(data: EmailModel) -> None:
    logging.info(data)


def is_emailstr_type(obj_type: type[EmailStr]) -> bool:
    return obj_type is EmailStr


def emailstr_decoder(obj_type: type[EmailStr], value: str) -> EmailStr:
    # How?
    
    return cast(EmailStr, value)


app = Litestar(
    route_handlers=[accept_email], type_decoders=[(is_emailstr_type, emailstr_decoder)]
)

Steps to reproduce

  1. Run the Litestar app from the MCVE
  2. Send POST request with the body of {"email": "[email protected]"} to /email
  3. msgspec.ValidationError occurs
  4. Also when removing type_decoders, a similar error appears

Screenshots

Expected type cannot be instantiated

Logs

Full traceback
INFO:     127.0.0.1:58390 - "POST /email HTTP/1.1" 400 Bad Request
ERROR - 2023-09-22 21:00:43,846 - litestar - config - exception raised on http connection to route /email

Traceback (most recent call last):
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/serialization/msgspec_hooks.py", line 113, in default_deserializer
    return decoder(target_type, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: decoding to str: need a bytes-like object, type found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/serialization/msgspec_hooks.py", line 187, in decode_json
    return msgspec.json.decode(
           ^^^^^^^^^^^^^^^^^^^^
msgspec.ValidationError: decoding to str: need a bytes-like object, type found - at `$.email`

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/routes/http.py", line 184, in _get_response_data
    kwargs["data"] = await kwargs["data"]
                     ^^^^^^^^^^^^^^^^^^^^
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/_kwargs/extractors.py", line 427, in dto_extractor
    return data_dto(connection).decode_bytes(body)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/dto/base_dto.py", line 96, in decode_bytes
    return backend.populate_data_from_raw(value, self.asgi_connection)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/dto/_backend.py", line 299, in populate_data_from_raw
    source_data=self.parse_raw(raw, asgi_connection),
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/dto/_backend.py", line 206, in parse_raw
    result = decode_json(value=raw, target_type=self.annotation, type_decoders=type_decoders)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/serialization/msgspec_hooks.py", line 191, in decode_json
    raise SerializationException(str(msgspec_error)) from msgspec_error
litestar.exceptions.base_exceptions.SerializationException: decoding to str: need a bytes-like object, type found - at `$.email`

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/middleware/exceptions/middleware.py", line 191, in __call__
    await self.app(scope, receive, send)
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/routes/http.py", line 79, in handle
    response = await self._get_response_for_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/routes/http.py", line 131, in _get_response_for_request
    response = await self._call_handler_function(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/routes/http.py", line 160, in _call_handler_function
    response_data, cleanup_group = await self._get_response_data(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geeshta/prog/litestar/pydbug/.venv/lib/python3.11/site-packages/litestar/routes/http.py", line 186, in _get_response_data
    raise ClientException(str(e)) from e
litestar.exceptions.http_exceptions.ClientException: 400: decoding to str: need a bytes-like object, type found - at `$.email`

Litestar Version

2.0.1

Platform

  • Linux
  • Mac
  • Windows
  • Other (Please specify in the description above)

@geeshta geeshta added Bug 🐛 This is something that is not working as expected Triage Required 🏥 This requires triage labels Sep 22, 2023
@provinzkraut provinzkraut removed the Triage Required 🏥 This requires triage label Sep 22, 2023
@provinzkraut provinzkraut added the Help Wanted 🆘 This is good for people to work on label Oct 13, 2023
@jw-can

jw-can commented Oct 18, 2023

Schema validation by Pydantic and msgspec can't be used transparently/interchangeably: as pointed out by @geeshta, the type EmailStr is used by Pydantic for validation, effectively processing a str. However, this type cannot be used in msgspec's schema validation, which expects an instance of EmailStr, and that is not feasible.

As far as I understand, msgspec currently always does schema validation in Litestar, irrespective of whether a plugin such as Pydantic validates later on?

The problem is that when a PydanticDTO is used, BOTH Pydantic and msgspec do a schema validation, which poses a problem for types like EmailStr. In this case, msgspec shouldn't do any schema validation. Schema validation by msgspec can be disabled by removing the type argument from msgspec.json.decode in msgspec_hooks.py.

I think the behaviour should be that msgspec only does schema validation if there's no other plugin doing that instead. Would you agree?

@peterschutt
Contributor

When the msgspec transfer models are created for the pydantic type, the type on the transfer model should be annotated str, and the validation to email string should only happen when the pydantic model is instantiated with the data that has been decoded and validated by msgspec.

E.g., pydantic model:

class WithEmail(BaseModel):
    email: EmailStr

Transfer model produced by DTO should be:

class WithEmailTransfer(msgspec.Struct):
    email: str

Given:

  • {"email": 1} - msgspec should fail this on decoding
  • {"email": "abc"} - pydantic should fail this on model instantiation
  • {"email": "[email protected]"} - should pass both
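The three cases above can be sketched as a two-stage pipeline. This is only an illustration with standard-library stand-ins: decode_transfer plays the role of msgspec decoding into the transfer model, the WithEmail dataclass plays the role of the Pydantic model, and the regular expression is a deliberately crude placeholder for real email validation. All names here are hypothetical.

```python
import json
import re
from dataclasses import dataclass

# Deliberately crude pattern; real email validation is far more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class WithEmail:
    """Stand-in for the pydantic model: validates the email on construction."""

    email: str

    def __post_init__(self) -> None:
        if not EMAIL_RE.match(self.email):
            raise ValueError(f"not an email address: {self.email!r}")


def decode_transfer(raw: bytes) -> dict:
    """Stage 1 (msgspec's role): decode and check only that email is a str."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("email"), str):
        raise TypeError("expected `str` at `$.email`")
    return data


def outcome(raw: bytes) -> str:
    """Report which stage, if any, rejects the payload."""
    try:
        data = decode_transfer(raw)
    except (TypeError, json.JSONDecodeError):
        return "rejected while decoding"
    try:
        # Stage 2 (pydantic's role): instantiate the model from decoded data.
        WithEmail(email=data["email"])
    except ValueError:
        return "rejected on model instantiation"
    return "accepted"


print(outcome(b'{"email": 1}'))
print(outcome(b'{"email": "abc"}'))
print(outcome(b'{"email": "user@example.com"}'))
```

The point of the split is that the first stage only needs to know the field is a string, so it never has to construct the stricter type.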

The down-typing from EmailStr to str should be able to be handled in

def generate_field_definitions(
    cls, model_type: type[pydantic.BaseModel]
) -> Generator[DTOFieldDefinition, None, None]:

@jw-can

jw-can commented Oct 22, 2023

@peterschutt thanks for the hints!
I just made a pull request.

jw-can added a commit to jw-can/litestar that referenced this issue Dec 2, 2023
@JacobCoffee JacobCoffee added the (De)serialization This is related to our Serialization (msgspec) label Dec 7, 2023
@github-project-automation github-project-automation bot moved this to Triage in Overview Dec 8, 2023
@gsakkis
Contributor

gsakkis commented Mar 12, 2024

Just ran into this, any update or workaround?

EDIT: Not a general fix but at least for EmailStr this workaround seems to work for me:

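from typing import Annotated

from pydantic import AfterValidator, BaseModel, validate_email

class EmailModel(BaseModel):
    # validate_email returns a (name, email) tuple; keep only the email part.
    email: Annotated[str, AfterValidator(lambda v: validate_email(v)[1])]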

peterschutt added a commit that referenced this issue Mar 31, 2024
This PR simplifies the type that we apply to transfer models for pydantic field types in specific circumstances.

- `JsonValue`: this field type is an instance of `TypeAliasType` at runtime, and contains recursive type definitions. Pydantic allows `list`, `dict`, `str`, `bool`, `int`, `float` and `None`, and the value types of `list` and `dict` are allowed to be the same. We type this as `Any` on the transfer models as this is pretty much the same thing for msgspec ([ref][1]).
- `EmailStr`. These are typed as `EmailStr`, which is a class at runtime that is not a `str`; however, the values are strings, and `if TYPE_CHECKING` is used to get type checkers to play along. If we return a `str` from a msgspec decode hook for the type, msgspec won't validate the input. So we must tell msgspec it's a string. This also works well with encoding because it is one.
- IP/Network/Interface types. These are represented by types such as `IPvAnyAddress`, but they are actually parsed into instances of stdlib types such as `IPv4Address` and others from the `ipaddress` module. Given that an instance of `IPvAnyAddress` cannot be meaningfully returned from a decoding hook, we have to tell msgspec that these are strings on the transfer models. Encoding the stdlib ip types to str is natively handled by msgspec.

Closes #2334

1: https://jcristharif.com/msgspec/supported-types.html#any
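As a small illustration of the IP/Network/Interface point, the standard library alone shows why there is no single "any address" instance to return from a decoding hook: parsing a string yields either an `IPv4Address` or an `IPv6Address`. The helper name `parse_ip` is hypothetical, and this sketch does not involve pydantic or msgspec.

```python
import ipaddress
from typing import Union


def parse_ip(value: str) -> Union[ipaddress.IPv4Address, ipaddress.IPv6Address]:
    """Parse an already-decoded string into a stdlib address object."""
    return ipaddress.ip_address(value)


v4 = parse_ip("192.0.2.1")
v6 = parse_ip("2001:db8::1")

# The concrete class depends on the input, not on the annotation.
print(type(v4).__name__)  # IPv4Address
print(type(v6).__name__)  # IPv6Address

# Converting back to str is lossless for these inputs.
print(str(v4))  # 192.0.2.1
print(str(v6))  # 2001:db8::1
```

Typing the transfer field as a string and parsing afterwards keeps the decoder's job limited to something it can check.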
peterschutt added a commit that referenced this issue Mar 31, 2024
feat: pydantic DTO with non-instantiable types.

cofin pushed a commit that referenced this issue Apr 1, 2024
feat: pydantic DTO with non-instantiable types.

@github-project-automation github-project-automation bot moved this from Triage to Closed in Overview Apr 5, 2024