[BUG] content-type application/json using iso-8859-1 instead of utf-8 (accent and special char not supported) #12797

Open
4 of 5 tasks
pqwarlot opened this issue Jul 8, 2022 · 11 comments

Comments

@pqwarlot
Contributor

pqwarlot commented Jul 8, 2022

  • Have you provided a full/minimal spec to reproduce the issue?
  • Have you validated the input using an OpenAPI validator (example)?
  • Have you tested with the latest master to confirm the issue still exists?
  • Have you searched for related issues/PRs?
  • What's the actual output vs expected output?
  • [Optional] Sponsorship to speed up the bug fix or feature request (example)
Description

Using the plugin org.openapitools:openapi-generator-maven-plugin:6.0.1 to generate a Java client from an openapi3.yml spec file, I noticed that a requestBody with content-type application/json generates a Java API that does not support accented or special characters.

For example, )çàç!è!§è(‘“é‘(§‘“é&“’(§è!çà sent to the Java API is transformed to )???!?!??(????(????&??(??! in the HTTP call (Wireshark inspection).

Problem: the request content-type application/json is parsed as application/json with no charset. Therefore, with the apache-httpclient library, the generated client relies on org.apache.httpcomponents:httpcore, whose default charset is ISO-8859-1, which does not support these characters.

Using application/json with ISO-8859-1 is not compliant with RFC4627. In my case it leads to unwanted behavior in the Java HTTP call, such as unexpected character replacement.
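
The character replacement described above can be reproduced with the JDK alone. The following is a minimal sketch, not the generated client code: the class name `CharsetLossDemo` and the helper `roundTrip` are hypothetical, and the sketch assumes only that the body is encoded with the charset under test. It shows that a character with no ISO-8859-1 mapping (here a curly quote) is replaced by `?` during encoding, while UTF-8 preserves it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use httpcore or the generated ApiClient.
final class CharsetLossDemo {

    // Encodes the text with the given charset, then decodes the bytes with the same charset.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] wireBytes = text.getBytes(charset);
        return new String(wireBytes, charset);
    }

    public static void main(String[] args) {
        // U+201C (left curly double quote) has no ISO-8859-1 mapping.
        String sample = "\u00E9\u201C";
        String latin1 = roundTrip(sample, StandardCharsets.ISO_8859_1);
        String utf8 = roundTrip(sample, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 preserved the text: " + sample.equals(latin1));
        System.out.println("UTF-8 preserved the text: " + sample.equals(utf8));
    }
}
```

Characters such as é do exist in ISO-8859-1, but they are encoded as different bytes than in UTF-8, so a receiver that assumes UTF-8 would still misread them.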

openapi-generator version

org.openapitools:openapi-generator-maven-plugin:6.0.1
library : apache-httpclient

OpenAPI declaration file content or url
openapi: 3.0.1
info:
  version: 1.0.0
  title: Example
paths:
  /test-body:
    put:
      tags:
        - Test
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TestBodys'
      responses:
        200:
          description: OK
          content:
            text/plain:
              schema:
                type: string
                example: pong
components:
  schemas:
    TestBodys:
      type: object
      properties:
        test_body:
          type: array
          items:
            $ref: '#/components/schemas/TestBody'

    TestBody:
      type: object
      properties:
        code:
          type: string
        type:
          type: string
        value:
          type: object
Generation Details

pom.xml

<plugin>
    <groupId>org.openapitools</groupId>
    <artifactId>openapi-generator-maven-plugin</artifactId>
    <version>6.0.1</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <generatorName>java</generatorName>
                <inputSpec>${project.basedir}/src/main/resources/openapi/connect/openapi3.yml</inputSpec>
                <!-- api final path: output + sourceFolder + apiPackage -->
                <apiPackage>com.test.client.generated.api</apiPackage>
                <!-- model final path: output + sourceFolder + modelPackage -->
                <modelPackage>com.test.client.generated.model</modelPackage>
                <!-- disable unused code generation -->
                <generateApiTests>false</generateApiTests>
                <generateModelTests>false</generateModelTests>
                <generateApiDocumentation>false</generateApiDocumentation>
                <generateModelDocumentation>false</generateModelDocumentation>
                <typeMappings>
                    <typeMapping>OffsetDateTime=java.time.Instant</typeMapping>
                </typeMappings>
                <!-- disable pom.xml and other unwanted file generation -->
                <supportingFilesToGenerate>
                    ApiClient.java,ServerConfiguration.java,ServerVariable.java,JavaTimeFormatter.java,StringUtil.java,Authentication.java,HttpBasicAuth.java,HttpBearerAuth.java,ApiKeyAuth.java,ApiException.java,Configuration.java,Pair.java,auth/Authentication.java,RFC3339DateFormat.java
                </supportingFilesToGenerate>
                <configOptions>
                    <library>apache-httpclient</library>
                    <dateLibrary>java8</dateLibrary>
                    <sourceFolder>src/main/java/</sourceFolder>
                </configOptions>
                <output>${project.basedir}</output>
            </configuration>
        </execution>
    </executions>
</plugin>
Steps to reproduce

Generate the Java client using the OpenAPI generator plugin.
Then use the ApiClient TestBody API to send special characters such as )çàç!è!§è(‘“é‘(§‘“é&“’(§è!çà
Using Wireshark or another tool, inspect the request payload.
=> The special characters have been replaced by question marks.

Related issues/PRs

Not found

Suggest a fix

Use the UTF-8 charset that Apache httpcore already provides for this content type (see here in httpcore).

Inside the generated ApiClient.java, the getContentType method is:

private ContentType getContentType(String headerValue) throws ApiException {
    try {
        return ContentType.parse(headerValue);
    } catch (org.apache.http.ParseException var3) {
        throw new ApiException("Could not parse content type " + headerValue);
    }
}

It can be replaced by the following, so that the returned content type includes a UTF-8 charset:

private ContentType getContentType(String headerValue) {
    // Looks up httpcore's predefined ContentType for this MIME type.
    // Note: the lookup returns null for a MIME type that is not predefined.
    return ContentType.getByMimeType(headerValue);
}
@pqwarlot
Contributor Author

fixed

@JoeCqupt
Contributor

JoeCqupt commented Aug 3, 2022

I think you should use [ Content-Type: application/json;charset=UTF8 ] to specify the charset, instead of modifying the getContentType method.

#13058 wants to support custom content types; it will change this back.

@pqwarlot
Contributor Author

pqwarlot commented Aug 3, 2022

I think you should use [ Content-Type: application/json;charset=UTF8 ] to specify the charset, instead of modifying the getContentType method.

#13058 wants to support custom content types; it will change this back.

This is also a possibility, but using application/json with ISO-8859-1 is not compliant with RFC4627 and can lead to unwanted behavior in the Java HTTP call, such as unexpected character replacement.
I think the two approaches should not be mutually exclusive.
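
One way the two approaches could coexist is sketched below using only the JDK. This is an illustration under stated assumptions, not the generator's implementation: the class `JsonCharsetDefault` and the method `resolveCharset` are hypothetical names, the header parsing is deliberately simplified, and the ISO-8859-1 fallback for non-JSON types only mirrors the httpcore default described in this issue. An explicit charset parameter always wins; a JSON media type without one falls back to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; simplified Content-Type handling for illustration only.
final class JsonCharsetDefault {

    // Returns the explicit charset when the header declares one,
    // UTF-8 for JSON media types without a charset,
    // and ISO-8859-1 otherwise (the default described in this issue).
    static Charset resolveCharset(String headerValue) {
        String[] parts = headerValue.split(";");
        String mimeType = parts[0].trim().toLowerCase(Locale.ROOT);

        for (int i = 1; i < parts.length; i++) {
            String parameter = parts[i].trim();
            if (parameter.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = parameter.substring("charset=".length()).replace("\"", "").trim();
                // Charset.forName throws for unknown or illegal charset names.
                return Charset.forName(name);
            }
        }

        boolean isJson = mimeType.equals("application/json") || mimeType.endsWith("+json");
        return isJson ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("application/json"));
        System.out.println(resolveCharset("application/json;charset=ISO-8859-1"));
        System.out.println(resolveCharset("text/plain"));
    }
}
```

With this shape, a spec that sets the charset explicitly keeps full control, while a spec that declares plain application/json still gets the encoding the RFC expects.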

@pqwarlot pqwarlot reopened this Aug 24, 2022
@pqwarlot
Contributor Author

pqwarlot commented Aug 24, 2022

Issue re-opened as the changes have been reverted. The behavior is no longer RFC4627 compliant.

@rems

rems commented Aug 26, 2022

I would also like the default behavior to comply with RFC4627 section 3, which states:

  3. Encoding

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

RFC4627 section 1.1 states that the term SHALL is to be interpreted as described in RFC2119.

As per RFC2119 section 1, SHALL means:

1. MUST This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.

Furthermore, RFC4627 section 1 states that:

JSON can represent four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays).

A string is a sequence of zero or more Unicode characters [UNICODE].

I can't see any reason for the encoding to be ISO-8859-1 when the content of JSON strings is Unicode characters.

Having to set the charset to UTF-8 on each and every operation in the OpenAPI specification has, as I see it, several problems:

  • it is redundant per the RFCs
  • it is error prone
  • it is not the purpose of a specification to be aware of how one of its implementations behaves
  • adapting a specification to work with one implementation is counterintuitive
  • there might be several implementations that behave differently, and changing the specification for each of them can't scale
  • the organization generating the implementation might not own the specification, and so cannot change it

@aollag09

aollag09 commented Mar 10, 2023

Any news on this topic? UTF-8 as the default encoding for application/json seems like a good idea :)

@hpr1999

hpr1999 commented Mar 29, 2023

Hey, I also just stumbled over this and we'll have to modify quite a few API-specs due to this, which is bizarre, given UTF-8 should be the default.

I suppose this is especially problematic outside of English speaking countries.
Chars like ü, ä, ö etc. are standard and common here and the server-libraries generated from the same API-specification can't consume ISO-8859-1, as they rightly assume the default UTF-8 will be used.

Would greatly appreciate a fix =)
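
The server-side half of this mismatch can be sketched with the JDK alone. The class `MismatchedDecodeDemo` and the method `decodeAsUtf8` are hypothetical names used only for illustration, and the sketch assumes a receiver that always decodes the body as UTF-8. Bytes produced by an ISO-8859-1 encoder for a character such as ü are not valid UTF-8, so the decoder substitutes the replacement character U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; models a client/server charset mismatch with JDK types only.
final class MismatchedDecodeDemo {

    // Encodes the text the way the client would (sentAs),
    // then decodes it the way a UTF-8-only server would.
    static String decodeAsUtf8(String text, Charset sentAs) {
        byte[] wireBytes = text.getBytes(sentAs);
        return new String(wireBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String umlaut = "\u00FC"; // ü
        String fromLatin1 = decodeAsUtf8(umlaut, StandardCharsets.ISO_8859_1);
        String fromUtf8 = decodeAsUtf8(umlaut, StandardCharsets.UTF_8);
        System.out.println("Latin-1 bytes decoded intact: " + umlaut.equals(fromLatin1));
        System.out.println("UTF-8 bytes decoded intact: " + umlaut.equals(fromUtf8));
    }
}
```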

@i7paradise

I stumbled upon this issue. Is it possible to solve this in an upcoming release?

@dpulrichth

I also just ran into this issue. A fix would be greatly appreciated.

@JavierGH

JavierGH commented Jun 9, 2023

I propose forcing a content type with UTF-8 encoding when serializing the JSON:

public HttpEntity serialize(Object obj, Map<String, Object> formParams, ContentType contentType) throws ApiException {
    String mimeType = contentType.getMimeType();
    if (isJsonMime(mimeType)) {
        try {
            // ContentType.APPLICATION_JSON is defined with the UTF-8 charset.
            return new StringEntity(objectMapper.writeValueAsString(obj), ContentType.APPLICATION_JSON);
        } catch (JsonProcessingException e) {
            throw new ApiException(e);
        }
    }
    // ... rest of the method omitted in the original comment
}

@lucjross-favor

lucjross-favor commented Jun 9, 2023

I'd like to have a fix for this as well. For now, we're working around it by providing a subclassed ApiClient to each API client. The subclass overrides serialize so that it returns a custom org.apache.hc.core5.http.io.entity.AbstractHttpEntity implementation. That class is a copy of StringEntity, except that in the constructor, the charset defaults to UTF-8 instead of Latin-1.

christiannicola added a commit to christiannicola/oapi-codegen that referenced this issue Sep 29, 2023
This commit fixes an issue with the codegeneration - when specifying a
charset in a JSON content type, the resulting request / response bodies
are missing the JSON serializer.

The charset is usually redundant, since JSON should usually be encoded
using UTF-8 (see https://www.rfc-editor.org/rfc/rfc8259#section-8.1),
however there are some code generators out there that do not honor this
(for example: OpenAPITools/openapi-generator#12797)
ghost pushed a commit to tahiti-web-design/openapi-generator that referenced this issue Dec 13, 2023
wing328 pushed a commit that referenced this issue Dec 15, 2023
* fix(java): apache-httpclient serialization error

fixes following related issue:
#12797

* docs(java): update samples and docs
9 participants