[BUG] Invalid UTF-8 start byte issue #187

@dstepanov25

Describe the bug
It is not possible to save an item containing non-ASCII characters to OpenSearch.

To Reproduce
Steps to reproduce the behavior:

  1. Run OpenSearch in a Docker container:

docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "plugins.security.disabled=true" opensearchproject/opensearch:latest

  2. Set up and install logstash-oss-8-5-2 (Windows)
  3. Install the logstash-output-opensearch plugin:

<path/to/your/logstash/dir>/bin/logstash-plugin install --version 2.0.0 logstash-output-opensearch

  4. Use the sample configuration below to run Logstash; save it as logstash-example.conf:
input {
    stdin { } 
}

filter {
# if you remove the letter 'ß', the error will disappear
    mutate { add_field => { "name" =>  "Groß" } }    
    prune { whitelist_names => [ "^name$" ] }
}

output {
    opensearch {
        hosts => ["localhost:9200"]
        auth_type => {
            type => 'basic'
            user => 'admin'
            password => 'admin'
        }
        index => "test_index"
        action => "index"
    }

   # if elasticsearch is installed, you can make sure that it works properly:
   # elasticsearch {
   #     hosts => ["localhost:9200"]
   #     index => "test_index"
   #     action => "index"
   #     codec => "json"
   # }

}
  5. Run Logstash:

<path/to/your/logstash/dir>/bin/logstash -f logstash-example.conf

  6. Type any text into standard input and press Enter
  7. See the error:

"status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse", "caused_by"=>{"type"=>"json_parse_exception", "reason"=>"Invalid UTF-8 start byte 0xa0\n at [Source: (byte[])"{"event":{"original":"\r"},"message":"\r","@timestamp":"2023-01-19T11:07:14.447970Z","name":"Gro�","host":{"hostname":"DESKTOP-SP31NNN"},"@Version":"1"}"; line: 1, column: 98]

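To locate the offending byte in a captured request body, a small check like the following may help. It is only a sketch: the two payloads are hypothetical stand-ins for the captured bytes (not the actual request), and `first_invalid_utf8` is an illustrative helper name, not part of Logstash or OpenSearch.

```python
from typing import Optional, Tuple


def first_invalid_utf8(payload: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the whole payload is valid UTF-8."""
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, payload[exc.start]
    return None


# Hypothetical payloads: the same JSON fragment with 'ß' encoded two ways.
good = b'{"name":"Gro\xc3\x9f"}'  # 'ß' as the UTF-8 pair c3 9f
bad = b'{"name":"Gro\xa0"}'       # the lone a0 byte reported in the error

print(first_invalid_utf8(good))
print(first_invalid_utf8(bad))
```

Running the same helper on the raw bytes exported from the Wireshark capture would show whether the body is already broken before it reaches the server.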
Wireshark captured the following request:
image
You can see that the OpenSearch output encodes the letter 'ß' incorrectly (a0).

And here is what the request looks like with Elasticsearch and the Logstash 7.11 output plugin:
image

You can see that the Elasticsearch output encodes the letter 'ß' correctly (c3 9f).
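The difference between the two captures can be illustrated by classifying bytes by their UTF-8 bit pattern. This sketch assumes nothing about how Logstash produces the bytes; `utf8_byte_role` is a hypothetical helper that looks only at the leading bits and does not reject overlong forms.

```python
def utf8_byte_role(b: int) -> str:
    """Classify a single byte by its UTF-8 leading-bit pattern."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx
    if b < 0xC0:
        return "continuation"  # 10xxxxxx, never valid as a first byte
    if b < 0xE0:
        return "lead-2"        # 110xxxxx, starts a 2-byte sequence
    if b < 0xF0:
        return "lead-3"        # 1110xxxx, starts a 3-byte sequence
    if b < 0xF8:
        return "lead-4"        # 11110xxx, starts a 4-byte sequence
    return "invalid"


encoded = "ß".encode("utf-8")
print(encoded.hex(" "))                      # c3 9f
print([utf8_byte_role(b) for b in encoded])  # ['lead-2', 'continuation']
print(utf8_byte_role(0xA0))                  # continuation
```

A byte such as a0 can only follow a lead byte, which is consistent with the parser reporting it as an invalid start byte when it appears directly after the ASCII text.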

Expected behavior
There should be no error, and a new item with the field "name": "Groß" should appear in the index "test_index".
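For comparison, a request with the expected encoding can be built by hand, bypassing Logstash. This is a minimal sketch that assumes the container from step 1 is reachable at localhost:9200 with the security plugin disabled (so no credentials are sent). `build_index_request` is an illustrative name, and the sketch uses the single-document endpoint (`POST /test_index/_doc`), which is not necessarily the request the output plugin issues.

```python
import json
import urllib.request


def build_index_request(host: str = "http://localhost:9200",
                        index: str = "test_index") -> urllib.request.Request:
    """Build (but do not send) a request that indexes {"name": "Groß"}."""
    # ensure_ascii=False keeps 'ß' as a literal character, so the explicit
    # UTF-8 encode below produces the bytes c3 9f on the wire.
    body = json.dumps({"name": "Groß"}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_index_request()
print(req.full_url)
print(req.data)

# To actually send it against the running container:
# urllib.request.urlopen(req)
```

If this request is accepted while the Logstash one is rejected, that would point at the client-side encoding rather than at the index mapping.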

Plugins
logstash-output-opensearch

Screenshots
image

Host/Environment:

  • OS: Windows 11
  • Version 10.0.22621 N/A Build 22621

UPD 23.01.2023
I ran the same configuration against Elasticsearch 8.6.0 (using the elasticsearch output plugin) and received the same error.
For this test, the first step is to run Elasticsearch in a Docker container instead:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.6.0

I also tried with Logstash-8-6-0, and the issue does not reproduce there.

So the issue is in the Logstash OSS version.
