Skip to content

Making JsonProcessor stricter so that it does not silently drop data#93179

Merged
elasticsearchmachine merged 7 commits intoelastic:mainfrom
masseyke:fix/json-processor-too-lenient
Jan 24, 2023
Merged

Making JsonProcessor stricter so that it does not silently drop data#93179
elasticsearchmachine merged 7 commits intoelastic:mainfrom
masseyke:fix/json-processor-too-lenient

Conversation

@masseyke
Copy link
Member

@masseyke masseyke commented Jan 23, 2023

This PR makes JsonProcessor's JSON parsing a little bit stricter so that we are not silently dropping data when given bad inputs. Previously if the input string began with something that could be parsed as a valid json field, then the processor would grab that and ignore the rest. For example, 123 "foo" would be parsed as 123, dropping the "foo". Now by default it will throw an IllegalArgumentException on a string like this. A user can now set the strict_json_parsing parameter to false to get the old behavior. For example:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors" : [
      {
        "json" : {
          "field" : "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}'

Closes #92898

@masseyke masseyke added >bug :Distributed/Ingest Node Execution or management of Ingest Pipelines v8.7.0 labels Jan 23, 2023
@github-actions
Copy link
Contributor

Documentation preview:

@elasticsearchmachine
Copy link
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke marked this pull request as ready for review January 24, 2023 20:39
@elasticsearchmachine elasticsearchmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Jan 24, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@masseyke
Copy link
Member Author

@elasticmachine run elasticsearch-ci/docs

Copy link
Contributor

@mattc58 mattc58 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left comments mostly around style but the core logic is +1 LGTM to me

String randomField = randomAlphaOfLength(3);
String randomTargetField = randomAlphaOfLength(2);
JsonProcessor jsonProcessor = new JsonProcessor(processorTag, null, randomField, randomTargetField, false, REPLACE, false);
JsonProcessor jsonProcessor = new JsonProcessor(processorTag, null, randomField, randomTargetField, false, REPLACE, false, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines in this test file make me wonder if we should add a constructor to JsonProcessor which has the default strict behavior, and then have the other constructor for allowing the caller to set it. It would at the very least reduce the changes on this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can do that -- I went back and forth over whether it would be simpler to have another constructor or another required argument to the single constructor.

assertThat(resultList.get(i), closeTo(list.get(i), .001));
}
}
expectThrows(IllegalArgumentException.class, () -> JsonProcessor.apply("foo", true, true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these expected to fail? A comment would help.

expectThrows(IllegalArgumentException.class, () -> JsonProcessor.apply("123 foo", true, true));
expectThrows(IllegalArgumentException.class, () -> JsonProcessor.apply("45 this is {\"a\": \"json\"}", true, true));

{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all good. Maybe split into a separate test just to make it clear that we're testing for something different than the code above? Basically, maybe the "strict validation" stuff could be its own test.

Copy link
Member

@jbaiera jbaiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, pending green CI of course. Had one question about breaking vs bug classification.

| `add_to_root` | no | false | Flag that forces the parsed JSON to be added at the top level of the document. `target_field` must not be set when this option is chosen.
| `add_to_root_conflict_strategy` | no | `replace` | When set to `replace`, root fields that conflict with fields from the parsed JSON will be overridden. When set to `merge`, conflicting fields will be merged. Only applicable if `add_to_root` is set to `true`.
| `allow_duplicate_keys` | no | false | When set to `true`, the JSON parser will not fail if the JSON contains duplicate keys. Instead, the last encountered value for any duplicate key wins.
| `strict_json_parsing` | no | true | When set to `true`, the JSON parser will strictly parse the field value. When set to `false`, the JSON parser will be more lenient but also more likely to drop parts of the field value. For example if `strict_json_parsing` is set to `true` and the field value is `123 "foo"` then the processor will throw an IllegalArgumentException. But if `strict_json_parsing` is set to `false` then the field value will be parsed as `123`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the default being strict, does this count as a breaking change now or are we classifying this as a bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had agreed it was a bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't remember. If so, all good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Distributed/Ingest Node Execution or management of Ingest Pipelines Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v8.7.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[JSON Processor] Unexpected behavior with payloads starting with integers

4 participants