Making JsonProcessor stricter so that it does not silently drop data #93179
Hi @masseyke, I've created a changelog YAML for you.
…elasticsearch into fix/json-processor-too-lenient
Pinging @elastic/es-data-management (Team:Data Management)
@elasticmachine run elasticsearch-ci/docs |
mattc58
left a comment
I left comments, mostly around style, but the core logic LGTM (+1).
String randomField = randomAlphaOfLength(3);
String randomTargetField = randomAlphaOfLength(2);
- JsonProcessor jsonProcessor = new JsonProcessor(processorTag, null, randomField, randomTargetField, false, REPLACE, false);
+ JsonProcessor jsonProcessor = new JsonProcessor(processorTag, null, randomField, randomTargetField, false, REPLACE, false, true);
These lines in this test file make me wonder if we should add a constructor to JsonProcessor which has the default strict behavior, and then have the other constructor for allowing the caller to set it. It would at the very least reduce the changes on this file.
Yeah I can do that -- I went back and forth over whether it would be simpler to have another constructor or another required argument to the single constructor.
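A minimal sketch of the two-constructor idea discussed above, assuming a simplified stand-in class. `JsonProcessorSketch`, its fields, and its argument order are hypothetical and do not mirror the real `JsonProcessor` signature; the only point is that the shorter constructor delegates to the full one with strict parsing enabled.

```java
// Hypothetical stand-in for the processor; names and argument order are illustrative only.
final class JsonProcessorSketch {
    private final String field;
    private final String targetField;
    private final boolean addToRoot;
    private final boolean allowDuplicateKeys;
    private final boolean strictJsonParsing;

    // Convenience constructor: existing callers keep their argument list
    // and pick up the strict default.
    JsonProcessorSketch(String field, String targetField, boolean addToRoot, boolean allowDuplicateKeys) {
        this(field, targetField, addToRoot, allowDuplicateKeys, true);
    }

    // Full constructor: lets the caller opt out of strict parsing explicitly.
    JsonProcessorSketch(
        String field,
        String targetField,
        boolean addToRoot,
        boolean allowDuplicateKeys,
        boolean strictJsonParsing
    ) {
        this.field = field;
        this.targetField = targetField;
        this.addToRoot = addToRoot;
        this.allowDuplicateKeys = allowDuplicateKeys;
        this.strictJsonParsing = strictJsonParsing;
    }

    boolean isStrictJsonParsing() {
        return strictJsonParsing;
    }
}
```

With this shape, test code that does not care about strictness would not need to change, while tests for the lenient path pass `false` as the final argument.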
assertThat(resultList.get(i), closeTo(list.get(i), .001));
}
}
expectThrows(IllegalArgumentException.class, () -> JsonProcessor.apply("foo", true, true));
Why are these expected to fail? A comment would help.
expectThrows(IllegalArgumentException.class, () -> JsonProcessor.apply("123 foo", true, true));
expectThrows(IllegalArgumentException.class, () -> JsonProcessor.apply("45 this is {\"a\": \"json\"}", true, true));
{
These are all good. Maybe split into a separate test just to make it clear that we're testing for something different than the code above? Basically, maybe the "strict validation" stuff could be its own test.
jbaiera
left a comment
Changes LGTM, pending green CI of course. Had one question about breaking vs bug classification.
| `add_to_root` | no | false | Flag that forces the parsed JSON to be added at the top level of the document. `target_field` must not be set when this option is chosen.
| `add_to_root_conflict_strategy` | no | `replace` | When set to `replace`, root fields that conflict with fields from the parsed JSON will be overridden. When set to `merge`, conflicting fields will be merged. Only applicable if `add_to_root` is set to `true`.
| `allow_duplicate_keys` | no | false | When set to `true`, the JSON parser will not fail if the JSON contains duplicate keys. Instead, the last encountered value for any duplicate key wins.
| `strict_json_parsing` | no | true | When set to `true`, the JSON parser will strictly parse the field value. When set to `false`, the JSON parser will be more lenient but also more likely to drop parts of the field value. For example if `strict_json_parsing` is set to `true` and the field value is `123 "foo"` then the processor will throw an IllegalArgumentException. But if `strict_json_parsing` is set to `false` then the field value will be parsed as `123`.
With the default being strict, does this count as a breaking change now or are we classifying this as a bug?
I thought we had agreed it was a bug.
Couldn't remember. If so, all good!
This PR makes JsonProcessor's JSON parsing a little bit stricter so that we are not silently dropping data when given bad inputs. Previously, if the input string began with something that could be parsed as a valid JSON field, the processor would grab that and ignore the rest. For example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now, by default, it will throw an IllegalArgumentException on a string like this. A user can set the `strict_json_parsing` parameter to `false` to get the old behavior.
Closes #92898
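The difference between the two modes can be illustrated with a toy sketch. This is not the processor's actual implementation: `StrictParsingSketch` and `parseLeadingInteger` are hypothetical names, and the code only understands a leading integer literal rather than full JSON. It shows the general idea of a strict check, which is to reject any non-whitespace content left over after the first value has been read.

```java
// Toy illustration only: handles a leading integer literal, not real JSON.
final class StrictParsingSketch {

    // Reads an integer from the start of the input.
    // strict == false: anything after the integer is silently ignored.
    // strict == true: trailing non-whitespace content raises IllegalArgumentException.
    static long parseLeadingInteger(String input, boolean strict) {
        int n = input.length();
        int i = 0;
        // Skip leading whitespace.
        while (i < n && Character.isWhitespace(input.charAt(i))) {
            i++;
        }
        int start = i;
        if (i < n && input.charAt(i) == '-') {
            i++;
        }
        int digitsStart = i;
        while (i < n && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
            i++;
        }
        if (i == digitsStart) {
            throw new IllegalArgumentException("no integer at start of input: " + input);
        }
        long value = Long.parseLong(input.substring(start, i));
        if (strict) {
            // Only whitespace may follow the first value in strict mode.
            int j = i;
            while (j < n && Character.isWhitespace(input.charAt(j))) {
                j++;
            }
            if (j < n) {
                throw new IllegalArgumentException("unexpected trailing content at offset " + j + ": " + input);
            }
        }
        return value;
    }
}
```

Under these assumptions, calling `parseLeadingInteger("123 \"foo\"", false)` yields `123` and drops the rest, while the same input with `strict` set to `true` throws an `IllegalArgumentException`.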