
Remove null values from Json Spec #1410

Closed
2 of 6 tasks
simitt opened this issue Sep 27, 2018 · 21 comments

Comments

@simitt
Contributor

simitt commented Sep 27, 2018

The APM Server JSON specs allow sending null values for optional attributes. This handling was motivated by some serializers not removing null values from the output.

Allowing null values requires additional checks and introduces an extra level of complexity in the JSON spec definition.
JSON validation is a time-costly part of APM Server processing, and every additional check introduces additional overhead.

For these reasons we propose to deny null values for all attributes for Intake API v2.
@elastic/apm-agent-devs please tick off if you can handle not sending null values in v2, or let us know about the problem otherwise.

@roncohen mentioned this issue Sep 27, 2018
@watson
Contributor

watson commented Sep 27, 2018

I'm ok with this, but I'd like to get @Qard's input as well before we tick the box.

@Qard
Contributor

Qard commented Sep 27, 2018

We can just swap null for undefined and it'll get omitted, so works for me. 👍

@felixbarny
Member

felixbarny commented Sep 28, 2018

I'm generally ok with this as, IIRC, the Java agent already does not send null values. But I think it could be a bit surprising, especially to new contributors. I always thought that a field not being present in JSON has the same semantics as the field being set to null.

@beniwohli
Contributor

I'm not a huge fan of this, to be honest. The stricter we make the schema, the harder it becomes to implement. I'm mostly thinking of people from the community here. If implementing a complying agent becomes an exercise in frustration, people will be much less inclined to contribute.

I get that we need to be strict in some places to ensure consistent data, but this doesn't look like one of those places to me.

@roncohen
Contributor

this was an interesting read: json-schema-org/json-schema-spec#584

@mikker

mikker commented Sep 28, 2018

Do we have any measurements of the added overhead? If it's a lot, it might justify it, @beniwohli?

@beniwohli
Contributor

@mikker there's also potential for overhead in the agent. E.g. the JSON serializer in Python doesn't have an option to drop keys with null values, so I'd have to iterate over the whole structure before dumping it, potentially having to copy it while doing so (Python doesn't like changing a dictionary while iterating over it, so you often copy at least the list of keys before iterating).
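The extra pass Beni describes could be sketched roughly like this (a hypothetical helper for illustration, not actual agent code): the whole structure is walked and copied before serialization, which is exactly the overhead in question.

```python
import json

def drop_nones(obj):
    # Recursively copy a JSON-like structure, dropping dict keys whose
    # value is None. Every dict and list is rebuilt, so this does an
    # extra full traversal (and copy) before json.dumps even starts.
    if isinstance(obj, dict):
        return {k: drop_nones(v) for k, v in obj.items() if v is not None}
    if isinstance(obj, list):
        return [drop_nones(v) for v in obj]
    return obj

payload = {"id": "abc", "name": None, "context": {"custom": {"foo": None, "bar": 1}}}
print(json.dumps(drop_nones(payload)))
```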

@watson
Contributor

watson commented Sep 30, 2018

@beniwohli In the Node agent we currently opt to not set the property in the first place if it's null. Could something similar be done in the Python agent without too much trouble? Do you have a code example from the agent where a value has the possibility of becoming null today?

@beniwohli
Contributor

@watson I'm not sure, I'd have to comb the code base for an example. In any case, it would make for uglier code if I had to do a None check for every attribute that could be None.

But again, this isn't really about me or you. We get paid to work on this, so if we have to do some annoying work, so be it. But if e.g. a community member wants to contribute an integration for a Python framework, I'd like to avoid making it a super frustrating experience. I'd have to implement a drop-nulls JSON serializer to ensure that data doesn't get rejected because of a little detail like this, incurring the aforementioned overhead.

Generally speaking, I'd really prefer an as-lax-as-possible schema. We had lots of complaints back in the Opbeat days about perfectly good data being rejected because of some small detail.

Some data needs to strictly conform to our expectations so we can ensure that e.g. the UI works correctly. Requiring optional attributes to not be null is overshooting that goal by quite a bit IMO.

@axw
Member

axw commented Oct 1, 2018

Generally speaking, I'd really prefer an as-lax-as-possible schema. We had lots of complaints back in the Opbeat days about perfectly good data being rejected because of some small detail.

I have had similar experiences with other APIs - being overly strict can be very frustrating. That doesn't mean the schema needs to be lax, though. We could instead provide APIs which hide this from instrumentation developers.

Allowing nulls in the schema feels strange to me. You can already omit a field, so why encode a null value in the first place? This means bigger payloads; more for the agents to serialise, more for the server to parse. I would be surprised if the overhead is significant, or couldn't be made insignificant, but it's still wasteful.

Another issue is around semantics. When I first looked at the schema, I was a bit confused by the possibility of nulls. In Go, a string is a string, and cannot be null; there is no "object" base class like in Java or Python.

the JSON serializer in Python doesn't have an option to drop keys with null values, so I'd have to iterate over the whole structure before dumping it, potentially having to copy it while doing so

There are probably more changes required, but couldn't you do something like this?

diff --git a/elasticapm/utils/json_encoder.py b/elasticapm/utils/json_encoder.py
index 9b0f0c59..80b10f73 100644
--- a/elasticapm/utils/json_encoder.py
+++ b/elasticapm/utils/json_encoder.py
@@ -32,6 +32,11 @@ class BetterJSONEncoder(json.JSONEncoder):
             return self.ENCODERS[type(obj)](obj)
         return super(BetterJSONEncoder, self).default(obj)
 
+    def encode(self, obj):
+        if type(obj) is dict:
+            obj = {k: v for (k, v) in obj.items() if v is not None}
+        return super(BetterJSONEncoder, self).encode(obj)
+
 
 def better_decoder(data):
     return data

That keeps the schema strict while freeing instrumentation developers from the burden of None-checking.
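For illustration, a self-contained version of that patch (the real class also dispatches custom types through an `ENCODERS` table, omitted here) behaves like this:

```python
import json

class BetterJSONEncoder(json.JSONEncoder):
    # Simplified sketch of the patched encoder from the diff above:
    # drop None-valued keys from a top-level dict before encoding.
    def encode(self, obj):
        if type(obj) is dict:
            obj = {k: v for (k, v) in obj.items() if v is not None}
        return super(BetterJSONEncoder, self).encode(obj)

print(json.dumps({"a": 1, "b": None}, cls=BetterJSONEncoder))  # {"a": 1}
```

Note that as written this only strips top-level keys; nested dicts are serialized internally without going back through `encode`, so nulls inside them survive — presumably part of the "more changes required".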

@beniwohli
Contributor

We could instead provide APIs which hide this from instrumentation developers.

That might work for people writing instrumentation for existing agents, but not for people who write their own agents.

You can already omit a field, so why encode a null value in the first place? This means bigger payloads; more for the agents to serialise, more for the server to parse. I would be surprised if the overhead is significant, or couldn't be made insignificant, but it's still wasteful.

It's not like allowing null values will lead to agents sending nulls all over the place. This is for the occasional null that slips through. Throwing away a whole event because of a tiny little detail that we are perfectly able to parse without ambiguity seems wrong IMO.

As for your code example, yes, that's more or less how I'd do it. But as noted further up, this creates a whole new copy of every single dict that we serialize. How is that not wasteful?

If y'all agree that this is a good idea (as the checkmarks indicate), then let's move forward with it. It's not like I have veto rights or anything. I made my point that strictness for strictness' sake will lead to a more painful user experience (as we saw countless times in Opbeat support). If the simplification/optimization in the server is worth that, by all means do it.

@axw
Member

axw commented Oct 1, 2018

As for your code example, yes, that's more or less how I'd do it. But as noted further up, this creates a whole new copy of every single dict that we serialize. How is that not wasteful?

Sorry, I was thinking this pattern would lazily compute the pairs, but I suppose it'll create the dict all at once. You could probably do something with a generator instead. Anyways, we can continue this in another thread if we go ahead.

@felixbarny
Member

Throwing away a whole event because of a tiny little detail that we are perfectly able to parse without ambiguity seems wrong IMO.

I agree with @beniwohli and removed my tick, as I don't fully agree this change should be made. I don't mean to veto, however.

@mikker

mikker commented Oct 1, 2018

I'll just say that @beniwohli's concerns apply to Ruby as well. So you are not alone, Beni.

@watson
Contributor

watson commented Oct 1, 2018

@beniwohli In Node.js, doing it like Andrew suggests is probably not that performant either. But we deal with it in two other ways: either we don't set the null value in the first place if we can avoid it, or we use a JSON serializer that can strip away null values. The latter might not be feasible in all languages though, but the former might be?

@simitt as you can see, it looks like this might introduce overhead in the Python agent. Do you happen to have any numbers on how much the performance will improve for the APM Server?

@simitt
Contributor Author

simitt commented Oct 2, 2018

It is not the null values themselves that introduce additional overhead, but the additional requirements needed for e.g. conditionally required fields, such as https://github.com/elastic/apm-server/blob/v2/docs/spec/errors/common_error.json#L50.
I don't have concrete numbers on that, but the other reason to bring this up is the complexity it introduces in the schema. It can easily lead to requiring a field in the schema but accidentally still allowing null values.
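To make the complexity concrete: in JSON Schema, `required` only checks key presence, so `"exception": null` satisfies `"required": ["exception"]`. A simplified, hypothetical sketch (not the actual spec file) of the kind of construct needed for an either/or rule when nulls are allowed:

```json
{
  "anyOf": [
    {
      "required": ["exception"],
      "properties": {"exception": {"type": "object"}}
    },
    {
      "required": ["log"],
      "properties": {"log": {"type": "object"}}
    }
  ]
}
```

If null were forbidden outright, the type constraints inside each branch could be dropped, since presence alone would imply a usable value.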

This should also not be about having the right to veto or not. The question came up a couple of times internally why we allow null values, as it makes the schema more complicated, so I wanted to see if there is still a need for having null values or not.

@roncohen
Contributor

roncohen commented Oct 2, 2018

a somewhat crazy idea: we could let the APM Server preprocess incoming data to remove null values. This adds overhead in the APM Server, but that is probably better than adding overhead in the agents. It would let us get rid of the overly complicated constructs for handling conditionally required fields in the JSON Schema, making the schema simpler and removing some of the overhead we have in validating documents. Perhaps some of the overhead that would go away would pay for the overhead of removing nulls in a preprocess step.

One disadvantage of this approach would be that error messages from the JSON Schema validation could be confusing, because the validation happens on a slightly altered document compared to what the user sent. As far as I can see, it doesn't seem too bad though. If a user sends trace_id: null when it's required, it would look something like:

Problem validating JSON document against schema: I[#] S[#] doesn't validate with "transaction#"
  I[#] S[#/allOf/1] allOf failed
    I[#] S[#/allOf/1/required] missing properties: "trace_id"

@mikker

mikker commented Oct 2, 2018

I'd definitely rather add the overhead to a Go project than a Ruby one 😀, so 👍

@watson
Contributor

watson commented Oct 2, 2018

@roncohen in that case, would it be possible to not modify the contents of context.custom? I.e. if context.custom.foo was null, we would keep it.
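That carve-out could look roughly like this (a sketch only — the server is written in Go, and `strip_nulls` and the `preserve` path are made up for illustration): strip None-valued keys everywhere except inside the context.custom subtree.

```python
def strip_nulls(obj, preserve=("context", "custom")):
    # Recursively drop None-valued dict keys, except for keys whose path
    # lies inside the subtree named by `preserve` (here: context.custom).
    def walk(node, path):
        if isinstance(node, dict):
            out = {}
            for k, v in node.items():
                p = path + (k,)
                # Keep nulls only when the full path sits under `preserve`.
                if v is None and p[: len(preserve)] != preserve:
                    continue
                out[k] = walk(v, p)
            return out
        if isinstance(node, list):
            return [walk(v, path) for v in node]
        return node
    return walk(obj, ())
```

Whether nulls under context.custom should then still be allowed by the schema is a separate question; this only shows that the preprocessing step could exempt user-supplied custom data.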

@alvarolobato

alvarolobato commented Oct 15, 2018

We've agreed to postpone this change to 7.x, where we will make the decision and implement the changes in the agents and server.

@jalvz
Contributor

jalvz commented Aug 26, 2019

Closing until we can tackle it for the next major; tracked in #2631 along with other breaking changes under consideration.
