Remove null values from JSON spec
#1410
Comments
I'm OK with this, but I'd like to get @Qard's input as well before we tick the box.
We can just swap
I'm generally OK with this, as IIRC the Java agent does not send
I'm not a huge fan of this, to be honest. The stricter we make the schema, the harder it becomes to implement it. I'm mostly thinking of people from the community here. If implementing a compliant agent becomes an exercise in frustration, people will be much less inclined to contribute. I get that we need to be strict in some places to ensure consistent data, but this doesn't look like one of those places to me.
This was an interesting read: json-schema-org/json-schema-spec#584
Do we have any measurements of the added overhead? If it's a lot, it might justify it. @beniwohli?
@mikker, there's also potential for overhead in the agent. For example, the JSON serializer in Python doesn't have an option to drop keys with
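To make the point concrete: Python's standard `json` module emits `null` for every `None` value, so an agent has to prune the data itself before serializing. A minimal sketch, assuming the payload is plain nested dicts and lists; the helper name `drop_nulls` and the sample span are hypothetical and not part of the Python agent. It builds a new dict (and list) at every level, which is exactly the copying overhead being discussed.

```python
import json
from typing import Any


def drop_nulls(obj: Any) -> Any:
    """Return a copy of obj with None-valued dict entries removed at every level.

    Lists are copied as well, but None elements inside lists are kept,
    because only object members can be omitted in JSON.
    """
    if isinstance(obj, dict):
        return {k: drop_nulls(v) for k, v in obj.items() if v is not None}
    if isinstance(obj, list):
        return [drop_nulls(item) for item in obj]
    return obj


span = {"name": "GET /", "parent_id": None, "context": {"db": None, "tags": {"a": "1"}}}

# The stock encoder keeps the nulls:
print(json.dumps(span))
# {"name": "GET /", "parent_id": null, "context": {"db": null, "tags": {"a": "1"}}}

# The pruned copy omits them; the original dict is left untouched:
print(json.dumps(drop_nulls(span)))
# {"name": "GET /", "context": {"tags": {"a": "1"}}}
```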
@beniwohli, in the Node agent we currently opt not to set the property in the first place if it's
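The Node approach carries over directly to Python: instead of pruning after the fact, only add a key when there is a value for it, so serialization needs no filtering pass and no extra copy. A minimal sketch; `set_if_present`, `build_transaction`, and the field names are hypothetical, for illustration only.

```python
import json
from typing import Any, Dict, Optional


def set_if_present(target: Dict[str, Any], key: str, value: Any) -> None:
    """Add key to target only when value is not None, so no null is ever serialized."""
    if value is not None:
        target[key] = value


def build_transaction(name: str, result: Optional[str] = None,
                      user_id: Optional[str] = None) -> Dict[str, Any]:
    # Required fields are always set; optional ones only when they are known.
    transaction: Dict[str, Any] = {"name": name}
    set_if_present(transaction, "result", result)
    if user_id is not None:
        # The nested context object is only created when it has content,
        # so no empty object is sent either.
        transaction["context"] = {"user": {"id": user_id}}
    return transaction


print(json.dumps(build_transaction("GET /users")))
# {"name": "GET /users"}
print(json.dumps(build_transaction("GET /users", result="HTTP 2xx", user_id="42")))
# {"name": "GET /users", "result": "HTTP 2xx", "context": {"user": {"id": "42"}}}
```

The cost moves to the places that build the payload: every optional field needs its own `None` check, which is the "uglier code" trade-off raised in the next reply.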
@watson I'm not sure; I'd have to comb the code base for an example. In any case, it would make for uglier code if I had to do a
But again, this isn't really about me or you. We get paid to work on this, so if we have to do some annoying work, so be it. But if, e.g., a community member wants to contribute an integration for a Python framework, I'd like to avoid making it a super frustrating experience. I'd have to implement a drop-nulls JSON serializer to ensure that data doesn't get rejected because of a little detail like this, incurring the aforementioned overhead. Generally speaking, I'd really prefer an as-lax-as-possible schema. We had lots of complaints back in the Opbeat days about perfectly good data being rejected because of some small detail. Some data needs to strictly conform to our expectations so we can ensure that, e.g., the UI works correctly. Requiring optional attributes to not be null overshoots that goal by quite a bit, IMO.
I have had similar experiences with other APIs - being overly strict can be very frustrating. That doesn't mean the schema needs to be lax, though. We could instead provide APIs which hide this from instrumentation developers. Allowing nulls in the schema feels strange to me. You can already omit a field, so why encode a null value in the first place? This means bigger payloads; more for the agents to serialise, more for the server to parse. I would be surprised if the overhead is significant, or couldn't be made insignificant, but it's still wasteful. Another issue is around semantics. When I first looked at the schema, I was a bit confused by the possibility of nulls. In Go, a string is a string, and cannot be null; there is no "object" base class like in Java or Python.
There are probably more changes required, but couldn't you do something like this?

```diff
diff --git a/elasticapm/utils/json_encoder.py b/elasticapm/utils/json_encoder.py
index 9b0f0c59..80b10f73 100644
--- a/elasticapm/utils/json_encoder.py
+++ b/elasticapm/utils/json_encoder.py
@@ -32,6 +32,11 @@ class BetterJSONEncoder(json.JSONEncoder):
             return self.ENCODERS[type(obj)](obj)
         return super(BetterJSONEncoder, self).default(obj)
 
+    def encode(self, obj):
+        if type(obj) is dict:
+            obj = {k: v for (k, v) in obj.items() if v is not None}
+        return super(BetterJSONEncoder, self).encode(obj)
+
 
 def better_decoder(data):
     return data
```

That keeps the schema strict while freeing instrumentation developers from the burden of None-checking.
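One of the "more changes" is visible in a self-contained version of the suggested override below (the agent's real `BetterJSONEncoder` also has an `ENCODERS` table, omitted here; `TopLevelNullDroppingEncoder` is a hypothetical name). `json.dumps(..., cls=...)` calls `encode()` once for the outermost object only, so nested dicts go straight through `iterencode()` and still serialize their `None` values as `null`.

```python
import json
from typing import Any


class TopLevelNullDroppingEncoder(json.JSONEncoder):
    """Mirror of the suggested encode() override, without the agent's ENCODERS table."""

    def encode(self, obj: Any) -> str:
        if type(obj) is dict:
            # Only the outermost dict is filtered; encode() is not called
            # again for nested values.
            obj = {k: v for (k, v) in obj.items() if v is not None}
        return super().encode(obj)


payload = {"id": "abc", "parent_id": None, "context": {"tags": None}}
print(json.dumps(payload, cls=TopLevelNullDroppingEncoder))
# {"id": "abc", "context": {"tags": null}}
```

Filtering nested dicts too would need a recursive pass over the payload, which copies every dict along the way; that is the overhead raised in the replies.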
That might work for people writing instrumentation for existing agents, but not for people who write their own agents.
It's not like allowing
As for your code example, yes, that's more or less how I'd do it. But as noted further up, this creates a whole new copy of every single dict that we serialize. How is that not wasteful? If y'all agree that this is a good idea (as the checkmarks indicate), then let's move forward with it. It's not like I have veto rights or anything. I made my point that strictness for strictness' sake will lead to a more painful user experience (as we saw countless times in Opbeat support). If the simplification/optimization in the server is worth that, by all means do it.
Sorry, I was thinking this pattern would lazily compute the pairs, but I suppose it'll create the dict all at once. You could probably do something with a generator instead. Anyway, we can continue this in another thread if we go ahead.
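Following up on the generator idea: one way to avoid building filtered copies is to walk the payload lazily and yield JSON text chunks, skipping `None`-valued keys on the fly. A rough sketch, assuming string keys and only JSON-native Python values; `iter_json_without_nulls` is a hypothetical name. It trades the dict copies for pure-Python iteration, which may well be slower than the C-accelerated `json` encoder, so it would need benchmarking before it could be called an optimization.

```python
import json
from typing import Any, Iterator


def iter_json_without_nulls(obj: Any) -> Iterator[str]:
    """Yield compact JSON text for obj, skipping dict entries whose value is None.

    Dicts and lists are walked in place, so no filtered copy of any dict is built.
    Assumes dict keys are strings.
    """
    if isinstance(obj, dict):
        yield "{"
        first = True
        for key, value in obj.items():
            if value is None:
                continue  # omit the member instead of emitting "key": null
            if not first:
                yield ","
            first = False
            yield json.dumps(key)
            yield ":"
            yield from iter_json_without_nulls(value)
        yield "}"
    elif isinstance(obj, (list, tuple)):
        yield "["
        for index, item in enumerate(obj):
            if index:
                yield ","
            # None inside a list cannot be omitted, so it is still encoded as null.
            yield from iter_json_without_nulls(item)
        yield "]"
    else:
        # Scalars use the stock encoder for correct escaping and number formatting.
        yield json.dumps(obj)


payload = {"a": 1, "b": None, "c": {"d": None, "e": [1, None]}}
print("".join(iter_json_without_nulls(payload)))
# {"a":1,"c":{"e":[1,null]}}
```

Instead of joining, the chunks could be written straight to a buffered output stream as they are produced.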
I agree with @beniwohli and have removed my tick, as I don't fully agree that this change should be made. I don't mean to veto, however.
I'll just say that @beniwohli's concerns apply to Ruby as well. So you are not alone, Beni.
@beniwohli In Node.js, doing it the way Andrew suggests is probably not that performant either. But we deal with it in two other ways: either we don't set the
@simitt, as you can see, it looks like this might introduce an overhead in the Python agent. Do you happen to have any numbers on how much the APM Server's performance would improve?
It is not the
This should also not be about having the right to veto or not. The question of why we allow null values came up a couple of times internally, as they make the schema more complicated, so I wanted to see whether there is still a need for them.
A somewhat crazy idea: we could let the APM Server preprocess incoming data to remove
One disadvantage of this approach would be that error messages produced by JSON Schema validation could be confusing, because the validation happens on a slightly altered document compared to what the user sent. As far as I can see, it doesn't seem too bad, though. If a user sends a
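The preprocessing idea can be sketched with the standard library's `object_pairs_hook`, which lets the decoder drop null-valued members while it builds each object, so the validator only ever sees a document without them. This is an illustration in Python, not the APM Server's Go code; `decode_without_nulls` and `missing_required` are hypothetical names, and the latter is a stand-in for real schema validation. The last line shows the drawback mentioned above: a client that sent an explicit `null` for a required field is told the field is missing.

```python
import json
from typing import Any, Dict, List, Tuple


def _drop_null_pairs(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    # Called by the decoder for every JSON object, innermost objects first.
    return {key: value for key, value in pairs if value is not None}


def decode_without_nulls(raw: str) -> Any:
    """Decode raw JSON, omitting every object member whose value is null."""
    return json.loads(raw, object_pairs_hook=_drop_null_pairs)


def missing_required(doc: Dict[str, Any], required: List[str]) -> List[str]:
    # Stand-in for schema validation: report required keys that are absent.
    return [name for name in required if name not in doc]


doc = decode_without_nulls('{"id": "abc", "name": null, "context": {"tags": null}}')
print(doc)                                    # {'id': 'abc', 'context': {}}
print(missing_required(doc, ["id", "name"]))  # ['name']
```

Because the filtering happens during decoding, no second pass over the document is needed; `null` elements inside arrays are left as they are.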
I'd definitely rather add the overhead to a Go project than a Ruby one 😀, so 👍
@roncohen In that case, would it be possible not to modify the context of
We've agreed to postpone this change to 7.x, where we will make the decision and implement the changes in the agents and server.
Closing until we can tackle it for the next major; tracked in #2631 along with other BC for consideration.
The APM Server JSON specs allow sending null values for optional attributes. This handling was motivated by some serializers not removing null values from their output. Allowing null values requires additional checks and introduces an extra level of complexity in the JSON spec definition.
JSON validation is a time-consuming part of APM Server processing, and every additional check introduces additional overhead.
For these reasons, we propose to deny null values for all attributes for Intake API v2.
@elastic/apm-agent-devs please tick off if you can handle not sending null values in v2, or let us know about the problem otherwise.
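To illustrate what the "additional checks" look like: in JSON Schema, an optional attribute that may be null is commonly declared with a type list such as `["string", "null"]`, so the validator has to accept a second alternative for every such field. The toy checker below supports only the `type` keyword for a few types and is purely illustrative; it is not how the APM Server validates, and `matches_type`, `lax`, and `strict` are hypothetical names.

```python
from typing import Any, Dict

# A subset of JSON Schema type names, mapped to the Python types json.loads produces.
_JSON_TYPES = {
    "string": str,
    "object": dict,
    "array": list,
    "null": type(None),
}


def matches_type(value: Any, schema: Dict[str, Any]) -> bool:
    """Check only the "type" keyword, which may be a single name or a list of names."""
    allowed = schema["type"]
    if isinstance(allowed, str):
        allowed = [allowed]
    # The value is valid if it matches any of the allowed types.
    return any(isinstance(value, _JSON_TYPES[name]) for name in allowed)


lax = {"type": ["string", "null"]}  # optional attribute, null permitted
strict = {"type": "string"}         # optional attribute, null denied

print(matches_type(None, lax), matches_type(None, strict))    # True False
print(matches_type("abc", lax), matches_type("abc", strict))  # True True
```

Under the strict variant, an agent that has no value simply omits the key, and the type check for that field never runs.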