Performance overhead of encoding/redacting #144
Comments
Hello @PragTob, Indeed this is a hard tradeoff, I think we might drop We had pretty large projects using LoggerJSON without an issue (DB and IO are still the bottleneck in 99% of the cases). I will take a look into it anyways. BTW can you please share |
@AndrewDryga the benchmark is linked in the issue right beneath the benchmarking results. How would I ever post a benchmark without giving people the possibility to rerun it? 😁 And yeah, that's what I expected, and to be fair, in that benchmark you'll see that it's still faster than sending the data through the default formatter (which surprised me... still wondering if I'm missing a config option there). So yeah, I don't think it matters for the application holistically. We're talking about microseconds here. I'm just me, so these things interest me & I look for ways to make it faster :)
👋 One small but easy "win" might be to use `Jason.Fragment` to avoid re-visiting branches that were already safe-checked and encoded.
@ruslandoga yeah, the older package version did this, but the performance benefit is negligible and, as I remember, you can't use it with anything that contains an unescaped value.
The branch can be encoded before being wrapped in a fragment.

```elixir
defmodule LoggerJSON do
  defmodule Log do
    @moduledoc false
    defstruct [:fields]

    defimpl Jason.Encoder do
      def encode(%{fields: fields}, opts) do
        Jason.Encode.keyword(fields, opts)
      end
    end
  end

  # ...

  # process some nested map
  def process_value(k, v) when is_map(v) do
    fields = process_kv(Map.to_list(v))

    nested =
      %Log{fields: fields}
      |> Jason.encode_to_iodata!()
      |> Jason.Fragment.new() # now Jason would skip this whole branch on "root" encode step

    {k, nested}
  end

  # ...
end
```

I'm exploring this approach in plausible/analytics#4855 and it's showing 1.5x+ the performance of LoggerJSON. However, as of right now, it's not a fair comparison since I'm not doing invalid UTF-8 escaping and I'm skipping non-keyword lists, but I think there are still some ideas there that could potentially improve performance if applied to LoggerJSON. I'd be happy to open a few small micro-optimisation PRs if there is interest :)

Benchmark code:

```elixir
# now = DateTime.utc_now()

inputs = [
  {"just a msg", {:string, "This is just some elaborate message"}},
  {"some map",
   {:report,
    %{
      message: "some other weirdo message",
      # time: DateTime.utc_now(),
      http_meta: %{
        status: 500,
        method: "GET"
        # headers: [["what", "eva"], ["some-more", "stuff"]]
      }
    }}},
  {"bigger_map",
   {:report,
    %{
      "users" => %{
        "user_1" => %{
          "name" => "Alice",
          "age" => 30,
          "preferences" => %{
            "theme" => "dark",
            "language" => "English",
            "notifications" => %{
              "email" => true,
              "sms" => false,
              "push" => true
            }
          }
          # "tags" => ["developer", "team_lead"]
        },
        "user_2" => %{
          "name" => "Bob",
          "age" => 25,
          "preferences" => %{
            "theme" => "light",
            "language" => "French",
            "notifications" => %{
              "email" => true,
              "sms" => true,
              "push" => false
            }
          }
          # "tags" => ["designer", "remote"]
        }
      },
      "settings" => %{
        "global" => %{
          "timezone" => "UTC",
          "currency" => :usd,
          "support_contact" => "[email protected]"
        },
        "regional" => %{
          "US" => %{
            "timezone" => "America/New_York",
            "currency" => :usd
          },
          "EU" => %{
            "timezone" => "Europe/Berlin",
            "currency" => "EUR"
          }
        }
      },
      "analytics" => %{
        "page_views" => %{
          "home" => 1200,
          "about" => 450,
          "contact" => 300
        },
        "user_sessions" => %{
          "total" => 2000,
          "active" => 150
        }
      }
    }}}
]

redactors = []

{_, default_formatter_config} = Logger.Formatter.new(colors: [enabled?: false])

Benchee.run(
  %{
    "just Jason" => fn input ->
      Jason.encode_to_iodata!(elem(input, 1))
    end,
    "just :json" => fn input ->
      :json.encode(elem(input, 1))
    end,
    # "logger_json encode" => fn input ->
    #   %{message: LoggerJSON.Formatter.RedactorEncoder.encode(input, redactors)}
    # end,
    "whole LoggerJSON format" => fn input ->
      LoggerJSON.Formatters.Basic.format(%{level: :info, meta: %{}, msg: input}, [])
    end,
    "whole Plausible.Logger.JSONFormatter format" => fn input ->
      Plausible.Logger.JSONFormatter.format(
        %{level: :info, meta: %{}, msg: input},
        default_formatter_config
      )
    end,
    # odd that those 2 end up being the slowest - what additional work are they doing?
    "default formatter with report data (sanity check)" => fn input ->
      Logger.Formatter.format(
        %{level: :info, meta: %{}, msg: input},
        default_formatter_config
      )
    end
  },
  warmup: 0.1,
  time: 1,
  inputs: inputs
  # profile_after: true
)
```
@ruslandoga I'd add a variant with just the UTF-8 checking removed. Depending on the input, I expect this to be the biggest chunk, but I haven't profiled it yet, although that would be trivial :)
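A rough way to isolate that variant without pulling in any dependencies might look like the sketch below. `WalkBench` and its functions are hypothetical names made up for illustration (they are not part of LoggerJSON, Jason, or Benchee), and the timing uses plain `:timer.tc/1` rather than Benchee, so treat the numbers as a first hint only.

```elixir
defmodule WalkBench do
  # Hypothetical helper for illustration; not LoggerJSON's code.

  # Traversal that validates every binary it visits.
  def checked(v) when is_binary(v), do: if(String.valid?(v), do: v, else: inspect(v))

  def checked(v) when is_map(v) and not is_struct(v),
    do: Map.new(v, fn {k, x} -> {k, checked(x)} end)

  def checked(v) when is_list(v), do: Enum.map(v, &checked/1)
  def checked(v), do: v

  # Same traversal with the UTF-8 check removed.
  def unchecked(v) when is_map(v) and not is_struct(v),
    do: Map.new(v, fn {k, x} -> {k, unchecked(x)} end)

  def unchecked(v) when is_list(v), do: Enum.map(v, &unchecked/1)
  def unchecked(v), do: v

  # Returns the total microseconds spent calling `fun` on `input` `runs` times.
  def time(fun, input, runs \\ 10_000) do
    {micros, _} = :timer.tc(fn -> Enum.each(1..runs, fn _ -> fun.(input) end) end)
    micros
  end
end

input = %{"user" => %{"name" => "Alice", "bio" => String.duplicate("lorem ipsum ", 100)}}

IO.inspect(WalkBench.time(&WalkBench.checked/1, input), label: "with String.valid? (µs)")
IO.inspect(WalkBench.time(&WalkBench.unchecked/1, input), label: "without check (µs)")
```

The gap between the two numbers should grow with the share of long strings in the input, which is why the result depends so much on what gets logged.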
Some notes:
👋
Not sure if this is a known issue, but the performance overhead of the code in `RedactorEncoder` is quite significant, even without any redactors, as it traverses the entire data structure.
As in, I'd have expected the JSON encoding to be the most expensive formatting operation in the formatter, but in 2/3 scenarios the encoding & redacting (without any redactors) is ~twice as slow as the Jason encoding:
Full benchmark (with some more logger stuff) here: https://github.com/PragTob/elixir_playground/blob/main/bench/logger_json_overhead.exs
It's still fast enough (I think) - it just feels odd.
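To make the source of that overhead a bit more concrete, here is a minimal sketch of the kind of traversal being discussed. `SafeWalk` is a hypothetical module, not LoggerJSON's actual `RedactorEncoder`; it assumes that non-printable binaries fall back to `inspect/1` and that tuples become lists. Even with no redactors configured, every map, list, tuple, and string still gets visited once before the JSON encoder walks the result a second time.

```elixir
defmodule SafeWalk do
  # Hypothetical module for illustration; not LoggerJSON's implementation.

  # Binaries are kept only if they are valid, printable UTF-8.
  def walk(value) when is_binary(value) do
    if String.valid?(value) and String.printable?(value) do
      value
    else
      inspect(value)
    end
  end

  # Plain maps are rebuilt with every value walked.
  def walk(value) when is_map(value) and not is_struct(value) do
    Map.new(value, fn {k, v} -> {k, walk(v)} end)
  end

  def walk(value) when is_list(value), do: Enum.map(value, &walk/1)

  # Tuples have no JSON equivalent, so they are turned into lists.
  def walk(value) when is_tuple(value), do: value |> Tuple.to_list() |> walk()

  # Everything else (numbers, atoms, structs, ...) passes through untouched.
  def walk(value), do: value
end

IO.inspect(SafeWalk.walk(%{a: "ok", b: {1, <<255>>}}))
```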
I haven't done a profile to see what actually eats the most time (my suspicion is that the `String.valid?` and `String.printable?` checks are expensive).
A solution might be a more lightweight formatter that is more "use at your own risk" (i.e. it expects you to do the work beforehand if necessary: you redact things yourself, and it doesn't convert tuples or check string validity), but I suppose that might almost be its own project then. I also don't know how often these situations occur :)
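As a rough sketch of what such a "use at your own risk" formatter could look like: `BareJSONFormatter` is a hypothetical name and the output field names are made up. It assumes OTP 27+ (for `:json.encode/1`, the same encoder used in the benchmark in this thread) and that the caller has already redacted and sanitized whatever it logs. It does no traversal of its own and hands the data straight to the encoder.

```elixir
defmodule BareJSONFormatter do
  # Hypothetical sketch; not a LoggerJSON module.
  # Assumption: callers pass data that is already redacted and JSON-safe.

  def format(%{level: level, msg: msg, meta: meta}, _config) do
    payload = %{
      "severity" => Atom.to_string(level),
      "message" => message(msg),
      "metadata" => meta
    }

    # :json.encode/1 returns iodata; a newline terminates the log line.
    [:json.encode(payload), ?\n]
  end

  # Plain messages arrive as chardata, reports as maps or keyword lists.
  defp message({:string, chardata}), do: IO.chardata_to_string(chardata)
  defp message({:report, report}) when is_map(report), do: report
  defp message({:report, report}), do: Map.new(report)
end

%{level: :info, meta: %{}, msg: {:report, %{status: 500, method: "GET"}}}
|> BareJSONFormatter.format([])
|> IO.puts()
```

The trade-off is the one named above: tuples, pids, or anything else the encoder cannot represent will raise inside the formatter instead of being converted, so this only fits code bases that control what they log.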