protobuf-jackson is a library for efficient marshalling of Protocol Buffer messages to and from JSON. It is based on the streaming API of the Jackson JSON handling library and uses Byte Buddy to generate efficient bytecode at runtime for marshalling specific message types.
This library requires Java 8 or later.
The official Protocol Buffer Java library comes with a util library for marshalling to JSON, using a class called JsonFormat. Unfortunately, JsonFormat is very slow, and its core design of using manual JSON printers / GSON parsers and introspection at serialization time limits how fast it can get.
JSON marshalling is important - it allows existing REST services to be migrated to Protocol Buffers and makes it easier to evangelize their use in organizations currently tied heavily to JSON. However, slow marshalling makes these use cases impractical, so this library aims to make JSON marshalling fast enough for production. It will never be as fast as binary serialization, but it should be about as fast as standard REST API libraries.
If the protoc compiler ever supports generating source code for JSON marshalling, this library will likely be deprecated.
Include protobuf-jackson as a dependency. Gradle users can add
implementation("org.curioswitch.curiostack:protobuf-jackson:2.2.0")
to their dependencies.
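Maven users can declare the same artifact with groupId org.curioswitch.curiostack, artifactId protobuf-jackson, and version 2.2.0.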
The API is exposed in MessageMarshaller. Use its builder to configure options and register the specific Protocol Buffer message types you will marshal. Any other types will not be marshallable by the built MessageMarshaller.
class MessageApi {

  // Build one marshaller up front; registration is where all the per-type
  // introspection and code generation happens.
  private static final MessageMarshaller marshaller = MessageMarshaller.builder()
      .register(ApiRequest.getDefaultInstance())
      .register(ApiResponse.getDefaultInstance())
      .build();

  String coolApi(String jsonRequest) throws IOException {
    ApiRequest.Builder request = ApiRequest.newBuilder();
    marshaller.mergeValue(jsonRequest, request);
    ApiResponse response = processRequest(request.build());
    return marshaller.writeValueAsString(response);
  }
}
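Beyond the String-based methods shown above, MessageMarshaller also exposes byte[] and stream variants. The sketch below assumes writeValueAsBytes, writeValue(Message, OutputStream), and mergeValue(byte[], Message.Builder) overloads exist in your version; check the MessageMarshaller Javadoc before relying on them.

// A sketch of the non-String entry points, reusing the marshaller and the
// hypothetical ApiResponse type from the sample above. The overloads used
// here are assumptions, not confirmed API.
void roundTrip(ApiResponse response, OutputStream out) throws IOException {
  byte[] jsonUtf8 = marshaller.writeValueAsBytes(response);  // assumed overload
  marshaller.writeValue(response, out);                      // assumed overload

  ApiResponse.Builder rebuilt = ApiResponse.newBuilder();
  marshaller.mergeValue(jsonUtf8, rebuilt);                  // assumed overload
}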
protobuf-jackson introspects registered message types at bytecode-generation time and generates code specific to each type. Unlike JsonFormat, this means that during actual marshalling there is no introspection of the message type at all - it has already been done during registration. In addition, messages are accessed through their standard generated Java API, so no reflection is needed, which lets the JIT optimize the generated machine code as much as possible. By generating bytecode, we approach the speed that could be achieved by generating the source code itself.
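To make that concrete, the hand-written Jackson code below is roughly the shape of what gets generated: one straight-line block per field, calling the message's generated getters directly. The ApiRequest fields used here (query, pageSize) are hypothetical, and the real generated bytecode differs in its details.

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import java.io.IOException;
import java.io.StringWriter;

final class ApiRequestWriterSketch {
  private static final JsonFactory FACTORY = new JsonFactory();

  static String write(ApiRequest message) throws IOException {
    StringWriter out = new StringWriter();
    try (JsonGenerator gen = FACTORY.createGenerator(out)) {
      gen.writeStartObject();
      // Direct getter calls: no descriptor loops, no reflection.
      if (!message.getQuery().isEmpty()) {         // hypothetical string field
        gen.writeStringField("query", message.getQuery());
      }
      if (message.getPageSize() != 0) {            // hypothetical int32 field
        gen.writeNumberField("pageSize", message.getPageSize());
      }
      gen.writeEndObject();
    }
    return out.toString();
  }
}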
All of the same tests as JsonFormat (besides the differences listed below) pass, so protobuf-jackson should be mostly compatible with upstream and ready for production.
Differences with JsonFormat are:

- No support for DynamicMessage. The library is designed to interact with generated code for optimal performance. If you don't know what DynamicMessage is, you are not using it. If you do, stick with JsonFormat.
- Does not support parsing Any messages where @type is not the first field.
- Correctly handles invalid array syntax. JsonFormat does not, due to its dependency on GSON. See JsonFormatTest.
- Correctly rejects invalid JSON which specifies the same field twice. JsonFormat does not, due to its dependency on GSON. See JsonFormatTest.
- Fields already present in the passed-in Message.Builder will be overwritten, rather than causing the parse to fail, as is usual with merge semantics (see the sketch after this list).
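A minimal sketch of that last point, reusing the marshaller field and the hypothetical ApiRequest message (with its hypothetical query field) from the usage sample above:

void mergeOverwrites() throws java.io.IOException {
  ApiRequest.Builder builder = ApiRequest.newBuilder()
      .setQuery("old value");                      // field set before parsing
  marshaller.mergeValue("{\"query\": \"new value\"}", builder);
  // The pre-existing value is replaced instead of failing the parse.
  assert builder.getQuery().equals("new value");
}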
First, a synthetic benchmark that uses the upstream TestAllTypes proto, which contains all types of fields.
Benchmark Mode Cnt Score Error Units
JsonParserBenchmark.codegenJson thrpt 50 114260.817 ± 1162.079 ops/s
JsonParserBenchmark.fromBinary thrpt 50 448586.946 ± 7946.499 ops/s
JsonParserBenchmark.upstreamJson thrpt 50 43392.884 ± 715.184 ops/s
JsonSerializeBenchmark.codegenJson thrpt 50 204768.672 ± 2459.787 ops/s
JsonSerializeBenchmark.toBytes thrpt 50 1071878.319 ± 12628.059 ops/s
JsonSerializeBenchmark.upstreamJson thrpt 50 49941.606 ± 707.045 ops/s
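The numbers above are JMH throughput results. The sketch below only illustrates the general shape of such a benchmark, not the actual benchmark source; the TestAllTypes setter used is an assumption.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class JsonSerializeBenchmarkSketch {
  private final MessageMarshaller marshaller = MessageMarshaller.builder()
      .register(TestAllTypes.getDefaultInstance())
      .build();
  private final TestAllTypes message = TestAllTypes.newBuilder()
      .setOptionalInt32(42)  // assumed field; the real proto has one of every type
      .build();

  @Benchmark
  public String codegenJson() throws Exception {
    return marshaller.writeValueAsString(message);  // protobuf-jackson path
  }

  @Benchmark
  public byte[] toBytes() {
    return message.toByteArray();                   // binary baseline
  }
}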
protobuf-jackson generally shows a 4x speedup in serialization and a 3x speedup in parsing versus upstream JSON marshalling. It is about 4-5x slower than protobuf binary marshalling.
For a more real-world example, the JSON response of the GitHub search API was used, with a proto file corresponding to the format made to model what an actual data exchange may look like. This is just one example, and other payloads may show different results, but the trends should hopefully be similar.
Benchmark Mode Cnt Score Error Units
GithubApiBenchmark.marshallerParseString thrpt 50 1696.653 ± 170.891 ops/s
GithubApiBenchmark.marshallerWriteString thrpt 50 3042.433 ± 172.438 ops/s
GithubApiBenchmark.protobufParseBytes thrpt 50 4599.390 ± 56.570 ops/s
GithubApiBenchmark.protobufToBytes thrpt 50 21315.212 ± 210.659 ops/s
GithubApiBenchmark.upstreamParseString thrpt 50 472.902 ± 6.455 ops/s
GithubApiBenchmark.upstreamWriteString thrpt 50 717.227 ± 8.834 ops/s
For both parsing and serializing, protobuf-jackson is about 4x faster.
One other way to look at these results is that JSON marshalling is now within an order of magnitude of protobuf binary marshalling, which was not the case before. While binary marshalling will always be preferred, JSON should now also be acceptable in production for use cases that demand it.