Description
Problem
The JDK serialization used by the security plugin to serialize and deserialize various headers is slow.
Proposal
This is a proposal to change the implementation of Base64Helper::serializeObject and Base64Helper::deserializeObject to use a faster serialization protocol. I explored Fast Serialization, Protostuff, Kryo, Avro, and OpenSearch's Custom Serialization as alternatives to JDK serialization and ran a few benchmarks. Results are attached below.
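For context, here is a minimal sketch of what a JDK-serialization-based helper of this kind typically looks like, assuming the header value is a Base64 string wrapping an `ObjectOutputStream` payload. The class name `JdkBase64Sketch` is hypothetical and this is not the plugin's actual `Base64Helper` code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical stand-in for a JDK-serialization helper; not the plugin's source.
final class JdkBase64Sketch {
    private JdkBase64Sketch() {}

    // Serialize with ObjectOutputStream, then Base64-encode so the result fits in a header.
    static String serializeObject(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Base64-decode, then rebuild the object graph with ObjectInputStream.
    static Object deserializeObject(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every call writes class descriptors and walks the object graph reflectively, which is the overhead the alternatives below try to avoid.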
Benchmarking Environment
Framework used - JMH, 1000 warm-up iterations, 30000 test iterations
EC2 InstanceType - c5.2xlarge
JDK - Corretto JDK 11
OS - Amazon Linux 2 x86_64
| Type | User | User | User | InetSocketAddress | InetSocketAddress | InetSocketAddress | SourceFieldContext | SourceFieldContext | SourceFieldContext | User | User | User | InetSocketAddress | InetSocketAddress | InetSocketAddress | SourceFieldContext | SourceFieldContext | SourceFieldContext |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Operation | deserialize | deserialize | deserialize | deserialize | deserialize | deserialize | deserialize | deserialize | deserialize | serialize | serialize | serialize | serialize | serialize | serialize | serialize | serialize | serialize |
| Stat | Avg Time (ns/op) | Error +/- (ns/op) | Diff % | Avg Time (ns/op) | Error +/- (ns/op) | Diff % | Avg Time (ns/op) | Error +/- (ns/op) | Diff % | Avg Time (ns/op) | Error +/- (ns/op) | Diff % | Avg Time (ns/op) | Error +/- (ns/op) | Diff % | Avg Time (ns/op) | Error +/- (ns/op) | Diff % |
| Java | 26062.709 | 847.012 | | 9732.072 | 309.654 | | 7892.943 | 333.835 | | 10370.249 | 319.919 | | 4749.54 | 168.423 | | 4023.138 | 146.527 | |
| FST | 4299.802 | 251.09 | -83.50209 | 3957.335 | 287.201 | -59.33718 | 2168.463 | 66.373 | -72.52656 | 3104.632 | 161.298 | -70.06213 | 2578.204 | 115.172 | -45.71676 | 1427.189 | 63.018 | -64.52548 |
| FST (Pre) | 3674.455 | 133.466 | -85.90148 | 3417.478 | 134.756 | -64.88437 | 868.976 | 48.215 | -88.99047 | 2899.691 | 131.584 | -72.03837 | 2368.224 | 101.214 | -50.13782 | 756.986 | 38.476 | -81.18419 |
| Proto | 808.423 | 40.851 | -96.89816 | | | | 1003.155 | 29.785 | -87.29048 | 1423.777 | 59.772 | -86.27056 | | | | 1138.412 | 70.829 | -71.70338 |
| Custom (OpenSearch) | 834.74 | 56.749 | -96.79719 | | | | 834.987 | 30.013 | -89.42109 | 1115.154 | 69.707 | -89.2466 | | | | 1123.486 | 37.035 | -72.07439 |
| Kryo (Pre) | | | | 1274.085 | 20.928 | -86.90839 | | | | | | | 1544.436 | 55.018 | -67.48241 | | | |
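The Diff % column is not defined in the text, but the published figures are consistent with the relative change in average time against the Java baseline for the same type and operation (negative means faster than Java):

$$
\text{Diff \%} = \frac{t_{\text{candidate}} - t_{\text{Java}}}{t_{\text{Java}}} \times 100
$$

For example, for FST deserializing User:

$$
\frac{4299.802 - 26062.709}{26062.709} \times 100 \approx -83.50
$$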
- Though FST is highly performant and the simplest of all to use, it has its own shortcomings. FST no longer seems to be actively maintained: the last commit was made 2 years ago, there are 102 open issues, and it has a history of breaking changes even in minor version upgrades.
- Protostuff is also highly performant, but needs explicit handling for certain classes such as `InetSocketAddress` by writing Delegates. Protostuff doesn't seem to be actively maintained either; the last commit was 1 year ago.
- Kryo does not work out of the box. It does not work with classes that have no zero-arg constructor, so we'll have to write serializers. For complex objects, e.g. `java.util.Collections$SynchronizedMap`, we'll have to register separate serializers. There's a repo, kryo-serializers, that has many such serializers that we can use. Given we already have a highly optimised custom serialization framework (`StreamOutput`, `StreamInput`) within OpenSearch, expending effort to integrate with another library seems unnecessary.
- Custom serialization using OpenSearch's `BytesStreamOutput` and `BytesStreamInput` classes is a promising approach. It too is highly performant. For the classes that are defined within the security plugin, such as `User` and `SourceFieldsContext`, the `Writeable` interface can be implemented. For classes such as `InetSocketAddress`, which we cannot change, we'll have to add Writers and Read methods to the `StreamOutput` and `StreamInput` classes to be able to use the `writeGenericObject` and `readGenericObject` methods. This is in line with how OpenSearch deals with third-party classes today. [source code]
To conclude, we propose to use custom serialization for headers in security plugin.
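The pattern behind custom serialization can be sketched with plain JDK streams. This is only an illustration under assumptions: `SketchWriteable`, `SketchUser` and `CustomSerializationSketch` are hypothetical names, `DataOutputStream`/`DataInputStream` stand in for OpenSearch's stream classes, and the user is reduced to a name and a list of roles. Classes we own write their own fields; a third-party class such as `InetSocketAddress` gets an external write/read pair.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical analogue of a Writeable: the class writes its own fields in a fixed order.
interface SketchWriteable {
    void writeTo(DataOutputStream out) throws IOException;
}

// Hypothetical, simplified user; a real user class would carry more fields.
final class SketchUser implements SketchWriteable {
    final String name;
    final List<String> roles;

    SketchUser(String name, List<String> roles) {
        this.name = name;
        this.roles = List.copyOf(roles);
    }

    // Reading constructor: consumes fields in exactly the order writeTo emits them.
    SketchUser(DataInputStream in) throws IOException {
        this.name = in.readUTF();
        int count = in.readInt();
        List<String> read = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            read.add(in.readUTF());
        }
        this.roles = List.copyOf(read);
    }

    @Override
    public void writeTo(DataOutputStream out) throws IOException {
        out.writeUTF(name);
        out.writeInt(roles.size());
        for (String role : roles) {
            out.writeUTF(role);
        }
    }
}

final class CustomSerializationSketch {
    private CustomSerializationSketch() {}

    static byte[] encodeUser(SketchUser user) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            user.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static SketchUser decodeUser(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new SketchUser(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Third-party class we cannot modify: external writer.
    // Assumes the address is resolved (getAddress() is non-null).
    static byte[] encodeAddress(InetSocketAddress addr) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] ip = addr.getAddress().getAddress();
            out.writeByte(ip.length); // 4 for IPv4, 16 for IPv6
            out.write(ip);
            out.writeInt(addr.getPort());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Matching external reader for the format written by encodeAddress.
    static InetSocketAddress decodeAddress(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] ip = new byte[in.readByte()];
            in.readFully(ip);
            int port = in.readInt();
            return new InetSocketAddress(InetAddress.getByAddress(ip), port);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

No class metadata is written and no reflection is involved, which is one plausible reason this style benchmarks well; the cost is that the read and write sides must be kept in sync by hand.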
Solution
This change is proposed to be introduced with OS 3.0, with no intention to backport it. We can break down the solution into the following action items -
- Code change in OpenSearch's `StreamOutput` and `StreamInput` classes to add Writers and Read methods respectively for third-party classes directly involved in serialization within the security plugin. [will update the list below]
  - InetSocketAddress
- Re-implement `Base64Helper::serialize` and `Base64Helper::deserialize` methods to use custom serialization.
- Handle communication between old and new nodes during version upgrade
- Introduce safe class checks for the alternative (de)serialization implementation (this may no longer be needed as unsupported classes will fail to be serialized)
- End to end testing, especially the version upgrade scenario
- Run OSB tests to see how the various throughputs/latencies change (exploring different workloads where the impact would be much more pronounced, encountering high variance for the tests already performed)
- Finalise the OS version in which the change will be released (version code to be used in the version upgrade handling logic to identify old nodes)
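One way the version upgrade handling from the action items could look, as a sketch only: the sender picks a format from the peer's version, and the receiver can additionally recognise JDK-serialized payloads by the stream magic `0xACED` that every `ObjectOutputStream` writes first. `VersionGateSketch`, the numeric version ids and the assumption that the custom format never starts with those two bytes are all hypothetical.

```java
import java.io.ObjectStreamConstants;

// Hypothetical version gate for mixed-version clusters during a rolling upgrade.
final class VersionGateSketch {
    private VersionGateSketch() {}

    enum Format { JDK, CUSTOM }

    // Hypothetical id of the first release that understands the custom format.
    static final int FIRST_CUSTOM_FORMAT_VERSION = 3_000_000;

    // Sender side: older peers only understand JDK serialization.
    static Format formatFor(int peerVersionId) {
        return peerVersionId >= FIRST_CUSTOM_FORMAT_VERSION ? Format.CUSTOM : Format.JDK;
    }

    // Receiver side: JDK object streams always begin with STREAM_MAGIC (0xACED).
    // Assumes the custom format is designed never to start with these two bytes.
    static boolean looksLikeJdkStream(byte[] payload) {
        if (payload.length < 2) {
            return false;
        }
        int firstTwo = ((payload[0] & 0xFF) << 8) | (payload[1] & 0xFF);
        return firstTwo == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF);
    }
}
```

Gating on the peer version keeps old nodes readable during the upgrade; the magic-byte check is an optional safety net on the receiving side.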
I've raised an initial draft PR for serialization using Protostuff and am working towards testing the version upgrade scenario (from OS 2.5 to OS 2.7). Currently, the change is assumed to be introduced as part of the OS 2.7 release for testing purposes. We may need to bump up this version.
Will raise another PR with custom serialization.
Next Steps
- Review the benchmarks and maybe explore any other potential alternatives.