Description
As write throughput is bound by disk IO, compressing events during serialization could improve throughput at the cost of CPU (see: proof of concept).
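For illustration, here is a minimal sketch of the encode step using the JDK's built-in `Deflater` to zlib-wrap already-serialized event bytes. The linked proof of concept may use a different codec or API; the class and method names here are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

final class EventCompressor {
    // Compress already-serialized (CBOR) event bytes into a zlib-wrapped
    // deflate stream; the PQ entry layout (seqnum+length+N bytes) is unchanged.
    static byte[] zlibCompress(byte[] serialized) {
        final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(serialized);
        deflater.finish();
        final ByteArrayOutputStream out = new ByteArrayOutputStream(serialized.length);
        final byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```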
If possible, per-event compression should be delivered within the scope of the existing v2 PQ page format, in which entries contain only seqnum+length+N bytes. To do this, the reader will need to be able to handle compressed or uncompressed bytes without additional context (e.g., by differentiating a zlib header from the existing CBOR first-bytes, as sketched below).
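A zlib stream's two-byte header is self-validating (per RFC 1950, the low nibble of the first byte is the compression method, and the two-byte header is a multiple of 31), which could serve as the discriminator. One caveat: 0x78 is also a legal CBOR initial byte (a text string with one-byte length), so the sketch below assumes serialized events always begin with a CBOR map or array byte; a real reader would need to verify that invariant. This is a sketch, not the shipped detector:

```java
final class EntrySniffer {
    // Heuristic: decide whether a PQ entry payload is zlib-compressed or a
    // raw CBOR event. Assumes no event serializes to a leading zlib-valid
    // header (see caveat above).
    static boolean looksLikeZlib(byte[] payload) {
        if (payload.length < 2) return false;
        final int cmf = payload[0] & 0xFF;
        final int flg = payload[1] & 0xFF;
        return (cmf & 0x0F) == 8              // CM=8: deflate (RFC 1950)
            && ((cmf << 8) | flg) % 31 == 0;  // zlib header check bits
    }
}
```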
Because not all users will want to spend CPU for increased throughput, and because of the rollback barriers discussed below, this feature should first be delivered as opt-in, preferably at the per-pipeline level.
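A per-pipeline opt-in might look like the following in `pipelines.yml`; the `queue.compression` setting name is hypothetical, chosen only to illustrate the granularity:

```yaml
# pipelines.yml (the `queue.compression` setting name is illustrative,
# not a shipped Logstash option)
- pipeline.id: ingest
  queue.type: persisted
  queue.compression: true    # this pipeline opts in to per-event compression
- pipeline.id: audit
  queue.type: persisted      # compression left at its default (off)
```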
Compatibility Considerations
Once a queue contains compressed events, it can no longer be read by a Logstash instance that does not support event decompression. This presents an undesired rollback barrier: a user hitting an unrelated issue could be prevented from rolling back to their last known-working configuration.
Queue compression should be implemented as opt-in until at least three minor versions have shipped with decompression support.
Design Requirements
- compression is opt-in for at least three minor releases (consistent with the compatibility considerations above)
- reads compressed events from the queue unless explicitly configured otherwise
- include metrics in the `pipeline.${pipeline_id}.queue.compression` namespace (an accumulation sketch follows the table):

| name | definition | expected value range |
| --- | --- | --- |
| `encode.spend.lifetime` | `encode_time / uptime` | `[0, N_CPUS]` |
| `encode.ratio.lifetime` | `compressed_bytes / decompressed_bytes` | `[0, 1]` |
| `decode.spend.lifetime` | `decode_time / uptime` | `[0, N_CPUS]` |
| `decode.ratio.lifetime` | `decompressed_bytes / compressed_bytes` | `[1, ∞)` |
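As a sketch of how those lifetime values could be accumulated (the class and method names are hypothetical, not an existing Logstash metrics API):

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical accumulator for the encode.* metrics above; the decode.*
// side would mirror it with decode_time and the inverted byte ratio.
final class CompressionMetrics {
    private final LongAdder encodeNanos     = new LongAdder();
    private final LongAdder rawBytes        = new LongAdder(); // decompressed_bytes
    private final LongAdder compressedBytes = new LongAdder();
    private final long startNanos = System.nanoTime();

    void recordEncode(long elapsedNanos, int rawLen, int compressedLen) {
        encodeNanos.add(elapsedNanos);
        rawBytes.add(rawLen);
        compressedBytes.add(compressedLen);
    }

    // encode.spend.lifetime = encode_time / uptime; bounded above by N_CPUS
    // since at most that many threads can be encoding concurrently.
    double encodeSpendLifetime() {
        return (double) encodeNanos.sum() / (System.nanoTime() - startNanos);
    }

    // encode.ratio.lifetime = compressed_bytes / decompressed_bytes, in [0, 1].
    double encodeRatioLifetime() {
        final long raw = rawBytes.sum();
        return raw == 0L ? 0.0 : (double) compressedBytes.sum() / raw;
    }
}
```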