
PQ could benefit from event-level compression #17819

@yaauie

Description


As write throughput is bound by disk IO, compressing events during serialization could improve throughput at the cost of CPU (see: proof of concept).

If possible, per-event compression should be delivered within the scope of the existing v2 PQ page format, in which entries contain only seqnum+length+N bytes. To do this, the reader will need to handle compressed or uncompressed bytes without additional context (e.g., by differentiating a zlib header from the existing CBOR first-bytes).
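
For illustration, a minimal sketch of what such context-free handling might look like, assuming zlib framing on write and header-sniffing on read. The `EntryCodec` class name and the exact detection rule are illustrative assumptions, not the proof-of-concept's implementation:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Sketch of transparent per-entry compression inside the existing
 * seqnum+length+bytes entry layout. Assumptions (not the actual PQ code):
 * the writer wraps the serialized CBOR bytes in a zlib stream, and the reader
 * tells compressed and legacy entries apart by the zlib CMF/FLG header
 * (RFC 1950) alone, without any extra flag stored in the page.
 */
final class EntryCodec {

    static byte[] compress(byte[] cborBytes) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(cborBytes);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(cborBytes.length);
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // A zlib header's first byte carries deflate (8) in its low nibble, and the
    // two-byte header read big-endian is a multiple of 31 (RFC 1950 §2.2).
    static boolean looksLikeZlib(byte[] bytes) {
        if (bytes.length < 2) return false;
        int cmf = bytes[0] & 0xFF;
        int flg = bytes[1] & 0xFF;
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    static byte[] maybeDecompress(byte[] entryBytes) throws DataFormatException {
        if (!looksLikeZlib(entryBytes)) {
            return entryBytes; // legacy uncompressed CBOR entry
        }
        Inflater inflater = new Inflater();
        inflater.setInput(entryBytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream(entryBytes.length * 4);
        byte[] buf = new byte[8192];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }
}
```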

Because not all users will want to trade CPU for increased throughput, and because of the rollback barrier discussed below, this feature should first be delivered as opt-in, preferably at the per-pipeline level.

Compatibility Considerations

Once a queue contains compressed events, it cannot be read by a Logstash instance that does not support event decompression. This creates an undesired rollback barrier: a user who hits an unrelated issue would be unable to roll back to their last known-working configuration.

Queue compression should be implemented as opt-in until at least three minor versions have shipped with decompression support.

Design Requirements

  1. compression is opt-in for at least 2 minor releases
  2. compressed events are read from the queue unless explicitly configured otherwise
  3. include metrics in the pipeline.${pipeline_id}.queue.compression namespace:

    | name                  | definition                             | expected value range |
    |-----------------------|----------------------------------------|----------------------|
    | encode.spend.lifetime | encode_time / uptime                   | [0, N_CPUS]          |
    | encode.ratio.lifetime | compressed_bytes / decompressed_bytes  | [0, 1]               |
    | decode.spend.lifetime | decode_time / uptime                   | [0, N_CPUS]          |
    | decode.ratio.lifetime | decompressed_bytes / compressed_bytes  | [1, ∞)               |
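
For illustration, a rough sketch of how these lifetime metrics could be accumulated. The `CompressionMetrics` class and its accumulator fields are hypothetical and not the Logstash metrics API; the point is that "spend" normalizes codec time against process uptime (so values above 1.0 indicate more than one core's worth of work across worker threads):

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical accumulator for the proposed queue.compression metrics. */
final class CompressionMetrics {
    private final LongAdder encodeNanos = new LongAdder();
    private final LongAdder decodeNanos = new LongAdder();
    private final LongAdder compressedBytesWritten = new LongAdder();   // bytes after compression
    private final LongAdder uncompressedBytesWritten = new LongAdder(); // bytes before compression
    private final LongAdder compressedBytesRead = new LongAdder();      // bytes before decompression
    private final LongAdder uncompressedBytesRead = new LongAdder();    // bytes after decompression
    private final long startNanos = System.nanoTime();

    void onEncode(long nanos, long rawLen, long compressedLen) {
        encodeNanos.add(nanos);
        uncompressedBytesWritten.add(rawLen);
        compressedBytesWritten.add(compressedLen);
    }

    void onDecode(long nanos, long compressedLen, long rawLen) {
        decodeNanos.add(nanos);
        compressedBytesRead.add(compressedLen);
        uncompressedBytesRead.add(rawLen);
    }

    // encode.spend.lifetime: fraction of uptime spent compressing, in [0, N_CPUS]
    double encodeSpendLifetime() {
        return (double) encodeNanos.sum() / (System.nanoTime() - startNanos);
    }

    // encode.ratio.lifetime: compressed_bytes / decompressed_bytes, in [0, 1]
    double encodeRatioLifetime() {
        long raw = uncompressedBytesWritten.sum();
        return raw == 0 ? 0.0 : (double) compressedBytesWritten.sum() / raw;
    }

    // decode.spend.lifetime: fraction of uptime spent decompressing, in [0, N_CPUS]
    double decodeSpendLifetime() {
        return (double) decodeNanos.sum() / (System.nanoTime() - startNanos);
    }

    // decode.ratio.lifetime: decompressed_bytes / compressed_bytes, in [1, ∞)
    double decodeRatioLifetime() {
        long compressed = compressedBytesRead.sum();
        return compressed == 0 ? 1.0 : (double) uncompressedBytesRead.sum() / compressed;
    }
}
```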
