Snappy compression #146

scrwtp · 2017-05-20T17:53:20Z

Adding Snappy compression using Snappy.NET.

scrwtp · 2017-05-20T17:56:14Z

build.fsx

@@ -146,7 +146,8 @@ Target "Build" (fun _ ->
 Target "RunTests" (fun _ ->
    !! testAssemblies
    |> NUnit3 (fun p ->
-        { p with TimeOut = TimeSpan.FromMinutes 20.})
+        { p with TimeOut = TimeSpan.FromMinutes 20.
+                 ProcessModel = NUnit3ProcessModel.SingleProcessModel })


Not sure if you want this change - but I couldn't get the tests to run otherwise on my machine.

scrwtp · 2017-05-22T12:47:39Z

Seems this will be a bit more tricky than that. Kafka uses non-standard framing (the snappy-java stream format, with the prefix "\x82SNAPPY\x0"), so we'd either need Snappy.NET to support that or somehow work around it on the kafunk side.

Currently Snappy.NET uses a hardcoded value for that ("sNaPpY").

eulerfx · 2017-05-23T12:08:11Z

@scrwtp do we need framing for this?

scrwtp · 2017-05-24T12:00:54Z

@eulerfx I've just updated the PR with new changes.

I've switched from SnappyStream to plain Compress/Decompress calls, and we also now put the same headers on the message as snappy-java does.

Checked compatibility with reference kafka-console-producer/kafka-console-consumer, looks good both ways.
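For readers unfamiliar with the snappy-java framing mentioned above, here is a minimal sketch of it, not the code from this PR. It assumes the layout as I understand it from snappy-java: an 8-byte magic prefix, two big-endian Int32 fields (version and minimum compatible version, both 1), then each compressed block preceded by its big-endian Int32 length. The names magic, writeInt32BE and frame are hypothetical, and the payload is treated as opaque, already-compressed bytes.

```fsharp
open System

// Magic bytes that open a snappy-java stream: 0x82, "SNAPPY", then a NUL byte.
let magic : byte[] =
  [| 0x82uy; byte 'S'; byte 'N'; byte 'A'; byte 'P'; byte 'P'; byte 'Y'; 0uy |]

// Write a 32-bit integer in big-endian order at the given offset.
let writeInt32BE (buf: byte[]) (offset: int) (x: int) =
  buf.[offset]     <- byte (x >>> 24)
  buf.[offset + 1] <- byte (x >>> 16)
  buf.[offset + 2] <- byte (x >>> 8)
  buf.[offset + 3] <- byte x

// Frame one already-compressed block (hypothetical helper):
// 16-byte header, 4-byte block length, then the block itself.
let frame (compressed: byte[]) : byte[] =
  let headerSize = magic.Length + 4 + 4
  let output = Array.zeroCreate<byte> (headerSize + 4 + compressed.Length)
  Array.blit magic 0 output 0 magic.Length
  writeInt32BE output 8 1    // version (assumed to be 1)
  writeInt32BE output 12 1   // minimum compatible version (assumed to be 1)
  writeInt32BE output headerSize compressed.Length
  Array.blit compressed 0 output (headerSize + 4) compressed.Length
  output
```

A 3-byte block therefore becomes a 23-byte message: 16 bytes of header, 4 bytes of length, 3 bytes of payload.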

eulerfx · 2017-05-20T17:57:44Z

src/kafunk/AssemblyInfo.fs

-[<assembly: AssemblyVersionAttribute("0.0.40")>]
-[<assembly: AssemblyFileVersionAttribute("0.0.40")>]
+[<assembly: AssemblyVersionAttribute("0.1.0")>]
+[<assembly: AssemblyFileVersionAttribute("0.1.0")>]


I think your fork might have been an older version?

I think it just wasn't bumped, the fork was created this weekend?

eulerfx · 2017-05-24T12:34:19Z

src/kafunk/Compression.fs

+      let compressed = SnappyCodec.Compress(inputBytes)
+      let output = CompressedMessage.pack compressed
+      createMessage (Binary.ofArray output) CompressionCodec.Snappy
+    with :? IOException as _ex ->


I'd say let's just drop this with handler here and above, since we're not adding any context.

I'm fine dropping it. Don't you want logging there though?

The exception would be caught at a higher level in the code anyway. Adding some context here might be useful, but not necessary IMO.

eulerfx · 2017-05-24T12:43:28Z

src/kafunk/Compression.fs

+    MessageSet.Write (messageVer,ms,BinaryZipper(buf))
+    try
+      let compressed = SnappyCodec.Compress(inputBytes)
+      let output = CompressedMessage.pack compressed


I think it should be possible to avoid copying the array. Looking at SnappyCodec.Compress, it uses GetMaxCompressedLength to estimate the size of the array. What if you use that to allocate the array with the header bytes added to it, then use the other overload of Compress to write into that array?

I'll take a stab at it.
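A rough sketch of the suggestion, under stated assumptions. compressInto is a HYPOTHETICAL stand-in for the codec's "compress into an existing array" overload (it only copies bytes so the sketch runs without the native Snappy library), and reserved is an assumed prefix size. The bound 32 + n + n/6 is the worst-case size the Snappy format defines.

```fsharp
open System

// Worst-case compressed size for n input bytes in the Snappy format.
let maxCompressedLength (n: int) = 32 + n + n / 6

// HYPOTHETICAL stand-in for a "compress into this array at this offset" call.
// It copies the input unchanged and returns the number of bytes written.
let compressInto (input: byte[]) (output: byte[]) (outOffset: int) : int =
  Array.blit input 0 output outOffset input.Length
  input.Length

// Assumed space for the framing prefix (16-byte header plus 4-byte block length).
let reserved = 20

let compressWithHeader (input: byte[]) : ArraySegment<byte> =
  // One allocation, sized for the prefix plus the worst-case payload.
  let output = Array.zeroCreate<byte> (reserved + maxCompressedLength input.Length)
  let written = compressInto input output reserved
  // Return a view over the bytes actually used instead of copying them out.
  ArraySegment<byte>(output, 0, reserved + written)
```

The header bytes can then be written into the first reserved bytes of the same array, so the compressed payload is never copied a second time.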

eulerfx · 2017-05-24T12:44:11Z

src/kafunk/Compression.fs

+
+      // write content
+      // NOTE: this will also write content length in the first 4 bytes, this is expected.
+      bz.WriteBytes(Binary.ofArray bytes)


May be worth using an internally copied version of this. BinaryZipper was designed specifically for the Kafka protocol and it's possible it will cause an issue down the road if it's refactored.

I can do a WriteInt32 followed by WriteBlock, but I'd think one of the tests added would fail if WriteBytes changed. (Compression.Snappy reads messages that are compatible with reference implementation)

I meant to create a function called writeBytes local to the snappy compressor, that would simply do the same thing as WriteBytes. The reason is that WriteBytes might change for another reason at some point, driven by a need from the Kafka protocol, and it can cause an issue here down the road.

Done, no more dependency on BinaryZipper.


eulerfx · 2017-05-24T18:36:37Z

src/kafunk/Compression.fs

+    let output = CompressedMessage.compress inputBytes 
+
+    createMessage (Binary.ofArray output) CompressionCodec.Snappy
+
  let decompress (messageVer:ApiVersion) (m:Message) =
    let inputBytes = m.value |> Binary.toArray


Can this copy be avoided by passing m.value directly into the binary zipper?

Done - the same fix can be applied in the gzip version.

eulerfx · 2017-05-24T18:43:50Z

src/kafunk/Compression.fs

+      let arr = ArraySegment<byte>(buf.Array, buf.Offset, length)
+      arr, Binary.shiftOffset length buf
+
+    let truncateIfSmaller actualLength maxLength (array: byte []) = 


Rather than copying an array, you can truncate by adjusting the offset on an ArraySegment (Binary module has some helper for this).

I followed what Snappy.NET did, but I guess it can be avoided. Made both compress and decompress here be Binary.Segment -> Binary.Segment.

Yeah looks good. Also here:

    let buf = Array.zeroCreate uncompressedLength
    let actualLength = SnappyCodec.Uncompress(content.Array, content.Offset, content.Count, buf, 0)
    Binary.truncateIfSmaller actualLength uncompressedLength buf

The call to Binary.truncateIfSmaller should operate on ArraySegment<byte> rather than array directly. This way, when it actually has to truncate, it doesn't call Binary.toArray which creates a new array, but simply adjusts the offsets on the ArraySegment.

It doesn't call Binary.toArray anymore - it does take an array, but it will return a Segment of appropriate length.

Ah sorry I misread ofArray
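To make the point of this exchange concrete, a minimal sketch of truncation by view rather than by copy. This is an illustration of the idea, not the helper that was merged; only the name truncateIfSmaller and its argument order come from the diff above, the body is assumed.

```fsharp
open System

// Return a segment covering only the bytes that were actually written.
// The backing array is shared, so nothing is copied.
let truncateIfSmaller (actualLength: int) (maxLength: int) (arr: byte[]) : ArraySegment<byte> =
  if actualLength < maxLength then ArraySegment<byte>(arr, 0, actualLength)
  else ArraySegment<byte>(arr, 0, maxLength)
```

The caller still gets a right-sized result, but the only allocation is the buffer that was decompressed into.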

eulerfx · 2017-05-25T12:30:58Z

Looks good. I've tried this out and it appears to be working, and performs well. I'm going to merge, and the only thing I'll do is I'll "unroll" the Write/Read methods on the SnappyBinaryZipper (much like BinaryZipper is unrolled) to avoid FSharpFunc allocations and virtual calls - this is a critical enough area where we should get low-level.

scrwtp · 2017-05-25T13:30:42Z

I like the way they read, how about making them inline?

eulerfx · 2017-05-25T13:37:12Z

In critical paths you often forego readability/reusability for the sake of performance. Making them inline won't help in this case. The calls to the underlying operations on the Binary module, however, are already inlined, as you can see in the disassembly.

Btw, if you look at the IL for the former implementation of WriteInt32, it looks like this:

IL_0001: ldarg.1      // x
IL_0002: newobj       instance void Kafunk.CompressionModule/SnappyModule/WriteInt32@95::.ctor(int32)
IL_0007: stloc.0      // V_0
IL_0008: ldarg.0      // this
IL_0009: ldloc.0      // V_0
IL_000a: ldarg.0      // this
IL_000b: ldfld        valuetype [mscorlib]System.ArraySegment`1<unsigned int8> Kafunk.CompressionModule/SnappyModule/SnappyBinaryZipper::buffer
IL_0010: callvirt     instance !1/*valuetype [mscorlib]System.ArraySegment`1<unsigned int8>*/ class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<valuetype [mscorlib]System.ArraySegment`1<unsigned int8>, valuetype [mscorlib]System.ArraySegment`1<unsigned int8>>::Invoke(!0/*valuetype [mscorlib]System.ArraySegment`1<unsigned int8>*/)
IL_0015: stfld        valuetype [mscorlib]System.ArraySegment`1<unsigned int8> Kafunk.CompressionModule/SnappyModule/SnappyBinaryZipper::buffer
IL_001a: ret

The newobj opcode creates an instance of an FSharpFunc which represents the actual writing of the Int32 - this contains the core of the logic. Then the instantiated object's fields are initialized. After that, callvirt invokes the object. Unrolling this into direct calls avoids the object allocation, the movement of values to and from the heap, and the virtual call. The only instantiations that remain are stack allocations of ArraySegment.
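The difference can be sketched in F# as follows. Both types are HYPOTHETICAL shapes written for illustration (the original SnappyBinaryZipper source is not shown in this thread), and writeInt32 is a stand-in for the Binary module helper. Passing a partially applied function typically compiles to a closure object plus a virtual Invoke, which is what the IL above shows, while the unrolled member calls the helper directly.

```fsharp
open System

// Stand-in helper: write a big-endian Int32 and return the segment
// advanced past the four bytes that were written.
let writeInt32 (x: int) (buf: ArraySegment<byte>) : ArraySegment<byte> =
  let a, o = buf.Array, buf.Offset
  a.[o]     <- byte (x >>> 24)
  a.[o + 1] <- byte (x >>> 16)
  a.[o + 2] <- byte (x >>> 8)
  a.[o + 3] <- byte x
  ArraySegment<byte>(a, o + 4, buf.Count - 4)

// Readable version: every write goes through a generic helper taking a function.
type ZipperWithClosure(buf: ArraySegment<byte>) =
  let mutable buffer = buf
  member __.Buffer = buffer
  member __.Apply (f: ArraySegment<byte> -> ArraySegment<byte>) = buffer <- f buffer
  // `writeInt32 x` is a partial application, so a function object is built per call.
  member this.WriteInt32 (x: int) = this.Apply (writeInt32 x)

// Unrolled version: the helper is called directly, with no function value.
type ZipperUnrolled(buf: ArraySegment<byte>) =
  let mutable buffer = buf
  member __.Buffer = buffer
  member __.WriteInt32 (x: int) = buffer <- writeInt32 x buffer
```

Both versions write the same bytes; the trade-off is purely readability against per-call allocation on a hot path.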

Tomasz added 2 commits May 20, 2017 18:36

- Run tests in process

479c603

- Adding snappy compression using Snappy.NET

193a6e7

scrwtp mentioned this pull request May 20, 2017

Snappy compression #125

Closed

scrwtp commented May 20, 2017


scrwtp added 2 commits May 24, 2017 12:49

- Another attempt at making snappy work

9917669

- cleaning up unused opens

63bfdb2

eulerfx reviewed May 24, 2017


scrwtp added 2 commits May 24, 2017 19:16

- Applying code review feedback

bc6da0d

- Merge branch 'master' of https://github.com/scrwtp/kafunk

33fec6b

# Conflicts: # src/kafunk/Compression.fs

eulerfx reviewed May 24, 2017


- Dropped some array allocations

a4168e7

eulerfx merged commit 308f3a6 into jet:master May 25, 2017
