[SPARK-20807][SQL] Add compression/decompression of column data to ColumnVector #18033
What changes were proposed in this pull request?
This PR adds compression/decompression of column data to `ColumnVector`. The current `CachedBatch` can compress column data using multiple compression schemes, but `ColumnVector` cannot. Compression is required for the table cache.

As a first step, this PR enables the following schemes. Support for additional compression schemes will be added in a separate JIRA.
- `RunLengthEncoding` for boolean/byte/short/int/long
- `BooleanBitSet` for boolean

At a high level, calling `ColumnVector.compress()` compresses data from the array of the primitive data type into a byte array in `ColumnVector`. Calling `ColumnVector.decompress()` decompresses data from that byte array back into the array of the primitive data type in `ColumnVector`. `ArrayBuffer` is used to access data during compression and decompression.

This PR adds or changes the following APIs:
- `ArrayBuffer`: a new class similar to `java.nio.ByteBuffer`. `ArrayBuffer` can wrap an array of any primitive data type, such as `Array[Int]` or `Array[Long]`, and it tracks the current position to be accessed.
- `ColumnType.get(buffer: ArrayBuffer): jvmType`, `ColumnType.put(buffer: ArrayBuffer)`: get a value from, or put a value into, the current position of an `ArrayBuffer`.
- `Encoder.gatherCompressibilityStats(in: ArrayBuffer)`: gathers compressibility statistics from the data in `in`.
- `Encoder.compress(from: ArrayBuffer, to: ArrayBuffer): Unit`: compresses the data in `from` and stores the compressed data in `to`. `to` must wrap a byte array large enough to hold the compressed data.
- `Decoder.decompress(values: ArrayBuffer): Unit`: decompresses the data passed to the `Decoder` through its constructor and stores the uncompressed data in `values`. `values` must wrap an array large enough to hold the uncompressed data.

How was this patch tested?
Added new test suites
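To make the `compress()`/`decompress()` round trip and the position-tracking role of `ArrayBuffer` described above more concrete, here is a minimal standalone sketch of run-length encoding for an int column. It is not the code in this PR: `IntArrayBuffer`, `IntRunLength`, and the (value, run length) byte layout are hypothetical. `java.nio.ByteBuffer` stands in for the byte-side buffer, and the decoder takes its input as a parameter rather than through a constructor.

```scala
import java.nio.ByteBuffer

// Hypothetical stand-in for the PR's ArrayBuffer: wraps an Array[Int]
// and tracks the current position, similar in spirit to java.nio.ByteBuffer.
final class IntArrayBuffer(val array: Array[Int]) {
  private var pos = 0
  def hasRemaining: Boolean = pos < array.length
  def get(): Int = { val v = array(pos); pos += 1; v }
  def put(v: Int): Unit = { array(pos) = v; pos += 1 }
  def rewind(): Unit = { pos = 0 }
}

// Hypothetical run-length codec for an int column: each run is stored
// as (value: Int, runLength: Int), i.e. 8 bytes per run.
object IntRunLength {
  // Compressibility statistic: bytes needed for the encoded output.
  // Scans the input once, then rewinds it so it can be compressed next.
  def compressedSize(in: IntArrayBuffer): Int = {
    var runs = 0
    var prev = 0
    while (in.hasRemaining) {
      val v = in.get()
      if (runs == 0 || v != prev) { runs += 1; prev = v }
    }
    in.rewind()
    runs * 8
  }

  // Reads values from `from` and writes (value, runLength) pairs to `to`.
  def compress(from: IntArrayBuffer, to: ByteBuffer): Unit = {
    if (from.hasRemaining) {
      var current = from.get()
      var run = 1
      while (from.hasRemaining) {
        val v = from.get()
        if (v == current) run += 1
        else { to.putInt(current).putInt(run); current = v; run = 1 }
      }
      to.putInt(current).putInt(run) // flush the last run
    }
  }

  // Expands each (value, runLength) pair from `from` into `values`.
  def decompress(from: ByteBuffer, values: IntArrayBuffer): Unit = {
    while (from.hasRemaining) {
      val v = from.getInt()
      val run = from.getInt()
      var i = 0
      while (i < run) { values.put(v); i += 1 }
    }
  }
}

// Usage: size the destination from the statistics, compress, then decompress.
val input = new IntArrayBuffer(Array(7, 7, 7, 1, 1, 9))
val bytes = ByteBuffer.allocate(IntRunLength.compressedSize(input)) // 3 runs -> 24 bytes
IntRunLength.compress(input, bytes)
bytes.flip() // switch the byte buffer from writing to reading
val output = new IntArrayBuffer(new Array[Int](6))
IntRunLength.decompress(bytes, output)
```

Sizing the destination from the gathered statistics before compressing mirrors the requirement above that the destination be large enough. If it is too small, `ByteBuffer.putInt` throws `BufferOverflowException`.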