More CRAM Encoding tests + updates #1203

jmthibault79 · 2018-10-25T16:04:56Z

Description

External*CodecTests
ByteArrayLenCodec test
ITF8 and LTF8 updates

Checklist

Code compiles correctly
New tests covering changes and new functionality
All tests passing
Extended the README / documentation, if necessary
Is not backward compatible (breaks binary or source compatibility)

jmthibault79 · 2018-10-25T16:06:26Z

src/main/java/htsjdk/samtools/cram/io/ITF8.java

@@ -41,6 +69,48 @@ public static int readUnsignedITF8(final InputStream inputStream) throws IOExcep
        return ((b1 & 15) << 28) | inputStream.read() << 20 | inputStream.read() << 12 | inputStream.read() << 4 | (15 & inputStream.read());
    }

+    /**


Moved up to group the read methods together

jmthibault79 · 2018-10-25T16:06:45Z

src/main/java/htsjdk/samtools/cram/io/ITF8.java

-     * @param value the value to be written out
-     * @return the bytes holding ITF8 representation of the value
-     */
-    public static byte[] writeUnsignedITF8(final int value) {


moved down to group the write methods together

jmthibault79 · 2018-10-25T16:07:12Z

src/main/java/htsjdk/samtools/cram/io/ITF8.java

    /**
     * Writes an unsigned (32 bit) integer to an {@link OutputStream} encoded as ITF8. The sign bit is interpreted as a value bit.
     *
     * @param value the value to be written out
     * @param buffer   the {@link ByteBuffer} to write to
     */
-    public static void writeUnsignedITF8(final int value, final ByteBuffer buffer) {
+    public static int writeUnsignedITF8(final int value, final ByteBuffer buffer) {


easy to return the length, so why not

jmthibault79 · 2018-10-25T16:08:01Z

src/main/java/htsjdk/samtools/cram/structure/EncodingID.java

@@ -46,7 +46,7 @@
     */
    BYTE_ARRAY_STOP,
    /**
-     * http://en.wikipedia.org/wiki/Beta_Code


A different unrelated Beta Code

Hmm..good to know....

codecov-io · 2018-10-25T16:22:23Z

Codecov Report

Merging #1203 into master will increase coverage by 0.018%.
The diff coverage is 93.878%.

@@              Coverage Diff               @@
##              master    #1203       +/-   ##
==============================================
+ Coverage     68.712%   68.73%   +0.018%     
- Complexity      8062     8068        +6     
==============================================
  Files            542      542               
  Lines          32728    32734        +6     
  Branches        5537     5537               
==============================================
+ Hits           22488    22498       +10     
+ Misses          8041     8039        -2     
+ Partials        2199     2197        -2

Impacted Files	Coverage Δ	Complexity Δ
...ava/htsjdk/samtools/cram/structure/EncodingID.java	`100% <ø> (ø)`	`1 <0> (ø)`	⬇️
...ain/java/htsjdk/samtools/cram/io/CramIntArray.java	`88.889% <ø> (ø)`	`4 <0> (ø)`	⬇️
...k/samtools/cram/encoding/ExternalIntegerCodec.java	`58.333% <0%> (+8.333%)`	`3 <0> (ø)`	⬇️
.../samtools/cram/build/CompressionHeaderFactory.java	`90.876% <100%> (ø)`	`64 <0> (ø)`	⬇️
...htsjdk/samtools/cram/encoding/EncodingFactory.java	`34.483% <100%> (ø)`	`10 <0> (ø)`	⬇️
...ng/huffman/codec/CanonicalHuffmanByteEncoding.java	`43.75% <50%> (ø)`	`5 <0> (?)`
...huffman/codec/CanonicalHuffmanIntegerEncoding.java	`85.714% <50%> (ø)`	`10 <1> (?)`
src/main/java/htsjdk/samtools/cram/io/ITF8.java	`94.505% <94.366%> (-2.046%)`	`23 <17> (+1)`
src/main/java/htsjdk/samtools/cram/io/LTF8.java	`96.032% <96.429%> (-1.509%)`	`19 <19> (+1)`
...va/htsjdk/samtools/sra/SRAIndexedSequenceFile.java	`62.162% <0%> (-2.703%)`	`6% <0%> (-1%)`
... and 5 more

cmnbroad

I'm still making my way through this and am not finished yet, just checkpointing my questions so far.

cmnbroad · 2018-10-25T20:43:44Z

src/main/java/htsjdk/samtools/cram/io/ITF8.java

+ *      write out [bits 13-20]
+ *      write out [bits 5-12]
+ *      write out [bits 1-8]
+ *


Thanks for the doc...

cmnbroad · 2018-10-25T20:46:19Z

src/main/java/htsjdk/samtools/cram/io/ITF8.java

 */
 public class ITF8 {

+    public static final int MAX_BYTES = 5;
+    public static final int MAX_BITS = 8 * MAX_BYTES;
+


Can these be package-protected (no access modifier) rather than public, since they're only used by tests ? Or are they useful outside of this package ?

They could be useful outside of this? But I have no specific use case, so I can make them pkg-priv

cmnbroad · 2018-10-25T21:04:48Z

src/main/java/htsjdk/samtools/cram/io/ITF8.java

+
+        return value;
+    }
+


The alternative factoring to share a single implementation across the different source types that I mentioned when we talked can be seen here. Since you can't use primitive types a generic type param in Java, it requires boxing (on the other hand, small integer values are interned), so I'd be inclined to not actually make this change now since it might have perf implications.

One other interesting issue that I was reminded of when I did this was how the cram code propagates checked exceptions all over the place. Most of the rest of htsdjk wraps checked exceptions in unchecked exceptions so they don't have to be declared and caught everywhere. I'd propose that we start migrating the cram code to the htsjdk pattern, and have these methods catch IOException and re-throw it wrapped in a RuntimeIOException. Then we can start removing all of the throws clauses and try/catch blocks as we move up the stack.

cmnbroad · 2018-10-26T12:54:27Z

src/main/java/htsjdk/samtools/cram/io/LTF8.java

 */
 public class LTF8 {
+
+    public static final int MAX_BYTES = 9;
+    public static final int MAX_BITS = 8 * MAX_BYTES;


Same comment about package protected here as above.

cmnbroad · 2018-10-26T12:54:39Z

src/main/java/htsjdk/samtools/cram/structure/EncodingID.java

@@ -46,7 +46,7 @@
     */
    BYTE_ARRAY_STOP,
    /**
-     * http://en.wikipedia.org/wiki/Beta_Code


Hmm..good to know....

cmnbroad · 2018-10-26T15:35:49Z

src/test/java/htsjdk/samtools/cram/encoding/ByteArrayLenCodecTest.java

+import java.util.HashMap;
+import java.util.Map;
+
+public class ByteArrayLenCodecTest extends HtsjdkTest {


This test class is called ByteArrayLenCodecTest, but ByteArrayLenCodec is a private class inside of ByteArrayLenEncoding. That makes me think this should be called `ByteArrayLenEncodingTest. But I'm not sure if the tests should change, or the actual classes ? Is this is part of the weirdness you were talking about yesterday ?

cmnbroad · 2018-10-29T14:53:05Z

src/test/java/htsjdk/samtools/cram/encoding/ByteArrayLenCodecTest.java

+            bale.fromByteArray(byteArrayOutputStream.toByteArray());
+            return bale;
+        }
+    }


@jmthibault79 It seems like this code is very similar to the actual code in ByteArrayLenEncoding (but its slightly different ?) I wonder if we shouldn't refactor ByteArrayLenEncoding and ByteArrayLenCodec so we can test them directly instead ? Or is that what you're planning to do in the next round ? It almost seems like we should wait for that if its the latter.

ITF8 and LTF8 updates

cmnbroad

Thanks for doing this - seeing these tests certainly helped me get some insight into the existing class modeling (!) Just a couple of questions and minor comments - the new tests look like a big improvement.

cmnbroad · 2018-10-30T15:32:48Z

src/main/java/htsjdk/samtools/cram/encoding/ExternalIntegerCodec.java

-        } catch (final IOException e) {
-            // this should never happened but still:
-            throw new RuntimeException(e);
-        }


Yay thanks.

cmnbroad · 2018-10-30T15:41:37Z

src/test/java/htsjdk/samtools/cram/encoding/ExternalByteArrayCodecTest.java

+    public void codecTest(final byte[] values) throws IOException {
+
+        try (final ByteArrayOutputStream os = new ByteArrayOutputStream()) {
+            final BitCodec<byte[]> writeCodec = new ExternalByteArrayCodec(os, null);


So for my own education, the External in ExternalByteArrayCodec indicates that its writes to external blocks (?), which implies that its byte-oriented as opposed to bit-oriented ? But the class implements BitCodec, so it has to take ByteArrayOutputStream and ByteArrayInputStream as constructor side-inputs since it doesn't consume the BitStreams defined by the interface methods ? Wow.

Yes, exactly.

Separating these is one of the goals of an upcoming refactor sweep.

cmnbroad · 2018-10-30T19:52:17Z

src/main/java/htsjdk/samtools/cram/io/CramIntArray.java

@@ -15,7 +15,7 @@
     * @return array of integers from the input stream
     * @throws IOException as per java IO contract
     */
-    public static int[] array(final InputStream inputStream) throws IOException {
+    public static int[] array(final InputStream inputStream) {


Javadoc @throws can be removed for these now.

good catch, thanks

cmnbroad · 2018-10-30T19:52:29Z

src/main/java/htsjdk/samtools/cram/io/CramIntArray.java

@@ -32,7 +32,7 @@
     * @return the number of bits written out
     * @throws IOException as per java IO contract
     */
-    public static int write(final int[] array, final OutputStream outputStream) throws IOException {
+    public static int write(final int[] array, final OutputStream outputStream) {


Same as above.

cmnbroad · 2018-10-30T20:10:13Z

src/main/java/htsjdk/samtools/cram/io/ITF8.java

+     * @return the bytes holding ITF8 representation of the value
+     */
+    public static byte[] writeUnsignedITF8(final int value) {
+        final ByteBuffer buffer = ByteBuffer.allocate(10);


I know this code was just moved from elsewhere, but is 10 meaningful here ? Although harmless, it seems like random over-allocation by half. Is there any way possible this can use more than the max possible # of bytes in an ITF8 encoding for an int, which is 5?. Can this use the new MAX_BYTES constant ?

That makes sense to me. I'll see how the tests respond.

cmnbroad · 2018-10-30T20:15:24Z

src/test/java/htsjdk/samtools/cram/io/IOTestCases.java

+            list.add((byte)(1 << i));
+            list.add((byte)(-(1 << i) + 2));
+            list.add((byte)(-(1 << i) + 1));
+            list.add((byte)-(1 << i));


cmnbroad

LGTM now. On to the refactoring!

lbergelson

Looks good to me. Thank you @jmthibault79 and @cmnbroad

jmthibault79 commented Oct 25, 2018

View reviewed changes

jmthibault79 requested review from cmnbroad and lbergelson October 25, 2018 16:13

cmnbroad reviewed Oct 26, 2018

View reviewed changes

cmnbroad reviewed Oct 29, 2018

View reviewed changes

jmthibault79 added 7 commits October 29, 2018 14:29

Remove incorrect Beta Code link

7bd259b

rename Huffman*Encoding to CanonicalHuffman*Encoding

8bbda87

External*CodecTests

d06de29

ITF8 and LTF8 updates

make ITF8/LTF8 consts pkg-priv

05a77d7

ITF8/LTF8: rewrap IOException as RuntimeIOException

50d42e1

RuntimeEOFException

5fbd2ce

final all the things

a058888

jmthibault79 force-pushed the jt_encoding_ut branch from 89651d2 to a058888 Compare October 29, 2018 19:19

cmnbroad requested changes Oct 30, 2018

View reviewed changes

jmthibault79 added 2 commits October 31, 2018 11:08

rm throws IOException

4cafed7

MAX_BYTES

2a1ecbf

cmnbroad approved these changes Oct 31, 2018

View reviewed changes

lbergelson self-assigned this Nov 5, 2018

lbergelson approved these changes Nov 6, 2018

View reviewed changes

lbergelson merged commit 47bda38 into master Nov 6, 2018

lbergelson deleted the jt_encoding_ut branch November 6, 2018 16:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More CRAM Encoding tests + updates #1203

More CRAM Encoding tests + updates #1203

jmthibault79 commented Oct 25, 2018 •

edited

Loading

jmthibault79 Oct 25, 2018

jmthibault79 Oct 25, 2018

jmthibault79 Oct 25, 2018

jmthibault79 Oct 25, 2018

cmnbroad Oct 26, 2018

codecov-io commented Oct 25, 2018 •

edited

Loading

cmnbroad left a comment

cmnbroad Oct 25, 2018

cmnbroad Oct 25, 2018

jmthibault79 Oct 29, 2018

cmnbroad Oct 25, 2018

cmnbroad Oct 26, 2018

cmnbroad Oct 26, 2018

cmnbroad Oct 26, 2018

cmnbroad Oct 29, 2018

cmnbroad left a comment

cmnbroad Oct 30, 2018

cmnbroad Oct 30, 2018

jmthibault79 Oct 31, 2018

cmnbroad Oct 30, 2018

jmthibault79 Oct 31, 2018

cmnbroad Oct 30, 2018

cmnbroad Oct 30, 2018

jmthibault79 Oct 31, 2018

cmnbroad Oct 30, 2018

cmnbroad left a comment

lbergelson left a comment

More CRAM Encoding tests + updates #1203

More CRAM Encoding tests + updates #1203

Conversation

jmthibault79 commented Oct 25, 2018 • edited Loading

Description

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Oct 25, 2018 • edited Loading

Codecov Report

cmnbroad left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cmnbroad left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cmnbroad left a comment

Choose a reason for hiding this comment

lbergelson left a comment

Choose a reason for hiding this comment

jmthibault79 commented Oct 25, 2018 •

edited

Loading

codecov-io commented Oct 25, 2018 •

edited

Loading