-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More CRAM Encoding tests + updates #1203
Conversation
@@ -41,6 +69,48 @@ public static int readUnsignedITF8(final InputStream inputStream) throws IOExcep | |||
return ((b1 & 15) << 28) | inputStream.read() << 20 | inputStream.read() << 12 | inputStream.read() << 4 | (15 & inputStream.read()); | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved up to group the read methods together
* @param value the value to be written out | ||
* @return the bytes holding ITF8 representation of the value | ||
*/ | ||
public static byte[] writeUnsignedITF8(final int value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved down to group the write methods together
/** | ||
* Writes an unsigned (32 bit) integer to an {@link OutputStream} encoded as ITF8. The sign bit is interpreted as a value bit. | ||
* | ||
* @param value the value to be written out | ||
* @param buffer the {@link ByteBuffer} to write to | ||
*/ | ||
public static void writeUnsignedITF8(final int value, final ByteBuffer buffer) { | ||
public static int writeUnsignedITF8(final int value, final ByteBuffer buffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easy to return the length, so why not
@@ -46,7 +46,7 @@ | |||
*/ | |||
BYTE_ARRAY_STOP, | |||
/** | |||
* http://en.wikipedia.org/wiki/Beta_Code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A different unrelated Beta Code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm..good to know....
Codecov Report
@@ Coverage Diff @@
## master #1203 +/- ##
==============================================
+ Coverage 68.712% 68.73% +0.018%
- Complexity 8062 8068 +6
==============================================
Files 542 542
Lines 32728 32734 +6
Branches 5537 5537
==============================================
+ Hits 22488 22498 +10
+ Misses 8041 8039 -2
+ Partials 2199 2197 -2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still making my way through this and am not finished yet, just checkpointing my questions so far.
* write out [bits 13-20] | ||
* write out [bits 5-12] | ||
* write out [bits 1-8] | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc...
*/ | ||
public class ITF8 { | ||
|
||
public static final int MAX_BYTES = 5; | ||
public static final int MAX_BITS = 8 * MAX_BYTES; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be package-protected (no access modifier) rather than public, since they're only used by tests ? Or are they useful outside of this package ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They could be useful outside of this? But I have no specific use case, so I can make them pkg-priv
|
||
return value; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative factoring to share a single implementation across the different source types that I mentioned when we talked can be seen here. Since you can't use primitive types a generic type param in Java, it requires boxing (on the other hand, small integer values are interned), so I'd be inclined to not actually make this change now since it might have perf implications.
One other interesting issue that I was reminded of when I did this was how the cram code propagates checked exceptions all over the place. Most of the rest of htsdjk wraps checked exceptions in unchecked exceptions so they don't have to be declared and caught everywhere. I'd propose that we start migrating the cram code to the htsjdk pattern, and have these methods catch IOException and re-throw it wrapped in a RuntimeIOException
. Then we can start removing all of the throws clauses and try/catch blocks as we move up the stack.
*/ | ||
public class LTF8 { | ||
|
||
public static final int MAX_BYTES = 9; | ||
public static final int MAX_BITS = 8 * MAX_BYTES; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about package protected here as above.
@@ -46,7 +46,7 @@ | |||
*/ | |||
BYTE_ARRAY_STOP, | |||
/** | |||
* http://en.wikipedia.org/wiki/Beta_Code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm..good to know....
import java.util.HashMap; | ||
import java.util.Map; | ||
|
||
public class ByteArrayLenCodecTest extends HtsjdkTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test class is called ByteArrayLenCodecTest
, but ByteArrayLenCodec
is a private class inside of ByteArrayLenEncoding
. That makes me think this should be called `ByteArrayLenEncodingTest. But I'm not sure if the tests should change, or the actual classes ? Is this is part of the weirdness you were talking about yesterday ?
bale.fromByteArray(byteArrayOutputStream.toByteArray()); | ||
return bale; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmthibault79 It seems like this code is very similar to the actual code in ByteArrayLenEncoding
(but its slightly different ?) I wonder if we shouldn't refactor ByteArrayLenEncoding
and ByteArrayLenCodec
so we can test them directly instead ? Or is that what you're planning to do in the next round ? It almost seems like we should wait for that if its the latter.
ITF8 and LTF8 updates
89651d2
to
a058888
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this - seeing these tests certainly helped me get some insight into the existing class modeling (!) Just a couple of questions and minor comments - the new tests look like a big improvement.
} catch (final IOException e) { | ||
// this should never happened but still: | ||
throw new RuntimeException(e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay thanks.
public void codecTest(final byte[] values) throws IOException { | ||
|
||
try (final ByteArrayOutputStream os = new ByteArrayOutputStream()) { | ||
final BitCodec<byte[]> writeCodec = new ExternalByteArrayCodec(os, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for my own education, the External
in ExternalByteArrayCodec
indicates that its writes to external blocks (?), which implies that its byte-oriented as opposed to bit-oriented ? But the class implements BitCodec
, so it has to take ByteArrayOutputStream
and ByteArrayInputStream
as constructor side-inputs since it doesn't consume the BitStreams defined by the interface methods ? Wow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
Separating these is one of the goals of an upcoming refactor sweep.
@@ -15,7 +15,7 @@ | |||
* @return array of integers from the input stream | |||
* @throws IOException as per java IO contract | |||
*/ | |||
public static int[] array(final InputStream inputStream) throws IOException { | |||
public static int[] array(final InputStream inputStream) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadoc @throws
can be removed for these now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks
@@ -32,7 +32,7 @@ | |||
* @return the number of bits written out | |||
* @throws IOException as per java IO contract | |||
*/ | |||
public static int write(final int[] array, final OutputStream outputStream) throws IOException { | |||
public static int write(final int[] array, final OutputStream outputStream) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
* @return the bytes holding ITF8 representation of the value | ||
*/ | ||
public static byte[] writeUnsignedITF8(final int value) { | ||
final ByteBuffer buffer = ByteBuffer.allocate(10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this code was just moved from elsewhere, but is 10 meaningful here ? Although harmless, it seems like random over-allocation by half. Is there any way possible this can use more than the max possible # of bytes in an ITF8 encoding for an int, which is 5?. Can this use the new MAX_BYTES
constant ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. I'll see how the tests respond.
list.add((byte)(1 << i)); | ||
list.add((byte)(-(1 << i) + 2)); | ||
list.add((byte)(-(1 << i) + 1)); | ||
list.add((byte)-(1 << i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now. On to the refactoring!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thank you @jmthibault79 and @cmnbroad
Description
External*CodecTests
ByteArrayLenCodec test
ITF8 and LTF8 updates
Checklist