-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unit Tests and fixes for a few classes in samtools.cram.io #1198
Conversation
- add IOTestCases and CramIntTest - add 64 bit CramInt - byte[] and ByteBuffer tests and fixes - add CramLong and CramLongTest
@@ -40,7 +40,10 @@ public static int int32(final byte[] data) { | |||
* @return an integer value read from the buffer | |||
*/ | |||
public static int int32(final ByteBuffer buffer) { | |||
return buffer.get() | buffer.get() << 8 | buffer.get() << 16 | buffer.get() << 24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These shifts were being attempted in a byte
context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a test case that fails without this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does my 0xff
comment work for you on this point?
@@ -7,7 +7,7 @@ | |||
/** | |||
* Methods to read and write CRAM array of integers data type. | |||
*/ | |||
public class CramArray { | |||
public class CramIntArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed because this gave me the impression of a generic array type, though it's limited to integer. I can revert this if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me.
* Methods to read and write CRAM long values as given in the file format specification. | ||
*/ | ||
|
||
public class CramLong { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All methods correspond to those in CramInt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it looks like longs are a valid data type in the spec, but are unused by the implementation ? Its strange that we've never encountered a problem decoding CRAMs produced by other software, unless they don't use it either. It may be that longs are always written as LTF8 ? I think we should try to resolve that before we commit code and tests that aren't used anywhere. If we are missing functionality, we can keep this and then create a ticket that we need to hook it up. Do you know if/how htslib uses these ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through cram_io.h
in htslib, they only seem to use ITF8, LFT8, and int32.
CramInt in htsjdk is only used for Block checksums and Container Header sizes and checksums. Most other such values are ITF8/LTF8.
So you're right that this does seem unused. I can pull this change out of this PR into a different branch we can bring back if we find a later use for it.
@@ -1,29 +0,0 @@ | |||
package htsjdk.cram.io; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved from htsjdk.cram.io
to htsjdk.samtools.cram.io
like the class it's testing
ByteBuffer bb = ByteBuffer.wrap(bytes); | ||
|
||
// a manually-advanced byte buffer | ||
final int manualBufferSize = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To exercise the read from byte[]
method
return params; | ||
} | ||
|
||
static List<Integer> int32Tests() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved from ITF8Test
for reuse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice consolidation!
}; | ||
} | ||
|
||
static List<Long> int64Tests() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved from LTF8Test
for reuse
} | ||
|
||
// special cases: | ||
list.add(Long.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two were not in the original test cases
Codecov Report
@@ Coverage Diff @@
## master #1198 +/- ##
===============================================
+ Coverage 68.414% 68.472% +0.058%
- Complexity 8019 8025 +6
===============================================
Files 542 542
Lines 32701 32704 +3
Branches 5530 5530
===============================================
+ Hits 22372 22393 +21
+ Misses 8125 8109 -16
+ Partials 2204 2202 -2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay - more tests. A few questions/comments.
@@ -7,7 +7,7 @@ | |||
/** | |||
* Methods to read and write CRAM array of integers data type. | |||
*/ | |||
public class CramArray { | |||
public class CramIntArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me.
* Methods to read and write CRAM long values as given in the file format specification. | ||
*/ | ||
|
||
public class CramLong { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it looks like longs are a valid data type in the spec, but are unused by the implementation ? Its strange that we've never encountered a problem decoding CRAMs produced by other software, unless they don't use it either. It may be that longs are always written as LTF8 ? I think we should try to resolve that before we commit code and tests that aren't used anywhere. If we are missing functionality, we can keep this and then create a ticket that we need to hook it up. Do you know if/how htslib uses these ?
public void runTest(List<Integer> ints) throws IOException { | ||
|
||
int[] inputArray = ints.stream().mapToInt(Integer::intValue).toArray(); | ||
ByteArrayOutputStream baos = new ByteArrayOutputStream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use try-with-resources for these to ensure they're closed even if an exception is thrown.
ByteArrayOutputStream baos = new ByteArrayOutputStream(); | ||
CramIntArray.write(inputArray, baos); | ||
|
||
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try-with-resources
return (0xFF & (int) buffer.get()) | | ||
(0xFF & (int) buffer.get()) << 8 | | ||
(0xFF & (int) buffer.get()) << 16 | | ||
(0xFF & (int) buffer.get()) << 24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're in this class I'd be in favor of renaming the int32
overloads in to readInt32
so they're symmetric with the writeInt32
ones. WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that
* @return a long value read | ||
* @throws IOException as per java IO contract | ||
*/ | ||
public static long int64(final InputStream inputStream) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same naming comment here if we keep these.
} | ||
|
||
baos.close(); | ||
bais.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add something to verify that these methods are really respecting little endianness.
I'd prefer to see this factored so there is one test method per method (or one test method per read/write overload/pair), rather than mixing them all up, with common code factored out as appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
return params; | ||
} | ||
|
||
static List<Integer> int32Tests() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice consolidation!
- split CramInt tests into multiple - try-with-resources - rename to readInt32
Comments addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lbergelson more tests @jmthibault79 LGTM - thx!
return buffer.get() | buffer.get() << 8 | buffer.get() << 16 | buffer.get() << 24; | ||
public static int readInt32(final ByteBuffer buffer) { | ||
return (0xFF & (int) buffer.get()) | | ||
(0xFF & (int) buffer.get()) << 8 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Casting to int
should be unnecessary here, as byte is promoted to int before the bit shift operation is applied.
Also am not sure why 0xff &
is necessary as buffer.get()
returns byte
. I must be missing something here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove the int casts - good catch.
0xFF &
is still necessary because int-promotion extends the negative sign into higher bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I now see why this is necessary, I forgot that byte isn't an unsigned type. Why, Java? Why?
} | ||
|
||
private static <T> Object[][] asDataProvider(List<T> list) { | ||
Object[][] params = new Object[list.size()][] ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space
Object[][] params = new Object[list.size()][] ; | |
Object[][] params = new Object[list.size()][]; |
|
||
private static <T> Object[][] asDataProvider(List<T> list) { | ||
Object[][] params = new Object[list.size()][] ; | ||
for (int i=0; i<params.length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing space
for (int i=0; i<params.length; i++) | |
for (int i = 0; i < params.length; i++) |
|
||
public class ExternalCompressionTest extends HtsjdkTest { | ||
public static final File BZIP2_FILE = new File("src/test/resources/htsjdk/samtools/cram/io/bzip2-test.bz2"); | ||
public static final byte [] TEST_BYTES = "This is a simple string to test compression".getBytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
odd spacing, I would expect to not have a space before []
public static final byte [] TEST_BYTES = "This is a simple string to test compression".getBytes(); | |
public static final byte[] TEST_BYTES = "This is a simple string to test compression".getBytes(); |
@@ -40,7 +40,10 @@ public static int int32(final byte[] data) { | |||
* @return an integer value read from the buffer | |||
*/ | |||
public static int int32(final ByteBuffer buffer) { | |||
return buffer.get() | buffer.get() << 8 | buffer.get() << 16 | buffer.get() << 24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a test case that fails without this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Description
Increasing test coverage for the samtools.cram.io package, and fixes to bugs found by these tests
Checklist