Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update CramCompressionRecord.isPlaced() to make it APDelta-aware #1284

Merged
merged 20 commits into from
Feb 21, 2019
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/main/java/htsjdk/samtools/CRAMContainerStreamWriter.java
Original file line number Diff line number Diff line change
Expand Up @@ -379,8 +379,8 @@ protected void flushContainer() throws IllegalArgumentException, IllegalAccessEx
while (last.next != null) last = last.next;

if (cramRecord.isFirstSegment() && last.isLastSegment()) {

final int templateLength = CramNormalizer.computeInsertSize(cramRecord, last);
final boolean coordinateSorted = (samFileHeader.getSortOrder() == SAMFileHeader.SortOrder.unsorted);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems inverted - it will be false for coordinate sorted. Since no tests failed, we should add one somewhere that verifies this is working correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, good catch.

Unfortunately, I believe that it would take me a substantial amount of time to make sense of CramNormalizer and write a reasonable test for it, so I don't think that's feasible with the quick turnaround time we want for this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any test in this PR that would have failed before without the fix for APDelta, but passes now so at least we can be sure that we've fixed it ? I assume we didn't have one before this bug was introduced, since no test caught the issue ? Without that, I'm not even sure I'd recognize the failure if I saw it. Ideally we'd create a ticket describing it (that will be fixed by this PR) so we have a record in case anyone sees it using the interim version that had the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The pair of SliceTests cases starting with the comment on line 97 calls this out explicitly.

On a lower level, the combinations in CramCompressionRecordTest.placedTests show that the APDelta flag does affect results, and some would fail without this fix.

final int templateLength = CramNormalizer.computeInsertSize(cramRecord, last, coordinateSorted);

if (cramRecord.templateSize == templateLength) {
last = cramRecord.next;
Expand Down
15 changes: 9 additions & 6 deletions src/main/java/htsjdk/samtools/cram/build/CramNormalizer.java
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ public void normalize(final ArrayList<CramCompressionRecord> records,
for (final CramCompressionRecord record : records) {
if (record.previous != null) continue;
if (record.next == null) continue;
restoreMateInfo(record);
restoreMateInfo(record, header.getSortOrder() == SAMFileHeader.SortOrder.coordinate);
}
}

Expand Down Expand Up @@ -137,7 +137,8 @@ public void normalize(final ArrayList<CramCompressionRecord> records,
restoreQualityScores(defaultQualityScore, records);
}

private static void restoreMateInfo(final CramCompressionRecord record) {
private static void restoreMateInfo(final CramCompressionRecord record,
final boolean coordinateSorted) {
if (record.next == null) {

return;
Expand All @@ -155,7 +156,7 @@ private static void restoreMateInfo(final CramCompressionRecord record) {
// record.setFirstSegment(true);
// last.setLastSegment(true);

final int templateLength = computeInsertSize(record, last);
final int templateLength = computeInsertSize(record, last, coordinateSorted);
record.templateSize = templateLength;
last.templateSize = -templateLength;
}
Expand Down Expand Up @@ -326,19 +327,21 @@ private static byte getByteOrDefault(final byte[] array, final int pos,
*
* @param firstEnd first mate of the pair
* @param secondEnd second mate of the pair
* @param coordinateSorted do these records have APDelta set?
* @return template length
*/
public static int computeInsertSize(final CramCompressionRecord firstEnd,
final CramCompressionRecord secondEnd) {
final CramCompressionRecord secondEnd,
final boolean coordinateSorted) {
if (firstEnd.isSegmentUnmapped() || secondEnd.isSegmentUnmapped()) {
return 0;
}
if (firstEnd.sequenceId != secondEnd.sequenceId) {
return 0;
}

final int firstEnd5PrimePosition = firstEnd.isNegativeStrand() ? firstEnd.getAlignmentEnd() : firstEnd.alignmentStart;
final int secondEnd5PrimePosition = secondEnd.isNegativeStrand() ? secondEnd.getAlignmentEnd() : secondEnd.alignmentStart;
final int firstEnd5PrimePosition = firstEnd.isNegativeStrand() ? firstEnd.getAlignmentEnd(coordinateSorted) : firstEnd.alignmentStart;
final int secondEnd5PrimePosition = secondEnd.isNegativeStrand() ? secondEnd.getAlignmentEnd(coordinateSorted) : secondEnd.alignmentStart;

final int adjustment = (secondEnd5PrimePosition >= firstEnd5PrimePosition) ? +1 : -1;
return secondEnd5PrimePosition - firstEnd5PrimePosition + adjustment;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -152,13 +152,22 @@ public String toString() {
return stringBuilder.toString();
}

public int getAlignmentSpan() {
if (alignmentSpan < 0) calculateAlignmentBoundaries();
public int getAlignmentSpan(final boolean APDelta) {
if (alignmentSpan < 0) {
calculateAlignmentBoundaries(APDelta);
}
return alignmentSpan;
}

void calculateAlignmentBoundaries() {
if (isSegmentUnmapped()) {
public int getAlignmentEnd(final boolean APDelta) {
if (alignmentEnd < 0) {
jmthibault79 marked this conversation as resolved.
Show resolved Hide resolved
calculateAlignmentBoundaries(APDelta);
}
return alignmentEnd;
}

void calculateAlignmentBoundaries(final boolean APDelta) {
if (! isPlaced(APDelta)) {
alignmentSpan = 0;
alignmentEnd = SAMRecord.NO_ALIGNMENT_START;
} else if (readFeatures == null || readFeatures.isEmpty()) {
Expand Down Expand Up @@ -189,11 +198,6 @@ void calculateAlignmentBoundaries() {
}
}

public int getAlignmentEnd() {
if (alignmentEnd < 0) calculateAlignmentBoundaries();
return alignmentEnd;
}

public boolean isMultiFragment() {
return (flags & MULTI_FRAGMENT_FLAG) != 0;
}
Expand All @@ -207,7 +211,7 @@ public void setMultiFragment(final boolean multiFragment) {
* Unmapped records may be stored in the same {@link Slice}s and {@link Container}s as mapped
* records if they are placed.
*
* @see #isPlaced()
* @see #isPlaced(boolean)
* @return true if the record is unmapped
*/
public boolean isSegmentUnmapped() {
Expand All @@ -220,15 +224,20 @@ public void setSegmentUnmapped(final boolean segmentUnmapped) {

/**
* Does this record have a valid placement/alignment location? This is independent of mapping status.
* It must have both a valid reference sequence ID and alignment start position to qualify.
* Unplaced reads may only be stored in Unmapped or Multiple Reference {@link Slice}s and {@link Container}s.
* It must have a valid reference sequence ID to qualify, as well as a valid alignment start position
* in the case of absolute (non-APDelta) position storage.
*
* @see #isSegmentUnmapped()
* @return true if the record is placed
*/
public boolean isPlaced() {
return sequenceId != SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX &&
alignmentStart != SAMRecord.NO_ALIGNMENT_START;
public boolean isPlaced(final boolean APDelta) {
// if an absolute alignment start coordinate is required but we have none, it's unplaced
if (! APDelta && alignmentStart == SAMRecord.NO_ALIGNMENT_START) {
jmthibault79 marked this conversation as resolved.
Show resolved Hide resolved
return false;
}

// it's also unplaced if we have no reference
return sequenceId != SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX;
}

public boolean isFirstSegment() {
Expand Down
8 changes: 4 additions & 4 deletions src/main/java/htsjdk/samtools/cram/structure/Slice.java
Original file line number Diff line number Diff line change
Expand Up @@ -388,7 +388,7 @@ public static Slice buildSlice(final List<CramCompressionRecord> records,
// Unmapped: all records are unmapped and unplaced

final CramCompressionRecord firstRecord = records.get(0);
slice.sequenceId = firstRecord.isPlaced() ?
slice.sequenceId = firstRecord.isPlaced(header.APDelta) ?
firstRecord.sequenceId :
SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX;

Expand All @@ -398,19 +398,19 @@ public static Slice buildSlice(final List<CramCompressionRecord> records,
hasher.add(record);

// flip an unmapped slice to multi-ref if the record is placed
if (slice.isUnmapped() && record.isPlaced()) {
if (slice.isUnmapped() && record.isPlaced(header.APDelta)) {
slice.sequenceId = MULTI_REFERENCE;
}
else if (slice.isMappedSingleRef()) {
// flip a single-ref slice to multi-ref if the record is unplaced or on a different ref
if (!record.isPlaced() || slice.sequenceId != record.sequenceId) {
if (!record.isPlaced(header.APDelta) || slice.sequenceId != record.sequenceId) {
slice.sequenceId = MULTI_REFERENCE;
}
else {
// calculate single ref slice alignment

minAlStart = Math.min(record.alignmentStart, minAlStart);
maxAlEnd = Math.max(record.getAlignmentEnd(), maxAlEnd);
maxAlEnd = Math.max(record.getAlignmentEnd(header.APDelta), maxAlEnd);
}
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.cram.encoding.readfeatures.*;
import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.util.ArrayList;
Expand All @@ -12,19 +13,24 @@
* Created by vadim on 28/09/2015.
*/
public class CramCompressionRecordTest extends HtsjdkTest {
@Test
public void test_getAlignmentEnd() {
@DataProvider(name = "tf")
jmthibault79 marked this conversation as resolved.
Show resolved Hide resolved
private Object[][] tf() {
return new Object[][] { {true}, {false}};
}

@Test(dataProvider = "tf")
public void test_getAlignmentEnd(final boolean APDelta) {
jmthibault79 marked this conversation as resolved.
Show resolved Hide resolved
CramCompressionRecord r = new CramCompressionRecord();
r.alignmentStart = 1;
r.setSegmentUnmapped(true);
Assert.assertEquals(r.getAlignmentEnd(), SAMRecord.NO_ALIGNMENT_START);
Assert.assertEquals(r.getAlignmentEnd(APDelta), SAMRecord.NO_ALIGNMENT_START);

r = new CramCompressionRecord();
int readLength = 100;
r.alignmentStart = 1;
r.readLength = readLength;
r.setSegmentUnmapped(false);
Assert.assertEquals(r.getAlignmentEnd(), r.readLength + r.alignmentStart - 1);
Assert.assertEquals(r.getAlignmentEnd(APDelta), r.readLength + r.alignmentStart - 1);

r = new CramCompressionRecord();
r.alignmentStart = 1;
Expand All @@ -33,7 +39,7 @@ public void test_getAlignmentEnd() {
r.readFeatures = new ArrayList<>();
String softClip = "AAA";
r.readFeatures.add(new SoftClip(1, softClip.getBytes()));
Assert.assertEquals(r.getAlignmentEnd(), r.readLength + r.alignmentStart - 1 - softClip.length());
Assert.assertEquals(r.getAlignmentEnd(APDelta), r.readLength + r.alignmentStart - 1 - softClip.length());

r = new CramCompressionRecord();
r.alignmentStart = 1;
Expand All @@ -42,7 +48,7 @@ public void test_getAlignmentEnd() {
r.readFeatures = new ArrayList<>();
int deletionLength = 5;
r.readFeatures.add(new Deletion(1, deletionLength));
Assert.assertEquals(r.getAlignmentEnd(), r.readLength + r.alignmentStart - 1 + deletionLength);
Assert.assertEquals(r.getAlignmentEnd(APDelta), r.readLength + r.alignmentStart - 1 + deletionLength);

r = new CramCompressionRecord();
r.alignmentStart = 1;
Expand All @@ -51,38 +57,50 @@ public void test_getAlignmentEnd() {
r.readFeatures = new ArrayList<>();
String insertion = "CCCCCCCCCC";
r.readFeatures.add(new Insertion(1, insertion.getBytes()));
Assert.assertEquals(r.getAlignmentEnd(), r.readLength + r.alignmentStart - 1 - insertion.length());

Assert.assertEquals(r.getAlignmentEnd(APDelta), r.readLength + r.alignmentStart - 1 - insertion.length());

r = new CramCompressionRecord();
r.alignmentStart = 1;
r.readLength = readLength;
r.setSegmentUnmapped(false);
r.readFeatures = new ArrayList<>();
r.readFeatures.add(new InsertBase(1, (byte) 'A'));
Assert.assertEquals(r.getAlignmentEnd(), r.readLength + r.alignmentStart - 1 - 1);
Assert.assertEquals(r.getAlignmentEnd(APDelta), r.readLength + r.alignmentStart - 1 - 1);
}

@DataProvider(name = "placedTests")
private Object[][] placedTests() {
return new Object[][] {
// APDelta = false. Must have a valid Start as well as a Reference
{false, false, false, false},
// APDelta = true. Must have a valid Reference. Invalid Start is OK because we use the Delta instead
{true, false, false, true}
jmthibault79 marked this conversation as resolved.
Show resolved Hide resolved
};
}

@Test
public void test_isPlaced() {
@Test(dataProvider = "placedTests")
public void test_isPlaced(final boolean APDelta,
final boolean noRefExpectation,
final boolean bothExpectation,
final boolean noStartExpectation) {
final CramCompressionRecord r = new CramCompressionRecord();

// it's only Placed if both of these are valid

r.sequenceId = 5;
r.alignmentStart = 10;
Assert.assertTrue(r.isPlaced());
Assert.assertTrue(r.isPlaced(APDelta));

r.sequenceId = SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX;
Assert.assertFalse(r.isPlaced());
Assert.assertEquals(r.isPlaced(APDelta), noRefExpectation);

r.alignmentStart = SAMRecord.NO_ALIGNMENT_START;
Assert.assertFalse(r.isPlaced());
Assert.assertEquals(r.isPlaced(APDelta), bothExpectation);

r.sequenceId = 3;
Assert.assertFalse(r.isPlaced());
Assert.assertEquals(r.isPlaced(APDelta), noStartExpectation);

r.alignmentStart = 15;
Assert.assertTrue(r.isPlaced());
Assert.assertTrue(r.isPlaced(APDelta));
}
}
Loading