-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MergeSamFiles accept SO:UNKNOWN #1069
MergeSamFiles accept SO:UNKNOWN #1069
Conversation
- appropriate conversion if tag/value is valid but in improper case(without changing the source file); - warning if change was made; - corresponding test.
Codecov Report
@@ Coverage Diff @@
## master #1069 +/- ##
==============================================
+ Coverage 66.16% 68.355% +2.195%
- Complexity 7609 8003 +394
==============================================
Files 533 541 +8
Lines 32373 32691 +318
Branches 5509 5535 +26
==============================================
+ Hits 21418 22346 +928
+ Misses 8790 8120 -670
- Partials 2165 2225 +60
|
Personally I'm opposed to this change. I do not see why htsjdk needs to be made more complicated in order to accept non-SAM-spec compliant input. Its hard enough as it is even with SAM-compliant input and this will make the code more complicated. What if someone has a typo in the header and they spelled the sort order "unkwnon"? Or perhaps they use a different "locale" and the used "SO: desconocido" ? should htsjdk also accept that? That off my chest, I'm not a maintainer...so I'd love to hear from them before even looking at the details of the implementation: @lbergelson @tfenne @jacarey |
I think that the correct solution is to fix the offending software so that it produces compliant BAM files and to write a simple script that fixes "broken" headers. |
Alternatively a better (in my view) solution would be an ability to ignore (when validation=silent or leniant) an unknown sort order and use "unknown" when that happens (instead of lowercasing) |
I also prefer that HTSJDK is SAM-compliant and do not allow non-supported |
I agree with @yfarjoun |
@a-filipp0v could you modify the code as follows:
I think that this will both solve the issue at hand and not introduce un-needed complexity. As a bonus, I think that your tests should still be useful! |
- added implementation that treating non-conforming SO tag values as "unknown"; - corresponding tests reworked, few more tests added.
Thank you for your comments. Hope we've understood you correctly in case of non-conforming SO tag values. |
this.sortOrder = null; | ||
tempVal = SortOrder.valueOf(value).toString(); | ||
} catch (IllegalArgumentException e) { | ||
tempVal = "unknown"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use SortOrder.unknown.toString()
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this.
mKeyValuePairs.put(keyAndValue[0], keyAndValue[1]); | ||
} | ||
lineValid = true; | ||
} | ||
|
||
private void validateSortOrderValue(String[] value) { | ||
if ("SO".equals(value[0])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use SAMFileHeader.SORT_ORDER_TAG
log.warn("Found non conforming header SO tag: " | ||
+ value[1] + ". Treating as 'unknown'."); | ||
} | ||
value[1] = "unknown"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use SortOrder.unknown.toString()
if (key.equals(SORT_ORDER_TAG)) { | ||
this.sortOrder = null; | ||
try { | ||
this.sortOrder = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move this line into the try clause?
@@ -69,6 +70,8 @@ | |||
private static final Pattern FIELD_SEPARATOR_RE = Pattern.compile(FIELD_SEPARATOR); | |||
|
|||
public static final String COMMENT_PREFIX = HEADER_LINE_START + HeaderRecordType.CO.name() + FIELD_SEPARATOR; | |||
private static final Log log = Log.getInstance(SAMTextHeaderCodec.class); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove extra newline
@@ -233,8 +236,8 @@ private void parseHDLine(final ParsedHeaderLine parsedHeaderLine) { | |||
try { | |||
if (soString != null) SAMFileHeader.SortOrder.valueOf(soString); | |||
} catch (IllegalArgumentException e) { | |||
reportErrorParsingLine(HEADER_LINE_START + parsedHeaderLine.getHeaderRecordType() + | |||
" line has non-conforming SO tag value: "+ soString + ".", | |||
reportErrorParsingLine(HEADER_LINE_START + parsedHeaderLine.getHeaderRecordType() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert extra tab
@@ -552,12 +552,6 @@ public void duplicateReadsOutOfOrder() throws Exception { | |||
"@RG\tID:0\tSM:Hi,Mom!\n" + | |||
"E\t147\tchr1\t15\t255\t10M\t=\t2\t-30\tCAACAGAAGC\t)'.*.+2,))\tU2:Z:CAA"; | |||
|
|||
final String SOTagCorrectlyProcessTestData = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your reply and sorry for the delay. We've removed this test case since as @a-filipp0v said, it's redundant after our implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand...if it's just "Redundant" but not exactly equals, I'd ask that it be left in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case does not pass now, since we no longer treat SO:NOTKNOWN as error as test expects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
Hello, @yfarjoun , thanks for your reply. According to your previous comments, we've tried to move an appropriate check to SamFileValidator, but found that it still might make sense to leave our checks in SAMTextHeaderCodec and SAMFileHeader. First, the check in SamFileValidator will be called only when the ValidateSamFile is used, which does not prevent an error when using any tools separately (for example, MergeSamFile as in this issue). Secondly, SamFileValidator does not contain separate data about Validation Stringency, because this parameter is parsed from Picard during start. We suggest leaving checks in SAMTextHeaderCodec and SAMFileHeader AND adding a similar one in SamFileValidator. What do you think about it? |
@yfarjoun Since you did the initial review, could you take a look at this again when you get a chance? |
@a-filipp0v thanks for the explanation. I've removed my comment relating to the validation. |
@a-filipp0v sorry for the delay. please respond to the review comments. |
@a-filipp0v just so we know, are you planning to get back to this PR? |
@yfarjoun sorry for the delay and thank you for your reply. I made some fixes according to your notes. Could you take a look at PR again? |
Thanks for the changes. Please leave tests as is, unless they no longer pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine now. Maintainers, I'd like to merge this, but please glance at it as it changes the API a bit: a previously failing file will now validate with order=="unknown"
@jacarey @tfenne @lbergelson any opinions about this small API change? |
if (key.equals(SORT_ORDER_TAG)) { | ||
this.sortOrder = null; | ||
try { | ||
tempVal = SortOrder.valueOf(value).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In setSortOrder()
the code uses .name()
instead of .toString()
which I think is clearer; either way the code should be consistent.
Also, it seems odd that setSortOrder()
sets sortOrder
to non-null, if the argument is a valid SortOrder
, but this method sets it to null
. Maybe this method should just call setSortOrder()
when key
is SORT_ORDER_TAG
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pshapiro4broad that's how the old implementation also does it...it let's the super-class deal with it...take a look at htsjdk.samtools.SAMFileHeader#getSortOrder, I think that might clarify things up (and note that this.sortOrder is private)
No response from the maintainers for a while now, I suspect that no-one objects....if I'm wrong, please revert and accept my apologies. |
Description
Request is associated with Picard's issue broadinstitute/picard#796
As said in that issue MergeSamFiles can't accept SAM-file with SO:UNKNOWN. That behavior is expected, because tag's value is in improper case(in accordance with SAM-format specification, SO value must be in lowercase). We've found that wrong tag case also provides an exception.
The idea of implementation is that htsjdk must check case of header's tags and values. If tag/value itself is valid, but in improper case(e.g. lowercase instead of uppercase) - it should be converted to proper case before sending the result to output(without changing the source file). Also, there must be a message if that conversion was made.
Checklist