The newest version of the SAM spec specifies which fields are allowed to be UTF-8 and which must be standard 7-bit ASCII. I tested it, and it turns out we do not support UTF-8 in `sam` files at all. We mangle non-ASCII characters to ASCII characters in all cases.
`AsciiWriter` uses the very simple `StringUtil.charsToBytes`, which just downcasts each input `char` to a `byte`. This is incorrect for any character outside 7-bit ASCII.
We don't detect this case and instead silently corrupt the output.
We may have similar problems with CRAM/BAM.
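
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not htsjdk's code: `downcast` is a hypothetical stand-in for the per-character narrowing cast described above, and `encodeUtf8` and `isSevenBitAscii` are illustrative helpers, not existing htsjdk methods. It compares the downcast against a real UTF-8 encoding and shows one way the non-ASCII case could be detected rather than silently corrupted.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DowncastDemo {

    // Hypothetical stand-in for the behaviour described in this issue:
    // keep only the low 8 bits of each UTF-16 char.
    static byte[] downcast(final String s) {
        final byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // narrowing cast discards the high byte
        }
        return out;
    }

    // Encoding through a Charset produces multi-byte sequences where needed.
    static byte[] encodeUtf8(final String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Illustrative check for fields that must stay 7-bit ASCII.
    static boolean isSevenBitAscii(final String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final String description = "caf\u00e9 \u03b1"; // "café α"

        final byte[] mangled = downcast(description);
        final byte[] utf8 = encodeUtf8(description);

        // The downcast bytes no longer decode to the original text.
        System.out.println(new String(mangled, StandardCharsets.UTF_8).equals(description));
        // The properly encoded bytes round-trip.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(description));
        // A writer could use a check like this to fail loudly instead.
        System.out.println(isSevenBitAscii(description));
    }
}
```

For the six-character input, the downcast yields six bytes while the UTF-8 encoding yields eight, so the two outputs differ as soon as a single non-ASCII character is present.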
* The @SQ DS header field was added to the 1.6 BAM spec; this adds a getter and setter for it.
* We do not correctly support UTF-8 characters in the description due to #1202.
* Changing htsjdk to produce SAM version 1.6.
* Htsjdk has technically been producing SAM version 1.6 since support for long CIGARs was added.
* Updating the list of acceptable versions to include 1.6 and setting the header version of new BAMs to 1.6.
* There is a known issue with writing UTF-8 characters in the SAM header: this is now allowed for some fields but not handled correctly. See #1202.