Multi-byte characters are split in writeCData() if first byte sits right at the end of the buffer #86

@tatsel

This issue seems to be similar to the one fixed here https://github.com/FasterXML/aalto-xml/pull/75/commits, though the cause is a bit different:

I get a 'javax.xml.stream.XMLStreamException: Incomplete surrogate pair in content: first char 0xdfce, second 0x78' exception when I try to write CData in which a multi-byte character (a surrogate pair) sits right at the border of the 512-sized internal buffer.

Example test to reproduce (copied from https://github.com/FasterXML/aalto-xml/blob/master/src/test/java/com/fasterxml/aalto/sax/TestSaxWriter.java#L10 and slightly adjusted for writeCData()):

StringBuilder testText = new StringBuilder();
// 511 ASCII chars, so the high surrogate below lands at char index 511
for (int i = 0; i < 511; i++)
    testText.append('x');
// U+1D7CE encoded as a surrogate pair
testText.append("\uD835\uDFCE");
for (int i = 0; i < 512; i++)
    testText.append('x');
WriterConfig writerConfig = new WriterConfig();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
Utf8XmlWriter writer = new Utf8XmlWriter(writerConfig, byteArrayOutputStream);
writer.writeStartTagStart(writer.constructName("testelement"));
writer.writeCData(testText.toString());
writer.writeStartTagEnd();
writer.writeEndTag(writer.constructName("testelement"));
writer.close(false);
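
To make the boundary condition easier to see, here is a small standalone sketch that uses only the JDK (no aalto-xml classes). It rebuilds the same test text and inspects the chars on either side of index 512. The chunk size of 512 is an assumption taken from the buffer size mentioned above, and the class name `SurrogateBoundaryDemo` is made up for illustration.

```java
class SurrogateBoundaryDemo {
    // Assumed chunk size, taken from the "512-sized internal buffer" in this report.
    static final int CHUNK = 512;

    // Builds the same text as the reproduction test: 511 x's, one surrogate pair, 512 x's.
    static String buildText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 511; i++) {
            sb.append('x');
        }
        sb.append("\uD835\uDFCE"); // U+1D7CE as a surrogate pair
        for (int i = 0; i < 512; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = buildText();
        char lastOfFirstChunk = text.charAt(CHUNK - 1);   // index 511
        char firstOfSecondChunk = text.charAt(CHUNK);     // index 512
        char secondOfSecondChunk = text.charAt(CHUNK + 1); // index 513

        System.out.println("last char of chunk 1 is a high surrogate: "
                + Character.isHighSurrogate(lastOfFirstChunk));
        System.out.println("first char of chunk 2 is a low surrogate: "
                + Character.isLowSurrogate(firstOfSecondChunk));
        System.out.printf("chunk 2 starts with 0x%x, then 0x%x%n",
                (int) firstOfSecondChunk, (int) secondOfSecondChunk);
    }
}
```

The two values at the start of the second chunk are 0xdfce and 0x78, the same pair named in the exception message. That would fit a writer that has forgotten about the high surrogate left over from the previous chunk.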

I think the reason is that ByteXmlWriter#writeCDataContents() lacks the following check, which exists in writeCharacters():

if (_surrogate != 0) {
    outputSurrogates(_surrogate, cbuf[offset]);
    // reset the temporary surrogate storage
    _surrogate = 0;
    ++offset;
    --len;
}
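
For illustration, the carry-over idea behind that check can be sketched as a tiny standalone chunked UTF-8 encoder. This is not aalto-xml's code: the class `ChunkedUtf8Sketch`, the field `pendingHigh` (standing in for `_surrogate`), and the helper methods are all hypothetical, and the 512-char chunk size is assumed from the description above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ChunkedUtf8Sketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // High surrogate left over from the previous chunk, or 0 if none (hypothetical field).
    private int pendingHigh = 0;

    void writeChunk(char[] cbuf, int offset, int len) {
        if (len <= 0) {
            return;
        }
        // Complete a pair whose first half ended the previous chunk.
        if (pendingHigh != 0) {
            writePair((char) pendingHigh, cbuf[offset]);
            pendingHigh = 0;
            ++offset;
            --len;
        }
        int end = offset + len;
        while (offset < end) {
            char c = cbuf[offset++];
            if (Character.isHighSurrogate(c)) {
                if (offset == end) { // pair is split across chunks: remember the first half
                    pendingHigh = c;
                    return;
                }
                writePair(c, cbuf[offset++]);
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException(
                        "Unpaired low surrogate 0x" + Integer.toHexString(c));
            } else {
                writeCodePoint(c);
            }
        }
    }

    private void writePair(char high, char low) {
        if (!Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("Incomplete surrogate pair: first char 0x"
                    + Integer.toHexString(high) + ", second 0x" + Integer.toHexString(low));
        }
        writeCodePoint(Character.toCodePoint(high, low));
    }

    // Plain UTF-8 encoding of a single code point.
    private void writeCodePoint(int cp) {
        if (cp < 0x80) {
            out.write(cp);
        } else if (cp < 0x800) {
            out.write(0xC0 | (cp >> 6));
            out.write(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out.write(0xE0 | (cp >> 12));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        } else {
            out.write(0xF0 | (cp >> 18));
            out.write(0x80 | ((cp >> 12) & 0x3F));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        }
    }

    // Feeds the text to writeChunk() in fixed-size pieces, as a buffered writer might.
    static byte[] encode(String text, int chunkSize) {
        ChunkedUtf8Sketch w = new ChunkedUtf8Sketch();
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i += chunkSize) {
            w.writeChunk(chars, i, Math.min(chunkSize, chars.length - i));
        }
        if (w.pendingHigh != 0) {
            throw new IllegalArgumentException("Input ends with an unpaired high surrogate");
        }
        return w.out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 511; i++) sb.append('x');
        sb.append("\uD835\uDFCE");
        for (int i = 0; i < 512; i++) sb.append('x');
        String text = sb.toString();
        // Compare the chunked result with the JDK's own UTF-8 encoding.
        System.out.println(Arrays.equals(encode(text, 512),
                text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the `pendingHigh` block at the top of `writeChunk()` is removed, the second chunk starts with the low surrogate 0xdfce and the sketch throws. That is the same class of failure reported here for writeCData().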
