Document.to_xml is reformatting tags inconsistently #415
Comments
Hello! Thank you for testing pure-Java Nokogiri. Inconsistent spaces and newlines are very difficult to resolve in the pure-Java version, because Xerces needs a schema to handle them. However, the difference shown here looks like it comes from a bug. I'll look into what causes it. |
Let me know if I can help. Dave |
Hi Dave, As @yokolet mentioned, it is extraordinarily hard to keep formatting behavior consistent between implementations. So, I'm wondering, can you explain a bit more about why this sort of non-semantic change is a blocker for you? Generally when people have pointed this out, it's because their test suites are asserting that the serialized document is identical to what they expect. Is this what you're up against? A more semantic (and thus portable) way to do this sort of testing is to assert against the document structure. The gems lorax or nokogiri-diff may be able to help in that case. Regardless, it would help us all if you'd give us some insight into what your particular blocker is here. Thanks for using Nokogirl! (Aaron made me say that. ;)) |
This isn't a question of a failing test. The problem is that this generated XML has additional whitespace that, when formatted using FO, results in extra spaces in the printed book. I'm representing a line of source code to be formatted in a book. So, the source code
gets converted to
However, the pure Java version converts it to
Now, in a <codeline>, whitespace is significant. The libxml version correctly puts no whitespace before the class keyword, while the Java version inserts it. As a result, the code listings format incorrectly. Even more confusingly, though, the Java version treats the codelines differently: the first and last are wrapped, while the rest are formatted the same way libxml formats them. If we can stop the wrapping of the first and last, I think the problem would be solved. Dave |
I can fix this by overriding the default formatting: @doc.to_xml(:save_with => 0) |
Dave - should this be closed? If this is as simple as turning off Node::SaveOptions::FORMAT by default on JRuby then maybe we should make that the default? I'm going to reopen. |
I think it makes more sense to have it off by default. |
Mise en place refactor: fa671aa |
default output of XML on JRuby is no longer formatted due to inconsistent whitespace handling. Closed by 4337005 |
Changing the default for Node::SaveOptions::FORMAT makes sense. I've almost fixed this, but the format option was doing something wrong, which confused me. I've already fixed the problem of the missing doctype declaration. |
…tent whitespace handling. Closes sparklemotion#415
The code at https://gist.github.com/815162 reads an XML document and then writes it back out. It produces different results with libxml Nokogiri and the pure-Java version. The first and last tags are split onto multiple lines by the Java version, but left intact by the libxml version.
libxml nokogiri
Java nokogiri
Is there any configuration I can use to turn off this behavior? It's currently preventing me from switching our toolchain to JRuby.
Dave