Unicode bug #2
This is a blocker for any use of XSPARQL over Unicode data. I suspect the old built-in Saxon is to blame, and we'll try to retarget XSPARQL to use BaseX.
A similar thing happens for the Unicode char in
This won't be easy, so we'll first try the most recent SaxonHE: XSPARQL uses 9.4, from March 2012, while the current release is 9.7.0-14.
I tried to build XSPARQL with the latest SaxonHE, 9.7.0-14, but the Saxon API changed starting with version 9.5, so the project's code will need many changes. For example, since 9.5 ExtensionFunctionCall uses Sequence rather than SequenceIterator.
We tried a simple XQuery on the XML in question with both SaxonHE versions listed above, and neither shows the Unicode problem.
The problem goes away if you run XSPARQL with the appropriate Java option:
    java -Dfile.encoding=UTF-8 -jar c:/prog/xsparql/xsparql-cli-jar-with-dependencies.jar $@
But the input file already starts with <?xml version="1.0" encoding="UTF-8"?>, so this is still a valid bug: XSPARQL should honor the declared encoding instead of depending on the JVM's default charset.
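A minimal sketch of why the XML declaration alone may not help, assuming (not confirmed in this thread) that XSPARQL or a library it calls decodes the input into characters before the XML parser sees it. Per the SAX InputSource contract, a parser given a character stream disregards the encoding declaration, so the result depends on whichever charset did the decoding, typically the JVM default that -Dfile.encoding changes. The class XmlEncodingDemo and its methods are hypothetical names for illustration; only JDK APIs (javax.xml.parsers, org.xml.sax.InputSource) are used.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class XmlEncodingDemo {
    // A tiny document that declares UTF-8 and is stored as real UTF-8 bytes.
    // U+00B5 (micro sign) encodes as 0xC2 0xB5, i.e. octal \302\265.
    static final byte[] XML_BYTES =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><v>\u00B5g</v>"
            .getBytes(StandardCharsets.UTF_8);

    // Hands the parser raw bytes, so it applies the encoding declaration itself.
    static String parseFromBytes(byte[] xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(xml))
                 .getDocumentElement().getTextContent();
    }

    // Pitfall: the bytes are decoded with some charset before parsing. Given a
    // character stream, the parser ignores the encoding declaration, so the text
    // depends on the charset chosen here (e.g. the JVM default).
    static String parseFromReader(byte[] xml, Charset cs) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource src = new InputSource(
            new InputStreamReader(new ByteArrayInputStream(xml), cs));
        return db.parse(src).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Compare results without printing non-ASCII text to STDOUT.
        System.out.println(parseFromBytes(XML_BYTES).equals("\u00B5g"));
        System.out.println(parseFromReader(XML_BYTES, StandardCharsets.ISO_8859_1)
                               .equals("\u00C2\u00B5g"));
    }
}
```

With ISO-8859-1 as the decoding charset, the two UTF-8 bytes of the micro sign become two characters ("Â" and "µ"). If this is what happens in XSPARQL, passing the parser an InputStream (or a Reader with an explicit UTF-8 charset) would remove the dependency on the platform default.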
Updated to Saxon 9.9.0-2 in commit 8aff57e.
The Kanji test cases are probably failing for the same reason.
I still see Unicode problems when XSPARQL writes to STDOUT.
"μg" in the XML is converted to \265g (the octal byte \265 followed by "g"), which is invalid UTF-8 and causes this RIOT exception:
In UTF-8, "μ" is 2 bytes: \302\265. Somehow the XSPARQL conversion loses the first byte.
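The octal values suggest another reading of the symptom, offered as a hypothesis rather than a diagnosis: U+00B5 also exists in ISO-8859-1 as the single byte 0xB5 (octal \265), so a STDOUT stream encoded with a Latin-1-style default charset would emit exactly \265g. In that case no byte is dropped; the text is simply never encoded as UTF-8. The sketch below (hypothetical class StdoutEncodingDemo, JDK APIs only, Java 10+ for the PrintStream(OutputStream, boolean, Charset) constructor) compares both encodings and checks each result with a strict UTF-8 decoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StdoutEncodingDemo {
    // Writes text through a PrintStream with the given charset and returns the raw
    // bytes, i.e. what a consumer reading the program's STDOUT would receive.
    static byte[] printWith(Charset cs, String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(buf, true, cs); // Java 10+ constructor
        ps.print(text);
        ps.flush();
        return buf.toByteArray();
    }

    // Shows ASCII bytes as-is and other bytes as backslash-octal escapes.
    static String toOctal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(Integer.toOctalString(v));
            }
        }
        return sb.toString();
    }

    // Returns true if the bytes are well-formed UTF-8 (strict, no replacement chars).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u00B5g"; // "µg" with U+00B5 MICRO SIGN
        byte[] latin1 = printWith(StandardCharsets.ISO_8859_1, text);
        byte[] utf8 = printWith(StandardCharsets.UTF_8, text);
        System.out.println(toOctal(latin1) + " valid=" + isValidUtf8(latin1)); // \265g valid=false
        System.out.println(toOctal(utf8) + " valid=" + isValidUtf8(utf8));     // \302\265g valid=true
    }
}
```

If the hypothesis holds, wrapping standard output explicitly, for example new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8), would make XSPARQL's output independent of the JVM default charset.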