Special Characters Not Handled Correctly in Filters #564

joshuapsteele · 2024-04-18T18:08:13Z

It appears that special characters are not handled correctly in Filters. For example, changing the name to "Bílbo Bággins" by adding accents here causes the test to fail:

https://github.com/apache/directory-scimple/blob/develop/scim-spec/scim-spec-schema/src/test/java/org/apache/directory/scim/spec/filter/FilterBuilderTest.java#L33-L40

line 1:18 token recognition error at: '"Bí'
line 1:21 extraneous input 'lbo' expecting CompValue

org.apache.directory.scim.spec.filter.FilterParseException: Failed to parse filter: name.givenName EQ "Bílbo" AND name.familyName EQ "Bággins"

	at org.apache.directory.scim.spec.filter.Filter.parseFilter(Filter.java:92)
	at org.apache.directory.scim.spec.filter.Filter.setFilter(Filter.java:70)
	at org.apache.directory.scim.spec.filter.Filter.<init>(Filter.java:55)
	at org.apache.directory.scim.spec.filter.FilterBuilderTest.testSimpleAnd(FilterBuilderTest.java:39)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Caused by: java.lang.IllegalStateException: failed to parse at line 1:21 due to extraneous input 'lbo' expecting CompValue
	at org.apache.directory.scim.spec.filter.Filter$1.syntaxError(Filter.java:81)
	at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
	at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377)
	at org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548)
	at org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467)
	at org.antlr.v4.runtime.Parser.match(Parser.java:208)
	at org.apache.directory.scim.spec.filter.FilterParser.filterExpression(FilterParser.java:459)
	at org.apache.directory.scim.spec.filter.FilterParser.filter(FilterParser.java:128)
	at org.apache.directory.scim.spec.filter.Filter.parseFilter(Filter.java:86)
	... 6 more

The text was updated successfully, but these errors were encountered:

bdemers · 2024-04-19T15:25:58Z

Hi @joshuapsteele!

I only had a minute to poke around at this, but my guess is it's related to the ANTLR grammar, the def for STRING looks like it's only parsing ASCII, and would need to be improved.

It looks like there is a better/improved grammar here: https://github.com/antlr/grammars-v4/blob/master/json/JSON.g4
Which contains:

STRING
    : '"' (ESC | SAFECODEPOINT)* '"'
    ;

fragment ESC
    : '\\' (["\\/bfnrt] | UNICODE)
    ;

fragment UNICODE
    : 'u' HEX HEX HEX HEX
    ;

fragment HEX
    : [0-9a-fA-F]
    ;

fragment SAFECODEPOINT
    : ~ ["\\\u0000-\u001F]
    ;

I did a quick replacement in src/main/antlr4/imports/Json.g4 and it seemed to fix the parsing error, but test failed still because of an equality check of a filter with B\u00EDlbo and Bílbo. (my guess is test needs to be tweaked a bit 🤷, but I don't have time to dig into it more today)

Anyway, great find, hopefully the above info helps!

- Updated ANTLR4 grammar to support unicode chars - Added test - Removed logic that escapes/unescapes JSON values, should is handled by at the JSON parser (jackson) Fixes: #564

bdemers added a commit that referenced this issue Dec 22, 2024

Handle unicode characters in filters

8505ac2

- Updated ANTLR4 grammar to support unicode chars - Added test - Removed logic that escapes/unescapes JSON values, should is handled by at the JSON parser (jackson) Fixes: #564

bdemers linked a pull request Dec 22, 2024 that will close this issue

Handle unicode characters in filters #707

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Special Characters Not Handled Correctly in Filters #564

Special Characters Not Handled Correctly in Filters #564

joshuapsteele commented Apr 18, 2024

bdemers commented Apr 19, 2024

Special Characters Not Handled Correctly in Filters #564

Special Characters Not Handled Correctly in Filters #564

Comments

joshuapsteele commented Apr 18, 2024

bdemers commented Apr 19, 2024