-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent Illegal Look-Around for OneOf in JSONSchema #897
Conversation
Looks good, thank you for the fix! |
other_subregexes_str = "|".join([f"{s}" for s in other_subregexes]) | ||
negative_lookahead = f"(?!.*({other_subregexes_str}))" | ||
xor_patterns.append(f"({subregex}){negative_lookahead}") | ||
xor_patterns = [f"(?:{subregex})" for subregex in subregexes] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lapp0 what is the difference bw oneOf and anyOf in this script? Both concat the subregex with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you also think additional conditions like just checking for ^
and just checking for $
needs to be added to this conditon? https://github.com/outlines-dev/outlines/blob/7863f8e8bbaeb71c9d2434636a2d63bfe6dd7d39/outlines/fsm/json_schema.py#L248-L249
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And how about adding array
along with object
to the types to ignore here? https://github.com/outlines-dev/outlines/blob/7863f8e8bbaeb71c9d2434636a2d63bfe6dd7d39/outlines/fsm/json_schema.py#L344-L347 Array like object requires many more parameters like items
which is not provided in the list of types which is the reason why object is ignored I believe since properties cannot be mentioned when type is given as a list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ekagra-ranjan actually reviewing the test cases, I think anyOf
is wrong. I'd appreciate a second set of eyes on this. Would you agree that
oneOf
is correct (uses exclusive or), guarantees only one match
>>> print(re.fullmatch(r"((?:foo)|(?:bar))", "foo"))
<re.Match object; span=(0, 3), match='foo'>
>>> print(re.fullmatch(r"((?:foo)|(?:bar))", "foobar"))
None
anyOf
is incorrect, allows only one match (I think this regression was introduced in https://github.com/outlines-dev/outlines/pull/552/files)
>>> print(re.fullmatch(r"((foo)|(bar))", "foo"))
<re.Match object; span=(0, 3), match='foo'>
>>> print(re.fullmatch(r"((foo)|(bar))", "foobar"))
None
allOf
is somewhat correct, requires generation of all schemas, but enforces an order which it shouldn't:
elif "allOf" in instance:
subregexes = [
to_regex(resolver, t, whitespace_pattern) for t in instance["allOf"]
]
subregexes_str = [f"{subregex}" for subregex in subregexes]
return rf"({''.join(subregexes_str)})"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For your other two comments, I appreciate your insights into json schema -> regex. It's definitely a work in progress and it's great that you're looking deep into it's behavior and identifying problems. I think it's out of scope with this PR. Could you create a separate issue so others can provide input as well?
Fixes #823
This comment details the issues error: #823 (comment)
The reproduction code provided results in a json schema with
OneOf[pets]
:Before this PR:
OneOf
uses negative lookaheads to assert that only one schema member is included. This is illegal ininteregular
, more details available here: #456After
OneOf
uses or-joined non-capturing groups which don't have the same issues withinteregular
.