Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnexpectedToken.__str__ fails with Lark(transformer=...) when transformer throws exception #1345

Closed
TobiasKaiser opened this issue Sep 28, 2023 · 3 comments

Comments

@TobiasKaiser
Copy link

I use lark.Lark with the transformer= argument to run a transformer during parsing. I fould that this sometimes produced the following type of error message when an unexpected token was encountered: lark.exceptions.UnexpectedToken: <exception str() failed>. This seems to mean that an exception is raised in UnexpectedToken.__str__. When I ran the parser without the transformer, this did not happen.

Minimal example to reproduce:

import lark

grammar = """
start: node_lhs OP "(" AMP_REF ")" node_rhs

node_lhs: (PERCENT_REF ("," PERCENT_REF)* "=")?
node_rhs: (PERCENT_REF ("," PERCENT_REF)*)?

OP: "add" | "sub"

AMP_REF: "&" DIGIT+
PERCENT_REF: "%" DIGIT+

%import common.DIGIT
%import common.WS_INLINE
%ignore WS_INLINE
"""

class MyTransformer(lark.visitors.Transformer):
    def OP(self, token):
        return {"add":1, "sub":2}[str(token)]

def case1():
    parser = lark.Lark(grammar, parser="lalr",  lexer="basic")

    # Success:
    print(MyTransformer().transform(parser.parse("%17 = add(&1) %13, %16")))

    # Raises graceful lark.exceptions.UnexpectedToken:
    MyTransformer().transform(parser.parse("%17 = add(&1) %13, %16"))
    parser.parse("&17 = add %13, %16")

def case2():
    parser = lark.Lark(grammar, parser="lalr",  lexer="basic", transformer=MyTransformer())

    # Success:
    print(parser.parse("%17 = add(&1) %13, %16"))

    # Raises broken lark.exceptions.UnexpectedToken: <exception str() failed>
    parser.parse("&17 = add %13, %16")

#case1()
case2()

case1() runs fine, the problem is in case2().

Analysis: UnexpectedToken.__str__ calls InteractiveParser.accepts through the property UnexpectedToken.accepts. InteractiveParser.accepts somehow calls MyTransformer.OP with an empty token as argument. MyTransformer.OP raises a KeyError when it gets that empty ('') token.

Fix: I find that it is acceptable behaviour for the transformer rule to raise an exception when it gets a token that violating the grammar. I guess it should even be acceptable for the transformer to raise an exception in other cases. Possible ways to fix this problem could be:

  • Do not attach InteractiveParsers to UnexpectedInput exceptions (_Parser.parse_from_state, lalr_parser.py:190) when a transformer is used directly together with a parser. Then, UnexpectedToken.__str__ would produce a error message using self.expected instead of self.accepts. I do not know what the drawbacks of using self.expected are.
  • Catch + handle any type of exception in UnexpectedToken.accepts. On error, resort to self.expected or return a meaningful string to prevent <exception str() failed>.

Versions: lark-1.1.7, Python 3.11.2

@MegaIng
Copy link
Member

MegaIng commented Sep 28, 2023

IMO the bug is that InteractiveParser.accepts calls into the Transformer at all, that can have arbitrary site effects outside of the purpose of that function.

But also, I am pretty sure passing your transformer as an internal transformer is a bad idea since the OP function doesn't return a Token. I am surprised it works at all.

Edit: The semantics of passing Token transformations is slightly different than what I remembered, this is fine.

MegaIng added a commit to MegaIng/lark that referenced this issue Sep 28, 2023
@MegaIng
Copy link
Member

MegaIng commented Sep 28, 2023

#1346 Should fix this by just not calling the Transformer.

@TobiasKaiser
Copy link
Author

Thanks for having a look at this so quickly! Both #1346 and #1347 solve the problem for me. As a user, I do not know which one makes more sense internally.

@erezsh erezsh closed this as completed in 70bd307 Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants