This issue was moved to a discussion.


Request for help (aka so near and yet so far). #348

Closed
kgigitdev opened this issue Jun 18, 2023 · 0 comments

kgigitdev commented Jun 18, 2023

Hi all.

Apologies in advance: this isn't a bug report; it's a request for help. I'm hoping that some kind soul can point me in the right direction, much as happened for the user who posted issue 178.

I'm experimenting with replacing a Java-based ANTLR parser with one written using participle, but I'm having trouble getting quite a simple parser working, and I think there must be something fundamental that I've misunderstood.

The task at hand is to parse arbitrary (and arbitrarily nested) boolean combinations of "node identifiers" of the form "<string.number>". The identifiers ultimately evaluate to an application-specific boolean, but how that happens is unrelated to the actual parsing.
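As an illustration of the intended result, here is a minimal, hypothetical sketch of the tree such a parser might produce and how it could be evaluated. The `Expr`, `Node`, `Not`, `And` and `Or` types are illustrative only (they are not participle types, nor the structs in the code below), and `lookup` stands in for the application-specific step that turns an identifier into a boolean, assumed here to be a simple map.

```go
package main

import "fmt"

// Expr is a hypothetical AST node for the boolean language.
type Expr interface {
	// Eval resolves node identifiers with lookup and combines the results.
	Eval(lookup func(id string) bool) bool
}

// Node is a leaf such as "abcd.1234".
type Node struct{ ID string }

// Not negates its operand.
type Not struct{ X Expr }

// And and Or combine two operands.
type And struct{ L, R Expr }
type Or struct{ L, R Expr }

func (n Node) Eval(lookup func(string) bool) bool { return lookup(n.ID) }
func (n Not) Eval(lookup func(string) bool) bool  { return !n.X.Eval(lookup) }
func (a And) Eval(lookup func(string) bool) bool  { return a.L.Eval(lookup) && a.R.Eval(lookup) }
func (o Or) Eval(lookup func(string) bool) bool   { return o.L.Eval(lookup) || o.R.Eval(lookup) }

func main() {
	// Hypothetical application state: which identifiers are currently true.
	state := map[string]bool{"abcd.1234": true, "bcde.2345": false, "cdef.3456": true}
	lookup := func(id string) bool { return state[id] }

	// abcd.1234 AND NOT ( bcde.2345 OR cdef.3456 )
	e := And{Node{"abcd.1234"}, Not{Or{Node{"bcde.2345"}, Node{"cdef.3456"}}}}
	fmt.Println(e.Eval(lookup)) // false
}
```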

Here's my example code, in full:

package main

import (
	"log"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
	"github.com/alecthomas/repr"
)

type Expression struct {
	ParenExpression *ParenExpression `  @@`
	NotExpression   *NotExpression   `| @@`
	// AndExpression   *AndExpression   `| @@`
	// OrExpression    *OrExpression    `| @@`
	NodeExpression  *NodeExpression  `| @@`
}

type ParenExpression struct {
	Expression *Expression `OpenParenToken @@ CloseParenToken`
}

type NotExpression struct {
	Expression *Expression `NotToken @@`
}

type AndExpression struct {
	Left  *Expression `@@ AndToken`
	Right *Expression `@@`
}

type OrExpression struct {
	Left  *Expression `@@ OrToken`
	Right *Expression `@@`
}

type NodeExpression struct {
	Node *string `@NodeToken`
}

func main() {

	var simpleLexer = lexer.MustSimple([]lexer.SimpleRule{
		{"Whitespace", `[\s\t]+`},
		{"NodeToken", `\w+\.\d+`},
		{"NotToken", `(not|NOT)`},
		{"OpenParenToken", `\(`},
		{"CloseParenToken", `\)`},
		{"AndToken", `(and|AND)`},
		{"OrToken", `(or|OR)`},
	})
	
	parser := participle.MustBuild[Expression](
		participle.Lexer(simpleLexer),
		participle.UseLookahead(1000),
		participle.Elide("Whitespace"),
	)

	expressions := []string {
		// These all work with just ParenExpression,
		// NotExpression and NodeExpression.
		"abcd.1234",
		"NOT abcd.1234",
		"( abcd.1234 )",
		"( NOT abcd.1234 )",
		"NOT ( abcd.1234 )",
		
		// In order for these to work, we would need to
		// uncomment AndExpression and OrExpression,
		// but doing so causes an infinite recursion.
		"abcd.1234 AND bcde.2345",
		"abcd.1234 AND ( bcde.2345 OR cdef.3456 )",
		"abcd.1234 AND NOT ( bcde.2345 OR cdef.3456 )",
	}

	for _, expression := range expressions {
		log.Printf("Parsing expression: [%s]\n", expression)
		expr, err := parser.ParseString("", expression)
		if err != nil {
			log.Printf("Parsing failed: %s", err)
		}
		repr.Println(expr)
	}
}

The first few example expressions (with node expressions, parentheses and NOTs) work, but as soon as I enable the AND and OR expressions, parsing goes into infinite recursion.
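The recursion is most likely left recursion: `AndExpression` starts with `@@` for `Expression`, and `Expression` may itself be an `AndExpression`, so a top-down (recursive-descent) parser re-enters the same rule without consuming a token. The usual workaround is to layer the grammar by precedence so that every recursive call either consumes a token first or descends to a tighter-binding level. The sketch below is a hand-written recursive-descent parser in plain Go (standard library only, not participle) for that layered grammar, so its termination can be checked directly; all names in it are hypothetical, and it renders the tree as an S-expression string for brevity. In participle terms the same layering would presumably become one struct per level, e.g. an `And` struct whose first field is tagged `@@` (a `*Not`) and whose second is tagged `( AndToken @@ )*` (a `[]*Not`), but treat those tags as an unverified assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Layered grammar: each rule consumes a token or descends to a tighter rule
// before it can recurse, so no rule is left-recursive.
//	or      := and { "OR" and }
//	and     := not { "AND" not }
//	not     := "NOT" not | primary
//	primary := NODE | "(" or ")"

// nodeRe matches a whole node identifier such as "abcd.1234".
var nodeRe = regexp.MustCompile(`^\w+\.\d+$`)

type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return "" // end of input
}

func (p *parser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// binary parses operand { kw operand } into a left-associative tree.
// Keywords are matched case-insensitively, so "AND" and "and" both work.
func (p *parser) binary(kw string, operand func() (string, error)) (string, error) {
	left, err := operand()
	if err != nil {
		return "", err
	}
	for strings.EqualFold(p.peek(), kw) {
		p.next() // consume the operator before recursing, so each step makes progress
		right, err := operand()
		if err != nil {
			return "", err
		}
		left = "(" + kw + " " + left + " " + right + ")"
	}
	return left, nil
}

func (p *parser) parseOr() (string, error)  { return p.binary("or", p.parseAnd) }
func (p *parser) parseAnd() (string, error) { return p.binary("and", p.parseNot) }

// parseNot handles "NOT" not | primary; NOT is consumed before recursing.
func (p *parser) parseNot() (string, error) {
	if !strings.EqualFold(p.peek(), "not") {
		return p.parsePrimary()
	}
	p.next()
	x, err := p.parseNot()
	return "(not " + x + ")", err
}

// parsePrimary handles NODE | "(" or ")".
func (p *parser) parsePrimary() (string, error) {
	tok := p.next()
	if nodeRe.MatchString(tok) {
		return tok, nil
	}
	if tok != "(" {
		return "", fmt.Errorf("unexpected token %q", tok)
	}
	x, err := p.parseOr()
	if err == nil && p.next() != ")" {
		err = fmt.Errorf("expected %q", ")")
	}
	return x, err
}

// Parse pads parentheses with spaces, splits on whitespace and parses one tree.
func Parse(s string) (string, error) {
	p := &parser{toks: strings.Fields(strings.NewReplacer("(", " ( ", ")", " ) ").Replace(s))}
	tree, err := p.parseOr()
	if err == nil && p.pos != len(p.toks) {
		err = fmt.Errorf("unexpected trailing token %q", p.peek())
	}
	if err != nil {
		return "", err
	}
	return tree, nil
}

func main() {
	tree, err := Parse("abcd.1234 AND NOT ( bcde.2345 OR cdef.3456 )")
	fmt.Println(tree, err) // (and abcd.1234 (not (or bcde.2345 cdef.3456))) <nil>
}
```

Because AND sits one level below OR, `a.1 OR b.2 AND c.3` parses as `(or a.1 (and b.2 c.3))`, and repeated operators at the same level associate to the left.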

Things I've tried, in no particular order:

  • Instead of having an Expression struct, having a sealed Expression interface that all the structs implement, and using participle.Union.
  • Removing the UseLookahead() call (which is 100% pure cargo-cult).
  • Instead of a 2-field AndExpression, having a 3-field AndExpression where the AndToken is captured into an Operator field (which is unnecessary, since the struct type already determines the operator, but I tried it in case my 2-field grammar constructs were faulty).
  • Having a TopLevelExpression struct with a Head *Expression followed by a Tail []*Expression.

The last of these was pure desperation; it shouldn't even be needed, since we aren't parsing something like a computer program, where there are "one or more" things to be parsed; the entire expression should be parseable into a tree with a single root node.
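For what it's worth, a head-plus-tail rule is compatible with a single root node: the flat `head (operator operand)*` list it captures can be folded left into one binary tree after parsing. Below is a minimal sketch with hypothetical `OpTerm` and `fold` names, rendering the tree as an S-expression string. Folding one flat list ignores precedence, so in a full grammar each precedence level (OR, then AND) would presumably get its own head/tail rule.

```go
package main

import "fmt"

// OpTerm is one hypothetical tail element: an operator followed by an operand.
type OpTerm struct {
	Op      string // "AND" or "OR"
	Operand string // a node identifier such as "bcde.2345"
}

// fold turns head (op operand)* into a single left-associative tree,
// rendered as an S-expression string for brevity.
func fold(head string, tail []OpTerm) string {
	tree := head
	for _, t := range tail {
		tree = fmt.Sprintf("(%s %s %s)", t.Op, tree, t.Operand)
	}
	return tree
}

func main() {
	fmt.Println(fold("abcd.1234", []OpTerm{{"AND", "bcde.2345"}, {"AND", "cdef.3456"}}))
	// (AND (AND abcd.1234 bcde.2345) cdef.3456)
}
```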

I'm sure that the solution is just a few keystrokes' worth of fixes, but after 2 days of trying, I'm out of ideas.

Finally, is it considered more idiomatic to create lexer tokens for keywords, as I've done above, or is it considered more idiomatic to simply quote the keywords in the grammar annotations?

Repository owner locked and limited conversation to collaborators Jun 18, 2023
@alecthomas alecthomas converted this issue into discussion #349 Jun 18, 2023
