Parsing BNF? #15

Closed
mperham opened this issue Nov 24, 2024 · 7 comments

Comments

@mperham

mperham commented Nov 24, 2024

I'm trying to parse the BNF/EBNF found here:

https://gist.githubusercontent.com/springcomp/e72d09e3a8f06e8d711751c3d1ee160e/raw/c58887a2f6c6f41de7a66a472e7c708fa6f64429/JSON.ebnf

It appears to parse fine with a BNF Playground website I found, but I get the following error when I use the ebnf gem:

```ruby
require 'ebnf'

grammar = EBNF.parse(File.open("./json.ebnf"))
puts grammar.to_s
```

```
/Users/mperham/.gem/ruby/3.3.5/gems/ebnf-2.5.0/lib/ebnf/parser.rb:304:in `rescue in initialize': ERROR [line: 1] syntax error, expecting "@terminals", "@pass", :LHS (found "<quoted_string> ::= ") (SyntaxError)
	from /Users/mperham/.gem/ruby/3.3.5/gems/ebnf-2.5.0/lib/ebnf/parser.rb:267:in `initialize'
	from /Users/mperham/.gem/ruby/3.3.5/gems/ebnf-2.5.0/lib/ebnf/base.rb:125:in `new'
	from /Users/mperham/.gem/ruby/3.3.5/gems/ebnf-2.5.0/lib/ebnf/base.rb:125:in `initialize'
	from /Users/mperham/.gem/ruby/3.3.5/gems/ebnf-2.5.0/lib/ebnf.rb:29:in `new'
	from /Users/mperham/.gem/ruby/3.3.5/gems/ebnf-2.5.0/lib/ebnf.rb:29:in `parse'
	from lib/jsonebnf.rb:3:in `<main>'
```

Any tips?

@gkellogg
Collaborator

Your EBNF uses angle brackets around production symbols (e.g., <quoted_string> vs. quoted_string). The EBNF gem generally adheres to the EBNF grammar as defined in XML 1.0, which does not use angle brackets. However, the gem also supports other features not in the "official" definition of EBNF, and I've seen this pattern before, so supporting it should not be difficult. (You can see our grammar for EBNF here.) Supporting this would require updating the SYMBOL production, which could be defined as follows:

```
SYMBOL         ::= ('<' SYMBOL_RAW '>') | SYMBOL_RAW
SYMBOL_RAW     ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
```
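As a rough illustration of what the proposed SYMBOL production accepts, here is a minimal Ruby sketch that mirrors it with a regular expression. The constant names and the `symbol_name` helper are hypothetical and are not part of the ebnf gem, whose parser is driven by its own EBNF grammar rather than by a regex like this.

```ruby
# Hypothetical sketch, not the ebnf gem's implementation.
# SYMBOL_RAW ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
SYMBOL_RAW = /[A-Za-z0-9_.]+/

# SYMBOL ::= ('<' SYMBOL_RAW '>') | SYMBOL_RAW, anchored to the whole token.
SYMBOL = /\A(?:<(#{SYMBOL_RAW.source})>|(#{SYMBOL_RAW.source}))\z/

# Returns the bare symbol name, or nil if the token is not a SYMBOL.
def symbol_name(token)
  m = SYMBOL.match(token)
  m && (m[1] || m[2])
end

p symbol_name("<quoted_string>") # bracketed form
p symbol_name("quoted_string")   # plain form
p symbol_name("<quoted_string")  # unbalanced bracket, not a SYMBOL
```

Both spellings reduce to the same name, which matches how the parsed grammar later in this thread refers to `quoted_string` rather than `<quoted_string>`.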

(The LHS would need some minor updates for the line-number/identifier component, but I'm planning on changing that to a pre-processing filter, as it creates ambiguities in the grammar).
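One way such a pre-processing filter could look, sketched in Ruby under the assumption that rule identifiers are written as `[1]` at the very start of a line: strip the bracketed identifier only when a rule name and `::=` follow on the same line, so a continuation line that begins with a character class such as `[a-z]` is left alone. The `RULE_ID` regex and `strip_rule_ids` helper are hypothetical, not the gem's planned implementation.

```ruby
# Hypothetical pre-processing filter (not the ebnf gem's code).
# Matches "[id] " at the start of a line only when a rule name, optionally
# wrapped in angle brackets, and "::=" follow on the same line.
RULE_ID = /^\[[^\]\s]+\][ \t]+(?=<?[A-Za-z0-9_.]+>?[ \t]*::=)/

# Removes leading rule identifiers such as "[1] " and returns the new text.
def strip_rule_ids(source)
  source.gsub(RULE_ID, "")
end

input = <<~EBNF
  [1] grammar ::= rule+
  [2] <rule>  ::= LHS expression
  [a-z] | '_'
EBNF
puts strip_rule_ids(input)
```

The lookahead is what resolves the ambiguity: `[2] <rule> ::=` loses its identifier, while `[a-z] | '_'` is kept because `|` cannot start a rule name.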

For reference, do you have a source for an EBNF variant that puts angle brackets around the symbols?

@mperham
Author

mperham commented Nov 25, 2024

I don't. My guess is that it's BNF that people are trying to parse as EBNF. Does that make sense?

@gkellogg
Collaborator

The grammar uses too many extended features to be straight BNF (e.g., both * and +), but I have seen this form used in EBNF before, so it's worth supporting. I should be able to get out a release that includes this without too much trouble. (Famous last words.)

@gkellogg
Collaborator

I'm running down a couple more errors related to these changes, but there are some issues with the EBNF example in any case:

<escaped_char> uses "\"", which is not part of EBNF; EBNF has no string escapes. Instead it should be '"'.

<unescaped_char> and <unescaped_literal> contain a character that looks like a space separating terminals but isn't an ASCII space character (just before | "$").

<escape> should just be <escape> ::= "\"

<quote> should just be <quote> ::= '"'
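Problems like these are easy to miss by eye. Below is a small Ruby sketch, assuming the grammar file is read as UTF-8, that flags non-ASCII characters (such as a look-alike space) and the `"\""` pattern, reporting line and column. The `lint_grammar` helper is hypothetical and is not part of the ebnf gem; it only reports the first `"\""` on each line.

```ruby
# Hypothetical lint helper, not part of the ebnf gem.
# Returns [line, column, message] triples for suspicious characters.
def lint_grammar(source)
  issues = []
  source.each_line.with_index(1) do |line, lineno|
    # Non-ASCII characters, e.g. a no-break space that looks like " ".
    line.each_char.with_index(1) do |ch, col|
      next if ch.ascii_only?
      issues << [lineno, col, format("non-ASCII character U+%04X", ch.ord)]
    end
    # EBNF strings have no escapes, so "\"" is the string "\" plus a stray quote.
    col = line.index('"\\""')
    issues << [lineno, col + 1, 'backslash escape "\\"" (EBNF has no escapes)'] if col
  end
  issues
end

# Usage: report problems in a grammar file (path is illustrative).
# lint_grammar(File.read("json.ebnf", encoding: "UTF-8")).each do |l, c, msg|
#   puts "#{l}:#{c}: #{msg}"
# end
```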

The resulting parsed grammar is the following:

```
(
 (rule quoted_string (seq quote (plus (alt unescaped_char escaped_char)) quote))
 (rule escaped_char (seq escape (alt '"' "/" "b" "f" "n" "r" "t" unicode escape)))
 (rule escaped_literal (alt escaped_char (seq escape "`")))
 (rule unescaped_char
  (alt digit letter " " "!" "#" "$" "%" "&" "'" "(" ")" "*+" "," "-" "." "/"
   ":" ";" "<" ">" "?" "@" "[" "]" "^" "_" "`" "{" "|" "}" "~" ))
 (rule unescaped_literal
  (alt digit letter " " "!" "#" "$" "%" "&" "'" "(" ")" "*+" "," "-" "." "/"
   ":" ";" "<" ">" "?" "@" "[" "]" "^" "_" "{" "|" "}" "~" ))
 (rule unicode (seq "u" digit digit digit digit))
 (rule escape (seq "\\"))
 (terminal digit (range "0-9"))
 (rule letter (alt (range "A-Z") (range "a-z") "_"))
 (rule quote (seq '"'))
 (rule json_value
  (alt json_array json_boolean json_null json_number json_object json_string))
 (rule json_null (seq "null"))
 (rule json_boolean (alt "true" "false"))
 (rule json_number
  (seq
   (opt "-")
   (alt "0" (seq (range "1-9") (star (range "0-9"))))
   (opt (seq "." (plus (range "0-9"))))
   (opt (seq "e" (alt "-" "+") (plus (range "0-9"))))) )
 (rule json_array
  (seq ws "[" (opt (seq ws json_value ws (star (seq "," ws json_value ws)))) "]" ws))
 (rule json_object
  (seq ws "{" ws (opt (seq member ws (star (seq "," ws member ws)))) "}" ws))
 (rule json_string (seq quote (star (alt unescaped_literal escaped_literal)) quote))
 (rule member (seq quoted_string ws ":" ws json_value))
 (rule ws (star " ")))
```
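To read one of these rules concretely, the `json_number` S-expression above can be hand-translated into a Ruby regular expression. This is an illustration only, not output of the gem. Note that, as written, this grammar accepts only a lowercase `e` and requires a sign after it, so `1e3` is rejected even though JSON itself allows it.

```ruby
# Hand translation of the parsed json_number rule (illustrative only):
#   (opt "-")                                     -> -?
#   (alt "0" (seq (range "1-9") (star ...)))      -> (?:0|[1-9][0-9]*)
#   (opt (seq "." (plus (range "0-9"))))          -> (?:\.[0-9]+)?
#   (opt (seq "e" (alt "-" "+") (plus ...)))      -> (?:e[-+][0-9]+)?
JSON_NUMBER = /\A-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:e[-+][0-9]+)?\z/

%w[0 -12.5 3.0e+8 01 1. 1e3].each do |s|
  puts "#{s}: #{JSON_NUMBER.match?(s)}"
end
```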

I'll have an update to the development branch tomorrow, and I'll ask you to check it yourself before I release the gem.

@gkellogg
Collaborator

Please test the version of the gem on the develop branch to see whether it solves your problem, and then I'll release an update to the gem.

(BTW, you're correct that the original BNF grammar used angle brackets around rule symbols, which EBNF does not. So, adding this capability provides greater compatibility. However, there is currently no provision for using this when serializing a grammar back to EBNF).

@mperham
Author

mperham commented Nov 27, 2024

Sorry, I don't have an easy way to test this right now; I went in a different direction because of this blocker, as I didn't want to rely on a quick fix from you. You can add that linked .ebnf file as a test case if you wish.

@gkellogg
Collaborator

I have used it as a test case, but note the errors I reported in that file.

Development

No branches or pull requests

2 participants