Look at whether a generated regex parser can do a better job than pugixml (performance-wise) #5

Open
mithro opened this issue Jul 2, 2019 · 7 comments

Comments

@mithro
Member

mithro commented Jul 2, 2019

pugixml is a generic XML parser. However, now that we are generating a parser, we should see if we can do better by creating one that only parses files in the exact XML formats given. Something like Google's re2 would be a good option for that.

@mithro
Member Author

mithro commented Jul 2, 2019

@duck2
Member

duck2 commented Jul 2, 2019

I don't think this would work. Even the fact that attributes may appear in any order will make the resulting regular expression explode combinatorially. Think of an element with 6 required attributes:

<element ((attr1="([\w]*)" attr2="([\w]*)" attr3="([\w]*)" attr4="([\w]*)" attr5="([\w]*)" attr6="([\w]*)")
|(attr1="([\w]*)" attr2="([\w]*)" attr3="([\w]*)" attr4="([\w]*)" attr6="([\w]*)" attr5="([\w]*)")
| [718 more permutations...]) >
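
The count above (2 orderings shown plus 718 more, 6! = 720 in total) can be checked with a short script. This is only a sketch in Python, which is not the language of the generated parser; `permutation_regex` is a hypothetical helper written for this illustration, and it assumes attributes are separated by exactly one space.

```python
import itertools
import re

def permutation_regex(element, attrs):
    """Build a regex accepting the attributes in every possible order.

    Returns the pattern and the number of alternatives it contains.
    """
    alternatives = []
    for order in itertools.permutations(attrs):
        # One alternative per ordering, attributes separated by one space.
        alternatives.append(" ".join(rf'{name}="(\w*)"' for name in order))
    return rf"<{element} (?:{'|'.join(alternatives)})>", len(alternatives)

attrs = [f"attr{i}" for i in range(1, 7)]
pattern, count = permutation_regex("element", attrs)
print(count)  # 720 alternatives for 6 required attributes
```

Each extra required attribute multiplies the number of alternatives by the new attribute count, so a seventh attribute would already need 5040.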

@mithro
Member Author

mithro commented Jul 2, 2019

Use an alternation and repeat the match X times. Something like:

<element (attr1="([\w]*)"|attr2="([\w]*)"|attr3="([\w]*)")*>

@duck2
Member

duck2 commented Jul 2, 2019

Yes, but then we won't be checking if all required attributes are present. State machines are really bad at handling independent inputs.
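
The limitation is easy to see by running the alternation pattern against a few tags. The sketch below uses Python's `re` module rather than re2, and the pattern is an assumed adaptation of the one proposed above: whitespace is required before each attribute so that realistic tags match.

```python
import re

# Assumed adaptation of the alternation-and-repeat pattern: each attribute
# must be preceded by whitespace, and any of the three may repeat freely.
TAG = re.compile(r'<element(?:\s+(?:attr1|attr2|attr3)="\w*")*\s*>')

any_order = TAG.fullmatch('<element attr3="c" attr1="a" attr2="b">')
missing = TAG.fullmatch('<element attr1="a">')
duplicate = TAG.fullmatch('<element attr1="a" attr1="b">')

print(bool(any_order))  # True: attribute order does not matter
print(bool(missing))    # True: attr2 and attr3 are absent, yet it matches
print(bool(duplicate))  # True: attr1 appears twice, yet it matches
```

The pattern alone cannot tell a complete tag from an incomplete one, because the repetition does not remember which alternatives it has already seen.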

@mithro
Member Author

mithro commented Jul 2, 2019

@duck2 - Can't that be done after the tag has been parsed?
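
One way to read this suggestion is as two passes: a permissive regex extracts the attributes, then ordinary code checks them. The sketch below is an illustration under that assumption, not the project's implementation; `parse_element`, `REQUIRED`, and the attribute names are hypothetical.

```python
import re

# Pass 1 patterns: accept any attributes in any order.
TAG = re.compile(r'<element((?:\s+\w+="\w*")*)\s*>')
ATTR = re.compile(r'\s+(\w+)="(\w*)"')

# Hypothetical set of required attributes for this element.
REQUIRED = {"attr1", "attr2", "attr3"}

def parse_element(text):
    """Parse an <element ...> tag, then validate its attributes."""
    m = TAG.fullmatch(text)
    if m is None:
        raise ValueError("not a well-formed <element> tag")
    attrs = {}
    # Pass 2: checks that a plain regex cannot express compactly.
    for name, value in ATTR.findall(m.group(1)):
        if name not in REQUIRED:
            raise ValueError(f"unexpected attribute: {name}")
        if name in attrs:
            raise ValueError(f"duplicate attribute: {name}")
        attrs[name] = value
    missing = REQUIRED - attrs.keys()
    if missing:
        raise ValueError(f"missing attributes: {sorted(missing)}")
    return attrs

print(parse_element('<element attr2="b" attr1="a" attr3="c">'))
```

The regex stays linear in the number of attributes, and the presence check is a single set difference once the tag has been consumed.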

@duck2
Member

duck2 commented Jul 2, 2019

I need to think about this more. Maybe we could make something like an opinionated SAX parser out of this, whose output can be fed to general-purpose validators.
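
A rough idea of what such a SAX-style split could look like: a regex-driven tokenizer emits start and end events, and a separate consumer validates the event stream. This is a speculative sketch in Python, not a design from the thread; `events` and `check_nesting` are hypothetical names, and the tokenizer ignores text content, comments, and entities entirely.

```python
import re

# Matches <name attr="v" ...>, </name>, and <name ... />.
TOKEN = re.compile(r'<(/?)(\w+)((?:\s+\w+="\w*")*)\s*(/?)>')
ATTR = re.compile(r'(\w+)="(\w*)"')

def events(text):
    """Yield (kind, name, attrs) tuples in document order."""
    for m in TOKEN.finditer(text):
        closing, name, raw_attrs, self_closing = m.groups()
        if closing:
            yield ("end", name, {})
            continue
        # Note: dict() keeps only the last value of a repeated attribute.
        yield ("start", name, dict(ATTR.findall(raw_attrs)))
        if self_closing:
            yield ("end", name, {})

def check_nesting(stream):
    """Example downstream validator: every start has a matching end."""
    stack = []
    for kind, name, _ in stream:
        if kind == "start":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack

print(check_nesting(events('<a x="1"><b/></a>')))
```

The tokenizer knows nothing about which attributes or children are required; those checks would live in consumers of the event stream, in the same way `check_nesting` does here.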

@mithro
Member Author

mithro commented Jul 2, 2019

@duck2 Notice how I separated some final validation from the parsing in the example here -> #1 (comment)


2 participants