Skip to content

Conversation

@vglavnyy
Copy link
Contributor

This commit adds the new token for correct parsing of signed numeric constants.
Before this expressions -nan or -inf were treated as kTokenStringConstant. This was ambiguous if a real string field parsed.
For example, { "text_field" : -name } was accepted by the parser as valid JSON object.

Related oss-fuzz issue: 6200301176619008

@github-actions github-actions bot added c++ codegen Involving generating code from schema labels Jan 31, 2021
@vglavnyy vglavnyy requested a review from aardappel January 31, 2021 05:42
TD(IntegerConstant, 258, "integer constant") \
TD(FloatConstant, 259, "float constant") \
TD(Identifier, 260, "identifier")
TD(NumericConstant, 260, "nan, inf or function name (signed)") \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause someone writing a schema field like inf:string to get a pretty confusing error? If they intended to use inf as short for information or whatever :)

Might it be better to keep it as Identifier and explicitly recognize the few identifiers we care about only when parsing values (not while parsing field names)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause someone writing a schema field like inf:string to get a pretty confusing error?

No. In this case inf will be the identifier. If someone writes +inf the parser will return an error that it expects Identifier token. This case added into tests.

I tried to find another solution but couldn't.
Method ParseAnyValue is recursive so it is impossible to detect which part is now parsed, identifier or value.
I tried to solve it with flags on the assignment expressions (= for scheme and : for JSON) but it looks terrible.
Maybe you have an idea where these keywords can be matched as values?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what might lead to slightly more readable code is to split up the float and function name cases, i.e. +-inf/nan become kTokenFloatConstant (or kTokenSpecialFloat if we need to be able to distinguish), whereas -sin(.. actually gets returned as the separate tokens - followed by kTokenIdentifier ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the introduced kTokenNumericConstant.
Now all processing is located inside the Parser::ParseSingleValue(). It looks much better and it passed all tests.
Could you review this new solution?

@vglavnyy vglavnyy force-pushed the add_numeric_token branch 2 times, most recently from 04afd53 to 0ad06bb Compare March 6, 2021 09:10
vglavnyy added 5 commits March 6, 2021 16:41
This commit adds the new token for correct parsing of signed numeric constants.
Before this expressions `-nan` or `-inf` were treated as kTokenStringConstant.
This was ambiguous if a real string field parsed.
For example, `{ "text_field" : -name }` was accepted by the parser as valid JSON object.

Related oss-fuzz issue: 6200301176619008
@vglavnyy vglavnyy force-pushed the add_numeric_token branch from 0ad06bb to f375acc Compare March 6, 2021 09:50
Probably the generated flatbuffers.pc should not be a part of repo.
@vglavnyy vglavnyy force-pushed the add_numeric_token branch from 6486d96 to edce648 Compare March 6, 2021 10:40
@aardappel
Copy link
Collaborator

Thanks :)

@aardappel aardappel merged commit 0e453ac into google:master Mar 11, 2021
@vglavnyy vglavnyy deleted the add_numeric_token branch June 19, 2021 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ codegen Involving generating code from schema

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants