[idl_parser] Add kTokenNumericConstant token #6432

vglavnyy · 2021-01-31T05:42:00Z

This commit adds a new token so that signed numeric constants are parsed correctly.
Previously, the expressions -nan and -inf were treated as kTokenStringConstant, which was ambiguous when an actual string field was being parsed.
For example, { "text_field" : -name } was accepted by the parser as a valid JSON object.

Related oss-fuzz issue: 6200301176619008

aardappel · 2021-02-01T20:06:40Z

src/idl_parser.cpp

  TD(IntegerConstant, 258, "integer constant") \
  TD(FloatConstant, 259, "float constant") \
-  TD(Identifier, 260, "identifier")
+  TD(NumericConstant, 260, "nan, inf or function name (signed)") \


This will cause someone writing a schema field like inf:string to get a pretty confusing error? If they intended to use inf as short for information or whatever :)

Might it be better to keep it as Identifier and explicitly recognize the few identifiers we care about only when parsing values (not while parsing field names)?

vglavnyy · 2021-02-27

This will cause someone writing a schema field like inf:string to get a pretty confusing error?

No. In this case inf will be an identifier. If someone writes +inf, the parser will return an error saying that it expects an Identifier token. This case has been added to the tests.

I tried to find another solution but couldn't.
ParseAnyValue is recursive, so it cannot tell whether the part currently being parsed is an identifier or a value.
I tried to solve this with flags on the assignment expressions (= for schemas and : for JSON), but it looks terrible.
Do you have an idea of where these keywords could be matched as values?

aardappel · 2021-03-01

I guess what might lead to slightly more readable code is to split up the float and function-name cases, i.e. +-inf/nan become kTokenFloatConstant (or kTokenSpecialFloat if we need to be able to distinguish), whereas -sin(.. actually gets returned as the separate tokens - followed by kTokenIdentifier?
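The split suggested here can be illustrated with a small standalone tokenizer. This is only a sketch under stated assumptions: the `Tok` enum, `Token` struct, and the `Tokenize`, `ReadWord`, and `IsSpecialFloatWord` helpers are hypothetical names invented for the illustration and are not part of idl_parser.cpp. It shows a sign followed by inf or nan becoming one special-float token, while a sign followed by any other word is emitted as a separate sign token and an identifier.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; not the real flatbuffers tokens.
enum class Tok { Minus, Plus, SpecialFloat, Identifier, Other };

struct Token {
  Tok kind;
  std::string text;
};

// Only the two words discussed in this thread are treated as special floats.
static bool IsSpecialFloatWord(const std::string &w) {
  return w == "inf" || w == "nan";
}

// Reads [A-Za-z_][A-Za-z0-9_]* starting at pos; returns "" if none starts there.
static std::string ReadWord(const std::string &s, std::size_t pos) {
  if (pos >= s.size()) return "";
  const unsigned char first = static_cast<unsigned char>(s[pos]);
  if (!(std::isalpha(first) || first == '_')) return "";
  std::size_t end = pos + 1;
  while (end < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[end]);
    if (!(std::isalnum(c) || c == '_')) break;
    ++end;
  }
  return s.substr(pos, end - pos);
}

std::vector<Token> Tokenize(const std::string &s) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (std::isspace(c)) { ++i; continue; }
    if (c == '+' || c == '-') {
      const std::string word = ReadWord(s, i + 1);
      if (IsSpecialFloatWord(word)) {
        // Sign and word are fused into a single special-float token.
        out.push_back({Tok::SpecialFloat, s.substr(i, 1 + word.size())});
        i += 1 + word.size();
      } else {
        // Any other word is left for the next iteration as an identifier.
        out.push_back({c == '-' ? Tok::Minus : Tok::Plus, std::string(1, s[i])});
        ++i;
      }
      continue;
    }
    const std::string word = ReadWord(s, i);
    if (!word.empty()) {
      // An unsigned inf or nan stays a plain identifier (e.g. a field name).
      out.push_back({Tok::Identifier, word});
      i += word.size();
      continue;
    }
    out.push_back({Tok::Other, std::string(1, s[i])});
    ++i;
  }
  return out;
}
```

With this sketch, `Tokenize("-inf")` yields a single special-float token, `Tokenize("-sin(")` yields a minus token, the identifier sin, and one other token, and `Tokenize("inf")` yields a plain identifier, so a schema field named inf is unaffected.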

vglavnyy · 2021-03-06

I removed the newly introduced kTokenNumericConstant.
All processing now happens inside Parser::ParseSingleValue(). It looks much better and passes all tests.
Could you review this new solution?
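As a rough illustration of recognizing the special words only on the value side, the sketch below decides from the expected field type whether an unquoted, optionally signed word is acceptable. It is not the code in Parser::ParseSingleValue(): `FieldKind`, `ParseSignedSpecialFloat`, and `AcceptUnquotedValue` are hypothetical names, and the rule that string fields never accept an unquoted word is a simplifying assumption made for the example.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Hypothetical field kinds; the real parser distinguishes many more types.
enum class FieldKind { String, Float };

// Interprets an optionally signed "inf" or "nan"; returns false for any other text.
bool ParseSignedSpecialFloat(const std::string &text, double *out) {
  if (text.empty()) return false;
  double sign = 1.0;
  std::string word = text;
  if (text[0] == '+' || text[0] == '-') {
    sign = (text[0] == '-') ? -1.0 : 1.0;
    word = text.substr(1);
  }
  if (word == "inf") {
    *out = sign * std::numeric_limits<double>::infinity();
    return true;
  }
  if (word == "nan") {
    *out = std::copysign(std::numeric_limits<double>::quiet_NaN(), sign);
    return true;
  }
  return false;
}

// Assumption for this sketch: a string field never accepts an unquoted word,
// and a float field accepts only the signed special words handled above.
bool AcceptUnquotedValue(FieldKind kind, const std::string &text) {
  if (kind == FieldKind::String) return false;
  double ignored = 0.0;
  return ParseSignedSpecialFloat(text, &ignored);
}
```

Under these assumptions, `AcceptUnquotedValue(FieldKind::String, "-name")` is rejected, which corresponds to the `{ "text_field" : -name }` case from the description, while `AcceptUnquotedValue(FieldKind::Float, "-inf")` is accepted.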

aardappel · 2021-03-11T19:12:02Z

Thanks :)

github-actions bot added the c++ and codegen labels Jan 31, 2021

vglavnyy requested a review from aardappel January 31, 2021 05:42

aardappel reviewed Feb 1, 2021

vglavnyy force-pushed the add_numeric_token branch 2 times, most recently from 04afd53 to 0ad06bb March 6, 2021 09:10

vglavnyy added 5 commits March 6, 2021 16:41

Add additional positive tests fo 'inf' and 'nan' as identifiers

8c0890a

Rebase to HEAD

20263ce

Move processing of signed constants to ParseSingleValue method.

ffc8c36

Add missed --cpp-static-reflection (google#6324) to pass CI

f375acc

vglavnyy force-pushed the add_numeric_token branch from 0ad06bb to f375acc March 6, 2021 09:50

Remove flatbuffers.pc from repository to unblock CI (google#6455).

edce648

Probably the generated flatbuffers.pc should not be a part of repo.

vglavnyy force-pushed the add_numeric_token branch from 6486d96 to edce648 March 6, 2021 10:40

Fix FieldIdentifierTest()

aeee696

aardappel reviewed Mar 8, 2021

aardappel merged commit 0e453ac into google:master Mar 11, 2021

vglavnyy deleted the add_numeric_token branch June 19, 2021 10:40