Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too long numeric hex literal silently converted to value 0 with an alias #7594

Open
mrotteveel opened this issue May 22, 2023 · 14 comments
Open

Comments

@mrotteveel
Copy link
Member

If a numeric hex literal is too long (more than 32 hex digits in Firebird 4.0 and 5.0, more than 16 hex digits in Firebird 3.0), it is silently converted to the value 0 with an alias equal to the remainder of the literal.

For example

select 0x12345678901234567890123456789012, 0x123456789012345678901234567890123 from rdb$database;
                                     CONSTANT X123456789012345678901234567890123
============================================= ==================================
       24197857199965561741520400062332047378                                  0

(first column is a valid INT128 literal in Firebird 4.0 and higher with 32 digits, while the second is invalid, with 33 digits)

This should raise an error instead.

@mrotteveel
Copy link
Member Author

mrotteveel commented May 22, 2023

NOTE: This seems to be a generic parser issue: for example select 0alias from rdb$database has the same problem. This should not be accepted either, but results in a column with alias alias and value 0.

For comparison, for select 0alias PostgreSQL raises:

ERROR:  trailing junk after numeric literal at or near "0a"
LINE 1: select 0alias

@AlexPeshkoff
Copy link
Member

I.e. it's definitely bug, not a feature?

@mrotteveel
Copy link
Member Author

The 0x123456789012345678901234567890123 should be handled as a single token, and then fail because it is too long for hex literals. Same goes for 0alias, it should be a single token which produces an error because it is not a valid literal or other type of expression (e.g. it is an invalid identifier, because regular identifiers cannot start with a number, nor is it a valid number literal).

@AlexPeshkoff AlexPeshkoff self-assigned this May 22, 2023
@AlexPeshkoff
Copy link
Member

OK, will take a look at it

@asfernandes
Copy link
Member

NOTE: This seems to be a generic parser issue: for example select 0alias from rdb$database has the same problem. This should not be accepted either, but results in a column with alias alias and value 0.

I fully disagree with you here.

It's so valid as is the expression 1+2. Spaces are not required.

The lexer consumes characters as the current token accepts them and stops when not.

For the record, select 0alias from dual is also valid in Oracle.

@mrotteveel
Copy link
Member Author

mrotteveel commented May 23, 2023

I fully disagree with you here.

It's so valid as is the expression 1+2. Spaces are not required.

The lexer consumes characters as the current token accepts them and stops when not.

In the expression 1+2, the + is a delimiter token and 1 and 2 are nondelimiter tokens, see 5.2 <token> and <separator> of SQL:2016-2, and the rule:

7) Any <token> may be followed by a <separator>. A <nondelimiter token> shall be followed by a <delimiter
token> or a <separator>.
NOTE 130 — If the Format does not allow a <nondelimiter token> to be followed by a <delimiter token>, then that <nondelimiter token> shall be followed by a <separator>.

Following those rules, things like 0x123456789012345678901234567890123 and 0alias are not allowed to be parsed as if they are 0 x123456789012345678901234567890123 or 0 alias, because neither 0 nor x123456789012345678901234567890123 or alias are delimiter tokens: they are nondelimiter tokens, so they must be separated by a separator (comment or whitespace), or a delimiter token, but that doesn't apply in this example.

For the record, select 0alias from dual is also valid in Oracle.

Then Oracle does it wrong too.

@asfernandes
Copy link
Member

asfernandes commented May 23, 2023

In the expression 1+2, the + is a delimiter token and 1 and 2 are nondelimiter tokens, see 5.2 <token> and <separator> of SQL:2016-2, and the rule:

  1. Any <token> may be followed by a <separator>. A <nondelimiter token> shall be followed by a <delimiter
    token> or a <separator>.
    NOTE 130 — If the Format does not allow a <nondelimiter token> to be followed by a <delimiter token>, then that <nondelimiter token> shall be followed by a <separator>.

Ok, but then choose to follow the standard for no good reason and break applications. I see no need for that.

The main issue could and should be fixed and actually this part has nothing to do with it.

@mrotteveel
Copy link
Member Author

Ok, but then choose to follow the standard for no good reason and break applications. I see no need for that.

I'm not sure what you mean with this. The rule I quote is a sane rule for a parser/tokenizer, and I suspect no one in their right mind would write something like select 0alias from atable and expect it not to produce an error.

The main issue could and should be fixed and actually this part has nothing to do with it.

I disagree. The rule I quote has everything to do with it: if the rule would be followed then even the current tokenization (which for 0x123456789012345678901234567890123 yields token 0 and token x123456789012345678901234567890123, and for 0alias yields token 0 and token alias) would have resulted in an error because these are nondelimiter tokens and thus are not allowed to be next to each other without a separator (comment or whitespace).

@asfernandes
Copy link
Member

I'm not sure what you mean with this. The rule I quote is a sane rule for a parser/tokenizer, and I suspect no one in their right mind would write something like select 0alias from atable and expect it not to produce an error.

I already received the same question by people, "why this works"? So, people write it.

For me it's nonsense like things 1+2+3 without spaces, and yes, people write it (a lot).

@aafemt
Copy link
Contributor

aafemt commented May 23, 2023

If you wanted to write an arithmetical nonsense you should use 1--2+-3 as the example.

@dyemanov
Copy link
Member

dyemanov commented May 23, 2023 via email

@mrotteveel
Copy link
Member Author

I'm not sure what you mean with this. The rule I quote is a sane rule for a parser/tokenizer, and I suspect no one in their right mind would write something like select 0alias from atable and expect it not to produce an error.

I already received the same question by people, "why this works"? So, people write it.

You're proving my point that they would expect it to produce an error.

For me it's nonsense like things 1+2+3 without spaces, and yes, people write it (a lot).

And following the rule I quoted that will continue to work, because + is a delimiter token, and thus is allowed to occur between two nondelimiter tokens without a separator.

@AlexPeshkoff
Copy link
Member

AlexPeshkoff commented May 23, 2023 via email

@asfernandes
Copy link
Member

Adriano, when you say that people asked you about it - may be they mistyped and were surprised that it works?

I don't know.

It may be someone reading an existing working code and surprised by it.

My point is that as it also works in Oracle, it's not so uncommon.

It's not standard compliant, but if we are going to change every piece that not follows the standard, our users will be in problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants