Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default quoting needs to be smarter #515

Closed
seancorfield opened this issue Nov 17, 2023 · 8 comments
Closed

Default quoting needs to be smarter #515

seancorfield opened this issue Nov 17, 2023 · 8 comments
Assignees
Labels
needs analysis I need to think about this!

Comments

@seancorfield
Copy link
Owner

Derived from discussion around PR #514

I get this. The thought behind this PR was essentially building on top the existing default (don't quote anything), but honeysql will still quote based on this regex ^[A-Za-z0-9_]_+.

It only does that if :quoted nil. If you explicitly specify :quoted false it will turn quoting off. The behavior is:

  • :quoted true -- always quote using the dialect's stropping
  • :quoted nil -- default: only quote non-alphanumeric entities
  • :quoted false -- do not quote anything

I gather from your response that the problem you want to solve here is that:

  • "always quoting" makes things case-sensitive and you don't want that (because you're working from existing SQL that would fail if entities were quoted -- because it already doesn't match the case of tables/columns in the database)
  • and the existing "quote non-alphanumerics" isn't enough because you have columns in the existing SQL that are reserved words in your new SQL database (that presumably weren't in the old SQL database)
  • and you can't use the "never quote" approach either due to the reserved words issue

Is that an accurate expression of the problem?

(yes)

Part of the issue is that different dialects have different "must quote" rules. 2023_11_16 is perfectly acceptable to some databases, without quoting, but would require quoting for others. The same applies to the list of reserved words that would need to be quoted when used as column names.

A wrinkle in this is DDL that uses things like [:varchar :80] and expects that to produce VARCHAR(80) without quoting "80". So, either known DDL contexts must know to override some of the rules (*ddl-no-quote* perhaps?) or it's completely "caveat programmer" and if we let users provide arbitrary overriding rules, and it breaks their DDL, well then too bad, too sad?

@p-himik
Copy link
Sponsor Contributor

p-himik commented Nov 17, 2023

A wrinkle in this is DDL that uses things like [:varchar :80]

Why not [:varchar 80]?
Might be my personal thing but :80 looks icky - much more icky than :a.b even though both aren't explicitly allowed.

@seancorfield
Copy link
Owner Author

For good or bad, it's existing documented behavior. [:varchar 80] in any other context would become VARCHAR(?) with a parameter of 80 -- which doesn't fly for DDL.

So there are backward compatibility issues here in spades...

@kranstrom
Copy link

Would it be acceptable to "hardcode" the different data types that expect a numeric input? (e.g. char, varchar, char varying, number, numeric, etc.)

I know honeysql is meant to be flexible and there are many database providers out there that it's already compatible with (we are using several ourselves). However, it seems to be quite beneficial for the current function to separate between knowing whether it's assessing one of these data type examples and anything else that's a quotable identifier.

Happy to conjure up an example in a fork and link it if that's the case. Anyway I can be helpful here - let me know. Thanks

@seancorfield
Copy link
Owner Author

DDL is out of scope for this issue, so let's stay on topic. I'll just note that the "function-like" behavior in DDL just falls out of general function "syntax" in HoneySQL and is not explicitly programmed at all (because DDL is just too large and irregular a space across multiple databases). If you want to suggest changes to DDL generation, feel free to create a separate issue.

seancorfield added a commit that referenced this issue Nov 20, 2023
this is a partial solution, intended to catch (and quote) things like
`0abc` while not changing the behavior for `80` or `2023_11_20`
@seancorfield
Copy link
Owner Author

So far, I have updated the code to exclude identifiers like 0abc from the "OK not to quote" rule -- so they will now get quoted -- but 80 and 2023_11_16 will still be "OK not to quote" and be left as-is.

As far as addressing the concerns expressed above and in the PR, I'm still leaning toward an additional, optional regex flag that matches identifies that you want quoted even if they satisfy the existing "OK not to quote" rule. That will definitely come with some caveats: if you provide a regex that matches things like 80 then you'll "break" the DDL behavior mentioned above (at least until that gets changed).

@kranstrom
Copy link

Thanks for the work/input here Sean.

As far as addressing the concerns expressed above and in the PR, I'm still leaning toward an additional, optional regex flag that matches identifies that you want quoted even if they satisfy the existing "OK not to quote" rule.

So just so I understand, if I (for example) wanted to quote reserved words - I'd pass those in as regex? I do like that suggestion fwiw.

@seancorfield
Copy link
Owner Author

That is my current thinking, yes.

@seancorfield seancorfield self-assigned this Nov 25, 2023
@seancorfield seancorfield added the needs analysis I need to think about this! label Nov 25, 2023
seancorfield added a commit that referenced this issue Dec 4, 2023
@kranstrom
Copy link

Beautiful @seancorfield , thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs analysis I need to think about this!
Projects
None yet
Development

No branches or pull requests

3 participants