Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate length of query parameter values #3505

Open
nPraml opened this issue Oct 22, 2024 · 5 comments
Open

Validate length of query parameter values #3505

nPraml opened this issue Oct 22, 2024 · 5 comments

Comments

@nPraml
Copy link
Contributor

nPraml commented Oct 22, 2024

Hello @rbygrave,

we have a search function in our application where users can search by different parameters, for example they search for German accounts (search by the two-digit country code).
The user occasionally enters texts that are too long (example: 3 letters for the two-digit country code). The application creates an ebean query with the entered search parameters and the database responds with an exception in some cases.

The following query comes out: select * from my_table where country_code = ? -- bind 'abc' but CountryCode would be a varchar(2)

For the above equals query, DB2 throws an error if the search parameter is longer than dbColumn length:
Caused by: com.ibm.db2.jcc.am.SqlDataException: DB2 SQL Error: SQLCODE=-302, SQLSTATE=22001, SQLERRMC=null, DRIVER=4.33.31

On the other hand, MariaDb executes the query without any problems and of course does not return any result (as if the where clause had 1=0).

The same exception also happens for in-queries in DB2, but not with like, contains, startsWith, e.g. country_code like 'DEE' is successfully processed by DB2 and does not return any result as expected.

I focused my experiments mainly on DB2 and found the following:

  • if I execute a raw query from ebean (or in DBeaver) with a value that is too long, the query is executed successfully without an exception: DB.findDto(Customer.class, "select t0.id, t0.name, t0.status from o_customer t0 where t0.name < '" + LONG_SEARCH_VALUE + "'").findList()
  • The raw query no longer works with parameters either: DB.findDto(Customer.class, "select t0.id, t0.name, t0.status from o_customer t0 where t0.name < :param").setParameter("param", LONG_SEARCH_VALUE).findList() An exception also flies here.

--> the database can handle long values ​​in some cases, but the driver cannot handle parameters that are too long in prepared queries.
I haven't found an option as to how or whether you could turn off this check in the DB2 driver.

What is not yet clear is what should happen be gt/ge/lt/le operations (e.g. country_code < 'DEE')?
DB2 raw query (without parameters) actually returns all country codes that appear alphabetically before DE.
With prepared statement and parameters you get an exception.

Our first solution suggestion was that we simplify the queries in such a way (simplify call) that impossible where-conditions are replaced by 1=0 or always true conditions are replaced by 1=1.
e.g. where country_code = 'DEE' --> where 1=0 i.e. the query returns no result (and the DB driver no longer returns an exception).
The eq and in operations can be changed easily, but what about ge/gt/le/lt, how should these queries look simplified?
We have pushed the first draft of the proposal here: https://github.com/FOCONIS/ebean/pull/111/files, but it is not really correct yet as some operations cannot be clearly simplified.

Our second idea was that will validate the queries: not only whether the properties exist (unknownProperties()), but also whether the values ​​correspond to the column length (tooLongProperties()).
We started a PR with this idea too: https://github.com/FOCONIS/ebean/pull/113/files
If you issue an invalid query with parameters that are too long, a specific exception would fly in ebean that this is not allowed and the application would behave the same for every database:
Catch this specific exception and display a meaningful error message to the user.

For the use case described above (parameter values ​​that are too long), we would need a generic solution that works the same for all databases. (since our automated tests run against h2, and it behaves differently than the other platforms)
Can you give us feedback on what you think of our ideas or how the problem should be addressed?

Kind regards
Noemi

@rbygrave
Copy link
Member

DB2 throws an error ... On the other hand, MariaDb executes the query without any problems and of course does not return any result

IMO the DB2 behaviour is unexpected and undesirable.

I understand the desire to "normalise" the behaviour across the different databases but I really feel like I don't want Ebean to own extra complexity just because of undesirable DB2 specific behaviour [and I do believe this is specific to DB2].

The alternative is that the user input needs to be sanitised/validated/normalised by the application before it uses the input to execute the query.

My gut does not like the idea of auto-magically modifying the "longer than db column length" into a 1=0 type false expression for this case. My desire would be to put the onus on the application to validate the input, and invalid input ideally doesn't execute ANY query to the database at all.

Cheers, Rob.

@rPraml
Copy link
Contributor

rPraml commented Oct 24, 2024

Hello @rbygrave,
I discussed various solutions with @nPraml

My gut does not like the idea of auto-magically modifying the "longer than db column length" into a 1=0 type false expression for this case.

I am also unhappy with this idea, as it is not always possible (and adds extra complexity to ebean)

My desire would be to put the onus on the application to validate the input, and invalid input ideally doesn't execute ANY query to the database at all.

I agree with you, but our problem is that it would be extremely complex to get the length information into the UI.

All information would be available in Ebean, so our second idea would have been to extend Ebean's validation API so that it not only checks the field names, but also the field lengths (and possibly also the data types)

If you are not completely against it, we would make a suggestion and extend Set<String> query.validate() (or write a validateExtended()) which, in addition to the missing properties, also complains about properties that are either too long or the data type does not match:

query = DB.find(Customer.java).where().eq("nameFoo", "Roland").query()
query.validate() // reports "nameFoo" does not exist

query = DB.find(Customer.java).where().eq("name",  veryLongString).query()
query.validate() // reports "name": value too long

query = DB.find(Tenant.java).where().eq("id",  "406d0e9e-3ebb-45e2-ae90-7f7f0e258132").query()
query.validate() // reports "id": type of value (string) does not map to property type (uuid)
// In the last case, we often had the problem that our tests running under H2 worked because
// H2 performed an implicit conversion but did not produce a result under MariaDB / DB2 
// where byte(16) is used as the data type for UUIDs.

We can then easily provide the validation result to the UI and it would help us immensely in our use case.
This solution should not add too much complexity to ebean. It also would not intercept on normal query execution, but only when calling query.validate explicit.

Roland

@jnehlmeier
Copy link

I agree with you, but our problem is that it would be extremely complex to get the length information into the UI.

Why into the UI? Sure it would be preferrable but it would be enough to have it on server side at the location that would construct and call the query. So typically some form of DAO method.

I don't really see a difference between

  1. The developer writes a query and thinks "oh wait, that is user input. I should probably call query.validate()".
  2. The developer writes a query and thinks "oh wait, that is user input. I should probably validate the user provided values before setting query parameters".

Your server side application should also have all information to validate user input because the application has defined its database schema.

Side note: I don't really see a use case for current querybuilder.validate() method. If code builds queries using invalid properties or path expressions, then the code is wrong (well or the server code is correct but the UI isn't great if the UI is some query builder itself). Ebean should simply throw exceptions if the query is messed up and cannot be translated to SQL and remove validate().

If an ORM has a validate method who defines what to validate? Entity properties? Path expressions? Field lengths? Value ranges? Regex on values? Allowed values for enumerations? Combination of fields / validation groups? SQL syntax for native sql queries? SQL column and table names in native sql queries? ..., ...

JPA also validates only the bare minimum itself:

  • JPQL syntax
  • Entity fields used in JPQL so it can translate it to SQL

Any application specific validation is delegated to Bean Validation API. If you write native SQL in JPA then it only relies on JDBC / DB validation.

@rPraml
Copy link
Contributor

rPraml commented Oct 25, 2024

I agree with you, but our problem is that it would be extremely complex to get the length information into the UI.

Why into the UI? Sure it would be preferrable but it would be enough to have it on server side at the location that would construct and call the query. So typically some form of DAO method.

I have to go into a bit of detail here. We (unfortunately) use database systems where "dbstring.length == javastring.length" does not apply. DB2 in particular is a spoilsport here, as 4 €€€€ characters do not fit into a varchar(10) column, for example, because DB2 counts UTF8 bytes. I know, there are ways to change this with charset/collation etc., but here, it means, that a varchar(10) can take 10 bytes and not 10 UTF-8 chars.

We had to extend the BeanValidation so that it can validate correctly for such databases. There are also some tweaks in ebean. e.g. #3341.

We have a UI, where you can add SQL/EQL like syntax, where we effectively build a ebean query with all bind parameters (and we, of course have also simple filters, like in excel, where the end user just can type in the value he wants to filter, but they are also translated in such a query). It would be great, if we can validate the query (especially the bind values - with the correct utf8 length check, if neccessary), as these informations are easy avaliable in ebean. And we want/have to do that validation before we execute the query and show the user, that the query he has entered does not make sense.
So just call query.validate would help us a lot here.

Otherwise we would have to implement the complete checking mechanism a third time in the filter UI, more precisely in the associated backend.

I don't really see a difference between

1. The developer writes a query and thinks "oh wait, that is user input. I should probably call query.validate()".

2. The developer writes a query and thinks "oh wait, that is user input. I should probably validate the user provided values before setting query parameters".

that's the point. We get queries like customer.order.country = "germany" and customer.name starts with "Rol". We convert them into an AST (abstract syntax tree) and then we build the query DB.find(Customer.class).where().eq("order.country", "germany").startsWith("name", "Rol")
Yes, we can also validate the AST before we build the query, retrieve length information for every el-path, check if we have to count characters or bytes...

Your server side application should also have all information to validate user input because the application has defined its database schema.

The schema is implicit defined with all the @Size and @Column annotations. But it is cumbersome to get them. I can either use BeanValidation, that does the job for me, but this does not support el-paths the same way as ebean does in queries, or use some ebeaninternal API (beanDescriptor etc.) where the information is stored ... or write the third way, that scans all annotations in the entity classes with reflection.

Side note: I don't really see a use case for current querybuilder.validate() method. If code builds queries using invalid properties or path expressions, then the code is wrong (well or the server code is correct but the UI isn't great if the UI is some query builder itself). Ebean should simply throw exceptions if the query is messed up and cannot be translated to SQL and remove validate().

For wrong properties, I would partially agree, as there is a public API how to validate. What I am missing is a public API where I can check if the length matches to a given property.

I will think again over the weekend about how I can implement a sustainable solution here. It might be enough if Ebean provides a simpler API than query.validate.

Roland

@jnehlmeier
Copy link

Yes, we can also validate the AST before we build the query, retrieve length information for every el-path, check if we have to count characters or bytes...

IMHO that is the way to go and visitors are a great tool to do it. Once you know your AST is correct and valid then follow up code might become easier as well and more importantly all validation does not depend on ebean anymore and can be tested independently.

Personally I would probably even go further and think about a tool that either uses all entities or the database schema as source of truth and generates the grammar for your lexer/parser library. The generated grammar can be pretty specific to the entities/database so that the parser already fails parsing customer.order.country if country does not exist on order in the source of truth. If the grammar is more generic to only capture the pure syntax then AST visitors would need to do more validation work instead.

Such a tool might also be able to generate AST visitors to validate user provided values because during the generation process it should know all the details (reflection if entities are chosen or database metadata otherwise).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants