ESQL: Categorize grouping function testing improvements#118013
ESQL: Categorize grouping function testing improvements#118013ivancea merged 7 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
| // TODO: randomize and try multiple pages. | ||
| // TODO: assert the state of the BlockHash after adding pages. Including the categorizer state. | ||
| // TODO: also test the lookup method and other stuff. | ||
| // TODO: randomize values? May give wrong results |
There was a problem hiding this comment.
I could see randomizing the values in a separate test and asserting that we get expected results/don't crash out.
alex-spies
left a comment
There was a problem hiding this comment.
I only reviewed the CSV test + QL changes. They look good to me, except for the capability requirement (see below).
Thanks Ivan!
| TRUE, // Whether the expression can become null | ||
| FALSE, // The expression can never become null | ||
| UNKNOWN // Cannot determine if the expression supports possible null folding | ||
| /** |
x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec
Outdated
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public Nullability nullable() { | ||
| // Both nulls and empty strings result in null values |
There was a problem hiding this comment.
++ for the comment, thanks!
There was a problem hiding this comment.
To be precisely: any string for which the analyzer returns zero tokens results in a null category. The analyzer discards tokens like numbers, hex.numbers, and stopwords like Jan, Feb, Mon, Tue, ...
💚 Backport successful
|
Added some extra tests on the CategorizeBlockHash. Added NullFold rule comments, and forced nullable() to TRUE on Categorize.
| } else if (e instanceof Alias == false | ||
| && e.nullable() == Nullability.TRUE | ||
| } else if (e instanceof Alias == false && e.nullable() == Nullability.TRUE | ||
| // Categorize function stays as a STATS grouping (It isn't moved to an early EVAL like other groupings), |
There was a problem hiding this comment.
The story of null folding is not complete (PropagateEvalFoldables has missing pieces), but I appreciate adding this comment here. I was kind of confused and surprised to see here this out-of-nowhere instanceof Categorize.
Added some extra tests on the CategorizeBlockHash.
More autogenerated/randomized testcases could be added, but given the complexity of the Categorize function, we would probably have to have some kind of whitelist of strings, and that would require some extra time that we won't be investing here yet.
So for now, this PR adds extra checks for all its methods, and some loops to check categories are unchanged.
I planned to remove the Categorize exception we have in NullFold rule here, but it's more complicated than expected.