ESQL: Enable pushing down LOOKUP JOIN past Project #127776
alex-spies wants to merge 6 commits into elastic:main from
List<ValuesSourceReaderOperator.FieldInfo> fields = new ArrayList<>(extractFields.size());
for (NamedExpression extractField : extractFields) {
    String physicalName = extractField instanceof FieldAttribute fa ? fa.fieldName()
        : extractField instanceof Alias a ? ((NamedExpression) a.child()).name()
Needs a comment: the alias and reference attribute cases are only relevant for ENRICH.
Needs a bunch of additional tests, plus updated expectations for the existing tests in here.
// TODO: This probably also led to bugs for LOOKUP JOIN on a union typed field, let's add a test.
this(match.exactAttribute().fieldName(), input.channel(), input.type());
The diff touches multiple places that should have used field names but used attribute names instead.
To make this PR cleaner, I think we should have a separate PR with just these fixes and corresponding tests. This should also address #127521.
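A toy illustration of why the two names must not be mixed up: once an attribute has been renamed (as is already done for union-typed fields), resolving the physical field through the attribute name fails, while resolving it through the field name still works. FieldAttr and LookupLoadSketch are hypothetical stand-ins, loosely modeled on FieldAttribute#name and FieldAttribute#fieldName, and the lookup index is simplified to a map from physical field name to values.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a field attribute: `name` is the (possibly arbitrary) attribute
// name used in the plan, `fieldName` is the physical field it reads from the index.
record FieldAttr(String name, String fieldName) {}

final class LookupLoadSketch {
    // Buggy: resolves the physical field via the attribute name.
    static List<String> loadByName(Map<String, List<String>> index, FieldAttr attr) {
        return load(index, attr.name());
    }

    // Correct: resolves the physical field via the field name.
    static List<String> loadByFieldName(Map<String, List<String>> index, FieldAttr attr) {
        return load(index, attr.fieldName());
    }

    private static List<String> load(Map<String, List<String>> index, String physicalField) {
        List<String> values = index.get(physicalField);
        if (values == null) {
            throw new IllegalArgumentException("no such field in lookup index: " + physicalField);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, List<String>> lookupIndex = Map.of("lookup_field", List.of("a", "b"));
        // The attribute was renamed in the plan but still points at the physical field "lookup_field".
        FieldAttr renamed = new FieldAttr("$$lookup_field$0", "lookup_field");
        System.out.println(loadByFieldName(lookupIndex, renamed)); // [a, b]
        try {
            loadByName(lookupIndex, renamed);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // no such field in lookup index: $$lookup_field$0
        }
    }
}
```

In this model the bug only surfaces once some rule renames an attribute, which is why it could stay hidden until renaming lookup attributes (or union types) came into play.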
This approach would require that we can rename the lookup attributes that
Closes #119082
Assume a lookup index with fields language_code and lookup_field. We want to push down a LOOKUP JOIN past an upstream Project, so that the Project ends up downstream of the LOOKUP JOIN.

Pulling up the Project allows us to combine it with other Projects downstream, which may eliminate some lookup fields entirely. An example is the query from #119082.

Avoiding the early Projects also allows us to perform field extractions later: otherwise, the Project ahead of the LOOKUP JOIN causes InsertFieldExtraction to load any and all fields that we need from the main index before the LOOKUP JOIN.

As with any pushdown optimization, we have to deal with name conflicts:
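LOOKUP JOIN shadows any conflicting attributes if the lookup fields have the same name; in this regard, it behaves like ENRICH or EVAL.

Example: Assume the field lookup_field occurs both in lookup_index and in main_index. After the LOOKUP JOIN, lookup_field refers to the field from lookup_index, so pushing the join below a Project that still needs main_index's lookup_field (for example, to rename it) would change the query's result.

There are 2 ways to deal with this: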
1. Place a Project or Eval upstream from the Join to rename conflicting attributes to some arbitrary names, then, in the new Project that we place downstream from the Join, rename them back to the desired names.
2. Rename the attributes that the LOOKUP JOIN adds.

Option 1 is not ideal, because the renaming before the LOOKUP JOIN can still trigger field extractions. This PR thus goes with option 2, which is also the approach our other pushdown rules take (see here).

To implement option 2, we leverage the fact that LOOKUP JOIN essentially behaves like ENRICH: we can represent a LOOKUP JOIN as a unary plan node by wrapping it in a dedicated class, and then apply the same pushdown logic as for ENRICH, EVAL, etc.

This requires that the (field) attributes that a LOOKUP JOIN adds to the plan can be renamed to arbitrary names, rather than using the physical field names. Ideally, we'd just use temporary qualifiers for this, but that mechanism doesn't exist yet. However, we already use field attributes with arbitrary attribute names for union-typed fields, so we can do the same here and simply rename the field attributes of the EsRelation that represents the lookup index (without actually renaming the corresponding physical fields they refer to).

For this to work, we need to make sure that the compute code of LOOKUP JOIN doesn't rely on FieldAttribute#name (the potentially arbitrary attribute name) but rather on FieldAttribute#fieldName (the name of the physical field). Some places in the code don't use #fieldName yet; these are bugs (they won't work with union types either) and need to be fixed and backported before the bwc tests of this PR can truly pass. This is related to #127521.
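The renaming in option 2 can be sketched on a toy plan model. This is a minimal sketch under stated assumptions, not the PR's implementation: Alias, LookupJoinAsUnary, Pushdown and pushPastProject are hypothetical names; attributes are plain strings; the LOOKUP JOIN is modeled as a unary node whose added attribute names can be swapped out (via a hypothetical withGeneratedNames) while the physical lookup field names stay fixed; and an added attribute is assumed to need a temporary name only when it collides with the output of the Project's child.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy types: attributes are plain names, an Alias means "source AS name".
record Alias(String name, String source) {}

// Hypothetical unary stand-in for a LOOKUP JOIN. generatedNames are the attribute names it adds
// to the plan; lookupFieldNames are the physical lookup fields they read, which never change.
record LookupJoinAsUnary(String joinKey, List<String> generatedNames, List<String> lookupFieldNames) {
    LookupJoinAsUnary withGeneratedNames(List<String> newNames) {
        return new LookupJoinAsUnary(joinKey, newNames, lookupFieldNames);
    }
}

// The pushed-down join (now sitting on the Project's child) plus the new downstream Project.
record Pushdown(LookupJoinAsUnary join, List<Alias> projections) {}

final class LookupJoinPushdownSketch {
    static Pushdown pushPastProject(List<String> childOutput, List<Alias> projections, LookupJoinAsUnary join) {
        // The join key must now refer to the child's attribute, undoing a rename in the Project.
        String pushedKey = join.joinKey();
        for (Alias p : projections) {
            if (p.name().equals(join.joinKey())) {
                pushedKey = p.source();
            }
        }
        // Give an added attribute a temporary name if it would shadow one of the child's attributes.
        Set<String> taken = new HashSet<>(childOutput);
        List<String> tempNames = new ArrayList<>();
        for (String name : join.generatedNames()) {
            String candidate = name;
            for (int i = 0; taken.contains(candidate); i++) {
                candidate = "$$" + name + "$" + i;
            }
            taken.add(candidate);
            tempNames.add(candidate);
        }
        // New Project: the original projections minus those the join shadows,
        // followed by the join's attributes renamed back to their desired names.
        List<Alias> newProjections = new ArrayList<>();
        for (Alias p : projections) {
            if (!join.generatedNames().contains(p.name())) {
                newProjections.add(p);
            }
        }
        for (int i = 0; i < tempNames.size(); i++) {
            newProjections.add(new Alias(join.generatedNames().get(i), tempNames.get(i)));
        }
        LookupJoinAsUnary pushed = new LookupJoinAsUnary(pushedKey, join.generatedNames(), join.lookupFieldNames())
            .withGeneratedNames(tempNames);
        return new Pushdown(pushed, newProjections);
    }

    public static void main(String[] args) {
        // Roughly: FROM main_index | KEEP language_code, lookup_field | RENAME lookup_field AS x
        //          | LOOKUP JOIN lookup_index ON language_code
        LookupJoinAsUnary join = new LookupJoinAsUnary("language_code", List.of("lookup_field"), List.of("lookup_field"));
        Pushdown r = pushPastProject(
            List.of("language_code", "lookup_field", "other"),
            List.of(new Alias("language_code", "language_code"), new Alias("x", "lookup_field")),
            join);
        System.out.println(r.join().generatedNames()); // [$$lookup_field$0]
        for (Alias a : r.projections()) {
            System.out.println(a.source() + " AS " + a.name());
        }
        // language_code AS language_code
        // lookup_field AS x
        // $$lookup_field$0 AS lookup_field
    }
}
```

Because only the attributes added by the join are renamed, the Project's child is left untouched, so nothing forces extra field extraction before the join, which is the drawback of option 1. Note that the renamed attribute still points at the physical field lookup_field, which is exactly why the compute side has to go through the field name.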