[ESQL] Per-file filter pushdown awareness#145755
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @costin, I've created a changelog YAML for you. |
Make filter pushdown aware of per-file column availability and types in UNION_BY_NAME scenarios. Files whose filter columns are entirely absent are skipped at split discovery time. For files that do contain the columns, pushed ESQL expressions are adapted to the file's column set and re-translated to format-native filters. Type-widened columns (e.g. INTEGER file vs LONG unified) have their filter literals downcast with overflow detection.
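A minimal sketch of two of the ideas in the description, under stated assumptions: the class and method names (`PerFilePushdownSketch`, `canSkipFile`, `downcastToInt`) are hypothetical and are not the PR's actual classes. It shows a check for "all filter columns absent from this file" and a LONG to INTEGER literal downcast that reports overflow instead of silently wrapping. What the real code does when the downcast overflows is not stated in the description; returning an empty result so the caller can decide is an assumption.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; names do not come from the Elasticsearch codebase.
final class PerFilePushdownSketch {

    // True when none of the filter's columns exist in the file, so the file
    // is a candidate for skipping at split discovery time.
    public static boolean canSkipFile(Set<String> filterColumns, Set<String> fileColumns) {
        if (filterColumns.isEmpty()) {
            return false; // no filter columns means nothing to decide on
        }
        for (String column : filterColumns) {
            if (fileColumns.contains(column)) {
                return false; // at least one filter column is present in the file
            }
        }
        return true;
    }

    // Narrows a LONG literal to the file's INTEGER type.
    // Returns empty when the value does not fit (overflow detected).
    public static Optional<Integer> downcastToInt(long literal) {
        try {
            return Optional.of(Math.toIntExact(literal));
        } catch (ArithmeticException overflow) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(canSkipFile(Set.of("price"), Set.of("id", "name")));
        System.out.println(downcastToInt(42L));
        System.out.println(downcastToInt(Integer.MAX_VALUE + 1L));
    }
}
```

`Math.toIntExact` throws `ArithmeticException` when the value is outside the `int` range, which makes the overflow case explicit rather than relying on a wrapping cast.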
bdbbba9 to 151f2a2
/**
 * Infers the file's native type from the unified attribute type and the cast target.
 * The cast target is the unified (wider) type; the file has the narrower type.
 */
/**
 * Infers the file's native type from the cast target. Only returns a narrower type when
 * the adaptation is safe for integral comparisons (LONG→INTEGER). DOUBLE→INTEGER narrowing
 * is not supported because literal truncation can cause incorrect predicate semantics.
 */
Fixed — merged the two javadoc blocks into one.
// DOUBLE→INTEGER narrowing is intentionally not supported: Number.longValue() truncates
// fractional values, which can change comparison semantics (e.g., col < 2.7 vs col < 2).
Nit: if the method stays, this can be a javadoc comment and the method can be simplified.
Done — moved inline comment into the javadoc and removed the redundant one.
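To make the truncation concern in the comment above concrete, here is a small sketch. The class and method names are hypothetical. It compares an integer column value against a DOUBLE literal directly and against the same literal after `Number.longValue()` truncation.

```java
// Hypothetical illustration of why DOUBLE→INTEGER literal narrowing is unsafe.
final class TruncationSketch {

    // Original predicate: col < literal, evaluated with the DOUBLE literal.
    public static boolean lessThanDouble(int col, double literal) {
        return col < literal;
    }

    // Predicate after naively narrowing the literal: 2.7 becomes 2.
    public static boolean lessThanTruncated(int col, double literal) {
        long truncated = Double.valueOf(literal).longValue(); // drops the fraction
        return col < truncated;
    }

    public static void main(String[] args) {
        // For col = 2 the two predicates disagree.
        System.out.println(lessThanDouble(2, 2.7));
        System.out.println(lessThanTruncated(2, 2.7));
    }
}
```

For `col = 2`, `2 < 2.7` holds but `2 < 2` does not, so a filter pushed with the truncated literal would wrongly drop that row.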
 * the adaptation is safe for integral comparisons (LONG→INTEGER). DOUBLE→INTEGER narrowing
 * is not supported because literal truncation can cause incorrect predicate semantics.
 */
private static DataType inferFileType(DataType unifiedType, DataType castTarget) {
Removed — the parameter was left over from when DOUBLE→INTEGER was considered.
if (adapted.isEmpty()) {
    return formatReader.withPushedFilter(null);
}
FilterPushdownSupport.PushdownResult result = pushdownSupport.pushFilters(adapted);
It would probably be useful if we could cache the resolution at a level higher than per file (at some point).
Agreed — we could cache the adapted PushdownResult keyed on the set of missing/widened columns so files with identical schemas share one translation.
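One possible shape for the cache suggested in this thread, as a sketch only. `AdaptedFilterCacheSketch`, `SchemaKey`, and `resultFor` are hypothetical names, and the generic parameter `R` stands in for whatever the adapted pushdown result type would be. A real key would likely also need the per-column file types, which is left out here.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: share one filter translation across files whose
// schemas differ from the unified schema in the same way.
final class AdaptedFilterCacheSketch<R> {

    // Cache key: which filter columns are missing and which are type-widened.
    record SchemaKey(Set<String> missingColumns, Set<String> widenedColumns) {
        SchemaKey {
            // Defensive immutable copies so the key's hash cannot change later.
            missingColumns = Set.copyOf(missingColumns);
            widenedColumns = Set.copyOf(widenedColumns);
        }
    }

    private final Map<SchemaKey, R> cache = new ConcurrentHashMap<>();
    private final Function<SchemaKey, R> translator;

    // translator performs the (assumed expensive) adapt-and-translate step.
    AdaptedFilterCacheSketch(Function<SchemaKey, R> translator) {
        this.translator = translator;
    }

    // Returns the cached result for this schema shape, translating at most once per key.
    public R resultFor(Set<String> missingColumns, Set<String> widenedColumns) {
        return cache.computeIfAbsent(new SchemaKey(missingColumns, widenedColumns), translator);
    }

    public int size() {
        return cache.size();
    }
}
```

With this shape, two files that both lack column `x` and widen nothing map to the same key, so the translation runs once for both.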
* upstream/main:
  Mute org.elasticsearch.xpack.esql.expression.function.aggregate.FirstDocIdGroupingAggregatorFunctionTests testSimple elastic#145923
  Reindex relocation: store source TaskResult at destination node (elastic#145488)
  Bump versions after 9.2.8 release
  [CI] DLMFrozenTransitionServiceTests testCheckForFrozenIndicesReturnsEarlyWhenCapacityExhausted failing [elastic#145778] (elastic#145906)
  Update branches.json for 9.2.8 release
  ESQL: Clarify inheriting from Attributes (elastic#145898)
  Bump versions after 9.3.3 release
  Update branches.json for 9.3.3 release
  Prune changelogs after 8.19.14 release
  Bump versions after 8.19.14 release
  Update branches.json for 8.19.14 release
  [ML] Call old inference API (elastic#145690)
  ESQL: Unmute CsvIT sumWithOverflowRow (elastic#145893)
  Index a document when testing runtime fields shadowing dimensions & metrics (elastic#145882)
  [TEST] Fix version check in testSequenceNumbersDisabled (elastic#145879)
  [ESQL] Per-file filter pushdown awareness (elastic#145755)
  Unmute testGetReindexFollowsRelocation (elastic#145841)
  Correctly ignore system indices when validating dot-prefixed indices (elastic#128868)
  [Transform] Remove tests for deleted code (elastic#145685)
  ESQL: Add generative tests for LIMIT BY (elastic#144238)
Make filter pushdown aware of per-file column availability and
types in UNION_BY_NAME scenarios. Files whose filter columns are
entirely absent are skipped at split discovery time. For files
that do contain the columns, pushed ESQL expressions are adapted
to the file's column set and re-translated to format-native
filters. Type-widened columns (e.g. INTEGER file vs LONG unified)
have their filter literals downcast with overflow detection.
Developed with AI-assisted tooling