Add JSON_EXTRACT ES|QL scalar function #142375
This PR continues the work from #141507, moved to a clean branch with correct authorship.
Use $ prefix in nested and array examples so users see both styles are equivalent, rather than implying $ is only for root selection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion consistently Add paired examples (bare "name" vs "$.name") to demonstrate $ prefix equivalence. Remove $ from nested/array examples so only example 2 and the top-level array (which requires $[N]) use the root selector. Rewrite example descriptions to be task-oriented. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add CSV spec tests for bracket notation: simple key (['name']), nested keys (['user']['address']['city']), key containing a dot (['user.name']), and mixed dot/bracket (user['address'].city). Add doc example showing bracket notation as equivalent to dot-notation for the same nested path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update description to explain that dots in dot notation are always path separators per RFC 9535, so JSON keys literally containing dots must use bracket notation. Change bracket notation doc example to use a key with a dot (user.name) to demonstrate the practical use case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tation Lead with the three ways to write paths (dot, bracket, mixed) and when they are interchangeable. Then explain when bracket notation is required (dots in keys, special characters, empty keys, array indices). Group return value semantics, error handling, and unsupported features into separate paragraphs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lity Rewrite the $ description to explain it is always optional ($.name and name are equivalent), supported for JSONPath compatibility, and only necessary when indexing into a top-level array since there is no key name to start the path with. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rrays The description incorrectly implied $ was necessary for top-level array access. In fact [0] and $[0] are equivalent — $ is always optional. Add CSV spec test for top-level array index without $ prefix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move standalone test methods into parameterized suppliers using helpers (fixedCase, warningCase, foldingWarningCase, nullResultCase) following the ReplaceTests/LocateTests pattern. This gives all extraction tests the framework's automatic null handling, concurrency, constant folding, and evaluator toString checks. Clean up JsonExtract.java: remove dead variables, collapse duplicate switch branches, inline unused locals, and trim redundant comments.
- Switch JsonExtract from EsqlScalarFunction to BinaryScalarFunction, eliminating manual field management, writeTo, and foldable boilerplate
- Add appliesTo version tag for preview 9.4.0 per docs team request
- Use single-call expectThrows with matcher in JsonPathTests
BinaryScalarFunction is not the convention for ES|QL scalar functions (only Split uses it). Revert to EsqlScalarFunction following the standard pattern (Left, Right, Replace, etc.) and use List.of instead of Arrays.asList for immutable children list per reviewer suggestion.
- Remove snapshot gate from FN_JSON_EXTRACT capability (tech preview)
- Add JSON_EXTRACT to string-functions docs page with layout include
- Split @FunctionInfo into description + detailedDescription
- Fix AsciiDoc link syntax to Markdown
Add a json_logs test dataset with 9 rows of OpenTelemetry-style structured log data containing dotted keys, nested objects, and arrays. Convert all doc examples to use OTel-themed payloads and triple-quoted strings for readability. Add FROM-based doc example for bracket notation with dotted keys. Add 6 FROM-index tests exercising bracket notation, dynamic paths, filtering, and optimizer push-down behavior.
… add employees test
- Remove JsonExtractException, throw IllegalArgumentException directly
- Change empty input message from "invalid JSON input" to "empty JSON input"
- Move JsonExtractSourceIT from internalClusterTest to REST integration test
- Rename doTestJsonExtractFromSource to verifyJsonExtractFromSource
- Static import containsString and equalTo from Matchers
- Add FROM employees CONCAT round-trip CSV spec test
…ings The json_logs payload field contained embedded JSON with unescaped double quotes, producing invalid bulk API JSON. Fix toJson() to always treat text/keyword fields as strings: strip outer CSV quotes if present, escape inner quotes, and wrap. Also remove spurious outer quotes from json_logs.csv.
Summary
Adds a new ES|QL scalar function, `JSON_EXTRACT(string, path) → keyword`, that extracts values from JSON strings using a subset of JSONPath (RFC 9535).

Tech Preview: available from 9.4.0, gated behind the `FN_JSON_EXTRACT` capability. Wired into the string functions documentation page.

Parameters
- `string` (`keyword`, `text`, `_source`): the JSON input; the `_source` metadata field is also accepted.
- `path` (`keyword`, `text`): the path to extract (e.g. `name`, `user.address.city`, `orders[1].item`).

Supported path syntax
Paths can use dot notation (`user.address.city`), bracket notation (`['user']['address']['city']`), or a mix of both (`user['address'].city`). For simple keys they are interchangeable: `a.b` and `a['b']` produce the same result.

Bracket notation is required for keys containing dots or special characters (`['user.name']`), empty string keys (`['']`), and array indices (`items[0]`). Dots in dot notation are always path separators per RFC 9535, so a JSON key literally named `"user.name"` must use `['user.name']`.

The JSONPath `$` root selector is supported for compatibility but is always optional: `$.name` and `name` are equivalent, and `$[0]` and `[0]` are equivalent. Optional whitespace is allowed inside brackets (`[ 0 ]` is equivalent to `[0]`). Path matching is case-sensitive per the JSON specification.

Multi-value JSONPath features (wildcards, recursive descent, filters, slicing, negative indices) are not currently supported.
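
To make the path rules above concrete, here is a minimal Java sketch of a tokenizer for this path subset. It is not the PR's `JsonPath` implementation: the class name `JsonPathSketch` and its helper methods are hypothetical, and escape sequences inside quoted keys are deliberately left out. It only illustrates that dot, bracket, and mixed notation yield the same segments, that `$` is optional, and that multi-value selectors are rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the parser shipped in this PR.
public class JsonPathSketch {

    /** Splits a path into segments: String for object keys, Integer for array indices. */
    static List<Object> parse(String path) {
        if (path.isEmpty()) {
            throw malformed(path, 0);
        }
        List<Object> segments = new ArrayList<>();
        int n = path.length();
        int i = 0;
        boolean afterSegment = false;
        // The root selector is optional: "$.name" and "name" parse identically.
        if (path.charAt(0) == '$') {
            i = 1;
            afterSegment = true;
        }
        while (i < n) {
            char c = path.charAt(i);
            if (c == '[') {
                int close = closingBracket(path, i);
                // trim() allows optional whitespace inside brackets: [ 0 ] == [0]
                segments.add(bracketSegment(path, path.substring(i + 1, close).trim(), i));
                i = close + 1;
                afterSegment = true;
                continue;
            }
            if (afterSegment) {
                // In dot notation, a dot is always a separator between segments.
                if (c != '.') {
                    throw malformed(path, i);
                }
                i++;
            }
            int start = i;
            while (i < n && path.charAt(i) != '.' && path.charAt(i) != '[') {
                i++;
            }
            if (i == start) {
                throw malformed(path, start); // empty key, e.g. "a..b" or a trailing dot
            }
            segments.add(path.substring(start, i));
            afterSegment = true;
        }
        return segments;
    }

    /** Finds the ']' matching the '[' at {@code open}, ignoring ']' inside quotes (no escapes). */
    static int closingBracket(String path, int open) {
        char quote = 0;
        for (int j = open + 1; j < path.length(); j++) {
            char c = path.charAt(j);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
            } else if (c == ']') {
                return j;
            }
        }
        throw malformed(path, open);
    }

    /** A bracket holds either a quoted key (possibly empty) or a non-negative array index. */
    static Object bracketSegment(String path, String inner, int position) {
        if (inner.length() >= 2) {
            char first = inner.charAt(0);
            if ((first == '\'' || first == '"') && inner.charAt(inner.length() - 1) == first) {
                return inner.substring(1, inner.length() - 1);
            }
        }
        if (inner.matches("[0-9]+")) {
            return Integer.parseInt(inner);
        }
        // Wildcards, slices, filters, and negative indices all end up here.
        throw malformed(path, position);
    }

    static IllegalArgumentException malformed(String path, int position) {
        return new IllegalArgumentException("malformed path [" + path + "] at position " + position);
    }

    public static void main(String[] args) {
        System.out.println(parse("user['address'].city")); // prints [user, address, city]
        System.out.println(parse("$[ 0 ]"));               // prints [0]
    }
}
```

Returning keys as `String` and indices as `Integer` keeps `items[0]` (an array index) distinct from `items['0']` (an object key named "0"), which the two notations would otherwise blur.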
Return value
The extracted value is returned as a `keyword` string: string values without surrounding quotes, numbers and booleans as their string representation, and objects or arrays as JSON strings. Returns `null` if either parameter is `null` or if the extracted JSON value is `null`.

Returns `null` and emits a warning if the input is not valid JSON, the input is empty, the path is malformed, the path does not exist, the array index is out of bounds, or the path attempts to traverse through a non-object/non-array value.

Key implementation details
- `XContentParser`: skips fields not on the path without materializing them
- `_source`: detects encoding via `XContentFactory.xContentType()`; works with JSON, SMILE, CBOR, and YAML
- `XContentBuilder.copyCurrentStructure()` for objects/arrays. Future zero-copy byte slicing is tracked in "Expose byte offsets on XContentParser for zero-copy sub-structure extraction" (#142873)
- Throws `IllegalArgumentException` directly for all error conditions (empty input, invalid JSON, missing path, out-of-bounds index). The evaluator catches these and emits warnings.

Documentation
- `{applies_to}` version tags (preview 9.4.0)
- `@FunctionInfo` uses `description` (short summary) + `detailedDescription` (full details) per docs team guidance

Test coverage
- `JsonExtractTests.java`: ~70 inline tests + parameterized suppliers + 4 randomized XContent encoding tests
- `JsonPathTests.java`: dot/bracket notation, quoted keys, escapes, error positions, randomized round-trips
- `json_extract.csv-spec`: bracket notation (simple, nested, dot-in-key, mixed), top-level arrays (with and without `$`), top-level scalars, deep mixed nesting, duplicate keys, null-in-array, FROM-index tests with the OTel-style `json_logs` dataset, and a FROM `employees` CONCAT round-trip test
- `JsonExtractSourceIT.java` in `qa/server/single-node`: `JSON_EXTRACT(_source, ...)` against real indexed documents across all four XContent encodings (JSON, SMILE, CBOR, YAML) in both SYNC and ASYNC modes
- `JsonExtractSerializationTests.java`
- `JsonExtractErrorTests.java`
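
The return-value rules in the description (strings unquoted, numbers and booleans as text, objects and arrays as JSON strings, JSON `null` as `null`, and an `IllegalArgumentException` for paths that cannot be followed) can be sketched as below. This is an assumption-laden illustration, not the PR's code: `JsonExtractResultSketch` and its methods are hypothetical, and it walks an already-parsed tree of `Map`/`List` values rather than streaming through `XContentParser`. The compact JSON layout and the number formatting are this sketch's choices; the description does not state the exact form the real function produces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; the PR streams tokens instead of building a tree.
public class JsonExtractResultSketch {

    /** Follows key (String) and index (Integer) segments through pre-parsed JSON. */
    static Object walk(Object node, List<?> segments) {
        for (Object segment : segments) {
            if (segment instanceof String && node instanceof Map) {
                Map<?, ?> map = (Map<?, ?>) node;
                if (!map.containsKey(segment)) {
                    throw new IllegalArgumentException("path does not exist: " + segment);
                }
                node = map.get(segment);
            } else if (segment instanceof Integer && node instanceof List) {
                List<?> list = (List<?>) node;
                int index = (Integer) segment;
                if (index < 0 || index >= list.size()) {
                    throw new IllegalArgumentException("array index out of bounds: " + index);
                }
                node = list.get(index);
            } else {
                // e.g. asking for a key of a number, or an index of a string
                throw new IllegalArgumentException("cannot apply segment [" + segment + "] to this value");
            }
        }
        return node;
    }

    /** Renders the extracted value the way the description specifies for the keyword result. */
    static String toKeyword(Object value) {
        if (value == null) {
            return null; // JSON null maps to a null result, without a warning
        }
        if (value instanceof String) {
            return (String) value; // no surrounding quotes
        }
        if (value instanceof Map || value instanceof List) {
            return toJson(value); // objects and arrays come back as JSON text
        }
        return String.valueOf(value); // numbers and booleans
    }

    /** Minimal compact JSON writer; control-character escaping is omitted for brevity. */
    static String toJson(Object value) {
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Map) {
            return ((Map<?, ?>) value).entrySet().stream()
                .map(e -> quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        if (value instanceof List) {
            return ((List<?>) value).stream()
                .map(JsonExtractResultSketch::toJson)
                .collect(Collectors.joining(",", "[", "]"));
        }
        return String.valueOf(value); // numbers, booleans, and null
    }

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String extract(Object root, List<?> segments) {
        return toKeyword(walk(root, segments));
    }

    /** Sample document: {"user":{"name":"alice","age":30,"active":true,"nickname":null,"tags":["a","b"]}} */
    static Map<String, Object> sample() {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "alice");
        user.put("age", 30);
        user.put("active", true);
        user.put("nickname", null);
        user.put("tags", List.of("a", "b"));
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("user", user);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> root = sample();
        System.out.println(extract(root, List.of("user", "name")));    // prints alice
        System.out.println(extract(root, List.of("user", "age")));     // prints 30
        System.out.println(extract(root, List.of("user", "tags")));    // prints ["a","b"]
        System.out.println(extract(root, List.of("user", "tags", 1))); // prints b
    }
}
```

In the PR as described, the evaluator catches the exception and turns it into a warning plus a `null` result; the sketch stops at the exception so the two outcomes (JSON `null` versus an unfollowable path) stay visibly distinct.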