
Add JSON_EXTRACT ES|QL scalar function #142375

Merged
quackaplop merged 54 commits into elastic:main from quackaplop:feature/json-extract
Feb 26, 2026


@quackaplop quackaplop commented Feb 12, 2026

Summary

Adds a new ES|QL scalar function JSON_EXTRACT(string, path) → keyword that extracts values from JSON strings using a subset of JSONPath (RFC 9535).

Tech Preview — available from 9.4.0, gated behind FN_JSON_EXTRACT capability. Wired into the string functions documentation page.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| string | keyword, text, _source | JSON input or _source metadata field |
| path | keyword, text | JSONPath expression (e.g., name, user.address.city, orders[1].item) |

Supported path syntax

Paths can use dot notation (user.address.city), bracket notation (['user']['address']['city']), or a mix of both (user['address'].city). For simple keys they are interchangeable — a.b and a['b'] produce the same result.

Bracket notation is required for keys containing dots or special characters (['user.name']), empty string keys (['']), and array indices (items[0]). Dots in dot notation are always path separators per RFC 9535 — a JSON key literally named "user.name" must use ['user.name'].

The JSONPath $ root selector is supported for compatibility but is always optional — $.name and name are equivalent, and $[0] and [0] are equivalent. Optional whitespace is allowed inside brackets ([ 0 ] is equivalent to [0]). Path matching is case-sensitive per the JSON specification.

Multi-value JSONPath features (wildcards, recursive descent, filters, slicing, negative indices) are not currently supported.
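
The path rules above can be illustrated with a small, self-contained parser. This is a sketch under stated assumptions, not the PR's `JsonPath` class: the class name `JsonPathSketch`, the segment representation (a `String` for a key, an `Integer` for an index), and the error messages are all hypothetical, and escape sequences inside quoted keys are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not the PR's JsonPath class. */
public final class JsonPathSketch {

    /** Returns path segments: a String for an object key, an Integer for an array index. */
    public static List<Object> parse(String path) {
        List<Object> segments = new ArrayList<>();
        int n = path.length();
        int i = 0;
        // True once a root selector or a segment has been read; a bare key then needs a '.' first.
        boolean needDot = false;
        if (n > 0 && path.charAt(0) == '$') { // the root selector is optional
            i = 1;
            needDot = true;
        }
        while (i < n) {
            if (path.charAt(i) == '[') {
                i = skipSpaces(path, i + 1); // whitespace inside brackets is allowed
                if (i < n && (path.charAt(i) == '\'' || path.charAt(i) == '"')) {
                    char quote = path.charAt(i);
                    int end = path.indexOf(quote, i + 1); // assumption: no escapes in quoted keys
                    if (end < 0) throw new IllegalArgumentException("unterminated quoted key at " + i);
                    segments.add(path.substring(i + 1, end)); // may be the empty key ['']
                    i = end + 1;
                } else {
                    int start = i;
                    while (i < n && Character.isDigit(path.charAt(i))) i++;
                    // Wildcards, negative indices, and filters all fail here.
                    if (i == start) throw new IllegalArgumentException("unsupported selector at " + start);
                    segments.add(Integer.parseInt(path.substring(start, i)));
                }
                i = skipSpaces(path, i);
                // A slice such as [0:2] fails here because ':' is not ']'.
                if (i >= n || path.charAt(i) != ']') throw new IllegalArgumentException("expected ']' at " + i);
                i++;
            } else {
                if (path.charAt(i) == '.') {
                    if (!needDot) throw new IllegalArgumentException("unexpected '.' at " + i);
                    i++;
                } else if (needDot) {
                    throw new IllegalArgumentException("expected '.' or '[' at " + i);
                }
                int start = i;
                // In dot notation a '.' always ends the key, so keys containing dots need brackets.
                while (i < n && path.charAt(i) != '.' && path.charAt(i) != '[') i++;
                String key = path.substring(start, i);
                if (key.isEmpty() || key.equals("*")) {
                    throw new IllegalArgumentException("empty or wildcard key at " + start);
                }
                segments.add(key);
            }
            needDot = true;
        }
        return segments;
    }

    private static int skipSpaces(String s, int i) {
        while (i < s.length() && s.charAt(i) == ' ') i++;
        return i;
    }
}
```

With this sketch, `user.address.city`, `['user']['address']['city']`, and `user['address'].city` all parse to the same three segments, and `$[0]`, `[0]`, and `[ 0 ]` all parse to a single index segment.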

Return value

The extracted value is returned as a keyword string: string values without surrounding quotes, numbers and booleans as their string representation, and objects or arrays as JSON strings. Returns null if either parameter is null or if the extracted JSON value is null.

Returns null and emits a warning if the input is not valid JSON, the input is empty, the path is malformed, the path does not exist, the array index is out of bounds, or the path attempts to traverse through a non-object/non-array value.
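
The return-value rules can be sketched over an already-parsed document. This is a simplification for illustration: the PR describes a streaming implementation, whereas the sketch below walks plain `Map`/`List` values. The class `ExtractSketch`, its method names, and the error messages are hypothetical, and the JSON writer escapes only quotes and backslashes.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the return-value rules; not the PR's implementation. */
public final class ExtractSketch {

    /** Walks segments (String key or Integer index) and renders the result as a keyword string. */
    public static String extract(Object root, List<Object> segments) {
        Object current = root;
        for (Object segment : segments) {
            if (segment instanceof String && current instanceof Map) {
                Map<?, ?> map = (Map<?, ?>) current;
                if (!map.containsKey(segment)) {
                    throw new IllegalArgumentException("path does not exist: " + segment);
                }
                current = map.get(segment);
            } else if (segment instanceof Integer && current instanceof List) {
                List<?> list = (List<?>) current;
                int index = (Integer) segment;
                if (index < 0 || index >= list.size()) {
                    throw new IllegalArgumentException("array index out of bounds: " + index);
                }
                current = list.get(index);
            } else {
                // For example, asking for a key on a string or number.
                throw new IllegalArgumentException("cannot apply segment " + segment + " to this value");
            }
        }
        return toKeyword(current);
    }

    /** Strings lose their quotes, scalars use their string form, structures become JSON text. */
    static String toKeyword(Object value) {
        if (value == null) return null; // an extracted JSON null yields null
        if (value instanceof String) return (String) value;
        if (value instanceof Map || value instanceof List) return toJson(value);
        return String.valueOf(value); // numbers and booleans
    }

    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return quote((String) value);
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(String.valueOf(e.getKey()))).append(':').append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            for (Object item : (List<?>) value) {
                if (sb.length() > 1) sb.append(',');
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        return String.valueOf(value);
    }

    private static String quote(String s) {
        // Minimal escaping only: backslashes and double quotes.
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

In this sketch the error cases surface as `IllegalArgumentException`; a caller would translate those into a null result plus a warning, as the description above states.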

Key implementation details

  • Streaming parser: Uses XContentParser — skips fields not on the path without materializing them
  • Multi-encoding _source: Detects encoding via XContentFactory.xContentType() — works with JSON, SMILE, CBOR, and YAML
  • Structure serialization: Uses XContentBuilder.copyCurrentStructure() for objects/arrays. Future zero-copy byte slicing is tracked in "Expose byte offsets on XContentParser for zero-copy sub-structure extraction" (#142873)
  • Constant path optimization: Parses path once, reuses across all rows via specialized evaluator
  • Duplicate keys: Returns first match (streaming parser semantics)
  • Error handling: Throws IllegalArgumentException directly for all error conditions (empty input, invalid JSON, missing path, out-of-bounds index). The evaluator catches these and emits warnings.
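
Two of the points above, the constant-path optimization and the exception-to-warning handling, can be sketched together. The following is a hypothetical model, not the generated ES|QL evaluator: the class `ConstantPathEvaluatorSketch`, the `pathParser` and `extractor` parameters, and the in-memory warnings list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical model of a constant-path evaluator; P is whatever the parsed path type is. */
public final class ConstantPathEvaluatorSketch<P> {
    private final P parsedPath; // parsed once, then reused for every row
    private final BiFunction<String, P, String> extractor;
    private final List<String> warnings = new ArrayList<>();

    public ConstantPathEvaluatorSketch(
        String path,
        Function<String, P> pathParser,
        BiFunction<String, P, String> extractor
    ) {
        this.parsedPath = pathParser.apply(path); // the only call to the parser
        this.extractor = extractor;
    }

    /** Evaluates every row; failures become null results plus a recorded warning. */
    public List<String> evalAll(List<String> rows) {
        List<String> out = new ArrayList<>(rows.size());
        for (String row : rows) {
            if (row == null) {
                out.add(null); // null input yields null without a warning
                continue;
            }
            try {
                out.add(extractor.apply(row, parsedPath));
            } catch (IllegalArgumentException e) {
                warnings.add(e.getMessage());
                out.add(null);
            }
        }
        return out;
    }

    public List<String> warnings() {
        return warnings;
    }
}
```

The design point is that path parsing cost is paid once per query rather than once per row, while a bad row affects only its own result. How the real evaluator reports a malformed constant path is not covered by this sketch.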

Documentation

  • Function listed in string functions page with {applies_to} version tags (preview 9.4.0)
  • @FunctionInfo uses description (short summary) + detailedDescription (full details) per docs team guidance
  • Kibana definition and docs auto-generated from annotations

Test coverage

  • Unit tests (JsonExtractTests.java) — ~70 inline tests + parameterized suppliers + 4 randomized XContent encoding tests
  • Path parsing tests (JsonPathTests.java) — dot/bracket notation, quoted keys, escapes, error positions, randomized round-trips
  • CSV spec tests (json_extract.csv-spec) — bracket notation (simple, nested, dot-in-key, mixed), top-level arrays (with and without $), top-level scalars, deep mixed nesting, duplicate keys, null-in-array, FROM-index tests with OTel-style json_logs dataset, FROM employees CONCAT round-trip test
  • REST integration tests (JsonExtractSourceIT.java in qa/server/single-node) — JSON_EXTRACT(_source, ...) against real indexed documents across all four XContent encodings (JSON, SMILE, CBOR, YAML) in both SYNC and ASYNC modes
  • Serialization tests (JsonExtractSerializationTests.java)
  • Error tests (JsonExtractErrorTests.java)

github-actions bot commented Feb 12, 2026

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.


When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level


@quackaplop
Contributor Author

This PR continues the work from #141507, moved to a clean branch with correct authorship.

quackaplop and others added 17 commits February 25, 2026 23:41
Use $ prefix in nested and array examples so users see both styles
are equivalent, rather than implying $ is only for root selection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion consistently

Add paired examples (bare "name" vs "$.name") to demonstrate $ prefix
equivalence. Remove $ from nested/array examples so only example 2 and
the top-level array (which requires $[N]) use the root selector.
Rewrite example descriptions to be task-oriented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add CSV spec tests for bracket notation: simple key (['name']), nested
keys (['user']['address']['city']), key containing a dot (['user.name']),
and mixed dot/bracket (user['address'].city). Add doc example showing
bracket notation as equivalent to dot-notation for the same nested path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update description to explain that dots in dot notation are always path
separators per RFC 9535, so JSON keys literally containing dots must use
bracket notation. Change bracket notation doc example to use a key with
a dot (user.name) to demonstrate the practical use case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tation

Lead with the three ways to write paths (dot, bracket, mixed) and when
they are interchangeable. Then explain when bracket notation is required
(dots in keys, special characters, empty keys, array indices). Group
return value semantics, error handling, and unsupported features into
separate paragraphs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lity

Rewrite the $ description to explain it is always optional ($.name and
name are equivalent), supported for JSONPath compatibility, and only
necessary when indexing into a top-level array since there is no key
name to start the path with.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rrays

The description incorrectly implied $ was necessary for top-level array
access. In fact [0] and $[0] are equivalent — $ is always optional.
Add CSV spec test for top-level array index without $ prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move standalone test methods into parameterized suppliers using helpers
(fixedCase, warningCase, foldingWarningCase, nullResultCase) following
the ReplaceTests/LocateTests pattern. This gives all extraction tests
the framework's automatic null handling, concurrency, constant folding,
and evaluator toString checks.

Clean up JsonExtract.java: remove dead variables, collapse duplicate
switch branches, inline unused locals, and trim redundant comments.

- Switch JsonExtract from EsqlScalarFunction to BinaryScalarFunction,
  eliminating manual field management, writeTo, and foldable boilerplate
- Add appliesTo version tag for preview 9.4.0 per docs team request
- Use single-call expectThrows with matcher in JsonPathTests

BinaryScalarFunction is not the convention for ES|QL scalar functions
(only Split uses it). Revert to EsqlScalarFunction following the
standard pattern (Left, Right, Replace, etc.) and use List.of instead
of Arrays.asList for immutable children list per reviewer suggestion.

- Remove snapshot gate from FN_JSON_EXTRACT capability (tech preview)
- Add JSON_EXTRACT to string-functions docs page with layout include
- Split @FunctionInfo into description + detailedDescription
- Fix AsciiDoc link syntax to Markdown

Add a json_logs test dataset with 9 rows of OpenTelemetry-style
structured log data containing dotted keys, nested objects, and arrays.
Convert all doc examples to use OTel-themed payloads and triple-quoted
strings for readability. Add FROM-based doc example for bracket notation
with dotted keys. Add 6 FROM-index tests exercising bracket notation,
dynamic paths, filtering, and optimizer push-down behavior.

… add employees test

- Remove JsonExtractException, throw IllegalArgumentException directly
- Change empty input message from "invalid JSON input" to "empty JSON input"
- Move JsonExtractSourceIT from internalClusterTest to REST integration test
- Rename doTestJsonExtractFromSource to verifyJsonExtractFromSource
- Static import containsString and equalTo from Matchers
- Add FROM employees CONCAT round-trip CSV spec test

…ings

The json_logs payload field contained embedded JSON with unescaped double
quotes, producing invalid bulk API JSON. Fix toJson() to always treat
text/keyword fields as strings: strip outer CSV quotes if present, escape
inner quotes, and wrap. Also remove spurious outer quotes from json_logs.csv.
@quackaplop
Contributor Author

buildkite test this



Labels

  • :Analytics/ES|QL (AKA ESQL)
  • >feature
  • release highlight
  • Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
  • Team:ES|QL
  • v9.4.0


6 participants