Fix precision loss when dealing with JSON numbers containing decimal point #28882
findepi wants to merge 9 commits into trinodb:master
Conversation
// An alternative is calling getLongValue and then BigintOperators.castToVarchar.
// It doesn't work as well because it can result in overflow and underflow exceptions for large integral numbers.
- case VALUE_NUMBER_INT -> utf8Slice(parser.getText());
+ case VALUE_NUMBER_INT, VALUE_NUMBER_FLOAT -> utf8Slice(parser.getDecimalValue().toString());
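The overflow concern mentioned in the comment can be reproduced with the standard library alone (a sketch, not the actual Jackson code path): parsing an integral number beyond the long range fails, while BigDecimal preserves every digit.

```java
import java.math.BigDecimal;

public class LargeIntegral {
    public static void main(String[] args) {
        // Exceeds Long.MAX_VALUE (9223372036854775807), so a getLongValue-style path overflows
        String large = "12345678901234567890";
        try {
            Long.parseLong(large); // throws NumberFormatException
        } catch (NumberFormatException e) {
            System.out.println("long overflow for " + large);
        }
        // BigDecimal round-trips the digits exactly
        System.out.println(new BigDecimal(large)); // prints 12345678901234567890
    }
}
```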
The returned text might be longer than the double.toString output we used to produce.
This can cause query failures when casting json to varchar(n) for certain n values.
I think it's fine. The format of the returned string is not guaranteed to be stable.
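The length difference is easy to see with the standard library alone (a sketch; since the exact digits of the double rendering depend on shortest-round-trip formatting, only the lengths are compared):

```java
import java.math.BigDecimal;

public class TextLength {
    public static void main(String[] args) {
        String exact = "123456789012345678901234567890.12345678";
        // Old behavior: go through double, which renders in short scientific notation
        String viaDouble = Double.toString(Double.parseDouble(exact));
        // New behavior: BigDecimal keeps every digit, so the text is much longer
        String viaDecimal = new BigDecimal(exact).toString();
        System.out.println(viaDouble.length() + " vs " + viaDecimal.length());
        // A longer text is why CAST(json AS varchar(n)) can now fail for small n
    }
}
```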
Does this actually have to go through getDecimalValue, or can you just fetch the raw text out of the JsonParser and avoid the BigDecimal code entirely?
Different JSON syntaxes encode the same number: 100.0, 100.000, 1e2, 10e1, 00000000100.0.
I don't expect the JSON number to VARCHAR cast to simply "leak" JSON's internal syntax.
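The claim can be checked with plain BigDecimal (a standalone sketch; the actual cast goes through Jackson's getDecimalValue): all five spellings denote the same numeric value even though their source texts differ.

```java
import java.math.BigDecimal;

public class SameNumber {
    public static void main(String[] args) {
        String[] spellings = {"100.0", "100.000", "1e2", "10e1", "00000000100.0"};
        BigDecimal hundred = new BigDecimal(100);
        for (String s : spellings) {
            // compareTo ignores scale, so all of these are numerically equal to 100
            System.out.println(s + " == 100 : " + (new BigDecimal(s).compareTo(hundred) == 0));
        }
    }
}
```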
In theory, JSON values are supposed to be normalized, so it shouldn't matter.
In practice, SpecializeCastWithJsonParse assumes the cast normalizes the values.
A query result could depend on whether SpecializeCastWithJsonParse kicks in or not. This feels wrong.
Added a benchmark and posted results:
The general purpose `jsonParse` utility was lossy for numbers containing a decimal point. This affected the `json_parse` SQL function, the `JSON` SQL type constructor, and connectors which use `jsonParse` to canonicalize the JSON representation of remote data on read (e.g. PostgreSQL).
Before the change, when casting a JSON number containing a decimal point to VARCHAR, the number was first converted to DOUBLE, resulting in unnecessary loss of information.
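The old DOUBLE round trip can be reproduced with the standard library alone (a sketch; the literal below is an arbitrary value with more digits than a double can hold):

```java
import java.math.BigDecimal;

public class DoubleRoundTrip {
    public static void main(String[] args) {
        String json = "1234567890.1234567890123456789"; // more digits than a double can hold
        // Old path: JSON number -> double -> text (lossy)
        String viaDouble = Double.toString(Double.parseDouble(json));
        // New path: JSON number -> BigDecimal -> text (exact)
        String viaDecimal = new BigDecimal(json).toString();
        // The double round trip no longer compares equal to the original value
        System.out.println(new BigDecimal(viaDouble).compareTo(new BigDecimal(json)) != 0); // true
        System.out.println(viaDecimal.equals(json)); // true
    }
}
```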
Force-pushed 2a3a0d6 to 924b465
Force-pushed 924b465 to 65591ea
@coderabbitai full review
✅ Actions performed: Full review triggered.
📝 Walkthrough: This PR addresses precision loss when casting JSON numbers to VARCHAR and other types, and when parsing JSON with decimal points.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/trino-main/src/test/java/io/trino/type/TestArrayOperators.java`:
- Around line 1141-1163: The test block in TestArrayOperators.java exercises
JSON->ARRAY(VARCHAR) for integers, booleans, empty strings and short/normalized
decimals but misses a regression case for long-decimal values (>16 significant
digits); add another assertion in the same group using
assertions.expression("cast(a as ARRAY(VARCHAR))").binding("a", "JSON '[...]'")
that includes a long decimal literal (e.g. a numeric value with >16 significant
digits) and assert .hasType(new ArrayType(VARCHAR)).matches(...) where the
expected CAST(ARRAY[... ] AS ARRAY(VARCHAR)) contains the long decimal preserved
exactly as a string (not truncated/normalized), so the test fails if precision
is lost. Ensure the new assertion follows the same style and placement as the
other cases in the block.
📒 Files selected for processing (10)
core/trino-main/src/main/java/io/trino/operator/scalar/JsonToArrayCast.java
core/trino-main/src/main/java/io/trino/operator/scalar/JsonToMapCast.java
core/trino-main/src/main/java/io/trino/operator/scalar/JsonToRowCast.java
core/trino-main/src/main/java/io/trino/sql/ir/optimizer/rule/SpecializeCastWithJsonParse.java
core/trino-main/src/main/java/io/trino/util/JsonUtil.java
core/trino-main/src/test/java/io/trino/type/TestArrayOperators.java
core/trino-main/src/test/java/io/trino/type/TestJsonOperators.java
core/trino-main/src/test/java/io/trino/type/TestMapOperators.java
lib/trino-plugin-toolkit/src/main/java/io/trino/plugin/base/util/JsonTypeUtil.java
lib/trino-plugin-toolkit/src/test/java/io/trino/plugin/base/util/TestJsonTypeUtil.java
// array with number elements (converted to string)
assertThat(assertions.expression("cast(a as ARRAY(VARCHAR))")
        .binding("a", "JSON '[128, 12345678901234567890]'"))
        .hasType(new ArrayType(VARCHAR))
        .matches("CAST(ARRAY['128', '12345678901234567890'] AS ARRAY(VARCHAR))");

// array with boolean elements
assertThat(assertions.expression("cast(a as ARRAY(VARCHAR))")
        .binding("a", "JSON '[true, false]'"))
        .hasType(new ArrayType(VARCHAR))
        .matches("CAST(ARRAY['true', 'false'] AS ARRAY(VARCHAR))");

// array with empty string
assertThat(assertions.expression("cast(a as ARRAY(VARCHAR))")
        .binding("a", "JSON '[\"test\", \"\", \"data\"]'"))
        .hasType(new ArrayType(VARCHAR))
        .matches("CAST(ARRAY['test', '', 'data'] AS ARRAY(VARCHAR))");

// array with various types including scientific notation and string "null"
assertThat(assertions.expression("cast(a as ARRAY(VARCHAR))")
        .binding("a", "JSON '[true, false, 12, 12.3, 1.23E1, \"puppies\", \"kittens\", \"null\", null]'"))
        .hasType(new ArrayType(VARCHAR))
        .matches("CAST(ARRAY['true', 'false', '12', '12.3', '12.3', 'puppies', 'kittens', 'null', null] AS ARRAY(VARCHAR))");
🛠️ Refactor suggestion | 🟠 Major
Add the actual long-decimal JSON -> ARRAY(VARCHAR) regression case.
Lines 1143-1163 only cover integers, short decimals, and exponent normalization, so they still pass if the original >16-significant-digit decimal regression comes back.
➕ Suggested regression assertion
  assertThat(assertions.expression("cast(a as ARRAY(VARCHAR))")
          .binding("a", "JSON '[true, false, 12, 12.3, 1.23E1, \"puppies\", \"kittens\", \"null\", null]'"))
          .hasType(new ArrayType(VARCHAR))
          .matches("CAST(ARRAY['true', 'false', '12', '12.3', '12.3', 'puppies', 'kittens', 'null', null] AS ARRAY(VARCHAR))");
+
+ assertThat(assertions.expression("cast(a as ARRAY(VARCHAR))")
+         .binding("a", "JSON '[123456789012345678901234567890.12345678]'"))
+         .hasType(new ArrayType(VARCHAR))
+         .matches("CAST(ARRAY['123456789012345678901234567890.12345678'] AS ARRAY(VARCHAR))");
dain left a comment
I'm curious/concerned about the performance impact of cycling through BigDecimal for this code. My experience with BigDecimal is that it's literally the performance cliff and should be avoided at all costs. So I am pretty concerned about us negatively affecting everyone using JSON. Can we add some benchmarks or something, or is there just something I'm missing here? Maybe the code that you've changed isn't used by a normal path or something like that.
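To make the concern concrete, here is a stdlib-only timing sketch (not a JMH benchmark: no warmup control or dead-code elimination safeguards, so treat the numbers as indicative only):

```java
import java.math.BigDecimal;

public class ParseCostSketch {
    public static void main(String[] args) {
        String number = "12345.6789";
        int iterations = 1_000_000;

        long t0 = System.nanoTime();
        double doubleSum = 0;
        for (int i = 0; i < iterations; i++) {
            doubleSum += Double.parseDouble(number);
        }
        long doubleNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        BigDecimal decimalSum = BigDecimal.ZERO;
        for (int i = 0; i < iterations; i++) {
            decimalSum = decimalSum.add(new BigDecimal(number));
        }
        long decimalNanos = System.nanoTime() - t1;

        // Print both sums so the JIT cannot drop the loops entirely
        System.out.printf("double: %d ms (sum %.1f), BigDecimal: %d ms (sum %s)%n",
                doubleNanos / 1_000_000, doubleSum, decimalNanos / 1_000_000, decimalSum);
    }
}
```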
private static final JsonMapper SORTED_MAPPER = new JsonMapperProvider().get()
        .rebuild()
        .configure(ORDER_MAP_ENTRIES_BY_KEYS, true)
        .configure(USE_BIG_DECIMAL_FOR_FLOATS, true)
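USE_BIG_DECIMAL_FOR_FLOATS makes Jackson bind floating-point JSON numbers to BigDecimal rather than double. A stdlib-only sketch of the difference the flag matters for (the Jackson mapper itself is not used here):

```java
import java.math.BigDecimal;

public class WhyBigDecimal {
    public static void main(String[] args) {
        // 18 significant digits: more than a double can represent
        String text = "1.00000000000000001";
        double asDouble = Double.parseDouble(text);
        System.out.println(asDouble == 1.0); // true: the tail digits are silently lost
        System.out.println(new BigDecimal(text).compareTo(BigDecimal.ONE) != 0); // true: BigDecimal keeps them
    }
}
```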
What's the performance impact of this?
I don't know. However, I value our correctness-first approach. This is the correct thing to do.
The whole SORTED_MAPPER.readValue(parser, Object.class) looks like optimization candidate. I think we could e.g. operate on TreeNode or something, to avoid unnecessary conversions, or work on top of json parser tokens. There seems to be a lot of optimization potential here.
I think this is worth figuring out. This could be a big regression for our users. JSON is one of the most common formats in practice, and it is a shame none of the standard benchmarks exercise it.
See my top level comment on the PR. I expect this should not have impact, and may not be necessary at all.
benchmark added, see results:
(new PR per #28882 (comment) )
Force-pushed 65591ea to a2dd63c
losipiuk left a comment
I like the code, but I agree some benchmarking would be nice to have.
I did more research on this with ChatGPT. The second concern is round-tripping through BigDecimal in the to… I feel strongly that we should not canonicalize here and should instead preserve the original value as is. I think that is what end users want in the case of converting a numeric value to text.
It is suggesting that …
The downside is potentially unstable query results: two different code paths that are supposed to be equivalent will now produce different results. Details in #28882 (comment)
I extracted the commits that are not being disputed into a separate PR:
The meaty changes are two things that are related, but can be reviewed and discussed independently. I will split the PR into two.
Created json_parse, JSON constructor and connectors #28867
Let's start the review there when the cleanup (#28915) is merged. I will tag you as reviewers there when ready.