
Fix numeric precision loss in JSON parsing #28916

Merged
findepi merged 3 commits intotrinodb:masterfrom
findepi:findepi/fix-numeric-precision-loss-in-json-parsing-34c7d5
Apr 2, 2026


@findepi
Member

@findepi findepi commented Mar 30, 2026

release notes

General
* Fix incorrect result when using `json_parse` or the `JSON` type constructor
  and the document contains numbers with a decimal point and more than 16 significant digits. #28867

MySQL, PostgreSQL, Mongo, Pinot, SingleStore
* Fix incorrect result when reading a JSON column and the document contains numbers
  with a decimal point and more than 16 significant digits. #28867
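To make the release note concrete, here is a minimal stdlib-only sketch of the precision loss: it parses a decimal literal into a `double`, prints it back, and checks whether the numeric value survived. `DoubleRoundTrip`, `viaDouble`, and `survives` are hypothetical names, not Trino code; the sketch only assumes that the old normalization effectively went through `double` for numbers with a decimal point.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of Trino: checks whether a decimal literal
// keeps its numeric value after a parse-to-double / print-back cycle.
final class DoubleRoundTrip
{
    private DoubleRoundTrip() {}

    static String viaDouble(String literal)
    {
        // Parse to the nearest double, then print it back with Double.toString.
        return Double.toString(Double.parseDouble(literal));
    }

    static boolean survives(String literal)
    {
        // Compare numeric values exactly, ignoring scale (so "0.10" equals "0.1").
        return new BigDecimal(literal).compareTo(new BigDecimal(viaDouble(literal))) == 0;
    }
}
```

For example, `viaDouble("3.141592653589793238")` yields `3.141592653589793`, so the trailing digits are silently dropped, while `0.1` and `12345.6789` come back unchanged. Any decimal literal with at most 15 significant digits survives such a round trip; longer ones may not.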

@cla-bot cla-bot bot added the cla-signed label Mar 30, 2026
@findepi findepi marked this pull request as draft March 30, 2026 07:50
@findepi findepi force-pushed the findepi/fix-numeric-precision-loss-in-json-parsing-34c7d5 branch from e4fb4d1 to faddc7d March 30, 2026 09:04
@findepi findepi requested review from dain, losipiuk and wendigo March 30, 2026 09:05
@findepi findepi marked this pull request as ready for review March 30, 2026 09:06
@findepi
Member Author

findepi commented Mar 30, 2026

Added a benchmark. Results are visualized here:

https://jmh.morethan.io/?sources=https://gist.githubusercontent.com/findepi/dae686271a067ff54df12f48c8b78df2/raw/ff63adfbfdec7fe00ea569d0ea700c74bdff6c0d/BenchmarkJsonTypeUtil.01.before.json,https://gist.githubusercontent.com/findepi/dae686271a067ff54df12f48c8b78df2/raw/ff63adfbfdec7fe00ea569d0ea700c74bdff6c0d/BenchmarkJsonTypeUtil.02.after.json

Text results

Before

Benchmark                                               (jsonType)  Mode  Cnt        Score        Error  Units
BenchmarkJsonTypeUtil.benchmarkJsonParse             SCALAR_STRING  avgt   30      358.610 ±      2.661  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse             SCALAR_NUMBER  avgt   30      359.633 ±      3.628  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse               ARRAY_SMALL  avgt   30      812.243 ±     49.913  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse               ARRAY_LARGE  avgt   30     5059.240 ±    106.653  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse              OBJECT_SMALL  avgt   30     1268.128 ±     25.858  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse              OBJECT_LARGE  avgt   30    12193.597 ±    115.073  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse            NESTED_SHALLOW  avgt   30     1645.495 ±     10.003  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse               NESTED_DEEP  avgt   30     3197.938 ±     23.660  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse  HIGHLY_NESTED_REAL_WORLD  avgt   30  8618907.378 ± 208904.652  ns/op

After

Benchmark                                               (jsonType)  Mode  Cnt        Score       Error  Units
BenchmarkJsonTypeUtil.benchmarkJsonParse             SCALAR_STRING  avgt   30      369.704 ±     2.229  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse             SCALAR_NUMBER  avgt   30      354.668 ±     3.448  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse               ARRAY_SMALL  avgt   30      738.836 ±    19.273  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse               ARRAY_LARGE  avgt   30     5497.927 ±   172.197  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse              OBJECT_SMALL  avgt   30     1210.911 ±    51.861  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse              OBJECT_LARGE  avgt   30    12326.006 ±   147.163  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse            NESTED_SHALLOW  avgt   30     1677.369 ±    45.527  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse               NESTED_DEEP  avgt   30     3031.450 ±    75.591  ns/op
BenchmarkJsonTypeUtil.benchmarkJsonParse  HIGHLY_NESTED_REAL_WORLD  avgt   30  8602561.551 ± 33265.020  ns/op

cc @dain @losipiuk: you asked for benchmark results.

@findepi
Member Author

findepi commented Mar 30, 2026

I think these results address @dain's concern (#28882 (comment)).
We don't get a performance cliff here.
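The "no performance cliff" claim can be checked with a little arithmetic on the two tables. The sketch below (the `BenchmarkDelta` class is a hypothetical helper, and it ignores the reported error bars) computes the relative change in average time per operation between the before and after runs.

```java
import java.util.Locale;

// Relative change in average time per operation (after vs. before) for rows of
// the BenchmarkJsonTypeUtil.benchmarkJsonParse tables. Negative values mean the
// "after" run was faster. Error bars are ignored in this comparison.
final class BenchmarkDelta
{
    private BenchmarkDelta() {}

    static String percentChange(double beforeNsPerOp, double afterNsPerOp)
    {
        double change = (afterNsPerOp / beforeNsPerOp - 1.0) * 100.0;
        // Locale.ROOT keeps '.' as the decimal separator regardless of JVM locale.
        return String.format(Locale.ROOT, "%+.1f%%", change);
    }
}
```

Applied to the tables, the largest regression is ARRAY_LARGE at about +8.7% (5059.240 to 5497.927 ns/op), while ARRAY_SMALL improves by about 9.0% and HIGHLY_NESTED_REAL_WORLD is essentially flat (about -0.2%), which is consistent with there being no cliff.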

@findepi findepi force-pushed the findepi/fix-numeric-precision-loss-in-json-parsing-34c7d5 branch 4 times, most recently from 88d88a4 to 5992621 March 30, 2026 16:26
private static final JsonMapper SORTED_MAPPER = new JsonMapperProvider().get()
.rebuild()
.configure(ORDER_MAP_ENTRIES_BY_KEYS, true)
.configure(USE_BIG_DECIMAL_FOR_FLOATS, true)
Contributor


This isn't backward compatible, right?

Member


If you look at the tests, it is definitely a behavior change. We could call this either a fix or a backward-incompatible change. I think of it as a fix, but it makes sense to mark it as a backward-incompatible change in the release notes so we properly warn users.

Contributor


Yeah, we should call it out, but the fix itself is the correct thing to do.

@dain
Member

dain commented Mar 30, 2026

This is really interesting behavior. So in Jackson, if I don't have this setting, something like `getText` returns values that have been round-tripped through double?

@findepi
Member Author

findepi commented Mar 30, 2026

> So in Jackson, if I don't have this something like getText returns values that have been round-tripped through double?

No. I believe `getText` works as expected.

However, in Trino, most JSON values come from `JsonTypeUtil.jsonParse`, which "normalizes" JSON values by parsing and re-serializing them. This is where numbers with decimal points are round-tripped through `double`.
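A minimal stdlib-only sketch of that normalization, applied to a single number token. `NumberNormalizer` and `normalize` are hypothetical names, not Trino's `JsonTypeUtil`; the `useBigDecimalForFloats` flag stands in for the `USE_BIG_DECIMAL_FOR_FLOATS` setting in the diff, and keeping integer tokens exact is an assumption about how integral numbers are handled.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch, not Trino code: re-serializes one JSON number token
// the way a parse-then-serialize normalization might.
final class NumberNormalizer
{
    private NumberNormalizer() {}

    static String normalize(String token, boolean useBigDecimalForFloats)
    {
        boolean isFloat = token.indexOf('.') >= 0 || token.indexOf('e') >= 0 || token.indexOf('E') >= 0;
        if (!isFloat) {
            // Assumption: integral tokens are kept at arbitrary precision.
            return new BigInteger(token).toString();
        }
        if (useBigDecimalForFloats) {
            // Exact decimal: keeps every digit and the scale (e.g. "1.50").
            return new BigDecimal(token).toString();
        }
        // Previous behavior: nearest double, so extra digits are lost.
        return Double.toString(Double.parseDouble(token));
    }
}
```

With the flag off, `normalize("3.141592653589793238", false)` returns `3.141592653589793`; with it on, all digits are kept. Note that `BigDecimal.toString` switches to exponent notation for negative scales (for example `1e3` becomes `1E+3`), so a real serializer still has to pick an output convention.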

@findepi findepi force-pushed the findepi/fix-numeric-precision-loss-in-json-parsing-34c7d5 branch from 5992621 to a0b1633 March 31, 2026 07:11
@findepi
Member Author

findepi commented Mar 31, 2026

(Just rebased after #28932 merged; no other changes.)

@dain
Member

dain commented Mar 31, 2026

> > So in Jackson, if I don't have this something like getText returns values that have been round-tripped through double?
>
> No. I believe getText works as expected.
>
> However, in Trino, most JSON values come from JsonTypeUtil.jsonParse, which "normalizes" JSON values by parsing and serializing. This is where numbers with decimal points are round-tripped through double.

I think I brought this up elsewhere. For JSON cast to string, I think we should just preserve the number as is, without normalizing. I think this is really what users want, and it happens to be cheaper.
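One way to "preserve the number as is" is to locate the number literal in the original text and copy it verbatim instead of parsing and re-serializing it. The sketch below is hypothetical (`RawJsonNumber` is not Trino code, and this PR does not change cast to string); it only follows the number grammar from RFC 8259.

```java
// Hypothetical sketch, not Trino code: locates the end of a JSON number literal
// (grammar from RFC 8259) so the original text can be copied through verbatim.
final class RawJsonNumber
{
    private RawJsonNumber() {}

    static String rawNumber(String json, int start)
    {
        return json.substring(start, scanEnd(json, start));
    }

    static int scanEnd(String json, int start)
    {
        int i = start;
        if (i < json.length() && json.charAt(i) == '-') {
            i++;
        }
        // Integer part: a single '0', or a non-zero digit followed by digits.
        if (i < json.length() && json.charAt(i) == '0') {
            i++;
        }
        else if (i < json.length() && json.charAt(i) >= '1' && json.charAt(i) <= '9') {
            i = skipDigits(json, i);
        }
        else {
            throw new IllegalArgumentException("Invalid JSON number at offset " + start);
        }
        // Optional fraction: '.' followed by at least one digit.
        if (i < json.length() && json.charAt(i) == '.') {
            i = requireDigits(json, i + 1, start);
        }
        // Optional exponent: 'e' or 'E', an optional sign, at least one digit.
        if (i < json.length() && (json.charAt(i) == 'e' || json.charAt(i) == 'E')) {
            i++;
            if (i < json.length() && (json.charAt(i) == '+' || json.charAt(i) == '-')) {
                i++;
            }
            i = requireDigits(json, i, start);
        }
        return i;
    }

    private static int requireDigits(String json, int from, int start)
    {
        int end = skipDigits(json, from);
        if (end == from) {
            throw new IllegalArgumentException("Invalid JSON number at offset " + start);
        }
        return end;
    }

    private static int skipDigits(String json, int i)
    {
        while (i < json.length() && json.charAt(i) >= '0' && json.charAt(i) <= '9') {
            i++;
        }
        return i;
    }
}
```

For example, `rawNumber("{\"x\":1.2300e5}", 5)` returns `1.2300e5` exactly as written, trailing zeros and exponent included. A complete implementation would still need to validate the rest of the document.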

@findepi
Member Author

findepi commented Apr 1, 2026

> I think for json cast to string, we should just preserve the number as is without normalizing.

Cast to string is not being changed here. It's covered by

However, this PR can be viewed as a prerequisite.
On current master, all JSON numbers are normalized, and those with a decimal point lose precision (they are round-tripped through double). This PR fixes that.

Member

@dain dain left a comment


Looks good, but drop the empty commit at the end.

@findepi findepi force-pushed the findepi/fix-numeric-precision-loss-in-json-parsing-34c7d5 branch from 9a09c50 to 01dc0d0 April 1, 2026 18:19
@findepi
Member Author

findepi commented Apr 1, 2026

Rebased to avoid logical merge conflicts after this merged

The general-purpose `jsonParse` utility lost precision for numbers
containing a decimal point.

This affected the `json_parse` SQL function, the `JSON` SQL type constructor,
and connectors that use `jsonParse` to canonicalize the JSON representation
of remote data on read (e.g. PostgreSQL).
@findepi findepi force-pushed the findepi/fix-numeric-precision-loss-in-json-parsing-34c7d5 branch from 01dc0d0 to 0a4450b April 1, 2026 18:25
@findepi
Member Author

findepi commented Apr 1, 2026

Resolved logical merge conflicts (tests only)

@findepi findepi merged commit 5747d32 into trinodb:master Apr 2, 2026
99 checks passed
@findepi findepi deleted the findepi/fix-numeric-precision-loss-in-json-parsing-34c7d5 branch April 2, 2026 07:52
@github-actions github-actions bot added this to the 481 milestone Apr 2, 2026
@ebyhr ebyhr mentioned this pull request Apr 3, 2026

Successfully merging this pull request may close these issues.

Loss of decimal number precision in json_parse, JSON constructor and connectors

3 participants