Retain precision when casting JSON number to VARCHAR by findepi · Pull Request #28917 · trinodb/trino

findepi · 2026-03-30T07:51:04Z

overview

Normalize JSON numeric values using Java BigDecimal in the JSON to VARCHAR cast. Some examples:

1.34e1 -> 13.4
1.34567890123e1 -> 13.4567890123
1.34567890123e8 -> 134567890.123
1.34567890123e11 -> 134567890123
1.34567890123e12 -> 1.34567890123E+12

0.000000000000000 -> 0.0
0e1000 -> 0.0
0e-1000 -> 0.0

1 -> 1
100000000000000000000000000000000000000000000000000000000000000000000e-68 -> 1.0
0.100000000000000 -> 0.1
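
The exponent examples above match what plain `java.math.BigDecimal` parsing and `toString()` produce. The following is a minimal sketch of that behavior only. The class and method names are hypothetical and this is not the code from the PR.

```java
import java.math.BigDecimal;

public class JsonNumberText {
    // Hypothetical helper, not Trino's code: parse the JSON number token
    // exactly and print it with BigDecimal's canonical toString().
    static String viaBigDecimal(String jsonNumber) {
        return new BigDecimal(jsonNumber).toString();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "1.34567890123e1",   // prints 13.4567890123
            "1.34567890123e8",   // prints 134567890.123
            "1.34567890123e11",  // prints 134567890123
            "1.34567890123e12",  // prints 1.34567890123E+12 (negative scale)
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + viaBigDecimal(input));
        }
    }
}
```

The zero and trailing-zero examples are not reproduced by `toString()` alone. For instance, `new BigDecimal("0.000000000000000").toString()` returns `0E-15`, so getting `0.0` presumably needs extra normalization that this sketch does not attempt.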

release notes

General
* Improve precision when casting JSON numbers with decimal point to VARCHAR. #28881

findepi · 2026-03-30T11:14:55Z

Added a benchmark. Results follow.

Before

Benchmark                                               (jsonType)  (varcharLength)  Mode  Cnt     Score    Error  Units
BenchmarkJsonOperators.benchmarkCastToVarchar         STRING_SHORT            10000  avgt   30    57.470 ±  0.752  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar         STRING_SHORT       2147483647  avgt   30    56.193 ±  1.297  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar        STRING_MEDIUM            10000  avgt   30   176.703 ±  3.310  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar        STRING_MEDIUM       2147483647  avgt   30   174.852 ±  2.693  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar          STRING_LONG            10000  avgt   30  1098.212 ± 20.769  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar          STRING_LONG       2147483647  avgt   30  1101.451 ± 24.898  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar  STRING_WITH_UNICODE            10000  avgt   30    91.809 ±  0.299  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar  STRING_WITH_UNICODE       2147483647  avgt   30    91.903 ±  0.293  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar       NUMBER_INTEGER            10000  avgt   30    62.553 ±  0.780  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar       NUMBER_INTEGER       2147483647  avgt   30    61.504 ±  0.534  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar       NUMBER_DECIMAL            10000  avgt   30   235.296 ±  1.065  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar       NUMBER_DECIMAL       2147483647  avgt   30   236.338 ±  0.889  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar    NUMBER_SCIENTIFIC            10000  avgt   30   178.933 ±  0.826  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar    NUMBER_SCIENTIFIC       2147483647  avgt   30   179.991 ±  1.060  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar         BOOLEAN_TRUE            10000  avgt   30    38.901 ±  0.272  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar         BOOLEAN_TRUE       2147483647  avgt   30    39.677 ±  1.132  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar        BOOLEAN_FALSE            10000  avgt   30    41.334 ±  1.634  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar        BOOLEAN_FALSE       2147483647  avgt   30    39.803 ±  0.112  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar                 NULL            10000  avgt   30    37.529 ±  0.102  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar                 NULL       2147483647  avgt   30    37.812 ±  0.358  ns/op

After

Benchmark                                               (jsonType)  (varcharLength)  Mode  Cnt     Score    Error  Units
BenchmarkJsonOperators.benchmarkCastToVarchar         STRING_SHORT            10000  avgt   30    60.485 ±  4.568  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar         STRING_SHORT       2147483647  avgt   30    55.325 ±  0.272  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar        STRING_MEDIUM            10000  avgt   30   174.716 ±  2.581  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar        STRING_MEDIUM       2147483647  avgt   30   173.241 ±  2.266  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar          STRING_LONG            10000  avgt   30  1115.777 ± 25.443  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar          STRING_LONG       2147483647  avgt   30  1106.917 ± 22.895  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar  STRING_WITH_UNICODE            10000  avgt   30    91.915 ±  0.193  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar  STRING_WITH_UNICODE       2147483647  avgt   30    91.778 ±  0.419  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar       NUMBER_INTEGER            10000  avgt   30    68.107 ±  0.756  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar       NUMBER_INTEGER       2147483647  avgt   30    68.553 ±  0.301  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar       NUMBER_DECIMAL            10000  avgt   30    95.879 ±  0.436  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar       NUMBER_DECIMAL       2147483647  avgt   30    96.344 ±  0.789  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar    NUMBER_SCIENTIFIC            10000  avgt   30    82.827 ±  0.582  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar    NUMBER_SCIENTIFIC       2147483647  avgt   30    82.897 ±  0.924  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar         BOOLEAN_TRUE            10000  avgt   30    38.935 ±  0.156  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar         BOOLEAN_TRUE       2147483647  avgt   30    38.784 ±  0.201  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar        BOOLEAN_FALSE            10000  avgt   30    42.150 ±  0.896  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar        BOOLEAN_FALSE       2147483647  avgt   30    39.383 ±  0.304  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar                 NULL            10000  avgt   30    37.661 ±  0.102  ns/op
BenchmarkJsonOperators.benchmarkCastToVarchar                 NULL       2147483647  avgt   30    37.562 ±  0.187  ns/op

https://jmh.morethan.io/?sources=https://gist.githubusercontent.com/findepi/3eab5435b181fe7df19ffb0a952daa95/raw/7114825186c5a172ef472b9724c87358af7c4873/castToVarchar.01.before.json,https://gist.githubusercontent.com/findepi/3eab5435b181fe7df19ffb0a952daa95/raw/7114825186c5a172ef472b9724c87358af7c4873/castToVarchar.02.after.json

findepi · 2026-03-30T11:17:14Z

For some reason, the benchmark shows a performance improvement for the affected case.

I think these results address @dain's concern (#28882 (comment)).

findepi · 2026-03-30T16:08:45Z

The current implementation suffers from the problem that cast(JSON '...' as ARRAY(VARCHAR)) yields different results than cast(json_parse('...') as ARRAY(VARCHAR)).


dain · 2026-03-30T18:21:18Z

IMO we should just retain the original text from the JSON. I don't see an upside to normalizing the text, and the downsides are:

1. We lose information that might be meaningful to the user. For example, they might be in an environment where exponentiated numbers are important. Or possibly the trailing zeros imply the precision of a measurement.
2. Performance. Parsing and printing through BigDecimal is not cheap, and it is object-heavy.

The general-purpose `jsonParse` utility was lossy for numbers containing a decimal point. This affected the `json_parse` SQL function, the `JSON` SQL type constructor, and connectors that use `jsonParse` to canonicalize the JSON representation on remote data read (e.g. PostgreSQL).


findepi · 2026-03-30T19:50:14Z

> 2. performance. Parsing and printing through BigDecimal is not cheap, and object heavy

Per the benchmarks, this turned out not to be an issue?
Can you maybe run them too and compare results?

> I don't see an upside to normalizing to the text

The "upside" is that cast(JSON '...' as ARRAY(VARCHAR)) and cast(json_parse('...') as ARRAY(VARCHAR)) should yield the same results. Can we agree this is expected behavior worth maintaining?

I found a bug in the current implementation, which invalidated this assumption. However, combined with #28916 (cherry-picked here to run CI), this works as expected, via decimals.

However, with `case VALUE_NUMBER_FLOAT -> utf8Slice(parser.getText())`, the equality cast(JSON '...' as ARRAY(VARCHAR)) = cast(json_parse('...') as ARRAY(VARCHAR)) no longer holds.

> we lose information that might be meaningful to the user.

We at least agree that a lossy cast to varchar is a problem, i.e. #28881 is a bug.
Going through doubles is definitely the most lossy of all the options considered.
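
To illustrate the point about doubles, here is a small sketch comparing a round trip through a 64-bit `double` with a round trip through `BigDecimal`. The class and method names are hypothetical. `Double.toString` is only a stand-in for whatever double formatting the old cast used.

```java
import java.math.BigDecimal;

public class DoubleVersusDecimal {
    // Stand-in for the old behavior described in this thread:
    // the number token is converted to a double before printing.
    static String viaDouble(String jsonNumber) {
        return Double.toString(Double.parseDouble(jsonNumber));
    }

    // Exact alternative: BigDecimal keeps every digit of the token.
    static String viaBigDecimal(String jsonNumber) {
        return new BigDecimal(jsonNumber).toPlainString();
    }

    public static void main(String[] args) {
        // 23 significant digits, more than a double can represent.
        String input = "1.2345678901234567890123";
        System.out.println("input:      " + input);
        System.out.println("double:     " + viaDouble(input));
        System.out.println("BigDecimal: " + viaBigDecimal(input));
    }
}
```

A double carries roughly 15 to 17 significant decimal digits, so the trailing digits of the input cannot survive the first path, while the second path returns the input unchanged.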

cla-bot bot added the cla-signed label Mar 30, 2026

findepi marked this pull request as draft March 30, 2026 07:51

This was referenced Mar 30, 2026

Fix precision loss when dealing with JSON numbers containing decimal point #28882

Closed

Add support for cast between NUMBER and JSON #28868

Merged

findepi force-pushed the findepi/retain-precision-when-casting-json-number-to-varchar-806435 branch from d45a3c0 to ccc9ea0 March 30, 2026 11:12

findepi requested review from dain, losipiuk and wendigo March 30, 2026 11:17

findepi marked this pull request as ready for review March 30, 2026 11:17

findepi force-pushed the findepi/retain-precision-when-casting-json-number-to-varchar-806435 branch 3 times, most recently from 6df9050 to 6c10312 March 30, 2026 11:56

findepi marked this pull request as draft March 30, 2026 16:07

findepi added 4 commits March 30, 2026 18:26

Add more jsonParse tests

433b3c0

Expand tests from JSON to VARCHAR

d097cbe

- add test cases with numbers with leading/trailing zeros.
- verify that casting string -> JSON -> array(varchar) and an optimized path behave the same.

Benchmark JsonOperators.castToVarchar

d0be5b7

Retain precision when casting JSON number to VARCHAR

14700b4

Before the change, when casting a JSON number containing decimal point to VARCHAR, the number would be converted first to DOUBLE. It resulted in unnecessary loss of information.

findepi force-pushed the findepi/retain-precision-when-casting-json-number-to-varchar-806435 branch from 6c10312 to 14700b4 March 30, 2026 16:49

findepi added 2 commits March 30, 2026 21:44

fixup! Retain precision when casting JSON number to VARCHAR

f1e7fd1

together with `Fix numeric precision loss in JSON parsing`, this works now

This was referenced Mar 30, 2026

Fail on trailing content after JSON null in CAST #28932

Merged

Fix numeric precision loss in JSON parsing #28916

Merged

findepi wants to merge 6 commits into trinodb:master from findepi:findepi/retain-precision-when-casting-json-number-to-varchar-806435
