@wendigo
Contributor

@wendigo wendigo commented Oct 16, 2025

DataSize values are serialized as exact byte counts so that aggregated values stay exact and rounding errors do not accumulate. For human consumption it is better to format them as the most succinct string.
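
As a rough illustration of why exact bytes matter for aggregation, the sketch below formats a byte count succinctly, parses it back, and compares the sums. The class `SuccinctRounding` and its `succinct` and `parse` helpers are hypothetical (1024-based units, two decimal places); they are assumptions made for this example and not the formatting used by the `DataSize` class.

```java
import java.util.Locale;

// Hypothetical helpers for illustration only; not the DataSize implementation.
class SuccinctRounding {
    private static final String[] UNITS = {"B", "kB", "MB", "GB", "TB", "PB"};

    // Format a byte count in the largest 1024-based unit that keeps the value >= 1.
    static String succinct(long bytes) {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.2f%s", value, UNITS[unit]);
    }

    // Parse a succinct string back to bytes; digits dropped by rounding are gone.
    static long parse(String text) {
        // Check the longest suffixes first so that "MB" is not mistaken for "B".
        for (int unit = UNITS.length - 1; unit >= 0; unit--) {
            if (text.endsWith(UNITS[unit])) {
                String number = text.substring(0, text.length() - UNITS[unit].length());
                return Math.round(Double.parseDouble(number) * Math.pow(1024, unit));
            }
        }
        throw new IllegalArgumentException("Unknown unit: " + text);
    }

    public static void main(String[] args) {
        long[] perOperatorBytes = {1_500_000L, 1_500_000L, 1_500_000L};
        long exactSum = 0;
        long roundTrippedSum = 0;
        for (long bytes : perOperatorBytes) {
            exactSum += bytes;
            roundTrippedSum += parse(succinct(bytes));
        }
        System.out.println("exact sum:         " + exactSum);
        System.out.println("round-tripped sum: " + roundTrippedSum);
        System.out.println("accumulated error: " + (exactSum - roundTrippedSum));
    }
}
```

With these assumed helpers, each value loses a few hundred bytes in the round trip, and the loss grows with every value added to the sum.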

Description

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

Summary by Sourcery

Enable succinct, human-readable DataSize formatting in UI JSON responses for QueryInfo.

Enhancements:

  • Introduce DataSizeSerializer to conditionally output concise human-readable data sizes
  • Bind DataSizeSerializer as the JSON serializer for DataSize in ServerMainModule
  • Update UiQueryResource to build a custom JsonCodec with succinct DataSize formatting and return JSON

@cla-bot cla-bot bot added the cla-signed label Oct 16, 2025
@sourcery-ai

sourcery-ai bot commented Oct 16, 2025

Reviewer's Guide

This update adds a configurable Jackson serializer for DataSize to produce succinct, human-readable sizes and wires it into the server JSON binder, then updates the UI query endpoint to use a custom JsonCodec that enables this serializer when returning query info as JSON.

Sequence diagram for query info JSON serialization with succinct DataSize

sequenceDiagram
    participant Client
    participant UiQueryResource
    participant JsonCodec_QueryInfo
    participant DataSizeSerializer
    participant ObjectMapper
    Client->>UiQueryResource: GET /ui/query/{queryId}
    UiQueryResource->>JsonCodec_QueryInfo: toJson(queryInfo)
    JsonCodec_QueryInfo->>DataSizeSerializer: serialize DataSize fields (succinct enabled)
    DataSizeSerializer->>JsonCodec_QueryInfo: returns human-readable DataSize
    JsonCodec_QueryInfo->>UiQueryResource: returns JSON
    UiQueryResource->>Client: returns JSON response with human-readable DataSize

Class diagram for the new DataSizeSerializer and related changes

classDiagram
    class DataSizeSerializer {
        +SUCCINCT_DATA_SIZE_ENABLED : String
        +serialize(dataSize: DataSize, jsonGenerator: JsonGenerator, serializerProvider: SerializerProvider)
    }
    class JsonSerializer_DataSize
    DataSizeSerializer --|> JsonSerializer_DataSize
    class UiQueryResource {
        -queryInfoCodec: JsonCodec<QueryInfo>
        +UiQueryResource(objectMapper, dispatchManager, accessControl, sessionContextFactory)
        -buildQueryInfoCodec(objectMapper): JsonCodec<QueryInfo>
    }
    UiQueryResource ..> DataSizeSerializer : uses
    class JsonCodecFactory {
        +jsonCodec(type: Class<T>): JsonCodec<T>
        +prettyPrint()
    }
    UiQueryResource ..> JsonCodecFactory : uses
    class DataSize {
        +succinct(): DataSize
        +toString(): String
    }
    DataSizeSerializer ..> DataSize : serializes
    class ServerMainModule {
        +setup(Binder binder)
    }
    ServerMainModule ..> DataSizeSerializer : binds
    class JsonCodec_T {
        +toJson(value: T): String
    }
    class JsonCodec_QueryInfo
    UiQueryResource ..> JsonCodec_QueryInfo : uses

File-Level Changes

Change: Introduce custom DataSize JSON serializer with optional succinct formatting
Details:
  • Implemented DataSizeSerializer that checks a context attribute to choose between exact and succinct output
  • Defined SUCCINCT_DATA_SIZE_ENABLED flag for toggling formatting
  • Registered DataSizeSerializer in the JSON binder for DataSize type
Files:
  • core/trino-main/src/main/java/io/trino/server/DataSizeSerializer.java
  • core/trino-main/src/main/java/io/trino/server/ServerMainModule.java

Change: Enable succinct DataSize serialization in UI query JSON responses
Details:
  • Injected ObjectMapper into UiQueryResource and initialized a JsonCodec with succinct DataSize enabled
  • Added buildQueryInfoCodec method to configure Jackson ContextAttributes and create a pretty-printing JsonCodec
  • Updated getQueryInfo endpoint to use queryInfoCodec.toJson(...) and set JSON media type
Files:
  • core/trino-main/src/main/java/io/trino/server/ui/UiQueryResource.java
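
The attribute-gated behavior summarized above can be modeled without Jackson. In the simplified sketch below, a plain `Map` stands in for the attributes that the real serializer reads from `SerializerProvider`. The class `AttributeGatedSerializer`, its `succinct` helper, and the exact `<bytes>B` output format are assumptions made for illustration, not the code from this pull request.

```java
import java.util.Locale;
import java.util.Map;

// Simplified, hypothetical model: a Map replaces Jackson's per-call context attributes.
class AttributeGatedSerializer {
    static final String SUCCINCT_DATA_SIZE_ENABLED = "dataSize.succinct.enabled";

    // Emit the exact byte count unless the caller explicitly opted in to succinct output.
    static String serialize(long bytes, Map<String, Object> attributes) {
        if (Boolean.TRUE.equals(attributes.get(SUCCINCT_DATA_SIZE_ENABLED))) {
            return succinct(bytes);
        }
        return bytes + "B";
    }

    // Assumed succinct format (1024-based units, two decimals); not DataSize's own formatting.
    static String succinct(long bytes) {
        String[] units = {"B", "kB", "MB", "GB", "TB", "PB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.2f%s", value, units[unit]);
    }

    public static void main(String[] args) {
        long bytes = 3L * 1024 * 1024 + 512 * 1024; // 3,670,016 bytes
        // Default path: no attribute set, so the exact value is kept.
        System.out.println(serialize(bytes, Map.of()));
        // Opt-in path: only a codec that sets the attribute gets the succinct form.
        System.out.println(serialize(bytes, Map.of(SUCCINCT_DATA_SIZE_ENABLED, true)));
    }
}
```

The point of gating on an attribute is that one serializer can be registered globally while every path that does not set the attribute keeps the exact representation.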

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `core/trino-main/src/main/java/io/trino/server/DataSizeSerializer.java:29` </location>
<code_context>
+    public static final String SUCCINCT_DATA_SIZE_ENABLED = "dataSize.succinct.enabled";
+
+    @Override
+    public void serialize(DataSize dataSize, JsonGenerator jsonGenerator, SerializerProvider serializerProvider)
+            throws IOException
+    {
</code_context>

<issue_to_address>
**suggestion:** Consider null handling for DataSize in serializer.

Explicitly check for null DataSize and handle it appropriately to prevent NullPointerExceptions.
</issue_to_address>

### Comment 2
<location> `core/trino-main/src/main/java/io/trino/server/DataSizeSerializer.java:32-34` </location>
<code_context>
+    public void serialize(DataSize dataSize, JsonGenerator jsonGenerator, SerializerProvider serializerProvider)
+            throws IOException
+    {
+        if (serializerProvider.getAttribute(SUCCINCT_DATA_SIZE_ENABLED) == Boolean.TRUE) {
+            jsonGenerator.writeString(dataSize.succinct().toString());
+            return;
</code_context>

<issue_to_address>
**suggestion:** Type comparison for attribute may be too strict.

Using 'Boolean.TRUE.equals(...)' is safer, as it correctly handles boxed Booleans and avoids issues with strict '==' comparison.

```suggestion
        if (Boolean.TRUE.equals(serializerProvider.getAttribute(SUCCINCT_DATA_SIZE_ENABLED))) {
            jsonGenerator.writeString(dataSize.succinct().toString());
            return;
```
</issue_to_address>
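
To make the difference in Comment 2 concrete, the sketch below compares the identity check with the `equals` check on a few attribute values. The class `BooleanAttributeCheck` and its helpers are hypothetical names for this example. Autoboxing and `Boolean.valueOf(true)` both yield the canonical `Boolean.TRUE` instance, so the two checks only disagree for a separately constructed `Boolean`, which the deprecated constructor is used here to produce.

```java
// Hypothetical helper class for illustration; not part of the pull request.
class BooleanAttributeCheck {
    // Identity comparison: true only for the canonical Boolean.TRUE instance.
    static boolean identityCheck(Object attribute) {
        return attribute == Boolean.TRUE;
    }

    // Value comparison: true for any Boolean holding true; false for null and other types.
    static boolean equalsCheck(Object attribute) {
        return Boolean.TRUE.equals(attribute);
    }

    // Builds a Boolean that is equal to, but not the same object as, Boolean.TRUE.
    // The constructor is deprecated; it is used here only to demonstrate the edge case.
    @SuppressWarnings({"deprecation", "removal"})
    static Object distinctTrue() {
        return new Boolean(true);
    }

    public static void main(String[] args) {
        Object boxed = true; // autoboxing goes through Boolean.valueOf
        System.out.println("boxed:    == " + identityCheck(boxed) + ", equals " + equalsCheck(boxed));

        Object distinct = distinctTrue();
        System.out.println("distinct: == " + identityCheck(distinct) + ", equals " + equalsCheck(distinct));

        System.out.println("null:     == " + identityCheck(null) + ", equals " + equalsCheck(null));
    }
}
```

If the attribute is always set from an autoboxed `true`, both checks agree; the `equals` form simply does not depend on how the value was created.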


@wendigo wendigo force-pushed the serafin/data-size-handling branch 2 times, most recently from 1fd23d4 to 17bcbf1 Compare October 16, 2025 19:45
@github-actions github-actions bot added the ui Web UI label Oct 16, 2025
@wendigo wendigo changed the title Make query JSON data sizes human readable Make query JSON data sizes human readable & file downloadable Oct 16, 2025
@wendigo wendigo force-pushed the serafin/data-size-handling branch 2 times, most recently from 737625f to de68b0d Compare October 16, 2025 20:33
@wendigo
Contributor Author

wendigo commented Oct 16, 2025

I've added digest pruning so that query json will be smaller :)

@wendigo wendigo requested a review from electrum October 16, 2025 20:34
@martint
Member

martint commented Oct 16, 2025

The query json is not meant to be for human consumption.

@wendigo
Contributor Author

wendigo commented Oct 16, 2025

@martint but it is used by humans :) (at least by some of us)

Member

@raunaqmorarka raunaqmorarka left a comment

lgtm % UI changes

@raunaqmorarka
Member

> The query json is not meant to be for human consumption.

We use it all the time to debug performance issues. The fields removed here don't make the JSON any less consumable by machines, and they bring down the size of the JSON pulled by the UI or any other user.

@wendigo wendigo force-pushed the serafin/data-size-handling branch from bf66e16 to 4ae62b8 Compare October 16, 2025 21:25
@wendigo
Contributor Author

wendigo commented Oct 16, 2025

UI changes dropped (will revisit later with better proposal)

@raunaqmorarka raunaqmorarka changed the title Make query JSON data sizes human readable & file downloadable Make query JSON more compact Oct 16, 2025
@wendigo wendigo force-pushed the serafin/data-size-handling branch from 4ae62b8 to 19dfed6 Compare October 16, 2025 21:33
@martint
Member

martint commented Oct 16, 2025

It's not just removing the t-digests. It's changing the encoding of data sizes to use the succinct representation, which is lossy.

@raunaqmorarka
Member

> It's not just removing the t-digests. It's changing the encoding of data sizes to use the succinct representation, which is lossy.

The succinct representation should only be used on the final DataSize values that go out from the coordinator REST endpoint. I think we want even the UI to show the succinct representation, as that is easier to read.

@wendigo
Contributor Author

wendigo commented Oct 16, 2025

@martint it is lossy, but it only applies to the query JSON download, not to every other serialization path. These are final values, not ones before aggregation.

For query JSON, the order of magnitude matters, not values down to a single byte. Eventually we could make this configurable using server config, but I doubt that this would be used.

@martint
Member

martint commented Oct 16, 2025

The UI should be the one to decide how to render the data sizes. For example, if it's trying to draw a chart, it may need to normalize to a different unit than the most succinct one for one metric.

> but only applies to the query json download.

That's not what I see in the code. It seems to apply to all requests to /ui/api/query

Comment on lines +174 to +187
// Do not output @class property for metric types
mapper.addMixIn(Metric.class, DropTypeInfo.class);
// Do not output @type property for OperatorInfo
mapper.addMixIn(OperatorInfo.class, DropTypeInfo.class);
Member

This is brittle, and will get out of sync as the QueryInfo continues to evolve.

Member

If we want a dump in a human-consumable format, we should consider introducing an explicit API for that instead of monkey-patching the representation produced by the programmatic API used by the UI.

@wendigo
Contributor Author

wendigo commented Oct 16, 2025

@martint only to the getQueryInfo where codec is used

@martint
Member

martint commented Oct 16, 2025

Yes, sorry, that's what I meant: /ui/api/query/<id>

@wendigo
Contributor Author

wendigo commented Oct 16, 2025

I narrowed it down only to the Query JSON tab in the UI

- Serialize data sizes using succinct form (only query json download)
- Do not output raw digest in query.json
- Do not serialize type information for metrics
- Do not serialize type information for operator infos
- Clean up old pruning logic to reduce copying
@wendigo wendigo force-pushed the serafin/data-size-handling branch from 6e98bb4 to c3d385b Compare October 17, 2025 08:26
@wendigo wendigo changed the title Make query JSON more compact Make query JSON more compact for UI Oct 17, 2025
@wendigo wendigo merged commit 2b453a1 into master Oct 17, 2025
205 of 209 checks passed
@wendigo wendigo deleted the serafin/data-size-handling branch October 17, 2025 10:53
@github-actions github-actions bot added this to the 478 milestone Oct 17, 2025