
/blocks/{round} endpoint returning malformed logs (non-ASCII data only) #5807

Closed
joe-p opened this issue Oct 26, 2023 · 5 comments · Fixed by #5865
Labels
bug Something isn't working

Comments

@joe-p
Contributor

joe-p commented Oct 26, 2023

Subject of the issue

The /blocks/{round} endpoint returns malformed log data, but only when a log contains non-ASCII bytes.

Your environment

Local network in an AlgoKit Docker container. Version 3.19.0.stable [rel/stable] (commit #7037cb3b)

Steps to reproduce

  1. Submit an app call with non-ASCII logs
  2. Call the /blocks/{round} endpoint and parse the logs directly from the block

Expected behaviour

All logged data is present and can be decoded client-side.

Actual behaviour

The data is malformed: it is often (but not always) missing bytes and/or contains incorrect bytes.

Notes

A repo reproducing this bug can be seen here: https://github.com/joe-p/block-log-debug/tree/main

So far I have only confirmed this bug with logs, but I suspect it might affect other []string fields in the block, such as app arguments. I will add more test cases to the above repo to verify whether that is the case.

This only affects the /blocks/ endpoint. Many tools, such as the ATC and AlgoKit, rely on logs to parse non-ASCII data (i.e. ABI return values) from the transaction response, and they all work fine.

This is currently blocking progress on an event parsing mechanism.


Proposed Solution

The underlying causes of this issue in the /v2/blocks/{round} endpoint are deep-rooted and difficult to change. A better solution would therefore be to create a new /v2/blocks/{round}/logs endpoint that returns the logs for a given round (similar to the existing /v2/blocks/{round}/txids endpoint).

The proposed schema for this new endpoint is:

interface Response {
  logs: Array<{
    // Txid of the top-level transaction where the logs were produced
    txid: string,
    // Application ID of the app which produced the logs
    appId: number,
    // Array of log strings produced by the application. In JSON, this will be a base64 encoded string,
    // in msgpack this will be properly tagged as a binary string
    logs: string[]
  }>
}

It should be straightforward to create a response following this schema from a block.
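Consuming the JSON variant of such a response would also be straightforward. A minimal client-side sketch, assuming a Node environment with `Buffer` available; the field names come from the schema above, everything else (the interface and function names) is hypothetical:

```typescript
// Shape of the proposed /v2/blocks/{round}/logs JSON response.
interface BlockLogsResponse {
  logs: Array<{ txid: string; appId: number; logs: string[] }>;
}

// In the JSON encoding each log is a base64 string; decode each one
// back into raw bytes so non-ASCII data survives intact.
function decodeBlockLogs(resp: BlockLogsResponse): Uint8Array[][] {
  return resp.logs.map((entry) =>
    entry.logs.map((log) => Uint8Array.from(Buffer.from(log, "base64")))
  );
}
```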

@joe-p joe-p added the bug Something isn't working label Oct 26, 2023
@jasonpaulos
Member

Thanks for creating this issue and including the code to reproduce.

Problem

There are two reasons you're seeing invalid data, and I'll discuss them below.

JSON

When the block is requested as JSON, algod does not base64 encode log values. As a result, the raw binary log string gets encoded directly as a JSON string, meaning some characters are not able to be represented and will be replaced by the Unicode replacement character, � or \ufffd. Once replaced with this character, the original value cannot be recovered.

Msgpack

When the block is requested as msgpack, a binary format, there is no need to base64 encode the log values, so the encoded response does not lose any data. However, in this encoding log strings are tagged with the msgpack "string" type instead of "binary." When you decode the msgpack values in your JavaScript client, the msgpack library sees the "string" type and attempts to decode the log value as a UTF-8 string, which is not possible for all byte sequences and results in the same unrecoverable replacement characters.
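The difference between the two tags is just the leading type byte of the encoded value. A hand-rolled sketch for short payloads (the helper names are mine, not from any library):

```typescript
// msgpack fixstr: header byte 0xa0 | length (payloads up to 31 bytes).
// Decoders treat the payload as UTF-8 text, which is lossy for raw bytes.
function packFixstr(payload: Uint8Array): Uint8Array {
  if (payload.length > 31) throw new Error("fixstr holds at most 31 bytes");
  return new Uint8Array([0xa0 | payload.length, ...payload]);
}

// msgpack bin 8: marker 0xc4, one length byte, then the raw payload.
// Decoders return the bytes untouched.
function packBin8(payload: Uint8Array): Uint8Array {
  if (payload.length > 255) throw new Error("bin 8 holds at most 255 bytes");
  return new Uint8Array([0xc4, payload.length, ...payload]);
}
```

Identical payloads, one header byte apart; only the str-tagged one gets forced through UTF-8 decoding on the client.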

Example

For context, here's an example of a binary string containing invalid UTF-8 sequences being converted to a UTF-8 string.

new Uint8Array([61, 180, 118, 220, 39, 166, 43, 68, 219, 116, 105, 84, 121, 46, 122, 136, 233, 221, 15, 174, 247, 19, 50, 176, 184, 221, 66, 188, 171, 36, 135, 121])
=\ufffdv\ufffd'\ufffd+D\ufffdtiTy.z\ufffd\ufffd\ufffd\u000f\ufffd\ufffd\u00132\ufffd\ufffd\ufffdB\ufffd\ufffd$\ufffdy

or

=�v�'�+D�tiTy.z���\u000f��\u00132���B��$�y

What we can do

On the algod side

Due to backwards compatibility concerns and the fact that msgpack is the canonical encoding for Algorand blocks, unfortunately it would be extremely difficult to modify the msgpack encoding.

We might be able to change the JSON response, since it's not supposed to be canonical, but I need to do more research and think about this more.

On the client side

The msgpack spec does indicate that decoding clients should provide a way to get the raw bytes of strings for exactly this reason, but I have yet to see a JavaScript msgpack client that offers this. If you can find a client that does, either in JavaScript or another language you can use, that would make the msgpack response usable. (Go is the only language I'm 100% certain allows this, since it does not require strings to be UTF-8, which is why this works for go-algorand.)
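As a stopgap, raw bytes can be pulled out of a msgpack str value by skipping the UTF-8 step entirely. A minimal sketch that only handles a standalone fixstr value; a real client would need a full decoder:

```typescript
// Extract the raw payload of a msgpack fixstr without UTF-8 decoding it.
function rawFixstrPayload(encoded: Uint8Array): Uint8Array {
  const head = encoded[0];
  // fixstr header: top three bits are 101, low five bits are the length.
  if ((head & 0xe0) !== 0xa0) throw new Error("not a fixstr");
  const len = head & 0x1f;
  return encoded.slice(1, 1 + len);
}
```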

@joe-p
Contributor Author

joe-p commented Oct 27, 2023

meaning some characters are not able to be represented and will be replaced by the Unicode replacement character, � or \ufffd.

Perhaps I'm misunderstanding Unicode, but wouldn't this still result in the same number of bytes, just incorrect bytes? The behavior observed here is not just wrong bytes but the wrong number of bytes.

@jannotti
Contributor

jannotti commented Oct 27, 2023 via email

@jasonpaulos
Member

I just updated the issue with a proposed solution after talking with @joe-p offline.

@gmalouf
Contributor

gmalouf commented May 14, 2024

Closed by #5865.
