
Cannot load a gzipped JSON trace with multiple blocks #872

Closed
vmarkovtsev opened this issue Aug 29, 2024 · 9 comments

Comments

@vmarkovtsev

The Perfetto trace loader doesn't support "FEXTRA" multi-block gzip files. How to reproduce:

  1. Install https://github.com/vinlyx/mgzip
  2. Take any existing JSON trace.
  3. Compress it with mgzip:

```python
import mgzip

with open("trace.json") as fin:
    with mgzip.open("trace.json.gz", "wt", thread=8, blocksize=1 << 16) as fout:
        while buffer := fin.read(1 << 16):
            fout.write(buffer)
```

  4. Try to load the result:

```
../trace_processor --httpd trace.json.gz
JSON trace file is incomplete
```

  5. Recompressing with plain gzip works:

```
gzip -d trace.json.gz
gzip trace.json
../trace_processor --httpd trace.json.gz
```

Why does this obscure gzip format property matter to me? We train 100B-parameter base LLMs in PyTorch and produce profiles of a few hundred megabytes every few minutes. They take considerable time to compress, so compressing them in parallel on the 192 available CPU cores is a considerable win.

@LalitMaganti
Collaborator

LalitMaganti commented Aug 29, 2024

We use zlib to implement decompression. If there's a way to configure zlib to read these gzip streams, I'm happy to add support. Otherwise we would not be able to fix this, as it's a bit too niche to justify adding to trace processor.

We'd need someone external to spend a bit of time figuring out how to configure zlib to read this, and it would be doubly helpful if the fix could be contributed, as it's a very self-contained issue.

@vmarkovtsev
Author

Thanks @LalitMaganti.
I will post my intermediate investigation notes in this issue as I go, and hopefully engineer a PR sooner or later.
For now, I found this: https://stackoverflow.com/questions/65188890/what-gzip-extra-field-subfields-exist
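To see what an mgzip-produced header actually carries, the FEXTRA subfields can be decoded directly from the member header per RFC 1952, section 2.3.1.1. A minimal sketch using only the stdlib; `read_gzip_extra` is a hypothetical helper name, not part of any library:

```python
import struct

def read_gzip_extra(header: bytes):
    """Parse FEXTRA subfields from a gzip member header (RFC 1952, 2.3.1.1)."""
    assert header[:2] == b"\x1f\x8b", "not a gzip header"
    flg = header[3]
    if not flg & 0x04:  # FEXTRA flag not set
        return []
    # The fixed header is 10 bytes; XLEN (little-endian u16) follows it.
    xlen = struct.unpack_from("<H", header, 10)[0]
    extra = header[12:12 + xlen]
    fields, pos = [], 0
    while pos + 4 <= len(extra):
        si = extra[pos:pos + 2]                             # 2-byte subfield ID
        slen = struct.unpack_from("<H", extra, pos + 2)[0]  # subfield length
        fields.append((si, extra[pos + 4:pos + 4 + slen]))
        pos += 4 + slen
    return fields

# Synthetic header with FLG.FEXTRA set and one subfield "BC" of 4 bytes:
hdr = (b"\x1f\x8b\x08\x04" + bytes(6) + struct.pack("<H", 8)
       + b"BC" + struct.pack("<H", 4) + b"\x01\x02\x03\x04")
assert read_gzip_extra(hdr) == [(b"BC", b"\x01\x02\x03\x04")]
```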


@LalitMaganti
Collaborator

In practice, the use of zlib happens in https://github.com/google/perfetto/blob/master/src/trace_processor/util/gzip_utils.cc

@vmarkovtsev
Author

This is what I learned today:

  • The missing feature is support for concatenated gzip streams, which is part of RFC 1952, section 2.2:

> A gzip file consists of a series of "members" (compressed data sets).
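The member behavior is easy to demonstrate from Python's stdlib: `gzip` follows the RFC and decodes every member, while a single zlib inflate stream stops at the first member's trailer, which mirrors what the trace loader currently does. A small sketch:

```python
import gzip
import zlib

# Two independently compressed gzip members concatenated into one byte
# string, as RFC 1952 section 2.2 permits (and as mgzip produces per block).
data = gzip.compress(b'{"traceEvents": [') + gzip.compress(b']}')

# gzip.decompress reads all members...
assert gzip.decompress(data) == b'{"traceEvents": []}'

# ...but a single inflate stream stops at the end of the first member.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16+ selects the gzip wrapper
first = d.decompress(data)
assert first == b'{"traceEvents": ['
assert d.unused_data[:2] == b"\x1f\x8b"  # the second member is left unread
```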

@vmarkovtsev
Copy link
Author

Therefore, inspired by node.js, I would change this code in gzip_utils.cc:

```cpp
case Z_STREAM_END:
    return Result{ResultCode::kEof, out_size - z_stream_->avail_out};
```

to something along these lines (a sketch; note the magic-byte check belongs on the unread *input*, not on an output offset):

```cpp
case Z_STREAM_END:
    // RFC 1952 allows further members after this one. If the unread input
    // starts with another gzip magic header (0x1f 0x8b), reset the inflate
    // state and keep decompressing instead of reporting EOF.
    if (z_stream_->avail_in >= 2 && z_stream_->next_in[0] == 0x1f &&
        z_stream_->next_in[1] == 0x8b) {
      inflateReset(z_stream_.get());  // next stream detected
      return Result{ResultCode::kOk, out_size - z_stream_->avail_out};
    }
    return Result{ResultCode::kEof, out_size - z_stream_->avail_out};
```
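The same reset-and-continue loop can be prototyped in Python, where creating a fresh `decompressobj` plays the role of zlib's `inflateReset`. A sketch under stdlib-only assumptions; `gunzip_all_members` is a hypothetical name:

```python
import gzip
import zlib

def gunzip_all_members(data: bytes) -> bytes:
    """Decompress every RFC 1952 member in `data`, not just the first."""
    out = bytearray()
    while data:
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # gzip wrapper
        out += d.decompress(data)
        if not d.eof:
            raise ValueError("truncated gzip member")
        data = d.unused_data  # whatever follows this member's trailer
    return bytes(out)

# Usage: a two-member blob round-trips correctly.
blob = gzip.compress(b"hello ") + gzip.compress(b"world")
assert gunzip_all_members(blob) == b"hello world"
```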

@LalitMaganti
Collaborator

Approach seems good to me in that case, patches adding support for this are welcome: please follow https://perfetto.dev/docs/contributing/getting-started#contributing


@LalitMaganti
Collaborator

https://r.android.com/3250057 should solve the edge case I pointed out in your change :)
