Option to avoid parsing entire file? #2135

hvianna · 2024-03-02T20:58:44Z

Hello!

I'm having some issues when trying to retrieve the metadata of a large (15 GB) video file with parseBlob(): disk usage skyrockets and it takes about 1 minute and 20 seconds to resolve with the metadata, so it looks like it's parsing the entire file.

Sometimes the browser just crashes or I get an out-of-memory error (having the dev tools open seems to make things worse / slower).

I tried using skipPostHeaders: true and duration: false, but it seems parseBlob() doesn't take an options object.

I'd appreciate any advice.

Kind regards.


hvianna · 2024-03-02T23:07:26Z

Update:

fetchFromUrl( url, { skipPostHeaders: true } ) also doesn't seem to prevent it from reading the entire file until it returns the metadata. At least for this particular file, which is an .mkv with an AVC video track and two audio tracks (DTS and PCM).
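For URL sources, one way to bound how much gets downloaded, independent of music-metadata, is an HTTP Range request. The following is a minimal sketch that assumes the server honours Range headers; `fetchLeadingBytes`, `FetchLike` and the `fetchImpl` parameter are hypothetical names used for illustration, not part of music-metadata. Whether the leading bytes alone contain the metadata depends on how the container was muxed.

```typescript
// Minimal fetch signature so a stand-in can be injected (hypothetical helper type).
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<Response>;

// Request only the first `byteCount` bytes of a remote file.
// Rejects if the server does not answer with 206 Partial Content,
// because a 200 response would mean the whole file is being sent.
async function fetchLeadingBytes(
  url: string,
  byteCount: number,
  fetchImpl: FetchLike = fetch
): Promise<Uint8Array> {
  const response = await fetchImpl(url, {
    headers: { Range: `bytes=0-${byteCount - 1}` },
  });
  if (response.status !== 206) {
    // Stop the transfer instead of silently downloading everything.
    await response.body?.cancel();
    throw new Error(`Range request not honoured (HTTP ${response.status})`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

The returned bytes could then be handed to a buffer-based parser, with the caveat that a parser may still fail if it needs data beyond the requested range.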

Borewit · 2024-07-10T18:45:18Z

Does music-metadata v9.0.0 solve your issue?

The implementation of reading from Blobs has been changed from buffering to streaming.

hvianna · 2024-07-11T21:07:00Z

I'm not sure yet, music-metadata 9.0.0 gives me this error when trying to parse mkv and webm files:

Also, do I still need a buffer polyfill for the browser? If I remove it, I can only retrieve metadata from flac files, everything else gives me the error below:

I'm testing with the following code:

// for web files (URLs)
const response = await fetch( uri );
const metadata = await parseWebStream( response.body, response.headers.get('content-type'), { skipPostHeaders: true } );

// for FileSystem API files
const file = await handle.getFile();
const metadata = await parseBlob( file );

Thanks.

pcbowers · 2024-07-12T04:14:37Z

@Borewit Unless I'm missing something, it looks like parseWebStream is not being exported and thus cannot be used: https://github.com/Borewit/music-metadata/blob/v9.0.0/lib/index.ts#L11.

Furthermore, on use of this code:

const response = await fetch(`https://my/mp3/file`);
const metadata = await parseWebStream(response.body!, response.headers.get('content-type')!, {
  skipPostHeaders: true,
  includeChapters: true,
  skipCovers: true
});

I get this error:

TypeError [ERR_INVALID_ARG_VALUE]: The argument 'stream' must be a byte stream. Received ReadableStream { locked: false, state: 'readable', supportsBYOB: false }
    at new NodeError (node:internal/errors:405:5)
    at setupReadableStreamBYOBReader (node:internal/webstreams/readablestream:2155:11)
    at new ReadableStreamBYOBReader (node:internal/webstreams/readablestream:916:5)
    at ReadableStream.getReader (node:internal/webstreams/readablestream:352:12)
    at new WebStreamReader (file:///home/pcbowers/projects/hono/node_modules/.pnpm/[email protected]/node_modules/peek-readable/lib/WebStreamReader.js:12:30)
    at Module.fromWebStream (file:///home/pcbowers/projects/hono/node_modules/.pnpm/[email protected]/node_modules/strtok3/lib/core.js:25:36)
    at Module.parseWebStream (file:///home/pcbowers/projects/hono/node_modules/.pnpm/[email protected]/node_modules/music-metadata/lib/core.js:29:39)
    at Array.eval (/home/pcbowers/projects/hono/src/index.ts:12:48)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async getRequestListener.overrideGlobalObjects (file:///home/pcbowers/projects/hono/node_modules/.pnpm/@[email protected][email protected]/node_modules/@hono/vite-dev-server/dist/dev-server.js:69:32) {
  code: 'ERR_INVALID_ARG_VALUE'

I wish I knew more about it or else I would have debugged further! Leaving this here instead of on a new issue since I think fixing this would solve "avoid parsing entire file"
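The trace above shows the failure happening while a BYOB reader is being acquired on a stream that reports `supportsBYOB: false`. A small check like the one below can tell in advance whether a given stream is a byte stream. This is only a diagnostic sketch; `supportsByobReader` is a hypothetical helper name and not something music-metadata provides.

```typescript
// Returns true if a BYOB reader can be acquired on the stream,
// which is only possible for byte streams (type: 'bytes').
// Also returns false when the stream is already locked.
function supportsByobReader(stream: ReadableStream<Uint8Array>): boolean {
  try {
    const reader = stream.getReader({ mode: 'byob' });
    // Release immediately so the caller can still consume the stream.
    reader.releaseLock();
    return true;
  } catch {
    return false;
  }
}
```

Calling it on `response.body` before passing the stream on would distinguish this error from a parsing problem.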

Borewit · 2024-07-12T05:34:10Z

~~Please put #2135 (comment) as a new issue @pcbowers , it is unrelated.~~

Moved #2135 (comment) to issue #2143

Borewit · 2024-07-12T10:07:21Z

> I'm not sure yet, music-metadata 9.0.0 gives me this error when trying to parse mkv and webm files:

That was bad, do you mind giving it a try with v9.0.1 @hvianna ?

hvianna · 2024-07-12T17:45:40Z

> > I'm not sure yet, music-metadata 9.0.0 gives me this error when trying to parse mkv and webm files:
>
> That was bad, do you mind giving it a try with v9.0.1 @hvianna ?

It works fine for flac and mp3, no more Buffer-related errors.

I'm still getting errors for webm and mkv, though.

using parseWebStream():

TypeError: Cannot read properties of undefined (reading 'docType')
    at MatroskaParser.parse (MatroskaParser.js:50:68)
    at async parse (ParserFactory.js:57:5)
    at async retrieveMetadata (index.js:3172:17)

using parseBlob():

Error: End-Of-Stream
    at ReadStreamTokenizer.readBuffer (ReadStreamTokenizer.js:44:19)
    at async MatroskaParser.readBuffer (MatroskaParser.js:221:9)
    at async MatroskaParser.parseContainer (MatroskaParser.js:151:39)
    at async MatroskaParser.parseContainer (MatroskaParser.js:139:33)
    at async MatroskaParser.parseContainer (MatroskaParser.js:139:33)
    at async MatroskaParser.parse (MatroskaParser.js:49:26)
    at async parse (ParserFactory.js:57:5)
    at async retrieveMetadata (index.js:3175:17)

Borewit · 2024-07-12T17:58:27Z

parseBlob() calls parseWebStream() internally, so it is weird that you get inconsistent results.

music-metadata/lib/core.ts

Lines 23 to 29 in d6c2755

    
export async function parseBlob(blob: Blob, options: IOptions = {}): Promise<IAudioMetadata> {
  const fileInfo: strtok3.IFileInfo = {mimeType: blob.type, size: blob.size};
  if (blob instanceof File) {
    fileInfo.path = (blob as File).name;
  }
  return parseWebStream(blob.stream() as any, fileInfo, options);
}

Do you experience the same issues here?: https://audio-tag-analyzer.netlify.app/

hvianna · 2024-07-12T18:04:53Z

> Do you experience the same issues here?: https://audio-tag-analyzer.netlify.app/

Yes, same error. I tried with a few video formats (webm, mkv, mp4).

Fileinfo of one of them:

General
Complete name                            : W:\DIY - Tips & Tricks - Tips in life.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 24.9 MiB
Duration                                 : 4 min 11 s
Overall bit rate                         : 828 kb/s
Frame rate                               : 30.000 FPS
Writing application                      : Lavf58.29.100

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : [email protected]
Format settings                          : CABAC / 5 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 5 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 4 min 11 s
Bit rate                                 : 692 kb/s
Width                                    : 576 pixels
Height                                   : 1 024 pixels
Display aspect ratio                     : 0.562
Frame rate mode                          : Constant
Frame rate                               : 30.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.039
Stream size                              : 20.8 MiB (84%)
Title                                    : Twitter-vork muxer
Writing library                          : x264 core 164 r3095 baee400
Encoding settings                        : cabac=1 / ref=5 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=2 / psy=0 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=0 / threads=4 / lookahead_threads=1 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / stitchable=1 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=infinite / keyint_min=30 / scenecut=40 / intra_refresh=0 / rc_lookahead=40 / rc=crf / mbtree=1 / crf=28.0 / qcomp=0.60 / qpmin=10 / qpmax=69 / qpstep=4 / vbv_maxrate=2048 / vbv_bufsize=2048 / crf_max=0.0 / nal_hrd=none / filler=0 / ip_ratio=1.40 / aq=2:1.00
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 4 min 11 s
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 3.84 MiB (15%)
Title                                    : Twitter-vork muxer
Default                                  : Yes
Alternate group                          : 1

Borewit · 2024-07-12T18:15:43Z

I managed to get an end-of-stream exception as well, parsing an MP4 file.

Issue may be caused by https://github.com/Borewit/peek-readable/blob/master/lib/WebStreamReader.ts

Not something I can resolve quickly.

hvianna · 2024-07-12T18:29:56Z

No problem, thanks for investigating this.

In the meantime, I'll keep testing it with more audio files. I love the fact that my bundle size has decreased around 100 kB with the new music-metadata, compared to the latest music-metadata-browser. Awesome job!

hvianna · 2024-07-20T15:10:36Z

I did some testing with music-metadata v9.0.3 and this is what I got:

| file size | container | audio streams | time to resolve |
| --- | --- | --- | --- |
| 2.3 GB | mp4 | aac | 12 s |
| 4.3 GB | mkv | ac3 + dts | 24 s |
| 15 GB | mkv | dts + pcm | 80 s |
| 17 GB | mkv | pcm | 99 s |

It still reads the entire file, even with { skipPostHeaders: true } in the options, or if I set fileInfo.size to a small value.
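A quick way to check that claim against the numbers above is to divide file size by time to resolve. If the whole file is read sequentially, the resulting throughput should be roughly constant across files. The sketch below assumes decimal units (1 GB = 1000 MB); the `Measurement` type and `throughputMBps` function are names made up for this illustration.

```typescript
interface Measurement {
  sizeGB: number;    // file size from the measurements above
  container: string;
  seconds: number;   // time to resolve
}

const measurements: Measurement[] = [
  { sizeGB: 2.3, container: 'mp4', seconds: 12 },
  { sizeGB: 4.3, container: 'mkv', seconds: 24 },
  { sizeGB: 15, container: 'mkv', seconds: 80 },
  { sizeGB: 17, container: 'mkv', seconds: 99 },
];

// Effective read rate in MB/s, assuming 1 GB = 1000 MB.
function throughputMBps(m: Measurement): number {
  return (m.sizeGB * 1000) / m.seconds;
}

for (const m of measurements) {
  console.log(`${m.sizeGB} GB ${m.container}: ${throughputMBps(m).toFixed(0)} MB/s`);
}
```

All four files land in a narrow band of roughly 170 to 190 MB/s, so the time grows linearly with file size, which is what a full sequential read would look like.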

I'm not sure if this can be avoided at all, since I don't think you can skip to a random position in the stream (without reading all the data up to that point sequentially).
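A stream is indeed sequential, but a Blob itself allows random access through `Blob.slice()`, and browsers can typically serve a slice of a file-backed Blob without reading the bytes before it. The following is only a sketch of that idea, not how music-metadata reads Blobs in the versions discussed here; `readBlobRange` is a hypothetical helper name.

```typescript
// Read `length` bytes starting at byte offset `start` from a Blob,
// clamped to the end of the Blob. Only the requested slice is
// materialized in memory.
async function readBlobRange(
  blob: Blob,
  start: number,
  length: number
): Promise<Uint8Array> {
  const end = Math.min(start + length, blob.size);
  const slice = blob.slice(start, end);
  return new Uint8Array(await slice.arrayBuffer());
}
```

A parser built on top of a reader like this could jump straight to an offset given by a container's index instead of consuming everything up to that point.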

hvianna changed the title ~~Avoid parsing entire file when using parseBlob()?~~ Option to avoid parsing entire file? Mar 2, 2024

Borewit transferred this issue from Borewit/music-metadata-browser Jul 10, 2024

Borewit mentioned this issue Jul 12, 2024

music-metadata 9.0.0 has still Buffer dependencies #2141

Closed

Borewit mentioned this issue Jul 12, 2024

parseWebStream not exported in Node.js entry point #2143

Closed

Borewit mentioned this issue Jul 13, 2024

Issue parsing Matroska files using browser Web Streams #2145

Closed

Borewit added the improvement label Jul 13, 2024
