Option to avoid parsing entire file? #2135

Open
hvianna opened this issue Mar 2, 2024 · 12 comments

Comments

@hvianna

hvianna commented Mar 2, 2024

Hello!

I'm having some issues when trying to retrieve the metadata of a large (15 GB) video file with parseBlob(): disk usage skyrockets and it takes about 1 minute and 20 seconds to resolve with the metadata, so it looks like it's parsing the entire file.

Sometimes the browser just crashes, or I get an out-of-memory error (having the dev tools open seems to make things worse and slower).

I tried using skipPostHeaders: true and duration: false, but it seems parseBlob() doesn't take an options object.

I'd appreciate any advice.

Kind regards.

@hvianna
Author

hvianna commented Mar 2, 2024

Update:

fetchFromUrl( url, { skipPostHeaders: true } ) also doesn't seem to prevent it from reading the entire file until it returns the metadata. At least for this particular file, which is an .mkv with an AVC video track and two audio tracks (DTS and PCM).

hvianna changed the title "Avoid parsing entire file when using parseBlob()?" to "Option to avoid parsing entire file?" on Mar 2, 2024
Borewit transferred this issue from Borewit/music-metadata-browser on Jul 10, 2024
@Borewit
Owner

Borewit commented Jul 10, 2024

Does music-metadata v9.0.0 solve your issue?

The implementation of reading from Blobs has been changed from buffering to streaming.
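To illustrate that difference using only standard Web APIs (this is not music-metadata's own code): buffering loads the whole Blob into memory, e.g. with `blob.arrayBuffer()`, whereas streaming pulls chunks from `blob.stream()` on demand, so a consumer can stop as soon as it has what it needs. `readBlobPrefix` and `readWholeBlob` are hypothetical helper names; this is a minimal sketch, not the library's implementation.

```typescript
// Hypothetical helper (not part of music-metadata): read at most maxBytes from
// the start of a Blob by pulling chunks from blob.stream(), then cancel the
// stream so no further chunks are requested.
async function readBlobPrefix(blob: Blob, maxBytes: number): Promise<Uint8Array> {
  const reader = blob.stream().getReader();
  const result = new Uint8Array(Math.min(maxBytes, blob.size));
  let filled = 0;
  try {
    while (filled < result.length) {
      const { done, value } = await reader.read();
      if (done) break;
      // Copy only as much of this chunk as still fits in the result.
      const n = Math.min(value.length, result.length - filled);
      result.set(value.subarray(0, n), filled);
      filled += n;
    }
  } finally {
    // Release the stream; no further chunks are requested after this.
    await reader.cancel();
  }
  return result.subarray(0, filled);
}

// Buffering, for contrast: the whole Blob is loaded into memory at once.
async function readWholeBlob(blob: Blob): Promise<Uint8Array> {
  return new Uint8Array(await blob.arrayBuffer());
}
```

For instance, `await readBlobPrefix(file, 64 * 1024)` would consume only the first 64 KiB of a large File. Streaming only saves work if the parser actually stops early, which is what the rest of this thread is about.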

@hvianna
Author

hvianna commented Jul 11, 2024

I'm not sure yet; music-metadata 9.0.0 gives me this error when trying to parse mkv and webm files:

[screenshot of the error]

Also, do I still need a Buffer polyfill for the browser? If I remove it, I can only retrieve metadata from flac files; everything else gives me the error below:

[screenshot of the error]

I'm testing with the following code:

// for web files (URLs)
const response = await fetch( uri );
const metadata = await parseWebStream( response.body, response.headers.get('content-type'), { skipPostHeaders: true } );

// for FileSystem API files
const file = await handle.getFile();
const metadata = await parseBlob( file );

Thanks.

@pcbowers

@Borewit Unless I'm missing something, it looks like parseWebStream is not being exported and thus cannot be used: https://github.com/Borewit/music-metadata/blob/v9.0.0/lib/index.ts#L11.

Furthermore, when I use this code:

const response = await fetch(`https://my/mp3/file`);
const metadata = await parseWebStream(response.body!, response.headers.get('content-type')!, {
  skipPostHeaders: true,
  includeChapters: true,
  skipCovers: true
});

I get this error:

TypeError [ERR_INVALID_ARG_VALUE]: The argument 'stream' must be a byte stream. Received ReadableStream { locked: false, state: 'readable', supportsBYOB: false }
    at new NodeError (node:internal/errors:405:5)
    at setupReadableStreamBYOBReader (node:internal/webstreams/readablestream:2155:11)
    at new ReadableStreamBYOBReader (node:internal/webstreams/readablestream:916:5)
    at ReadableStream.getReader (node:internal/webstreams/readablestream:352:12)
    at new WebStreamReader (file:///home/pcbowers/projects/hono/node_modules/.pnpm/[email protected]/node_modules/peek-readable/lib/WebStreamReader.js:12:30)
    at Module.fromWebStream (file:///home/pcbowers/projects/hono/node_modules/.pnpm/[email protected]/node_modules/strtok3/lib/core.js:25:36)
    at Module.parseWebStream (file:///home/pcbowers/projects/hono/node_modules/.pnpm/[email protected]/node_modules/music-metadata/lib/core.js:29:39)
    at Array.eval (/home/pcbowers/projects/hono/src/index.ts:12:48)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async getRequestListener.overrideGlobalObjects (file:///home/pcbowers/projects/hono/node_modules/.pnpm/@[email protected][email protected]/node_modules/@hono/vite-dev-server/dist/dev-server.js:69:32) {
  code: 'ERR_INVALID_ARG_VALUE'
}

I wish I knew more about it, or else I would have debugged further! I'm leaving this here instead of opening a new issue, since I think fixing this would solve "avoid parsing entire file".
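The Node.js error above says that `getReader({ mode: 'byob' })` requires a byte stream (one created with `type: 'bytes'`), while this `response.body` reports `supportsBYOB: false`. A common workaround is to wrap the source in a byte stream. The sketch below assumes a Web Streams implementation with byte-stream support (Node.js 18+ or a modern browser); `toByteStream` is a hypothetical helper, not part of music-metadata, strtok3 or peek-readable. Passing `toByteStream(response.body!)` instead of `response.body!` might avoid this particular error, but that is untested here, and this thread does not show whether it resolves the problem moved to #2143.

```typescript
// Hypothetical helper (not part of music-metadata, strtok3 or peek-readable):
// wrap a ReadableStream of Uint8Array chunks in a byte stream, so that
// getReader({ mode: 'byob' }) is accepted.
function toByteStream(source: ReadableStream<Uint8Array>): ReadableStream<Uint8Array> {
  const reader = source.getReader();
  return new ReadableStream({
    type: 'bytes',
    async pull(controller: ReadableByteStreamController) {
      for (;;) {
        const { done, value } = await reader.read();
        if (done) {
          controller.close();
          // After close(), a pending BYOB read only resolves once respond(0) is called.
          controller.byobRequest?.respond(0);
          return;
        }
        // enqueue() throws on zero-length chunks in byte streams, so skip them.
        if (value.byteLength > 0) {
          // Copy first: enqueue() transfers (detaches) the chunk's ArrayBuffer.
          controller.enqueue(new Uint8Array(value));
          return;
        }
      }
    },
    cancel(reason) {
      return reader.cancel(reason);
    },
  });
}
```

Copying each chunk costs some memory traffic, but it avoids detaching a buffer that the original source might still be using.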

@Borewit
Owner

Borewit commented Jul 12, 2024

@pcbowers, please open #2135 (comment) as a new issue; it is unrelated.

Moved #2135 (comment) to issue #2143

@Borewit
Owner

Borewit commented Jul 12, 2024

> I'm not sure yet, music-metadata 9.0.0 gives me this error when trying to parse mkv and webm files:

That was bad, do you mind giving it a try with v9.0.1, @hvianna?

@hvianna
Author

hvianna commented Jul 12, 2024

> That was bad, do you mind giving it a try with v9.0.1, @hvianna?

It works fine for flac and mp3; no more Buffer-related errors.

I'm still getting errors for webm and mkv, though.

Using parseWebStream():

TypeError: Cannot read properties of undefined (reading 'docType')
    at MatroskaParser.parse (MatroskaParser.js:50:68)
    at async parse (ParserFactory.js:57:5)
    at async retrieveMetadata (index.js:3172:17)

Using parseBlob():

Error: End-Of-Stream
    at ReadStreamTokenizer.readBuffer (ReadStreamTokenizer.js:44:19)
    at async MatroskaParser.readBuffer (MatroskaParser.js:221:9)
    at async MatroskaParser.parseContainer (MatroskaParser.js:151:39)
    at async MatroskaParser.parseContainer (MatroskaParser.js:139:33)
    at async MatroskaParser.parseContainer (MatroskaParser.js:139:33)
    at async MatroskaParser.parse (MatroskaParser.js:49:26)
    at async parse (ParserFactory.js:57:5)
    at async retrieveMetadata (index.js:3175:17)

@Borewit
Owner

Borewit commented Jul 12, 2024

parseBlob() calls parseWebStream() internally, so it is odd that you get inconsistent results:

export async function parseBlob(blob: Blob, options: IOptions = {}): Promise<IAudioMetadata> {
  const fileInfo: strtok3.IFileInfo = {mimeType: blob.type, size: blob.size};
  if (blob instanceof File) {
    fileInfo.path = (blob as File).name;
  }
  return parseWebStream(blob.stream() as any, fileInfo, options);
}

Do you experience the same issues here? https://audio-tag-analyzer.netlify.app/

@hvianna
Author

hvianna commented Jul 12, 2024

> Do you experience the same issues here? https://audio-tag-analyzer.netlify.app/

Yes, same error. I tried with a few video formats (webm, mkv, mp4).

[screenshot of the error]

File info for one of them:

General
Complete name                            : W:\DIY - Tips & Tricks - Tips in life.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 24.9 MiB
Duration                                 : 4 min 11 s
Overall bit rate                         : 828 kb/s
Frame rate                               : 30.000 FPS
Writing application                      : Lavf58.29.100

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : [email protected]
Format settings                          : CABAC / 5 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 5 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 4 min 11 s
Bit rate                                 : 692 kb/s
Width                                    : 576 pixels
Height                                   : 1 024 pixels
Display aspect ratio                     : 0.562
Frame rate mode                          : Constant
Frame rate                               : 30.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.039
Stream size                              : 20.8 MiB (84%)
Title                                    : Twitter-vork muxer
Writing library                          : x264 core 164 r3095 baee400
Encoding settings                        : cabac=1 / ref=5 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=2 / psy=0 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=0 / threads=4 / lookahead_threads=1 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / stitchable=1 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=infinite / keyint_min=30 / scenecut=40 / intra_refresh=0 / rc_lookahead=40 / rc=crf / mbtree=1 / crf=28.0 / qcomp=0.60 / qpmin=10 / qpmax=69 / qpstep=4 / vbv_maxrate=2048 / vbv_bufsize=2048 / crf_max=0.0 / nal_hrd=none / filler=0 / ip_ratio=1.40 / aq=2:1.00
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 4 min 11 s
Bit rate mode                            : Constant
Bit rate                                 : 128 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 3.84 MiB (15%)
Title                                    : Twitter-vork muxer
Default                                  : Yes
Alternate group                          : 1

@Borewit
Owner

Borewit commented Jul 12, 2024

I managed to get an end-of-stream exception as well, parsing an MP4 file.

The issue may be caused by https://github.com/Borewit/peek-readable/blob/master/lib/WebStreamReader.ts

Not something I can resolve quickly.

@hvianna
Author

hvianna commented Jul 12, 2024

No problem, thanks for investigating this.

In the meantime, I'll keep testing it with more audio files. I love the fact that my bundle size has decreased by around 100 kB with the new music-metadata, compared to the latest music-metadata-browser. Awesome job!

@hvianna
Author

hvianna commented Jul 20, 2024

I did some testing with music-metadata v9.0.3 and this is what I got:

File size   Container   Audio streams   Time to resolve
2.3 GB      mp4         aac             12 s
4.3 GB      mkv         ac3 + dts       24 s
15 GB       mkv         dts + pcm       80 s
17 GB       mkv         pcm             99 s
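As a rough sanity check on these timings, dividing file size by time to resolve gives an effective read rate per file. The sketch assumes decimal gigabytes (1 GB = 1000 MB); the table does not say which unit was used.

```typescript
// Timings copied from the table above. Sizes are assumed to be decimal
// gigabytes (1 GB = 1000 MB).
interface Run {
  sizeGB: number;
  seconds: number;
}

const runs: Run[] = [
  { sizeGB: 2.3, seconds: 12 },
  { sizeGB: 4.3, seconds: 24 },
  { sizeGB: 15, seconds: 80 },
  { sizeGB: 17, seconds: 99 },
];

// Effective rate in MB/s, assuming each file is read once from start to end.
function throughputMBps(run: Run): number {
  return (run.sizeGB * 1000) / run.seconds;
}

for (const run of runs) {
  console.log(`${run.sizeGB} GB in ${run.seconds} s -> ${throughputMBps(run).toFixed(0)} MB/s`);
}
```

This prints roughly 192, 179, 188 and 172 MB/s. A near-constant rate means time to resolve grows linearly with file size, which is consistent with the whole file being read rather than just the headers.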

It still reads the entire file, even with { skipPostHeaders: true } in the options, or if I set fileInfo.size to a small value.

I'm not sure if this can be avoided at all, since I don't think you can skip to a random position in the stream (without reading all the data up to that point sequentially).
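A ReadableStream itself is forward-only, but at the Web API level random access is possible for both sources used in this thread: `Blob.slice()` creates a new Blob for a byte range without reading the preceding data, and a server that supports HTTP range requests can return just the requested bytes. Whether music-metadata's parsers could be changed to seek like this is a separate design question that this thread does not settle. `readBlobRange` and `fetchRange` are hypothetical helper names; this is a minimal sketch.

```typescript
// Hypothetical helpers (not part of music-metadata) showing random access
// without reading the preceding bytes.

// Blob/File: slice() creates a Blob for [start, end); arrayBuffer() then
// reads only that range.
async function readBlobRange(blob: Blob, start: number, end: number): Promise<Uint8Array> {
  return new Uint8Array(await blob.slice(start, end).arrayBuffer());
}

// URL: request an inclusive byte range with an HTTP Range header.
// 206 Partial Content means the range was honoured; 200 means the server
// ignored the header and is sending the whole body.
async function fetchRange(url: string, start: number, endInclusive: number): Promise<Uint8Array> {
  const response = await fetch(url, { headers: { Range: `bytes=${start}-${endInclusive}` } });
  if (response.status !== 206) {
    await response.body?.cancel(); // avoid downloading the full body
    throw new Error(`Range request not honoured (HTTP ${response.status})`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

For example, `readBlobRange(file, file.size - 4096, file.size)` reads only the last 4 KiB of a File obtained from the File System Access API.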
