Can we utilize multiple processors/GPU to fasten up rendering? #101

adityagupta1089 · 2021-05-15T18:07:11Z

We can divide the frames amongst the CPU/GPU cores and then combine them together.

pssoft7 · 2021-06-21T14:20:47Z

I'd like to upvote this.

Ocawesome101 · 2022-02-09T15:27:31Z

This would be awesome. My main gripe with rendering using MIDIVisualizer is how slow it is, even on my reasonably fast laptop.

erickim555 · 2022-12-09T10:50:52Z

If I'm reading the code correctly, this is how MIDIVisualizer generates and exports the MIDI visualizations:

"[MIDIVisualizer issues OpenGL draw commands to draw a given frame] -> [OpenGL executes the draw commands and waits for the frame to be fully rendered for the user] -> [MIDIVisualizer then copies the GUI's framebuffer contents into the exported video frame]".

Note that this is done sequentially, frame-by-frame.

So, refactoring to make better use of the GPU could involve the following. You'd want to refactor the code to issue batched render requests to minimize CPU<->GPU data transfer. To illustrate, rather than asking the GPU to draw each frame one at a time ("please draw frame 0. wait. receive frame 0's rendered pixels. please draw frame 1. wait. receive frame 1's rendered pixels. please draw frame 2..."), we'd want to ask the GPU to render N frames at a time, say N=64 (N has to be small enough that we don't exceed the host's available GPU memory).

At first glance, seems like a fairly non-trivial refactor. Definitely do-able though!

Perhaps an easier alternative approach is to do a "divide-and-conquer" CPU-based approach:
(1) Evenly split the input MIDI file into N chunks (ideally dividing by real time rather than by number of MIDI events, though the latter is probably fine for a first implementation).
(2) Call the render() function on each MIDI chunk to generate N output video subfiles, then concatenate the N subfiles into the final output video file. Notably, each render() call should execute in its own thread to take advantage of parallelism (e.g. multiple processors).
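A minimal Python sketch of the split-and-render idea, for illustration only. `split_time_range` and `render_chunks_in_parallel` are hypothetical helpers, and `render_chunk` stands in for whatever would actually render one time range (for example, launching a separate MIDIVisualizer process); none of these exist in MIDIVisualizer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_time_range(duration_s, n_chunks):
    """Split [0, duration_s] into n_chunks contiguous (start, end) ranges in seconds."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    step = duration_s / n_chunks
    # Derive each boundary from its index so rounding error does not accumulate,
    # and pin the last boundary to the exact duration.
    bounds = [i * step for i in range(n_chunks)] + [duration_s]
    return list(zip(bounds[:-1], bounds[1:]))


def render_chunks_in_parallel(ranges, render_chunk, max_workers=None):
    """Call render_chunk on every range using a thread pool.

    render_chunk is a hypothetical callable. Results come back in the same
    order as the input ranges, which is the order needed for concatenation.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_chunk, ranges))
```

Splitting by time rather than by event count keeps the chunks roughly equal in rendered length, which is what determines how evenly the work is balanced.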

If you're interested, here's the entry point for exporting a MIDI file visualization from the CLI: https://github.com/kosua20/MIDIVisualizer/blob/master/src/rendering/Renderer.cpp#L1365

erickim555 · 2022-12-09T21:19:00Z

By checking my resource utilization, I found that MIDIVisualizer is already using the GPU during rendering (likely because OpenGL automatically uses the GPU if one is available, neat!). Here are some measurements:

Interestingly, GPU utilization is quite low, ~30% GPU util. And, CPU util is quite low: ~6% for the MIDIVisualizer process. So, the bottleneck is elsewhere.

I'm willing to bet that the bottleneck is disk writing. If you look at the second image above, during rendering my system is writing ~3.5 MB/sec to the output video file, which is the max write speed of my drive (Samsung SSD 970 EVO Plus 1TB).

Based on the above, I wonder if the code is indeed being bottlenecked by disk writes? If so, I wonder if there's a way to improve pipelining so that we decouple disk writing (eg video encoding) and rendering. Since CPU/GPU util is low, it appears that rendering can outpace disk writing.
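To make the pipelining idea concrete, here is a rough producer/consumer sketch in Python. It is not how MIDIVisualizer is structured; `render_frame` and `encode_frame` are hypothetical callables standing in for the render step and the encode/write step.

```python
import queue
import threading


def run_pipeline(n_frames, render_frame, encode_frame, max_pending=8):
    """Render frames on the calling thread while a worker thread encodes them.

    The bounded queue caps memory use: if encoding is slower than rendering,
    the renderer blocks once max_pending frames are waiting.
    """
    pending = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel telling the worker that no more frames will come
    errors = []

    def encoder():
        while True:
            frame = pending.get()
            if frame is done:
                return
            if errors:
                continue  # keep draining so the renderer never blocks forever
            try:
                encode_frame(frame)
            except Exception as exc:
                errors.append(exc)

    worker = threading.Thread(target=encoder)
    worker.start()
    try:
        for i in range(n_frames):
            pending.put(render_frame(i))
    finally:
        pending.put(done)
        worker.join()
    if errors:
        raise errors[0]
```

With a single consumer and a FIFO queue, frames reach the encoder in render order, which matters because video frames must be encoded in sequence.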

asl97 · 2022-12-12T00:34:35Z

> I'm willing to bet that the bottleneck is disk writing. If you look at the second image above, during rendering my system is writing ~3.5 MB/sec to the output video file, which is the max write speed of my drive (Samsung SSD 970 EVO Plus 1TB).

First, let me say that an NVMe SSD can do way more than 3.5 MB/s. Based on your now seemingly deleted comment, you were using MPEG4, which would be a sequential write, and that drive should be more than capable of 3000 MB/s. If it really can only do 3.5 MB/s, then something is really wrong with your system.

Even with a clean 100% separation of rendering and encoding, the most you would get out of that is a 2x speedup.
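The 2x bound follows from simple arithmetic: if a frame takes r seconds to render and e seconds to encode, doing both in sequence costs r + e per frame, while a perfectly overlapped pipeline is still limited by the slower stage, max(r, e). A small sketch of that calculation (the function name is made up for illustration, and it assumes an idealized pipeline with no overhead):

```python
def pipelined_speedup(render_s, encode_s):
    """Ideal speedup from fully overlapping render and encode for each frame."""
    sequential = render_s + encode_s      # one stage after the other
    pipelined = max(render_s, encode_s)   # the slower stage sets the pace
    return sequential / pipelined
```

The ratio reaches 2 only when both stages take the same time; if encoding dominates, the gain shrinks toward 1.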

The limitation comes from the video encoding, which isn't easy to multithread: the data for each frame depends on the frame before it, so adding threads to the encoding process tends to reduce quality and cause artifacts.

The screenshot pretty much says it all: it is limited by how fast a single core of your computer can encode, as shown by the flat line in the CPU usage. A simple test you could do is to start multiple recording processes at once. An example is provided below, based on your now deleted comment. You should find that the time it takes to record scales very well, with each instance writing an additional ~3.5 MB/s, based on the specific command in your deleted example.

Examples

The former should be around 3x slower than the latter.

example1.bat

.\MIDIVisualizer --midi "C:\Users\Eric\Documents\REAPER Media\youtube_video_record_settings\youtube_video_record_settings_keyboard\youtube_video_record_settings_keyboard_bpm240.mid" --size 1920 1080 --export video1.mp4 --format MPEG4

.\MIDIVisualizer --midi "C:\Users\Eric\Documents\REAPER Media\youtube_video_record_settings\youtube_video_record_settings_keyboard\youtube_video_record_settings_keyboard_bpm240.mid" --size 1920 1080 --export video2.mp4 --format MPEG4

.\MIDIVisualizer --midi "C:\Users\Eric\Documents\REAPER Media\youtube_video_record_settings\youtube_video_record_settings_keyboard\youtube_video_record_settings_keyboard_bpm240.mid" --size 1920 1080 --export video3.mp4 --format MPEG4

example2.bat

START /B .\MIDIVisualizer --midi "C:\Users\Eric\Documents\REAPER Media\youtube_video_record_settings\youtube_video_record_settings_keyboard\youtube_video_record_settings_keyboard_bpm240.mid" --size 1920 1080 --export video1.mp4 --format MPEG4

START /B .\MIDIVisualizer --midi "C:\Users\Eric\Documents\REAPER Media\youtube_video_record_settings\youtube_video_record_settings_keyboard\youtube_video_record_settings_keyboard_bpm240.mid" --size 1920 1080 --export video2.mp4 --format MPEG4

START /B .\MIDIVisualizer --midi "C:\Users\Eric\Documents\REAPER Media\youtube_video_record_settings\youtube_video_record_settings_keyboard\youtube_video_record_settings_keyboard_bpm240.mid" --size 1920 1080 --export video3.mp4 --format MPEG4
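For anyone who wants to time this comparison on another platform, the same experiment can be sketched in Python. The flags are the ones used in the commands above; the executable path and the helper names are assumptions for illustration.

```python
import subprocess
import time


def export_command(midi_path, out_path, exe="./MIDIVisualizer"):
    """Build one export command using the flags from the commands above.

    The default executable path is an assumption; adjust it for your install.
    """
    return [exe, "--midi", midi_path, "--size", "1920", "1080",
            "--export", out_path, "--format", "MPEG4"]


def run_concurrently(commands):
    """Start every command at once and wait for all of them.

    Returns (exit_codes, elapsed_seconds).
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [proc.wait() for proc in procs]
    return codes, time.perf_counter() - start
```

Calling `run_concurrently` with three export commands corresponds to the concurrent variant; calling it three times with one command each corresponds to the sequential one.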

As for splitting a single MIDI file, rendering the parts separately, and merging them afterwards: the merging will take just as long as the encoding step, because the video basically needs to be re-encoded to merge properly, not to mention the artifacts that come with re-encoding. If you really want to try it, the midicopy program can do the splitting for you.

As for PNG multithreading, where a write bottleneck is more likely since it's basically random writes, that is already done: 92f86f0

erickim555 · 2022-12-12T00:55:48Z

> First, let me say that an NVMe SSD can do way more than 3.5 MB/s. Based on your now seemingly deleted comment, you were using MPEG4, which would be a sequential write, and that drive should be more than capable of 3000 MB/s. If it really can only do 3.5 MB/s, then something is really wrong with your system.

Ah yup, you're right, I was misreading the drive specs, >3000MBps is indeed my max write speed, not 3MBps. So not disk-write-bottlenecked, and it seems plausible from your info that video encoding is the bottleneck (as it's not able to effectively utilize all cores).

Thanks for the insights! Very helpful.

Regarding splitting MIDI + concat-ing the N video files: I'm not an expert with video codec formats, but it seems that it's possible to do an efficient concatenation (without any re-encoding required) for certain video formats (MPEG-2 seems to be one), but for other video formats (e.g. MPEG-4) a re-encode is seemingly required: https://stackoverflow.com/a/11175851

So, there could be a route forward with the split+merge approach for certain video formats. I'm not sure yet what the pros and cons of each format are, though.
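As a rough sketch of what a no-re-encode merge could look like, the snippet below builds the input list and command line for ffmpeg's concat demuxer with stream copy (`-c copy`). The helper names and file names are made up for illustration. Stream copy only works when all segments share the same codec and encoding parameters, and whether the result is clean for MIDIVisualizer's output is untested here.

```python
def concat_list_text(segment_paths):
    """Return the text of an ffmpeg concat-demuxer list file, one entry per segment."""
    lines = []
    for path in segment_paths:
        # Inside a single-quoted entry, a literal ' is written as '\''
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path, out_path):
    """Build an ffmpeg command that joins the listed segments without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", out_path]
```

The list text would be written to a file (say `parts.txt`) and the command passed to `subprocess.run`.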

Regarding difficulty to multi-thread video encoding: I see your points. I think I'd have to dig deeper into video encoding implementations/algs (particularly for various popular video formats, eg mpeg-2 vs mpeg-4), and see what are the industry-standard high-perf encoding techniques these days. Maybe there's some new tricks/libraries that can greatly accelerate things? Or maybe we should be considering other video formats that are more performant?

Then, there's the option of doing GPU video encoding, which I'm not sure MIDIVisualizer supports right now. FFmpeg does allow GPU encoding, but it seems that the user would have to configure their system to enable FFmpeg + GPU encoding: https://stackoverflow.com/questions/44510765/gpu-accelerated-video-processing-with-ffmpeg

At the end of the day, the current perf isn't a deal breaker. By tuning some export settings, I was able to get export time down quite a bit (eg for a 40sec MIDI clip, takes ~17 secs to export) with still acceptable quality. It's a fun little rabbit hole to go into though!

adityagupta1089 added the enhancement label May 15, 2021

kosua20 added the sorted label Nov 25, 2022
