
RGB normalization for motion detection, and use of threshold #172

Closed

MV10 opened this issue Aug 23, 2020 · 31 comments

@MV10
Collaborator

MV10 commented Aug 23, 2020

Hi Ian, in one of our discussions in a recent PR, you said something to the effect that using the motion detection threshold value inside the pixel-level loop as well as outside might be confusing. The comment didn't stick with me for some reason and we got sidetracked into other topics. However, I think you were on to something. As you know, within the loop, the threshold is compared to the summed difference of the RGB values of individual pixels, but externally it represents the number of pixels which differ. Those are two wildly different concepts, and purely by coincidence, I ran a test where two very different images didn't register as different using the FrameDiffAnalyser algorithm. But before I go into that, let me show you what I was working on that led to this.

This one is pretty cool. RGB normalization calculates the proportion each color channel contributes to the color, rather than the intensity of each channel (which is what RGB normally signifies). I've read that this largely eliminates lighting differences -- and it seems to work reasonably well. It's simple to calculate (probably another great candidate for an OpenGL transform).

// f.Bytes is interleaved RGB24 data; index points at the red byte of the current pixel.
// Each channel is rescaled to its proportion of the pixel's total intensity.
float r = f.Bytes[index];
float g = f.Bytes[index + 1];
float b = f.Bytes[index + 2];
float sum = r + g + b;
if (sum == 0) return; // pure black, nothing to normalize
f.Bytes[index] = (byte)((r / sum) * 255f);
f.Bytes[index + 1] = (byte)((g / sum) * 255f);
f.Bytes[index + 2] = (byte)((b / sum) * 255f);

(I'm not using MMALSharp for this yet, it's a stand alone program I wrote so I could play with the algorithms with static images for repeatability.)

Below, A and B are 1296x972 BMPs captured by my Pi (pointed at my office closet where there will be no unexpected motion). To simulate false motion detection due to lighting changes, I shined one of those insanely bright LED flashlights on the door. The black and white image C shows all the pixels the current FrameDiffAnalyser algorithm would flag as different ... 179,316 pixels to be exact -- rather higher than our 130 threshold (which, to be fair, is optimized for 640x480, at least in terms of total diff count). In the second row of images, X and Y are the RGB-normalized versions of A and B, and Z shows only 73 pixels as different!

(You may notice I accidentally left date/time overlays turned on, so those contribute a little bit to the differences. But one interesting effect of normalization -- which may turn out to be a problem, I don't know yet -- is that bright white colors average down to a neutral gray -- the timestamp overlay disappears in the normalized images.)

image

To test that a truly different image will register as motion, I set up two new stills (no annotation!), A and B below, and to my very great surprise the FrameDiffAnalyser algorithm only "sees" four different pixels! Apparently my hand is close enough to the (summed RGB) wall color that the threshold system fails completely. I've actually run this test quite a few times with different image pairs and different amounts of my arm in-frame and in different places, and when my arm is in front of the yellow wall, it fails this way consistently. (This also explains why my testing here in my office sometimes seemed to "lag" when I was intentionally triggering motion -- my hand was in-frame and probably still not being detected.)

At first I was excited -- the normalized images resulted in a diff of more than 500 pixels -- but look at the Z bitmap, which highlights where the differences are. It's just noise. So from that, I conclude the RGB sum/subtraction/diff process is not especially reliable.

image

The bad news is that I don't have anything in mind to fix this, but since my plan with this motion detection test program was to try new things, maybe I'll figure something out.

I also noticed that you have converters for color spaces; I suspect just looking for H-channel differences in something like HSV may work better.

I'm definitely open to suggestions, as I'm certain you've researched all of this far more than I have!

@MV10
Collaborator Author

MV10 commented Aug 23, 2020

Hue distance thresholds and (very low) saturation thresholds look promising. (I guess RGB normalization is basically the HSV hue channel...)

image

@MV10
Collaborator Author

MV10 commented Aug 25, 2020

While the image above seemed promising, and with further tuning I got even better results, for some reason (that I didn't bother digging into) it was really bad at detecting black. I wore a black T-shirt in one test and it didn't detect any part of that as a difference. So if you were ever attacked by professional ninjas, well, you'd be in big trouble. Worse, I found a bug in my HSV conversion; fixing that didn't improve the Black T-Shirt Ninja issue, and it added lots of noise to the light-and-shadow test. So now I'm using the MMALSharp conversions, figuring I'll end up using those anyway.

However, HSV is still looking promising. With a totally different approach, I'm getting good, repeatable results now with a variety of test image pairs. The basic steps for the example below are to convert each pixel to HSV (I'm hoping MMAL can simply output in HSV?), then apply the following rules (there's a rough code sketch after the list):

  1. If the pixel value-channel from both frames is below a minimum threshold, skip it.
  2. Calculate the differences in each of the HSV channels.
  3. If the hue difference is below a minimum threshold, skip it.
  4. Apply multipliers to each of the HSV channel diffs (each channel has a separate modifier).
  5. Add the modified diffs; the pixel is different if the total exceeds a minimum confidence threshold.
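
In code, the per-pixel test looks roughly like this. All of the threshold and multiplier values here are placeholders I made up for illustration, and the HSV conversion is just the textbook formula -- none of this is MMALSharp code, it's only a sketch of the rules above:

using System;

public static class HsvDiffSketch
{
    // Hypothetical tuning values.
    const float MinValue = 0.1f;   // rule 1: skip when both pixels are darker than this
    const float MinHueDiff = 5f;   // rule 3: skip hue differences (degrees) below this
    const float HueWeight = 1f;    // rule 4: per-channel multipliers
    const float SatWeight = 50f;
    const float ValWeight = 50f;
    const float Confidence = 25f;  // rule 5: minimum combined score

    public static bool PixelIsDifferent(byte r1, byte g1, byte b1, byte r2, byte g2, byte b2)
    {
        var (h1, s1, v1) = RgbToHsv(r1, g1, b1);
        var (h2, s2, v2) = RgbToHsv(r2, g2, b2);

        // 1. Both pixels are too dark to compare reliably.
        if (v1 < MinValue && v2 < MinValue) return false;

        // 2. Channel differences (hue wraps around 360 degrees).
        float dh = Math.Abs(h1 - h2);
        if (dh > 180f) dh = 360f - dh;
        float ds = Math.Abs(s1 - s2);
        float dv = Math.Abs(v1 - v2);

        // 3. Hue shift too small to matter.
        if (dh < MinHueDiff) return false;

        // 4 and 5. Weighted sum of the diffs against a confidence threshold.
        return (dh * HueWeight) + (ds * SatWeight) + (dv * ValWeight) > Confidence;
    }

    // Standard RGB -> HSV conversion: H in degrees [0, 360), S and V in [0, 1].
    private static (float h, float s, float v) RgbToHsv(byte r, byte g, byte b)
    {
        float rf = r / 255f, gf = g / 255f, bf = b / 255f;
        float max = Math.Max(rf, Math.Max(gf, bf));
        float min = Math.Min(rf, Math.Min(gf, bf));
        float delta = max - min;

        float h = 0f;
        if (delta > 0f)
        {
            if (max == rf) h = 60f * (((gf - bf) / delta) % 6f);
            else if (max == gf) h = 60f * (((bf - rf) / delta) + 2f);
            else h = 60f * (((rf - gf) / delta) + 4f);
            if (h < 0f) h += 360f;
        }

        return (h, max == 0f ? 0f : delta / max, max);
    }
}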

image

I think combining this with a couple more steps could produce very solid results:

  1. Divide the image into smaller cells as my recent PR did.
  2. Store the pixel diff count per cell.
  3. Set a minimum pixel diff per cell ... below this, any diffs are ignored for the cell.
  4. Optionally require the minimum pixel diff count across a minimum number of cells.

Step 3 is a fast and easy improvement because it ignores "sparkles" and other cells with minor artifacts (whereas today those contribute to the overall diff count for the image). I think step 4 would help in my desire to implement an "ignore my small dog" mode without the heavier effort to carry out the cell proximity analysis I was considering earlier.
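
As a rough sketch of steps 2 through 4 (the names and any numbers you'd feed into this are placeholders, not library code), the cell aggregation boils down to:

public static class CellAggregationSketch
{
    public static bool DetectMotion(
        int[,] changedPixelsPerCell, // step 2: diff count already accumulated per cell
        int minPixelsPerCell,        // step 3: cells below this are ignored entirely
        int minTriggeredCells)       // step 4: number of qualifying cells required
    {
        int triggered = 0;

        foreach (int count in changedPixelsPerCell)
        {
            if (count >= minPixelsPerCell)
            {
                triggered++;
            }
        }

        return triggered >= minTriggeredCells;
    }
}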

Finally, assuming all this stuff doesn't drag us down into 3FPS territory 😬 ... I suspect it'll be useful to require motion detection across some minimum number of frames before calling onDetect. But I'll wait to worry about that until I have a good setup watching somewhere outside where light and shadow make things extra tricky.

@MV10
Collaborator Author

MV10 commented Aug 27, 2020

I'm experimenting with adding this to PR #169.

@techyian
Owner

Thanks Jon. I have been reading your updates and I'm finding your research interesting. I must admit, my own research around this didn't progress much further than RGB diff checking and I'm very much a newbie in this area. I think the stage you've got things to is impressive and I'll continue to read your feedback around HSV difference.

I'm hoping MMAL can simply output in HSV?

I'm not aware of this. Natively, MMAL will output each frame as YUV420, as this is the pixel format the camera modules use themselves. Telling MMAL to output as RGB causes data to go through the image conversion block, so there is a performance hit there. I'd be interested to hear how fast the HSV software conversion works in the MMALColor class. Eventually it might make sense to invest some time into integrating with OpenGL and doing these operations on the GPU, but I've not looked at how straightforward this would be.

@MV10
Collaborator Author

MV10 commented Aug 27, 2020

I should have mentioned that I understand summed RGB diff comparison is a perfectly common technique, and as far as I can see it's implemented correctly. I hope I didn't sound like I was criticizing too much. I'm a total newbie, too!

I have that last approach working locally in that PR branch, but I definitely need to take a step back and set up to record and process video file input for repeatable true-motion testing. It's a very interesting problem to me, but I don't think I'll have definite conclusions any time soon, so I'm going to stash those changes and leave that branch as-is.

@MV10
Collaborator Author

MV10 commented Aug 27, 2020

And now I'm running into something that is making me feel very stupid -- how to actually get the raw stream written to a file. I thought I could do something as simple as this, but output port configuration fails when I use OPAQUE and all the other video formats are lossy:

using (var capture = new VideoStreamCaptureHandler(rawPathname))
using (var encoder = new MMALVideoEncoder())
{
    var portCfg = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.RGB24, width: 640, height: 480, framerate: 24, zeroCopy: true);
    encoder.ConfigureOutputPort(portCfg, capture);
    cam.Camera.VideoPort.ConnectTo(encoder);

    await Task.Delay(2000);
    var cts = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
    await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
}

Since that wasn't working, I wondered if I could use OPAQUE to read raw data from a stream, but as you know, that doesn't work either. I produced a few seconds of raw RGB data using raspivid (-r foo.raw -rf rgb switches) but I don't know how to read it, or whether it actually can be read.

Is this another edge case where I should just write something myself (which I think is probably easy), or am I just being obtuse and overlooking something obvious?

@techyian
Owner

Don't worry about it, you're not being stupid. The OPAQUE format is a proprietary Broadcom format used internally within MMAL, it's essentially a pointer to the image data which makes transmission between components more efficient. If you want to record raw video frames the easiest thing to do is attach a splitter to the camera's video port and then attach a capture handler directly to one of the splitter's output ports, as we have been doing for motion detection in previous examples. When initialising the MMALPortConfig object, set the encodingType and pixelFormat parameters to a pixel format, such as MMALEncoding.RGB24.

Let me know if you're still struggling. This example should help you.

@MV10
Collaborator Author

MV10 commented Aug 27, 2020

Nice. Can I read with RGB24 in both slots, too?

@techyian
Owner

Yep. If you look at the motion detection example in the wiki, you can see that we're getting the raw frames via the resizer component, but that's still using RGB24 for both parameters in the MMALPortConfig object.

@MV10
Collaborator Author

MV10 commented Aug 27, 2020

Thank you, I think I've gotten it working. I wanted to save through the resizer so the raw stream is the same as the motion detection processes receive (plus I write to ramdisk so space is limited).

Save:

using (var capture = new VideoStreamCaptureHandler(rawPathname))
using (var splitter = new MMALSplitterComponent())
using (var resizer = new MMALIspComponent())
{
    splitter.ConfigureInputPort(new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420), cam.Camera.VideoPort, null);
    resizer.ConfigureOutputPort<VideoPort>(0, new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480), capture);
    cam.Camera.VideoPort.ConnectTo(splitter);
    splitter.Outputs[0].ConnectTo(resizer);

    await Task.Delay(2000);
    var cts = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
    await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
}

Read and output to h.264 ... and I guess the default quality is amazingly bad -- I thought it was broken; 215MB of raw frames (no motion) compressed to just 26K, lol. I need to make this an MP4 to be sure it's all writing, but I think it works:

using (var stream = File.OpenRead(rawPathname))
using (var input = new InputCaptureHandler(stream))
using (var splitter = new MMALSplitterComponent())
using (var output = new VideoStreamCaptureHandler(h264Pathname))
using (var encoder = new MMALVideoEncoder())
{
    splitter.ConfigureInputPort(new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480, framerate: 24, zeroCopy: true), null, input);

    var encoderCfg = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, width: 640, height: 480, framerate: 24, zeroCopy: true);
    encoder.ConfigureOutputPort<FileEncodeOutputPort>(0, encoderCfg, output);

    splitter.Outputs[0].ConnectTo(encoder);

    await standalone.ProcessAsync(splitter);
}

@MV10
Collaborator Author

MV10 commented Aug 28, 2020

I'm a bit puzzled that my h.264 has significant artifacts despite setting the max values for quality and bitrate. It's not a deal breaker since that output is just for me to visualize what the code is doing but setting those didn't make any difference at all versus the defaults. I thought perhaps I should call RequestIFrame before processing starts, but that doesn't help either.

var encoderCfg = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, width: 640, height: 480, framerate: 24, quality: 40, bitrate: MMALVideoEncoder.MaxBitrateLevel4);

Is this some side-effect of going from RGB24 back to I420? But it's really bad (see below).

Transcoding is trivial, and as I understand it ffmpeg -c copy won't change the video:

ffmpeg -framerate 24 -i /media/ramdisk/test.h264 -c copy /media/ramdisk/test.mp4

image

@techyian
Owner

When using H.264 encoding, the quality parameter actually refers to Quantization. Lower values indicate a lower compression rate and allow a higher bitrate. I appreciate it's confusing as JPEG quality for stills is actually the opposite to this (high value means high quality!). To quote https://www.vcodex.com/an-overview-of-h264-advanced-video-coding/:

"Setting QP to a high value means that more coefficients are set to zero, resulting in high compression at the expense of poor decoded image quality. Setting QP to a low value means that more non-zero coefficients remain after quantization, resulting in better decoded image quality but lower compression."

You could try a lower number such as 10; you could go even lower if you wanted, but you'll just have to play with the settings. You could also disable the quantization parameter (leave it as 0), which forces variable bitrate. With that said, I've come across this issue with artifacts showing on the video; I'm hoping that tweaking these settings will fix it for you, though.
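
For example, something along these lines -- 10 is just a starting point to experiment with:

// Same encoder config as before, but with the quantization ("quality") value lowered.
var encoderCfg = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, width: 640, height: 480, framerate: 24, quality: 10, bitrate: MMALVideoEncoder.MaxBitrateLevel4);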

With your read example, you don't actually need the splitter component, you should be able to feed directly into a Video encoder component.

@MV10
Collaborator Author

MV10 commented Aug 28, 2020

Oops, I actually knew that but had forgotten. 10 is good; VBR is still pretty bad. Thanks for clarifying about the splitter -- I had thought there was something special about it that allowed the RGB24/RGB24 input.

I don't know if there's general value in this yet, but I'm doing something kind of neat -- I broke down FrameDiffAnalyser into a FrameDiffBuffer class that manages the frame buffering, mask, and metrics (cells, stride, etc.) and abstracted the frame processing to an IFrameDiffAlgorithm -- sort of a plug-in model for comparing the two frames. If this runs quickly enough, it would be easy to abstract further into a video FX system (which is essentially what I'm building for this motion analysis). I'll post more about this when it's all working.

Thanks for all the help!

/// <summary>
/// Represents a frame-difference-based motion detection algorithm.
/// </summary>
public interface IFrameDiffAlgorithm
{
    /// <summary>
    /// Invoked after the buffer's <see cref="FrameDiffBuffer.TestFrame"/> is available
    /// for the first time and frame metrics have been collected. Allows the algorithm
    /// to modify the test frame, if necessary.
    /// </summary>
    /// <param name="buffer">The <see cref="FrameDiffBuffer"/> invoking this method.</param>
    void FirstFrameCompleted(FrameDiffBuffer buffer);

    /// <summary>
    /// Invoked when <see cref="FrameDiffBuffer"/> has a full test frame and a
    /// new full comparison frame available.
    /// </summary>
    /// <param name="buffer">The <see cref="FrameDiffBuffer"/> invoking this method.</param>
    void AnalyseFrames(FrameDiffBuffer buffer);

    /// <summary>
    /// Invoked when <see cref="FrameDiffBuffer"/> has been reset. The algorithm should also
    /// reset stateful data, if any.
    /// </summary>
    /// <param name="buffer">The <see cref="FrameDiffBuffer"/> invoking this method.</param>
    void ResetAnalyser(FrameDiffBuffer buffer);
}
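
For example, a do-nothing implementation of the interface would just be (purely illustrative -- a real algorithm does its comparison work in AnalyseFrames):

public class NoOpDiffAlgorithm : IFrameDiffAlgorithm
{
    public void FirstFrameCompleted(FrameDiffBuffer buffer)
    {
        // Optionally pre-process buffer.TestFrame the first time it becomes available.
    }

    public void AnalyseFrames(FrameDiffBuffer buffer)
    {
        // Compare the buffer's test frame against the newly completed frame here.
    }

    public void ResetAnalyser(FrameDiffBuffer buffer)
    {
        // Clear any per-session state held by the algorithm.
    }
}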

@MV10
Collaborator Author

MV10 commented Aug 28, 2020

So once again I'm a bit stumped by the pipeline. I'm trying to do this:

RGB24 file > InputCaptureHandler > MotionAnalysis > video encoder > VideoStreamCaptureHandler > h264 file

I don't know what sort of component MotionAnalysis ought to be. I thought it would replace VideoStreamCaptureHandler, but I wasn't thinking about the fact that the video encoder is outputting h.264.

So it needs to be downstream of something (???) that outputs RGB24 ImageContext from the InputCaptureHandler, but it also needs to be capable of feeding back into the video encoder for h.264 encoding. (The idea being that MotionAnalysis will modify the frame data.) I think everything that can output ImageContext only outputs to capture handlers, and capture handlers can't feed an encoder's input port (as far as I can tell). The same is true of callback handlers, I think -- they only output to capture handlers, too.

I suppose if the pipeline can't support additional passes like this, I can always break this into two separate rounds of processing (set the video encoder to RGB24 in and out, put my analysis as an output capture handler, then circle back and convert to h.264).

Edit: This is what I meant -- using the resizer as a pass-through (doesn't work with the encoder, it doesn't like RGB24 for both input and output). It works but it still seems like I ought to be able to do the analysis then somehow output to the encoder in one shot. And because I'm reading raw and writing raw that 1GB ramdisk fills up fast!

using (var stream = File.OpenRead(rawPathname))
using (var input = new InputCaptureHandler(stream))
using (var analysis = new MotionAnalysisCaptureHandler(analysisPathname, motionConfig))
using (var resizer = new MMALIspComponent())
{
    var cfg = new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480, framerate: 24, zeroCopy: true);
    resizer.ConfigureInputPort(cfg, null, input);
    resizer.ConfigureOutputPort<FileEncodeOutputPort>(0, cfg, analysis);

    Console.WriteLine("Processing raw RGB24 file through vis filter...");
    await standalone.ProcessAsync(resizer);
}

@MV10
Collaborator Author

MV10 commented Aug 28, 2020

And now that I have it working, I see another minor problem in the frame diff logic (just the basic buffering) -- if your usage doesn't involve disabling motion detection once it fires, the class will continue periodically updating the test frame, which means the test frame could end up including whatever is moving around. I still think it's good to update the test frame, but there should be a "cooldown" where the test frame isn't updated until no motion has been detected for a while. I almost think motion detection should continue running when disable is called, and that enable/disable should only apply to triggering the Action.

Basically a note-to-myself, I guess ... I still want to focus on this pipeline thing. But it's pretty cool seeing an MP4 highlighting the stuff that triggered motion! I'm only doing the basic FrameDiffAnalyser at the moment, all pixels are output as grayscale except those that are considered changed.
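
For what it's worth, that cooldown idea could be as simple as a little timing gate like this -- the names and intervals are hypothetical, just to show the shape of it:

using System;

public class TestFrameCooldown
{
    private readonly TimeSpan _interval;   // normal test frame refresh interval
    private readonly TimeSpan _cooldown;   // required quiet period after the last motion
    private DateTime _lastRefresh = DateTime.MinValue;
    private DateTime _lastMotion = DateTime.MinValue;

    public TestFrameCooldown(TimeSpan interval, TimeSpan cooldown)
    {
        _interval = interval;
        _cooldown = cooldown;
    }

    // Call whenever the algorithm reports motion.
    public void MotionDetected() => _lastMotion = DateTime.UtcNow;

    // True when it's both time and safe to replace the test frame with the current frame.
    public bool ShouldRefreshTestFrame()
    {
        var now = DateTime.UtcNow;
        if (now - _lastRefresh < _interval) return false; // not due for a refresh yet
        if (now - _lastMotion < _cooldown) return false;  // scene is still "hot"
        _lastRefresh = now;
        return true;
    }
}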

@MV10
Collaborator Author

MV10 commented Aug 29, 2020

This is pretty crazy. Putting summed RGB diff testing into video form makes me realize it isn't triggering off anything we thought it was. The way I'm altering the video is to set unchanged pixels to quarter grayscale and leave changed pixels at full normal color. When I move my entire body into frame, it's only responding to shadows in the background. Now, this may be a quirk of this specific scene, but it's rather dramatic. At no point does it actually detect me in the frame.

I'm enjoying messing around with this way too much.

image

@techyian
Owner

😕 That is very bizarre. I'm sure I've done a similar exercise in the past where I drew a red outline around the detected object and it worked - although it's also feasible that I dreamt that! Please let me know if you figure out why. Again, I must apologise for not being able to assist much on the coding side of things, work has picked up a lot lately so I don't have much left in the tank to dedicate to the library.

@MV10
Collaborator Author

MV10 commented Aug 29, 2020

No problem at all! Do you think you might find some time for those PRs? No sweat if it's too much -- I can wait; I'm sure this motion stuff will occupy me for quite some time (which also means I'll be blabbing here as I go). I'd greatly appreciate it, though, if you could explain whether I can do both steps in one shot in the pipeline somewhere; ultimately I'd like to stream the motion analysis in real time (see my post from yesterday, a couple back).

My wife reminds me we're entertaining friends today, so no more nerding out for now. I dropped one of the MP4s on YouTube, though -- this variation separates the RGB threshold from the total-pixel-count threshold, and it also compares the pixel diff in grayscale, which reduces the noise a little, but ultimately I don't think summed RGB diff works. I will also go to Google and read about it again to be sure we have the algorithm right.

Anyway, enjoy (only 15 seconds):

https://youtu.be/9kGsfu_vX9s

@MV10
Collaborator Author

MV10 commented Aug 29, 2020

Partygoers to arrive any moment, but I figured it out while driving back from the store.

It needs to compare the absolute value of the diff; this line is the problem:

if (rgb2 - rgb1 > threshold)

Still very noisy and has weird cutouts in fleshtone versus my yellow wall but this works far, far better!

if(Math.Abs(rgb2 - rgb1) > threshold)

@techyian
Owner

Ah, nice spot, no negatives allowed! I'd love to see the difference between your YouTube video earlier and this fix. I'm sure there are further optimisations that can be made to reduce the noise but that sounds promising!

@MV10
Collaborator Author

MV10 commented Aug 30, 2020

Enjoy my ugly mug. 👍🏻 This one is actually also using cell-based detection, so each cell has to register a minimum percentage of change (50% in this case) and then a certain number of cells must change in the frame (currently 20) to trigger motion.

https://youtu.be/ULqzobZi1QY

But even though it sees me now, I'm back to the original issue -- trying to ignore lighting changes. So I'll be moving on to HSV, but if you step through the video slowly and watch that red/green bar, cell-based detection improves accuracy quite a lot.

I see you merged #167, thanks. Think you can look at #169?

I have wiki update notes, so I'll start working on them this week.

@MV10
Collaborator Author

MV10 commented Aug 30, 2020

Wow, using the .NET Color class (really a struct, if I remember correctly) is a huge perf hit in MMALSharp's Color utility. Processing a 15 second raw file (323MB 640x480) to HSV (running every pixel through Color.FromRGBA and doing nothing else) takes 40 seconds on my Pi4 ramdisk. Using RGB byte inputs processes the same file in 13 seconds!

Also interesting that converting to HSV and back to RGB produces errors in some frames -- I realize a byte/float round-trip will never be perfect, but it's a lot more noise than I would have expected.

pi@raspberrypi:~/pi-cam-test $ dotnet pi-cam-test.dll -vis output.raw

pi-cam-test

-vis
Preparing pipeline...
Processing raw RGB24 file through motion analysis filter...
Analysis complete.

Elapsed: 00:00:13


Preparing pipeline...
Processing raw RGB24 file to h.264...

Transcoding h.264 to mp4...
frame=  359 fps=0.0 q=-1.0 Lsize=     364kB time=00:00:14.91 bitrate= 200.1kbits/s speed=2.11e+03x
Exiting.

Elapsed: 00:00:16

@MV10
Collaborator Author

MV10 commented Aug 31, 2020

https://youtu.be/lWDZZ7U9m-o

The latest results are pretty good:

  • summed RGB diffs with higher pixel RGB diff threshold (200)
  • changed-pixel-count threshold as a percentage of the cell (in this case 50%)
  • motion triggered by a number of cells exceeding that threshold (in this case 20)

By increasing the per-pixel RGB diff threshold to 200, I was able to get it to ignore that shading "bloom" across the top of the image and similar noise across the file cabinet (which the Pi is on top of). Since the maximum summed RGB diff is 255 * 3, or 765, that's the upper bound for the pixel-diff threshold. This video shows the strength of each pixel's RGB diff as grayscale relative to that maximum possible diff. Being able to see the diff strengths was the key to figuring out how to tune it for this particular scene - it became obvious those "blooms" were relatively weak diff signals, suggesting small tweaks to the threshold were all that was needed.
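
For reference, the per-pixel math behind these videos boils down to something like this -- a simplified sketch with my test values, not the exact library code:

using System;

public static class SummedRgbDiff
{
    // 765 is the maximum possible summed diff (255 * 3); 200 is the per-pixel
    // threshold used for this scene. The byte "strength" drives the grayscale
    // visualisation in the video.
    public static (bool changed, byte strength) DiffPixel(
        byte r1, byte g1, byte b1,
        byte r2, byte g2, byte b2,
        int threshold = 200)
    {
        int diff = Math.Abs(r1 - r2) + Math.Abs(g1 - g2) + Math.Abs(b1 - b2); // 0..765
        return (diff > threshold, (byte)(diff * 255 / 765));
    }
}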

I think for CCTV usage, this type of video would be a valuable tool for interactively tuning motion detection parameters, except that I can't find a way to fit it into the pipeline where it could (a) feed modified frame data into an encoder (so you could stream the results, for example) while (b) still receiving frame buffer data with associated ImageContext into the motion detection / analysis code (which seems to only be available at the end of the pipeline in an output capture handler). I guess it would just need to be another dedicated capture handler.

I've pretty much abandoned the idea of trying to normalize or otherwise filter lighting differences -- anything strong enough to do that also removes too much information needed to detect actual motion. I tried seven algorithms I found online. Hopefully one day somebody smarter than me will discover MMALSharp and figure it out!

But because there are other algorithms out there (many of them!), the way I've separated the frame diff buffering from the algorithm might still be useful. A pain point while I was trying to rapidly iterate over motion algorithms was the config class ... I'm thinking about something like this:

public class MotionConfig<T>
{
    public T AlgorithmConfig { get; set; }
    public TimeSpan TestFrameInterval { get; set; }
    public TimeSpan TestFrameRefreshCooldown { get; set; }
    public string MotionMaskPathname { get; set; }
}
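
Usage might look something like this -- SummedRgbConfig, its properties, and all of the values are made up here just to show the shape, not anything that exists in the library:

using System;

// Placeholder algorithm-specific settings class.
public class SummedRgbConfig
{
    public int PixelDiffThreshold { get; set; }   // summed RGB diff required per pixel
    public int CellPixelPercentage { get; set; }  // percent of a cell's pixels that must change
    public int CellCountThreshold { get; set; }   // cells that must change to signal motion
}

public static class MotionConfigExample
{
    public static MotionConfig<SummedRgbConfig> Build() => new MotionConfig<SummedRgbConfig>
    {
        AlgorithmConfig = new SummedRgbConfig
        {
            PixelDiffThreshold = 200,
            CellPixelPercentage = 50,
            CellCountThreshold = 20
        },
        TestFrameInterval = TimeSpan.FromSeconds(3),
        TestFrameRefreshCooldown = TimeSpan.FromSeconds(3),
        MotionMaskPathname = null
    };
}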

@techyian
Owner

I think this is brilliant and is far more than I'd have been able to achieve, so thank you for spending the time and effort getting this into the library. Will your test application be proprietary, or will you eventually be able to share that too? I'm very impressed with how fast your application and the library are able to detect motion and draw it into the video -- there's very little delay at all! Are we nearly at the peak of what can be achieved here, do you think? It would be interesting to see how this performs in a real-life scenario over a longer period of time.

@MV10
Collaborator Author

MV10 commented Aug 31, 2020

Everything is planned to be fully OSS.

Drawing into the buffer didn't add any appreciable overhead. I just wrote a super basic stream-based capture handler and the analysis does a callback with the byte buffer to write that into the stream. The only thing that really impacted performance was HSV conversion, even after getting rid of the Color class. In fact some of my HSV tests involving channel weighted thresholds caused my Pimoroni CPU fan to kick in, which I've not otherwise heard run during any of this.

The code is a bit rough right now and has lots of hard-coding scattered about, but here's the code I've been working with:

https://github.com/MV10/pi-cam-test/tree/local_mmal_dev_pkg

FrameDiffDriver - handles all the buffering and invokes the algorithm

FrameDiffMetrics - thread safe struct for passing frame details to the algorithm

IFrameDiffAlgorithm - algorithm interface

MotionAnalysisCaptureHandler - writes a raw stream

AnalyseSummedRGBCells - the motion detection analysis itself

@MV10
Collaborator Author

MV10 commented Aug 31, 2020

Oh and nothing here but the readme, which lists my plans, but ultimately this is what I want to make:

https://github.com/MV10/smartcam

This is all just a hobby for me. We're starting to think about retiring and I need hobbies that are cheaper than motorsports! I was quite happy to discover MMALSharp -- I was looking for information about how the Pi camera hardware works and stumbled on it completely by accident.

@MV10
Collaborator Author

MV10 commented Aug 31, 2020

Am I correct in thinking that MMALDownstreamComponent has to be one of the hardware-backed components like vc.ril.resize? I couldn't "fake it" and create a video-altering plug-in with input / output ports that can fit into the middle the way things like the resizer does?

@techyian
Owner

techyian commented Sep 1, 2020

Excellent, I'll have a look at your repo over the next few days :)

Am I correct in thinking that MMALDownstreamComponent has to be one of the hardware-backed components like vc.ril.resize? I couldn't "fake it" and create a video-altering plug-in with input / output ports that can fit into the middle the way things like the resizer does?

The thought has crossed my mind and I did start to entertain the idea in #63, but then I felt that the callback handlers superseded this and I think I'd recommend using those going forward. The callback handlers are given the complete buffer object, and you can hook your own Input and Output callback handlers to a MMAL component.

@MV10
Collaborator Author

MV10 commented Sep 1, 2020

I'll just PR the changes; it'll be easier than digging through my junk-drawer repos! And I wanted to get it into the library anyway to perf test it the right way.

I tried figuring out how to do this from a callback handler, but they still only output at the end of the pipeline (and I don't think the input callback handlers have information like end-of-frame, do they?). The essential problem is that (as far as I can tell), when outputting raw RGB frames (as was the case with motion analysis like the videos I've posted), there's no opportunity to feed those raw frames through a video encoder or one of the other middle-of-the-pipeline components. That's what led me to wonder if those have to be hardware-based pipeline components. It looked like either the input or the output has to be something the hardware understands.

Those videos were a four-step process:

  • camera -> resizer -> raw file 1
  • raw file 1 -> input handler -> motion analysis -> video output handler -> raw file 2
  • raw file 2 -> input handler -> h.264 encoder -> video output handler -> h.264 file
  • h.264 file -> ffmpeg -> mp4 -> my eyeballs, your eyeballs, YouTube fame, etc.

(Maybe three steps, I meant to research whether ffmpeg could process raw to mp4 but never got around to it.) That was OK because I was after repeatable testing, although two raw files on a ramdisk was a tight squeeze!

But this is the ultimate goal, and I don't think output callback handlers can accomplish either of these, can they? (With the assumption that the "motion analysis" component is both reading and writing raw frames.)

  • camera -> resizer -> motion analysis -> h.264 encoder -> video output handler -> h.264 file
  • camera -> resizer -> motion analysis -> h.264 encoder -> cvlc output handler -> MJPEG stream

@techyian
Owner

techyian commented Sep 1, 2020

my eyeballs, your eyeballs, YouTube fame, etc.

Ha ha, you never know, maybe one day! :)

One thing that comes to mind is the Connection callback handler. These are activated when the useCallback parameter of ConnectTo is set to true. I haven't really done anything with these, and there is a performance penalty involved (as to how much, I'm not sure -- I've never timed the difference against tunnelled connections, which are the default, but on a Pi 4 the penalty might not be as great).

Additional work would be required to make these suitable, though. At the moment any modifications you'd want to make to a frame whilst in flight would need to be done within the callback handler itself, as they're not hooked up to a capture handler, and that doesn't fit well with the motion detection work currently in place. I'm not even sure the capture handlers are suitable objects for Connection callbacks as they don't return anything; they just process to a target?

To get started you'd want to create your own callback handler which inherits from ConnectionCallbackHandler. Next, when creating your connection, store the IConnection object returned and register it, i.e:

var connection = resizer.Outputs[0].ConnectTo(vidEncoder, useCallback: true);
connection.RegisterCallbackHandler(new MyConnectionHandler(connection));

When you do this, the following code will be invoked. This whole area needs looking at really as it's quite early stuff and hasn't ever been fully tested. You will find that currently, only the InputCallback is invoked as the else block is never entered.

If you want to populate the IBuffer object, you can just call ReadIntoBuffer which accepts a byte[] and some other params. This would then pass your modified frame along to the next component.

Does this sound like the sort of thing that could help you? I'm trying to think of how you can achieve what you're looking for without being too hacky! Please expect a bumpy ride!

@MV10
Collaborator Author

MV10 commented Sep 1, 2020

That sounds promising. What do you mean about tunneled connections? (I see it now.) One of the reasons I started thinking about a "plug-in" approach to motion detection was to weaken the coupling to output capture handlers, specifically because I hoped to drive the code in different ways. I'll try to take a look at that tomorrow.

I have these changes all polished up and working in the library; I'll PR it shortly -- it'll be much easier for you to review that way.

PR #175

Running just straight motion detection (no modified frames) on a quiet scene took a very small hit -- we're at around 9ms per frame, which is still pretty great, about 111 FPS. That's probably due to a bunch of "are we doing analysis?" checks relating to drawing modified frames. Adding the analysis (modifying the frame buffer) climbs to 12ms per frame, or about 83 FPS.

One of the other things I added is an alternate approach to disabling and re-enabling motion detection. The current method works the same way, but I wanted the ability to temporarily disable the OnDetect callback while the algorithm continues to watch for motion (and do analysis or whatever). This allows that test frame "cooldown" I mentioned (so we don't update the test frame with an image that has a moving object in it), and it also allows the OnDetect code to monitor a new value: the amount of time elapsed since the last detection event. This again is all CCTV related -- you'll often want to record until motion stops (or more likely, for some short period after motion stops, in case it re-starts).
