RGB normalization for motion detection, and use of threshold #172
While the image above seemed promising (and with further tuning I got even better results), for some reason that I didn't bother digging into it was really bad at detecting black. I wore a black T-shirt in one test and it didn't detect any part of that as a difference. So if you were ever attacked by professional ninjas, well, you were in big trouble.

Worse, I found a bug in my HSV conversion, and fixing that didn't improve the Black T-Shirt Ninja issue, and it added lots of noise to the light-and-shadow test. So now I'm using the MMALSharp conversions, figuring I'll end up using those anyway. However, HSV is still looking promising. With a totally different approach, I'm getting good, repeatable results with a variety of test image pairs. The basic steps for the example below are to convert each pixel to HSV (I'm hoping MMAL can simply output in HSV?) and then apply a few simple per-pixel rules.
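For reference, the conversion itself is just the standard RGB-to-HSV math; here's a minimal standalone sketch (not the MMALSharp converter, and the rules applied afterwards aren't shown here):

```csharp
// Standard RGB (0-255) to HSV conversion: H in degrees [0,360), S and V in [0,1].
// Generic formula code for illustration, not the MMALSharp implementation.
static (float H, float S, float V) RgbToHsv(byte r, byte g, byte b)
{
    float rf = r / 255f, gf = g / 255f, bf = b / 255f;
    float max = Math.Max(rf, Math.Max(gf, bf));
    float min = Math.Min(rf, Math.Min(gf, bf));
    float delta = max - min;

    float h = 0f;
    if (delta > 0f)
    {
        if (max == rf)      h = 60f * ((gf - bf) / delta);
        else if (max == gf) h = 60f * (((bf - rf) / delta) + 2f);
        else                h = 60f * (((rf - gf) / delta) + 4f);
        if (h < 0f) h += 360f;
    }

    float s = max == 0f ? 0f : delta / max;
    return (h, s, max);
}
```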
I think combining this with a couple more steps could produce very solid results - roughly, (3) requiring a minimum percentage of changed pixels within each cell, and (4) requiring a minimum number of changed cells across the frame.
Step 3 is a fast and easy improvement because it ignores "sparkles" and other cells with minor artifacts (whereas today those contribute to the overall diff count for the image). I think step 4 would help with my desire to implement an "ignore my small dog" mode without the heavier effort of the cell proximity analysis I was considering earlier. Finally, assuming all this stuff doesn't drag us down into 3FPS territory 😬 ... I suspect it'll be useful to require motion detection across some minimum number of consecutive frames before signalling that motion occurred.
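As a rough sketch of that last idea (hypothetical, not library code), the gate could be as simple as a consecutive-frame counter:

```csharp
// Hypothetical sketch: only signal motion after it has been seen in
// N consecutive analysed frames, resetting the streak on any quiet frame.
class ConsecutiveFrameGate
{
    private readonly int _requiredFrames;
    private int _consecutive;

    public ConsecutiveFrameGate(int requiredFrames) => _requiredFrames = requiredFrames;

    // Returns true only once the streak reaches the configured length.
    public bool Accumulate(bool frameHadMotion)
    {
        _consecutive = frameHadMotion ? _consecutive + 1 : 0;
        return _consecutive >= _requiredFrames;
    }
}
```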
I'm experimenting with adding this to PR #169.
Thanks Jon. I have been reading your updates and I'm finding your research interesting. I must admit, my own research around this didn't progress much further than RGB diff checking and I'm very much a newbie in this area. I think the stage you've got things to is impressive and I'll continue to read your feedback around HSV difference.
I'm not aware of this. Natively, MMAL will output each frame as YUV420, as this is the pixel format the camera modules use themselves. Telling MMAL to output RGB sends the data through the image conversion block, so there is a performance hit there. I'd be interested to hear how fast the HSV software conversion turns out to be.
I should have mentioned that I understand summed RGB diff comparison is a very common technique, and as far as I can see it's implemented correctly. I hope I didn't sound like I was criticizing too much. I'm a total newbie, too! I have that last approach working locally in that PR branch, but I definitely need to take a step back and set up to record and process video file input for repeatable true-motion testing. It's a very interesting problem to me, but I don't think I'll have definite conclusions any time soon, so I'm going to stash those changes and leave that branch as-is.
And now I'm running into something that is making me feel very stupid -- how to actually get the raw stream written to a file. I thought I could do something as simple as this, but output port configuration fails when I use it:

```csharp
using (var capture = new VideoStreamCaptureHandler(rawPathname))
using (var encoder = new MMALVideoEncoder())
{
    var portCfg = new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.RGB24, width: 640, height: 480, framerate: 24, zeroCopy: true);
    encoder.ConfigureOutputPort(portCfg, capture);
    cam.Camera.VideoPort.ConnectTo(encoder);

    await Task.Delay(2000);
    var cts = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
    await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
}
```

Since that wasn't working, I wondered if there was a different component I should be using. Is this another edge case where I should just write something myself (which I think is probably easy), or am I just being obtuse and overlooking something obvious?
Don't worry about it, you're not being stupid. The OPAQUE format is a proprietary Broadcom format used internally within MMAL; it's essentially a pointer to the image data, which makes transmission between components more efficient. If you want to record raw video frames, the easiest thing to do is attach a splitter to the camera's video port and then attach a capture handler directly to one of the splitter's output ports, as we have been doing for motion detection in previous examples. Let me know if you're still struggling. This example should help you.
Nice. Can I read the raw file back in for processing?
Yep. If you look at the motion detection example in the wiki, you can see that we're getting the raw frames via the resizer component, but that's still using the camera as the source.
Thank you, I think I've gotten it working. I wanted to save through the resizer so the raw stream is the same as what the motion detection process receives (plus I write to a ramdisk, so space is limited).

Save:

```csharp
using (var capture = new VideoStreamCaptureHandler(rawPathname))
using (var splitter = new MMALSplitterComponent())
using (var resizer = new MMALIspComponent())
{
    splitter.ConfigureInputPort(new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420), cam.Camera.VideoPort, null);
    resizer.ConfigureOutputPort<VideoPort>(0, new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480), capture);

    cam.Camera.VideoPort.ConnectTo(splitter);
    splitter.Outputs[0].ConnectTo(resizer);

    await Task.Delay(2000);
    var cts = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
    await cam.ProcessAsync(cam.Camera.VideoPort, cts.Token);
}
```

Read and output to h.264 ... and I guess the default quality is amazingly bad; I thought it was broken, since 215MB of raw frames (no motion) compressed to just 26K, lol. I need to turn this into an MP4 to be sure it's all writing, but I think it works:

```csharp
using (var stream = File.OpenRead(rawPathname))
using (var input = new InputCaptureHandler(stream))
using (var splitter = new MMALSplitterComponent())
using (var output = new VideoStreamCaptureHandler(h264Pathname))
using (var encoder = new MMALVideoEncoder())
{
    splitter.ConfigureInputPort(new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480, framerate: 24, zeroCopy: true), null, input);

    var encoderCfg = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, width: 640, height: 480, framerate: 24, zeroCopy: true);
    encoder.ConfigureOutputPort<FileEncodeOutputPort>(0, encoderCfg, output);

    splitter.Outputs[0].ConnectTo(encoder);
    await standalone.ProcessAsync(splitter);
}
```
I'm a bit puzzled that my h.264 has significant artifacts despite setting the max values for quality and bitrate. It's not a deal breaker, since that output is just for me to visualize what the code is doing, but setting those didn't make any difference at all versus the defaults. This is the encoder config I'm using:

```csharp
var encoderCfg = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, width: 640, height: 480, framerate: 24, quality: 40, bitrate: MMALVideoEncoder.MaxBitrateLevel4);
```

Is this some side-effect of going from RGB24 back to I420? But it's really bad (see below). Transcoding is trivial, and as I understand it ffmpeg can handle that step anyway.
When using H.264 encoding, the quality parameter actually refers to quantization. Lower values indicate a lower compression rate and allow a higher bitrate. I appreciate it's confusing, as JPEG quality for stills is actually the opposite of this (a high value means high quality!). To quote https://www.vcodex.com/an-overview-of-h264-advanced-video-coding/:

"Setting QP to a high value means that more coefficients are set to zero, resulting in high compression at the expense of poor decoded image quality. Setting QP to a low value means that more non-zero coefficients remain after quantization, resulting in better decoded image quality but lower compression."

You could try a lower number such as 10; you could go even lower if you wanted, but you'll just have to play with the settings. You could also disable the quantization parameter (leave it as 0), which forces variable bitrate. With that said, I have come across this issue of artifacts showing on the video; I'm hoping that tweaking these settings will fix it for you, though.

With your read example, you don't actually need the splitter component - you should be able to feed directly into a video encoder component.
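For example, keeping everything else from your snippet the same, a lower QP would look like this (or leave quality at 0 to force variable bitrate):

```csharp
// Lower QP = less quantization = better H.264 quality (the opposite of JPEG "quality").
var encoderCfg = new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420,
    width: 640, height: 480, framerate: 24,
    quality: 10,                                 // quantization parameter; try ~10, or 0 for VBR
    bitrate: MMALVideoEncoder.MaxBitrateLevel4);
```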
Oops, I actually knew that but had forgotten. 10 is good; VBR is still pretty bad. Thanks for clarifying about the splitter, I had thought there was something special about it that allowed the RGB24/RGB24 input.

I don't know if there's general value in this yet, but I'm doing something kind of neat -- I broke down the frame diff code so the buffering is separate from the detection algorithm, which plugs in through an interface. Thanks for all the help!

```csharp
/// <summary>
/// Represents a frame-difference-based motion detection algorithm.
/// </summary>
public interface IFrameDiffAlgorithm
{
    /// <summary>
    /// Invoked after the buffer's <see cref="FrameDiffBuffer.TestFrame"/> is available
    /// for the first time and frame metrics have been collected. Allows the algorithm
    /// to modify the test frame, if necessary.
    /// </summary>
    /// <param name="buffer">The <see cref="FrameDiffBuffer"/> invoking this method.</param>
    void FirstFrameCompleted(FrameDiffBuffer buffer);

    /// <summary>
    /// Invoked when <see cref="FrameDiffBuffer"/> has a full test frame and a
    /// new full comparison frame available.
    /// </summary>
    /// <param name="buffer">The <see cref="FrameDiffBuffer"/> invoking this method.</param>
    void AnalyseFrames(FrameDiffBuffer buffer);

    /// <summary>
    /// Invoked when <see cref="FrameDiffBuffer"/> has been reset. The algorithm should also
    /// reset stateful data, if any.
    /// </summary>
    /// <param name="buffer">The <see cref="FrameDiffBuffer"/> invoking this method.</param>
    void ResetAnalyser(FrameDiffBuffer buffer);
}
```
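Just to show the shape of an implementation, here's a do-nothing skeleton against that interface (illustrative only; it isn't one of the real algorithms):

```csharp
/// <summary>
/// Minimal illustrative implementation of <see cref="IFrameDiffAlgorithm"/>;
/// a real algorithm would inspect the buffer's frame data in AnalyseFrames.
/// </summary>
public class NoOpDiffAlgorithm : IFrameDiffAlgorithm
{
    private int _framesAnalysed;

    public void FirstFrameCompleted(FrameDiffBuffer buffer)
    {
        // The test frame is available here; nothing to pre-compute in this sketch.
    }

    public void AnalyseFrames(FrameDiffBuffer buffer)
    {
        // Compare the buffer's test frame and current frame here.
        _framesAnalysed++;
    }

    public void ResetAnalyser(FrameDiffBuffer buffer)
    {
        _framesAnalysed = 0;
    }
}
```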
So once again I'm a bit stumped by the pipeline. I'm trying to do this:

RGB24 file > InputCaptureHandler > MotionAnalysis > video encoder > VideoStreamCaptureHandler > h264 file

I don't know what sort of component MotionAnalysis ought to be. I thought it would replace VideoStreamCaptureHandler, but I wasn't thinking about the fact that the video encoder is outputting h.264, so it needs to be downstream of something (???) that outputs RGB24. I suppose if the pipeline can't support additional passes like this, I can always break this into two separate rounds of processing (set the video encoder to RGB24 in and out, put my analysis in an output capture handler, then circle back and convert to h.264).

Edit: This is what I meant, using the resizer as a pass-through (this doesn't work with the encoder, which doesn't like RGB24 for both input and output). It works, but it still seems like I ought to be able to do the analysis and then somehow output to the encoder in one shot. And because I'm reading raw and writing raw, that 1GB ramdisk fills up fast!

```csharp
using (var stream = File.OpenRead(rawPathname))
using (var input = new InputCaptureHandler(stream))
using (var analysis = new MotionAnalysisCaptureHandler(analysisPathname, motionConfig))
using (var resizer = new MMALIspComponent())
{
    var cfg = new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480, framerate: 24, zeroCopy: true);
    resizer.ConfigureInputPort(cfg, null, input);
    resizer.ConfigureOutputPort<FileEncodeOutputPort>(0, cfg, analysis);

    Console.WriteLine("Processing raw RGB24 file through vis filter...");
    await standalone.ProcessAsync(resizer);
}
```
And now that I have it working, I see another minor problem in the frame diff logic (just the basic buffering): if your usage doesn't involve disabling motion detection once it fires, the class will continue periodically updating the test frame, which means the test frame could end up including whatever is moving around. I still think it's good to update the test frame, but there should be a "cooldown" where the test frame isn't updated until no motion has been detected for a while. I almost think motion detection should continue running when disable is called, and enable/disable should only apply to triggering the motion-detected action. Basically a note-to-myself, I guess ... I still want to focus on this pipeline thing.

But it's pretty cool seeing an MP4 highlighting the stuff that triggered motion! I'm only doing the basic summed RGB diff at this point.
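The cooldown I have in mind is nothing fancy; roughly this (a hypothetical sketch, not the library's code):

```csharp
// Hypothetical cooldown gate: the periodic test-frame refresh is skipped
// until no motion has been reported for the configured interval.
class TestFrameCooldown
{
    private readonly TimeSpan _cooldown;
    private DateTime _lastMotionUtc = DateTime.MinValue;

    public TestFrameCooldown(TimeSpan cooldown) => _cooldown = cooldown;

    // Called whenever the algorithm reports motion in a frame.
    public void MotionDetected() => _lastMotionUtc = DateTime.UtcNow;

    // The buffer asks this before its periodic test-frame update.
    public bool CanRefreshTestFrame()
        => DateTime.UtcNow - _lastMotionUtc >= _cooldown;
}
```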
This is pretty crazy. Putting summed RGB diff testing into video form makes me realize it isn't triggering off anything we thought it was. The way I'm altering the video is to set unchanged pixels to quarter grayscale and leave changed pixels at full normal color. When I move my entire body into frame, it only responds to shadows in the background. Now, this may be a quirk of this specific scene, but it's rather dramatic. At no point does it actually detect me in the frame. I'm enjoying messing around with this way too much.
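For anyone curious, the per-pixel treatment is roughly this (a sketch over a raw RGB24 buffer; how pixelChanged gets populated is up to the detection algorithm):

```csharp
// Sketch: dim unchanged pixels to quarter-intensity grayscale, keep changed
// pixels at full colour, operating in place on a raw RGB24 frame buffer.
static void HighlightChanges(byte[] rgb24, bool[] pixelChanged)
{
    for (int p = 0; p < pixelChanged.Length; p++)
    {
        if (pixelChanged[p]) continue;   // changed pixels keep their colour

        int i = p * 3;
        // Average the channels to grayscale, then quarter the intensity.
        byte gray = (byte)((rgb24[i] + rgb24[i + 1] + rgb24[i + 2]) / 3 / 4);
        rgb24[i] = rgb24[i + 1] = rgb24[i + 2] = gray;
    }
}
```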
😕 That is very bizarre. I'm sure I've done a similar exercise in the past where I drew a red outline around the detected object and it worked - although it's also feasible that I dreamt that! Please let me know if you figure out why. Again, I must apologise for not being able to assist much on the coding side of things; work has picked up a lot lately, so I don't have much left in the tank to dedicate to the library.
No problem at all! Do you think you might find some time for those PRs? No sweat if it's too much, I can wait; I'm sure this motion stuff will occupy me for quite some time (which also means I'll be blabbing here as I go). I'd greatly appreciate it, though, if you could explain whether I can do both steps in one shot in the pipeline somewhere; ultimately I'd like to stream the motion analysis in realtime (a couple posts back from yesterday).

My wife reminds me we're entertaining friends today, so no more nerding out today. I dropped one of the MP4s on YouTube though -- this variation separates the RGB threshold from the total-pixel-count threshold, and it also compares the pixel diff in grayscale, which reduces the noise a little, but ultimately I don't think summed RGB diff works. I will also go to Google and read about it again to be sure we have the algorithm right. Anyway, enjoy (only 15 seconds):
Partygoers are due to arrive any moment, but I figured it out while driving back from the store. It needs to compare the absolute value of the diff; this line is the problem:

```csharp
if (rgb2 - rgb1 > threshold)
```

It's still very noisy and has weird cutouts in fleshtone versus my yellow wall, but this works far, far better:

```csharp
if (Math.Abs(rgb2 - rgb1) > threshold)
```
Ah, nice spot, no negatives allowed! I'd love to see the difference between your YouTube video earlier and this fix. I'm sure there are further optimisations that can be made to reduce the noise, but that sounds promising!
Enjoy my ugly mug. 👍🏻 This one is actually also using cell-based detection, so each cell has to register a minimum percentage of change (50% in this case), and then a certain number of cells must change in the frame (currently 20) to trigger motion. But even though it sees me now, I'm back to the original issue -- trying to ignore lighting changes. So I'll be moving on to HSV, but if you step through the video slowly and watch that red/green bar, cell-based detection improves accuracy quite a lot.

I see you merged #167, thanks. Think you can look at #169? I have wiki update notes, so I'll start working on them this week.
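In rough pseudocode-ish C#, the two-level cell test is (hypothetical shape, not the actual PR code):

```csharp
// Sketch of the two-level cell test: a cell "changes" when enough of its
// pixels differ, and the frame triggers motion when enough cells change.
static bool FrameHasMotion(int[] changedPixelsPerCell, int pixelsPerCell,
                           double cellChangePercent = 50, int cellCountThreshold = 20)
{
    int changedCells = 0;
    foreach (int changedPixels in changedPixelsPerCell)
    {
        if (changedPixels * 100.0 / pixelsPerCell >= cellChangePercent)
            changedCells++;
    }
    return changedCells >= cellCountThreshold;
}
```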
Also interesting: converting to HSV and back to RGB produces errors in some frames. I realize a byte/float round-trip will never be perfect, but it's a lot more noise than I would have expected.
The latest results are pretty good:
By increasing the per-pixel RGB diff threshold to 200 I was able to get it to ignore that shading "bloom" across the top of the image and similar noise across the file cabinet (which the Pi is sitting on top of). Since the maximum summed RGB diff is 255 * 3, or 765, that's the range for the pixel-diff threshold. This video shows the strength of the RGB diff as grayscale relative to that maximum possible diff. Being able to see the diff strengths was the key to figuring out how to tune it for this particular scene - it became obvious those "blooms" were relatively weak diff signals, suggesting small tweaks to the threshold were all that was needed.

I think for CCTV usage this type of video would be a valuable tool for interactively tuning motion detection parameters, except that I can't find a way to fit it into the pipeline where it could (a) feed modified frame data into an encoder (so you could stream the results, for example) while (b) still receiving frame buffer data with the associated ImageContext in the motion detection / analysis code (which seems to be available only at the end of the pipeline in an output capture handler). I guess it would just need to be another dedicated capture handler.

I've pretty much abandoned the idea of trying to normalize or otherwise filter lighting differences -- anything strong enough to do that also removes too much of the information needed to detect actual motion. I tried seven algorithms I found online. Hopefully one day somebody smarter than me will discover MMALSharp and figure it out! But because there are other algorithms out there (many of them!), the way I've separated the frame diff buffering from the algorithm might still be useful. A pain point while I was trying to rapidly iterate over motion algorithms was the config class ... I'm thinking about something like this:

```csharp
public class MotionConfig<T>
{
    public T AlgorithmConfig { get; set; }
    public TimeSpan TestFrameInterval { get; set; }
    public TimeSpan TestFrameRefreshCooldown { get; set; }
    public string MotionMaskPathname { get; set; }
}
```
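Usage would then look something like this - SummedRgbCellConfig and its properties are purely hypothetical stand-ins for whatever a given algorithm needs, and the intervals are arbitrary:

```csharp
// Hypothetical algorithm-specific settings; each algorithm defines its own config type.
public class SummedRgbCellConfig
{
    public int RgbThreshold { get; set; }     // per-pixel summed RGB diff (0-765)
    public int CellPercentage { get; set; }   // % of a cell's pixels that must differ
    public int CellCount { get; set; }        // cells that must change to trigger motion
}

// The generic MotionConfig carries the shared buffering settings plus the
// algorithm-specific settings in AlgorithmConfig.
var motionConfig = new MotionConfig<SummedRgbCellConfig>
{
    AlgorithmConfig = new SummedRgbCellConfig { RgbThreshold = 200, CellPercentage = 50, CellCount = 20 },
    TestFrameInterval = TimeSpan.FromSeconds(3),
    TestFrameRefreshCooldown = TimeSpan.FromSeconds(3)
};
```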
I think this is brilliant and far more than I'd have been able to achieve, so thank you for spending the time and effort getting this into the library. Will your test application be proprietary, or will you eventually be able to share that too? I'm very impressed with how fast your application and the library are able to detect motion and draw it into the video; there's very little delay at all! Are we nearly at the peak of what can be achieved here, do you think? It would be interesting to see how this performs in a real-life scenario over a longer period of time.
Everything is planned to be fully OSS. Drawing into the buffer didn't add any appreciable overhead; I just wrote a super basic stream-based capture handler, and the analysis does a callback with the byte buffer to write that into the stream. The only thing that really impacted performance was HSV conversion, even after some optimization.

The code is a bit rough right now and has lots of hard-coding scattered about, but here's what I've been working with: https://github.com/MV10/pi-cam-test/tree/local_mmal_dev_pkg

- FrameDiffDriver - handles all the buffering and invokes the algorithm
- FrameDiffMetrics - thread-safe struct for passing frame details to the algorithm
- IFrameDiffAlgorithm - the algorithm interface
- MotionAnalysisCaptureHandler - writes a raw stream
- AnalyseSummedRGBCells - the motion detection analysis itself
Oh, and there's nothing here but the readme, which lists my plans, but ultimately this is what I want to build: https://github.com/MV10/smartcam

This is all just hobby activity for me. We're starting to think about retiring and I need hobbies that are cheaper than motorsports! I was quite happy to discover MMALSharp; I was looking for information about how the Pi camera hardware worked and stumbled on it completely by accident.
Am I correct in thinking that ...?
Excellent, I'll have a look at your repo over the next few days :)
The thought has crossed my mind and I did start to entertain the idea in #63, but then I felt that the callback handlers superseded this and I think I'd recommend using those going forward. The callback handlers are given the complete buffer object, and you can hook your own Input and Output callback handlers to a MMAL component.
I'll just PR the changes; it'll be easier than digging through my junk-drawer repos! And I wanted to get it into the library anyway to perf test it the right way. I tried figuring out how to do this from a callback handler, but they still only output at the end of the pipeline (and I don't think the input callback handlers have information like end-of-frame, do they?). The essential problem is that, as far as I can tell, when outputting raw RGB frames (as was the case with the motion analysis in the videos I've posted), there's no opportunity to feed those raw frames through a video encoder or one of the other middle-of-the-pipeline components. That's what led me to wonder if those have to be hardware-based pipeline components - it looked like either the input or the output has to be something the hardware understands.

Those videos were a four-step process: capture raw RGB24 from the camera to the ramdisk, run that file through the motion analysis (writing modified raw frames to a second file), encode the modified raw file to h.264, and finally wrap that in an MP4. (Maybe three steps - I meant to research whether ffmpeg could process raw to mp4 but never got around to it.) That was OK because I was after repeatable testing, although two raw files on a ramdisk was a tight squeeze!

But the ultimate goal is to run the analysis and feed the modified frames straight into an encoder for recording or realtime streaming, and I don't think output callback handlers can accomplish that, can they? (This assumes the "motion analysis" component is both reading and writing raw frames.)
Ha ha, you never know, maybe one day! :)

One thing that comes to mind is the Connection callback handler, which hooks into the connection between two components. Additional work would be required to make these suitable, though. At the moment any modifications you'd want to make to a frame whilst in-flight would need to be done within the callback handler itself, as they're not hooked up to a capture handler, and that doesn't fit well with the motion detection work currently in place. I'm not even sure the capture handlers are suitable objects for Connection callbacks as they don't return anything, they just process to a target? To get started you'd want to create your own callback handler which inherits from the connection callback base class.

When you do this, your callback will be invoked as data moves across that connection. This whole area needs looking at really, as it's quite early stuff and hasn't ever been fully tested. Does this sound like the sort of thing that could help you? I'm trying to think of how you can achieve what you're looking for without being too hacky! Please expect a bumpy ride!
That sounds promising. I have these changes all polished up and working in the library; I'll PR it shortly, it'll be much easier for you to review that way: PR #175.

Running just straight motion detection (no modified frames) on a quiet scene took a very small hit; we're at around 9ms per frame, which is still pretty great, about 111 FPS. That's probably due to a bunch of "are we doing analysis?" checks relating to drawing modified frames. Adding the analysis (modifying the frame buffer) climbs to 12ms per frame, or 83 FPS.

One of the other things I added is an alternate approach to disabling and re-enabling motion detection. The current method works the same way, but I wanted the ability to temporarily suppress the motion-triggered action without stopping detection itself.
Hi Ian, in one of our discussions in a recent PR, you said something to the effect that using the motion detection threshold value inside the pixel-level loop as well as outside it might be confusing. The comment didn't stick with me for some reason and we got sidetracked into other topics. However, I think you were on to something. As you know, within the loop the threshold is compared to the summed difference of the RGB values of individual pixels, but externally it represents the number of pixels which differ. Those are two wildly different concepts, and purely by coincidence, I ran a test where two very different images didn't register as different using the FrameDiffAnalyser algorithm. But before I go into that, let me show you what I was working on that led to this.

This one is pretty cool. RGB normalization calculates the proportion each color channel contributes to the color, rather than the intensity of each channel (which is what RGB normally signifies). I've read that this largely eliminates lighting differences -- and it seems to work reasonably well. It's simple to calculate (probably another great candidate for an OpenGL transform).
(I'm not using MMALSharp for this yet; it's a standalone program I wrote so I could play with the algorithms on static images for repeatability.)
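The calculation is simple enough to sketch here (this shows the general technique, not my exact test code):

```csharp
// Normalized RGB: each channel becomes its proportion of the pixel's total,
// rescaled to 0-255, so overall lighting intensity largely cancels out.
static (byte R, byte G, byte B) NormalizeRgb(byte r, byte g, byte b)
{
    int sum = r + g + b;
    if (sum == 0) return (0, 0, 0);   // pure black has no defined proportions

    return ((byte)(r * 255 / sum),
            (byte)(g * 255 / sum),
            (byte)(b * 255 / sum));
}
```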
Below, A and B are 1296x972 BMPs captured by my Pi (pointed at my office closet, where there will be no unexpected motion). To simulate false motion detection due to lighting changes, I shined one of those insanely bright LED flashlights on the door. The black and white image C shows all the pixels the current FrameDiffAnalyser algorithm would flag as different ... 179,316 pixels to be exact -- rather higher than our 130 threshold (which, to be fair, is optimized for 640x480, at least in terms of total diff count). In the second row of images, X and Y are the RGB-normalized versions of A and B, and Z shows only 73 pixels as different!

(You may notice I accidentally left the date/time overlays turned on, so those contribute a little to the differences. But one interesting effect of normalization -- which may turn out to be a problem, I don't know yet -- is that bright white colors average down to a neutral gray, so the timestamp overlay disappears in the normalized images.)
To test that a truly different image will register as motion, I set up two new stills (no annotation!), A and B below, and to my very great surprise the FrameDiffAnalyser algorithm only "sees" four different pixels! Apparently my hand is close enough to the (summed RGB) wall color that the threshold system fails completely. I've actually run this test quite a few times with different image pairs, with different amounts of my arm in-frame and in different places, and when my arm is in front of the yellow wall it fails this way consistently. (This also explains why my testing here in my office sometimes seemed to "lag" when I was intentionally triggering motion -- my hand was in-frame and probably still not being detected.)

At first I was excited: the normalized images resulted in a diff of more than 500 pixels -- but look at the Z bitmap, which highlights where the differences are. It's just noise. So from that, I conclude the RGB sum/subtraction/diff process is not especially reliable.
The bad news is that I don't have anything in mind to fix this, but since my plan with this motion detection test program was to try new things, maybe I'll figure something out.
I also noticed that you have converters for color spaces; I suspect just looking for H-channel differences in something like HSV may work better.
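If I go down that road, one wrinkle is that hue is an angle, so the comparison needs to wrap around 360 degrees; something like this sketch:

```csharp
// Hue is circular, so the difference between 350 and 10 degrees is 20, not 340.
static float HueDifference(float h1, float h2)
{
    float diff = Math.Abs(h1 - h2) % 360f;
    return diff > 180f ? 360f - diff : diff;
}
```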
I'm definitely open to suggestions, as I'm certain you've researched all of this far more than I have!