
CCTV and Motion Detection

⚠️ You are viewing the v0.7 Alpha documentation. If you are not cloning the latest code from this repository or using dev packages from MyGet, then you may wish to look at the v0.6 examples instead.


Contents

  1. Motion Detection Basics
    1. Frame differencing
    2. Configuration: Motion mask
    3. Configuration: Test frames
    4. Configuration: Sensitivity
  2. CCTV (Security Cameras)
    1. Detection events
    2. Recording video
    3. High-resolution snapshots
  3. Advanced Usage
    1. Real-time streaming visualisation
    2. Algorithm plug-in model
    3. Resolution / cell-count reference

Motion Detection Basics

Frame differencing

Frame differencing is a common motion-detection technique whereby a test frame (sometimes called the "background frame") is compared against new frames (or "current frame") for changes exceeding various thresholds. The MMALSharp library has new APIs and classes that let you configure motion detection behavior, including callbacks to run custom code when motion is detected.

There are different strategies to detect differences between frames. The provided implementation combines two techniques which help reject sensor noise and small localized motion (such as an insect, or even a small pet).

At the most basic level, the algorithm compares individual pixels. This is called "RGB summing" because the red, green, and blue values are added together for each pixel in both images. If the difference between the test frame and the new frame exceeds a threshold, the pixel is considered changed.

The image is also subdivided into a grid of smaller rectangles called cells. The size of each cell, and therefore the number of pixels it contains, depends on the image resolution. A second threshold defines the percentage of pixels in a cell that must change for the entire cell to be considered changed. This is how sensor noise and other minor changes are discarded.

Finally, a third threshold defines the number of cells across the entire image that must register changes in order to signal that motion has been detected. This is how real but small, unimportant motion is ignored (insects, pets, and distant background movement, for example). All of these thresholds are configurable.
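
As a rough illustration of how these three thresholds interact, here is a simplified sketch. It is not the library's implementation; the parameter names mirror the configuration settings described later on this page, and the cell grid is represented here by System.Drawing.Rectangle values.

// Simplified sketch of the three-threshold check (not library code). Assumes two
// raw RGB24 frames of identical dimensions and a precomputed grid of cell rectangles.
bool MotionDetected(byte[] testFrame, byte[] currentFrame, Rectangle[] cells,
                    int frameWidth, int rgbThreshold, int cellPixelPercentage, int cellCountThreshold)
{
    int changedCells = 0;

    foreach (var cell in cells)
    {
        int changedPixels = 0;

        for (int y = cell.Top; y < cell.Bottom; y++)
        {
            for (int x = cell.Left; x < cell.Right; x++)
            {
                int i = (y * frameWidth + x) * 3; // 3 bytes per pixel for RGB24

                // Threshold 1: compare the summed RGB values of each pixel.
                int testSum = testFrame[i] + testFrame[i + 1] + testFrame[i + 2];
                int currentSum = currentFrame[i] + currentFrame[i + 1] + currentFrame[i + 2];
                if (Math.Abs(testSum - currentSum) > rgbThreshold) changedPixels++;
            }
        }

        // Threshold 2: enough pixels within the cell must change to flag the whole cell.
        int cellPixelCount = cell.Width * cell.Height;
        if (changedPixels * 100 >= cellPixelCount * cellPixelPercentage) changedCells++;
    }

    // Threshold 3: enough cells across the frame must change to signal motion.
    return changedCells >= cellCountThreshold;
}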

Typically motion detection doesn't require or benefit from high resolution; 640 x 480 should be adequate, although you should always feed raw RGB24, RGB32, or RGBA images into the system. Image artifacts from lossy compression algorithms like h.264 will be mistaken for motion, and the RGB summing algorithm is not compatible with the YUV pixel format. At 640 x 480 x RGB24, a Raspberry Pi 4B can easily process full-motion video using the provided algorithms (an improvement over v0.6, which could only process about 5 frames per second on the same hardware).

The new FrameBufferCaptureHandler class provides management and control of motion detection. The following example demonstrates the most basic possible motion detection. This does nothing but write messages to the console when motion is detected. Later we'll see more complete examples that capture video and snapshots.

public async Task SimpleMotionDetection(int totalSeconds)
{
    // Assumes the camera has been configured.
    var cam = MMALCamera.Instance;

    using (var motionCaptureHandler = new FrameBufferCaptureHandler())
    using (var resizer = new MMALIspComponent())
    {
        // The ISP resizer is used to output a small (640x480) image to ensure high performance. As described in the
        // wiki, frame difference motion detection only works reliably on uncompressed, unencoded raw RGB data. The
        // resizer outputs this raw frame data directly into the motion detection handler.
        resizer.ConfigureInputPort(new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420), cam.Camera.VideoPort, null);
        resizer.ConfigureOutputPort<VideoPort>(0, new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480), motionCaptureHandler);

        cam.Camera.VideoPort.ConnectTo(resizer);

        // Camera warm-up.
        await Task.Delay(2000);

        // We'll use the default settings for this example.
        var motionConfig = new MotionConfig(algorithm: new MotionAlgorithmRGBDiff());

        // Duration of the motion-detection operation.
        var stoppingToken = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
        Console.WriteLine($"Detecting motion for {totalSeconds} seconds.");

        await cam.WithMotionDetection(
            motionCaptureHandler,
            motionConfig,
            // This callback will be invoked when motion has been detected.
            async () =>
            {
                // When motion is detected, temporarily disable notifications
                motionCaptureHandler.DisableMotionDetection();
                        
                // Wait 2 seconds
                Console.WriteLine($"\n     {DateTime.Now:hh\\:mm\\:ss} Motion detected, disabling detection for 2 seconds.");
                await Task.Delay(2000, stoppingToken.Token);

                // Re-enable motion detection
                if(!stoppingToken.IsCancellationRequested)
                {
                    Console.WriteLine($"     {DateTime.Now:hh\\:mm\\:ss} ...motion detection re-enabled.");
                    motionCaptureHandler.EnableMotionDetection();
                }
            })
            .ProcessAsync(cam.Camera.VideoPort, stoppingToken.Token);
    }
    cam.Cleanup();
}

The WithMotionDetection method configures the camera processing loop for motion detection by identifying the FrameBufferCaptureHandler responsible for motion detection, the MotionConfig defining the applicable settings, and an asynchronous callback which is invoked when motion is detected.

IMPORTANT: It is your responsibility to ensure all exceptions are handled inside your callback.

The callback is an event handler, which means it is an async void delegate. Event handlers are the only scenario in .NET applications where async void is an acceptable method signature (versus the common async Task signature). Since it returns void instead of Task, there is no enclosing method which can intercept an exception, and unhandled exceptions will immediately terminate the process.
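
For example, the entire callback body can be wrapped in a try/catch so that a failure is logged rather than terminating the process. This is a sketch based on the first example above:

async () =>
{
    try
    {
        motionCaptureHandler.DisableMotionDetection();
        Console.WriteLine($"{DateTime.Now:hh\\:mm\\:ss} Motion detected.");

        // ... respond to the motion event (record, notify, etc.) ...
        await Task.Delay(2000, stoppingToken.Token);

        motionCaptureHandler.EnableMotionDetection();
    }
    catch (TaskCanceledException)
    {
        // Expected if the overall stoppingToken expires during the delay.
    }
    catch (Exception ex)
    {
        // Without this handler, an exception here would terminate the process.
        Console.WriteLine($"Motion callback failed: {ex}");
    }
}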

Configuration: Motion mask

Motion detection commonly requires ignoring areas of the camera view where real or apparent motion may occur that is not of interest. The library allows you to configure a mask bitmap to define areas to be ignored.

Masking is especially useful for outdoor scenes where "background" motion like trees, clouds, or passing vehicular traffic may trigger unwanted events. Masking can also be helpful indoors where changes like reflections in a picture frame, movement on a television screen, or even blinking LEDs on electronic devices may be mistaken as motion.

The mask bitmap must be the same size and color depth as the motion detection frames, and the file should be saved in BMP or PNG format. The library can also load a JPG mask file, but this is not recommended because compression artifacts may produce inaccuracies.

Fully-black pixels in the mask will be ignored -- they will always be treated as if no motion has occurred. Thus, the easiest way to create a mask is to capture a still picture (for example, using the raspistill utility with a -e BMP or -e PNG encoding switch) and load that into any image editor to blank out the unwanted regions.
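
If you prefer to generate the mask programmatically rather than in an image editor, a minimal sketch using System.Drawing might look like the following. The file paths and rectangle coordinates are placeholders, and it assumes a 640 x 480, 24-bit reference still:

using System.Drawing;
using System.Drawing.Imaging;

// Start from a still captured at the motion detection resolution and black out
// the regions that should never trigger motion.
using (var mask = new Bitmap("/home/pi/images/reference-still.bmp"))
using (var g = Graphics.FromImage(mask))
{
    g.FillRectangle(Brushes.Black, new Rectangle(0, 0, 640, 120));     // e.g. tree line
    g.FillRectangle(Brushes.Black, new Rectangle(500, 300, 140, 180)); // e.g. television screen

    mask.Save("/home/pi/images/motionmask.bmp", ImageFormat.Bmp);
}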

The mask is specified as an optional pathname argument to the MotionConfig constructor:

var motionConfig = new MotionConfig(
    algorithm: new MotionAlgorithmRGBDiff(),
    maskBitmap: "/home/pi/images/motionmask.bmp"
);

An exception will be thrown if the mask cannot be found, or if the resolution or color-depth does not match the motion detection image configuration (in these examples, that is always 640 x 480 x RGB24).

Testing has not shown any discernable changes to performance when a mask is used.

Configuration: Test frames

Motion detection based on frame differencing algorithms compares a test frame to newly received frames. The first full frame captured by the camera is stored as the test frame. To help compensate for gradual changes in the scene most commonly caused by lighting changes and shadows, the library is able to periodically update the test frame with a new image.

These are values you will likely need to tune for the specific scene your camera is viewing. Although you can adjust this through trial and error, it may be easier to view the algorithm output in real-time. Refer to the streaming visualisation topic later in this area of the documentation.

Two optional arguments to the MotionConfig constructor control how this works. Both values default to 3 seconds:

var motionConfig = new MotionConfig(
    algorithm: new MotionAlgorithmRGBDiff(),
    testFrameInterval: TimeSpan.FromSeconds(3),
    testFrameCooldown: TimeSpan.FromSeconds(3)
);

The testFrameInterval defines how often the test frame is updated, and testFrameCooldown defines how long the scene must be "quiet" (no motion detected) before a test frame is updated. The cooldown period is checked after the interval passes, so the default values of 3 seconds mean the test frame will actually update every 6 seconds at a minimum, and possibly less often if there is ongoing motion.

Note that the cooldown is relative to triggered motion. If the scene contains minor motion that was not sufficient to trigger a motion detection event, it's possible that the new test frame will capture a moving object. If you see this happening, simply increase the intervals; the defaults are somewhat aggressively short.

Configuration: Sensitivity

The library supports different motion detection algorithms, but currently only one algorithm is built in -- RGB summing (also called RGB differencing). While the core motion detection system is based on frame differencing, RGB differencing is based on changes at the pixel level. Because camera image sensors are naturally "noisy", and also to help reject other sources of minor, uninteresting motion, the algorithm also requires larger-scale changes at the "cell" level. Cells are an arbitrarily-sized grid applied to the image data.

The MotionConfig constructor requires a motionAlgorithm object, and the built-in MotionAlgorithmRGBDiff constructor accepts three optional arguments to control sensitivity:

var motionConfig = new MotionConfig(
    algorithm: new MotionAlgorithmRGBDiff(
        rgbThreshold: 200,
        cellPixelPercentage: 50,
        cellCountThreshold: 20
));

The settings shown above are the defaults.

The rgbThreshold setting controls change-detection sensitivity at the individual pixel level. The maximum value is 255 + 255 + 255 which is 765. Since the per-pixel RGB difference algorithm compares test frame pixels to new frame pixels, a value of 765 would only indicate a change when a fully-black pixel (RGB 0,0,0) switched to full-white (RGB 255,255,255) or vice-versa, so clearly much lower values are more useful. This sensitivity setting helps reject minor lighting changes and the like.

Each image frame is subdivided into a grid of "cells" based on the image resolution. The library automatically selects the grid size. The recommended resolution for motion detection is 640 x 480 which uses a 32 x 32 grid for a total of 1024 cells. This means each cell represents 20 x 15 pixels, or 300 pixels. (The number of cells varies by resolution, but most are around 800 to 1000 -- refer to the table at the end of this section.)

Each cell tracks the number of pixels that changed within the cell. When that count reaches the cellPixelPercentage value, the entire cell is considered to have changed. If the count is below that percentage, the cell is considered unchanged even if some pixels within the cell have changed. So given a 640 x 480 image using 300-pixel-count cells, the default 50% threshold means 150 pixels or more must change (exceed the rgbThreshold) within that cell to trigger a change for the entire cell. This setting helps reject very small sources of motion such as insects or a falling leaf.

Finally, motion detection events are triggered by the total number of cells which have changed using the two processes described above. The cellCountThreshold defines the minimum number of cells across the entire image that must change before the motion detection callback is invoked. This helps reject somewhat larger sources of motion such as small pets or even a television screen within view (although that's more easily ignored with a mask bitmap).

These are values you may want to tune for the types of motion you wish to detect. Although you can adjust this through trial and error, it may be easier to view the algorithm output in real-time. Refer to the streaming visualisation topic later in this area of the documentation.

CCTV (Security Cameras)

Detection events

The OnDetect delegate that you provide to respond to motion detection events can perform any action you desire: record video, take still pictures (or both, as shown below), write to log files, send emails or mobile phone messages, and so on.

The examples in this wiki call DisableMotionDetection while these activities are performed, then later call EnableMotionDetection to re-enable notification. When called in this way, disable stops everything inside the motion detection code, and the system is reset when it is re-enabled. This means the test frame update logic is interrupted, and a new test frame is stored as soon as the system is re-enabled. This is fine for simple demos, but in a more realistic usage, and particularly for CCTV where the system is running for long periods of time, you want that test frame logic to continue running in the background while your code responds to the motion event. For this reason, DisableMotionDetection accepts a bool argument, disableCallbackOnly, which allows the algorithm to continue working without invoking your delegate again.
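
In code, that pattern looks like the following sketch (using the handler from the earlier examples):

// Keep the test-frame update logic running in the background, but stop invoking
// the motion detection callback while the current event is handled.
motionCaptureHandler.DisableMotionDetection(disableCallbackOnly: true);

// ... record video, send notifications, etc. ...

// Resume invoking the callback when new motion is detected.
motionCaptureHandler.EnableMotionDetection();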

Another alternative is to design your delegate to be tolerant of multiple frequent invocations (likely one per frame, when motion is being detected), rather than disabling motion detection at all. This is probably how a more sophisticated CCTV application would be designed. For example, you might track the duration between invocations and use a CancellationToken timeout to periodically test for the end of a motion event. This type of design also allows your program to continue operating as if motion is still detected when there are momentary interruptions in actual detection events. The details will depend heavily on your specific needs.
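
One possible shape of such a delegate is sketched below. It uses a simple sliding quiet-period check instead of disabling detection; the field names and timings are purely illustrative.

// Illustrative only: a callback designed to tolerate one invocation per frame while
// motion is ongoing. A quiet period of a few seconds marks the end of the motion
// event, so momentary gaps in detection do not end the event prematurely.
private DateTime _lastMotion;
private bool _handlingEvent;

private async void OnMotionDetected()
{
    try
    {
        _lastMotion = DateTime.UtcNow;
        if (_handlingEvent) return; // already responding to this motion event

        _handlingEvent = true;
        Console.WriteLine("Motion event started.");
        // ... start recording, save a snapshot, etc. ...

        // Wait until no new detections have arrived for 3 seconds.
        while (DateTime.UtcNow - _lastMotion < TimeSpan.FromSeconds(3))
        {
            await Task.Delay(250);
        }

        Console.WriteLine("Motion event ended.");
        // ... stop recording, split the video file, etc. ...
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Motion callback failed: {ex}");
    }
    finally
    {
        _handlingEvent = false;
    }
}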

Recording video

Because video files are very large, it is impractical to simply record everything the camera sees. CCTV systems usually save short video clips when motion is detected, and the better systems continuously buffer a certain amount of video so that the stored video clip includes a time period before motion was detected. The MMALSharp library has a component called the CircularBufferCaptureHandler which is capable of doing exactly this.

The following example expands on the basic motion detection example by adding a splitter, a video encoder, and the new handler to output encoded h.264 video clips.

public async Task RecordMotion(int totalSeconds, int recordSeconds)
{
    // Assumes the camera has been configured.
    var cam = MMALCamera.Instance;

    // h.264 requires key frames for the circular buffer capture handler.
    MMALCameraConfig.InlineHeaders = true;

    using (var videoCaptureHandler = new CircularBufferCaptureHandler(4000000, "/home/pi/videos/detections", "h264"))
    using (var motionCaptureHandler = new FrameBufferCaptureHandler())
    using (var resizer = new MMALIspComponent())
    using (var splitter = new MMALSplitterComponent())
    using (var videoEncoder = new MMALVideoEncoder())
    {
        splitter.ConfigureInputPort(new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420), cam.Camera.VideoPort, null);
        videoEncoder.ConfigureOutputPort(new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, 0, MMALVideoEncoder.MaxBitrateLevel4, null), videoCaptureHandler);

        // As with the basic example, the resizer sends 640 x 480 raw frames to the motion detection handler.
        resizer.ConfigureOutputPort<VideoPort>(0, new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480), motionCaptureHandler);

        cam.Camera.VideoPort.ConnectTo(splitter);
        splitter.Outputs[0].ConnectTo(resizer);
        splitter.Outputs[1].ConnectTo(videoEncoder);

        // Camera warm-up.
        await Task.Delay(2000);

        // We'll use the default settings for this example.
        var motionConfig = new MotionConfig(algorithm: new MotionAlgorithmRGBDiff());

        // Duration of the motion-detection operation.
        var stoppingToken = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
        Console.WriteLine($"Detecting motion for {totalSeconds} seconds.");

        await cam.WithMotionDetection(
            motionCaptureHandler,
            motionConfig,
            // This callback will be invoked when motion has been detected.
            async () =>
            {
                // When motion is detected, temporarily disable notifications
                motionCaptureHandler.DisableMotionDetection();
                Console.WriteLine($"\n     {DateTime.Now:hh\\:mm\\:ss} Motion detected, recording for {recordSeconds} seconds.");
                        
                // When the recording period expires, stop recording and re-enable capture
                var stopRecording = new CancellationTokenSource();
                stopRecording.Token.Register(() =>
                {
                    Console.WriteLine($"     {DateTime.Now:hh\\:mm\\:ss} ...recording stopped.");
                    motionCaptureHandler.EnableMotionDetection();

                    // Calling split will close the h.264 file stream and open another file to
                    // store new circular buffer data while we wait for another motion event.
                    videoCaptureHandler.StopRecording();
                    videoCaptureHandler.Split();
                });

                // Start the recording countdown
                stopRecording.CancelAfter(recordSeconds * 1000);

                // Record until the duration passes or the overall motion detection token expires
                await Task.WhenAny(

                    // Calling StartRecording saves the contents of the circular buffer, then begins appending new
                    // video frames to the buffer until StopRecording is called. The first argument is an optional
                    // initialization Action, which in this case ensures the h.264 stream emits an IFrame.
                    videoCaptureHandler.StartRecording(videoEncoder.RequestIFrame, stopRecording.Token),

                    stoppingToken.Token.AsTask()
                );

                // If the awaiter above exited because the overall stoppingToken
                // has expired, ensure we also terminate the ongoing recording.
                if(!stopRecording.IsCancellationRequested) stopRecording.Cancel();
            })
            .ProcessAsync(cam.Camera.VideoPort, stoppingToken.Token);
    }
    cam.Cleanup();
}

Other than adding the new components to the pipeline, the big change in this example is in the motion detection callback. After motion detection is disabled, we prepare a CancellationTokenSource with a registered callback which re-enables motion detection, stops the recording, and splits the file (which saves the recording and starts a new circular buffer in a new file). That CancellationTokenSource is assigned a timeout matching the desired recording duration, then we await a call to StartRecording on the circular buffer handler. When the stopRecording timeout expires, the callback runs to end the recording and we resume watching for new motion events.

High-resolution snapshots

Another common CCTV scenario is to grab one (or sometimes several) high resolution still-capture images which are used as previews, attached to emails, or sent as SMS/MMS messages to mobile phones. This is easy to accomplish by adding a second FrameBufferCaptureHandler and an image encoder to the video recording example.

public async Task RecordMotionWithSnapshot(int totalSeconds, int recordSeconds)
{
    // Assumes the camera has been configured.
    var cam = MMALCamera.Instance;

    // h.264 requires key frames for the circular buffer capture handler.
    MMALCameraConfig.InlineHeaders = true;

    using (var snapshotCaptureHandler = new FrameBufferCaptureHandler("/home/pi/images/", "jpg"))
    using (var videoCaptureHandler = new CircularBufferCaptureHandler(4000000, "/home/pi/videos/detections", "h264"))
    using (var motionCaptureHandler = new FrameBufferCaptureHandler())
    using (var resizer = new MMALIspComponent())
    using (var splitter = new MMALSplitterComponent())
    using (var videoEncoder = new MMALVideoEncoder())

    // Setting continuousCapture to true feeds every frame to the snapshotCaptureHandler
    using (var imageEncoder = new MMALImageEncoder(continuousCapture: true))
    {
        splitter.ConfigureInputPort(new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420), cam.Camera.VideoPort, null);
        videoEncoder.ConfigureOutputPort(new MMALPortConfig(MMALEncoding.H264, MMALEncoding.I420, 0, MMALVideoEncoder.MaxBitrateLevel4, null), videoCaptureHandler);
        imageEncoder.ConfigureOutputPort(new MMALPortConfig(MMALEncoding.JPEG, MMALEncoding.I420, quality: 90), snapshotCaptureHandler);

        // Once again, the resizer sends 640 x 480 raw frames to the motion detection handler.
        resizer.ConfigureOutputPort<VideoPort>(0, new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480), motionCaptureHandler);

        cam.Camera.VideoPort.ConnectTo(splitter);
        splitter.Outputs[0].ConnectTo(resizer);
        splitter.Outputs[1].ConnectTo(videoEncoder);
        splitter.Outputs[2].ConnectTo(imageEncoder);

        // Camera warm-up.
        await Task.Delay(2000);

        // We'll use the default settings for this example.
        var motionConfig = new MotionConfig(algorithm: new MotionAlgorithmRGBDiff());

        // Duration of the motion-detection operation.
        var stoppingToken = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
        Console.WriteLine($"Detecting motion for {totalSeconds} seconds.");

        await cam.WithMotionDetection(
            motionCaptureHandler,
            motionConfig,
            // This callback will be invoked when motion has been detected.
            async () =>
            {
                // When motion is detected, temporarily disable notifications
                motionCaptureHandler.DisableMotionDetection();
                Console.WriteLine($"\n     {DateTime.Now:hh\\:mm\\:ss} Motion detected, recording for {recordSeconds} seconds.");

                // Save a snapshot as soon as motion is detected
                snapshotCaptureHandler.WriteFrame();

                // When the recording period expires, stop recording and re-enable capture
                var stopRecording = new CancellationTokenSource();
                stopRecording.Token.Register(() =>
                {
                    Console.WriteLine($"     {DateTime.Now:hh\\:mm\\:ss} ...recording stopped.");
                    motionCaptureHandler.EnableMotionDetection();

                    // Calling split will close the h.264 file stream and open another file to
                    // store new circular buffer data while we wait for another motion event.
                    videoCaptureHandler.StopRecording();
                    videoCaptureHandler.Split();
                });

                // Save additional snapshots 1- and 2-seconds after motion was detected
                var snapshotOneSecond = new CancellationTokenSource();
                var snapshotTwoSeconds = new CancellationTokenSource();
                snapshotOneSecond.Token.Register(snapshotCaptureHandler.WriteFrame);
                snapshotTwoSeconds.Token.Register(snapshotCaptureHandler.WriteFrame);

                // Start the countdowns
                stopRecording.CancelAfter(recordSeconds * 1000);
                snapshotOneSecond.CancelAfter(1000);
                snapshotTwoSeconds.CancelAfter(2000);

                // Record until the duration passes or the overall motion detection token expires
                await Task.WhenAny(
                    videoCaptureHandler.StartRecording(videoEncoder.RequestIFrame, stopRecording.Token),
                    stoppingToken.Token.AsTask()
                );

                // Ensure all tokens are cancelled if the overall timeout expired.
                if(!stopRecording.IsCancellationRequested) 
                {
                    stopRecording.Cancel();
                    snapshotOneSecond.Cancel();
                    snapshotTwoSeconds.Cancel();
                }
            })
            .ProcessAsync(cam.Camera.VideoPort, stoppingToken.Token);
    }
    cam.Cleanup();
}

The changes here are easy to follow. The image encoder continuously feeds new JPEG-encoded frames to our secondary frame buffer, which simply stores the latest image until WriteFrame is invoked. When motion is detected, the current frame is saved to storage, and two more CancellationTokenSource objects are created with 1- and 2-second timeouts to call WriteFrame at those intervals. The motion event's video recording is accompanied by three full-resolution images.

Advanced Usage

Real-time streaming visualisation

The true effects of trial-and-error configuration of motion detection parameters can be difficult to judge. The library makes it possible to visualise the motion detection algorithm at work using a third variation on the FrameBufferCaptureHandler constructor and the EnableAnalysis method on the motion algorithm, which routes the contents of an analysis frame buffer to any standard output capture handler. Note that the output is raw frame data, so it won't be of much use without some sort of encoding. The example below sets up the output as an MJPEG stream you can view from a browser by piping it through ffmpeg and VLC.

Note that you may not want to run this example without a reliable cooling fan. This example is a very heavy workload for a Raspberry Pi. In addition to the GPU work to run the camera, parallel-processing full-motion video frames to detect motion, and generating an analysis image for every frame, we're also software-encoding that raw frame data stream to an h.264 stream in an ffmpeg process, then transcoding the h.264 stream to an MJPEG stream served over HTTP by VLC -- possibly over WiFi.

public async Task VisualiseMotionDetection(int totalSeconds)
{
    var cam = MMALCamera.Instance;

    // Set 640 x 480 at 20 FPS.
    MMALCameraConfig.Resolution = new Resolution(640, 480);
    MMALCameraConfig.SensorMode = MMALSensorMode.Mode7;
    MMALCameraConfig.Framerate = 20;
    cam.ConfigureCameraSettings();

    // Use the default configuration. The algorithm instance is stored in a local
    // variable so analysis mode can be enabled on it below.
    var motionAlgorithm = new MotionAlgorithmRGBDiff();
    var motionConfig = new MotionConfig(algorithm: motionAlgorithm);

    // Helper method to configure ExternalProcessCaptureHandlerOptions. There are
    // many optional arguments but they are generally optimized for the recommended
    // 640 x 480-based motion detection image stream.
    var raw_to_mjpeg_stream = VLCCaptureHandler.StreamRawRGB24asMJPEG();

    // This manages the ffmpeg and cvlc processes running under a separate bash shell.
    using (var shell = new ExternalProcessCaptureHandler(raw_to_mjpeg_stream))

    // This version of the constructor is specific to running in analysis mode. The null
    // argument could be replaced with a motion detection delegate like those provided to
    // cam.WithMotionDetection() for normal motion detection usage.
    using (var motion = new FrameBufferCaptureHandler(motionConfig, null))

    // Although we've already set the camera resolution, this allows us to specify the raw
    // format required to drive the motion detection algorithm.
    using (var resizer = new MMALIspComponent())
    {
        // This tells the algorithm to generate the analysis images and feed them
        // to an output capture handler (in this case our ffmpeg / cvlc pipeline).
        motionAlgorithm.EnableAnalysis(shell);

        resizer.ConfigureOutputPort<VideoPort>(0, new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480), motion);
        cam.Camera.VideoPort.ConnectTo(resizer);

        // Camera warm-up.
        await Task.Delay(2000);

        // Tell the user how to connect to the MJPEG stream.
        Console.WriteLine($"Streaming MJPEG with motion detection analysis for {totalSeconds} sec to:");
        Console.WriteLine($"http://{Environment.MachineName}.local:8554/");

        // Set the duration and let it run...
        var stoppingToken = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
        await Task.WhenAll(new Task[]{
                shell.ProcessExternalAsync(stoppingToken.Token),
                cam.ProcessAsync(cam.Camera.VideoPort, stoppingToken.Token),
            }).ConfigureAwait(false);
    }
    cam.Cleanup();
}

A sample analysis image is shown below. It isn't much to look at in the form of a single static image, but we can use this to explain the content of the analysis visualisation. In this case, the camera is detecting motion on a television news program. The blob at the top center is caused by a running ceiling fan's reflection on the glass in a picture frame.

[Sample analysis image]

The faint grid of gray dots represents the corners of the motion detection cells. The image content is converted to grayscale. The brightness of each pixel represents the strength of the RGB difference between the test frame and the current frame, and cells that represent detected motion (because the changed pixel count exceeded the configured percentage) are highlighted in a magenta color. When the analysis is running in full-motion, these color effects make it very easy to see just how the algorithm is working with the current settings.

Finally, a short colored bar is drawn across the top of the image representing the total number of cells that detected motion. The line is green if no motion was detected, or red when enough motion has been detected to invoke the motion event handler callback.

Algorithm plug-in model

The MMALSharp library currently provides just one motion detection algorithm: RGB summing / differencing, implemented in the MotionAlgorithmRGBDiff class. However, the frame differencing motion detection system is designed as a plug-in model. The library handles storing and updating the test frame, collecting each new full frame for comparison, storing any configured mask bitmap, and managing the processing cells, but the analysis of the two frames and the motion detection decision is offloaded to the algorithm implementation.

Because these algorithms involve parallel processing, you should be aware of several thread safety / processing considerations:

  • Parallel threads must never read or write the same properties or fields.
  • Parallel threads can safely access different array elements in a shared array -- meaning each thread must uniquely "own" some portion of the array and no other thread will access those array elements.
  • Methods invoked in parallel can safely use by-value arguments (the method receives a local copy).
  • Methods invoked in parallel must never include by-reference arguments except when the referenced object exposes thread-safe content like array fields (used as described above).
  • Structs which only expose value-type fields (int, bool, etc.) are passed by-value and are thread safe as arguments to methods invoked in parallel.
  • Fields can be as much as ten times faster than properties, an important consideration in high-performance processing paths.

The algorithm can interact with a FrameDiffDriver object which is the library object that owns and controls the algorithm. This object exposes the frame-level fields used during motion detection -- images in the form of TestFrame, CurrentFrame, and FrameMask byte arrays, an array of Rectangle structs defining the cell grid as the CellRect field, and a CellDiff integer array which should reflect the degree of change represented by each cell.

A new algorithm should implement IMotionAlgorithm and can optionally derive from MotionAlgorithmBase. You can require custom configuration settings in the algorithm constructor. The interface will require you to implement several methods.

Generally you should refer to the MotionAlgorithmRGBDiff source code as a guide to creating a new motion detection algorithm, but some general guidelines follow to help orient you.

FirstFrameCompleted is called once when the driver has a full frame available, at which point it is able to provide a FrameAnalysisMetadata structure. This structure reflects the frame width, height, bytes per pixel, stride, cell width, and cell height. It also provides an ImageContext which the algorithm can store as a template to pass to the output handler when analysis mode is activated.

ResetAnalyser can be invoked to reset any stateful information when the driver itself has been reset (typically when motion detection is disabled then re-enabled). This could be useful for analysis which spans multiple frames, for example (which is common in some Gaussian distribution techniques).

DetectMotion is the heart of the operation. It returns a bool which indicates whether motion was detected in the current frame. Internally, it will run a Parallel.ForEach operation against the cells, invoking a private cell analysis function.

EnableAnalysis and DisableAnalysis should set a flag to control whether analysis mode is active. When analysis is active, DetectMotion should alter a local byte array matching the layout of the driver frame buffers. For each frame, this will be handed off to the analysis output capture handler using the template ImageContext provided in the FirstFrameCompleted method.

If your algorithm derives from MotionAlgorithmBase, when running in analysis mode it can invoke a couple of utility methods in the base class. HighlightCell draws a box around a given cell, and DrawIndicatorBlock draws a color-filled rectangle somewhere in the image. The provided algorithm uses this to display the amount of cell-level motion detected in each frame, but it could also be used as a visual indicator of other internal states. These are typically called at the end of DetectMotion.
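
Putting those pieces together, a skeleton implementation might look roughly like the following. The member signatures are inferred from the descriptions above rather than copied from the library, so verify them against IMotionAlgorithm and the MotionAlgorithmRGBDiff source before relying on them.

// Skeleton only -- using directives for the MMALSharp namespaces are omitted, and
// the parameter types (FrameDiffDriver, FrameAnalysisMetadata, ImageContext) are
// taken from the descriptions above rather than the actual interface definition.
public class MyMotionAlgorithm : MotionAlgorithmBase, IMotionAlgorithm
{
    private bool _analysisEnabled;
    private IOutputCaptureHandler _analysisHandler;
    private ImageContext _analysisContext;

    public void EnableAnalysis(IOutputCaptureHandler handler = null)
    {
        _analysisEnabled = true;
        _analysisHandler = handler;
    }

    public void DisableAnalysis() => _analysisEnabled = false;

    public void FirstFrameCompleted(FrameDiffDriver driver, FrameAnalysisMetadata metadata, ImageContext contextTemplate)
    {
        // Store the context template so analysis frames can be handed to the
        // output capture handler later.
        _analysisContext = contextTemplate;
    }

    public void ResetAnalyser(FrameDiffDriver driver, FrameAnalysisMetadata metadata)
    {
        // Clear any state that spans multiple frames (the driver has been reset).
    }

    public bool DetectMotion(FrameDiffDriver driver, FrameAnalysisMetadata metadata)
    {
        // Compare driver.TestFrame and driver.CurrentFrame cell by cell. Each
        // parallel iteration owns exactly one element of driver.CellDiff, so no
        // locking is required.
        Parallel.For(0, driver.CellRect.Length, i =>
        {
            // ... per-cell pixel comparison; write the result to driver.CellDiff[i] ...
        });

        // Tally the changed cells and decide whether motion occurred.
        int changedCells = driver.CellDiff.Count(d => d > 0);

        // If analysis mode is enabled, draw highlights/indicators into a local frame
        // buffer and pass it to _analysisHandler using the stored _analysisContext.

        return changedCells >= 20; // placeholder threshold
    }
}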

Resolution / cell-count reference

Although we strongly recommend using 640 x 480 x RGB24 for motion detection, the algorithms should theoretically work with raw frame data of any resolution. (Throughput has been measured to decrease linearly as the resolution increases.) The following tables indicate the cell dimensions applied to the various available camera modules and resolutions.

v1 camera (OV5647)
Mode Resolution   Cells     Total  Pixels per cell
1    1920 x 1080  30 x 30   900    64 x 36
2,3  2592 x 1944  36 x 36   1296   72 x 54
4    1296 x 972   27 x 27   729    48 x 36
5    1296 x 730   72 x 10   720    18 x 73
6,7   640 x 480   32 x 32   1024   20 x 15

v2 camera (IMX219)
Mode Resolution   Cells     Total  Pixels per cell
1    1920 x 1080  30 x 30   900    64 x 36
2,3  3280 x 2464  40 x 22   880    82 x 112
4    1640 x 1232  40 x 22   880    41 x 56
5    1640 x 922   40 x 23   920    41 x 40.09 (see below)
6    1280 x 720   20 x 36   720    64 x 20
7     640 x 480   32 x 32   1024   20 x 15

HQ camera (IMX477)
Mode Resolution   Cells     Total  Pixels per cell
1    2028 x 1080  26 x 36   936    78 x 30
2    2028 x 1520  26 x 38   988    78 x 40
3    4056 x 3040  26 x 32   832   156 x 95
4    1012 x 760   44 x 19   836    23 x 40

📌 The v2 1640 x 922 resolution (mode 5) does not have a useful vertical-axis divisor: 23 vertical cells yield a cell size of 41 x 40.09 pixels. In practice this means the fractional bottom rows of pixels are ignored.

The cells are also used to parallel-process the image frames, which is why they have similar total cell counts. Around 800 cells seems to be the optimal number for parallel processing on the Raspberry Pi. The specific cell counts are then chosen to divide evenly into the various image resolutions.