CCTV and Motion Detection

⚠️ You are viewing the v0.7 Alpha documentation. If you are not cloning the latest code from this repository or using dev packages from MyGet, then you may wish to look at the v0.6 examples instead. This section is still being written for v0.7.

Revisions

v0.7 Alpha
v0.6 (Current)

Motion Detection Basics

Frame differencing

Frame differencing is a common motion-detection technique whereby a test frame (sometimes called the "background frame") is compared against each new frame (the "current frame") for changes exceeding various thresholds. The MMALSharp library has new APIs and classes that let you configure motion detection behavior, including callbacks to run custom code when motion is detected.

There are different strategies to detect differences between frames. The provided implementation combines two techniques which help reject sensor noise and small localized motion (such as an insect, or even a small pet).

At the most basic level, the algorithm compares individual pixels. This is called "RGB summing" because the red, green, and blue values are added together for each pixel in both images. If the difference between the test frame's sum and the new frame's sum exceeds a threshold, the pixel is considered changed.

The image is subdivided into a grid of smaller rectangles called cells. The size of each cell, and therefore the number of pixels it contains, depends on the image resolution. A second threshold defines the percentage of pixels in a cell which must change for the entire cell to be considered changed. This is how sensor noise and other minor changes are discarded.

Finally, a third threshold sets the number of cells across the entire image that must register changes in order to signal that motion has occurred. This is how real but small and unimportant motion is ignored (insects, pets, and distant background movement, for example). All of these thresholds are configurable.

Motion detection typically doesn't require or benefit from high resolution; 640 x 480 should be adequate. Whatever the resolution, always feed raw RGB24, RGB32, or RGBA images into the system: image artifacts from lossy compression algorithms like H.264 will be mistaken for motion, and the RGB summing algorithm is not compatible with the YUV pixel format. At 640 x 480 x RGB24, a Raspberry Pi 4B can easily process full-motion video using the provided algorithms (an improvement over v0.6, which could only process about 5 frames per second on the same hardware).

The new FrameBufferCaptureHandler class provides management and control of motion detection. The following example demonstrates the most basic possible motion detection. This does nothing but write messages to the console when motion is detected. Later we'll see more complete examples that capture video and snapshots.

public async Task SimpleMotionDetection(int totalSeconds)
{
    // Assumes the camera has been configured.
    var cam = MMALCamera.Instance;

    using (var motionCaptureHandler = new FrameBufferCaptureHandler())
    using (var resizer = new MMALIspComponent())
    {
        // The ISP resizer is used to output a small (640x480) image to ensure high performance. As described in the
        // wiki, frame difference motion detection only works reliably on uncompressed, unencoded raw RGB data. The
        // resizer outputs this raw frame data directly into the motion detection handler.
        resizer.ConfigureInputPort(new MMALPortConfig(MMALEncoding.OPAQUE, MMALEncoding.I420), cam.Camera.VideoPort, null);
        resizer.ConfigureOutputPort<VideoPort>(0, new MMALPortConfig(MMALEncoding.RGB24, MMALEncoding.RGB24, width: 640, height: 480), motionCaptureHandler);

        cam.Camera.VideoPort.ConnectTo(resizer);

        // Camera warm-up.
        await Task.Delay(2000);

        // We'll use the default settings for this example.
        var motionConfig = new MotionConfig(algorithm: new MotionAlgorithmRGBDiff());

        // Duration of the motion-detection operation.
        var stoppingToken = new CancellationTokenSource(TimeSpan.FromSeconds(totalSeconds));
        Console.WriteLine($"Detecting motion for {totalSeconds} seconds.");

        await cam.WithMotionDetection(
            motionCaptureHandler,
            motionConfig,
            // This callback will be invoked when motion has been detected.
            async () =>
            {
                // When motion is detected, temporarily disable notifications
                motionCaptureHandler.DisableMotionDetection();

                Console.WriteLine($"\n     {DateTime.Now:hh\\:mm\\:ss} Motion detected, disabling detection for 2 seconds.");

                try
                {
                    // Wait 2 seconds. Task.Delay throws if the token is cancelled during the
                    // wait, and nothing outside this async void callback can catch that.
                    await Task.Delay(2000, stoppingToken.Token);
                }
                catch (OperationCanceledException)
                {
                    // The detection period ended while we were paused.
                    return;
                }

                // Re-enable motion detection
                if(!stoppingToken.IsCancellationRequested)
                {
                    Console.WriteLine($"     {DateTime.Now:hh\\:mm\\:ss} ...motion detection re-enabled.");
                    motionCaptureHandler.EnableMotionDetection();
                }
            })
            .ProcessAsync(cam.Camera.VideoPort, stoppingToken.Token);
    }

    cam.Cleanup();
}

The WithMotionDetection method configures the camera processing loop for motion detection by identifying the FrameBufferCaptureHandler responsible for motion detection, the MotionConfig defining the applicable settings, and an asynchronous callback which is invoked when motion is detected.

IMPORTANT: It is your responsibility to ensure all exceptions are handled inside your callback.

The callback is an event handler, which means it is an async void delegate. Event handlers are the only scenario in .NET applications where async void is an acceptable method signature (versus the common async Task signature). Since it returns void instead of Task, there is no enclosing method which can intercept an exception, and unhandled exceptions will immediately terminate the process.
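One way to follow this rule is to route the callback body through a small guard that catches everything. The sketch below is illustrative only: CallbackGuard and Wrap are hypothetical names that are not part of MMALSharp, and it assumes the callback parameter accepts a parameterless Action.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of MMALSharp.
public static class CallbackGuard
{
    // Wraps an asynchronous body in an async void delegate that never lets an exception escape.
    public static Action Wrap(Func<Task> body, Action<Exception> onError)
    {
        return async () =>
        {
            try
            {
                await body();
            }
            catch (Exception ex)
            {
                // Nothing above an async void delegate can catch this, so handle it here.
                onError(ex);
            }
        };
    }
}
```

A callback built this way could be passed wherever an inline lambda is used, for instance CallbackGuard.Wrap(async () => { /* motion handling */ }, ex => Console.WriteLine(ex)).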

Configuration: Motion mask

Motion detection commonly requires ignoring areas of the camera view where real or apparent motion may occur that is not of interest. The library allows you to configure a mask bitmap to define areas to be ignored.

Masking is especially useful for outdoor scenes where "background" motion like trees, clouds, or passing vehicular traffic may trigger unwanted events. Masking can also be helpful indoors where changes like reflections in a picture frame, movement on a television screen, or even blinking LEDs on electronic devices may be mistaken as motion.

The mask bitmap must be the same size and color depth as the motion detection frames, and the file should be in either BMP or PNG format. The library can also load a JPG mask file, but this is not recommended as compression artifacts may produce inaccuracies.

Fully-black pixels in the mask will be ignored -- they will always be treated as if no motion has occurred. Thus, the easiest way to create a mask is to capture a still picture (for example, using the raspistill utility with a -e BMP or -e PNG encoding switch), load it into any image editor, and paint the unwanted regions fully black (RGB 0,0,0).

The mask is specified as an optional pathname argument to the MotionConfig constructor:

var motionConfig = new MotionConfig(
    algorithm: new MotionAlgorithmRGBDiff(),
    maskBitmap: "/home/pi/images/motionmask.bmp"
);

An exception will be thrown if the mask cannot be found, or if the resolution or color-depth does not match the motion detection image configuration (in these examples, that is always 640 x 480 x RGB24).
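To make the mask rules concrete, here is a minimal sketch of how a raw RGB24 mask buffer could be checked against the frame size and turned into a per-pixel "ignore" map. It is not the library's code: MaskSketch and BuildIgnoreMap are hypothetical names, and MMALSharp's internal mask handling may differ.

```csharp
using System;

// Illustrative only; not part of MMALSharp.
public static class MaskSketch
{
    // Returns one flag per pixel: true means "ignore this pixel".
    // maskRgb24 holds 3 bytes per pixel, like the 640 x 480 x RGB24 frames in these examples.
    public static bool[] BuildIgnoreMap(byte[] maskRgb24, int width, int height)
    {
        if (maskRgb24.Length != width * height * 3)
        {
            throw new ArgumentException("Mask size does not match the motion detection frame size.");
        }

        var ignore = new bool[width * height];
        for (int i = 0; i < ignore.Length; i++)
        {
            int offset = i * 3;

            // Only fully-black pixels (0,0,0) are masked out.
            ignore[i] = maskRgb24[offset] == 0
                && maskRgb24[offset + 1] == 0
                && maskRgb24[offset + 2] == 0;
        }

        return ignore;
    }
}
```

Note that a nearly black pixel such as (0,0,1) is not masked in this sketch, which is one reason lossy JPG masks can produce inaccuracies.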

Testing has not shown any discernible change in performance when a mask is used.

Configuration: Test frames

Motion detection based on frame differencing algorithms compares a test frame to newly received frames. The first full frame captured by the camera is stored as the test frame. To help compensate for gradual changes in the scene most commonly caused by lighting changes and shadows, the library is able to periodically update the test frame with a new image.

The update timing described below will likely need tuning for the specific scene your camera is viewing. Although you can adjust it through trial and error, it may be easier to view the algorithm output in real-time. Refer to the streaming visualisation topic later in this area of the documentation.

Two optional arguments to the MotionConfig constructor control how this works. Both values default to 3 seconds:

var motionConfig = new MotionConfig(
    algorithm: new MotionAlgorithmRGBDiff(),
    testFrameInterval: TimeSpan.FromSeconds(3),
    testFrameCooldown: TimeSpan.FromSeconds(3)
);

The testFrameInterval defines how often the test frame is updated, and testFrameCooldown defines how long the scene must be "quiet" (no motion detected) before a test frame is updated. The cooldown period is checked after the interval passes, so the default values of 3 seconds mean it will actually update every 6 seconds at a minimum, and possibly longer if there is ongoing motion.

Note that the cooldown is relative to triggered motion. If the scene contains minor motion that was not sufficient to trigger a motion detection event, it's possible that the new test frame will capture a moving object. If you see this happening, simply increase the intervals; the defaults are somewhat aggressively short.

Configuration: Sensitivity

The library supports different motion detection algorithms, but currently only one algorithm is built in -- RGB summing (also called RGB differencing). While the core motion detection system is based on frame differencing, RGB differencing is based on changes at the pixel level. Because camera image sensors are naturally "noisy", and also to help reject other sources of minor, uninteresting motion, the algorithm also requires larger-scale changes at the "cell" level. Cells are an arbitrarily-sized grid applied to the image data.

The MotionConfig constructor requires an algorithm object (the algorithm argument), and the built-in MotionAlgorithmRGBDiff constructor accepts three optional arguments to control sensitivity:

var motionConfig = new MotionConfig(
    algorithm: new MotionAlgorithmRGBDiff(
        rgbThreshold: 200,
        cellPixelPercentage: 50,
        cellCountThreshold: 20
));

The settings shown above are the defaults.

The rgbThreshold setting controls change-detection sensitivity at the individual pixel level. The maximum value is 255 + 255 + 255 which is 765. Since the per-pixel RGB difference algorithm compares test frame pixels to new frame pixels, a value of 765 would only indicate a change when a fully-black pixel (RGB 0,0,0) switched to full-white (RGB 255,255,255) or vice-versa, so clearly much lower values are more useful. This sensitivity setting helps reject minor lighting changes and the like.
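The per-pixel test can be sketched in a few lines. This is an illustration of RGB summing as described here, not the library's source: RgbSumSketch and PixelChanged are hypothetical names, and treating a difference equal to the threshold as a change is an assumption made so that the 765 case behaves as described.

```csharp
using System;

// Illustrative only; not part of MMALSharp.
public static class RgbSumSketch
{
    // Largest possible difference between two summed pixels: 255 + 255 + 255.
    public const int MaxDifference = 255 * 3;

    // Compares one test-frame pixel (r1, g1, b1) with the same pixel in the new frame (r2, g2, b2).
    public static bool PixelChanged(byte r1, byte g1, byte b1, byte r2, byte g2, byte b2, int rgbThreshold)
    {
        int testSum = r1 + g1 + b1;
        int currentSum = r2 + g2 + b2;

        // Assumption: a difference equal to the threshold counts as a change.
        return Math.Abs(currentSum - testSum) >= rgbThreshold;
    }
}
```

With a threshold of 200, a pixel going from (10,10,10) to (90,90,90) differs by 240 and counts as changed, while (10,10,10) to (60,60,60) differs by 150 and does not. Because the channels are summed first, a pure hue shift such as (255,0,0) to (0,255,0) produces a difference of zero.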

Each image frame is subdivided into a grid of "cells" based on the image resolution. The library automatically selects the grid size. The recommended resolution for motion detection is 640 x 480 which uses a 32 x 32 grid for a total of 1024 cells. This means each cell represents 20 x 15 pixels, or 300 pixels.

Each cell tracks the number of pixels that changed within the cell. When that count reaches the cellPixelPercentage share of the cell's pixels, the entire cell is considered to have changed. If the count is below that percentage, the cell is considered unchanged, even if some pixels within it have changed. So given a 640 x 480 image using 300-pixel cells, the default 50% threshold means 150 pixels or more must change (exceed the rgbThreshold) within that cell to trigger a change for the entire cell. This setting helps reject very small sources of motion such as insects or a falling leaf.
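The cell arithmetic for the recommended resolution can be checked with a short calculation. CellMathSketch and its methods are hypothetical names used only for this illustration. The sketch assumes a square grid that divides the frame evenly, and rounding the pixel count up is an assumption for percentages that do not divide cleanly.

```csharp
using System;

// Illustrative only; not part of MMALSharp.
public static class CellMathSketch
{
    // Pixels per cell for a gridSize x gridSize grid, assuming the frame divides evenly.
    public static int PixelsPerCell(int width, int height, int gridSize)
    {
        return (width / gridSize) * (height / gridSize);
    }

    // Minimum number of changed pixels for a cell to count as changed.
    public static int PixelsNeeded(int pixelsPerCell, int cellPixelPercentage)
    {
        // Assumption: round up when the percentage does not yield a whole pixel count.
        return (int)Math.Ceiling(pixelsPerCell * cellPixelPercentage / 100.0);
    }
}
```

For a 640 x 480 frame on a 32 x 32 grid, PixelsPerCell(640, 480, 32) gives 20 x 15 = 300, and PixelsNeeded(300, 50) gives 150.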

Finally, motion detection events are triggered by the total number of cells which have changed using the two processes described above. The cellCountThreshold defines the minimum number of cells across the entire image that must change before the motion detection callback is invoked. This helps reject somewhat larger sources of motion such as small pets or even a television screen within view (although that's more easily ignored with a mask bitmap).
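Putting the three thresholds together, the whole decision can be sketched as a single pass over two raw RGB24 buffers. This is a simplified illustration under stated assumptions, not the MMALSharp implementation: FrameDiffSketch and DetectMotion are hypothetical names, the grid is assumed square, masking is omitted, and all three comparisons are assumed to be inclusive.

```csharp
using System;

// Illustrative only; not part of MMALSharp.
public static class FrameDiffSketch
{
    // testFrame and currentFrame are raw RGB24 buffers (3 bytes per pixel, row by row).
    public static bool DetectMotion(
        byte[] testFrame, byte[] currentFrame, int width, int height, int gridSize,
        int rgbThreshold, int cellPixelPercentage, int cellCountThreshold)
    {
        if (testFrame.Length != width * height * 3 || currentFrame.Length != testFrame.Length)
        {
            throw new ArgumentException("Both frames must be width x height RGB24 buffers.");
        }

        int cellWidth = width / gridSize;
        int cellHeight = height / gridSize;
        int pixelsNeeded = (int)Math.Ceiling(cellWidth * cellHeight * cellPixelPercentage / 100.0);
        var changedPixels = new int[gridSize * gridSize];

        // Stage 1: per-pixel RGB summing, tallied into the cell that contains the pixel.
        for (int y = 0; y < cellHeight * gridSize; y++)
        {
            for (int x = 0; x < cellWidth * gridSize; x++)
            {
                int i = (y * width + x) * 3;
                int testSum = testFrame[i] + testFrame[i + 1] + testFrame[i + 2];
                int currentSum = currentFrame[i] + currentFrame[i + 1] + currentFrame[i + 2];

                if (Math.Abs(currentSum - testSum) >= rgbThreshold)
                {
                    changedPixels[(y / cellHeight) * gridSize + (x / cellWidth)]++;
                }
            }
        }

        // Stage 2: a cell has changed when enough of its pixels changed.
        int changedCells = 0;
        foreach (int count in changedPixels)
        {
            if (count >= pixelsNeeded)
            {
                changedCells++;
            }
        }

        // Stage 3: motion is signalled when enough cells changed.
        return changedCells >= cellCountThreshold;
    }
}
```

With the documented defaults, a call would look like DetectMotion(test, current, 640, 480, 32, 200, 50, 20). Raising any one of the last three arguments makes the detector less sensitive at the pixel, cell, or whole-image level respectively.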

These are values you may want to tune for the types of motion you wish to detect. Although you can adjust them through trial and error, it may be easier to view the algorithm output in real-time. Refer to the streaming visualisation topic later in this area of the documentation.