MetalPerformanceShaders macOS xcode9 beta1
Vincent Dondain edited this page Jun 5, 2017 · 1 revision
# MetalPerformanceShaders.framework
diff -ruN /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h /Applications/Xcode9-beta1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h
--- /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h 1969-12-31 19:00:00.000000000 -0500
+++ /Applications/Xcode9-beta1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h 2017-05-20 02:10:25.000000000 -0400
@@ -0,0 +1,931 @@
+/*!
+ * @header MetalPerformanceShaders.h
+ * @framework MetalPerformanceShaders
+ *
+ * @copyright Copyright (c) 2015 Apple Inc. All rights reserved.
+ */
+
+#import <MPSCore/MPSCore.h>
+#import <MPSImage/MPSImage.h>
+#import <MPSMatrix/MPSMatrix.h>
+#import <MPSNeuralNetwork/MPSNeuralNetwork.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!
+ * MPSSupportsMTLDevice
+ * @abstract Determine whether a MetalPerformanceShaders.framework supports a MTLDevice.
+ * @discussion Use this function to determine whether a MTLDevice can be used with interfaces in MetalPerformanceShaders.framework.
+ * @param device A valid MTLDevice
+ * @return YES The device is supported.
+ * NO The device is not supported
+ */
+BOOL MPSSupportsMTLDevice( __nullable id <MTLDevice> device ) MPS_AVAILABLE_STARTING( macos(10.13), ios(9.0), tvos(9.0));
+
+
+
+//
+// These headers contain doxygen formatted documentation. They are human readable as is,
+// but can be processed as such to make something a bit nicer looking. Our version of
+// the doxygen PDF output may be found in MetalPerformanceShaders.framework/Documentation:
+//
+// open `xcrun --show-sdk-path -sdk iphoneos9.0`/System/Library/Frameworks/MetalPerformanceShaders.framework/Documentation/MetalPerformanceShaders.pdf
+//
+// Associated man pages are installed in the SDKROOT.
+//
+// man -M `xcrun --show-sdk-path -sdk iphoneos9.0`/usr/share/man MPSKernel
+//
+
+/*! @mainpage Metal Shaders - High Performance Kernels on Metal
+ * @section section_introduction Introduction
+ *
+ * MetalPerformanceShaders.framework is a framework of highly optimized compute and graphics shaders that are
+ * designed to integrate easily and efficiently into your Metal application. These data-parallel
+ * primitives are specially tuned to take advantage of the unique hardware characteristics of each
+ * iOS GPU to ensure optimal performance. Applications adopting MetalPerformanceShaders can be sure of achieving
+ * optimal performance without needing to update their own hand-written shaders for each new iOS GPU
+ * generation. MetalPerformanceShaders can be used along with the application's existing Metal resources (such as
+ * the MTLCommandBuffer, MTLBuffer and MTLTexture objects) and shaders.
+ *
+ * In iOS 9, MetalPerformanceShaders.framework provides a series of commonly-used image processing primitives for
+ * image effects on Metal textures.
+ *
+ * In iOS 10, MetalPerformanceShaders.framework adds support for the following kernels:
+ * - collection of kernels to implement and run neural networks using previously obtained training data, on the GPU
+ * - new image processing filters to perform color-conversion and for building a gaussian pyramid
+ *
+ * @subsection subsection_usingMPS Using MPS
+ * To use MPS:
+ * link: -framework MetalPerformanceShaders
+ * include: #include <MetalPerformanceShaders/MetalPerformanceShaders.h>
+ *
+ * Advisory: MetalPerformanceShaders features are broken up into many subheaders which are
+ * included by MetalPerformanceShaders.h. The exact placement of interfaces in
+ * headers is subject to change, as functionality in component sub-frameworks can
+ * move into MPSCore.framework when the functionality needs to be shared by
+ * multiple components when features are added. To avoid source level breakage,
+ * #include the top level MetalPerformanceShaders.h header instead of lower
+ * level headers. iOS 11 already broke source compatibility for lower level headers
+ * and future releases will probably do so again. The only supported method of
+ * including MPS symbols is the top level framework header.
+ *
+ * @section section_data Data containers
+ * @subsection subsection_metal_containers MTLTextures and MTLBuffers
+ *
+ * Most data operated on by Metal Performance Shaders must be in a portable data container appropriate
+ * for use on the GPU, such as a MTLTexture, MTLBuffer or MPSImage. The first two should be
+ * self-explanatory based on your previous experience with Metal.framework. MPS will use these
+ * directly when it can.
+ *
+ * Most MPSImage and MPSCNN filters operate only on floating-point or normalized texture formats.
+ * If your data is in a UInteger or Integer MTLPixelFormat (e.g. MTLPixelFormatR8Uint as opposed
+ * to MTLPixelFormatR8Unorm) then you may need to make a texture view of the texture to change
+ * the type using [(id <MTLTexture>) newTextureViewWithPixelFormat:(MTLPixelFormat)pixelFormat],
+ * to reinterpret the data to a normalized format of corresponding signedness and precision. In certain
+ * cases, such as thresholding, corresponding adjustments (e.g. /255) may also have to be made to
+ * parameters passed to the MPSKernel.
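+ *
+ * A minimal sketch, assuming an existing id <MTLTexture> named srcTexture with a
+ * MTLPixelFormatR8Uint format:
+ * @code
+ * // Reinterpret the integer texture as a normalized format so MPS filters can read it.
+ * // This creates a view; no pixel data is copied.
+ * id <MTLTexture> normalizedView = [srcTexture newTextureViewWithPixelFormat: MTLPixelFormatR8Unorm];
+ * @endcode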
+ *
+ * @subsection subsection_mpsimage MPSImages
+ * Convolutional neural networking (CNN) filters may need more than the four data channels that a
+ * MTLTexture can provide. In these cases, the MPSImage is used instead as an abstraction
+ * layer on top of a MTLTexture. When more than 4 channels are needed, additional textures in the texture2d
+ * array are added to hold additional channels in sets of four. The MPSImage tracks this information as the
+ * number of "feature channels" in an image.
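+ *
+ * A minimal sketch, assuming a valid id <MTLDevice> named device (the 32 feature channels
+ * and 224x224 size are arbitrary examples):
+ * @code
+ * // 32 feature channels are stored as ceil(32/4) = 8 slices of a texture2d array.
+ * MPSImageDescriptor *desc =
+ *     [MPSImageDescriptor imageDescriptorWithChannelFormat: MPSImageFeatureChannelFormatUnorm8
+ *                                                    width: 224
+ *                                                   height: 224
+ *                                          featureChannels: 32];
+ * MPSImage *image = [[MPSImage alloc] initWithDevice: device imageDescriptor: desc];
+ * @endcode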
+ *
+ * @subsection subsection_mpstemporaryimage MPSTemporaryImages
+ * The MPSTemporaryImage (subclass of MPSImage) extends the MPSImage to provide advanced caching of
+ * unused memory to increase performance and reduce memory footprint. They are intended as fast
+ * GPU-only storage for intermediate image data needed only transiently within a single MTLCommandBuffer.
+ * They accelerate the common case of image data which is created only to be consumed and destroyed
+ * immediately by the next operation(s) in a MTLCommandBuffer. MPSTemporaryImages provide a convenient and
+ * simple way to save memory by automatically aliasing other MPSTemporaryImages in the MTLCommandBuffer.
+ * Because they alias (share texel storage with) other textures in the same MTLCommandBuffer, the valid
+ * lifetime of the data in a MPSTemporaryImage is extremely short, limited to a portion of a MTLCommandBuffer.
+ * You can not read or write data to a MPSTemporaryImage using the CPU, or use the data in other MTLCommandBuffers.
+ * Use regular MPSImages for more persistent storage.
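+ *
+ * A minimal sketch, assuming a valid id <MTLCommandBuffer> named cmdBuf and a
+ * MPSImageDescriptor named desc:
+ * @code
+ * // Backed by the MPS texture cache and aliased with other temporary images in cmdBuf.
+ * MPSTemporaryImage *tmp = [MPSTemporaryImage temporaryImageWithCommandBuffer: cmdBuf
+ *                                                             imageDescriptor: desc];
+ * // readCount defaults to 1; raise it if more than one kernel will read this image
+ * // before its storage is recycled.
+ * tmp.readCount = 2;
+ * @endcode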
+ *
+ * @section section_discussion The MPSKernel
+ *
+ * The MPSKernel is the base class for all MPS kernels. It defines baseline behavior for all MPS
+ * kernels, declaring the device to run the kernel on, some debugging options and a user-friendly
+ * label, should one be required. From this are derived the MPSUnaryImageKernel and MPSBinaryImageKernel
+ * sub-classes which define shared behavior for most image processing kernels (filters) such as
+ * edging modes, clipping and tiling support for image operations that consume one or two source textures.
+ * Neither these nor the MPSKernel is typically used directly. They just provide API abstraction
+ * and in some cases may allow some level of polymorphic manipulation of MPS image kernel objects.
+ *
+ * Subclasses of the MPSUnaryImageKernel and MPSBinaryImageKernel provide specialized -init and -encode
+ * methods to encode various image processing primitives into your MTLCommandBuffer, and may also
+ * provide additional configurable properties on their own. Many such image filters are available:
+ * There are convolutions (generic, box, Sobel, and Gaussian) to do edge detection, sharpening and
+ * blurring, morphological operators -- Min, Max, Dilate and Erode -- and histogram operations.
+ * In addition, there are median, resampling filters and others. All of these run on the GPU directly
+ * on MTLTextures and MTLBuffers.
+ *
+ * As the MPSKernel/MPSUnaryImageKernel/MPSBinaryImageKernel classes serve to unify a diversity of image
+ * operations into a simple consistent interface and calling sequence to apply image filters,
+ * subclasses implement details that diverge from the norm. For example, some filters may take a small
+ * set of parameters (e.g. a convolution kernel) to govern how they function. However, the overall
+ * sequence for using MPSKernel subclasses remains the same:
+ *
+ * -# Allocate the usual Metal objects: MTLDevice, MTLCommandQueue, and MTLCommandBuffer
+ * to drive a Metal compute pipeline. If your application already uses Metal, chances are you
+ * have most of these things already. MPS will fit right in to this workflow. It can
+ * encode onto MTLCommandBuffers inline with your own workload.
+ *
+ * -# Create an appropriate MPSKernel object. For example, if you want to do a Gaussian blur, make
+ * a MPSImageGaussianBlur object. MPSKernel objects are generally light weight but can be reused
+ * to save some setup time. They can not be used by multiple threads concurrently, so if you
+ * are using Metal from many threads concurrently, make extra MPSKernel objects. MPSKernel objects
+ * conform to <NSCopying>.
+ *
+ * -# Call [MPSKernelSubclass encodeToCommandBuffer:...]. Parameters for other -encode... calls
+ * vary by filter type, but operate similarly. They create a MTLCommandEncoder, write commands to
+ * run the filter into the MTLCommandBuffer and then end the MTLCommandEncoder. This means
+ * you must call -endEncoding on your current MTLCommandEncoder before calling a MPSKernel encode
+ * method. You can at this point release the MPSKernel or keep it around to use again to save
+ * some setup cost.
+ *
+ * -# If you wish to encode further commands of your own on the MTLCommandBuffer, you must
+ * create a new MTLCommandEncoder to do so.
+ *
+ * -# (Standard Metal) When you are done with the MTLCommandBuffer, submit it to the device using
+ * typical Metal commands, such as [MTLCommandBuffer commit]. The MPS filter will begin
+ * running on the GPU. You can either use [MTLCommandBuffer waitUntilCompleted] or
+ * [MTLCommandBuffer addCompletedHandler:] to be notified when the work is done.
+ *
+ * Each MPSKernel is allocated against a particular MTLDevice. A single filter may not be used with
+ * multiple MTLDevices. (You will need to make multiple MPSKernels for that.) This is necessary because
+ * the [MPSKernel initWithDevice:...] methods sometimes allocate MTLBuffers and MTLTextures to hold
+ * data passed in as parameters to the -init method and a MTLDevice is required to make those. MPSKernels
+ * provide a copy method that allows them to be copied for a new device.
+ *
+ * MPSKernel objects are not entirely thread safe. While they may be used in a multithreaded context,
+ * you should not attempt to have multiple MPSKernel objects writing to the same MTLCommandBuffer at
+ * the same time. They share restrictions with the MTLCommandEncoder in this regard. In limited
+ * circumstances, the same MPSKernel can be used to write to multiple MTLCommandBuffers concurrently.
+ * However, that only works if the MPSKernel is treated as an immutable object. That is, if MPSKernel
+ * subclass properties of a shared filter are changed, then the change can be reflected on the other
+ * thread while the other thread is encoding its work, leading to undefined behavior. It is generally
+ * safest to just make copies of MPSKernel objects, one for each thread.
+ *
+ * For more information, please see MPSTypes.h.
+ *
+ * @subsection subsection_properties MPS{Unary/Binary}ImageKernel properties
+ * The MPS{Unary/Binary}ImageKernel base classes define several properties common to all MPSKernels:
+ *
+ * @subsubsection subsubsection_clipRect MPSKernel clipRect
+ * The clipRect property, common to MPSKernel subclasses that write to a destination texture, describes
+ * the sub-rectangle of the destination texture overwritten by the filter. If the clipRect is larger than
+ * the destination texture, the intersection between the clipRect and destination texture bounds will be used.
+ * The clipRect may be used to avoid doing work on obscured regions of the destination image, or to
+ * manage tiling and to limit operations to parts of an image if, for example, the user drew a rectangle
+ * on the screen and asked you to just apply the filter there.
+ *
+ * extern MTLRegion MPSRectNoClip; //Pass this rectangle to fill the entire destination texture.
+ *
+ * @subsubsection subsubsection_MPSoffset MPSOffset
+ * The offset (or primaryOffset or secondaryOffset) property, common to MPSKernel subclasses that
+ * use a source texture from which pixel data is read, describes the positioning of the source image
+ * relative to the result texture. An offset of {0,0,0} indicates that the top left pixel of the source
+ * texture is the center pixel used to create the top left corner of the destination texture clipRect.
+ * An offset of {1,2,0} positions the top left corner of the clipRect at {x=1, y=2, z=0} of the source
+ * image. The offset is the position of the top left corner of the clipRect in the source coordinate
+ * frame. It can be used for tiling and for translating an image up/down or left/right by pixel increments.
+ * If there is no clipRect then the offset is the top left corner of the region read by the filter.
+ * If there are multiple source textures, then the primaryOffset describes the top left corner of the
+ * region read in the primary source texture. The secondaryOffset describes the top left corner of the
+ * region read in the secondary source texture, and so forth.
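+ *
+ * A minimal sketch, assuming filter is an existing MPSUnaryImageKernel subclass instance:
+ * @code
+ * // Write only a 256x256 region of the destination, reading the source starting
+ * // 16 pixels down and to the right of its origin.
+ * filter.clipRect = MTLRegionMake2D( 0, 0, 256, 256 );
+ * filter.offset   = (MPSOffset){ .x = 16, .y = 16, .z = 0 };
+ * @endcode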
+ *
+ * @subsubsection subsubsection_edgemode MPSKernelEdgeMode
+ * The edgeMode (or primaryEdgeMode or secondaryEdgeMode) describes the behavior of texture reads that
+ * stray off the edge of the source image. This can happen if the offset is negative, meaning read
+ * off the top or left edge of the image. It can also happen if the clipRect.size + offset
+ * is larger than the source image, meaning read off the bottom and right of the image. It is
+ * also possible for filters that have a filter window that stretches to examine neighboring pixels,
+ * such as convolution, morphology and resampling filters. If there are multiple source textures,
+ * then the primaryEdgeMode describes the MPSKernelEdgeMode to use with primary source texture.
+ * The secondaryEdgeMode describes the MPSKernelEdgeMode to use with the secondary source texture,
+ * and so forth.
+ *
+ * typedef NS_ENUM(NSUInteger, MPSImageEdgeMode)
+ *
+ * MPSImageEdgeModeZero Out of bound pixels are (0,0,0,1) for image formats without
+ * an alpha channel and (0,0,0,0) for image formats with an
+ * alpha channel
+ *
+ * MPSImageEdgeModeClamp Out of bound pixels are clamped to nearest edge pixel
+ *
+ * @subsubsection subsubsection_options MPSKernelOptions
+ * Each MPSKernel takes a MPSKernelOptions bit mask to indicate various options to use when running the filter:
+ *
+ * typedef NS_OPTIONS(NSUInteger, MPSKernelOptions)
+ *
+ * MPSKernelOptionsNone Use default options
+ *
+ * MPSKernelOptionsSkipAPIValidation Do not spend time looking at parameters passed to MPS
+ * for errors.
+ *
+ * MPSKernelOptionsAllowReducedPrecision When possible, MPSKernels use a higher precision data representation internally than
+ * the destination storage format to avoid excessive accumulation of computational
+ * rounding error in the result. MPSKernelOptionsAllowReducedPrecision advises the
+ * MPSKernel that the destination storage format already has too much precision for
+ * what is ultimately required downstream, and the MPSKernel may use reduced precision
+ * internally when it feels that a less precise result would yield better performance.
+ * When enabled, the precision of the result may vary by hardware and operating system.
+ *
+ * @section subsection_availableFilters Available MPSKernels
+ *
+ * @subsection subsection_convolution Image Convolution
+ * @subsubsection subsubsection_kernel The Image Convolution Kernel
+ * The convolution filter is at its simplest the weighted average of a pixel with its nearest neighbors.
+ * The weights are provided by a convolution kernel. The number and position of the nearest neighbors
+ * that are considered are given by the size of the convolution kernel. For example, a convolution kernel
+ * might be the following 5x5 array of weights:
+ * @code
+ * 1 2 3 2 1
+ * 2 4 6 4 2
+ * 3 6 [9] 6 3
+ * 2 4 6 4 2
+ * 1 2 3 2 1
+ * @endcode
+ * In order to calculate this 5x5 convolution result, one would multiply all of the pixels in the
+ * image within (5/2=) 2 pixels of the desired pixel by their corresponding weights, add them up and divide by
+ * a divisor to renormalize the results. Then, repeat for all other pixels in the area you wish to convolve.
+ *
+ * For those MPS filters where the convolution kernel is passed in, you provide the kernel as a
+ * normalized float array. That is, the kernel weights have the divisor already divided into them and
+ * as a consequence should usually sum to 1.0. In our tent example above, the sum over the area of the kernel
+ * is 81, so one would normalize it as follows:
+ * @code
+ * 1/81 2/81 3/81 2/81 1/81
+ * 2/81 4/81 6/81 4/81 2/81
+ * 3/81 6/81 [9/81] 6/81 3/81
+ * 2/81 4/81 6/81 4/81 2/81
+ * 1/81 2/81 3/81 2/81 1/81
+ * @endcode
+ * It is not strictly necessary that the filter weights add up to 1.0f. Edge detection filters frequently add up
+ * to zero. You may decide to have the area under the filter be a bit bigger or smaller than 1.0 to increase
+ * or reduce the contrast in the result.
+ *
+ * The MxN kernel is passed in as a 1-dimensional data array in row major order.
+ *
+ * Some convolution filters also have a notion of a bias. This is a number to be added to the
+ * result before it is written out to result texture. A bias might be used to uniformly brighten an image,
+ * set a video range baseline (e.g. 0 might actually be encoded as 16/255) or to make negative signal
+ * representable on a unorm image.
+ *
+ * A unorm image is an image comprised of unsigned normalized samples. A typical 8-bit image (e.g.
+ * MTLPixelFormatRGBA8Unorm) is a unorm image. It has unsigned samples that represent values between
+ * [0,1]. In the case of MTLPixelFormatRGBA8Unorm, the encoding of 0 is 0, and the encoding of 1.0f
+ * is UINT8_MAX (255).
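+ *
+ * A minimal sketch of creating a general convolution filter with normalized weights,
+ * assuming a valid id <MTLDevice> named device:
+ * @code
+ * // A 3x3 box blur kernel in row major order. The weights sum to 1.0f.
+ * const float weights[9] = { 1.0f/9, 1.0f/9, 1.0f/9,
+ *                            1.0f/9, 1.0f/9, 1.0f/9,
+ *                            1.0f/9, 1.0f/9, 1.0f/9 };
+ * MPSImageConvolution *conv = [[MPSImageConvolution alloc] initWithDevice: device
+ *                                                             kernelWidth: 3
+ *                                                            kernelHeight: 3
+ *                                                                 weights: weights];
+ * conv.bias = 0.0f;   // added to each result before it is written to the destination
+ * @endcode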
+ *
+ * @subsubsection subsubsection_box The Box, Tent and Gaussian Filters
+ * There are many different convolution kernel shapes which can produce different results. A kernel consisting
+ * of all 1's is called a Box filter. It is very quick to calculate and may run nearly as fast as a texture
+ * copy, even for very large blur radii. The blur effect that you get, however, can be square in appearance
+ * and may not be entirely appealing under close scrutiny. A second pass of the box will lead to a Tent kernel.
+ * (The 5x5 tent above can be reduced into two 3x3 Box filters.) Its appearance is more pleasing. Tent
+ * operations can be found in sample code for window shadows. Both Box and Tent filters are provided by
+ * MPS. Multiple passes of a box and/or tent filters will tend to converge towards a gaussian line shape
+ * and produce a smoother blur effect. MPS also provides a Gaussian blur, though it uses a different method.
+ *
+ * @subsubsection subsubsection_laplacian Laplacian and Unsharp Mask Filters
+ * One can in practice also subtract a blur from the image to produce a sharpening effect (unsharp mask). This
+ * is done by preparing a convolution kernel which is a scaled image less a blur to reduce the low frequency
+ * component of an image. This can reduce blur, but may also emphasize noise in the image. As an example, we
+ * can do identity minus a box blur:
+ * @code
+ * | 1 1 1 |
+ * k0 * [1] - | 1 1 1 | * k2
+ * | 1 1 1 |
+ * @endcode
+ * If we pick k0 = 9 and k2 = 1, so that the two kernels have equal weight, we arrive at:
+ * @code
+ * |-1 -1 -1 |
+ * |-1 8 -1 |
+ * |-1 -1 -1 |
+ * @endcode
+ * This is a Laplacian filter for calculating image gradients (including diagonals in this case).
+ *
+ * Caution: because this convolution kernel has negative regions, it can produce negative
+ * results as well as positive ones from ordinary image data. If you intend to store the
+ * result in a unorm texture, you'll need to scale it and add a positive value to it to avoid
+ * having the negative signal clamped off. (e.g. p' = 0.5*p+0.5).
+ *
+ * An unsharp mask filter is the sum between the original image and a scaled result of the Laplacian
+ * filter. The scaling factor (and filter size and shape) adjusts the nature of the low frequency
+ * signal and the degree to which it is removed. This work can usually be combined into the convolution
+ * kernel, to do the whole thing in one pass.
+ *
+ * @subsubsection subsubsection_sobel Sobel Edge detection
+ * Instead of integrating over an area, Convolution filters can also differentiate over an
+ * area by subtracting adjacent pixels. One such filter is the Sobel edge detection filter.
+ * It produces bright signal where there are large differences between one pixel and the next
+ * and black elsewhere:
+ * @code
+ * | -1 0 1 | | 1 2 1 |
+ * Gx=| -2 0 2 | Gy= | 0 0 0 |
+ * | -1 0 1 | |-1 -2 -1 |
+ *
+ *
+ * result = sqrt( Convolve(src, Gx) * Convolve(src, Gx) +
+ * Convolve(src, Gy) * Convolve(src, Gy))
+ * @endcode
+ *
+ * @subsubsection subsubsection_otherfilters Other Filters
+ * Other effects can be achieved as well, such as emboss:
+ * @code
+ * -1 0 0
+ * 0 0 0
+ * 0 0 1
+ * @endcode
+ * @subsubsection subsubsection_separability Separable Convolution
+ * Some convolution kernels are separable. That is, the filter weights can be factored into the product
+ * of two smaller sets of weights. As an example, the tent kernel shown above can be factored into a
+ * horizontal and vertical 1-dimensional kernel each containing [1 2 3 2 1]. In this way, what might otherwise
+ * have been a 5x5 convolution with 25 multiplies and 24 adds is instead a 5x1 and 1x5 convolution with
+ * a total of 10 multiplies and 8 adds and possibly some extra load/store traffic for the two-pass algorithm.
+ * The savings get bigger for bigger filter areas. MPS convolution filters will automatically separate
+ * kernels to run faster, when possible. Some filters with fixed kernels such as Box and Gaussian are inherently
+ * separable. We attempt to factor the general convolution kernel into 2 1D kernels in the -initWithDevice:...
+ * method. If you want to factor it yourself, make two MPSImageConvolution objects with 1D kernels.
+ *
+ * @subsubsection subsubsection_convolveAvailability Convolutions in MPS
+ * Convolution filters provided by MPS include:
+ *
+ * MPSImageConvolution <MPSImage/MPSImageConvolution.h> General convolution
+ * MPSImageGaussianBlur <MPSImage/MPSImageConvolution.h> Gaussian blur
+ * MPSImageBox <MPSImage/MPSImageConvolution.h> Box blur
+ * MPSImageTent <MPSImage/MPSImageConvolution.h> Tent blur
+ *
+ * @subsection subsection_morphology Morphology
+ * Morphological operators are similar to convolutions in that they find a result by looking at the nearest
+ * neighbors of each pixel in the image. Instead of calculating a weighted average, morphological operators
+ * scan the kernel area looking for the maximum or minimum pixel value. The MPSImageAreaMax and
+ * MPSImageAreaMin filters return the raw maximum and minimum color channel value in the kernel area for
+ * each pixel, respectively. The MPSImageDilate and MPSImageErode filters do the same thing, except that the probe
+ * shape need not be a rectangle, and instead can be nearly any shape you desire, such as an antialiased
+ * oval, star or heart shape.
+ *
+ * When applied, the max and dilate filters have the effect of adding their shape on to the periphery of
+ * bright objects in the image. A single bright pixel, such as might be found in a photograph of a starry
+ * night sky will become the shape of a probe -- a rectangle for max, and perhaps a 5-pointed star if
+ * that is the shape you chose for the dilate filter kernel. Larger objects will adopt more rectangular
+ * or star quality into their shape. (An oval or circular probe would round the corners of a rectangular
+ * object, for example.) The min and erode filters do similar things to the darker regions of the image.
+ *
+ * When a dilate filter is followed by an erode filter (or max followed by min) with similar filters, the
+ * effect is known as a close operator. Expanding bright areas only to erode them away again leaves most of
+ * the image in roughly the same shape as it started, but small dark areas that are completely removed by the
+ * dilate operator are not replaced by the erode. Dark noise may be removed. Small enclosed dark areas may
+ * be completely filled in by bright signal. Similarly erode followed by dilate is an open operator. It
+ * will tend to remove bright fine detail and fill in small bright areas surrounded by dark lines.
+ *
+ * To make a MPS morphology filter with a text glyph, draw black text on a white background. MPS
+ * morphology filters must have a center pixel with value 0.
+ *
+ *
+ * Morphology filters provided by MPS include:
+ *
+ * MPSImageAreaMax <MPSImage/MPSImageMorphology.h> Area Max
+ * MPSImageAreaMin <MPSImage/MPSImageMorphology.h> Area Min
+ * MPSImageDilate <MPSImage/MPSImageMorphology.h> Dilate
+ * MPSImageErode <MPSImage/MPSImageMorphology.h> Erode
+ *
+ * @subsection subsection_histogram Histogram
+ * An image may be examined by taking the histogram of its pixels. This gives the distribution of the various
+ * intensities per color channel. The MPSImageHistogram filter can be used to calculate a histogram for a MTLTexture.
+ *
+ * In some cases, as a result of image processing operations the very dark and light regions of the intensity
+ * spectrum can become unpopulated. Perhaps a photograph is underexposed or overexposed. The MPSImageHistogramEqualization
+ * filter will redistribute the intensities to a more uniform distribution, correcting such problems.
+ * The MPSImageHistogramSpecification class allows you to cause an image to conform to a different histogram.
+ *
+ *
+ * Histogram filters provided by MPS include:
+ *
+ * MPSImageHistogram <MPSImage/MPSImageHistogram.h> Calculate the histogram of an image
+ * MPSImageHistogramEqualization <MPSImage/MPSImageHistogram.h> Redistribute intensity in an image to equalize
+ * the histogram
+ * MPSImageHistogramSpecification <MPSImage/MPSImageHistogram.h> A generalized version of histogram equalization
+ * operation. Convert the image so that its histogram
+ * matches the desired histogram provided to the
+ * histogram specification filter.
+ *
+ * @subsection subsection_median Image Median
+ * The median filter finds the median value in a region surrounding each pixel in the source image. It is frequently
+ * used to remove noise from the image, but may also be used to remove fine detail, like an open filter. It is widely
+ * used in image processing because in many cases it can remove noise while at the same time preserving edges.
+ *
+ * Median filters provided by MPS include:
+ *
+ * MPSImageMedian <MPSImage/MPSImageMedian.h> Calculate the median of an image using a
+ * square filter window.
+ *
+ * @subsection subsection_resampling Image Resampling
+ * Resampling operations are used to convert one regular array of pixels to another regular array of pixels,
+ * typically along a different set of axes and/or using a different sampling period. Changing the sampling period
+ * will enlarge or reduce images and/or distort the aspect ratio. Change of axis results in rotations or arbitrary
+ * affine transforms.
+ *
+ * For most imaging work on the GPU, resampling can be quickly and simply done as part of another pass using
+ * Euler matrices or quaternions to transform the coordinate space followed by linear filtering to interpolate the
+ * value found there. However, this can lead to somewhat muddy images and may result in loss of signal when
+ * downsampling by more than a factor of two unless a low pass filter is applied first. It is also prone to
+ * the development of Moire patterns in regions of the image with regularly repeating signal, such as a picture
+ * of a masonry grid on the side of a building.
+ *
+ * The MPS resampling routines use a higher quality (but more expensive) Lanczos resampling algorithm.
+ * Especially with photographic images, it will usually produce a much nicer result. It does not require a low pass
+ * filter be applied to the image before down sampling. However, some ringing can occur near high frequency regions
+ * of the image, making the algorithm less suitable for vector art.
+ *
+ * MetalPerformanceShaders.framework provides a MPSImageLanczosScale function to allow for simple resizing of images into the clipRect
+ * of the result image. It can operate with preservation of aspect ratio or not.
+ *
+ * MPSImageLanczosScale <MPSImage/MPSResample.h> Resize or adjust aspect ratio of an image.
+ *
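+ * A minimal sketch of halving an image in both dimensions, assuming a valid
+ * id <MTLDevice> named device:
+ * @code
+ * MPSImageLanczosScale *scale = [[MPSImageLanczosScale alloc] initWithDevice: device];
+ * MPSScaleTransform transform = { .scaleX = 0.5, .scaleY = 0.5,
+ *                                 .translateX = 0.0, .translateY = 0.0 };
+ * scale.scaleTransform = &transform;   // the struct is copied by the property setter
+ * @endcode
+ *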
+ * @subsection subsection_threshold Image Threshold
+ * Thresholding operations are commonly used to separate elements of image structure from the rest of an image.
+ * Generally, these operate by making some sort of simple comparison test, for example color_intensity > 0.5, and
+ * then writing out 0 or 1 (actual values configurable) depending on the truth or falsehood of the result. It is
+ * frequently used in computer vision, or to accentuate edge detection filters.
+ *
+ * A variety of thresholding operators are supported:
+ *
+ * MPSImageThresholdBinary <MPSImage/MPSImageThreshold.h> srcPixel > thresholdVal ? maxVal : 0
+ * MPSImageThresholdBinaryInverse <MPSImage/MPSImageThreshold.h> srcPixel > thresholdVal ? 0 : maxVal
+ * MPSImageThresholdTruncate <MPSImage/MPSImageThreshold.h> srcPixel > thresholdVal ? thresholdVal : srcPixel
+ * MPSImageThresholdToZero <MPSImage/MPSImageThreshold.h> srcPixel > thresholdVal ? srcPixel : 0
+ * MPSImageThresholdToZeroInverse <MPSImage/MPSImageThreshold.h> srcPixel > thresholdVal ? 0 : srcPixel
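+ *
+ * A minimal sketch, assuming a valid id <MTLDevice> named device:
+ * @code
+ * // Pixels brighter than 0.5 become 1.0; all others become 0.
+ * MPSImageThresholdBinary *threshold =
+ *     [[MPSImageThresholdBinary alloc] initWithDevice: device
+ *                                      thresholdValue: 0.5f
+ *                                        maximumValue: 1.0f
+ *                            linearGrayColorTransform: NULL];   // NULL uses the default grayscale transform
+ * @endcode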
+ *
+ *
+ * @subsection subsection_math Math Filters
+ * Arithmetic filters take two source images, a primary source image and a secondary source image, as input and
+ * output a single destination image. The filters apply an element-wise arithmetic operator to each pixel in a primary source
+ * image and a corresponding pixel in a secondary source image over a specified region. The supported arithmetic operators
+ * are addition, subtraction, multiplication, and division.
+ *
+ * These filters take additional parameters: primaryScale, secondaryScale, and bias and apply them to the primary source
+ * pixel (x) and secondary source pixel (y) in the following way:
+ *
+ * MPSImageAdd <MPSImage/MPSImageMath.h> Element-wise addition operator: result = ((primaryScale * x) + (secondaryScale * y)) + bias
+ * MPSImageSubtract <MPSImage/MPSImageMath.h> Element-wise subtraction operator result = ((primaryScale * x) - (secondaryScale * y)) + bias
+ * MPSImageMultiply <MPSImage/MPSImageMath.h> Element-wise multiplication operator result = ((primaryScale * x) * (secondaryScale * y)) + bias
+ * MPSImageDivide <MPSImage/MPSImageMath.h> Element-wise division operator result = ((primaryScale * x) / (secondaryScale * y)) + bias
+ *
+ * These filters also take the following additional parameters: secondarySourceStrideInPixelsX and secondarySourceStrideInPixelsY.
+ * The default value of these parameters is 1. Setting both of these parameters to 0 results in the secondarySource image being
+ * handled as a single pixel.
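+ *
+ * A minimal sketch, assuming a valid device, a command buffer cmdBuf and three
+ * compatible textures x, y and result:
+ * @code
+ * // result = ((1.0 * x) + (0.5 * y)) + 0.0 for each pixel
+ * MPSImageAdd *add = [[MPSImageAdd alloc] initWithDevice: device];
+ * add.primaryScale   = 1.0f;
+ * add.secondaryScale = 0.5f;
+ * add.bias           = 0.0f;
+ * [add encodeToCommandBuffer: cmdBuf
+ *             primaryTexture: x
+ *           secondaryTexture: y
+ *         destinationTexture: result];
+ * @endcode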
+ *
+ * @subsection subsection_CNN Convolutional Neural Networks
+ * A convolutional neural network (CNN) is a machine learning technique that attempts to model the visual cortex as a sequence
+ * of convolution, rectification, pooling and normalization steps. Several CNN filters commonly derived from the MPSCNNKernel
+ * base class are provided to help you implement these steps as efficiently as possible.
+ *
+ *
+ * MPSCNNNeuronLinear <MPSNeuralNetwork/MPSCNNConvolution.h> A linear neuron activation function
+ * MPSCNNNeuronReLU <MPSNeuralNetwork/MPSCNNConvolution.h> A neuron activation function with rectified linear units
+ * MPSCNNNeuronSigmoid <MPSNeuralNetwork/MPSCNNConvolution.h> A sigmoid neuron activation function 1/(1+e**-x)
+ * MPSCNNNeuronHardSigmoid <MPSNeuralNetwork/MPSCNNConvolution.h> A hard sigmoid neuron activation function clamp((a*x)+b, 0, 1)
+ * MPSCNNNeuronTanH <MPSNeuralNetwork/MPSCNNConvolution.h> A neuron activation function using hyperbolic tangent
+ * MPSCNNNeuronAbsolute <MPSNeuralNetwork/MPSCNNConvolution.h> An absolute neuron activation function |x|
+ * MPSCNNNeuronSoftPlus <MPSNeuralNetwork/MPSCNNConvolution.h> A parametric SoftPlus neuron activation function a*log(1+e**(b*x))
+ * MPSCNNNeuronSoftSign <MPSNeuralNetwork/MPSCNNConvolution.h> A SoftSign neuron activation function x/(1+|x|)
+ * MPSCNNNeuronELU <MPSNeuralNetwork/MPSCNNConvolution.h> A parametric ELU neuron activation function x<0 ? (a*(e**x-1)) : x
+ * MPSCNNConvolution <MPSNeuralNetwork/MPSCNNConvolution.h> A 4D convolution tensor
+ * MPSCNNFullyConnected <MPSNeuralNetwork/MPSCNNConvolution.h> A fully connected CNN layer
+ * MPSCNNPoolingMax <MPSNeuralNetwork/MPSCNNPooling.h> The maximum value in the pooling area
+ * MPSCNNPoolingAverage <MPSNeuralNetwork/MPSCNNPooling.h> The average value in the pooling area
+ * MPSCNNPoolingL2Norm <MPSNeuralNetwork/MPSCNNPooling.h> The L2-Norm value in the pooling area
+ * MPSCNNDilatedPoolingMax <MPSNeuralNetwork/MPSCNNPooling.h> The maximum value in the dilated pooling area
+ * MPSCNNSpatialNormalization <MPSNeuralNetwork/MPSCNNNormalization.h>
+ * MPSCNNCrossChannelNormalization <MPSNeuralNetwork/MPSCNNNormalization.h>
+ * MPSCNNSoftmax <MPSNeuralNetwork/MPSCNNSoftMax.h> exp(pixel(x,y,k))/sum(exp(pixel(x,y,0)) ... exp(pixel(x,y,N-1)))
+ * MPSCNNLogSoftmax <MPSNeuralNetwork/MPSCNNSoftMax.h> pixel(x,y,k) - ln(sum(exp(pixel(x,y,0)) ... exp(pixel(x,y,N-1))))
+ * MPSCNNUpsamplingNearest <MPSNeuralNetwork/MPSCNNUpsampling.h> A nearest upsampling layer.
+ * MPSCNNUpsamplingBilinear <MPSNeuralNetwork/MPSCNNUpsampling.h> A bilinear upsampling layer.
+ *
+ * MPSCNNKernels operate on MPSImages. MPSImages are at their core MTLTextures. However, whereas
+ * MTLTextures commonly represent image or texel data, a MPSImage is a more abstract representation
+ * of image features. The channels within a MPSImage do not necessarily correspond to colors in a
+ * color space. (Though, they can.) As a result, there can be many more than four of them. 32 or 64 channels
+ * per pixel is not uncommon. This is achieved on the MTLTexture hardware abstraction by inserting
+ * extra RGBA pixels to handle the additional feature channels (if any) beyond 4. These extra pixels are
+ * stored as multiple slices of a 2D image array. Thus, each CNN pixel in a 32-channel image is represented
+ * as 8 array slices, with 4-channels stored per-pixel in each slice. The width and height of the MTLTexture
+ * is the same as the width and height of the MPSImage. The number of slices in the MTLTexture is given by
+ * the number of feature channels rounded up to a multiple of 4.
+ *
+ * MPSImages can be created from existing MTLTextures. They may also be created anew from a MPSImageDescriptor
+ * and backed with either standard texture memory, or as MPSTemporaryImages using memory drawn from MPS's
+ * internal cached texture backing store. MPSTemporaryImages can provide great memory usage and CPU time savings,
+ * but come with significant restrictions that should be understood before using them. For example, their contents
+ * are only valid during the GPU-side execution of a single MTLCommandBuffer and can not be read from or written to
+ * by the CPU. They are provided as an efficient way to hold CNN computations that are used immediately within the
+ * scope of the same MTLCommandBuffer and then discarded. We also support concatenation by allowing the user to
+ * define from which destination feature channel to start writing the output of the current layer. In this way
+ * the application can make a large MPSImage or MPSTemporaryImage and fill in parts of it with multiple layers
+ * (as long as the destination feature channel offset is a multiple of 4).
+ *
+ * Some CNN Tips:
+ * - Think carefully about the edge mode requested for pooling layers. The default is clamp to zero, but there
+ * are times when clamp to edge value may be better.
+ * - To avoid falling off the edge of an image for filters that have a filter area (convolution, pooling) set the
+ * MPSCNNKernel.offset = (MPSOffset){ .x = kernelWidth/2, .y = kernelHeight/2, .z = 0}; and reduce the size
+ * of the output image by {kernelWidth-1, kernelHeight-1, 0}. The filter area stretches up and to the left
+ * of the MPSCNNKernel.offset by {kernelWidth/2, kernelHeight/2}. While consistent with other MPS imaging operations,
+ * this behavior is different from some other CNN implementations. (See the sketch after these tips.)
+ * - Please remember:
+ * MPSCNNConvolution takes weights in the order weight[outputChannels][kernelHeight][kernelWidth][inputChannels / groups]
+ * MPSCNNFullyConnected takes weights in the order weight[outputChannels][sourceWidth][sourceHeight][inputChannels]
+ * - Initialize MPSCNNKernels once and reuse them
+ * - You can use MPSCNNNeurons and other Filters in MPS to perform pre-processing of images, such as scaling and resizing.
+ * - Specify a neuron filter with MPSCNNConvolution descriptor to combine the convolution and neuron operations.
+ * - Use MPSTemporaryImages for intermediate images that live for a short period of time (less than one MTLCommandBuffer).
+ * MPSTemporaryImages can reduce the amount of memory used by the convolutional neural network by several fold, and
+ * similarly reduce the amount of CPU time spent allocating storage and latency between MTLCommandBuffer.commit
+ * and when the work actually starts on the GPU. MPSTemporaryImage are for short lived storage within the time
+ * period of the execution of a single MTLCommandBuffer. You can not read or write to a MPSTemporaryImage using the CPU.
+ * Generally, they should be created as needed and thrown away promptly. Persistent objects should not retain them.
+ * Please be sure to understand the use of the MPSTemporaryImage.readCount.
+ * - Because MPS encodes its work in place in your MTLCommandBuffer, you always have the option to insert your own
+ * code in between MPSCNNKernels as a Metal shader for tasks not covered by MPS. You need not use MPS for everything.
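+ *
+ * A minimal sketch of the offset tip above, assuming conv is a MPSCNNConvolution with a
+ * 3x3 kernel (kernelWidth = kernelHeight = 3):
+ * @code
+ * conv.offset = (MPSOffset){ .x = 3/2, .y = 3/2, .z = 0 };
+ * // ...and make the destination image (3-1) = 2 pixels smaller in width and height
+ * // than the source so the filter window never reads past the source image edges.
+ * @endcode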
+ *
+ * @section section_validation MPS API validation
+ * MPS uses the same API validation layer that Metal uses to alert you to API mistakes while
+ * you are developing your code. While this option is turned on (Xcode: Edit Scheme: options: Metal API Validation),
+ * common programming errors will either trigger an assert or send a warning to the debug log. Except
+ * in the case of serious errors, little or no spew should arrive in the console under standard usage.
+ * You can also try the MPSKernel.options parameter MPSKernelOptionsSkipAPIValidation to skip most of this checking.
+ * The flag may also lead to small reductions in CPU cost.
+ *
+ * Note: where APIs are tagged nonnull, MPS expects that the value is not NULL. The validation layer
+ * may do some checking and assert. If you turn that off, then undefined behavior is the result of
+ * passing nil, and your application will likely be terminated.
+ *
+ *
+ * @section section_usage How to Add MetalPerformanceShaders.framework to your project
+ *
+ * Xcode:
+ *
+ * 1. Click project file at left, then appropriate target, then select Build Phases.
+ * 2. Open the "Link Binary With Libraries" disclosure triangle
+ * 3. Click the [+] button in the "Link Binary With Libraries" view to add a new library
+ * 4. Select MetalPerformanceShaders.framework from the list.
+ * 5. Click the Add button.
+ *
+ * Command Line:
+ *
+ * clang -framework MetalPerformanceShaders file.c -o file.o
+ *
+ * @section section_support How to Determine if MPS Works on Your Device
+ * To test whether MPS works on your device, you may call MPSSupportsMTLDevice(id<MTLDevice>).
+ * It will return YES if the device is supported.
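+ *
+ * For example:
+ * @code
+ * id <MTLDevice> device = MTLCreateSystemDefaultDevice();
+ * if( !MPSSupportsMTLDevice( device ) )
+ * {
+ *     // Fall back to a plain Metal or CPU code path; MPS kernels are unavailable here.
+ * }
+ * @endcode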
+ *
+ * @section section_inplace In Place Operation
+ * Some MPS filters can operate in place. In-place operation means that the
+ * same texture is used to hold both the input image and the result image. Operating
+ * in place is a great way to save memory, time and energy. You can use a MPS
+ * filter in place using [MPSKernel encodeToCommandBuffer:inPlaceTexture:fallbackCopyAllocator:].
+ *
+ * Unfortunately, it is not always possible for MPS filters to run in place.
+ * Whether a particular MPSKernel can operate in place can vary according to the
+ * hardware it is running on, the operating system version and the parameters and
+ * properties passed to it. You may not assume that because a MPSKernel works in place
+ * today on a particular device that it will continue to do so in the future.
+ *
+ * To simplify error handling with failed in-place operation, [MPSKernel
+ * encodeToCommandBuffer:inPlaceTexture:fallbackCopyAllocator:] takes an optional MPSCopyAllocator
+ * parameter. It is used to create a new texture when in-place operation is not possible
+ * so as to allow the operation to proceed out of place in a reliable fashion instead.
+ * (When this happens the input texture is released and replaced with a new texture.)
+ * To make use of this feature, you will need to write a MPSCopyAllocator block.
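+ *
+ * A minimal sketch of the call itself, assuming blur is a MPSUnaryImageKernel subclass
+ * instance, cmdBuf is a valid MTLCommandBuffer, texture is a __strong id <MTLTexture>
+ * variable, and myAllocator is the MPSCopyAllocator described in the next section:
+ * @code
+ * // Returns NO only if the kernel could not be encoded at all.
+ * BOOL ok = [blur encodeToCommandBuffer: cmdBuf
+ *                        inPlaceTexture: &texture    // may be replaced with a new texture
+ *                 fallbackCopyAllocator: myAllocator];
+ * @endcode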
+ *
+ * @subsection subsection_micopyallocator MPSCopyAllocator
+ * Some MPSKernel objects may not be able to operate in place. When that occurs, and in-place
+ * operation is requested, MPS will call back to this block to get a new texture
+ * to overwrite instead. To avoid spending long periods of time allocating pages to back the
+ * MTLTexture, the block should attempt to reuse textures. The texture returned from the
+ * MPSCopyAllocator will be returned instead of the sourceTexture from the MPSKernel method
+ * on return. Here is a minimal MPSCopyAllocator implementation:
+ * @code
+ * // A MPSCopyAllocator to handle cases where in-place operation fails.
+ * MPSCopyAllocator myAllocator = ^id <MTLTexture>( MPSKernel * __nonnull filter,
+ * __nonnull id <MTLCommandBuffer> cmdBuf,
+ * __nonnull id <MTLTexture> sourceTexture)
+ * {
+ * MTLPixelFormat format = sourceTexture.pixelFormat; // FIXME: is this format writable?
+ * MTLTextureDescriptor *d = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat: format
+ * width: sourceTexture.width
+ * height: sourceTexture.height
+ * mipmapped: NO];
+ * d.usage = MTLTextureUsageShaderRead | MTLTextureUsageShaderWrite;
+ *
+ * //FIXME: Allocating a new texture each time is slow. They take up to 1 ms each.
+ * // There are not too many milliseconds in a video frame! You can recycle
+ * // old textures (or MTLBuffers and make textures from them) and reuse
+ * // the memory here.
+ * id <MTLTexture> result = [cmdBuf.device newTextureWithDescriptor: d];
+ *
+ * // FIXME: If there is any metadata associated with sourceTexture such as colorspace
+ * // information, MTLResource.label, MTLResource.cpuCacheMode mode,
+ * // MTLResource.MTLPurgeableState, etc., it may need to be similarly associated
+ * // with the new texture, to avoid losing your metadata.
+ *
+ * // FIXME: If filter.clipRect doesn't cover the entire image, you may need to copy
+ * // pixels from sourceTexture to result or regions of result will be
+ * // uninitialized. You can make a MTLCommandEncoder to encode work on the
+ * // MTLCommandBuffer here to do that work, if necessary. It will be scheduled
+ * // to run immediately before the MPSKernel work. Do not call
+ * // [MTLCommandBuffer enqueue/commit/waitUntilCompleted/waitUntilScheduled]
+ * // in the MPSCopyAllocator block. Make sure to call -endEncoding on the
+ * // MTLCommandEncoder so that the MTLCommandBuffer has no active encoder
+ * // before returning.
+ *
+ * // CAUTION: The next command placed on the MTLCommandBuffer after the MPSCopyAllocator
+ * // returns is almost assuredly going to be encoded with a MTLComputeCommandEncoder.
+ * // Creating any other type of encoder in the MPSCopyAllocator will probably cost
+ * // an additional 0.5 ms of both CPU _AND_ GPU time (or more!) due to a double
+ * // mode switch penalty.
+ *
+ * return result;
+ * // d is autoreleased
+ * };
+ * @endcode
+ *
+ * filter A valid pointer to the MPSKernel that is calling the MPSCopyAllocator. From
+ * it you can get the clipRect of the intended operation.
+ * cmdBuf A valid MTLCommandBuffer. It can be used to obtain the device against
+ * which to allocate the new texture. You may also enqueue operations on
+ * the commandBuffer to initialize the texture. You may not submit, enqueue
+ * or wait for completion of the command buffer.
+ * sourceTexture The texture that is providing the source image for the filter. You may
+ * wish to copy its size and MTLPixelFormat for the new texture, but it is
+ * not required.
+ *
+ * return A new valid MTLTexture to use as the destination for the MPSKernel.
+ * The format of the returned texture does not need to match sourceTexture.
+ *
+ * @section section_mpsnngraph The MPSNNGraph
+ * New for macOS 10.13, iOS/tvOS 11 is a higher level graph API, intended to simplify the creation of
+ * neural networks. The graph is a network of MPSNNFilterNodes, MPSNNImageNodes and MPSNNStateNodes.
+ * MPSNNImageNodes represent MPSImages or MPSTemporaryImages. MPSNNFilterNodes represent MPSCNNKernel
+ * objects -- each of the lower level MPSCNNKernel subclasses has a sister object that is a
+ * subclass of the MPSNNFilterNode. Finally, MPSNNStateNodes stand in for MPSState objects.
+ *
+ * MPSState objects are also new for macOS 10.13, iOS/tvOS 11. They stand in for bits of opaque state that
+ * need to be handed between filter nodes. For example, a MPSCNNConvolutionTranspose filter may need to
+ * know the original size of the filter passed into the corresponding MPSCNNConvolution node farther up the
+ * tree. There is a corresponding MPSCNNConvolutionState object that tracks this information. You will
+ * encounter state objects only infrequently. Most graphs are made up of images and filters.
+ *
+ * To represent a graph, one usually first creates a MPSNNImageNode. This represents the input image or
+ * tensor into the graph. Next, one creates the first filter node to process that input image. For example,
+ * we may make a MPSCNNNeuronLinearNode to normalize the data before the rest of the graph sees it. (y = 2x-1)
+ * Then, we can add our first convolution in the graph.
+ * @code
+ * // Graph: [Image] -> [Linear neuron filter] -> (result image) -> [convolution filter] -> (result image)...
+ * MPSNNImageNode *startNode = [MPSNNImageNode nodeWithHandle: nil];
+ * MPSCNNNeuronLinearNode *norm = [MPSCNNNeuronLinearNode nodeWithSource: startNode a: 2.0f b: -1.0f ];
+ * MPSCNNConvolutionNode *c = [MPSCNNConvolutionNode nodeWithSource: norm.resultImage
+ * weights: [[MyWeights alloc] initWithPath: @"weights1.dat"]];
+ * ...
+ * @endcode
+ * There are some features to notice about each object. First of all, each image node can have a handle
+ * associated with it. The handle is your object that you write. It should conform to the <MPSHandle>
+ * protocol, which specifies that the object should have a label and conform to NSSecureCoding. (The MTLTexture
+ * does have a label property but doesn't conform to NSSecureCoding.) NSSecureCoding is used when you
+ * save the graph to disk using a NSCoder. It isn't used otherwise. You can use a MTLResource here if
+ * you don't plan to save the graph to disk. What is the handle for? When the MPSNNGraph object is
+ * constructed -- the MPSNNGraph takes the network of filter, state and image nodes and rolls it into
+ * an object that can actually encode work to a MTLCommandBuffer -- the graph object will traverse the
+ * graph from back to front and determine which image nodes are not produced by filters in the graph.
+ * These, it will interpret to be graph input images. There may be input states too. When it does so,
+ * it will represent these image and state nodes as the handles you attach to them. Therefore, the handles
+ * probably should be objects of your own making that refer back to your own data structures that identify
+ * various images that you know about.
+ *
+ * Continuing on to the neuron filter, which we are using to just take the usual image [0,1] range and stretch
+ * to [-1,1] before the rest of the graph sees it, we see we can pass in the linear filter parameters when constructing
+ * it here. All filter nodes also produce a result image. This is used as the argument when constructing the
+ * convolution filter node next, to show that the product of the neuron filter is the input to the convolution
+ * filter.
+ *
+ * The convolution object constructor also takes a weights object. The weights object is an object that you write
+ * that conforms to the MPSCNNConvolutionDataSource protocol. MPS does not provide an object that conforms
+ * to this protocol, though you can see some examples in sample code that use this interface. The convolution
+ * data source object is designed to provide for deferred loading of convolution weights. Convolution weights
+ * can be large. In aggregate, the storage for all the weights in the MPSNNGraph, plus the storage for your
+ * copy of them might start to approach the storage limits of the machine for larger graphs. In order to lessen
+ * this impact, the convolution weights are unpacked for each convolution in turn and then purged from memory
+ * so that only the single MPSNNGraph copy remains. This happens when the MPSCNNConvolutionDataSource load
+ * and purge methods are called. You should not load the weights until -load is called. (You probably should
+ * however verify that the file, if any, is there and is well formed in the object -init method.) When -purge
+ * is called, you should release any bulky storage that the object owns and make the object as light weight
+ * as is reasonable. The MPSCNNConvolutionDataSource.descriptor may include a neuron filter operation.
+ *
+ * Other object types should be straightforward.
+ *
+ * @section subsection_mpsnngraph_usage MPSNNGraph usage
+ * Once the network of MPSNNFilterNodes, MPSNNImageNodes and MPSNNStateNodes is created, the next
+ * step is to identify the MPSNNImageNode that contains the result of your graph -- typically, this
+ * is the last one you made -- and make a MPSNNGraph with it:
+ * @code
+ * MPSNNGraph *graph = [[MPSNNGraph alloc] initWithDevice: mtlDevice
+ * resultImage: resultImage];
+ * @endcode
+ * If graph creation fails, nil will be returned here. When it is constructed, the graph iterates over
+ * the network of nodes, starting at the result image and working backwards. Any MPSNNImageNodes and states
+ * that are used that are not created by a MPSNNFilterNode are interpreted to be graph inputs. The
+ * identity of these are given by the MPSNNGraph.sourceImageHandles and MPSNNGraph.sourceStateHandles.
+ * Each handle is your object that refers back to a particular image or state node. The order of the handles
+ * matches the order of the images or states that should be passed to the [MPSNNGraph encodeToCommandBuffer:...]
+ * call. Similarly, you can get the identity of any intermediate images that you requested to see (See
+ * MPSNNImageNode.exportFromGraph property) and the identity of any result MPSStates that are produced
+ * by the graph that are not used. The graph has a destinationImageAllocator that overrides the
+ * MPSNNImageNode.destinationImageAllocator. (see subsection MPSNNGraph intermediate image allocation)
+ * Typically, this serves to make a default temporary image into a normal image, as a convenience.
+ *
+ * When you are ready to encode a graph to a command buffer, the operation follows as per much of the
+ * rest of MPS.
+ * @code
+ * id <MTLDevice> mtlDevice = MTLCreateSystemDefaultDevice();
+ * id <MTLCommandQueue> mtlCommandQueue = mtlDevice.newCommandQueue;
+ * id <MTLCommandBuffer> cmdBuf = mtlCommandQueue.commandBuffer;
+ * MPSImage *inputImage = [[MPSImage alloc] initWithDevice: mtlDevice imageDescriptor: myDescriptor];
+ * // put some data into the input image here. See MTLTexture.replaceBytes...
+ * MPSImage * result = [myGraph encodeToCommandBuffer: cmdBuf sourceImages: @[inputImage] ];
+ * [cmdBuf commit];
+ * [cmdBuf waitUntilCompleted];
+ * @endcode
+ * Obviously, if you have more work to do before or after the graph, it might be better to add it to the
+ * command buffer before committing it, rather than paying for an extra synchronization from
+ * [id <MTLCommandBuffer> waitUntilCompleted].
+ *
+ * @section subsection_mpsnngraph_sizing MPSNNGraph intermediate image sizing and centering
+ * The MPSNNGraph will automatically size and center the intermediate images that appear in the graph.
+ * However, different neural network frameworks do so differently. In addition, some filters may
+ * at times operate on only valid pixels in the source image, whereas others may "look beyond the
+ * edges" so as to keep the result image size the same as the input. Occasionally some filters will
+ * want to produce results for which any input is valid. Perhaps some want to behave in between. Torch
+ * has some particularly inventive edging policies for pooling that have valid invalid regions and
+ * invalid invalid regions beyond the edges of the image.
+ *
+ * Whatever the behavior, you will use the MPSNNFilter.paddingPolicy property to configure behavior.
+ * In its simplest form, a paddingPolicy is a object (possibly written by you, though MPS provides some)
+ * that conforms to the MPSNNPadding protocol. It should at minimum provide a padding method, which codes
+ * for common methods to size the result image, how to center it on the input image and where to place
+ * the remainder in cases where the image size isn't exactly divisible by the stride. This is a bitfield.
+ * You can use:
+ * @code
+ * [MPSNNDefaultPadding paddingWithMethod: MPSNNPaddingMethodAlign... | MPSNNPaddingMethodAddRemainderTo... |
+ * MPSNNPaddingMethodSize... ];
+ * @endcode
+ * to quickly configure one of these. The filters also have a default padding policy, which may be
+ * appropriate most of the time.
+ *
+ * Occasionally, something fancy needs to be done. In that case, the padding policy should set the
+ * MPSNNPaddingMethodCustom bit and implement the optional destinationImageDescriptorForSourceImages:
+ * sourceStates:forKernel:suggestedDescriptor: method. The MPSNNGraph will use the MPSNNPadding.paddingMethod
+ * to generate an initial guess for the configuration of the MPSCNNKernel.offset and the size and formatting
+ * of the result image and hand that to you in the form of a MPSImageDescriptor. You can modify the descriptor
+ * or the kernel (also passed to you) in your custom destinationImageDescriptorForSourceImages:sourceStates:
+ * forKernel:suggestedDescriptor: method, or just ignore it and make a new descriptor.
+ *
+ * @section subsection_mpsnngraph_allocation MPSNNGraph intermediate image allocation
+ * Typically the graph will make MPSTemporaryImages for these, based on the MPSImageDescriptor obtained
+ * from the padding policy. Temporary images alias one another and can be used to save a lot of memory,
+ * in the same way that malloc saves memory in your application by allowing you to reserve memory for
+ * a time, use it, then free it for reuse for something else. Ideally, most of the storage in your graph
+ * should be temporary images.
+ *
+ * Because temporary images don't (shouldn't) last long, and can't be read by the CPU, some images
+ * probably can't be temporary. By default, the final image returned from the graph is not temporary.
+ * (See MPSNNGraph.destinationImageAllocator to adjust). Also, you may request that certain intermediate
+ * images be non-temporary so that you can access their contents from outside the graph using the
+ * MPSNNImageNode.exportFromGraph property.
+ *
+ * Temporary images often take up almost no additional memory. Regular images always do. Some large graphs will only
+ * be able to run using temporary memory, as regular images would overwhelm the machine. Even if you allocate
+ * all your images up front and reuse them over and over, you will still very likely use much more memory with
+ * regular images, than if you just allocate temporary images as needed. Because temporary images do not
+ * generally allocate large amounts of storage, they are much cheaper and faster to use.
+ *
+ * What kind of image is created after each filter node can be adjusted using the MPSNNImageNode.imageAllocator
+ * property. Two standard allocators are provided as MPSImage.defaultAllocator and MPSTemporaryImage.defaultAllocator.
+ * You may of course write your own. This might be necessary for example if you wish to maintain your own
+ * MTLHeap and allocate from it.
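+ *
+ * A minimal sketch, assuming node is the MPSNNFilterNode whose result you want to read
+ * back on the CPU after the graph runs:
+ * @code
+ * // Produce a regular (non-temporary) MPSImage for this intermediate result and keep it.
+ * node.resultImage.imageAllocator  = [MPSImage defaultAllocator];
+ * node.resultImage.exportFromGraph = YES;
+ * @endcode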
+ *
+ * @section subsection_mpsnngraph_debugging MPSNNGraph debugging tips
+ * In typical usage, some refinement, especially of padding policies, may be required to get the expected answer
+ * from MPS. If the result image is the wrong size, padding is typically the problem. When the answers are incorrect,
+ * the MPSCNNKernel.offset or other property may be incorrectly configured at some stage. As the graph is generated
+ * starting from an output image node, you may create other graphs starting at any image node within the graph.
+ * This will give you a view into the result produced from each intermediate layer with a minimum of fuss. In
+ * addition, the usual NSObject -debugDescription method is available to inspect objects to make sure they conform
+ * to expectation.
+ *
+ * Note that certain operations such as neuron filters that follow convolution filters and image concatenation
+ * may be optimized away by the MPSNNGraph when it is constructed. The convolution can do neuron operations as
+ * part of its operation. Concatenation is best done by writing the result of earlier filter passes in the right
+ * place using MPSCNNKernel.destinationFeatureChannelOffset rather than by adding an extra copy. Other optimizations
+ * may be added as framework capabilities improve.
+ *
+ * @section section_samplecode Sample Code
+ * @code
+ * #import <MetalPerformanceShaders/MetalPerformanceShaders.h>
+ *
+ * // Blur the input texture (in place if possible) on MTLCommandQueue q. The result replaces *inTexture.
+ * // This is a trivial example. It is not necessary or necessarily advised to enqueue a MPSKernel on
+ * // its own MTLCommandBuffer or using its own MTLComputeCommandEncoder. Group work together.
+ * //
+ * // Here we assume that you have already gotten a MTLDevice using MTLCreateSystemDefaultDevice() or
+ * // MTLCopyAllDevices(), used it to create a MTLCommandQueue with MTLDevice.newCommandQueue, and
+ * // similarly made textures with the device as needed.
+ * void MyBlurTextureInPlace( id <MTLTexture> __strong *inTexture, float blurRadius, id <MTLCommandQueue> q)
+ * {
+ * // Create "the usual Metal objects".
+ * // MPS does not need a dedicated MTLCommandBuffer or MTLComputeCommandEncoder.
+ * // This is a trivial example. You should reuse the MTL objects you already have, if you have them.
+ * id <MTLDevice> device = q.device;
+ * id <MTLCommandBuffer> buffer = [q commandBuffer];
+ *
+ * // Create a MPS filter.
+ * MPSImageGaussianBlur *blur = [[MPSImageGaussianBlur alloc] initWithDevice: device sigma: blurRadius];
+ * if( nil == blur )
+ * MyHandleError(kOutOfMemory);
+ *
+ * // Set MPSKernel properties to taste. (sigma is read-only and was set at init time above.)
+ * // defaults are okay here for other MPSKernel properties. (clipRect, origin, edgeMode)
+ *
+ * // Attempt to do the work in place. Since we provided a copyAllocator as an out-of-place
+ * // fallback, we don't need to check to see if it succeeded or not.
+ * [ blur encodeToCommandBuffer: buffer
+ * inPlaceTexture: inTexture // may replace *inTexture
+ * fallbackCopyAllocator: myAllocator ]; // See MPSCopyAllocator definition for a sample myAllocator
+ * [ blur release];
+ *
+ * // the usual metal enqueue process
+ * [buffer commit];
+ * [buffer waitUntilCompleted]; // slow! Try enqueuing more work on this or the next
+ * // command buffer instead of waiting every time.
+ * }
+ *
+ * @endcode
+ *
+ * @section section_tuning MPS Tuning Hints
+ * MPS has been tuned for excellent performance across a diversity of devices
+ * and filter parameters. The tuning process focuses on minimizing both CPU and GPU
+ * latency for back to back calls on the same MTLCommandBuffer. It is possible,
+ * however, to inadvertently undo this optimization effort by introducing costly
+ * operations into the pipeline around the MPS filter, leading to
+ * disappointing overall results.
+ *
+ * Here are some elements of good practice to avoid common pitfalls:
+ *
+ * -# Don't wait for results to complete before enqueuing more work.
+ * There can be a significant delay (up to 2.5 ms) just to get
+ * an empty MTLCommandBuffer through the pipeline to where
+ * [MTLCommandBuffer waitUntilCompleted] returns. Instead,
+ * start encoding the next command buffer(s) while you wait
+ * for the first one to complete. Enqueue them too, so they can
+ * start immediately after the previous one exits the GPU. Don't
+ * wait for the CPU kernel to notice the first command buffer
+ * is done and start taking it apart and eventually make a callback
+ * to userland before beginning work on encoding the next one.
+ * By allowing the CPU and GPU to work concurrently in this way,
+ * throughput can be enhanced by up to a factor of ten.
+ *
+ * -# There is a large cost to allocating buffers and textures.
+ * The cost can swamp the CPU, preventing you from keeping
+ * the GPU busy. Try to preallocate and reuse MTLResource
+ * objects as much as possible. The MPSTemporaryImage may be
+ * used instead for short-lived dynamic allocations.
+ *
+ * -# There is a cost to switching between render and compute
+ * encoders. Each time a new render encoder is used, there
+ * can be a substantial GPU mode switch cost that may
+ * undermine your throughput. To avoid the cost, try to
+ * batch compute work together. Since making a new MTLCommandBuffer
+ * forces you to make a new MTLCommandEncoder too, try to
+ * do more work with fewer MTLCommandBuffers.
+ *
+ * -# On currently available iOS devices we find that for some
+ * image operations, particularly those involving multiple passes -
+ * for example, if you are chaining multiple MPS image filters
+ * together — performance can be improved by up to a factor of two
+ * by breaking the work into tiles about 512 kB in size. Use
+ * -sourceRegionForDestinationSize: to find the MPSRegion needed
+ * for each tile.
+ */
+
+#ifdef __cplusplus
+}
+#endif
+