MicrosoftDocs · michael-hawker · Jun 5, 2020 · Mar 9, 2020 · Mar 9, 2020 · Mar 9, 2020
diff --git a/docs/high-performance/Introduction.md b/docs/high-performance/Introduction.md
@@ -0,0 +1,43 @@
+---
+title: Introduction to the High Performance package
+author: Sergio0694
+description: An overview of how to get started with the High Performance package and of the APIs it contains
+keywords: windows 10, uwp, windows community toolkit, uwp community toolkit, uwp toolkit, get started, visual studio, high performance, net core, net standard
+---
+
+# Introduction to the High Performance package
+
+This package can be installed through NuGet, and it multi-targets .NET Standard 2.0 and .NET Standard 2.1. This means that you can use it from both UWP apps and modern .NET Core 3.0 applications. The API surface is almost identical in both cases, and a lot of work has gone into backporting as many features as possible to .NET Standard 2.0. Except for some minor differences, you can expect the same APIs to be available on both target frameworks.
+
+Follow these steps to install the High Performance package:
+
+1. Open an existing project in Visual Studio, targeting any of the following:
+    - UWP (SDK >= 16299)
+    - .NET Standard (>= 2.0)
+    - .NET Core (>= 2.1)
+    - Any other framework supporting .NET Standard 2.0 and up
+
+2. In the Solution Explorer panel, right-click your project name and select **Manage NuGet Packages**. Search for **Microsoft.Toolkit.HighPerformance** and install it.
+
+    ![NuGet Packages](../resources/images/ManageNugetPackages.png "Manage NuGet Packages Image")
+
+3. Add a using directive in your C# files to use the new APIs:
+
+    ```c#
+    using Microsoft.Toolkit.HighPerformance;
+    ```
+
+4. If you want to see some code samples, you can either read through the other docs pages for the High Performance package, or have a look at the various [unit tests](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/UnitTests/UnitTests.HighPerformance.Shared/Helpers) for the project.
+
+## When should I use this package?
+
+As the name suggests, the High Performance package contains a set of APIs that are heavily focused on optimization. All the new APIs have been carefully crafted to achieve the best possible performance, either through reduced memory allocation, micro-optimizations at the assembly level, or by structuring the APIs in a way that facilitates writing performance-oriented code in general.
+
+This package makes heavy use of APIs such as:
+- [`System.Buffers.ArrayPool<T>`](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.arraypool-1)
+- [`System.Runtime.CompilerServices.Unsafe`](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.unsafe)
+- [`System.Runtime.InteropServices.MemoryMarshal`](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.memorymarshal)
+- [`System.Threading.Tasks.Parallel`](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel)
+
+If you are already familiar with these APIs, or even if you're just getting started with writing high performance code in C# and want a set of well tested helpers, have a look at what's included in this package to see how you can use it in your own projects!
+
diff --git a/docs/high-performance/MemoryOwner.md b/docs/high-performance/MemoryOwner.md
@@ -0,0 +1,72 @@
+---
+title: MemoryOwner&lt;T>
+author: Sergio0694
+description: A buffer type implementing `IMemoryOwner<T>` that rents memory from a shared pool
+keywords: windows 10, uwp, windows community toolkit, uwp community toolkit, uwp toolkit, parallel, high performance, net core, net standard
+dev_langs:
+  - csharp
+---
+
+# MemoryOwner&lt;T>
+
+The [MemoryOwner&lt;T>](https://docs.microsoft.com/dotnet/api/microsoft.toolkit.highperformance.buffers.memoryowner-1) is a buffer type implementing `IMemoryOwner<T>`, with an embedded length property and a series of performance-oriented APIs. It is essentially a lightweight wrapper around the `ArrayPool<T>` type, with some additional helper utilities.
+
+## How it works
+
+`MemoryOwner<T>` has the following main features:
+
+- One of the main issues of arrays returned by the `ArrayPool<T>` APIs and of the `IMemoryOwner<T>` instances returned by the `MemoryPool<T>` APIs is that the size specified by the user is only used as a _minimum_ size: the actual size of the returned buffers might be greater. `MemoryOwner<T>` solves this by also storing the original requested size, so that `Memory<T>` and `Span<T>` instances retrieved from it never need to be manually sliced.
+- When using `IMemoryOwner<T>`, getting a `Span<T>` for the underlying buffer requires first getting a `Memory<T>` instance, and then a `Span<T>`. This is fairly expensive, and often unnecessary, as the intermediate `Memory<T>` might not be needed at all. `MemoryOwner<T>` instead has an additional `Span` property which is extremely lightweight, as it directly wraps the internal `T[]` array being rented from the pool.
+- Buffers rented from the pool are not cleared by default, which means that if they were not cleared when they were previously returned to the pool, they might contain garbage data. Normally, users are required to clear these rented buffers manually, which can be verbose, especially when done frequently. `MemoryOwner<T>` has a more flexible approach to this, through the `Allocate(int, AllocationMode)` API. This method not only allocates a new instance of exactly the requested size, but can also be used to specify which allocation mode to use: either the same one as `ArrayPool<T>`, or one that automatically clears the rented buffer.
+- There are cases where a buffer might be rented with a greater size than what is actually needed, and then resized afterwards. This would normally require users to rent a new buffer and copy the region of interest from the old buffer. Instead, `MemoryOwner<T>` exposes a `Slice(int, int)` API that simply returns a new instance wrapping the specified area of interest. This makes it possible to skip renting a new buffer and copying the items entirely.
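+
+For comparison, here is a minimal sketch of the manual pattern that the points above describe, using only `ArrayPool<T>` from `System.Buffers`. It is not taken from the package itself: it only illustrates the slicing and clearing steps that `MemoryOwner<T>` is meant to handle for you.
+
+```csharp
+using System;
+using System.Buffers;
+
+public static class Program
+{
+    public static void Main()
+    {
+        const int requested = 42;
+
+        // The pool only guarantees a minimum length for the rented array
+        int[] rented = ArrayPool<int>.Shared.Rent(requested);
+
+        try
+        {
+            // Manually slice down to the size that was actually requested
+            Span<int> span = rented.AsSpan(0, requested);
+
+            // Rented arrays are not cleared, so reset the region before using it
+            span.Clear();
+
+            Console.WriteLine(rented.Length >= requested); // True
+            Console.WriteLine(span.Length); // 42
+        }
+        finally
+        {
+            // The array must be returned manually as well
+            ArrayPool<int>.Shared.Return(rented);
+        }
+    }
+}
+```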
+
+## Syntax
+
+Here is an example of how to rent a buffer and retrieve a `Memory<T>` instance:
+
+```csharp
+// Be sure to include this using at the top of the file:
+using Microsoft.Toolkit.HighPerformance.Buffers;
+
+using (MemoryOwner<int> buffer = MemoryOwner<int>.Allocate(42))
+{
+    // Buffer has exactly 42 items
+    Memory<int> memory = buffer.Memory;
+    Span<int> span = buffer.Span;
+}
+```
+
+In this example, we used a `using` block to declare the `MemoryOwner<T>` buffer: this is particularly useful as the underlying array will automatically be returned to the pool at the end of the block. If instead we don't have direct control over the lifetime of a `MemoryOwner<T>` instance, the buffer will simply be returned to the pool when the object is finalized by the garbage collector. In both cases, rented buffers will always be correctly returned to the shared pool.
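+
+As a rough sketch of what the `using` block does, the same code can be written with an explicit `try`/`finally` and a call to `Dispose()`, which `MemoryOwner<T>` provides through `IMemoryOwner<T>`. This assumes the `Microsoft.Toolkit.HighPerformance` package is referenced, and only uses the members listed on this page:
+
+```csharp
+using System;
+using Microsoft.Toolkit.HighPerformance.Buffers;
+
+public static class Program
+{
+    public static void Main()
+    {
+        MemoryOwner<int> buffer = MemoryOwner<int>.Allocate(42);
+
+        try
+        {
+            // The span covers exactly the requested number of items
+            buffer.Span.Fill(1);
+
+            Console.WriteLine(buffer.Length); // 42
+        }
+        finally
+        {
+            // Returns the underlying array to the pool
+            buffer.Dispose();
+        }
+    }
+}
+```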
+
+## Properties
+
+| Property | Return Type | Description |
+| -- | -- | -- |
+| Length | int | Gets the number of items in the current instance |
+| Memory | System.Memory&lt;T> | Gets the memory belonging to this owner |
+| Span | System.Span&lt;T> | Gets a span wrapping the memory belonging to the current instance |
+| Empty | MemoryOwner&lt;T> | Gets an empty `MemoryOwner<T>` instance |
+
+## Methods
+
+| Method | Return Type | Description |
+| -- | -- | -- |
+| Allocate(int) | MemoryOwner&lt;T> | Creates a new `MemoryOwner<T>` instance with the specified size |
+| Allocate(int, AllocationMode) | MemoryOwner&lt;T> | Creates a new `MemoryOwner<T>` instance with the specified size and allocation mode |
+| DangerousGetReference() | ref T | Returns a reference to the first element within the current instance, with no bounds check |
+| Slice(int, int) | MemoryOwner&lt;T> | Slices the buffer currently in use and returns a new `MemoryOwner<T>` instance |
+
+## Sample Code
+
+You can find more examples in our [unit tests](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/UnitTests/UnitTests.HighPerformance.Shared/Buffers).
+
+## Requirements
+
+| Device family | Universal, 10.0.16299.0 or higher |
+| --- | --- |
+| Namespace | Microsoft.Toolkit.HighPerformance.Buffers |
+| NuGet package | [Microsoft.Toolkit.HighPerformance](https://www.nuget.org/packages/Microsoft.Toolkit.HighPerformance/) |
+
+## API
+
+* [MemoryOwner&lt;T> source code](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/Microsoft.Toolkit.HighPerformance/Buffers)
diff --git a/docs/high-performance/ParallelHelper.md b/docs/high-performance/ParallelHelper.md
@@ -0,0 +1,117 @@
+---
+title: ParallelHelper
+author: Sergio0694
+description: Helpers to work with parallel code in a highly optimized manner
+keywords: windows 10, uwp, windows community toolkit, uwp community toolkit, uwp toolkit, parallel, high performance, net core, net standard
+dev_langs:
+  - csharp
+---
+
+# ParallelHelper
+
+The [ParallelHelper](https://docs.microsoft.com/dotnet/api/microsoft.toolkit.highperformance.helpers.parallelhelper) type contains high performance APIs to work with parallel code. It contains performance-oriented methods that can be used to quickly set up and execute parallel operations over a given data set, iteration range or area.
+
+## How it works
+
+`ParallelHelper` is built around three main concepts:
+
+- It performs automatic batching over the target iteration range. This means that it automatically schedules the right number of working units based on the number of available CPU cores. This is done to reduce the overhead of invoking the parallel callback once for every single parallel iteration.
+- It heavily leverages the way generic types are implemented in C#, and uses `struct` types implementing specific interfaces instead of delegates like `Action<T>`. This is done so that the JIT compiler will be able to "see" each individual callback type being used, which allows it to inline the callback entirely, when possible. This can greatly reduce the overhead of each parallel iteration, especially when using very small callbacks, which would have a trivial cost with respect to the delegate invocation alone. Additionally, using a `struct` type as callback requires developers to manually handle variables that are being captured in the closure, which prevents accidental captures of the `this` pointer from instance methods and other values that could considerably slow down each callback invocation. This is the same approach that is used in other performance-oriented libraries such as [`ImageSharp`](https://github.com/SixLabors/ImageSharp).
+- It exposes 4 types of APIs that represent 4 different types of iterations: 1D and 2D loops, items iteration with side effect and items iteration without side effect. Each type of action has a corresponding `interface` type that needs to be applied to the `struct` callbacks being passed to the `ParallelHelper` APIs: these are `IAction`, `IAction2D`, `IRefAction<T>` and `IInAction<T>`. This helps developers to write code that is clearer regarding its intent, and allows the APIs to perform further optimizations internally.
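+
+To make the second point more concrete, here is a simplified, sequential sketch of the general technique of using a `struct` constrained to an interface instead of a delegate. The `IIndexAction` interface and the `Loops` class are hypothetical names used only for illustration: this is not the package's implementation, and it leaves out batching and parallelism entirely.
+
+```csharp
+using System;
+
+// Hypothetical interface, standing in for the callback interfaces described above
+public interface IIndexAction
+{
+    void Invoke(int i);
+}
+
+public static class Loops
+{
+    // Delegate-based version: every iteration goes through a delegate invocation
+    public static void ForDelegate(int start, int end, Action<int> action)
+    {
+        for (int i = start; i < end; i++) action(i);
+    }
+
+    // Struct-based version: the runtime compiles a specialized copy of this method
+    // for each TAction value type, so Invoke is a direct call that can be inlined
+    public static void ForStruct<TAction>(int start, int end, in TAction action)
+        where TAction : struct, IIndexAction
+    {
+        for (int i = start; i < end; i++) action.Invoke(i);
+    }
+}
+
+// The captured state is declared explicitly as a field
+public readonly struct SquareWriter : IIndexAction
+{
+    private readonly int[] target;
+
+    public SquareWriter(int[] target) => this.target = target;
+
+    public void Invoke(int i) => this.target[i] = i * i;
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        int[] data = new int[5];
+
+        Loops.ForStruct(0, data.Length, new SquareWriter(data));
+
+        Console.WriteLine(string.Join(", ", data)); // 0, 1, 4, 9, 16
+    }
+}
+```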
+
+## Syntax
+
+Let's say we're interested in processing all the items in some `float[]` array, and multiplying each of them by `2`. In this case we don't need to capture any variables: we can just use the `IRefAction<T>` `interface` and `ParallelHelper` will load each item to feed to our callback automatically. All that's needed is to define our callback, which will receive a `ref float` argument and perform the necessary operation:
+
+```csharp
+// Be sure to include this using at the top of the file:
+using Microsoft.Toolkit.HighPerformance.Helpers;
+
+// First declare the struct callback
+public readonly struct ByTwoMultiplier : IRefAction<float>
+{
+    public void Invoke(ref float x) => x *= 2;
+}
+
+// Create an array and run the callback
+float[] array = new float[10000];
+
+ParallelHelper.ForEach<float, ByTwoMultiplier>(array);
+```
+
+With the `ForEach` API, we don't need to specify the iteration ranges: `ParallelHelper` will batch the collection and process each input item automatically. Furthermore, in this specific example we didn't even have to pass our `struct` as an argument: since it didn't contain any fields we needed to initialize, we could just specify its type as a type argument when invoking `ParallelHelper.ForEach`: that API will then create a new instance of that `struct` on its own, and use that to process the various items.
+
+To introduce the concept of closures, suppose we want to multiply the array elements by a value that is specified at runtime. To do so, we need to "capture" that value in our callback `struct` type. We can do that like so:
+
+```csharp
+public readonly struct ItemsMultiplier : IRefAction<float>
+{
+    private readonly float factor;
+
+    public ItemsMultiplier(float factor)
+    {
+        this.factor = factor;
+    }
+
+    public void Invoke(ref float x) => x *= this.factor;
+}
+
+// ...
+
+ParallelHelper.ForEach(array, new ItemsMultiplier(3.14f));
+```
+
+We can see that the `struct` now contains a field that represents the factor we want to use to multiply elements, instead of using a constant. And when invoking `ForEach`, we're explicitly creating an instance of our callback type, with the factor we're interested in. Furthermore, in this case the C# compiler is also able to automatically recognize the type arguments we're using, so we can omit them altogether from the method invocation.
+
+This approach of creating fields for values we need to access from a callback lets us explicitly declare what values we want to capture, which helps make the code more expressive. This is exactly the same thing that the C# compiler does behind the scenes when we declare a lambda function or local function that accesses some local variable.
+
+Here is another example, this time using the `For` API to initialize all the items of an array in parallel. Note how this time we're capturing the target array directly, and we're using the `IAction` `interface` for our callback, which gives our method the current parallel iteration index as argument:
+
+```csharp
+public readonly struct ArrayInitializer : IAction
+{
+    private readonly int[] array;
+
+    public ArrayInitializer(int[] array)
+    {
+        this.array = array;
+    }
+
+    public void Invoke(int i)
+    {
+        this.array[i] = i;
+    }
+}
+
+// ...
+
+ParallelHelper.For(0, array.Length, new ArrayInitializer(array));
+```
+
+**NOTE:** since the callback types are `struct`-s, they're passed _by copy_ to each thread running in parallel, not by reference. This means that value types stored as fields in a callback type will be copied as well. A good practice to remember that detail and avoid errors is to mark the callback `struct` as `readonly`, so that the C# compiler will not let us modify the values of its fields. This only applies to _instance_ fields of a value type: if a callback `struct` has a `static` field of any type, or a reference field, then that value will correctly be shared between parallel threads.
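+
+The copy semantics described in the note can be seen with plain C#, without any parallel code. The `Counter` type below is a hypothetical example: it is deliberately _not_ marked as `readonly`, to show what happens when a copy of a mutable `struct` is modified.
+
+```csharp
+using System;
+
+public struct Counter
+{
+    public int Count;    // Value type field: each copy of the struct gets its own
+    public int[] Shared; // Reference field: every copy points to the same array
+
+    public void Increment()
+    {
+        this.Count++;
+        this.Shared[0]++;
+    }
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        Counter original = new Counter { Shared = new int[1] };
+        Counter copy = original; // Copies the struct, fields included
+
+        copy.Increment();
+
+        Console.WriteLine(original.Count);     // 0
+        Console.WriteLine(copy.Count);         // 1
+        Console.WriteLine(original.Shared[0]); // 1
+    }
+}
+```
+
+Keep in mind that state shared through a reference field is not synchronized in any way: if multiple threads write to the same location, the callback is responsible for doing so safely.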
+
+## Methods
+
+These are the 4 main APIs exposed by `ParallelHelper`, corresponding to the `IAction`, `IAction2D`, `IRefAction<T>` and `IInAction<T>` interfaces. The `ParallelHelper` type also exposes a number of overloads for these methods, that offer a number of ways to specify the iteration range(s), or the type of input callback.
+
+| Method | Return Type | Description |
+| -- | -- | -- |
+| For&lt;TAction>(int, int, in TAction) | void | Executes a specified action in an optimized parallel loop |
+| For2D&lt;TAction>(int, int, int, int, in TAction) | void | Executes a specified action in an optimized parallel loop |
+| ForEach&lt;TItem,TAction>(Memory&lt;TItem>, in TAction) | void | Executes a specified action in an optimized parallel loop over the input data |
+| ForEach&lt;TItem,TAction>(ReadOnlyMemory&lt;TItem>, in TAction) | void | Executes a specified action in an optimized parallel loop over the input data |
+
+## Sample Code
+
+You can find more examples in our [unit tests](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/UnitTests/UnitTests.HighPerformance.Shared/Helpers).
+
+## Requirements
+
+| Device family | Universal, 10.0.16299.0 or higher |
+| --- | --- |
+| Namespace | Microsoft.Toolkit.HighPerformance.Helpers |
+| NuGet package | [Microsoft.Toolkit.HighPerformance](https://www.nuget.org/packages/Microsoft.Toolkit.HighPerformance/) |
+
+## API
+
+* [ParallelHelper source code](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/Microsoft.Toolkit.HighPerformance/Helpers)