Merged
Changes from 22 commits
32 commits
e4446f4
Added ParallelHelper docs page
Sergio0694 Mar 9, 2020
5d7538a
Fixed a typo
Sergio0694 Mar 9, 2020
4e759c1
Minor tweaks
Sergio0694 Mar 9, 2020
3e7e5ed
Fixed displaying of '<' character
Sergio0694 Mar 12, 2020
9ec4947
Created high-performance section, moved ParallelHelper docs
Sergio0694 Mar 13, 2020
d7ffa6f
Added introduction for the package
Sergio0694 Mar 13, 2020
74ccfba
Improved ParallelHelper docs
Sergio0694 Mar 13, 2020
b4bae08
Added MemoryOwner<T> docs page
Sergio0694 Mar 13, 2020
ec15797
Removed SpanOwner<T> mention from MemoryOwner<T> page
Sergio0694 Mar 13, 2020
96d2342
Added SpanOwner<T> docs page
Sergio0694 Mar 13, 2020
c1df57c
Adjusted docs structure
Sergio0694 Mar 13, 2020
cbb6954
Added ByReference<T> docs page
Sergio0694 Mar 13, 2020
c8e6e87
Minor tweaks
Sergio0694 Mar 14, 2020
0b822c7
Fixed a typo
Sergio0694 Mar 18, 2020
422311d
Minor fixes
Sergio0694 Mar 18, 2020
4662ad1
Added equivalent C# sample for SpanOwner<T>
Sergio0694 Mar 18, 2020
989a52b
Fixed typo in ParallelHelper.md
Sergio0694 May 5, 2020
f335da9
Renamed ByReference<T> APIs to Ref<T>
Sergio0694 May 5, 2020
4d7114f
Added !NOTE and !WARNING blocks
Sergio0694 May 5, 2020
ea6727b
Fixed a typo
Sergio0694 May 5, 2020
1bb7d26
Added bullet list
Sergio0694 May 5, 2020
52a4752
Added link to ImageSharp
Sergio0694 May 5, 2020
df056cd
Updated introduction with multi-targeting info
Sergio0694 May 13, 2020
eb8e2e1
Minor tweaks
Sergio0694 May 13, 2020
2b2e8d1
Improved Ref<T> docs
Sergio0694 May 13, 2020
1a57734
Added MemoryOwner<T> sample and more info
Sergio0694 May 13, 2020
e0b7c06
SpanOwner<T> docs improved
Sergio0694 May 13, 2020
8a0d6b8
Improved code sample for MemoryOwner<T>
Sergio0694 May 14, 2020
4d21229
Fixed <T> display in toc
Sergio0694 May 21, 2020
aafd2ed
Added some suggested APIs to the introduction
Sergio0694 May 21, 2020
7418fd5
Fixed file paths
Sergio0694 May 21, 2020
34abbeb
Update docs/high-performance/ParallelHelper.md
michael-hawker Jun 5, 2020
43 changes: 43 additions & 0 deletions docs/high-performance/Introduction.md
@@ -0,0 +1,43 @@
---
title: Introduction to the High Performance package
author: Sergio0694
description: An overview of how to get started with the High Performance package and of the APIs it contains
keywords: windows 10, uwp, windows community toolkit, uwp community toolkit, uwp toolkit, get started, visual studio, high performance, net core, net standard
---

# Introduction to the High Performance package

This package can be installed through NuGet, and it multi-targets .NET Standard 2.0 and .NET Standard 2.1. This means that you can use it from both UWP apps and modern .NET Core 3.0 applications. The API surface is almost identical in both cases, and a lot of work has gone into backporting as many features as possible to .NET Standard 2.0. Except for some minor differences, you can expect the same APIs to be available on both target frameworks.

Follow these steps to install the High Performance package:

1. Open an existing project in Visual Studio, targeting any of the following:
    - UWP (SDK >= 16299)
    - .NET Standard (>= 2.0)
    - .NET Core (>= 2.1)
    - Any other framework supporting .NET Standard 2.0 and up

2. In the Solution Explorer panel, right-click on your project name and select **Manage NuGet Packages**. Search for **Microsoft.Toolkit.HighPerformance** and install it.

![NuGet Packages](../resources/images/ManageNugetPackages.png "Manage NuGet Packages Image")

3. Add a using directive in your C# files to use the new APIs:

```c#
using Microsoft.Toolkit.HighPerformance;
```

4. If you want to see some code samples, you can either read through the other docs pages for the High Performance package, or have a look at the various [unit tests](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/UnitTests/UnitTests.HighPerformance.Shared/Helpers) for the project.

## When should I use this package?

As the name suggests, the High Performance package contains a set of APIs that are heavily focused on optimization. All the new APIs have been carefully crafted to achieve the best possible performance when using them, either through reduced memory allocation, micro-optimizations at the assembly level, or by structuring the APIs in a way that facilitates writing performance oriented code in general.

This package makes heavy use of APIs such as:
- [`System.Buffers.ArrayPool<T>`](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.arraypool-1)
- [`System.Runtime.CompilerServices.Unsafe`](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.unsafe)
- [`System.Runtime.InteropServices.MemoryMarshal`](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.memorymarshal)
- [`System.Threading.Tasks.Parallel`](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel)

Whether you are already familiar with these APIs or are just getting started with writing high performance code in C#, if you want a set of well tested helpers, have a look at what's included in this package to see how you can use it in your own projects!

72 changes: 72 additions & 0 deletions docs/high-performance/MemoryOwner.md
@@ -0,0 +1,72 @@
---
title: MemoryOwner&lt;T>
author: Sergio0694
description: A buffer type implementing `IMemoryOwner<T>` that rents memory from a shared pool
keywords: windows 10, uwp, windows community toolkit, uwp community toolkit, uwp toolkit, parallel, high performance, net core, net standard
dev_langs:
- csharp
---

# MemoryOwner&lt;T>

[MemoryOwner&lt;T>](https://docs.microsoft.com/dotnet/api/microsoft.toolkit.highperformance.buffers.memoryowner-1) is a buffer type that implements `IMemoryOwner<T>` and adds an embedded length property and a series of performance-oriented APIs. It is essentially a lightweight wrapper around the `ArrayPool<T>` type, with some additional helper utilities.

## How it works

`MemoryOwner<T>` has the following main features:

- One of the main issues with arrays returned by the `ArrayPool<T>` APIs and with the `IMemoryOwner<T>` instances returned by the `MemoryPool<T>` APIs is that the size specified by the user is only used as a _minimum_ size: the actual size of the returned buffers might be greater. `MemoryOwner<T>` solves this by also storing the original requested size, so that `Memory<T>` and `Span<T>` instances retrieved from it never need to be manually sliced.
Collaborator

Could be handy for each of these call outs to the BCL to have examples showing 'if you did this before, do this now...' and get XYZ benefit...

Contributor Author

Improved the sample for MemoryOwner<T> in 8a0d6b8, let me know what you think! 😊

- When using `IMemoryOwner<T>`, getting a `Span<T>` for the underlying buffer requires first getting a `Memory<T>` instance, and then a `Span<T>` from it. This is fairly expensive, and often unnecessary, as the intermediate `Memory<T>` might not be needed at all. `MemoryOwner<T>` instead has an additional `Span` property which is extremely lightweight, as it directly wraps the internal `T[]` array being rented from the pool.
- Buffers rented from the pool are not cleared by default, which means that if they were not cleared when previously returned to the pool, they might contain garbage data. Normally, users are required to clear these rented buffers manually, which can be verbose, especially when done frequently. `MemoryOwner<T>` has a more flexible approach to this, through the `Allocate(int, AllocationMode)` API. This method not only allocates a new instance of exactly the requested size, but can also be used to specify which allocation mode to use: either the same one as `ArrayPool<T>`, or one that automatically clears the rented buffer.
- There are cases where a buffer might be rented with a greater size than what is actually needed, and then resized afterwards. This would normally require users to rent a new buffer and copy the region of interest from the old buffer. Instead, `MemoryOwner<T>` exposes a `Slice(int, int)` API that simply returns a new instance wrapping the specified area of interest. This makes it possible to skip renting a new buffer and copying the items entirely.
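
The points above can be illustrated with a rough before/after sketch. It is not taken from the package sources: the `BufferSamples` class and its method names are made up for illustration, and it assumes that the clearing allocation mode is exposed as `AllocationMode.Clear` and that the instance returned by `Slice` takes over the rented array.

```csharp
using System;
using System.Buffers;
using Microsoft.Toolkit.HighPerformance.Buffers;

public static class BufferSamples
{
    // Before: renting directly from ArrayPool<T>
    public static int WithArrayPool(int size)
    {
        int[] array = ArrayPool<int>.Shared.Rent(size);

        try
        {
            // array.Length may be greater than size, so we slice manually
            Span<int> span = array.AsSpan(0, size);

            // Rented arrays may contain stale data, so we clear manually
            span.Clear();

            return span.Length;
        }
        finally
        {
            ArrayPool<int>.Shared.Return(array);
        }
    }

    // After: MemoryOwner<T> remembers the requested size and can clear on allocation
    public static int WithMemoryOwner(int size)
    {
        // Assumption: AllocationMode.Clear is the mode that clears the rented buffer
        using (MemoryOwner<int> buffer = MemoryOwner<int>.Allocate(size, AllocationMode.Clear))
        {
            Span<int> span = buffer.Span; // no intermediate Memory<T>, no slicing

            return span.Length;
        }
    }

    // Shrinking a buffer without renting a second one
    public static int SliceInPlace()
    {
        MemoryOwner<int> buffer = MemoryOwner<int>.Allocate(128);

        // Assumption: the sliced instance now owns the rented array,
        // so it is the one we dispose
        using (MemoryOwner<int> slice = buffer.Slice(0, 16))
        {
            return slice.Length;
        }
    }
}
```

In the `ArrayPool<T>` version, the slicing, clearing and returning are all the caller's responsibility; in the `MemoryOwner<T>` version they are handled by `Allocate` and `Dispose`.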

## Syntax

Here is an example of how to rent a buffer and retrieve a `Memory<T>` instance:

```csharp
// Be sure to include this using at the top of the file:
using Microsoft.Toolkit.HighPerformance.Buffers;

using (MemoryOwner<int> buffer = MemoryOwner<int>.Allocate(42))
{
    // Buffer has exactly 42 items
    Memory<int> memory = buffer.Memory;
    Span<int> span = buffer.Span;
}
```

In this example, we used a `using` block to declare the `MemoryOwner<T>` buffer: this is particularly useful as the underlying array will automatically be returned to the pool at the end of the block. If instead we don't have direct control over the lifetime of a `MemoryOwner<T>` instance, the buffer will simply be returned to the pool when the object is finalized by the garbage collector. In both cases, rented buffers will always be correctly returned to the shared pool.
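
When the buffer has to outlive the method that creates it, one possible pattern is to return the `MemoryOwner<T>` instance and let the caller dispose it. The following is only a sketch of that idea: `OwnershipSample`, `CreateSequence` and `SumSequence` are hypothetical names, not part of the package.

```csharp
using System;
using Microsoft.Toolkit.HighPerformance.Buffers;

public static class OwnershipSample
{
    // Hypothetical helper: fills a buffer and hands ownership to the caller
    public static MemoryOwner<int> CreateSequence(int count)
    {
        MemoryOwner<int> buffer = MemoryOwner<int>.Allocate(count);
        Span<int> span = buffer.Span;

        for (int i = 0; i < span.Length; i++)
        {
            span[i] = i;
        }

        // Not disposed here: the caller is now responsible for the buffer
        return buffer;
    }

    public static int SumSequence(int count)
    {
        // The using block returns the array to the pool once we are done
        using (MemoryOwner<int> buffer = CreateSequence(count))
        {
            int sum = 0;

            foreach (int value in buffer.Span)
            {
                sum += value;
            }

            return sum;
        }
    }
}
```

Disposing explicitly returns the array to the pool as soon as the work is done, rather than waiting for the garbage collector to finalize the object.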

## Properties

| Property | Return Type | Description |
| -- | -- | -- |
| Length | int | Gets the number of items in the current instance |
| Memory | System.Memory&lt;T> | Gets the memory belonging to this owner |
| Span | System.Span&lt;T> | Gets a span wrapping the memory belonging to the current instance |
| Empty | MemoryOwner&lt;T> | Gets an empty `MemoryOwner<T>` instance |

## Methods

| Method | Return Type | Description |
| -- | -- | -- |
| Allocate(int) | MemoryOwner&lt;T> | Creates a new `MemoryOwner<T>` instance with the specified parameters |
| Allocate(int, AllocationMode) | MemoryOwner&lt;T> | Creates a new `MemoryOwner<T>` instance with the specified parameters |
| DangerousGetReference() | ref T | Returns a reference to the first element within the current instance, with no bounds check |
| Slice(int, int) | MemoryOwner&lt;T> | Slices the buffer currently in use and returns a new `MemoryOwner<T>` instance |

## Sample Code

You can find more examples in our [unit tests](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/UnitTests/UnitTests.HighPerformance.Shared/Buffers).

## Requirements

| Device family | Universal, 10.0.16299.0 or higher |
| --- | --- |
| Namespace | Microsoft.Toolkit.HighPerformance.Buffers |
| NuGet package | [Microsoft.Toolkit.HighPerformance](https://www.nuget.org/packages/Microsoft.Toolkit.HighPerformance/) |

## API

* [MemoryOwner&lt;T> source code](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/Microsoft.Toolkit.HighPerformance/Buffers)
117 changes: 117 additions & 0 deletions docs/high-performance/ParallelHelper.md
@@ -0,0 +1,117 @@
---
title: ParallelHelper
author: Sergio0694
description: Helpers to work with parallel code in a highly optimized manner
keywords: windows 10, uwp, windows community toolkit, uwp community toolkit, uwp toolkit, parallel, high performance, net core, net standard
dev_langs:
- csharp
---

# ParallelHelper

[ParallelHelper](https://docs.microsoft.com/dotnet/api/microsoft.toolkit.highperformance.helpers.parallelhelper) contains high performance APIs to work with parallel code. It contains performance-oriented methods that can be used to quickly set up and execute parallel operations over a given data set, iteration range or area.

## How it works

`ParallelHelper` is built around three main concepts:

- It performs automatic batching over the target iteration range. This means that it automatically schedules the right number of working units based on the number of available CPU cores. This is done to reduce the overhead of invoking the parallel callback once for every single parallel iteration.
- It heavily leverages the way generic types are implemented in C#, and uses `struct` types implementing specific interfaces instead of delegates like `Action<T>`. This is done so that the JIT compiler will be able to "see" each individual callback type being used, which allows it to inline the callback entirely, when possible. This can greatly reduce the overhead of each parallel iteration, especially when using very small callbacks, whose own cost would be trivial compared to that of the delegate invocation alone. Additionally, using a `struct` type as callback requires developers to manually handle variables that are being captured in the closure, which prevents accidental captures of the `this` pointer from instance methods and other values that could considerably slow down each callback invocation. This is the same approach that is used in other performance-oriented libraries such as [`ImageSharp`](https://github.com/SixLabors/ImageSharp).
- It exposes 4 types of APIs that represent 4 different types of iterations: 1D and 2D loops, items iteration with side effect and items iteration without side effect. Each type of action has a corresponding `interface` type that needs to be applied to the `struct` callbacks being passed to the `ParallelHelper` APIs: these are `IAction`, `IAction2D`, `IRefAction<T>` and `IInAction<T>`. This helps developers to write code that is clearer regarding its intent, and allows the APIs to perform further optimizations internally.
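
To give an idea of how batching and `struct` callbacks fit together, here is a heavily simplified sketch. It is not the actual `ParallelHelper` implementation: `IIndexAction`, `BatchedParallel` and `Squarer` are hypothetical names, and the batching policy (one batch per CPU core at most) is an assumption made for illustration only.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the package's IAction interface
public interface IIndexAction
{
    void Invoke(int i);
}

public static class BatchedParallel
{
    public static void For<TAction>(int start, int end, in TAction action)
        where TAction : struct, IIndexAction
    {
        int count = end - start;

        if (count <= 0)
        {
            return;
        }

        // One batch per core at most, instead of one callback per iteration
        int batches = Math.Min(Environment.ProcessorCount, count);
        int batchSize = (count + batches - 1) / batches; // rounded up

        // 'in' parameters can't be captured by a lambda, so copy the struct first
        TAction local = action;

        Parallel.For(0, batches, b =>
        {
            int low = start + (b * batchSize);
            int high = Math.Min(low + batchSize, end);

            // TAction is a concrete struct type here, so the JIT can
            // specialize this loop and potentially inline Invoke
            for (int i = low; i < high; i++)
            {
                local.Invoke(i);
            }
        });
    }
}

// Example callback: stores the square of each index
public readonly struct Squarer : IIndexAction
{
    private readonly int[] target;

    public Squarer(int[] target)
    {
        this.target = target;
    }

    public void Invoke(int i) => this.target[i] = i * i;
}
```

With this sketch, `BatchedParallel.For(0, data.Length, new Squarer(data))` only pays the delegate invocation cost once per batch, while the inner loop calls the `struct` callback directly.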

## Syntax

Let's say we're interested in processing all the items in some `float[]` array, multiplying each of them by `2`. In this case we don't need to capture any variables: we can just use the `IRefAction<T>` `interface` and `ParallelHelper` will load each item to feed to our callback automatically. All that's needed is to define our callback, which will receive a `ref float` argument and perform the necessary operation:

```csharp
// Be sure to include this using at the top of the file:
using Microsoft.Toolkit.HighPerformance.Helpers;

// First declare the struct callback
public readonly struct ByTwoMultiplier : IRefAction<float>
{
    public void Invoke(ref float x) => x *= 2;
}

// Create an array and run the callback
float[] array = new float[10000];

ParallelHelper.ForEach<float, ByTwoMultiplier>(array);
```

With the `ForEach` API, we don't need to specify the iteration ranges: `ParallelHelper` will batch the collection and process each input item automatically. Furthermore, in this specific example we didn't even have to pass our `struct` as an argument: since it didn't contain any fields we needed to initialize, we could just specify its type as a type argument when invoking `ParallelHelper.ForEach`: that API will then create a new instance of that `struct` on its own, and use that to process the various items.

To introduce the concept of closures, suppose we want to multiply the array elements by a value that is specified at runtime. To do so, we need to "capture" that value in our callback `struct` type. We can do that like so:

```csharp
public readonly struct ItemsMultiplier : IRefAction<float>
{
    private readonly float factor;

    public ItemsMultiplier(float factor)
    {
        this.factor = factor;
    }

    public void Invoke(ref float x) => x *= this.factor;
}

// ...

ParallelHelper.ForEach(array, new ItemsMultiplier(3.14f));
```

We can see that the `struct` now contains a field that represents the factor we want to use to multiply elements, instead of using a constant. And when invoking `ForEach`, we're explicitly creating an instance of our callback type, with the factor we're interested in. Furthermore, in this case the C# compiler is also able to automatically recognize the type arguments we're using, so we can omit them altogether from the method invocation.

This approach of creating fields for values we need to access from a callback lets us explicitly declare what values we want to capture, which helps make the code more expressive. This is exactly the same thing that the C# compiler does behind the scenes when we declare a lambda function or local function that accesses some local variable as well.
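
As a rough illustration of that equivalence, the sketch below puts a lambda next to a hand-written version of the closure the compiler would produce for it. The real generated type has a compiler-chosen name and shape, so the `Closure` class here is only an approximation, and `ClosureSample` is a hypothetical name.

```csharp
using System;

public static class ClosureSample
{
    // What we write: a lambda capturing the local 'factor'
    public static Func<float, float> WithLambda(float factor)
    {
        return x => x * factor;
    }

    // Roughly what the compiler generates: a hidden class
    // with one field per captured variable
    private sealed class Closure
    {
        public float factor;

        public float Invoke(float x) => x * this.factor;
    }

    // The same behavior, with the capture written out by hand
    public static Func<float, float> ByHand(float factor)
    {
        Closure closure = new Closure { factor = factor };

        return closure.Invoke;
    }
}
```

The callback `struct` types used with `ParallelHelper` follow the same idea, except that the captured values are declared explicitly and the callback is invoked without going through a delegate.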

Here is another example, this time using the `For` API to initialize all the items of an array in parallel. Note how this time we're capturing the target array directly, and we're using the `IAction` `interface` for our callback, which gives our method the current parallel iteration index as argument:

```csharp
public readonly struct ArrayInitializer : IAction
{
    // Fields of a readonly struct must be readonly as well
    private readonly int[] array;

    public ArrayInitializer(int[] array)
    {
        this.array = array;
    }

    public void Invoke(int i)
    {
        this.array[i] = i;
    }
}

// ...

ParallelHelper.For(0, array.Length, new ArrayInitializer(array));
```

**NOTE:** since the callback types are `struct`-s, they're passed _by copy_ to each thread running in parallel, not by reference. This means that value types being stored as fields in a callback type will be copied as well. A good practice to remember that detail and avoid errors is to mark the callback `struct` as `readonly`, so that the C# compiler will not let us modify the values of its fields. This only applies to _instance_ fields of a value type: if a callback `struct` has a `static` field of any type, or a reference field, then that value will correctly be shared between parallel threads.
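
To make the difference concrete, here is a small sketch of a callback that shares state through a reference field. `EvenCounter` and `SharedStateSample` are hypothetical names; the single-element array is just one simple way to get a shared, mutable slot, and `Interlocked` is used because several threads update it concurrently.

```csharp
using System.Threading;
using Microsoft.Toolkit.HighPerformance.Helpers;

public readonly struct EvenCounter : IAction
{
    // Reference field: every copy of the struct points to the same array
    private readonly int[] total;

    public EvenCounter(int[] total)
    {
        this.total = total;
    }

    public void Invoke(int i)
    {
        if (i % 2 == 0)
        {
            // The slot is shared across threads, so update it atomically
            Interlocked.Increment(ref this.total[0]);
        }
    }
}

public static class SharedStateSample
{
    public static int CountEvens(int count)
    {
        int[] total = new int[1];

        ParallelHelper.For(0, count, new EvenCounter(total));

        return total[0];
    }
}
```

Had the counter been an `int` field stored directly in the `struct`, each thread would have worked on its own copy and the caller would never have seen the updates.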

## Methods

These are the 4 main APIs exposed by `ParallelHelper`, corresponding to the `IAction`, `IAction2D`, `IRefAction<T>` and `IInAction<T>` interfaces. The `ParallelHelper` type also exposes a number of overloads for these methods, that offer a number of ways to specify the iteration range(s), or the type of input callback.

| Method | Return Type | Description |
| -- | -- | -- |
| For&lt;TAction>(int, int, in TAction) | void | Executes a specified action in an optimized parallel loop |
| For2D&lt;TAction>(int, int, int, int, in TAction) | void | Executes a specified action in an optimized parallel loop |
| ForEach&lt;TItem,TAction>(Memory&lt;TItem>, in TAction) | void | Executes a specified action in an optimized parallel loop over the input data |
| ForEach&lt;TItem,TAction>(ReadOnlyMemory&lt;TItem>, in TAction) | void | Executes a specified action in an optimized parallel loop over the input data |
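
The `For2D` API is not covered by the samples above, so here is a minimal sketch of how it might be used. It assumes that the four `int` arguments are, in order, the top, bottom, left and right bounds of the area, and that `IAction2D` exposes an `Invoke(int i, int j)` method receiving the row and column indices. `MultiplicationTable` and `For2DSample` are hypothetical names.

```csharp
using Microsoft.Toolkit.HighPerformance.Helpers;

public readonly struct MultiplicationTable : IAction2D
{
    private readonly int[] data;
    private readonly int width;

    public MultiplicationTable(int[] data, int width)
    {
        this.data = data;
        this.width = width;
    }

    // Assumption: i is the row index and j is the column index
    public void Invoke(int i, int j)
    {
        this.data[(i * this.width) + j] = i * j;
    }
}

public static class For2DSample
{
    public static int[] Build(int height, int width)
    {
        // Row-major storage for a height x width table
        int[] data = new int[height * width];

        // Assumed argument order: top, bottom, left, right
        ParallelHelper.For2D(0, height, 0, width, new MultiplicationTable(data, width));

        return data;
    }
}
```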

## Sample Code

You can find more examples in our [unit tests](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/UnitTests/UnitTests.HighPerformance.Shared/Helpers).

## Requirements

| Device family | Universal, 10.0.16299.0 or higher |
| --- | --- |
| Namespace | Microsoft.Toolkit.HighPerformance.Helpers |
| NuGet package | [Microsoft.Toolkit.HighPerformance](https://www.nuget.org/packages/Microsoft.Toolkit.HighPerformance/) |

## API

* [ParallelHelper source code](https://github.com/Microsoft/WindowsCommunityToolkit//blob/master/Microsoft.Toolkit.HighPerformance/Helpers)