Generic Type Check Inlining #7651
As already mentioned by @mikedn, CoreCLR already does this kind of optimization for value types and intentionally doesn't do it for reference types. So, what exactly is your proposal? To stop sharing implementations of generic methods for reference types? The implementation shares them for a reason (to avoid code explosion) and I doubt such a change would happen, especially considering that what you're doing is rare and can be worked around easily (by using value types).

To show that the optimization is done, consider this code:

```csharp
public interface IBool { }
public struct True : IBool { }
public struct False : IBool { }

[MethodImpl(MethodImplOptions.NoInlining)]
static bool IsTrue<T>() where T : struct, IBool => typeof(T) == typeof(True);
```

Its disassembly shows it is optimized the way you want:

```asm
; IsTrue<True>
mov eax,1
ret

; IsTrue<False>
xor eax,eax
ret
```

(This is on .NET Core 1.1; I guess I didn't encounter the bug mentioned by @mikedn for some reason?)

Also, the same code could look somewhat better using the potential future feature Shapes (dotnet/csharplang#164):

```csharp
shape SBool
{
    bool IsTrue { get; }
}

struct True : SBool
{
    bool IsTrue => true;
}

struct False : SBool
{
    bool IsTrue => false;
}

static bool IsTrue<T>() where T : SBool => T.IsTrue;
```
While it's true that in a shared generic method the exact type of shared parameters or locals is not known to the JIT, there are still opportunities to optimize, and there are various ways this can happen.
The challenge in depending on specialization via inlining is that the inliner is trying to balance several concerns -- code size, JIT time, and performance impact -- and so may not be able to tell that a large method with lots of type-specialization opportunities will end up having only a small code size impact and a big performance boost once inlined and optimized. These are more or less the same issues you'd struggle with trying to do this by hand, but the inliner can't be as certain as you can be about where the bets will pay off, so it tends to be conservative. We are working on improving the ability of the inliner to spot such opportunities, but for now, in perf-sensitive cases, you may need to use the aggressive inline attribute. Also, there are still some cases where methods can't be inlined at all (e.g. if the method has EH) or the idiomatic type check optimization doesn't kick in. Please let me know if you run into these.
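For concreteness, here is a minimal sketch of the "aggressive inline attribute" hint being described (it reuses the IBool/True/False names from the example above; the helper names are my own, not from this thread). MethodImplOptions.AggressiveInlining relaxes the inliner's size heuristics so a generic helper is more likely to be inlined into a specialized caller, where the typeof check folds to a constant; it is a hint, not a guarantee.

```csharp
using System.Runtime.CompilerServices;

public interface IBool { }
public struct True : IBool { }
public struct False : IBool { }

static class Specialized
{
    // A hint to the inliner; the JIT may still decline (e.g. methods containing EH).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsTrue<T>() where T : struct, IBool
        => typeof(T) == typeof(True);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte MapChannel<TInvert>(byte value) where TInvert : struct, IBool
    {
        // Once this is inlined into a caller instantiated with True or False,
        // the branch is resolved at JIT time and the dead arm is removed.
        if (IsTrue<TInvert>())
            return (byte)(255 - value);
        return value;
    }
}
```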
@svick The shape idea is very interesting. It seems to go in the direction of using types as objects with properties. For example, this would be very useful for imposing compile-time constraints on object types. When working with 3D scenes, it's very important to have a camera which has a position and a view direction.

```csharp
public class Camera
{
    public Vector<Bool, Bool> position;
    public Vector<True, Bool> lookDirection;
    public Vector<True, True> upDirection;
}

public struct Vector<IsNormalized, IsAxisAligned>
    where IsNormalized : Bool
    where IsAxisAligned : Bool
{
    public float X;
    public float Y;
    public float Z;
}

public static class VectorMethods
{
    public static Vector<True, Bool> normalize(Vector<Bool, Bool> vector)
    {
        var len = length(vector);
        return new Vector<True, Bool> { X = vector.X / len, Y = vector.Y / len, Z = vector.Z / len };
    }

    public static float length(Vector<Bool, Bool> vector)
    {
        return (float)Math.Sqrt(vector.X * vector.X + vector.Y * vector.Y + vector.Z * vector.Z);
    }
}
```

This way, trying to use a vector without first normalizing it will generate a compile-time error, which is better than a runtime error. Of course we could just have 3 structs
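As a usage sketch of the idea (assuming Bool and True are marker types as in the snippet above, and reusing its Camera and VectorMethods names), assigning a raw vector where a normalized one is expected simply doesn't compile:

```csharp
var camera = new Camera();
var v = new Vector<Bool, Bool> { X = 1, Y = 2, Z = 3 };

// camera.lookDirection = v;   // compile-time error: Vector<Bool, Bool> is not Vector<True, Bool>

camera.lookDirection = VectorMethods.normalize(v);   // OK: normalize returns Vector<True, Bool>
```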
@AndyAyersMS what about inlining the delegate that gets passed in? For example, if we wrote the code above as:

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

public class Effect
{
    private static void Apply(Bitmap bmp, Func<Color, Color> pixelMapper)
    {
        // read bitmap data
        if (bmp.PixelFormat != PixelFormat.Format32bppArgb)
            throw new InvalidOperationException($"Unsupported pixel format: {bmp.PixelFormat}");
        int w = bmp.Width, h = bmp.Height;
        var data = bmp.LockBits(new Rectangle(0, 0, w, h), ImageLockMode.ReadWrite, bmp.PixelFormat);
        var s = data.Stride;
        unsafe
        {
            var ptr = (byte*)data.Scan0;
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    // read ARGB (not quite optimized, but that's not the point)
                    int offset = y * s + x * 4;   // 4 bytes per pixel
                    byte a = ptr[offset + 0];
                    byte r = ptr[offset + 1];
                    byte g = ptr[offset + 2];
                    byte b = ptr[offset + 3];
                    // apply effects per pixel
                    var mappedColor = pixelMapper(Color.FromArgb(a, r, g, b));
                    // write ARGB
                    ptr[offset + 0] = mappedColor.A;
                    ptr[offset + 1] = mappedColor.R;
                    ptr[offset + 2] = mappedColor.G;
                    ptr[offset + 3] = mappedColor.B;
                }
            }
        }
        bmp.UnlockBits(data);
    }
}
```

Will it inline the passed method? An image-editing program might have dozens of such methods, and it would save a lot of code to be able to reuse this loop and only pass in the pixel-mapping code. Duplicating the generated code is fine, since the alternative would be to copy the code by hand anyway. However, since this is performance-critical code, it's extremely important to have some kind of guarantee that the code is inlined in all cases, or it will run very slowly at unpredictable times. AggressiveInlining might help, but it sounds like it's not a guarantee, if it even works here at all.
I don't think the JIT does anywhere near that level of dataflow analysis (to inline a delegate) and I'd be shocked if it ever did, due to the throughput/perf impact. You could fall back to dynamic code generation to generate specialized code if you really needed to.
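A minimal sketch of that fallback using expression trees (the MakeChannelMapper name and per-channel shape are my own, not from this thread): the decision to invert is made once, while building the expression, so the compiled delegate contains no per-pixel branch. The delegate call itself still isn't inlined into your loop, but the specialized body is jitted as ordinary code.

```csharp
using System;
using System.Linq.Expressions;

static class MapperFactory
{
    // Builds a per-channel mapper with the 'invert' choice baked in at generation time.
    public static Func<byte, byte> MakeChannelMapper(bool invert)
    {
        var c = Expression.Parameter(typeof(byte), "c");
        Expression body = c;
        if (invert)
        {
            // (byte)(255 - c)
            body = Expression.Convert(
                Expression.Subtract(Expression.Constant(255), Expression.Convert(c, typeof(int))),
                typeof(byte));
        }
        return Expression.Lambda<Func<byte, byte>>(body, c).Compile();
    }
}

// Usage: var map = MapperFactory.MakeChannelMapper(invert: true); byte v = map(10);
```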
Right now we can't do much with either
Often, speed-focused developers are forced to choose between proper code separation and code performance. This could be solved by a ForceInlining option on an Action or Func; I wonder how hard it would be to implement.
At the "call sites" for
@manixrock the struct trick will work here as well. You define a struct that implements an interface, and you call the generic method with the struct that implements the behavior you want.
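A sketch of that struct trick applied to the Effect example above (IPixelMapper, the mapper structs, and SpecializedEffect are illustrative names, not from this thread): because TMapper is a value type, the JIT generates a separate body per mapper and can inline the Map call, so there is no per-pixel delegate dispatch.

```csharp
using System.Drawing;

public interface IPixelMapper
{
    Color Map(Color color);
}

public struct GrayscaleMapper : IPixelMapper
{
    public Color Map(Color c)
    {
        int gray = (c.R + c.G + c.B) / 3;
        return Color.FromArgb(c.A, gray, gray, gray);
    }
}

public struct InvertMapper : IPixelMapper
{
    public Color Map(Color c) => Color.FromArgb(c.A, 255 - c.R, 255 - c.G, 255 - c.B);
}

public static class SpecializedEffect
{
    // Specialized per TMapper because TMapper is a struct; mapper.Map(...) is a
    // direct (and typically inlinable) call rather than a delegate invocation.
    public static void Apply<TMapper>(Bitmap bmp, TMapper mapper)
        where TMapper : struct, IPixelMapper
    {
        // ... same LockBits loop as in Effect.Apply above, but with
        // var mappedColor = mapper.Map(Color.FromArgb(a, r, g, b));
    }
}

// Usage: SpecializedEffect.Apply(bmp, new InvertMapper());
```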
@manixrock commented on Sat Mar 18 2017
Writing performance-critical code often leads to code duplication.
Let's say we wanted to make a method that applies an effect to an image; in our case we want to apply a grayscale and an optional invert. The code could look like this:
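(The snippet referenced here isn't present above; what follows is a simplified sketch of the shape being described, operating on a plain byte buffer so it stays self-contained. ApplyEffect and the grayscale details are my own names/choices.)

```csharp
// pixels: one byte per pixel (already grayscale), modified in place
public static void ApplyEffect(byte[] pixels, bool invert)
{
    for (int i = 0; i < pixels.Length; i++)
    {
        byte g = pixels[i];
        // ... grayscale / other effect work per pixel ...
        if (invert)               // this branch is evaluated for every pixel
            g = (byte)(255 - g);
        pixels[i] = g;
    }
}
```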
However, if we expect the invert to be only rarely used, that code is slower than it could be because of the constant `if (invert)` check inside the performance-critical inner loop. We could of course create another method that gets called when `invert` is false, but that leads to code duplication, is harder to maintain, etc.

What we would need, to have both optimal performance and code reuse, is a way to get the compiler to generate 2 methods at compile time depending on the value of `invert`. Without any new syntax the code might look like this:
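(Again, the snippet isn't present above; here is a plausible sketch of the "no new syntax" version, using empty classes as type-level booleans -- which, being reference types, is exactly the case discussed below -- with names of my own choosing.)

```csharp
public class Bool { }
public sealed class True : Bool { }
public sealed class False : Bool { }

// The caller picks the variant at compile time:
//   ApplyEffect<True>(pixels)  or  ApplyEffect<False>(pixels)
public static void ApplyEffect<TInvert>(byte[] pixels) where TInvert : Bool
{
    for (int i = 0; i < pixels.Length; i++)
    {
        byte g = pixels[i];
        if (typeof(TInvert) == typeof(True))   // constant once TInvert is known
            g = (byte)(255 - g);
        pixels[i] = g;
    }
}
```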
Now that check is a compile-time constant, so the compiler could remove the type condition and its block when `invert` is `False`, and remove the type condition but keep its block when it is `True`, leading to optimal performance in both cases without code duplication.

However, does the compiler (or even the JIT) do that? According to this Stack Overflow answer, it currently does not.
This is a proposal to improve the compiler (or JIT) to do that sort of code inlining (through method duplication) for compile-time constant checks.
If this were implemented, we could optimize the code even further by doing the same with the `grayscaleMethod` parameter. Doing the same optimization through manual code duplication would require 6 methods, and the number would increase exponentially with the number of parameters; the compiler, however, would know to generate only the methods which are actually used in the code.
@mikedn commented on Sat Mar 18 2017
This is a runtime/compiler/language issue (coreclr/roslyn repositories); the framework (corefx) doesn't have much to do with this.
The .NET JIT usually does that if you use value types. As in:
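(The snippet isn't present above; here is a short sketch of the value-type variant being described, mirroring the class-based sketch earlier but with marker structs, so each instantiation gets its own specialized code.)

```csharp
public interface IBool { }
public struct True : IBool { }
public struct False : IBool { }

// Same loop as the class-based ApplyEffect<TInvert> sketch above, but the
// struct constraint forces one specialized body per instantiation.
public static void ApplyEffect<TInvert>(byte[] pixels) where TInvert : struct, IBool
{
    for (int i = 0; i < pixels.Length; i++)
    {
        byte g = pixels[i];
        if (typeof(TInvert) == typeof(True))   // folds to a constant per instantiation
            g = (byte)(255 - g);
        pixels[i] = g;
    }
}
```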
When generic arguments are value types the JIT has to generate specialized code for each value type. That enables some optimizations, including recognizing that `typeof(invert) == typeof(True)` is always true when `invert` = `True`.

Though a bug was recently introduced that prevented this optimization from working. It's fixed now in the latest CoreCLR builds, but it's still present in some .NET Framework builds (e.g. the one that comes with the current Win 10 Preview).
That doesn't happen when the code is shared between instantiations, as is the case when reference types are used.
Well, if you call all variants it will still have to generate code for all of them. It is what it is, a trade off between code size and performance.
category:cq
theme:generics
skill-level:expert
cost:large