Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose general purpose Crc32 APIs #2036

Closed
tannergooding opened this issue Jan 22, 2020 · 49 comments · Fixed by #61558
Closed

Expose general purpose Crc32 APIs #2036

tannergooding opened this issue Jan 22, 2020 · 49 comments · Fixed by #61558
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics Cost:S Work that requires one engineer up to 1 week help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@tannergooding
Copy link
Member

Rationale

Computing a Cyclic Redundancy Check (CRC) is a common algorithm used for error detection in things like networks or storage. Additionally, modern hardware provides instruction level support for computing these values. It would be beneficial if we exposed a set of general-purpose methods which allow iterative CRC32 computation.

Proposed API

namespace System.Numerics
{
    public static class BitOperations
    {
        public static uint Crc32(uint crc, byte data);
        public static uint Crc32(uint crc, ushort data);
        public static uint Crc32(uint crc, uint data);
        public static uint Crc32(uint crc, ulong data);
    }
}

Open Questions

The output is generally treated as unsigned, however .NET considers unsigned integers (other than byte) as non-CLS compliant. It would be possible to expose both versions (those that take/return int and those that take/return uint). It would likewise be good on the input data to determine if they should be signed, unsigned, or both.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jan 22, 2020
@tannergooding tannergooding added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Numerics labels Jan 22, 2020
@Wraith2
Copy link
Contributor

Wraith2 commented Jan 22, 2020

Is this using a specific polynomial? and if so which one? and can there be a way for the user to choose?

@tannergooding
Copy link
Member Author

can there be a way for the user to choose?

Not if we want these accelerated by the underlying hardware.

  • X86 uses: 0x1EDC6F41
  • ARM uses: 0x04C11DB7 or 0x1EDC6F41 (depending on the instruction)

@Wraith2
Copy link
Contributor

Wraith2 commented Jan 23, 2020

So if someone is dealing with zip or something using the old polynomial they'll still need to make their own. That's a bit of a shame.

Given that at least half of the time you want to calculate a crc you will be validating incoming data it might make sense to have a ReadOnlySpan<byte> overload (that works on a fixed length, leave looping to users).

@GrabYourPitchforks
Copy link
Member

If we were to expose a CRC32 API over a buffer rather than an integral type I'd probably push for it not to be on the BitOperations class.

@Wraith2
Copy link
Contributor

Wraith2 commented Jan 24, 2020

If BitOperations and these methods only exist to provide access to the hardware intrinsics I agree.
It would be useful to have a CRC32HashAlgorithm available which uses these operations if they are available.

@tannergooding
Copy link
Member Author

If BitOperations and these methods only exist to provide access to the hardware intrinsics I agree

Right, this is namely meant to be a "cross-platform" version of the hardware intrinsics (much like the other functions on System.Numerics.BitOperations) and should allow users to easily get:

if (X86.Sse42.IsSupported) { }
else if (Arm.Crc32.IsSupported) { }
else { /* software fallback */ }

I agree that having a more general purpose class that allows polynomial customization and trivial operation over spans of data would be useful, but that isn't the immediate goal of this API.

@tannergooding tannergooding added api-ready-for-review and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Feb 11, 2020
@tannergooding tannergooding added this to the Future milestone Jun 23, 2020
@carlossanlop carlossanlop added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review labels Jul 6, 2020
@danmoseley
Copy link
Member

If I understand right, given the caller can't choose the polynomial, and we will use the fixed CPU specific algorithm if available, this API would only be useful for callers that are (a) not interoperating with some other format that has a chosen polynomial (such as zlib), and (b) do not need to exchange the value across architectures. Is that right?

@tannergooding
Copy link
Member Author

We likely could provide both and we might even be able to expose the API in such a way that the JIT specially handles the method if the polynomial given is constant.

But as proposed, it would be limited to the given polynomials.

@danmoseley
Copy link
Member

I guess I'm wondering whether there's examples when this API with the "undefined" polynomial would be useful. But maybe that's not that different to asking when the intrinsics we already have would be useful. It would have to be when you're producing/consuming on the same machine. (Unless I misunderstand)

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Aug 6, 2020
@terrajobst
Copy link
Member

terrajobst commented Aug 6, 2020

Video

  • Looks good as proposed
  • We can add overloads with custom polynomials later, but we should document the ones that we're using here.
namespace System.Numerics
{
    public static class BitOperations
    {
        public static uint Crc32(uint crc, byte data);
        public static uint Crc32(uint crc, ushort data);
        public static uint Crc32(uint crc, uint data);
        public static uint Crc32(uint crc, ulong data);
    }
}

@danmoseley
Copy link
Member

danmoseley commented Aug 6, 2020

I watched the video and I still need clarification on my questions I think 😄

  1. How will we choose the polynomial?
  2. Whatever we pick, it it will only be hardware accelerated on x64/x86 OR ARMxx, or neither, but not both, right? But it will give the same results on all CPU/platforms?
  3. Are we targeting a specific standard (eg., zlib, or some other RFC or file format) or is this primarily for when I don't care about the polynomial because I'm both calculating the CRC and verifying the CRC in my own app?

Also, it would be good to have an example of usage, especially since this API is expected to be used to checksum an array or stream. How does that look?

@tannergooding
Copy link
Member Author

The polynomial is 0x1EDC6F41, it is accelerated on both x86/x64 and ARM64 (I'm not sure about ARM32). ARM64 has an additional polynomial that it can also accelerate, 0x04C11DB7, which would require something like Crc32(uint crc, byte data, uint polynomial) to support. The JIT could then optimize it when polynomial is a constant of that value.

The polynomial being used is for CRC32-Castagnoli, which is fairly popular/prevalent and adopted in several standards; but which is not universaly used.

@tannergooding
Copy link
Member Author

In particular, wikipedia lists Castagnoli as used by iSCSI, SCTP, G.hn payload, SSE4.2, Btrfs, ext4, Ceph. Arm64's other polynomial is used by ISO 3309 (HDLC), ANSI X3.66 (ADCCP), FIPS PUB 71, FED-STD-1003, ITU-T V.42, ISO/IEC/IEEE 802-3 (Ethernet), SATA, MPEG-2, PKZIP, Gzip, Bzip2, POSIX cksum,[52] PNG,[53] ZMODEM, many others, but has no built-in acceleration for x86/x64.

@danmoseley
Copy link
Member

I see. That also seems to be the one used by the most popular package (CRC32C.NET)

@shaggygi
Copy link
Contributor

shaggygi commented Aug 10, 2020

Any plans to add Crc8 and Crc16 support, as well?

Used in protocol messages (https://sourceforge.net/p/bacnet/mailman/message/1259086/).

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 10, 2020

Not as part of this work because I don't believe there is hardware intrinsic that provides either of those.
There does seem to be some interest in having CRC algorithms in general but I think that would need a separate api proposal and it's too late in the release cycle to get them into 5 even if approved.

@tannergooding tannergooding added the help wanted [up-for-grabs] Good issue for external contributors label Aug 27, 2020
@Gnbrkm41
Copy link
Contributor

Gnbrkm41 commented Sep 9, 2020

Could I give this a try?

@jkotas
Copy link
Member

jkotas commented Jan 31, 2021

That optimization only kicks in for byte and sbyte arrays. You would need to store the table as byte array.

I do not think the precomputed table makes sense for this. Computing the table on the fly for the fallback case should be just fine.

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Jan 31, 2021

That optimization only kicks in for byte and sbyte arrays. You would need to store the table as byte array.

I do not think the precomputed table makes sense for this. Computing the table on the fly for the fallback case should be just fine.

The description of that PR says arrays of primitive literals. so I assumed that uints would also count.
EDIT: When reading the comments on that PR I saw that it only works with 1B elements due to endianness, ok then.

@MichalPetryka
Copy link
Contributor

I've changed the table to be stored in a uint[] and added a benchmark using ??=, which seems to have only a minimal impact on performance. As we can see, all table based implementations are roughly the same, while the no table one (which I've taken from https://medium.com/@nxtchg/crc-32-without-lookup-tables-1682405aa048 and adapted into C#) is noticeably slower.

BenchmarkDotNet=v0.12.1, OS=Windows 8.1 (6.3.9600.0)
Intel Core i3-4130 CPU 3.40GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
Frequency=3319351 Hz, Resolution=301.2637 ns, Timer=TSC
.NET Core SDK=5.0.102
  [Host]        : .NET Core 5.0 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  .NET Core 5.0 : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT

Job=.NET Core 5.0  Runtime=.NET Core 5.0  
Method Mean Error StdDev Ratio RatioSD
SSE 0.0468 ns 0.0103 ns 0.0091 ns 0.008 0.00
'Precomputed Table' 5.7565 ns 0.0564 ns 0.0528 ns 1.000 0.00
'Static constructor table' 5.7247 ns 0.0299 ns 0.0265 ns 0.995 0.01
'Lazy generated table' 5.8886 ns 0.0319 ns 0.0283 ns 1.023 0.01
'No table' 90.6089 ns 0.1864 ns 0.1653 ns 15.743 0.16

@tannergooding
Copy link
Member Author

tannergooding commented Jan 31, 2021

@MichalPetryka
Copy link
Contributor

Yeah, noticed after reading the commets on that PR, was my fault for not checking them.

@tannergooding
Copy link
Member Author

This came up as part of #24328, and we probably want to use Crc32C as the name to match what will likely be exposed in the more general-purpose hashing algorithms:

namespace System.Numerics
{
    public static class BitOperations
    {
        public static uint Crc32C(uint crc, byte data);
        public static uint Crc32C(uint crc, ushort data);
        public static uint Crc32C(uint crc, uint data);
        public static uint Crc32C(uint crc, ulong data);
    }
}

CC. @bartonjs, @terrajobst, @GrabYourPitchforks could we confirm this is desired and that the above is the desired implementation shape?

@terrajobst
Copy link
Member

Makes sense to us (@dotnet/fxdc)

@wiz0u
Copy link

wiz0u commented Aug 24, 2021

Can you expose as well a method for Crc32 computation on buffers ? I don't think .NET has a public one.

For example, simply turning System.IO.Compression.Crc32Helper public would be nice.

@tannergooding
Copy link
Member Author

@wiz0u have you seen #24328? This includes general purpose Crc32 APIs and will be in .NET 6 (under System.IO.Hashing: https://source.dot.net/#System.IO.Hashing/System/IO/Hashing/Crc32.cs,6471de2466bb4aa7).

@wiz0u
Copy link

wiz0u commented Aug 24, 2021

@wiz0u have you seen #24328? This includes general purpose Crc32 APIs and will be in .NET 6 (under System.IO.Hashing: https://source.dot.net/#System.IO.Hashing/System/IO/Hashing/Crc32.cs,6471de2466bb4aa7).

ah great! thanks!

@deeprobin
Copy link
Contributor

I would suggest adding another method to the proposal with the nuint parameter type.

@MichalPetryka
Copy link
Contributor

I would suggest adding another method to the proposal with the nuint parameter type.

I wouldn't find that necessarily useful, the point of the overloads is to hash multiple bytes at once so it usually involves reading byte array values as different types, which gets a bit unclean when the type we read isn't constant size.

@tannergooding
Copy link
Member Author

tannergooding commented Sep 13, 2021

I wouldn't find that necessarily useful, the point of the overloads is to hash multiple bytes at once so it usually involves reading byte array values as different types, which gets a bit unclean when the type we read isn't constant size.

The type is a constant size for a given run (32-bits on 32-bit machines and 64-bits on 64-bit machines), as is the result (32-bits). It only makes a difference on how efficiently you can process the bits for large inputs (nuint allowing you to process a full register of bits every iteration, incrementing is still sizeof(nuint) bytes per iteration).

@bartonjs
Copy link
Member

@tannergooding I can't tell if you're arguing for, against, or just providing data 😄.

        public static uint Crc32C(uint crc, nuint data);

seems like a reasonable addition on the surface of things, but won't it produce different answers on 32-bit and 64-bit executions for the same 32-bit compatible value, since a 64-bit run sees 4 extra bytes in the payload?

I'm probably leaning toward not having it and making the caller be clear if they always want 64-bit semantics or if they wanted platform native, on the assumption that nuint val = 5; return Crc32C(0, val); would give different answers on different CPUs.

@MichalPetryka
Copy link
Contributor

@tannergooding I can't tell if you're arguing for, against, or just providing data .

        public static uint Crc32C(uint crc, nuint data);

seems like a reasonable addition on the surface of things, but won't it produce different answers on 32-bit and 64-bit executions for the same 32-bit compatible value, since a 64-bit run sees 4 extra bytes in the payload?

I'm probably leaning toward not having it and making the caller be clear if they always want 64-bit semantics or if they wanted platform native, on the assumption that nuint val = 5; return Crc32C(0, val); would give different answers on different CPUs.

It will produce different values.

@tannergooding
Copy link
Member Author

@bartonjs, these are the primitive building blocks and so it's up the the author of a more general Crc32 loop to correctly handle any trailing bytes.\

I would expect, for example, if you have 11 bytes that you'd do: 1 or 2 iterations using nuint and then handle the remaining 3 bytes independently of the main loop, likely using the ushort or byte overloads, rather than explicitly loading up a nuint sized value. The same goes for if you were using uint or ulong directly.

The only difference between all of these is how many bytes get operated on "per iteration".

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 14, 2021
@kasperk81
Copy link
Contributor

inconsistency between System.Numerics.BitOperations.Crc32C(uint crc, byte data) and System.IO.Hashing.Crc32C.Hash(byte[] data) is the initial value of crc is hardcoded (0xFFFF_FFFFu) in System.IO.Hashing one. should there be a matching overload System.IO.Hashing.Crc32C.Hash(uint crc, byte[] data)?

@Wraith2
Copy link
Contributor

Wraith2 commented Nov 14, 2021

I don't think that's required. The System.IO.Hashing is a oneshot not an additive so you pass it the total data and then it computes it all which will include the final mask which is normally used to tell the difference between a hash of 0 entries and an empty hash. The intrinsic is just the underlying platform instruction being exposed for use.

@bartonjs
Copy link
Member

In addition to the things @Wraith2 already pointed out, the System.IO.Hashing.Crc32 type is computing a different 32-bit CRC than this primitive. System.IO.Hashing.Crc32 computes the same CRC-32 used for Ethernet; this one is used for "CRC-32C", which is used in ext4.

https://stackoverflow.com/questions/26429360/crc32-vs-crc32c

CRC-32: https://reveng.sourceforge.io/crc-catalogue/17plus.htm#crc.cat.crc-32-iso-hdlc
CRC-32C: https://reveng.sourceforge.io/crc-catalogue/17plus.htm#crc.cat.crc-32-iscsi

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 12, 2022
@deeprobin
Copy link
Contributor

This issue is blocked by #62692.

@tannergooding can you please remove the up-for-grabs label 😄?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 29, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 3, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics Cost:S Work that requires one engineer up to 1 week help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.