Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: make polynomial tables optional #57

Closed
Xiretza opened this issue Feb 9, 2021 · 6 comments
Closed

Feature request: make polynomial tables optional #57

Xiretza opened this issue Feb 9, 2021 · 6 comments

Comments

@Xiretza
Copy link

Xiretza commented Feb 9, 2021

It would be great to be able to omit the allocation of the tables in memory-constrained environments and instead fall back to generating the value for each byte on demand.

@akhilles
Copy link
Collaborator

Any thoughts on how to expose this in the public API? Maybe Crc can be generic over the update implementation:

struct Crc<W: Width, U: CrcUpdate<W>> {
    pub algorithm: &'static Algorithm<W>,
    crc_update: U,
}

trait CrcUpdate<W: Width> {
    fn update(&self, crc: W, bytes: &[u8]) -> W;
}

struct Table<W: Width>{
    algorithm: &'static Algorithm<W>,
    table: [W; 256]
}
impl<W: Width> CrcUpdate<W> for Table<W> {
    fn update(&self, crc: W, bytes: &[u8]) -> W {
        todo!()
    }
}

Then, there can be a low-memory usage implementation of CrcUpdate. Crc<u32, Table<u32>> is kind of verbose though, so we can set Table as the default type: struct Crc<W: Width, U: CrcUpdate<W> = Table<W>>. Then, Crc<u32> should work by defaulting to the table implementation.

@KillingSpark
Copy link
Contributor

I played around with @akhilles proposal because I was interested in the opposite direction: making this faster with bigger lookup tables.

The only drawback I can see is that it would have to drop a lot of the const-ness because implementing trait functions as const is not (yet?) possible.

The table generation is still const, just the update() calls would need to lose the const attribute.

Using the slice-by-16 algorithm for the u32 CRC I am able to achieve a 7x throughput increase with a 16x bigger lookup table (256 * 4 * 16 B = 16kB).

crc32/crc32_bytewise    time:   [1.7886 ms 1.7893 ms 1.7901 ms]
                        thrpt:  [532.74 MiB/s 532.97 MiB/s 533.21 MiB/s]

crc32/crc32_slice16     time:   [254.29 µs 254.37 µs 254.46 µs]
                        thrpt:  [3.6600 GiB/s 3.6613 GiB/s 3.6625 GiB/s]

@mrhooray what's your opinion on this? I'd be willing to prepare a PR with these changes but it's a bit of busywork to do this for every width so I'd like an approval before I start doing that.

@mrhooray
Copy link
Owner

mrhooray commented Jan 4, 2023

Thanks for looking into this - the speedup is impressive and different scenarios could benefit from it.
I'm less familiar with the language and use cases in different environments nowadays, to fully understand the tradeoffs of the proposal/losing const on update.
Perhaps @akhilles who's been maintaining the project can give better advice.

@akhilles
Copy link
Collaborator

Hmm, the perf gains do look promising. I'm wondering if there's a way to configure table size while preserving API compatibility. Maybe using feature flags?

@akhilles
Copy link
Collaborator

Ok, so I think it might be possible to support bigger (and smaller) lookup tables while keeping const and not breaking API. I have a proof of concept here: #76. Basically, we can add support for Crc<Fast<u32>> in addition to Crc<u32>.

Just need to confirm that switching from Crc<W: Width> to Crc<I: Implementation> is not a breaking change. I don't think it is because anything that implements Width would automatically implement Implementation.

@KillingSpark
Copy link
Contributor

KillingSpark commented Jan 11, 2023

Oh that's clever! I am not completely sure whether this is a breaking change either. But as long as Crc<uXXX> still works it shouldn't be a breaking change.

So for me to move forward I'd wait until #76 is merged (if it works out) and would then make a PR that introduces something like this:

struct Slice16U32{}

impl Implementation for Slice16U32 {
    ....
}

impl Crc<Slice16u32> {
...
}

That would be cool and would allow for gradual addition of optimizations for separate widths

@KillingSpark KillingSpark mentioned this issue Jan 16, 2023
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants