Commit fec818d

Constant division templates (ridiculousfish#89)
* AVR uses C++11
* Generate template based divisors
* Fix AVR modulus target
* Fix python lint warnings
* Avoid taking the address of a static constexpr struct. It may cause the struct to be allocated RAM.
* Update documentation
* C++ constant div consumes C macros
  So single source of truth for 16-bit libdivide constants
1 parent afb8a8a commit fec818d

8 files changed: +65766 -29 lines changed

README.md (+3 -2)
@@ -19,7 +19,7 @@ vector division which provides an even larger speedup. You can test how much
 speedup you can achieve on your CPU using the [benchmark](#benchmark-program)
 program.

-libdivide is compatible with 8-bit microcontrollers, such as the AVR series: [the CI build includes an ATmega2560 target](test/avr/readme.md). Since low-end hardware such as this often does not include a hardware divider, libdivide is particularly useful. In addition to the runtime [C](https://github.com/ridiculousfish/libdivide/blob/master/doc/C-API.md) & [C++](https://github.com/ridiculousfish/libdivide/blob/master/doc/CPP-API.md) APIs, a set of [predefined macros](constant_fast_div.h) is included to speed up division by 16-bit constants: division by a 16-bit constant is [not optimized by avr-gcc on 8-bit systems](https://stackoverflow.com/questions/47994933/why-doesnt-gcc-or-clang-on-arm-use-division-by-invariant-integers-using-multip).
+libdivide is compatible with 8-bit microcontrollers, such as the AVR series: [the CI build includes an ATmega2560 target](test/avr/readme.md). Since low-end hardware such as this often does not include a hardware divider, libdivide is particularly useful. In addition to the runtime [C](https://github.com/ridiculousfish/libdivide/blob/master/doc/C-API.md) & [C++](https://github.com/ridiculousfish/libdivide/blob/master/doc/CPP-API.md) APIs, a set of [predefined macros](constant_fast_div.h) and [templates](constant_fast_div.hpp) is included to speed up division by 16-bit constants: division by a 16-bit constant is [not optimized by avr-gcc on 8-bit systems](https://stackoverflow.com/questions/47994933/why-doesnt-gcc-or-clang-on-arm-use-division-by-invariant-integers-using-multip).

 See https://libdivide.com for more information on libdivide.

@@ -83,7 +83,8 @@ void divide(int64_t *array, size_t size, int64_t divisor)

 * [C API](https://github.com/ridiculousfish/libdivide/blob/master/doc/C-API.md)
 * [C++ API](https://github.com/ridiculousfish/libdivide/blob/master/doc/CPP-API.md)
-* [Invariant Division](constant_fast_div.h)
+* [Macro Invariant Division](constant_fast_div.h)
+* [Template Based Invariant Division](constant_fast_div.hpp)

 # Branchfull vs branchfree
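
The README text in this diff describes replacing division by a 16-bit constant with a multiply and a shift. As a rough illustration of that idea only, the sketch below divides an unsigned 16-bit value by 10. The helper name `div10_mul_shift` and the constants 52429 and 19 are chosen for this example and are not taken from libdivide, which stores its own precomputed constants in a different encoding.

```cpp
#include <cstdint>

// Hypothetical helper, not part of libdivide.
// Divides a 16-bit unsigned value by 10 with one 32-bit multiply and one shift.
// 52429 is ceil(2^19 / 10). Because 52429 * 10 - 2^19 = 2 is no larger than
// 2^(19 - 16) = 8, the result equals n / 10 for every n in [0, 65535].
inline uint16_t div10_mul_shift(uint16_t n) {
    // Widen to 32 bits first: 65535 * 52429 still fits in a uint32_t.
    const uint32_t product = static_cast<uint32_t>(n) * UINT32_C(52429);
    return static_cast<uint16_t>(product >> 19);
}
```

On a target without a hardware divider, the multiply and shift are typically much cheaper than a call to a software division routine, which is the motivation given in the README.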

constant_fast_div.hpp (+76)
@@ -0,0 +1,76 @@
+/*
+ * When dividing by a known compile time constant, the division can be replaced
+ * by a multiply+shift operation. GCC will do this automatically,
+ * *BUT ONLY FOR DIVISION OF REGISTER-WIDTH OR NARROWER*.
+ *
+ * So on an 8-bit system, 16-bit divides will *NOT* be optimised.
+ *
+ * The templates here manually apply the multiply+shift operation for 16-bit numbers.
+ */
+
+#pragma once
+#include "libdivide.h"
+#include "u16_ldparams.h"
+#include "s16_ldparams.h"
+
+#ifdef __cplusplus
+namespace libdivide {
+
+// Implementation details
+namespace detail {
+
+    // Specialized templates containing precomputed libdivide constants
+    // Primary template for pre-generated libdivide constants
+    template<typename IntT, IntT divisor> struct libdivide_constants {};
+#include "u16_ldparams.hpp"
+#include "s16_ldparams.hpp"
+
+    // Primary template - divide as normal. Performant for divisors that are a power of 2
+    template <typename T, T divisor, bool is_power2>
+    struct fast_divide_t {
+        static LIBDIVIDE_INLINE T divide(T n) { return n/divisor; }
+    };
+
+    // Divide by 1 - no-op
+    template <bool is_power2>
+    struct fast_divide_t<uint16_t, 1U, is_power2> {
+        static LIBDIVIDE_INLINE uint16_t divide(uint16_t n) { return n; }
+    };
+    template <bool is_power2>
+    struct fast_divide_t<int16_t, 1, is_power2> {
+        static LIBDIVIDE_INLINE int16_t divide(int16_t n) { return n; }
+    };
+
+    // Specialized template for non-power of 2 uint16_t divisors
+    template<uint16_t divisor>
+    struct fast_divide_t<uint16_t, divisor, false> {
+        static LIBDIVIDE_INLINE uint16_t divide(uint16_t n) {
+            return libdivide_u16_do_raw(n, libdivide_constants<uint16_t, divisor>::libdivide.magic,
+                                           libdivide_constants<uint16_t, divisor>::libdivide.more);
+        }
+    };
+
+    // Specialized template for non-power of 2 int16_t divisors
+    template<int16_t divisor>
+    struct fast_divide_t<int16_t, divisor, false> {
+        static LIBDIVIDE_INLINE int16_t divide(int16_t n) {
+            return libdivide_s16_do_raw(n, libdivide_constants<int16_t, divisor>::libdivide.magic,
+                                           libdivide_constants<int16_t, divisor>::libdivide.more);
+        }
+    };
+
+    // Power of 2 test
+    template <typename T, T N>
+    struct is_power_of_two {
+        static constexpr bool val = N!=0 && (N & (N - 1))==0;
+    };
+}
+
+// Public API.
+template <typename T, T divisor>
+LIBDIVIDE_INLINE T fast_divide(T n) {
+    return detail::fast_divide_t<T, divisor, detail::is_power_of_two<T, divisor>::val>::divide(n);
+}
+
+}
+#endif
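
The header above selects an implementation at compile time: plain division for power-of-2 divisors, and a lookup of precomputed constants otherwise. The following is a minimal self-contained sketch of that dispatch pattern, written so it compiles without libdivide. Everything in the `sketch` namespace is hypothetical: `magic_constants` stands in for the generated `u16_ldparams.hpp` tables, only the divisor 10 is provided, and the simple multiplier/shift pair is not libdivide's `magic`/`more` encoding.

```cpp
#include <cstdint>

namespace sketch {  // hypothetical namespace, for illustration only

// Stand-in for the generated constant tables: one specialization per divisor.
// Divisors without a specialization fail to compile in the multiply+shift path.
template <uint16_t divisor> struct magic_constants {};
template <> struct magic_constants<10> {
    static constexpr uint32_t multiplier = 52429;  // ceil(2^19 / 10)
    static constexpr unsigned shift = 19;
};

// Compile-time power-of-2 test, same expression as in the header above.
template <uint16_t N>
struct is_power_of_two {
    static constexpr bool val = N != 0 && (N & (N - 1)) == 0;
};

// Primary template: ordinary division, which the compiler can reduce to a
// shift when the divisor is a power of 2.
template <uint16_t divisor, bool is_power2>
struct fast_divide_t {
    static uint16_t divide(uint16_t n) { return n / divisor; }
};

// Partial specialization for non-power-of-2 divisors: multiply, then shift.
template <uint16_t divisor>
struct fast_divide_t<divisor, false> {
    static uint16_t divide(uint16_t n) {
        return static_cast<uint16_t>(
            (static_cast<uint32_t>(n) * magic_constants<divisor>::multiplier)
            >> magic_constants<divisor>::shift);
    }
};

// Entry point: picks the implementation from the divisor alone.
template <uint16_t divisor>
inline uint16_t fast_divide(uint16_t n) {
    return fast_divide_t<divisor, is_power_of_two<divisor>::val>::divide(n);
}

}  // namespace sketch
```

With the real header, the equivalent call takes the integer type as the first template argument, for example `libdivide::fast_divide<uint16_t, 10>(n)`.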
