Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GpioGroup for efficient access to GPIO registers #19

Merged
merged 6 commits into from
May 25, 2018

Conversation

salkinium
Copy link
Member

@salkinium salkinium commented May 19, 2018

This provides a base class for providing arbitrary sets of pins with an efficient access method to the GPIO port registers. All accesses are bundled, so ideally only one RMW access per port is required instead of per pin. A quick review of the assembly listing shows near optimal access for Group::setOutput().
Whenever the implementation is as efficient as the single Gpio class implementation, it is used instead to limit code duplication.

GpioGroup does not understand pin order, it is just performing the same action for all pins in a more efficient way.
SoftwareGpioPort will extend GpioGroup to provide methods for efficient reading and writing of ordered pin groups.

This requires C++17 due to constexpr if and fold expressions used, see #18.

  • Implement and test for STM32
  • Implement for AVR
  • Test for AVR
  • Document classes
  • Expose traits of GpioPort
  • Update CMSIS and enable C11, C++17
  • Enable AvrDude Tool again
  • Move CMSIS & C++17 commit to separate PR
  • Move AvrDude tool out of this PR
  • Unset mask in ASCR on disconnect
  • Unittests for STM32
  • Unittests for STM32F1
  • Unittests for AVR

@salkinium salkinium force-pushed the feature/gpio_group branch 7 times, most recently from bcfb710 to 4410657 Compare May 21, 2018 18:49
@salkinium
Copy link
Member Author

salkinium commented May 21, 2018

I've rewritten the port implementations from scratch now to be centered around reading the registers per port rather than per pin. Most functions you know from the Gpio classes are also available in the GpioSet class for unordered set of pins.
This will compress duplicate pins on the same port into just one port register access.
So in the best case, 16 pins from the same port will only have one register access, instead of 16.

For ordered access to pins use SoftwareGpioPort, which extends this behavior with a shift map to provide write(data), read() and isSet() functions:

using PinGroup = SoftwareGpioPort< GpioB7, GpioB6, GpioB5, GpioB4, GpioB3, GpioB2, GpioB1, GpioB0 >;
PinGroup::setInput();
PinGroup::read();

For the above example, the compiler generate this near-optimal code:

080006a8 GpioSet<GpioB7,GpioB6,GpioB5,GpioB4,GpioB3,GpioB2,GpioB1,GpioB0>:
			GPIOB->MODER &= ~mask2(1);
 80006a8:	4b06      	ldr	r3, [pc, #24]	; (80006c4)
 80006aa:	4807      	ldr	r0, [pc, #28]	; (80006c8)
 80006ac:	6819      	ldr	r1, [r3, #0]
 80006ae:	4001      	ands	r1, r0
 80006b0:	6019      	str	r1, [r3, #0]
			GPIOB->OTYPER &= ~mask(1);
 80006b2:	6859      	ldr	r1, [r3, #4]
 80006b4:	f021 01ff 	bic.w	r1, r1, #255	; 0xff
 80006b8:	6059      	str	r1, [r3, #4]
			GPIOB->OSPEEDR &= ~mask2(1);
 80006ba:	689a      	ldr	r2, [r3, #8]
 80006bc:	4002      	ands	r2, r0
 80006be:	609a      	str	r2, [r3, #8]
	}
 80006c0:	4770      	bx	lr
 80006c2:	bf00      	nop
 80006c4:	40020400 	.word	0x40020400
 80006c8:	ffff0000 	.word	0xffff0000

The read is just three inlined instructions. Cool.

 800074e:	4d86      	ldr	r5, [pc, #536]	; (8000968 <main+0x21c>)
 8000798:	6928      	ldr	r0, [r5, #16]
 800079c:	b2c0      	uxtb	r0, r0

The shift map for the above example is constructed at compile time, using fold expressions.
It looks like this:

A {   ,   ,   ,   ,   ,   ,   ,   }
B { 7 , 6 , 5 , 4 , 3 , 2 , 1 , 0 }
C {   ,   ,   ,   ,   ,   ,   ,   }
D {   ,   ,   ,   ,   ,   ,   ,   }
E {   ,   ,   ,   ,   ,   ,   ,   }
F {   ,   ,   ,   ,   ,   ,   ,   }
G {   ,   ,   ,   ,   ,   ,   ,   }
H {   ,   ,   ,   ,   ,   ,   ,   }
I {   ,   ,   ,   ,   ,   ,   ,   }
J {   ,   ,   ,   ,   ,   ,   ,   }
K {   ,   ,   ,   ,   ,   ,   ,   }

Let's have a look at a less optimal example:

using PinGroup = SoftwareGpioPort< 
		GpioInverted<GpioG6>, 
		GpioInverted<GpioD4>, 
		GpioInverted<GpioD5>, 
		GpioInverted<GpioK3>, 
		GpioInverted<GpioB7>, 
		GpioUnused, 
		GpioD3 >;
PinGroup::setOutput(modm::Gpio::High);
PinGroup::write(0b0001111);
PinGroup::read();

The shift map here looks less regular, note the empty sixth column (bit 1), due to GpioUnused.

A {   ,   ,   ,   ,   ,   ,   }
B {   , 7 ,   ,   ,   ,   ,   }
C {   ,   ,   ,   ,   ,   ,   }
D { 3 ,   ,   , 5 , 4 ,   ,   }
E {   ,   ,   ,   ,   ,   ,   }
F {   ,   ,   ,   ,   ,   ,   }
G {   ,   ,   ,   ,   ,   , 6 }
H {   ,   ,   ,   ,   ,   ,   }
I {   ,   ,   ,   ,   ,   ,   }
J {   ,   ,   ,   ,   ,   ,   }
K {   ,   , 3 ,   ,   ,   ,   }

An inversion map is also constructed, from the input:

A 0000000000000000
B 0000000010000000
C 0000000000000000
D 0000000000110000
E 0000000000000000
F 0000000000000000
G 0000000001000000
H 0000000000000000
I 0000000000000000
J 0000000000000000
K 0000000000001000

In the generated assembly for PinGroup::setOutput(modm::Gpio::High); first the output logic is set,
then the output mode is set.
Note that 6 pins map to only 4 accesses for each, since 3 pins are on Port D, and the pin mask 0x00300008
in the literal pool sets all three pins correctly regarding inversion.

		if constexpr (mask(3)) GPIOD->BSRR = (inverted(3) << 16) | (mask(3) & ~inverted(3));
 80007e8:	4e60      	ldr	r6, [pc, #384]	; (800096c <main+0x220>)
		if constexpr (mask(6)) GPIOG->BSRR = (inverted(6) << 16) | (mask(6) & ~inverted(6));
 80007ea:	f8df 8198 	ldr.w	r8, [pc, #408]	; 8000984 <main+0x238>
		if constexpr (mask(10)) GPIOK->BSRR = (inverted(10) << 16) | (mask(10) & ~inverted(10));
 80007ee:	4f60      	ldr	r7, [pc, #384]	; (8000970 <main+0x224>)
		if constexpr (mask(3)) GPIOD->BSRR = (inverted(3) << 16) | (mask(3) & ~inverted(3));
 80007f0:	4a60      	ldr	r2, [pc, #384]	; (8000974 <main+0x228>)
		if constexpr (mask(1)) GPIOB->BSRR = (inverted(1) << 16) | (mask(1) & ~inverted(1));
 80007f2:	f44f 0300 	mov.w	r3, #8388608	; 0x800000
 80007f6:	61ab      	str	r3, [r5, #24]
		if constexpr (mask(6)) GPIOG->BSRR = (inverted(6) << 16) | (mask(6) & ~inverted(6));
 80007f8:	f44f 0980 	mov.w	r9, #4194304	; 0x400000
		if constexpr (mask(10)) GPIOK->BSRR = (inverted(10) << 16) | (mask(10) & ~inverted(10));
 80007fc:	f44f 2300 	mov.w	r3, #524288	; 0x80000
		if constexpr (mask(3)) GPIOD->BSRR = (inverted(3) << 16) | (mask(3) & ~inverted(3));
 8000800:	61b2      	str	r2, [r6, #24]
 8000802:	9201      	str	r2, [sp, #4]
		if constexpr (mask(6)) GPIOG->BSRR = (inverted(6) << 16) | (mask(6) & ~inverted(6));
 8000804:	f8c8 9018 	str.w	r9, [r8, #24]
		if constexpr (mask(10)) GPIOK->BSRR = (inverted(10) << 16) | (mask(10) & ~inverted(10));
 8000808:	61bb      	str	r3, [r7, #24]
		if constexpr (mask(1)) GPIOB->MODER = (GPIOB->MODER & ~mask2(1)) | mask2(1, i(Mode::Output));
 800080a:	682b      	ldr	r3, [r5, #0]
 800080c:	f423 4340 	bic.w	r3, r3, #49152	; 0xc000
 8000810:	f443 4380 	orr.w	r3, r3, #16384	; 0x4000
 8000814:	602b      	str	r3, [r5, #0]
		if constexpr (mask(3)) GPIOD->MODER = (GPIOD->MODER & ~mask2(3)) | mask2(3, i(Mode::Output));
 8000816:	6833      	ldr	r3, [r6, #0]
 8000818:	f423 637c 	bic.w	r3, r3, #4032	; 0xfc0
 800081c:	f443 63a8 	orr.w	r3, r3, #1344	; 0x540
 8000820:	6033      	str	r3, [r6, #0]
		if constexpr (mask(6)) GPIOG->MODER = (GPIOG->MODER & ~mask2(6)) | mask2(6, i(Mode::Output));
 8000822:	f8d8 3000 	ldr.w	r3, [r8]
 8000826:	f423 5340 	bic.w	r3, r3, #12288	; 0x3000
 800082a:	f443 5380 	orr.w	r3, r3, #4096	; 0x1000
 800082e:	f8c8 3000 	str.w	r3, [r8]
		if constexpr (mask(10)) GPIOK->MODER = (GPIOK->MODER & ~mask2(10)) | mask2(10, i(Mode::Output));
 8000832:	683b      	ldr	r3, [r7, #0]
 8000834:	f023 03c0 	bic.w	r3, r3, #192	; 0xc0
 8000838:	f043 0340 	orr.w	r3, r3, #64	; 0x40
 800096c:	40020c00 	.word	0x40020c00
 8000970:	40022800 	.word	0x40022800
 8000974:	00300008 	.word	0x00300008
 8000984:	40021800 	.word	0x40021800

This is the runtime version of the PinGroup::write() method. Still pretty fast at 39 instructions.

			GPIOB->BSRR = ((~p & mask(1)) << 16) | p;
 8000730:	4b1a      	ldr	r3, [pc, #104]	; (800079c)
			if constexpr (2 < width) if constexpr (constexpr auto pos = shift_mask(1, 2); pos >= 0) p |= (data & (1ul << 2)) ? (1ul << pos) : (1ul << (pos + 16));
 8000732:	f010 0f04 	tst.w	r0, #4
			GPIOB->BSRR = ((~p & mask(1)) << 16) | p;
 8000736:	bf14      	ite	ne
 8000738:	2280      	movne	r2, #128	; 0x80
 800073a:	f44f 0200 	moveq.w	r2, #8388608	; 0x800000
			if constexpr (0 < width) if constexpr (constexpr auto pos = shift_mask(3, 0); pos >= 0) p |= (data & (1ul << 0)) ? (1ul << pos) : (1ul << (pos + 16));
 800073e:	f010 0f01 	tst.w	r0, #1
			GPIOB->BSRR = ((~p & mask(1)) << 16) | p;
 8000742:	619a      	str	r2, [r3, #24]
			if constexpr (0 < width) if constexpr (constexpr auto pos = shift_mask(3, 0); pos >= 0) p |= (data & (1ul << 0)) ? (1ul << pos) : (1ul << (pos + 16));
 8000744:	bf14      	ite	ne
 8000746:	2208      	movne	r2, #8
 8000748:	f44f 2200 	moveq.w	r2, #524288	; 0x80000
			if constexpr (4 < width) if constexpr (constexpr auto pos = shift_mask(3, 4); pos >= 0) p |= (data & (1ul << 4)) ? (1ul << pos) : (1ul << (pos + 16));
 800074c:	f010 0f10 	tst.w	r0, #16
 8000750:	bf14      	ite	ne
 8000752:	2320      	movne	r3, #32
 8000754:	f44f 1300 	moveq.w	r3, #2097152	; 0x200000
			if constexpr (5 < width) if constexpr (constexpr auto pos = shift_mask(3, 5); pos >= 0) p |= (data & (1ul << 5)) ? (1ul << pos) : (1ul << (pos + 16));
 8000758:	f010 0f20 	tst.w	r0, #32
			if constexpr (4 < width) if constexpr (constexpr auto pos = shift_mask(3, 4); pos >= 0) p |= (data & (1ul << 4)) ? (1ul << pos) : (1ul << (pos + 16));
 800075c:	ea42 0203 	orr.w	r2, r2, r3
			if constexpr (5 < width) if constexpr (constexpr auto pos = shift_mask(3, 5); pos >= 0) p |= (data & (1ul << 5)) ? (1ul << pos) : (1ul << (pos + 16));
 8000760:	bf14      	ite	ne
 8000762:	2310      	movne	r3, #16
 8000764:	f44f 1380 	moveq.w	r3, #1048576	; 0x100000
 8000768:	431a      	orrs	r2, r3
			GPIOD->BSRR = ((~p & mask(3)) << 16) | p;
 800076a:	0413      	lsls	r3, r2, #16
 800076c:	f483 1360 	eor.w	r3, r3, #3670016	; 0x380000
 8000770:	4313      	orrs	r3, r2
 8000772:	4a0b      	ldr	r2, [pc, #44]	; (80007a0)
			if constexpr (6 < width) if constexpr (constexpr auto pos = shift_mask(6, 6); pos >= 0) p |= (data & (1ul << 6)) ? (1ul << pos) : (1ul << (pos + 16));
 8000774:	f010 0f40 	tst.w	r0, #64	; 0x40
			GPIOD->BSRR = ((~p & mask(3)) << 16) | p;
 8000778:	6193      	str	r3, [r2, #24]
			GPIOG->BSRR = ((~p & mask(6)) << 16) | p;
 800077a:	4b0a      	ldr	r3, [pc, #40]	; (80007a4)
 800077c:	bf14      	ite	ne
 800077e:	2240      	movne	r2, #64	; 0x40
 8000780:	f44f 0280 	moveq.w	r2, #4194304	; 0x400000
 8000784:	619a      	str	r2, [r3, #24]
			if constexpr (3 < width) if constexpr (constexpr auto pos = shift_mask(10, 3); pos >= 0) p |= (data & (1ul << 3)) ? (1ul << pos) : (1ul << (pos + 16));
 8000786:	f010 0f08 	tst.w	r0, #8
			GPIOK->BSRR = ((~p & mask(10)) << 16) | p;
 800078a:	f503 5380 	add.w	r3, r3, #4096	; 0x1000
 800078e:	bf14      	ite	ne
 8000790:	2208      	movne	r2, #8
 8000792:	f44f 2200 	moveq.w	r2, #524288	; 0x80000
 8000796:	619a      	str	r2, [r3, #24]
	}
 8000798:	4770      	bx	lr
 800079a:	bf00      	nop
 800079c:	40020400 	.word	0x40020400
 80007a0:	40020c00 	.word	0x40020c00
 80007a4:	40021800 	.word	0x40021800

The PinGroup::read() is inlined with 23 instructions.

			const uint16_t p = (GPIOK->IDR & mask(10)) ^ inverted(10);
 800025c:	4915      	ldr	r1, [pc, #84]	; (80002b4)
 800025e:	2302      	movs	r3, #2
 8000260:	6063      	str	r3, [r4, #4]
			const uint16_t p = (GPIOB->IDR & mask(1)) ^ inverted(1);
 8000262:	4b15      	ldr	r3, [pc, #84]	; (80002b8)
 8000264:	6918      	ldr	r0, [r3, #16]
			const uint16_t p = (GPIOD->IDR & mask(3)) ^ inverted(3);
 8000266:	f8d3 2810 	ldr.w	r2, [r3, #2064]	; 0x810
			const uint16_t p = (GPIOG->IDR & mask(6)) ^ inverted(6);
 800026a:	f503 53a0 	add.w	r3, r3, #5120	; 0x1400
			if constexpr (0 < width) if constexpr (constexpr auto pos = shift_mask(3, 0); pos >= 0) r |= ((p >> pos) & 1) << 0;
 800026e:	f002 0238 	and.w	r2, r2, #56	; 0x38
			const uint16_t p = (GPIOG->IDR & mask(6)) ^ inverted(6);
 8000272:	691b      	ldr	r3, [r3, #16]
			const uint16_t p = (GPIOK->IDR & mask(10)) ^ inverted(10);
 8000274:	6909      	ldr	r1, [r1, #16]
			const uint16_t p = (GPIOG->IDR & mask(6)) ^ inverted(6);
 8000276:	f003 0340 	and.w	r3, r3, #64	; 0x40
			const uint16_t p = (GPIOK->IDR & mask(10)) ^ inverted(10);
 800027a:	f001 0108 	and.w	r1, r1, #8
			if constexpr (3 < width) if constexpr (constexpr auto pos = shift_mask(10, 3); pos >= 0) r |= ((p >> pos) & 1) << 3;
 800027e:	430b      	orrs	r3, r1
			if constexpr (2 < width) if constexpr (constexpr auto pos = shift_mask(1, 2); pos >= 0) r |= ((p >> pos) & 1) << 2;
 8000280:	f3c0 10c0 	ubfx	r0, r0, #7, #1
			if constexpr (3 < width) if constexpr (constexpr auto pos = shift_mask(10, 3); pos >= 0) r |= ((p >> pos) & 1) << 3;
 8000284:	ea43 0380 	orr.w	r3, r3, r0, lsl #2
			if constexpr (0 < width) if constexpr (constexpr auto pos = shift_mask(3, 0); pos >= 0) r |= ((p >> pos) & 1) << 0;
 8000288:	f3c2 00c0 	ubfx	r0, r2, #3, #1
			if constexpr (3 < width) if constexpr (constexpr auto pos = shift_mask(10, 3); pos >= 0) r |= ((p >> pos) & 1) << 3;
 800028c:	4303      	orrs	r3, r0
			if constexpr (4 < width) if constexpr (constexpr auto pos = shift_mask(3, 4); pos >= 0) r |= ((p >> pos) & 1) << 4;
 800028e:	1150      	asrs	r0, r2, #5
			if constexpr (3 < width) if constexpr (constexpr auto pos = shift_mask(10, 3); pos >= 0) r |= ((p >> pos) & 1) << 3;
 8000290:	ea43 1300 	orr.w	r3, r3, r0, lsl #4
			if constexpr (5 < width) if constexpr (constexpr auto pos = shift_mask(3, 5); pos >= 0) r |= ((p >> pos) & 1) << 5;
 8000294:	0050      	lsls	r0, r2, #1
 8000296:	f000 0020 	and.w	r0, r0, #32
 800029a:	4318      	orrs	r0, r3
 800029c:	4770      	bx	lr
 80002b4:	40022800 	.word	0x40022800
 80002b8:	40020400 	.word	0x40020400

For strictly-ordered access to pins use GpioPort, as before, however now you
can specify reversed ports with a negative width. You can also invert the whole port.

using PinGroup1 = GpioPort< GpioB0, +8 >;
using PinGroup2 = GpioPort< GpioB7, -8 >;
using PinGroup3 = GpioPort< GpioInverted<GpioB7>, -8 >;

@strongly-typed, this should solve the problem you wanted solved, maybe you can try it out, when you have time.

@salkinium salkinium force-pushed the feature/gpio_group branch 2 times, most recently from 8631531 to fa6ef1b Compare May 22, 2018 00:56
@salkinium salkinium changed the title [wip] Add GpioGroup for efficient access to GPIO registers Add GpioGroup for efficient access to GPIO registers May 22, 2018
@salkinium salkinium force-pushed the feature/gpio_group branch from 58c49df to f1ceae9 Compare May 22, 2018 14:14
@strongly-typed
Copy link
Collaborator

strongly-typed commented May 22, 2018

Great work, thank you! 🎉

Most important to me is the single-cycle read per port. As polling Hall sensors for a BLDC is a bit time-critical I wanted to make sure that there is a single read to the port register.

Is there a possibility to static_assert that all Pins on a SoftwareGpioPort are on the same port? E.g.

static_assert(myPinGroup::isOnSamePort(), "Hall pins must be at the same port to guarantee atomic read operations.")

Because the implementation is abstract from the actual pin assignment (using pinGroup = SoftwareGpioPort< Hall1, Hall2, Hall3 >;)

This does not yet work:

pinGroup::setInput(Gpio::InputType::PullUp);
pinGroup::setInputTrigger(Hall1::InputTrigger::BothEdges);
pinGroup::enableExternalInterrupt();
pinGroup::enableExternalInterruptVector(12);

@salkinium
Copy link
Member Author

salkinium commented May 22, 2018

single-cycle read per port

🤓🤓 actually 🤓🤓, double-cycle @ bus frequency, 🤓🤓 since bus is shared by multiple masters 🤓🤓 and there is an arbitration cycle. 🤓🤓

But I know what you mean, one single read per port register, not a gazillion like before.

Is there a possibility to static_assert that all Pins on a SoftwareGpioPort are on the same port?

I have to think about the semantics here, it's getting a bit fuzzy with so many classes GpioPN, GpioSet, GpioPort and SoftwareGpioPort.

I think maybe the best is to expose these properties via public constexpr methods/statics on the SoftwareGpioPort, and let the user do the static_assert. Then there wouldn't be a combinatorics explosion when adding new traits for assertion. Would become part of the modm::GpioPort interface, so that drivers can query that too.

This does not yet work

As usual you're the guinea pig 😝, this wasn't tested beyond some LED blinking.

I'll double check the bit mask generation code. I want to make a new test type for this, which reads back the register values from hardware. With the new test modules this could be made specific to the target, which wasn't possible in xpcc.

@strongly-typed
Copy link
Collaborator

I feel very comfortable to be the guinea pig 🐷 😀

So far it compiles without changes. I will try in real hardware tmr.

@salkinium
Copy link
Member Author

The Input Triggers and external interrupts are not ported, since they do not actually belong in the Gpio class to begin with.

pinGroup::setInputTrigger(Hall1::InputTrigger::BothEdges);
pinGroup::enableExternalInterrupt();
pinGroup::enableExternalInterruptVector(12);

ASAIK the triggers and interrupts are not configurable per pin, but only per group. So you can configure the Input trigger on GpioA0 and it'll get overwritten by GpioB0, since they share the same signal. Furthermore, the interrupt vectors are also shared across these signals.
The AVR has similar issues, where multiple pins map to the same signal/interrupt.

So this'll likely be converted into a connect() call as well to a separate peripheral.

@salkinium salkinium force-pushed the feature/gpio_group branch from f1ceae9 to 7332c3e Compare May 23, 2018 22:43
@salkinium
Copy link
Member Author

salkinium commented May 23, 2018

I've added some unittests for testing the GPIO interface and fixed a few inconsistencies surrounding the connect method. Note that the test will only be included when compiling for the F469NI-DISCO, since I need to know a list of free-floating GPIOs to perform my tests to avoid electrical damage to the board.

@salkinium salkinium force-pushed the feature/gpio_group branch 11 times, most recently from e41beec to ca1262c Compare May 24, 2018 21:21
@salkinium salkinium force-pushed the feature/gpio_group branch from ca1262c to 68f5c22 Compare May 24, 2018 21:32
@salkinium
Copy link
Member Author

You can now static_assert the number of ports in the GpioSet:

using PinGroup = SoftwareGpioPort< GpioB7, GpioB6, GpioB5, GpioB4, GpioB3, GpioB2, GpioB1, GpioB0 >;
static_assert(PinGroup::number_of_ports  == 1);

@salkinium salkinium force-pushed the feature/gpio_group branch from 68f5c22 to d872214 Compare May 24, 2018 23:24
salkinium added 5 commits May 25, 2018 22:59
These classes cluster access to GPIO registers from a set or ordered
list of individual GPIO pins.

Use:
- `GpioSet` for unordered access.
- `SoftwareGpioPort` for ordered access.
- `GpioPort` for strictly-ordered access.
Hardcoded for free IOs on NUCLEO-64 and AL-AVREB-CAN boards. Test will only
be included if the `board.nucleo-*` or `board.al-avr*` module is available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants