support6800/__mul.s Speeds up multiplication #156

zu2 · 2024-11-07T00:02:46Z

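Shifting memory through the index register (indexed addressing) is slow (7 cycles per shift), so I replaced two of the four indexed shifts with shifts of AccA/B (2 cycles each).
Each iteration drops from 28 to 18 cycles of shifting, which saves 10 cycles per iteration, or 160 cycles per 16-iteration multiplication.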

diff -u __mul.s.bak __mul.s
--- __mul.s.bak	2024-11-07 08:35:48
+++ __mul.s	2024-11-07 08:42:02
@@ -10,7 +10,7 @@
 	psha
 	tsx
 	; ,x and 1,x are D
-	; 3,x and 4,x are the top of stack value
+	; 4,x and 5,x are the top of stack value
 	ldab #16
 	stab @tmp		; iteration count
 
@@ -19,16 +19,17 @@
 
 	; Rotate through the number
 nextbit:
-	lsr ,x
-	ror 1,x
+	aslb
+	rola
+	rol	5,x
+	rol	4,x
 	bcc noadd
-	addb 5,x
-	adca 4,x
-noadd:	lsl  5,x
-	rol  4,x
+	addb 1,x
+	adca 0,x
+noadd:
 	dec @tmp
 	bne nextbit
-	; For a 16x16 to 32bit just store 3-4,x into sreg
+	; For a 16x16 to 32bit just store 4-5,x into sreg
 	ins
 	ins
 	jmp __pop2
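
To make the reordered loop easier to follow, here is a rough C model of it. This is only a sketch of how the patched instructions appear to behave, not code from the repository: the name `mul16_model` is made up for illustration, and it assumes D is cleared before `nextbit:` (those lines fall outside the diff context).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical C model of the patched loop (mul16_model is an illustrative
 * name, not part of __mul.s).
 *   d     models AccA:AccB, the running result
 *   top   models 4,x / 5,x, the top-of-stack operand
 *   saved models 0,x / 1,x, the pushed copy of D
 */
static uint16_t mul16_model(uint16_t saved, uint16_t top)
{
    uint16_t d = 0; /* assumption: D is cleared before the loop (outside the diff) */
    int i;

    for (i = 0; i < 16; i++) {          /* ldab #16 / dec @tmp / bne nextbit */
        unsigned carry = (d >> 15) & 1; /* aslb, rola: top bit of D goes to carry */
        unsigned out;

        d = (uint16_t)(d << 1);
        out = (top >> 15) & 1;          /* rol 5,x, rol 4,x: carry in, top bit out */
        top = (uint16_t)((top << 1) | carry);
        if (out)                        /* bcc noadd skips the add when carry is clear */
            d = (uint16_t)(d + saved);  /* addb 1,x, adca 0,x */
    }
    return d;                           /* low 16 bits of saved * top */
}

/* Compare the model against known 16-bit products. */
static void mul16_model_selfcheck(void)
{
    assert(mul16_model(300, 200) == 60000);
    assert(mul16_model(0xFFFF, 0xFFFF) == 1);
    assert(mul16_model(1234, 0) == 0);
}
```

The operand at 4,x/5,x is consumed most significant bit first, so each pass doubles the running result and then conditionally adds the saved copy of D. The model only tracks the low 16 bits returned in D.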

zu2 · 2024-11-07T00:13:37Z

If the multiplier is kept on page 0 (@tmp2) instead of being pushed onto the stack, ADD/ADC drop from 5 cycles to 3 cycles each. That saves 4 cycles on every iteration that performs the add, so several dozen cycles more per multiplication.

EtchedPixels · 2024-11-09T10:36:19Z

That makes sense - to handle interrupts in C we already have to save tmp and tmp2 so might as well use them

EtchedPixels · 2024-11-10T11:04:50Z

initial patch applied

EtchedPixels added the enhancement label · Nov 9, 2024
