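Shifting memory through the index register is slow (7 cycles per instruction), so I replaced two of the four indexed shifts in the loop with shifts of AccA/AccB (2 cycles each).
This saves 10 cycles per iteration, or 160 cycles per multiplication over the 16 iterations.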
diff -u __mul.s.bak __mul.s
--- __mul.s.bak 2024-11-07 08:35:48
+++ __mul.s 2024-11-07 08:42:02
@@ -10,7 +10,7 @@
psha
tsx
; ,x and 1,x are D
- ; 3,x and 4,x are the top of stack value
+ ; 4,x and 5,x are the top of stack value
ldab #16
stab @tmp ; iteration count
@@ -19,16 +19,17 @@
; Rotate through the number
nextbit:
- lsr ,x
- ror 1,x
+ aslb
+ rola
+ rol 5,x
+ rol 4,x
bcc noadd
- addb 5,x
- adca 4,x
-noadd: lsl 5,x
- rol 4,x
+ addb 1,x
+ adca 0,x
+noadd:
dec @tmp
bne nextbit
- ; For a 16x16 to 32bit just store 3-4,x into sreg
+ ; For a 16x16 to 32bit just store 4-5,x into sreg
ins
ins
jmp __pop2
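For readers who want to check the revised loop without an assembler, here is a rough C model of what the patched `nextbit` loop appears to compute. It is a sketch, not the routine itself: the name `mul16_model` is hypothetical, and it assumes D is cleared before the loop, which happens in lines the diff does not show.

```c
#include <stdint.h>

/* Hypothetical C model of the patched loop.
 * d   : the value of D on entry, saved at 0,x / 1,x by the pushes
 * tos : the top-of-stack operand at 4,x / 5,x
 * Assumption: D is cleared before the loop (not visible in the diff). */
uint16_t mul16_model(uint16_t d, uint16_t tos)
{
    uint16_t saved_d = d;   /* operand read by addb 1,x / adca 0,x */
    uint16_t acc = 0;       /* running product held in AccA:AccB */
    int count;

    for (count = 16; count != 0; count--) {      /* dec @tmp / bne nextbit */
        unsigned carry = (acc >> 15) & 1u;       /* aslb / rola: bit shifted out of D */
        unsigned top = (tos >> 15) & 1u;         /* bit shifted out by rol 4,x */

        acc = (uint16_t)(acc << 1);
        tos = (uint16_t)((tos << 1) | carry);    /* rol 5,x / rol 4,x take D's carry in */

        if (top)                                 /* bcc noadd skips the add when clear */
            acc = (uint16_t)(acc + saved_d);     /* addb 1,x / adca 0,x */
    }
    return acc;                                  /* low 16 bits of the product */
}
```

The operand at 4,x/5,x is consumed most significant bit first, so the accumulator is doubled before each conditional add. The bits shifted out of D only enter the low end of the stacked operand and are never tested within the 16 iterations.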
If the multiplier is kept in page 0 (@Tmp2) instead of being pushed onto the stack, ADD/ADC drop from 5 cycles to 3 cycles each, which saves several dozen more cycles.
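As a rough check on the "several dozen" figure, the sketch below counts how many cycles the loop would save for a given operand, using only the 5-cycle and 3-cycle figures quoted above. The function name `addadc_cycles_saved` is hypothetical, and the estimate ignores any difference in setup cost between storing to page 0 and pushing onto the stack.

```c
#include <stdint.h>

/* Hypothetical estimate: cycles saved inside the loop when addb/adca use
 * direct addressing (3 cycles) instead of indexed addressing (5 cycles).
 * The add pair runs once per set bit of the operand that is shifted out. */
unsigned addadc_cycles_saved(uint16_t shifted_operand)
{
    const unsigned indexed_cycles = 5;  /* figure quoted in the text */
    const unsigned direct_cycles = 3;   /* figure quoted in the text */
    unsigned set_bits = 0;

    while (shifted_operand != 0) {
        set_bits += shifted_operand & 1u;
        shifted_operand >>= 1;
    }
    /* two instructions (addb + adca) per set bit */
    return set_bits * 2u * (indexed_cycles - direct_cycles);
}
```

Under these assumptions the saving ranges from 0 cycles (operand is zero) to 64 cycles (all 16 bits set), with 32 cycles for an operand that has half its bits set.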