@Dentosal
Member

@Dentosal Dentosal commented Mar 14, 2025

Closes #627. VM issue: FuelLabs/fuel-vm#857. VM PR: FuelLabs/fuel-vm#925.

The design is quite similar to that of the RISC-V instruction of the same name. JAL $ra $rb imm stores the address of the next instruction in $ra, so that the register can be used as a return address from the subroutine. If $ra is $zero, the value is discarded instead, so the instruction can serve as a plain jump without trashing a register. After storing the return address, it jumps to the instruction at memory address $rb + imm * 4.

The main purpose of this instruction is efficient subroutine calling and returning. JAL $ret_addr $subroutine_addr 0 performs the call, and JAL $zero $ret_addr 0 returns from it. For nested function calls, the callee is responsible for saving $ret_addr.

The following snippet shows a minimal program using the functionality:

// main function
jal $ret_addr $pc 2 // call subroutine
ret $zero // end program

// subroutine
/* subroutine body comes here */
jal $zero $ret_addr 0 // Return from the subroutine
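
The call and return above can be traced with a small Python model. This is only a sketch of the semantics described in this PR, not the fuel-vm implementation: the register indices, the `FIRST_WRITABLE` boundary, the `VM_MAX_RAM` value and the helper names are assumptions made for illustration, and the two panic checks follow the conditions quoted from the spec diff later in this thread.

```python
# Simplified model of JAL as described in this PR, not the fuel-vm implementation.
ZERO = 0x00            # index of $zero
PC = 0x03              # index of $pc (assumed from the FuelVM register table)
RET_ADDR = 0x10        # hypothetical choice for $ret_addr
FIRST_WRITABLE = 0x10  # assumed: registers below this index are reserved
VM_MAX_RAM = 64 * 1024 * 1024  # assumed value


def jal(regs, ra, rb, imm):
    """Apply JAL $ra $rb imm to a list of 64 register values."""
    if ra != ZERO and ra < FIRST_WRITABLE:
        raise RuntimeError("panic: $rA is a reserved register other than $zero")
    target = regs[rb] + imm * 4
    if target >= VM_MAX_RAM:
        raise RuntimeError("panic: jump target outside VM memory")
    if ra != ZERO:
        regs[ra] = regs[PC] + 4  # address of the next instruction
    regs[PC] = target


def trace_minimal_program(base):
    """Follow $pc through the call and the return of the minimal program."""
    regs = [0] * 64
    regs[PC] = base
    jal(regs, RET_ADDR, PC, 2)      # jal $ret_addr $pc 2
    at_subroutine = regs[PC]
    jal(regs, ZERO, RET_ADDR, 0)    # jal $zero $ret_addr 0 (empty subroutine body)
    return at_subroutine, regs[PC], regs[ZERO]


print(trace_minimal_program(1000))  # prints (1008, 1004, 0)
```

With the program loaded at a hypothetical address 1000, the call lands two instructions ahead (1008), the return lands on the `ret` at 1004, and `$zero` is left untouched.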

Fibonacci example

To show how compact this makes the code, I wrote a small Fibonacci function using it. The function uses the following register-based ABI:

  • Function argument and return value $fnarg in 0x10
  • Function return address $return_addr in 0x11

Also the code uses the following locals: $local1: 0x12, $local2: 0x13, $local3: 0x14 (named for pshl/popl)

// Set argument
movi $fnarg 10 // <- this computes fibo(10), i.e. the 10th Fibonacci number, 55

// Main function
jal $return_addr $pc 3 // <- offset to the subroutine
log $fnarg $zero $zero $zero
ret $one

// Fibonacci subroutine
// fibo(0) = 0, fibo(1) = 1, fibo(n) = fibo(n-1) + fibo(n-2)
pshl 0b11110 // Save return_address and local{1,2,3}
// Compute fn pointer to the current function and place it in local3
subi $local3 $pc 4 // <- subtract 4 to get prev instruction start
// If n < 2 no computation needed
movi $local1 2
lt $local1 $fnarg $local1
jnzf $local1 $zero 8 // Skip over computation
// Else call self with n - 1 and n - 2 and sum those
subi $local2 $fnarg 2         // Save n - 2 to local2
subi $fnarg $fnarg 1          // n -= 1
jal $return_addr $local3 0    // Call self
move $local1 $fnarg           // Copy result to local1
move $fnarg $local2           // Restore n - 2 from local2
jal $return_addr $local3 0    // Call self
move $local2 $fnarg           // Copy result to local2
add $fnarg $local1 $local2 // result = local1 + local2
// Computation ends here; this is where jnzf jumps to
popl 0b11110 // Restore return_address and local{1,2,3}
jal $zero $return_addr 0 // Return from subroutine
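
To check the jump offsets in the listing above, it can be run through a tiny interpreter. The following is a minimal sketch in Python, not the real VM: it models only the instructions used here, represents the stack as a Python list, and assumes that `jnzf $ra $rb imm` jumps forward to `$pc + ($rb + imm + 1) * 4` when `$ra` is non-zero. The names `BODY`, `selected` and `run` are made up for this example.

```python
LOW = ["fnarg", "return_addr", "local1", "local2", "local3"]  # registers 0x10..0x14

# The listing above, one tuple per instruction, minus the leading movi.
BODY = [
    ("jal", "return_addr", "pc", 3),
    ("log", "fnarg", "zero", "zero", "zero"),
    ("ret", "one"),
    ("pshl", 0b11110),
    ("subi", "local3", "pc", 4),
    ("movi", "local1", 2),
    ("lt", "local1", "fnarg", "local1"),
    ("jnzf", "local1", "zero", 8),
    ("subi", "local2", "fnarg", 2),
    ("subi", "fnarg", "fnarg", 1),
    ("jal", "return_addr", "local3", 0),
    ("move", "local1", "fnarg"),
    ("move", "fnarg", "local2"),
    ("jal", "return_addr", "local3", 0),
    ("move", "local2", "fnarg"),
    ("add", "fnarg", "local1", "local2"),
    ("popl", 0b11110),
    ("jal", "zero", "return_addr", 0),
]


def selected(mask):
    """Names of the low-half registers picked by a pshl/popl bitmask."""
    return [name for bit, name in enumerate(LOW) if mask >> bit & 1]


def run(n):
    """Execute the program with $fnarg = n; return (ret value, logged values)."""
    program = [("movi", "fnarg", n)] + BODY
    regs = dict.fromkeys(["zero", "pc"] + LOW, 0)
    regs["one"] = 1
    stack, logs = [], []
    while True:
        op, *a = program[regs["pc"] // 4]
        next_pc = regs["pc"] + 4
        if op == "movi":
            regs[a[0]] = a[1]
        elif op == "move":
            regs[a[0]] = regs[a[1]]
        elif op == "add":
            regs[a[0]] = regs[a[1]] + regs[a[2]]
        elif op == "subi":
            regs[a[0]] = regs[a[1]] - a[2]
        elif op == "lt":
            regs[a[0]] = int(regs[a[1]] < regs[a[2]])
        elif op == "jnzf":  # assumed forward-jump semantics, see lead-in
            if regs[a[0]] != 0:
                next_pc = regs["pc"] + (regs[a[1]] + a[2] + 1) * 4
        elif op == "jal":
            target = regs[a[1]] + a[2] * 4
            if a[0] != "zero":
                regs[a[0]] = next_pc  # link: address of the next instruction
            next_pc = target
        elif op == "pshl":
            stack.append([regs[name] for name in selected(a[0])])
        elif op == "popl":
            for name, value in zip(selected(a[0]), stack.pop()):
                regs[name] = value
        elif op == "log":
            logs.append(regs[a[0]])
        elif op == "ret":
            return regs[a[0]], logs
        regs["pc"] = next_pc


print(run(10))  # prints (1, [55])
```

Under these assumptions the program logs 55 for an input of 10 and returns `$one`, which matches the comment on the `movi` line.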

Before requesting review

  • I have reviewed the changes myself

@Dentosal Dentosal self-assigned this Mar 14, 2025
@Dentosal Dentosal marked this pull request as ready for review March 14, 2025 11:42
@Dentosal Dentosal requested review from a team March 14, 2025 11:42
@Dentosal Dentosal added the comp:FVM Component: FuelVM label Mar 14, 2025
@Voxelot
Member

Voxelot commented Mar 17, 2025

cc @vaivaswatha can you comment on the impact of this change? I.e., any concerns regarding register allocation for nested subroutines?

@xunilrj
Contributor

xunilrj commented Mar 17, 2025

Today this is how we compile the following fn.

fn main() -> u64 {
    1337
}

This is the function ASM (not super optimized to avoid inlining):

pshl i3                       ; save registers 16..40
pshh i524288                  ; save registers 40..64
move $$locbase $sp            ; save locals base register for function main_0
move $r0 $$reta               ; save return address
movi $r1 i1337                ; initialize constant into register
move $$retv $r1               ; set return value
move $$reta $r0               ; restore return address
poph i524288                  ; restore registers 40..64
popl i3                       ; restore registers 16..40
jmp $$reta                    ; return from call

This is the ASM calling the fn:

sub  $$reta $pc $is           ; get current instruction offset from instructions start ($is) 
srli $$reta $$reta i2         ; get current instruction offset in 32-bit words
addi $$reta $$reta i4         ; [call]: set new return address
jmpf $zero i76                ; [call]: call main_0
move $r0 $$retv               ; [call]: copy the return value

With this new instruction, we could call fns like:

jal $$reta $pc i76
move $r0 $$retv  

and the fn would be

pshl i3                       ; save registers 16..40
pshh i524288                  ; save registers 40..64
move $$locbase $sp            ; save locals base register for function main_0
move $r0 $$reta               ; save return address
movi $r1 i1337                ; initialize constant into register
move $$retv $r1               ; set return value
poph i524288                  ; restore registers 40..64
popl i3                       ; restore registers 16..40
jal $zero $r0 0

Which means we can save 3 instructions when calling fns (huge gains!), and none in the function definition, given that jal $zero $ret_addr 0 seems to be identical to jmp $$reta.

We could save an extra 4 instructions per function definition by using JAL's last argument as a flag to do register pushing and popping.

@vaivaswatha

> cc @vaivaswatha can you comment on the impact of this change? I.e., any concerns regarding register allocation for nested subroutines?

There shouldn't be any problem. When we enter a function, we save all (used) registers and pop them all back at the end. So register allocation shouldn't be affected. I don't see any downsides, and the upside is as elaborated by @xunilrj .

@Dentosal
Member Author

> Which means we can save 3 instructions when calling fns (huge gains!), and none in the function definition, given that jal $zero $ret_addr 0 seems to be identical to jmp $$reta

jal $zero $ret_addr 0 isn't exactly identical to jmp $$reta: jmp is $is-relative, while jal is not.
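
The difference shows up in what the return-address register holds. A minimal sketch in Python, assuming `jmp $rA` sets `$pc = $is + $rA * 4` and using made-up addresses and function names: the old call sequence quoted above stores a word offset from `$is`, while `jal` stores an absolute byte address.

```python
def old_call_return(is_addr, sub_pc):
    """Return ($$reta, address reached by `jmp $$reta`) for the old sequence.

    sub_pc is the address of the leading `sub` instruction.
    """
    reta = sub_pc - is_addr   # sub  $$reta $pc $is
    reta >>= 2                # srli $$reta $$reta i2
    reta += 4                 # addi $$reta $$reta i4
    return reta, is_addr + reta * 4   # jmp is $is-relative


def jal_call_return(jal_pc):
    """Return ($$reta, address reached by `jal $zero $$reta 0`)."""
    reta = jal_pc + 4         # absolute address of the next instruction
    return reta, reta + 0 * 4


IS = 10_000         # hypothetical value of $is
CALL_SITE = 10_400  # hypothetical address of the first call instruction

print(old_call_return(IS, CALL_SITE))  # prints (104, 10416)
print(jal_call_return(CALL_SITE))      # prints (10404, 10404)
```

Both variants resume at the instruction right after the call (four instructions after the `sub` in the old sequence, one instruction after the `jal` in the new one), but the register contents are not interchangeable between `jmp` and `jal`.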

> We could save extra 4 instructions per function definition, by using JAL last argument as a flag to do register pushing and popping.

I'm not sure how that would work? The immediate part here is at most 12 bits long, and the VM has 48 user-writable registers. Unless we special-case some of these registers, of course, but that seems unwise.
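
The bit-width argument can be made concrete with a short Python sketch. It assumes, based on the comments in the listings above, that `pshl` covers registers 16..40 and `pshh` covers 40..64 with one mask bit per register; `push_mask` is a hypothetical helper, not part of any Fuel tooling.

```python
IMM_BITS = 12        # immediate width available in JAL, per the comment above
USER_REGISTERS = 48  # user-writable registers
HALF = 24            # registers covered by one pshl or pshh mask


def push_mask(registers, base):
    """Bitmask selecting the given register indices, relative to base."""
    mask = 0
    for reg in registers:
        if not base <= reg < base + HALF:
            raise ValueError(f"register {reg} is outside this half")
        mask |= 1 << (reg - base)
    return mask


# Selecting every user-writable register takes one bit each.
bits_for_all = ((1 << USER_REGISTERS) - 1).bit_length()
print(bits_for_all, bits_for_all > IMM_BITS)  # prints 48 True

# The masks used earlier in this thread:
print(bin(push_mask([0x11, 0x12, 0x13, 0x14], 16)))  # prints 0b11110
print(push_mask([16, 17], 16))                       # prints 3
```

A 12-bit immediate can therefore only address a quarter of the registers individually, which is why a per-register mask does not fit without special-casing.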

I'm noticing that the function calls could be optimized a lot further with smarter register allocation. For instance...

  • you could save two instructions by using only higher-half (pshh/poph) registers in the function body, so pshl/popl isn't required at all
  • there's no actual need to move $r0 $$reta, just use jal $zero $$reta 0 directly
  • and of course, the whole function should be inlined in this case
  • after returning, the move $r0 $$retv could be optimized away by treating $$retv as the return value

@xunilrj
Copy link
Contributor

xunilrj commented Mar 22, 2025

I was imagining one bit per push/pop. So from the 12 bits not being used, 4 would allow jump and push, or jump and pop all registers.

@Dentosal
Member Author

> I was imagining one bit per push/pop. So from the 12bits not being used, 4 would allow jump and push, or jump and pop all registers.

I don't think pushing or popping all registers is a sensible operation. At the very least you'd want to keep the return value and return address as-is.

@Dentosal
Member Author

Some benchmarks with a Sway compiler modified to use this instruction:

Build command: forc build --release.

| Project | d821dcb | d821dcb with JAL support | Reduction |
|---|---|---|---|
| mira-v1-core | 89.384 KB | 85.704 KB | 4.3% |
| sway-applications name-registry/registry-contract | 24.664 KB | 23.128 KB | 6.2% |

@Voxelot
Member

Voxelot commented May 12, 2025

Should we align the naming more closely with the RISC-V instructions? I.e.:

JAL -> jmp and link with only an immediate value operand
jal ra, immediate_offset
JALR -> jmp and link with both register & immediate value operands
jalr ra, rb, immediate_offset


- `$rA` is a reserved register other than `$zero`
- `$rB + imm * 4 >= VM_MAX_RAM`

Member

Should we panic if $rB == $pc && imm == 0 to avoid jumping into the exact same spot?

Member Author

Likely not. It's not like you couldn't otherwise make an infinite loop if you want, and then you'll just run out of gas anyway.
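
As a rough illustration of that point, here is a Python sketch of a self-jump under a gas limit. The gas cost per instruction is a made-up parameter, not a real FuelVM price, and `run_self_jump` is a hypothetical name.

```python
def run_self_jump(gas, cost_per_jal=1):
    """Repeat `jal $zero $pc 0` until the remaining gas cannot pay for it."""
    pc = 0
    executed = 0
    while gas >= cost_per_jal:
        gas -= cost_per_jal
        pc = pc + 0 * 4  # the jump target is the instruction itself
        executed += 1
    return executed, pc


print(run_self_jump(1000, cost_per_jal=2))  # prints (500, 0)
```

The loop always terminates because every iteration consumes gas, so the self-jump behaves like any other infinite loop and needs no dedicated panic.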

IGI-111 pushed a commit to FuelLabs/sway that referenced this pull request Jul 12, 2025
## Description

This PR contains an initial implementation of subroutine calls using the
in-progress [jump-and-link instruction
`JAL`](FuelLabs/fuel-specs#630). It
substantially reduces the function call overhead: the old code used 4
instructions per call, while the new version uses 1-3 depending on the
distance to the called function.

### Future optimizations

* Reorder functions, so those that call each other are adjacent
* Use absolute or IS-relative jumps where it makes sense, see
#7267

## Checklist

- [x] I have linked to any relevant issues.
- [x] I have commented my code, particularly in hard-to-understand
areas.
- [ ] I have updated the documentation where relevant (API docs, the
reference, and the Sway book).
- [x] If my change requires substantial documentation changes, I have
[requested support from the DevRel
team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [x] I have added tests that prove my fix is effective or that my
feature works.
- [x] I have added (or requested a maintainer to add) the necessary
`Breaking*` or `New Feature` labels where relevant.
- [x] I have done my best to ensure that my PR adheres to [the Fuel Labs
Code Review
Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [x] I have requested a review from the relevant team or maintainers.
@Dentosal Dentosal enabled auto-merge (squash) July 28, 2025 10:21
@Dentosal Dentosal requested a review from Voxelot July 28, 2025 10:22
@Dentosal Dentosal merged commit 2869996 into master Jul 28, 2025
6 checks passed
@Dentosal Dentosal deleted the dento/jal-instruction branch July 28, 2025 15:56
Development

Successfully merging this pull request may close these issues.

Combined jump operation for internal function calls

6 participants