56 - Lua detours #58

yogwoggf · 2025-11-12T05:53:01Z

Resolves #56

WIP

Details

I decided it'd be a good idea for me and others to write down exactly how it works. There is an unfortunate amount of complexity due to LuaJIT's insanity.

1. Debug info remap

Before anything else, the debug information of the target proto is mapped onto the detour proto. This makes it nearly impossible to use most debug functions to detect the detour. Detection is still possible; more on that later.

2. Cloning

To facilitate the calling of an original function, we need to clone the original function such that detours can tailcall into the original function easily.

Memory allocations

There is a new helper in autorun-luajit that enables allocating GC objects in LuaJIT as if they're properly sanctioned. This means no memory leaks or anything that would mess with the GC system. It follows the usual allocation scheme, which is:

Call G(L)->allocf to allocate a block of memory
Set up the GCHeader in the base of the allocation
Link the allocation into the GC root list (its next pointer takes the current root, and the root then points at the new object)
Mark as a white object
Set GC type (bitwise NOT of the corresponding type tag)
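The allocation steps above can be modelled in a few lines. This is only a toy Python sketch, not the helper in autorun-luajit: `GlobalState`, `GCObject`, `new_gc_object` and the `WHITE0` constant are hypothetical stand-ins, and the only value taken from LuaJIT itself is that the function type tag is `~8`.

```python
from dataclasses import dataclass
from typing import Optional

# LuaJIT stores internal type tags negated; the function tag is ~8.
LJ_TFUNC = ~8
WHITE0 = 0x01  # hypothetical stand-in for the collector's current white bit


@dataclass
class GCObject:
    nextgc: Optional["GCObject"] = None  # GC header: next object in the root list
    marked: int = 0                      # GC header: colour bits
    gct: int = 0                         # GC header: object type


class GlobalState:
    """Toy model of the parts of global_State the allocation touches."""

    def __init__(self):
        self.root = None  # head of the GC root list
        self.total = 0    # bytes handed out so far

    def allocf(self, size):
        # Stand-in for the real allocator callback.
        self.total += size
        return GCObject()


def new_gc_object(g, size, type_tag):
    obj = g.allocf(size)             # 1. allocate the block
    obj.nextgc = g.root              # 2-3. fill the header and link into the root list
    g.root = obj
    obj.marked = WHITE0              # 4. mark as a white object
    obj.gct = ~type_tag & 0xFF       # 5. GC type is the bitwise NOT of the tag
    return obj


g = GlobalState()
first = new_gc_object(g, 40, LJ_TFUNC)
second = new_gc_object(g, 40, LJ_TFUNC)
```

After the two calls, `g.root` is the most recent object and it chains back to the earlier one, which is what lets the collector find (and eventually free) both.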

For open upvalues, you'd need to do something way more complicated, but so far I've found success in simply just using closed upvalues.

Proto cloning

The first thing to do is to clone the underlying GCproto, which is easy thanks to the sizept field, which represents the entire size of the proto, not just the GCproto struct. The original proto is then copied byte-by-byte to the new proto. After this, we collect all of the internal proto structures (uvinfo, lineinfo, k, etc.), compute their offsets relative to the base of the original allocation, and rewrite these pointers in the cloned proto so they point into the clone.
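The pointer fix-up is the part that is easy to get wrong, so here is a minimal Python model of it. It assumes a flat `bytearray` as memory and 64-bit little-endian pointers; `clone_blob`, the field offsets and the addresses are all made up for illustration and do not match the real GCproto layout.

```python
import struct


def clone_blob(memory, base, size, pointer_offsets, new_base):
    """Copy `size` bytes from `base` to `new_base`, then rebase internal pointers."""
    memory[new_base:new_base + size] = memory[base:base + size]
    for off in pointer_offsets:
        (ptr,) = struct.unpack_from("<Q", memory, new_base + off)
        # Only pointers that land inside the original blob are rebased.
        if base <= ptr < base + size:
            struct.pack_into("<Q", memory, new_base + off, ptr - base + new_base)


memory = bytearray(256)
OLD, NEW, SIZE = 16, 128, 64
struct.pack_into("<Q", memory, OLD + 0, OLD + 32)  # hypothetical "k" pointer
struct.pack_into("<Q", memory, OLD + 8, OLD + 48)  # hypothetical "lineinfo" pointer
clone_blob(memory, OLD, SIZE, [0, 8], NEW)
```

The clone's pointers keep the same offsets from its own base, and the original blob is left untouched.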

Function cloning

The second thing to do is clone the underlying GCfunc, which is not so easy because of its compact layout. The core problem is that GCfuncL stores the upvalue array in two parts: the first upvalue reference lives inside the struct itself, while the rest follow in a contiguous array directly after it in the same allocation. The upvalue array is just a list of GCRefs that point to the real GCupval objects. At this point, these are all closed since they're not being used in any active function.

Moving on, the new GCfunc is allocated, with the upvalue array dynamically allocated according to the proto's sizeuv field. After this, we carefully copy the original GCfunc and its associated upvalue array to the new one. At this point, the new GCfunc points to the old proto and the upvalue array still points to the old upvalues.

The easiest thing to fix is the pc field, which still points to the old proto. It is set to the bytecode of the new proto, which sits directly after the GCproto struct.

Now, we need to clone and fix each upvalue in the function's upvalue array, which is fairly complicated. The general process is: allocate a new GCupval in the usual GC list (not the special one for open UVs), copy the original upvalue into it byte by byte, and then store the original TValue inside the new upvalue in its closed form. Then, we change the disambiguation hash so that it is not confused with the old upvalue. After this is finished, we can point the GCRef in the GCfunc's upvalue array to our new GCupval, removing any trace of the old upvalue while keeping its value the same.

Push to stack

After all of this is done, and the function and proto are bound, we simply create a TValue of type LJ_TFUNC and push it to the stack, which allows the script to use the new function. The addresses of the original and new function are obviously different, but basically every other property of the two is identical.

3. Detour bytecode engine

To facilitate fast and clean detouring, I decided to go with a bytecode approach. This system assumes that the detour function can be accessed quickly via upvalue zero. This enables us to implement a low-cost detour for the target function that specially tailcalls into the detour function, removing any trace of the detour.

Frame sizes

The original frame size of a function and the detoured frame size differ. This is a detection vector, but an easily patchable one. The issue, though, is that the detour requires register re-allocation because arguments are passed in registers 0 to nargs-1. To facilitate re-allocation, we must allocate double the argument count and also account for a few reserved registers. This comes out to 2 * nargs + 2, with an extra slot required for variadic detours.

This is then updated in the GCproto structure.
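As a quick sanity check of the arithmetic, the frame size rule can be written as a one-line helper. `detour_frame_size` is a hypothetical name used only for this sketch.

```python
def detour_frame_size(nargs: int, is_vararg: bool = False) -> int:
    """Slots needed by the detour stub: original args, the detour function
    plus reserved slots, and a shifted copy of the args."""
    size = 2 * nargs + 2
    if is_vararg:
        size += 1  # extra slot required for variadic detours
    return size
```

For example, a three-argument function needs 8 slots, and a variadic function with two fixed arguments needs 7.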

Bytecode emission

The bytecode structure of basic detours is as follows:

FUNCF maxslots
UGET nargs 0
MOV nargs+2 0
MOV nargs+3 1
...
MOV nargs+nargs+1 (nargs-1)
CALLT nargs nargs+1

This simply sets up the function to request the necessary amount of stack slots needed. Then, we pull in the detour function to register nargs (after arguments) from upvalue 0. After that, we perform a dynamic reallocation of all of the arguments, shifting them after the detour register so that they will be called correctly. Finally, the CALLT instruction is emitted, which is just a tailcall instruction which calls the detour function. It has the added benefit of handling any potential return the detour may make without any explicit handling required.
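To make the register shuffling concrete, the sketch below emits the sequence symbolically and runs the prologue on a toy register file. It is a model, not the emitter in this PR: `emit_detour` and `run_prologue` are hypothetical names, instructions are plain tuples rather than encoded 32-bit words, and it assumes the shifted arguments start two slots after the detour function, as the first MOV in the listing suggests.

```python
def emit_detour(nargs):
    """Symbolic detour stub for a function with `nargs` fixed arguments."""
    maxslots = 2 * nargs + 2
    code = [("FUNCF", maxslots), ("UGET", nargs, 0)]
    for i in range(nargs):
        # Shift argument i to its slot after the detour function.
        code.append(("MOV", nargs + 2 + i, i))
    code.append(("CALLT", nargs, nargs + 1))
    return code


def run_prologue(code, args, upvalues):
    """Execute FUNCF/UGET/MOV on a toy register file; stop at the tail call."""
    regs = list(args)
    for op, *operands in code:
        if op == "FUNCF":
            regs += [None] * (operands[0] - len(regs))  # reserve the frame
        elif op == "UGET":
            dst, idx = operands
            regs[dst] = upvalues[idx]
        elif op == "MOV":
            dst, src = operands
            regs[dst] = regs[src]
        elif op == "CALLT":
            break
    return regs


stub = emit_detour(2)
regs = run_prologue(stub, ["a", "b"], ["detour"])
```

With two arguments, the detour function ends up in register 2 and the arguments are copied to registers 4 and 5, ready for the tail call.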

Varg bytecode emission

It is much different for variadic functions, however. Variadic functions may have fixed arguments, which are then allocated like normal functions. The difference is, varg handling requires an extra slot, which means we must account for that during register allocation. Another edge-case is that CALLT does not handle varargs correctly since they introduce a pseudo-call frame (see FRAME_VARG). We instead use CALLMT which is similar, but handles vargs correctly and also contains the same return handling functionality.

We also need to emit the VARG instruction, which is an ABC-encoded opcode that sets up the vararg pseudo-call frame for the incoming detour call. This passes the variadic arguments correctly.

FUNCV maxslots+1
UGET nargs 0
<register allocation again, although only for the fixed args>
VARG nargs*2+2 nargs
CALLMT nargs nargs

This is especially useful for dealing with functions like hook.Call, which contain a mix of both fixed arguments and variadic arguments.
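The variadic listing can be sketched the same way. Again, `emit_vararg_detour` is a hypothetical helper that produces symbolic tuples; the VARG operands mirror the two shown in the listing above, and any further operand of the real ABC encoding is deliberately left out because it is not shown here.

```python
def emit_vararg_detour(nfixed):
    """Symbolic detour stub for a variadic function with `nfixed` fixed args."""
    maxslots = 2 * nfixed + 2
    code = [("FUNCV", maxslots + 1), ("UGET", nfixed, 0)]
    # Same shift as the non-variadic case, but only for the fixed arguments.
    code += [("MOV", nfixed + 2 + i, i) for i in range(nfixed)]
    # Varargs are unpacked right after the shifted fixed arguments.
    code.append(("VARG", 2 * nfixed + 2, nfixed))
    code.append(("CALLMT", nfixed, nfixed))
    return code


stub = emit_vararg_detour(2)
```

For a function shaped like hook.Call with two fixed arguments, the fixed arguments land in registers 4 and 5 and the variadic ones start at register 6.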

Space checks

For now, we require that functions have at least one upvalue. I believe I can work around this, but it will be done later as it will probably require several new complicated systems to handle upvalue allocation.

We also require that functions have enough bytecode to contain the detour bytecode. The existing bytecode is simply overwritten in place, which works because the LuaJIT VM ISA is not variably sized and has a constant instruction width.
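Both checks reduce to simple counting. The sketch below assumes the stub lengths implied by the listings above (a header, one UGET, one MOV per fixed argument, a tail call, plus VARG for variadics); `detour_length` and `can_detour` are hypothetical names, and `sizebc` / `sizeuv` are taken to be the proto's instruction and upvalue counts.

```python
def detour_length(nargs, is_vararg=False):
    """Number of instructions in the detour stub."""
    return nargs + 3 + (1 if is_vararg else 0)


def can_detour(sizebc, sizeuv, nargs, is_vararg=False):
    """True if the target has an upvalue to hijack and room for the stub."""
    return sizeuv >= 1 and sizebc >= detour_length(nargs, is_vararg)
```

Because every instruction has the same width, comparing instruction counts is enough; no byte-level length decoding is needed.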

4. Upvalue replacement

To pass the detour function, we overwrite the first upvalue in the function.

Upvalue cloning

We do the same upvalue cloning as described before, but only target the first GCRef pointing to the GCupval struct. We then drop the reference to this struct and point it at our own clone. This way, any functions sharing the same GCupval do not get affected by another function's detour. This is particularly useful for detouring the hook library as it has many shared upvalues.

We clone the GCupval and then replace the value inside the clone with a TValue pointing to our detour function.
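The reason for cloning before writing is easiest to see with two functions that share one upvalue. This is a toy Python model with hypothetical `UpVal` and `Func` classes and string placeholders for values, not the real GCupval handling.

```python
class UpVal:
    def __init__(self, value):
        self.value = value


class Func:
    def __init__(self, name, upvalues):
        self.name = name
        self.uvptr = list(upvalues)  # references to UpVal objects


def replace_first_upvalue(func, detour):
    """Clone upvalue 0, store the detour in the clone, and repoint the function."""
    clone = UpVal(func.uvptr[0].value)
    clone.value = detour
    func.uvptr[0] = clone


shared = UpVal("hooks table")
add = Func("Add", [shared])
call = Func("Call", [shared])
replace_first_upvalue(call, "detour fn")
```

Only `call` sees the detour; `add` still reads the original shared upvalue. Writing into `shared` in place would have changed both.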

Upvalue closing

In this case, the upvalue may or may not be closed. We do not take the chance and forcibly close it, by setting GCupval->closed to 1 and setting GCupval->v to the TValue union located at GCupval->uv. This makes the GCupval store the TValue and not rely on any potentially unstable MRef references that could get deleted at the next GC pass.
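Forced closing can be modelled as "copy the value into the upvalue and stop looking at the stack". The class below is a hypothetical simplification in Python: the open state is represented by a list plus a slot index rather than a raw pointer.

```python
class UpVal:
    def __init__(self, stack, slot):
        self.closed = 0
        self.stack = stack  # while open, the value lives in stack[slot]
        self.slot = slot
        self.tv = None      # own storage, used once closed

    def get(self):
        return self.tv if self.closed else self.stack[self.slot]


def force_close(uv):
    """Move the value into the upvalue so it no longer depends on the stack."""
    if not uv.closed:
        uv.tv = uv.stack[uv.slot]
        uv.closed = 1
        uv.stack = None


stack = [10, 20]
uv = UpVal(stack, 1)
force_close(uv)
stack[1] = 99  # later stack reuse no longer affects the upvalue
```

After closing, the upvalue keeps returning the value it captured, regardless of what happens to the stack slot it used to reference.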

5. Detoured

At this point, the function is now successfully detoured, with every call redirecting to the detour function properly. The original function is also intact, likely being called by the detour callback. This should work fine as our operations are definitely interesting, but not invalid. So far, I have tried the detouring system multiple times and have seen rare one-off crashes, which may be related to some race conditions. GC passes work as expected and do not interfere with any detours.

To avoid detection, we will need to remap any debug calls to the original function, and replace debug.getinfo's func field with the detoured one to avoid letting any scripts know that two copies exist.

Otherwise, it is fairly secure: the specialized bytecode avoids adding any extra call frames to the stack.

The debug info is also intact, thanks to the initial debug info remap step.

Todo

Create clone of original function to call from the detour
Handle case where the target has no upvalues
Handle vararg functions (less priority)

yogwoggf added 6 commits November 11, 2025 20:14

Add bytecode parsing/writing support, add test lua detour function for now

1239fc7

Merge remote-tracking branch 'origin/master' into 56-lua-detours

2661d32

Add upvalue replacement

af471f0

Add trampoline module, which makes detouring work

e5b8339

Basic explanation

3067645

Fix register allocation

e6e1336

yogwoggf self-assigned this Nov 12, 2025

yogwoggf added 17 commits November 11, 2025 21:58

Check for varags

6a50ae1

Clean up code

b047543

Add WIP detour restoration

e002389

Remove extraneous state

f084fd4

Replace proto's debug info with the target

f0aacf5

Add preliminary varg handling

e1507f6

Cleanup code

034c966

Add comment

fbdc361

Write initial implementation of function cloning

f32c871

Fixup offsets and fix other issues

1b4407a

Fix copying garbage data, function cloning somewhat working.

b3c02c8

Change upvalue replacement to create a TValue pointer as opposed to overwriting the TValue in place.

3c433e3

Add deep upvalue cloning

7e978b2

Fix a major bug causing tons of problems with upvalue clones

fe9a3c6

Clone upvalue before replacing, instead of writing to the upvalue in-place. This caused really nasty errors if upvalues were shared between protos, which most are if they're in the same source file.

ca1de7f

Update dhash

903afe9

Add an experimental method of detouring upvalue-less functions by forcing one

03f450b
