Conversation

Collaborator

@yogwoggf yogwoggf commented Nov 12, 2025

Resolves #56

WIP

Details

I decided it'd be a good idea for me and others to write down exactly how it works. There is an unfortunate amount of complexity due to LuaJIT's insanity.

1. Debug info remap

Before anything else, the debug information of the target proto is mapped onto the detour proto. This makes it nearly impossible to use most debug functions to detect the detour. Detection is still possible; more on that later.

2. Cloning

To let a detour call the original function, we need to clone the original function so that the detour can easily tailcall into it.

Memory allocations

There is a new helper in autorun-luajit that enables allocating GC objects in LuaJIT as if they're properly sanctioned. This means no memory leaks or anything that would mess with the GC system. It follows the usual allocation scheme, which is:

  1. Call G(L)->allocf to allocate a block of memory
  2. Set up the GCHeader in the base of the allocation
  3. Link the allocation into the GC root list (point its nextgc at the current root, then make it the new root)
  4. Mark as a white object
  5. Set GC type (bitwise NOT of the corresponding type tag)
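
The allocation steps above can be modeled in a few lines. This is only a Python sketch of the bookkeeping, not the autorun-luajit helper itself: `GCObject`, `GlobalState` and `new_gc_object` are hypothetical names, and the real code works on raw memory returned by allocf. The type tags follow LuaJIT's lj_obj.h convention, where each tag is the bitwise NOT of a small index.

```python
from dataclasses import dataclass
from typing import Optional

# Type tags as in LuaJIT's lj_obj.h: the bitwise NOT of a small index.
LJ_TUPVAL = ~5 & 0xFFFFFFFF
LJ_TPROTO = ~7 & 0xFFFFFFFF
LJ_TFUNC = ~8 & 0xFFFFFFFF

LJ_GC_WHITE0 = 0x01  # one of the two white mark bits


@dataclass
class GCObject:
    """Hypothetical stand-in for a GC header followed by a payload."""
    size: int
    nextgc: Optional["GCObject"] = None
    marked: int = 0
    gct: int = 0


@dataclass
class GlobalState:
    """Hypothetical stand-in for the parts of global_State used here."""
    root: Optional[GCObject] = None
    current_white: int = LJ_GC_WHITE0


def new_gc_object(g: GlobalState, size: int, type_tag: int) -> GCObject:
    obj = GCObject(size=size)       # steps 1-2: allocate, header at the base
    obj.nextgc = g.root             # step 3: link at the head of the root list
    g.root = obj
    obj.marked = g.current_white    # step 4: mark as a (current) white object
    obj.gct = ~type_tag & 0xFF      # step 5: GC type is NOT of the type tag
    return obj
```

Because the object is linked into the same list the collector sweeps, it is traversed and freed like any object LuaJIT allocated itself.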

For open upvalues, you'd need to do something way more complicated, but so far I've found success in simply just using closed upvalues.

Proto cloning

The first thing to do is to clone the underlying GCproto, which is easy thanks to the sizept field, which holds the size of the entire proto allocation, not just the GCproto struct. The original proto is then copied byte-by-byte to the new proto. After this, we collect all of the internal proto pointers (uvinfo, lineinfo, k, etc.), compute their offsets relative to the base of the original allocation, and rewrite them in the cloned proto so they point into the new allocation.
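
The pointer fix-up is plain relocation arithmetic: new pointer = new base + (old pointer - old base). Below is a minimal Python sketch of that idea over a byte blob. `clone_with_relocation` and the list of field offsets are hypothetical, 8-byte little-endian pointers are an assumption of this model, and skipping null pointers is my own precaution rather than something the description states.

```python
import struct

PTR = "<Q"  # assumption: 8-byte little-endian pointers


def clone_with_relocation(blob, old_base, new_base, ptr_field_offsets):
    """Copy a whole allocation and rebase the internal pointers it contains.

    blob              -- the original allocation (sizept bytes in the real case)
    old_base/new_base -- addresses of the original and cloned allocations
    ptr_field_offsets -- offsets of pointer fields that point into the blob
    """
    clone = bytearray(blob)  # byte-for-byte copy
    for off in ptr_field_offsets:
        (old_ptr,) = struct.unpack_from(PTR, clone, off)
        if old_ptr == 0:
            continue  # null pointers stay null
        rel = old_ptr - old_base
        if not 0 <= rel < len(blob):
            raise ValueError(f"pointer at offset {off} is outside the allocation")
        struct.pack_into(PTR, clone, off, new_base + rel)
    return bytes(clone)
```

Any pointer that targets memory outside the allocation must not be rebased this way, which is why the sketch rejects out-of-range values instead of guessing.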

Function cloning

The second thing to do is clone the underlying GCfunc, which is not so easy because of its compact layout. The core problem is that GCfuncL stores the upvalue array in two parts: the first upvalue slot lives inside the struct itself, while the rest are stored in a contiguous array directly after the struct in the same allocation. The upvalue array is just a list of GCRefs that point to the real GCupval objects. At this point, these are all closed since they're not being used in any active function.

Moving on, the new GCfunc is allocated, with the upvalue array dynamically allocated according to the proto's sizeuv field. After this, we carefully copy the original GCfunc and its associated upvalue array to the new one. At this point, the new GCfunc points to the old proto and the upvalue array still points to the old upvalues.

The easiest thing to fix is the pc field, which still points into the old proto. It is simply set to the bytecode of the new proto, which sits directly after the GCproto struct.

Now, we need to clone and fix each upvalue in the function's upvalue array, which is fairly complicated. The general process is: allocate a new GCupval in the usual GC list (not the special one for open UVs), copy it byte by byte from the original upvalue, and then store the original TValue inside the new upvalue in its closed form. Then, we change the disambiguation hash so that it is not confused with the old upvalue. Once this is done, we point the GCRef in the GCfunc's upvalue array to our new GCupval, removing any trace of the old upvalue while keeping its value the same.
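
As a rough illustration of the per-upvalue step, here is a Python model. The `Upvalue` class and both functions are hypothetical, and how the new disambiguation hash is chosen is not specified above, so the sketch simply takes it as an argument.

```python
from dataclasses import dataclass, replace


@dataclass
class Upvalue:
    """Hypothetical model of a GCupval: closed flag, stored value, dhash."""
    closed: bool
    value: object
    dhash: int


def clone_upvalue(uv: Upvalue, new_dhash: int) -> Upvalue:
    # Copy every field, force the closed form, and give the clone its own hash.
    return replace(uv, closed=True, dhash=new_dhash)


def clone_upvalue_array(uvptrs, fresh_hashes):
    # Each slot of the cloned function's array ends up pointing at a new upvalue.
    return [clone_upvalue(uv, h) for uv, h in zip(uvptrs, fresh_hashes)]
```

The clones carry the same values as the originals but are distinct objects, so later writes through the cloned function no longer touch the original function's upvalues.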

Push to stack

After all of this is done, and the function and proto are bound together, we simply create a TValue of type LJ_TFUNC and push it to the stack, which allows the script to use the new function. The addresses of the original and new function are obviously different, but basically every other property of the two is identical.

3. Detour bytecode engine

To facilitate fast and clean detouring, I decided to go with a bytecode approach. This system assumes that the detour function can be accessed quickly via upvalue zero. This enables us to implement a low-cost detour for the target function that specially tailcalls into the detour function, removing any trace of the detour.

Frame sizes

The original frame size of a function and the detoured frame size differ. This is a detection vector, but an easily patchable one. The issue, though, is that the detour requires register re-allocation because arguments are passed in registers 0 through nargs-1. To make room, we allocate double the argument count plus a few reserved registers. This comes out to 2 * nargs + 2, with an extra slot required for variadic detours.

This value is then written back to the GCproto structure.
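
The frame size rule is small enough to write down directly. A minimal sketch, with `detour_frame_size` as a hypothetical helper name:

```python
def detour_frame_size(nargs: int, is_vararg: bool = False) -> int:
    """Slots needed by the detour stub: 2 * nargs + 2, plus one for varargs."""
    if nargs < 0:
        raise ValueError("nargs must be non-negative")
    size = 2 * nargs + 2
    if is_vararg:
        size += 1  # extra slot required for variadic detours
    return size
```

For example, a three-argument function needs 8 slots, and 9 if it is variadic.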

Bytecode emission

The bytecode structure of basic detours is as follows:

FUNCF maxslots
UGET nargs 0
MOV nargs+2 0
MOV nargs+3 1
...
MOV nargs+nargs+1 (nargs-1)
CALLT nargs nargs+1

The function header first requests the number of stack slots the detour needs. Then, we pull the detour function from upvalue 0 into register nargs (directly after the arguments). After that, we copy every argument into the registers after the detour register so that they are in the right place for the call. Finally, the CALLT instruction is emitted, which is a tailcall into the detour function. It has the added benefit of handling whatever the detour returns without any explicit handling on our side.
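
To make the layout concrete, here is a Python sketch that builds the basic detour as symbolic instructions and packs them in LuaJIT's 32-bit AD format (opcode in the low byte, A in the next byte, D in the upper 16 bits). `basic_detour` and `encode_ad` are hypothetical names, and the numeric opcode values are deliberately left to the caller because they depend on the LuaJIT version. The MOV targets follow the nargs+2+i pattern of the listing.

```python
def basic_detour(nargs: int, maxslots: int):
    """Return the detour stub as (mnemonic, A, D) tuples."""
    ins = [
        ("FUNCF", maxslots, 0),   # header: request the frame size
        ("UGET", nargs, 0),       # detour function from upvalue 0
    ]
    for i in range(nargs):
        ins.append(("MOV", nargs + 2 + i, i))  # shift argument i past the callee
    ins.append(("CALLT", nargs, nargs + 1))    # tailcall with nargs arguments
    return ins


def encode_ad(op: int, a: int, d: int) -> int:
    """Pack one AD-format instruction into a 32-bit word."""
    if not (0 <= op <= 0xFF and 0 <= a <= 0xFF and 0 <= d <= 0xFFFF):
        raise ValueError("operand out of range")
    return op | (a << 8) | (d << 16)


def assemble(instructions, opcodes):
    """opcodes maps mnemonic -> number for the target LuaJIT build."""
    return [encode_ad(opcodes[name], a, d) for name, a, d in instructions]
```

With two arguments the stub is five instructions long: the header, one UGET, two MOVs and the CALLT.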

Varg bytecode emission

Variadic functions are handled quite differently. They may have fixed arguments, which are allocated the same way as in normal functions. The difference is that varg handling requires an extra slot, which we must account for during register allocation. Another edge case is that CALLT does not handle varargs correctly, since they introduce a pseudo-call frame (see FRAME_VARG). We instead use CALLMT, which is similar but handles vargs correctly and provides the same return handling.

We also need to emit the VARG instruction, which is an ABC-encoded opcode that sets up the vararg pseudo-call frame for the incoming detour call. This passes the variadic arguments correctly.

FUNCV maxslots+1
UGET nargs 0
<register allocation again, although only for the fixed args>
VARG nargs*2+2 nargs
CALLMT nargs nargs

This is especially useful for dealing with functions like hook.Call, which contain a mix of both fixed arguments and variadic arguments.

Space checks

For now, we require that functions have at least one upvalue. I believe I can work around this, but it will be done later, as it will probably require several new complicated systems to handle upvalue allocation.

We also require that functions have enough bytecode to hold the detour bytecode. The existing bytecode is simply overwritten in place, which works because LuaJIT VM instructions all have the same fixed width.
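
Both checks can be expressed as one predicate. This sketch covers only the basic (non-variadic) stub, whose length of nargs + 3 instructions is read off the listing in the bytecode emission section: header, UGET, one MOV per argument, and CALLT. `can_detour` is a hypothetical name, and I am assuming the proto's sizebc count includes the header instruction.

```python
def can_detour(sizeuv: int, sizebc: int, nargs: int) -> bool:
    """True if a non-variadic target can hold the basic detour stub."""
    needed = nargs + 3  # FUNCF + UGET + nargs MOVs + CALLT
    return sizeuv >= 1 and sizebc >= needed
```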

4. Upvalue replacement

To pass the detour function, we overwrite the first upvalue in the function.

Upvalue cloning

We do the same upvalue cloning as described before, but only target the first GCRef pointing to the GCupval struct. We then drop the reference to this struct and point it at our own clone. This way, functions sharing the same GCupval are not affected by another function's detour. This is particularly useful for detouring the hook library as it has many shared upvalues.

We clone the GCupval and then replace the value inside the upvalue with a TValue pointing to our detour function.

Upvalue closing

In this case, the upvalue may or may not be closed. We do not take the chance and forcibly close it by setting GCupval->closed to 1 and pointing GCupval->v at the TValue union located at GCupval->uv. This makes the GCupval store the TValue itself instead of relying on a potentially unstable MRef that could be invalidated at the next GC pass.
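
The effect of closing can be shown with a small Python model. An open upvalue reads through a reference to a stack slot, while a closed one reads its own inline storage. All names here (`UpvalueModel`, `inline`, `force_close`) are hypothetical, and copying the current value on close is an assumption of the model; in the detour case the value is overwritten with the detour function right afterwards anyway.

```python
from dataclasses import dataclass


@dataclass
class UpvalueModel:
    closed: int = 0
    inline: object = None   # the value slot stored inside the upvalue
    v: object = None        # stack index while open, "self" once closed


def read(uv: UpvalueModel, stack: list):
    return uv.inline if uv.v == "self" else stack[uv.v]


def force_close(uv: UpvalueModel, stack: list) -> None:
    if not uv.closed:
        uv.inline = stack[uv.v]  # keep the current value
    uv.closed = 1
    uv.v = "self"                # v now refers to the upvalue's own slot


def set_value(uv: UpvalueModel, value) -> None:
    uv.inline = value            # e.g. the detour function
```

After `force_close`, later changes to the stack slot no longer show up through the upvalue, which is the stability property the step is after.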

5. Detoured

At this point, the function is now successfully detoured, with every call redirecting to the detour function properly. The original function is also intact, likely being called by the detour callback. This should work fine as our operations are definitely interesting, but not invalid. So far, I have tried the detouring system multiple times and have seen rare one-off crashes, which may be related to some race conditions. GC passes work as expected and do not interfere with any detours.

To avoid detection, we will need to remap any debug calls to the original function, and replace debug.getinfo's func field with the detoured one to avoid letting any scripts know that two copies exist.

Otherwise, it is fairly secure, with the specialized bytecode avoiding adding any extra call frames to the stack, as seen in the attached screenshot (gmod_Et93mxsygQ).

The debug info is also intact, thanks to the initial debug info remap step.

Todo

  • Create clone of original function to call from the detour
  • Handle case where the target has no upvalues
  • Handle vararg functions (less priority)

@yogwoggf yogwoggf self-assigned this Nov 12, 2025

Development

Successfully merging this pull request may close these issues.

Lua detouring
