56 - Lua detours #58
Draft: yogwoggf wants to merge 23 commits into master from 56-lua-detours
…verwriting the TValue in place.
…place. This caused really nasty errors if upvalues were shared between protos, which most are if they're in the same source file.
Resolves #56
WIP
Details
I decided it'd be a good idea for me and others to write down exactly how it works. There is an unfortunate amount of complexity due to LuaJIT's insanity.
1. Debug info remap
Before anything else, the debug information of the target proto is mapped onto the detour proto. This makes it nearly impossible to use most debug functions to detect the detour. Detection is still possible, more on that later.
2. Cloning
To facilitate calling the original function, we need to clone it so that detours can easily tailcall into it.
Memory allocations
There is a new helper in `autorun-luajit` that enables allocating GC objects in LuaJIT as if they're properly sanctioned. This means no memory leaks or anything that would mess with the GC system. It follows the usual allocation scheme, which is:
- Call `G(L)->allocf` to allocate a block of memory
- Set up the `GCHeader` in the base of the allocation
For open upvalues, you'd need to do something way more complicated, but so far I've found success in simply just using closed upvalues.
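To make the allocation scheme concrete, here is a minimal Python model of it. This is not the `autorun-luajit` helper: `GCObject`, `GlobalState` and `new_gc_object` are hypothetical names, and the linking and byte-accounting steps are modeled loosely on what LuaJIT's own `lj_mem_newgco` does rather than taken from the list above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GCObject:
    """Stand-in for a GC object; the first three fields play the role of GCHeader."""
    nextgc: Optional["GCObject"]  # next object in the GC list
    marked: int                   # GC colour bits
    gct: int                      # object type tag (values here are made up)
    payload: bytearray            # the rest of the allocated block


@dataclass
class GlobalState:
    """Stand-in for global_State: allocator, GC root list, byte accounting."""
    allocf: Callable[[int], bytearray]
    root: Optional[GCObject] = None
    total: int = 0
    current_white: int = 1


def new_gc_object(g: GlobalState, size: int, gct: int) -> GCObject:
    # Step 1: get a block of memory from the state's own allocator.
    block = g.allocf(size)
    if block is None:
        raise MemoryError("allocator returned no memory")
    # Step 2: fill in the header at the base of the allocation.
    obj = GCObject(nextgc=g.root, marked=g.current_white, gct=gct, payload=block)
    # Link the object at the head of the GC list and account for its size,
    # so the collector can later find and free it like any other object.
    g.root = obj
    g.total += size
    return obj
```

Because the object is linked into the same list and counted in the same total as every other allocation, the collector treats it like any object the VM created itself.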
Proto cloning
The first thing to do is to clone the underlying GCproto, which is easy thanks to the `sizept` field, which represents the entire size of the proto, not just the GCproto struct. The original proto is then copied byte-by-byte to the new proto. After this, we collect all of the internal proto structures (uvinfo, lineinfo, k, etc.), compute their offsets from the base of the original allocation, and fix these pointers in the cloned proto so that they point into the new allocation.
Function cloning
The second thing to do is clone the underlying GCfunc, which is not so easy thanks to its small design. The core problem is that GCfuncL stores the upvalue array in two parts. The first upvalue is stored inside the struct itself, while the rest are stored in a contiguous array adjacent to the base memory allocation. The upvalue array is just a list of `GCRef`s that point to the real `GCupval` objects. At this point, these are all closed since they're not being used in any active function.
Moving on, the new GCfunc is allocated, with the upvalue array dynamically allocated according to the proto's `sizeuv` field. After this, we carefully copy the original GCfunc and its associated upvalue array to the new one. At this point, the new GCfunc points to the old proto and the upvalue array still points to the old upvalues.
The easiest thing to fix is the `pc` field pointing to the old proto. This is simply set to the bytecode of the new proto, which is directly after the `GCproto` struct.
Now, we need to clone and fix each upvalue in the function's upvalue array, which is fairly complicated. The general process is: clone a new `GCupval` by allocating it in the usual GC list (not the special one for open UVs), copy it byte by byte from the original upvalue, and then store the original TValue inside the upvalue in its closed form. Then, we change the disambiguation hash so that it is not confused with the old upvalue. After this is finished, we can then point the GCRef in the GCfunc's upvalue array to our new GCupval, removing any trace of the old upvalue while keeping its value the same.
Push to stack
After all of this is done, and the function and proto are bound, we simply create a TValue of type `LJ_TFUNC` and push it to the stack, which allows the script to use the new function. The addresses of the original and new function are obviously different, but basically every other property of the two is identical.
3. Detour bytecode engine
To facilitate fast and clean detouring, I decided to go with a bytecode approach. This system assumes that the detour function can be accessed quickly via upvalue zero. This enables us to implement a low-cost detour for the target function that specially tailcalls into the detour function, removing any trace of the detour.
Frame sizes
The original frame size of a function and the detoured frame size differ. This is a detection vector, but an easily patchable one. The issue, though, is that the detour requires register re-allocation because arguments are passed in registers 0 to nargs-1. To facilitate re-allocation, we must allocate double the argument count and also account for a few reserved registers. This comes out to `2 * nargs + 2`, with an extra slot required for variadic detours. This is then updated in the `GCproto` structure.
Bytecode emission
The bytecode structure of basic detours is as follows:
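A rough sketch of what a sequence with this shape could look like, written from the description in this section rather than taken from the implementation. The opcode numbers are placeholders (the real values depend on the LuaJIT build), the helper names are hypothetical, and it assumes the non-GC64 frame layout where arguments directly follow the function register:

```python
# Placeholder opcode numbers: NOT the real LuaJIT values.
OP_FUNCF, OP_UGET, OP_MOV, OP_CALLT = 0xF0, 0xF1, 0xF2, 0xF3


def ins_ad(op: int, a: int, d: int) -> int:
    """Encode a 32-bit AD-format instruction: op in bits 0-7, A in 8-15, D in 16-31."""
    assert 0 <= op < 256 and 0 <= a < 256 and 0 <= d < 65536
    return op | (a << 8) | (d << 16)


def frame_size(nargs: int, variadic: bool = False) -> int:
    """2 * nargs + 2 slots, plus one extra slot for variadic detours."""
    return 2 * nargs + 2 + (1 if variadic else 0)


def emit_basic_detour(nargs: int) -> list:
    fn = nargs  # register that will hold the detour function
    code = [ins_ad(OP_FUNCF, frame_size(nargs), 0)]  # header: request stack slots
    code.append(ins_ad(OP_UGET, fn, 0))              # fn = upvalue 0 (the detour)
    for i in range(nargs):
        code.append(ins_ad(OP_MOV, fn + 1 + i, i))   # copy argument i after fn
    code.append(ins_ad(OP_CALLT, fn, nargs + 1))     # tailcall fn(fn+1 .. fn+nargs)
    return code
```

For a two-argument function this yields five instructions: the header, one `UGET`, two `MOV`s and the final `CALLT`.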
The sequence first sets up the function to request the necessary number of stack slots. Then, we pull the detour function into register `nargs` (after the arguments) from upvalue 0. After that, we perform a dynamic reallocation of all of the arguments, shifting them after the detour register so that they are passed to the call correctly. Finally, the `CALLT` instruction is emitted, which is a tailcall instruction that calls the detour function. It has the added benefit of handling any potential return values the detour may produce without any explicit handling required.
Varg bytecode emission
It is much different for variadic functions, however. Variadic functions may have fixed arguments, which are then allocated like normal functions. The difference is that varg handling requires an extra slot, which means we must account for that during register allocation. Another edge case is that `CALLT` does not handle varargs correctly, since they introduce a pseudo-call frame (see `FRAME_VARG`). We instead use `CALLMT`, which is similar but handles vargs correctly and also contains the same return handling functionality.
We also need to emit the `VARG` instruction, which is an ABC-encoded opcode that sets up the vararg pseudo-call frame for the incoming detour call. This passes the variadic arguments correctly.
This is especially useful for dealing with functions like `hook.Call`, which contain a mix of both fixed arguments and variadic arguments.
Space checks
For now, we require that functions have at least one upvalue. I believe I can work around this, but it will be done later as it will probably require several new complicated systems to handle upvalue allocation.
We also require that functions have enough bytecode to hold the detour bytecode. The existing bytecode is simply overwritten in place, which works because LuaJIT VM instructions are not variably sized and have a constant width.
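The two checks can be sketched as follows. `ProtoInfo`, `can_detour` and `overwrite_prefix` are hypothetical names for illustration; the field names follow the `sizeuv` and `sizebc` counts in `GCproto`, and the bytecode is modeled as a plain list of 32-bit instruction words:

```python
from dataclasses import dataclass, field


@dataclass
class ProtoInfo:
    """Just the parts of a proto that the space checks look at."""
    sizeuv: int                                   # number of upvalues
    bytecode: list = field(default_factory=list)  # one int per instruction

    @property
    def sizebc(self) -> int:
        return len(self.bytecode)


def can_detour(proto: ProtoInfo, detour_len: int) -> bool:
    # The detour is fetched from upvalue 0, so one must exist, and the
    # detour bytecode has to fit inside the existing instruction array.
    return proto.sizeuv >= 1 and proto.sizebc >= detour_len


def overwrite_prefix(proto: ProtoInfo, detour: list) -> None:
    if not can_detour(proto, len(detour)):
        raise ValueError("target cannot hold the detour")
    # Fixed-width instructions: overwriting N words never shifts what follows.
    proto.bytecode[:len(detour)] = detour
```

Since every instruction is one word wide, replacing the first words leaves the array length and the positions of the remaining instructions unchanged.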
4. Upvalue replacement
To pass the detour function, we overwrite the first upvalue in the function.
Upvalue cloning
We do the same upvalue cloning as described before, but only target the first GCRef pointing to the GCupval struct. We then replace this struct with our own clone. This way, any functions sharing the same GCupval do not get affected by another function's detour. This is particularly useful for detouring the `hook` library, as it has many shared upvalues.
We clone the GCupval and then replace the value inside the upvalue with a TValue pointing to our detour function.
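The reason for cloning before overwriting can be shown with a small Python model. `Upval`, `Func` and `replace_first_upvalue` are hypothetical stand-ins, not the real structs; the point is only that two functions holding a reference to the same upvalue object stop sharing it once one of them gets a private clone:

```python
import copy
from dataclasses import dataclass


@dataclass
class Upval:
    closed: bool
    value: object  # stands in for the TValue held by a closed upvalue
    dhash: int     # stands in for the disambiguation hash


@dataclass
class Func:
    name: str
    uvptr: list    # references to Upval objects, like the GCRef array


def replace_first_upvalue(fn: Func, detour: object, new_dhash: int) -> Upval:
    clone = copy.copy(fn.uvptr[0])  # private copy, the shared original is untouched
    clone.closed = True             # the clone always holds its own value
    clone.value = detour            # value now refers to the detour function
    clone.dhash = new_dhash         # keep it distinguishable from the original
    fn.uvptr[0] = clone             # only this function sees the clone
    return clone


shared = Upval(closed=True, value="original value", dhash=1)
target = Func("target", [shared])
bystander = Func("bystander", [shared])
replace_first_upvalue(target, "detour_fn", new_dhash=2)
```

After the call, `target.uvptr[0].value` is `"detour_fn"` while `bystander.uvptr[0]` is still the original shared object with its original value. Overwriting the value in place instead would have changed it for both functions.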
Upvalue closing
In this case, the upvalue may or may not be closed. We do not take the chance and forcibly close it by setting `GCupval->closed` to 1 and setting `GCupval->v` to the TValue union located at `GCupval->uv`. This makes the GCupval store the TValue and not rely on any potentially unstable MRef references that could get deleted at the next GC pass.
5. Detoured
At this point, the function is now successfully detoured, with every call redirecting to the detour function properly. The original function is also intact, likely being called by the detour callback. This should work fine as our operations are definitely interesting, but not invalid. So far, I have tried the detouring system multiple times and have seen rare one-off crashes, which may be related to some race conditions. GC passes work as expected and do not interfere with any detours.
To avoid detection, we will need to remap any debug calls to the original function, and replace `debug.getinfo`'s `func` field with the detoured one to avoid letting any scripts know that two copies exist.
Otherwise, it is fairly secure, with the specialized bytecode avoiding adding any extra call frames to the stack as seen here:

The debug info is also intact, thanks to the initial debug info remap step.
Todo