yodaxtah/jakx-c-kernel-decompiled

jakx-c-kernel-decompiled

A repository to work on a part of the Jak X decompilation, with the purpose of porting the game to PC. This port requires the decompilation of the C++ kernel and the GOAL code that is run by that kernel.

C++ decompilation

This project's primary purpose is to provide the game/jakx and common/jakx C++ (kernel) code for the OpenGOAL project, which is necessary to run Jak X. As the game also uses networking, the secondary goal is to reverse the labels of the remaining (SCE-RT) functions (Medius, etc.), so that the game can hopefully be connected to unofficial fan-hosted servers. Even if we step away from Medius code on the client side of the game, it is still useful to know the names of the network-related C functions that are called from GOAL code during GOAL decompilation. This way, people decompiling that GOAL code can better guess and understand what the code calling those functions is doing.

The following code can be found in this project:

  • elf/kernel/jakx and elf/kernel/common: The Jak X C++ code. (For now, still pseudo-code.)
  • elf/cpp-dump: The pseudo-code generated by Ghidra with all symbols we have added. Months of work have been poured in, so it is often nothing like the original ELF's export. It is useful as a reference for the binary, in case that is needed while I am unavailable.

GOAL decompilation

As said, this project focuses on the C++ part of the codebase, with the intention of merging it back into the OpenGOAL project later. Regarding the part of the codebase that consists of GOAL code, initial work to start decompilation has already been done in this pull request.

ELF overview

Using one of the scripts, you can generate an exhaustive label overview of the ELF. There are 4512 occurrences of the pattern "function" in that dump, which should be the exact number of functions reversed. There might be a handful of functions that have not been disassembled (discovered) yet, but that number should be low. There are 1310 occurrences of the regex pattern "FUN_........", hence almost a third of all functions (1310 of 4512, about 29%) have no information at all from known functions (yet). There are 1659 occurrences of the regex pattern ".*FUN_.........*", so additional information is available on about 350 of them (1659 - 1310 = 349), and this should be the number of functions that could not be matched. These functions are not necessary to port the game, but naming them might make the port a little easier.
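The counts above can be reproduced roughly with a few lines of Python. This is a minimal sketch, not one of the repository's scripts: the function name `summarize_labels` is hypothetical, and it assumes that a bare `FUN_xxxxxxxx` label is followed by a non-word character while a partially named one (such as `FUN_00133cc8_addPurchase`) continues with an underscore.

```python
import re

# Assumption: Ghidra default names are "FUN_" plus 8 lowercase hex digits.
BARE = re.compile(r"\bFUN_[0-9a-f]{8}\b")   # label with nothing appended
ANY = re.compile(r"FUN_[0-9a-f]{8}")        # label with or without extra info


def summarize_labels(dump_text):
    """Count the patterns discussed above in a label overview dump."""
    total = dump_text.count("function")
    bare = len(BARE.findall(dump_text))
    # Count lines, mirroring the line-oriented ".*FUN_.........*" search.
    any_fun = sum(1 for line in dump_text.splitlines() if ANY.search(line))
    return {
        "functions": total,
        "unnamed": bare,
        "partially_named": any_fun - bare,
    }


sample = (
    "function FUN_00100000\n"
    "function FUN_00133cc8_addPurchase\n"
    "function main\n"
)
print(summarize_labels(sample))
```

On the real dump, the same subtraction gives 1659 - 1310 = 349 labels that carry extra information, provided the two searches were counted the way this sketch assumes.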

Symbol matching

In order to make sense of the ELF, I've primarily been adding symbol names, which is what I refer to as "symbol matching". Unlike more efficient approaches, I've been working from the ground up: I invested time in adding as many symbols as possible before trying to export decompiled code.

Matching sources

I matched against several BSim servers that held the definitions of the following games. I looked at games from around the same date that could share a codebase (e.g., Jak and Daxter, obviously, Medius, or simply anything from around 2005, the release year of Jak X). Retro Reversing has listed a few PS2 games with unstripped symbols and PS2 demos with symbols.

Next to that, I also got symbols from the PS2SDK project, where I could compare strings at best, or compare enums or other variables at worst. Sometimes I also copied over signatures from there.

Naming convention

My naming scheme changed over time as I noticed I needed to be more precise about where my symbols came from and how much I could trust them. This means that I cannot describe something here that will definitely fit all situations. I usually copied over the names and added a suffix, e.g. foo_G.

  • _G: Usually, I find these symbol names in other guess symbols, but I'm not entirely sure, as they are not matching perfectly. Careful though, the names may also be completely made up, so check the above reference symbols. With global variables, this is usually made clear by using ALL_CAPS_STYLE_G, but not consistently.
  • _S: These symbol names are based on a string. This means I'm already confident I'm correct (I used to use _G or _Q for this as well). Later on, I typically remove these suffixes either way. Some names cannot be verified, and in those cases I would rarely remove the suffix. (Example: FUN_00133cc8_addPurchase calls addPurchase_S and also prints "addPurchase error" after checking its result.)
  • _Q: These symbol names are either guessed by matching functions recursively in BSim search windows, or based on strings. In general, I was moderately confident they were right, but wanted to come across another occurrence I could verify to be absolutely sure. I later started to use _S for string sources that would give away a name.
  • _W: I'm guessing a name wildly, based on some function body or data structure that is related somewhere. (I used to use _G for this as well.)
  • _T: The source is one of the tables; depending on how certain I am, it will be combined with W, G or nothing.
  • no suffix: This may mean I'm sure it's correct, or that I named the symbol that way early on when I wasn't careful, in which case it might even be made up or simply a guess.
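When filtering exported symbols by how far they can be trusted, the suffixes can be read mechanically. The following is a minimal sketch with a hypothetical helper `classify`; the short descriptions paraphrase the list above. It only looks at a single trailing suffix and does not handle _T combined with W or G, since the exact spelling of those combinations is not specified here.

```python
# Hypothetical lookup table; descriptions paraphrase the naming convention.
SUFFIX_MEANING = {
    "_G": "guess: seen in other symbols, but not a perfect match",
    "_S": "based on a string, fairly confident",
    "_Q": "BSim or string based, moderately confident",
    "_W": "wild guess from a related body or data structure",
    "_T": "taken from one of the tables",
}


def classify(name):
    """Return (suffix, meaning) for a symbol name, or (None, ...) if bare."""
    for suffix, meaning in SUFFIX_MEANING.items():
        if name.endswith(suffix):
            return suffix, meaning
    return None, "no suffix: either verified or an early, unchecked name"


for symbol in ("addPurchase_S", "ALL_CAPS_STYLE_G", "FUN_00133cc8_addPurchase"):
    print(symbol, "->", classify(symbol))
```

Because the scheme changed over time, the result is a hint about where a name came from, not a guarantee.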

Symbol Transfer

The addresses in the memory dump from PCSX2 and in the decrypted ELF match exactly. As most of the initial work was still in the memory dump, which was the most relevant for the decompilation of the game/jakx and common/jakx C++ code, I created a few scripts that go through the code and, in most cases, interactively ask whether to override a symbol or not.

To execute them, I simply copied the code into Ghidrathon (but the Python window should normally work too, with some small changes). Most scripts (are horribly coded but) work fine. It is recommended, though, to have at least a minimal understanding of what the scripts do; they're small anyway. If you're not familiar with the Ghidra API, some knowledge of it can always be handy. You could ask Perplexity/ChatGPT, as they know the API surprisingly well!

Note however that some specific sce symbols (e.g. sceLseek, etc.) have an 8-byte address mismatch with my (and anyone's?) memory dump, for some reason. That might be a bug in the function label porting script, where I use +/- to offset and navigate the code.
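The transfer logic can be illustrated outside of Ghidra. The following is a minimal sketch under stated assumptions, not the repository's scripts: it works on plain dictionaries mapping addresses to names instead of the Ghidra API, and `transfer_symbols`, `find_shifted` and the `ask` callback are hypothetical names. Default Ghidra names are overwritten silently, while conflicting names are passed to `ask`, which stands in for the interactive prompt.

```python
import re

# Ghidra's auto-generated names (prefix plus 8 lowercase hex digits).
DEFAULT_NAME = re.compile(r"^(FUN|DAT|LAB)_[0-9a-f]{8}$")


def transfer_symbols(dump_symbols, elf_symbols, ask):
    """Merge dump names into the ELF names; ask() decides real conflicts."""
    merged = dict(elf_symbols)
    for address, new_name in sorted(dump_symbols.items()):
        if DEFAULT_NAME.match(new_name):
            continue  # nothing worth transferring
        old_name = merged.get(address)
        if old_name == new_name:
            continue
        if (old_name is None or DEFAULT_NAME.match(old_name)
                or ask(address, old_name, new_name)):
            merged[address] = new_name
    return merged


def find_shifted(dump_symbols, elf_symbols):
    """List names that exist on both sides but at different addresses."""
    elf_by_name = {name: address for address, name in elf_symbols.items()}
    shifted = []
    for address, name in sorted(dump_symbols.items()):
        elf_address = elf_by_name.get(name)
        if elf_address is not None and elf_address != address:
            shifted.append((name, address, elf_address - address))
    return shifted


dump = {0x100000: "main_G", 0x100010: "sceLseek", 0x100020: "foo_S"}
elf = {0x100000: "FUN_00100000", 0x100018: "sceLseek", 0x100020: "foo_W"}
keep_existing = lambda address, old, new: False  # answer "no" to every prompt
print(transfer_symbols(dump, elf, keep_existing))
print(find_shifted(dump, elf))
```

Running a check like `find_shifted` before transferring would surface offsets such as the 8-byte mismatch described above, so that those symbols can be reviewed by hand instead of being duplicated at a second address.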

In one of the scripts, you might get the error below when you try to apply a signature override in Ghidra. (I came across it when doing this manually.) It should occur whenever the function call has a BLUE label "ptr_addr1_addr2". If the label is either WHITE or simply "LAB_addr", then it's fine and the error shouldn't happen.

Error overriding signature: ghidra.util.exception.InvalidInputException: DataTypeSymbol has a reference
---------------------------------------------------
Build Date: 2024-Jun-07 1416 EDT
Ghidra Version: 11.1
Java Home: C:\Program Files\Eclipse Adoptium\jdk-17.0.11.9-hotspot
JVM Version: Eclipse Adoptium 17.0.11
OS: Windows 10 10.0 amd64
Workstation: REUBUS

To stop the above error from appearing (for that function call signature override), simply remove the ptr label, so that it becomes a blue "LAB_addr" label, and try again. You might then be able to apply the signature override; I wasn't, as I think I ran into a bug that might get fixed in the future. (The code is of course perfectly fine ;p)

String Tables

From what I understand from Perplexity, string tables are used for symbol resolution and serve as debugging information. All 10 that I found reside in .text, but sadly, somewhere around function entry, they stopped appearing. I have labeled the start of each string table with CPP_FILE.

I've used a prompt to apply these names, as Perplexity is very good at transforming these strings into signatures, but the results need to be double-checked. Additionally, the return types appear not to be reliable (are they even part of those strings?). Further down in this conversation, you can find a few examples that are useful for learning how to interpret these strings. The prompt that gave okay results is the following:

I'm trying to figure out the signature of a symbol that I found in a C++ string table for symbol resolution and debugging. What would be the signature of the following mangled name: _videoCallbackEP7sceMpegP16sceMpegCbDataStrPv.
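To double-check an answer by hand, the parameter part of such a name can be decoded mechanically. The following is a minimal sketch with a hypothetical function `decode_params`. It assumes the common g++ conventions that P marks a pointer, that class names are prefixed with their length, and that single letters such as v and i stand for builtin types. It covers only this small subset and does not parse the leading function name or any qualifiers.

```python
# Assumed one-letter builtin codes; only a small subset is covered here.
BUILTINS = {
    "v": "void", "i": "int", "c": "char", "s": "short",
    "l": "long", "f": "float", "d": "double", "b": "bool",
}


def decode_params(encoded):
    """Decode a parameter list such as 'P7sceMpegPv' into C type names."""
    params = []
    pos = 0
    while pos < len(encoded):
        pointers = 0
        while pos < len(encoded) and encoded[pos] == "P":
            pointers += 1
            pos += 1
        if pos >= len(encoded):
            raise ValueError("pointer marker without a type")
        if encoded[pos].isdigit():
            # Length-prefixed name: read the digits, then that many characters.
            end = pos
            while end < len(encoded) and encoded[end].isdigit():
                end += 1
            length = int(encoded[pos:end])
            name = encoded[end:end + length]
            if len(name) != length:
                raise ValueError("name is shorter than its length prefix")
            pos = end + length
        elif encoded[pos] in BUILTINS:
            name = BUILTINS[encoded[pos]]
            pos += 1
        else:
            raise ValueError("unsupported code: " + encoded[pos])
        params.append(name + "*" * pointers)
    return params


# Parameter part of the example name from the prompt above.
print(decode_params("P7sceMpegP16sceMpegCbDataStrPv"))
```

For the example in the prompt, this yields the three parameters sceMpeg*, sceMpegCbDataStr* and void*, which can be compared against the signature the prompt returns.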

Todolist

Large tasks:

  • Although I don't expect to gain much from it, one can try to match the functions against those of Jak 1 or Jak 2.
  • gcc2_compiled. functions in other (demo) games apparently have additional labels that give away their names. I noticed this too late, but matching these gcc functions would help to locate other nameless functions.
  • Apply mangled symbol names from tables (under CPP_FILE) if reliable.
  • Find the source of orphaned strings (001eba50, 001e78b0, 001e78d0)

Less important details to check:

  • Is DAT_001f63e0 (or lower) an array of thread ids?
  • What are these functions for?
    DAT_001f5b78_func1 = 0;
    DAT_001f5b7c_func2 = 0;
    DAT_001f5b80_func3 = 0;
    
  • Compare print functions with new sources. NOTE: the print functions are a mess, don't try to fix their names, as sources will contradict each other. (For example, fiprintf calls _vfiprintf_r in every binary, but differently each time.)
  • Iterate over all matches of the regex pattern 0x(1|2)[0-9a-f]{5} to find addresses that should be labeled instead. Currently, there are 787 in the decrypted ELF.
  • ... many more that I forgot to write down.
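The address search from the list above can be turned into a small report that shows which literals occur most often, so the most referenced ones can be labeled first. This is a minimal sketch with a hypothetical function name `unlabeled_addresses`; it uses the regex exactly as written above, so it also matches the first six hex digits of a longer literal.

```python
import re
from collections import Counter

# Pattern copied verbatim from the todo item above.
ADDRESS_LITERAL = re.compile(r"0x(1|2)[0-9a-f]{5}")


def unlabeled_addresses(code_text):
    """Count each raw address literal that still appears in exported code."""
    return Counter(m.group(0) for m in ADDRESS_LITERAL.finditer(code_text))


sample_code = "x = 0x1f63e0; y = *(int *)0x1f63e0; z = 0x20; w = 0x2abcde;"
for address, count in unlabeled_addresses(sample_code).most_common():
    print(address, count)
```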

Log

  • I dumped the PAL game's EE memory using PCSX2
  • I added a ton of debugging symbols from other games (such as R&C for SCE-RT) by comparing the functions
  • I added types (structs, enums, etc) where possible.
  • At that point, I could just reimplement the engine, because it was reversed sufficiently and I could compare with Jak 3's implementation.
  • However, I wanted to squeeze out all the information I could find, so I kept on reversing. After all, I reasoned that for the online component, it was necessary to understand what calls were made, as well as to make it easier to reverse the GOAL code.
  • Then I got fed up with having searched through GOAL code a few times without realizing it, so I decided to look again at the original, encrypted PAL version. There, I recognized a function I had come across already in my memory dump (through other games). That's how I was able to decrypt the game's ELF, with the help of Ziemas. The "decryption" prototype can be found on GitHub.
  • Then, in December 2024, I started creating scripts to port the symbols from the memory dump to the decrypted ELF.
  • In January, I started exporting the C++ code.
