open error: Gigi Rf.res Illegal byte sequence #81
This file is probably supposed to be called "Gigi Rüf.res", after the snowboarder. So either the file can't be extracted, or it has to be renamed (either way, resulting in a potentially unusable dump for many purposes). With a bit of luck or work on extract-xiso, the umlaut can likely be preserved, but it would still alter the filename on disk (meaning the game or tools probably wouldn't find it). However, the right solution likely depends on why you are extracting the XISO in the first place.
Any modern FS supports Unicode file names, so I doubt that would be a problem. I tested the reported ISO under Linux, and the problem also occurs when just listing files (Windows is affected too when an incompatible character set is in use). The underlying problem is that extract-xiso treats filenames as raw byte sequences, without any understanding of their character set. The solution would be to convert the filenames to Unicode (UTF-8 would probably be the chosen encoding) and let the FS decide how to rename incompatible characters. Since we're dealing with fairly standard characters, there shouldn't be any loss when converting back and forth, but that's not guaranteed.

EDIT: To be clear, it's not that the file system does not support the character ü, but that the Windows-1252 representation of that character (0xFC) is considered invalid, probably because the ASCII (7-bit) character set is in use.
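The EDIT above can be made concrete: the byte 0xFC can never appear in well-formed UTF-8 (RFC 3629 rules out the bytes 0xF8 to 0xFF entirely), so a name containing it raw is rejected wherever valid UTF-8 is required, which is consistent with the EILSEQ ("Illegal byte sequence") error in the title. A minimal validity check, written as a hypothetical helper that is not part of extract-xiso:

```c
#include <stdbool.h>
#include <stddef.h>

/* Smallest code point that may legally use 1, 2 or 3 continuation bytes;
   anything smaller is an "overlong" encoding and is rejected. */
static const unsigned long utf8_min_cp[4] = {0x0, 0x80, 0x800, 0x10000};

/* Hypothetical helper (not part of extract-xiso): returns true if
   buf[0..len) is well-formed UTF-8. */
static bool is_valid_utf8(const unsigned char *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes that must follow */
        unsigned long cp;  /* code point being decoded */
        if (c < 0x80) { i++; continue; }              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return false; /* lone continuation byte or 0xF8..0xFF, e.g. 0xFC */
        if (len - i <= n) return false;               /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < utf8_min_cp[n] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;  /* overlong, UTF-16 surrogate, or out of range */
        i += n + 1;
    }
    return true;
}
```

With this check, the raw CP1252 name "Gigi R\xFC" "f.res" is rejected, while the UTF-8 form "Gigi R\xC3\xBC" "f.res" is accepted.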
I fixed the issue in my development branch (#80). It needs some testing, though. Remarks:
Right, I'm not saying you can't visually represent the names; I'm saying you make them binary-incompatible, because all tools (modding suites, emulators, tools that transfer files to your Xbox or default.xbe, other XISO tools, ...) have to agree on how they handle this (use the existing byte sequences on file systems that allow them? always convert? prefix illegal sequences with non-printing modifiers and add a visual umlaut? ...). Hence:
I've also seen at least one game (Furious Karting, maybe?) that uses small files as a form of copy protection: extracting each file in the XISO to FATX (or many other file systems) requires at least a full sector per file, so the size of the game explodes. It's not hard to imagine that certain games also use specific filenames that aren't easily supported by other file systems as protection. However, it depends on what you want to do with those files, because other losses during extraction include the loss of metadata and of file offsets within the image (which, again, might be used by the copy protection).
I got the point, and this is a problem that goes far beyond the OG Xbox, since Unicode support is basically always broken. And when it's not broken, the implicit "best fit" approach makes you lose information without you even realizing it.
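To illustrate how a "best fit" translation loses information: folding accented letters to plain ASCII maps "Gigi Rüf.res" and a hypothetical "Gigi Ruf.res" to the same name, so the original bytes can no longer be recovered from the result. A minimal sketch with a simplified, hypothetical table (the function name is made up, and this is not the best-fit table Windows actually uses):

```c
#include <stddef.h>

/* Hypothetical, simplified ASCII "best fit" for CP1252 bytes 0xC0..0xFF
   (index = byte - 0xC0). '?' marks characters with no single-letter
   stand-in, such as AE ligature, sharp s, multiplication sign and thorn.
   This is NOT the best-fit table Windows really uses. */
static const char fold_c0_ff[] =
    "AAAAAA?CEEEEIIIIDNOOOOOxOUUUUY??"   /* 0xC0..0xDF */
    "aaaaaa?ceeeeiiiidnooooo?ouuuuy?y";  /* 0xE0..0xFF */

/* Writes len + 1 bytes to out: a NUL-terminated ASCII approximation of the
   CP1252 name in[0..len). Bytes 0x80..0xBF all become '?'. */
static void best_fit_ascii(const unsigned char *in, size_t len, char *out) {
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80)       out[i] = (char)c;              /* ASCII is kept */
        else if (c >= 0xC0) out[i] = fold_c0_ff[c - 0xC0]; /* 0xFC -> 'u' */
        else                out[i] = '?';                  /* 0x80..0xBF */
    }
    out[len] = '\0';
}
```

Because two different source names can produce the same output, a tool that only sees the folded name cannot tell which file it originally was.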
That would be the tool's problem, not ours.
I don't see how this is a problem, since files still use at least one sector in XISOs. Sure, those are 2048-byte sectors versus the typical 4096-byte sectors, but the game will at most double in size compared to the ISO. Not ideal, but not too bad either.
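The "at most double" estimate can be sanity-checked: if every file occupies at least one allocation unit, its footprint is its size rounded up to whole units. A minimal sketch, using a hypothetical `allocated_size` helper that ignores directory entries and other file system metadata:

```c
#include <stdint.h>

/* Hypothetical helper: bytes allocated for a file of `size` bytes on a
   medium that hands out whole units of `unit` bytes (sectors or clusters).
   Assumes every file, even an empty one, occupies at least one unit.
   `unit` must be non-zero. */
static uint64_t allocated_size(uint64_t size, uint64_t unit) {
    uint64_t units = (size + unit - 1) / unit;  /* round up */
    if (units == 0)
        units = 1;                              /* at least one unit */
    return units * unit;
}
```

The worst case is a file that fits in one 2048-byte XISO sector but takes a full 4096-byte cluster after extraction (ratio 2); larger files approach a ratio of 1, e.g. a 5000-byte file goes from 6144 to 8192 bytes.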
Not a huge difference, so I don't know if you were talking about another game. Just curiosity, though.
I agree. Some loss of information will always occur; the important thing is to know where and when it occurs, and plan accordingly.

¹ With the exception of APFS on Sierra and probably some more exotic OS/FS combinations.
Sounds right; now I wonder whether I misremembered or whether it was just a bug in some tooling. 🤔
Agreed, although I'm not 100% sure we should enforce UTF-8. I think it might be better to let the user set a charset (defaulting to UTF-8). Anyhow, if FileZilla supports re-interpreting charsets, that already solves a lot of the problems for a common use case.
We don't need to support any user charset; the OS (mostly) does it for us. There are three charsets the OS knows of:
And then there's the charset of the data (in our case always Windows-1252, since it's the one used by the Xbox), which the OS knows nothing about. When we want to talk to the OS, we need to translate¹ from the data charset to the program charset. At that point, if we're printing a filename, the OS translates it to the terminal charset and prints it. If we're creating a file instead, the OS translates it to the FS charset and creates the file.

¹ Translating from one charset to another works like this: for every character in the source string, if the destination charset can represent the character, convert it to the new encoding; otherwise use a "best fit" approach (e.g. ü becomes u).

Two observations:
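As a minimal sketch of the data-charset-to-program-charset translation described above, assuming UTF-8 as the program charset: since UTF-8 can represent every CP1252 character, the "best fit" branch is never needed in this direction. The function name and buffer contract are hypothetical (not taken from #80), and mapping the five bytes that CP1252 leaves undefined to the C1 controls of the same value is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Unicode code points for CP1252 bytes 0x80..0x9F. The undefined bytes
   0x81, 0x8D, 0x8F, 0x90 and 0x9D are mapped to the C1 control with the
   same value (an assumption; one common convention). Bytes 0x00..0x7F and
   0xA0..0xFF map to the code point with the same value. */
static const uint16_t cp1252_80_9f[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

/* Hypothetical helper: translates a CP1252 name of `len` bytes into
   NUL-terminated UTF-8. Each character needs at most 3 UTF-8 bytes, so
   3 * len + 1 bytes of `out` always suffice. Returns the number of bytes
   written (excluding the NUL), or (size_t)-1 if `outsize` is too small. */
static size_t cp1252_to_utf8(const unsigned char *in, size_t len,
                             char *out, size_t outsize) {
    size_t o = 0;
    if (outsize == 0)
        return (size_t)-1;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        uint32_t cp = (c >= 0x80 && c <= 0x9F) ? cp1252_80_9f[c - 0x80] : c;
        unsigned char buf[3];
        size_t n;
        if (cp < 0x80) {                 /* 1 byte: ASCII */
            buf[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {         /* 2 bytes, e.g. U+00FC -> C3 BC */
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else {                         /* 3 bytes, e.g. U+20AC -> E2 82 AC */
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        }
        if (outsize - o < n + 1)         /* keep room for the NUL */
            return (size_t)-1;
        for (size_t k = 0; k < n; k++)
            out[o++] = (char)buf[k];
    }
    out[o] = '\0';
    return o;
}
```

In this sketch, the single byte 0xFC in the reported filename becomes the two-byte UTF-8 sequence 0xC3 0xBC, which is what would then be handed to the OS when creating the file.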
Unfortunately, not all systems support CP1252, so we still need to find a "universal" program charset. UTF-8 is the best candidate because it's ALMOST universal. We could try to support more charsets, but that would require handling the translations between CP1252 and those charsets, and we could only ever support charsets that can actually represent all CP1252 characters. One change I will make to my implementation is to fall back to Windows-1252 if UTF-8 is not supported since, as I understand it, UTF-8 support is only available starting with a specific Windows 10 version.
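Because UTF-8 can represent every CP1252 character, the translation can also be reversed, e.g. to recover the exact bytes stored in the XISO from a UTF-8 name. A minimal sketch under stated assumptions: the helper names are hypothetical, undefined CP1252 bytes are assumed to map to the C1 controls of the same value, and any character CP1252 cannot represent is reported as an error instead of being best-fitted, so nothing is lost silently.

```c
#include <stddef.h>
#include <stdint.h>

/* Unicode code points for CP1252 bytes 0x80..0x9F; undefined bytes map to
   the C1 control with the same value (an assumption). */
static const uint16_t cp1252_80_9f[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

/* Returns the CP1252 byte for code point cp, or -1 if CP1252 has none. */
static int cp1252_byte_for(uint32_t cp) {
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (int)cp;
    for (int i = 0; i < 32; i++)
        if (cp1252_80_9f[i] == cp)
            return 0x80 + i;
    return -1;
}

/* Hypothetical helper: converts UTF-8 in[0..len) to CP1252 bytes in `out`
   (at least len bytes; the output is never longer than the input).
   Returns the output length, or (size_t)-1 if the input is malformed or
   contains a character that CP1252 cannot represent. */
static size_t utf8_to_cp1252(const unsigned char *in, size_t len,
                             unsigned char *out) {
    size_t i = 0, o = 0;
    while (i < len) {
        unsigned char c = in[i];
        uint32_t cp;
        size_t n;                                   /* continuation bytes */
        if (c < 0x80)                { cp = c;        n = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; n = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; n = 2; }
        else return (size_t)-1;  /* 4-byte forms are outside CP1252 anyway */
        if (len - i <= n)
            return (size_t)-1;                      /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((in[i + k] & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800))
            return (size_t)-1;                      /* overlong encoding */
        int b = cp1252_byte_for(cp);
        if (b < 0)
            return (size_t)-1;                      /* not representable */
        out[o++] = (unsigned char)b;
        i += n + 1;
    }
    return o;
}
```

Round-tripping "Gigi R\xC3\xBC" "f.res" through this sketch yields the original 12 bytes with 0xFC at index 6, while a name containing, say, U+03A9 (Ω) is refused.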
I implemented the change and added a comment on #80 explaining a problem. I don't know what the best approach would be, though.
Extracting the "Amped - Freestyle Snowboarding" ISO produces an error on macOS.
Running with LC_ALL=C produced the same result.