tr1/text: improve text handling in TR1 #1933
The time-consuming O(n^2) loop that compared every character of the user string against all possible glyphs has been replaced by `uthash` lookups, which speeds up glyph lookup. This requires knowing each glyph's size precisely, which involves some additional parsing. An added benefit is better handling of unknown Unicode glyphs: by advancing through entire codepoints rather than incrementing the pointer by 1 byte, the loop never stops in the middle of an incomplete UTF-8 codepoint.
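The codepoint-by-codepoint traversal described above can be sketched in C as follows. This is a minimal, hypothetical helper (`utf8_next` and `utf8_seq_len` are illustrative names, not functions from this PR): it reads the lead byte to learn how long the sequence is, decodes the whole sequence, and on truncated input stops at the offending byte, so the cursor never skips past the string's NUL terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_REPLACEMENT 0xFFFDu

/* Number of bytes in the UTF-8 sequence introduced by this lead byte,
 * or 0 if the byte cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;           /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx */
    return 0;                            /* stray continuation or invalid byte */
}

/* Decodes the codepoint at s into *cp and returns where the next one starts.
 * Valid sequences are always consumed whole, so the cursor never stops inside
 * one. If a sequence is cut short (for example by the NUL terminator), *cp
 * becomes U+FFFD and the cursor stops at the offending byte. The caller is
 * expected to stop at the NUL terminator itself. Overlong forms and
 * surrogates are not rejected, to keep the sketch short. */
static const char *utf8_next(const char *s, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t len = utf8_seq_len(p[0]);
    if (len == 0) {
        *cp = UTF8_REPLACEMENT;
        return s + 1;
    }
    /* Keep only the payload bits of the lead byte. */
    uint32_t value = (len == 1) ? p[0] : (p[0] & (0xFFu >> (len + 1)));
    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) { /* missing continuation byte */
            *cp = UTF8_REPLACEMENT;
            return s + i;
        }
        value = (value << 6) | (p[i] & 0x3Fu);
    }
    *cp = value;
    return s + len;
}
```

Because the cursor always moves by a whole sequence, a glyph lookup keyed on the bytes between two cursor positions always sees a complete codepoint.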
@aredfan fixed this and made sure that we do not attempt to draw unavailable sprites.
Looks fantastic, thank you for doing this.
Noticed a small issue with scaling in the details menu, but it also happens on develop, so I will raise a separate ticket.
Description
This pull request extends the named-sequence support introduced in 4.6 to Unicode strings as well. I have confirmed that the following languages have full coverage:
Asian and Arabic languages remain unsupported for now. Arabic support is still far off because of its right-to-left (RTL) rendering order, but we're much closer to supporting CJK.
The character sprites come from Arsunt's extended font posted on the Tomb Raider Forums.
Pivotal for this feature is a text file containing hand-written Unicode codepoint mappings. Although I initially experimented with JSON, YAML, and CSV, testing showed that a DSL (domain-specific language) designed specifically for this purpose offers the best readability.
The mapping file is used by the tooling in the tools/glyphs/ directory and serves two roles:
1. It generates C macros that map Unicode codepoints and escaped sequences to O_ALPHABET's sprite indices, specify glyph dimensions, and describe how to compose compound characters; all of this is hardcoded into the executable.
2. It directs the injector tool to create the font.bin file, which contains the O_ALPHABET sprite bitmaps along with additional positional information.
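To illustrate how generated macros can hardcode a mapping into the executable, here is a sketch using the X-macro pattern. The entry format `X(codepoint, sprite, width, height)`, the names `GLYPH_DEFS`, `GLYPH_INFO`, `glyph_sprite_for` and `glyph_info_at`, and all numeric values are assumptions for illustration; the PR does not show the actual generated output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generated list: X(codepoint, sprite index, width, height).
 * All values are made up for illustration. */
#define GLYPH_DEFS(X)                                \
    X(0x0041u, 0, 12, 12)   /* LATIN CAPITAL A */    \
    X(0x0061u, 26, 10, 9)   /* LATIN SMALL A */      \
    X(0x00DFu, 110, 10, 12) /* LATIN SMALL SHARP S */

typedef struct {
    uint32_t codepoint;
    int sprite;
    int width;
    int height;
} GLYPH_INFO;

/* Expansion 1: a constant table of glyph metadata. */
#define AS_GLYPH_INFO(cp, spr, w, h) { (cp), (spr), (w), (h) },
static const GLYPH_INFO g_glyph_infos[] = { GLYPH_DEFS(AS_GLYPH_INFO) };
#define GLYPH_INFO_COUNT (sizeof(g_glyph_infos) / sizeof(g_glyph_infos[0]))

/* Expansion 2: a switch that maps a codepoint straight to its sprite. */
#define AS_SPRITE_CASE(cp, spr, w, h) case (cp): return (spr);
static int glyph_sprite_for(uint32_t codepoint)
{
    switch (codepoint) {
        GLYPH_DEFS(AS_SPRITE_CASE)
    default:
        return -1; /* no mapping declared */
    }
}

/* Bounds-checked access to the generated metadata table. */
static const GLYPH_INFO *glyph_info_at(size_t index)
{
    assert(index < GLYPH_INFO_COUNT);
    return &g_glyph_infos[index];
}
```

Expanding one list into both a table and a `switch` keeps a single source of truth, which fits the idea of generating everything from one mapping file.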
Some sprite indices are fixed for compatibility with the original game, so that text retains its original format even if font.bin goes missing.
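A minimal sketch of the fallback idea, assuming that the fixed indices occupy the range of the stock O_ALPHABET and that extended glyphs use higher indices. `ORIGINAL_SPRITE_COUNT` and `EXTENDED_SPRITE_COUNT` are placeholder values, not figures from the PR, and the functions are hypothetical. Sprites outside the loaded range are skipped rather than drawn, in line with the fix for unavailable sprites mentioned in the conversation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values for illustration only; the real counts are not given. */
#define ORIGINAL_SPRITE_COUNT 110 /* sprites in the stock O_ALPHABET */
#define EXTENDED_SPRITE_COUNT 300 /* sprites once font.bin is injected */

/* How many sprites exist, depending on whether font.bin was loaded. */
static int loaded_sprite_count(bool font_bin_present)
{
    return font_bin_present ? EXTENDED_SPRITE_COUNT : ORIGINAL_SPRITE_COUNT;
}

static bool sprite_is_available(int sprite, int loaded_count)
{
    return sprite >= 0 && sprite < loaded_count;
}

/* Copies only drawable sprite indices into out; returns how many were kept.
 * Fixed indices below ORIGINAL_SPRITE_COUNT survive even without font.bin. */
static size_t filter_drawable(const int *sprites, size_t n, int loaded_count, int *out)
{
    assert(sprites != NULL && out != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (sprite_is_available(sprites[i], loaded_count)) {
            out[kept++] = sprites[i];
        }
    }
    return kept;
}
```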
Creating sprites for every possible accented character is a challenging and resource-intensive task. Instead, the mapping allows us to combine certain characters so that the game overlays one glyph on another. However, only one accent per glyph is supported. Consequently, Vietnamese, despite using the Latin alphabet, is currently unsupported, because many of its letters carry two diacritics at once (for example ệ, with a circumflex and a dot below).
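The one-accent limitation follows naturally from a representation like the one below, sketched under assumptions: `COMPOUND_GLYPH`, `DRAW_CMD`, `emit_glyph`, and the per-accent pixel offset are hypothetical names and fields, not the PR's actual data structures. A compound glyph is drawn as its base sprite followed by at most one accent sprite at an offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compound-glyph description: one base sprite plus at most one
 * accent sprite drawn at a pixel offset relative to the base. */
typedef struct {
    int base_sprite;
    int accent_sprite; /* -1 when the glyph has no accent */
    int accent_dx;
    int accent_dy;
} COMPOUND_GLYPH;

typedef struct {
    int sprite;
    int x;
    int y;
} DRAW_CMD;

/* Emits the draw commands for one glyph at (x, y): the base first, then the
 * accent overlaid on top. Returns the number of commands written (1 or 2). */
static size_t emit_glyph(const COMPOUND_GLYPH *glyph, int x, int y, DRAW_CMD out[2])
{
    assert(glyph != NULL && out != NULL);
    size_t count = 0;
    out[count++] = (DRAW_CMD){ glyph->base_sprite, x, y };
    if (glyph->accent_sprite >= 0) {
        out[count++] = (DRAW_CMD){ glyph->accent_sprite,
                                   x + glyph->accent_dx,
                                   y + glyph->accent_dy };
    }
    return count;
}
```

Since the struct has room for exactly one accent, a letter with two stacked marks cannot be described without extending it.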
As we now have many more glyphs to compare, the time-consuming O(n^2) loop that matched user string characters against all possible glyphs has been replaced by `uthash` lookups for faster glyph retrieval. This approach requires precise knowledge of glyph sizes, necessitating some additional parsing, but it has the benefit of eliminating ambiguity in glyph matches. An additional benefit is improved handling of Unicode codepoints without declared mappings: by traversing entire codepoints rather than incrementing the pointer by 1 byte, the process avoids ending up in the middle of an incomplete UTF-8 codepoint, which prevents garbled text.
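The switch from a linear scan to a hash lookup can be sketched as below. To keep the example self-contained it uses a small hand-rolled open-addressing table with FNV-1a hashing instead of `uthash`; `GLYPH_ENTRY`, `glyph_table_add`, `glyph_table_find`, and the sprite numbers are hypothetical. The key point is that `glyph_table_find` hashes exactly `len` bytes, so the caller must first determine the glyph's byte length, which is the extra parsing the description mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical glyph record: the exact source text it matches and its sprite. */
typedef struct {
    const char *text; /* UTF-8 bytes or an escaped sequence, NUL-terminated */
    int sprite;
} GLYPH_ENTRY;

#define GLYPH_TABLE_SIZE 64u /* power of two; must exceed the glyph count */

static const GLYPH_ENTRY *g_glyph_table[GLYPH_TABLE_SIZE];
static size_t g_glyph_count;

/* 32-bit FNV-1a hash over exactly len bytes. */
static uint32_t hash_bytes(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

static void glyph_table_add(const GLYPH_ENTRY *entry)
{
    /* Keep at least one empty slot so lookups always terminate. */
    assert(g_glyph_count + 1 < GLYPH_TABLE_SIZE);
    uint32_t slot = hash_bytes(entry->text, strlen(entry->text)) & (GLYPH_TABLE_SIZE - 1);
    while (g_glyph_table[slot] != NULL) {
        slot = (slot + 1) & (GLYPH_TABLE_SIZE - 1); /* linear probing */
    }
    g_glyph_table[slot] = entry;
    g_glyph_count++;
}

/* Looks up exactly len bytes starting at s. The input need not be
 * NUL-terminated at len, which is why the caller must already know how
 * many bytes the glyph occupies. Returns NULL if no glyph matches. */
static const GLYPH_ENTRY *glyph_table_find(const char *s, size_t len)
{
    uint32_t slot = hash_bytes(s, len) & (GLYPH_TABLE_SIZE - 1);
    while (g_glyph_table[slot] != NULL) {
        const GLYPH_ENTRY *entry = g_glyph_table[slot];
        if (strlen(entry->text) == len && memcmp(entry->text, s, len) == 0) {
            return entry;
        }
        slot = (slot + 1) & (GLYPH_TABLE_SIZE - 1);
    }
    return NULL;
}
```

Each character now costs one hash and a short probe instead of a comparison against every known glyph. With uthash itself, the same lookup would be a `HASH_FIND(hh, head, ptr, len, out)` call on a table populated with `HASH_ADD_KEYPTR`.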