tr1/text: improve text handling in TR1 #1933

Merged
rr- merged 2 commits into develop from glyphs
Nov 24, 2024

Conversation

rr-
Collaborator

@rr- rr- commented Nov 21, 2024

Checklist

  • I have read the coding conventions
  • I have added a changelog entry about what my pull request accomplishes, or it is an internal change

Description

This pull request extends the named sequences support introduced in 4.6 to also cover Unicode strings. I have confirmed that the following languages have full coverage:

Basque        Irish
Belarusian    Italian
Bosnian       Latvian
Bulgarian     Lithuanian
Catalan       Macedonian
Croatian      Malay
Czech         Maltese
Danish        Northern Sami
Dutch         Norwegian
English       Polish
Estonian      Portuguese
Faroese       Romanian
Finnish       Russian
French        Serbian
Galician      Slovak
German        Slovenian
Greek         Spanish
Hungarian     Swedish
Icelandic     Turkish
Indonesian    …and possibly more.

Asian and Arabic languages remain unsupported at the moment. While Arabic is still far away due to the RTL rendering order, we're much closer to supporting CJK.
The sprites for the characters come from Arsunt's extended font posted on the Tomb Raider Forums.

Pivotal to this feature is a text file containing manual Unicode codepoint mappings. Although I initially experimented with JSON, YAML, and CSV, testing showed that a DSL (domain-specific language) designed specifically for this purpose offers the best readability.
The mapping file is used by the tooling in the tools/glyphs/ directory and serves two roles:

  1. Hardcoding Unicode to sprite mapping
    It generates C macros that map Unicode code points and escaped sequences to O_ALPHABET's sprite indices, specify glyph dimensions, and instruct how to compose compound characters - all getting hardcoded into the executable.
  2. Guidance for font.bin creation
    It directs the injector tool in creating the font.bin file that contains O_ALPHABET sprite bitmaps, along with additional positional information.
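To illustrate the first role, here is a minimal sketch of how a generated macro list could be expanded into a lookup table. It is not the actual output of the tools/glyphs/ tooling: the names GLYPH_LIST, GLYPH_INFO and find_glyph, as well as every sprite index and dimension, are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generated list: X(codepoint, sprite index, width, height).
 * All numbers are invented for illustration. */
#define GLYPH_LIST(X) \
    X(0x0041, 0, 10, 13) /* A */ \
    X(0x0042, 1, 9, 13) /* B */ \
    X(0x00DF, 90, 9, 13) /* sharp s */

typedef struct {
    uint32_t codepoint;
    int sprite_index; /* index into the O_ALPHABET sprites */
    int width;
    int height;
} GLYPH_INFO;

/* Expand every list entry into one table row. */
#define GLYPH_ROW(cp, sprite, w, h) { cp, sprite, w, h },
static const GLYPH_INFO g_glyphs[] = { GLYPH_LIST(GLYPH_ROW) };
#undef GLYPH_ROW

static const size_t g_glyph_count = sizeof(g_glyphs) / sizeof(g_glyphs[0]);

/* Linear scan, kept simple for the sketch. Returns NULL if unmapped. */
const GLYPH_INFO *find_glyph(uint32_t codepoint)
{
    assert(g_glyph_count > 0);
    for (size_t i = 0; i < g_glyph_count; i++) {
        if (g_glyphs[i].codepoint == codepoint) {
            return &g_glyphs[i];
        }
    }
    return NULL;
}
```

Because the table is built at compile time, the mapping is baked into the executable, which matches the "hardcoded" property described in the list.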

Some sprite indices are fixed. This is for compatibility with the original game, so that it retains the original text format even if font.bin goes missing.
Creating sprites for all possible accented characters is a challenging and resource-intensive task. Instead, the mapping allows us to combine certain characters so that the game overlays one glyph on another. However, we only support one accent per glyph. Consequently, Vietnamese, despite using the Latin alphabet, is currently unsupported due to its extensive use of diacritics.
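The one-accent-per-glyph rule can be sketched as a table entry that holds a base sprite plus at most one overlay. This is only an illustration under assumed names (COMPOUND_GLYPH, DRAW_CMD, compose_glyph) with invented sprite indices and offsets, and does not reflect the real data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_SPRITE (-1)

typedef struct {
    uint32_t codepoint;
    int base_sprite;
    int accent_sprite; /* NO_SPRITE for standalone glyphs */
    int accent_dx;     /* overlay offset relative to the base glyph */
    int accent_dy;
} COMPOUND_GLYPH;

/* Hypothetical mapping: all indices and offsets are made up. */
static const COMPOUND_GLYPH g_compound[] = {
    { 0x0065, 30, NO_SPRITE, 0, 0 }, /* e */
    { 0x00E9, 30, 100, 2, -4 },      /* e + acute */
    { 0x00E8, 30, 101, 2, -4 },      /* e + grave */
};

typedef struct {
    int sprite;
    int dx;
    int dy;
} DRAW_CMD;

/* Fills out[] with the sprites to draw for one codepoint.
 * Returns the number of commands (1 or 2), or 0 if unmapped. */
int compose_glyph(uint32_t codepoint, DRAW_CMD out[2])
{
    const size_t count = sizeof(g_compound) / sizeof(g_compound[0]);
    for (size_t i = 0; i < count; i++) {
        const COMPOUND_GLYPH *g = &g_compound[i];
        if (g->codepoint != codepoint) {
            continue;
        }
        assert(g->base_sprite != NO_SPRITE);
        out[0].sprite = g->base_sprite;
        out[0].dx = 0;
        out[0].dy = 0;
        if (g->accent_sprite == NO_SPRITE) {
            return 1;
        }
        out[1].sprite = g->accent_sprite;
        out[1].dx = g->accent_dx;
        out[1].dy = g->accent_dy;
        return 2;
    }
    return 0;
}
```

With a single accent slot per entry, a character such as U+1EC7 (e with circumflex and dot below) cannot be expressed, which is the kind of limitation that keeps Vietnamese out of reach for now.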

As we now have many more glyphs to compare, the time-consuming O(n^2) loop that matched user string characters with all possible glyphs has been replaced by uthash lookups for faster glyph retrieval. This approach requires precise knowledge of glyph sizes, necessitating some additional parsing, but it benefits from eliminating ambiguity in glyph matches. An additional benefit is improved handling of Unicode codepoints without declared mappings: by traversing entire codepoints rather than incrementing the pointer by 1 byte, the process avoids ending up in the middle of an incomplete UTF-8 codepoint, preventing garbled text.
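The codepoint-wise traversal can be sketched roughly as follows. This is not the code from the pull request: utf8_seq_len, utf8_next and count_codepoints are hypothetical helpers, the uthash lookup itself is left out, and the decoder is deliberately simplified (it does not reject overlong encodings or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the UTF-8 sequence announced by a lead byte.
 * Invalid lead bytes report 1 so the caller always makes progress. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1; /* stray continuation byte or invalid lead */
}

/* Decodes the codepoint at *text and advances *text past it.
 * Malformed or truncated input yields U+FFFD. */
uint32_t utf8_next(const char **text)
{
    static const unsigned char lead_mask[] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };
    const unsigned char *p = (const unsigned char *)*text;
    assert(p[0] != '\0'); /* caller must stop at the terminator */

    const size_t len = utf8_seq_len(p[0]);
    uint32_t cp = p[0] & lead_mask[len];
    if (len == 1 && p[0] >= 0x80) {
        cp = 0xFFFD;
    }

    size_t i = 1;
    for (; i < len; i++) {
        /* A non-continuation byte (including the NUL terminator)
         * ends the sequence early. */
        if ((p[i] & 0xC0) != 0x80) {
            cp = 0xFFFD;
            break;
        }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *text += i;
    return cp;
}

/* Walks a string one codepoint at a time, never one byte at a time. */
size_t count_codepoints(const char *text)
{
    size_t n = 0;
    while (*text != '\0') {
        utf8_next(&text);
        n++;
    }
    return n;
}
```

In a renderer, each value returned by utf8_next would be the key for the glyph lookup. An unmapped codepoint is then skipped as a whole, so the pointer never lands inside a multi-byte sequence.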

@rr- rr- added Feature New functionality TR1 labels Nov 21, 2024
@rr- rr- self-assigned this Nov 21, 2024

github-actions bot commented Nov 21, 2024

@rr- rr- force-pushed the glyphs branch 2 times, most recently from c76de3f to 441aa28 Compare November 24, 2024 16:58
@rr- rr- changed the title tr1/text: experimental support for injected glyphs tr1/text: improve text handling in TR1 Nov 24, 2024
@rr- rr- marked this pull request as ready for review November 24, 2024 17:26
@rr- rr- requested review from a team as code owners November 24, 2024 17:26
@rr- rr- requested review from lahm86, walkawayy and aredfan and removed request for a team November 24, 2024 17:26
@aredfan
Collaborator

aredfan commented Nov 24, 2024

Overall LGTM. The only issue I found is that the TRUB gameflow doesn't point to the font.bin file, which causes this issue.

[Screenshot: 20241124_180430_Return_to_Egypt]

rr- added 2 commits November 24, 2024 19:25
The time-consuming O(n^2) loop that compared user string characters with
all possible glyphs has been replaced by `uthash` lookups for improved
glyph lookup speed. This requires precise glyph size knowledge, which
involves some additional parsing.

An added benefit is improved handling of unknown Unicode glyphs: by
moving through entire codepoints rather than incrementing the pointer by
1 byte, the process avoids ending in the middle of an incomplete UTF-8
codepoint.
@rr-
Collaborator Author

rr- commented Nov 24, 2024

@aredfan fixed and made sure that we do not attempt to draw unavailable sprites.

Collaborator

@lahm86 lahm86 left a comment


Looks fantastic, thank you for doing this.
Noticed a small issue with scaling in the details menu, but it's on develop too - will raise a separate ticket.

@rr- rr- merged commit a8d4af8 into develop Nov 24, 2024
7 checks passed
@rr- rr- deleted the glyphs branch November 24, 2024 19:07

Successfully merging this pull request may close these issues.

Some symbols are not displayed in the game
Accents in Spanish words
4 participants