fix(perf): Remove FontSources.masks as they were consuming large amounts of memory even when no font sources were set #2519
Conversation
```diff
- let mut needed = self.masks[(start as usize) / CP_RANGE_SIZE].clone();
+ let needed: &BTreeSet<usize> = &self.masks[(start as usize) / CP_RANGE_SIZE];
```
@nyurik Why did you use masks in the first place here instead of something like
```diff
- let needed: &BTreeSet<usize> = &self.masks[(start as usize) / CP_RANGE_SIZE];
+ // Create a BitSet for the requested range (start..=end)
+ let mut needed = BitSet::with_capacity(end as usize);
+ for cp in start..=end {
+     needed.insert(cp as usize);
+ }
```
In the testing I have done, this works fairly well, but there must be a reason why this is done this way that I am not getting.
tbh, it has been a long time since i did it, so no clue :)
I can imagine BitSet would work well for a small range like 0-255 but if you support the whole unicode range you need a lot of memory per bitset.
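A quick back-of-envelope check of that concern (the constants are from the thread, the arithmetic is mine):

```rust
fn main() {
    // Full Unicode range: codepoints 0..=0x10FFFF, i.e. 0x110000 presence bits.
    let max_unicode_cp: usize = 0x10FFFF;
    let bits = max_unicode_cp + 1;
    assert_eq!(bits, 1_114_112);

    // One bit per codepoint: ~136 KiB per dense bitset.
    let bytes = bits / 8;
    assert_eq!(bytes, 139_264);
    assert_eq!(bytes / 1024, 136);

    // Versus a bitset limited to a single 256-codepoint range: just 32 bytes.
    assert_eq!(256 / 8, 32);
}
```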
```diff
  face_index: isize,
  /// Unicode codepoints this font supports.
- codepoints: BitSet,
+ codepoints: BTreeSet<usize>,
```
Unsure if there is a better way. Let's brainstorm a bit.

So we need to know which of the 256 bits (a `BitSet256`) is set. While we currently store this as one very big BitSet, that is likely not necessary, since this range is likely fairly compressible. No font designer would leave weird gaps in their font intentionally, I think.

- `Vec<BitSet256>` works and weighs in at ~1.1 megabytes (`MAX_UNICODE_CP` bits)
- `BTreeSet<u32>` works and is 32 B * number of font items -> likely lower than above
- `HashMap<u16, BitSet256>` ((start / 256) -> next 256 code points) should work and is a compromise between the two?
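A minimal sketch of the `HashMap<u16, BitSet256>` compromise from the list above (the type and method names are hypothetical, not from the codebase; the 256-bit page is modeled as `[u64; 4]`):

```rust
use std::collections::HashMap;

/// Hypothetical paged codepoint set: one 256-bit page per occupied
/// 256-codepoint block, keyed by `codepoint / 256`.
#[derive(Default)]
struct PagedCodepointSet {
    pages: HashMap<u16, [u64; 4]>, // 4 * 64 = 256 bits per page
}

impl PagedCodepointSet {
    fn insert(&mut self, cp: u32) {
        let page = (cp / 256) as u16;  // block index, 0..=0x10FF fits in u16
        let bit = (cp % 256) as usize; // bit within the block
        let words = self.pages.entry(page).or_insert([0u64; 4]);
        words[bit / 64] |= 1 << (bit % 64);
    }

    fn contains(&self, cp: u32) -> bool {
        let page = (cp / 256) as u16;
        let bit = (cp % 256) as usize;
        self.pages
            .get(&page)
            .map_or(false, |w| w[bit / 64] & (1 << (bit % 64)) != 0)
    }
}

fn main() {
    let mut set = PagedCodepointSet::default();
    set.insert(0x41);     // 'A'
    set.insert(0x10FFFF); // highest codepoint
    assert!(set.contains(0x41));
    assert!(set.contains(0x10FFFF));
    assert!(!set.contains(0x42));
    // Only two 32-byte pages allocated despite the huge codepoint spread.
    assert_eq!(set.pages.len(), 2);
}
```

Memory here scales with the number of occupied 256-codepoint blocks rather than with the highest codepoint, which is the compromise described above.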
Noticed that only u32 is needed for the index; that is another halving of that memory usage.
This also stacks with the improvement above.
Yeah there was a mix of usize and u32 around so definitely could standardise on u32 and avoid a bunch of casting
The problem with using a BitSet256 is you need to range it from 0-255 but the codepoints are 0-0x10FFFF so you either need to translate the requested codepoints back to 0-255 before masking (losing the performance of a bitset) or have one GIANT BitSet for each range. If you check the size of each BitSet before you'll see some are as big as 0x10FFFF (depending on the largest inserted value).
```diff
- let mut bs = BitSet::with_capacity(CP_RANGE_SIZE);
+ let mut set = BTreeSet::new();
  for v in 0..=MAX_UNICODE_CP {
      bs.insert(v);
```
I think the core issue is that although we allocate a BitSet::with_capacity(256) the value for v here might be 0x10FFFF causing a MASSIVE reallocation to fit values 0-0x10FFFF in the BitSet. From what I understand, the values must be contiguous without holes so each BitSet could be as large as (0x10FFFF / 32) (bits per u32) * 4 (bytes per u32) and then each of these is inserted for each range up to 4352.
my understanding is that each file would use ~136 KB of RAM (136*1024*8 = 1_114_112 = 0x11_0000 codepoint presence bits) - not that much memory. If we follow the optimization I proposed below, most fonts won't even get that high, but it will keep the complexity to a minimum.
The issue is less with each font source since it's only a single BitSet of 34816 u32 (139KB). The issue is with the pre-computed masks. Each range (of which there are 4352) carries a masking BitSet of up to 0x11_0000 (34816 u32 [4 bytes] in the backing BitVec). Assuming every range is this size (inaccurately) we get 600MB (34816*4*4352 bytes). More accurately we get masking BitSets with sizes growing from 8-34816 u32 and more likely around 300MB total. Prior to adjusting the max codepoint size we had smaller BitSets (8-2048 u32) and less of them 256 (0xFFFF/256). Resulting in at most ~2MB (2048*4*256 bytes) for masks but likely less due to not all BitSets being 2048 size.
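The arithmetic behind those estimates, written out as a worst-case sketch (the numbers come from the comment above, not from a measurement):

```rust
fn main() {
    // A full-range bitset backed by u32 words: 0x110000 bits / 32 bits per u32.
    let words_per_full_bitset: usize = 34_816;
    assert_eq!(0x110000 / 32, words_per_full_bitset);
    let bytes_per_full_bitset = words_per_full_bitset * 4;
    assert_eq!(bytes_per_full_bitset, 139_264); // ~139 KB per font source

    // Worst case for the masks: every one of the 4352 ranges grows to full size.
    let ranges = 0x110000 / 256;
    assert_eq!(ranges, 4352);
    let worst_case = bytes_per_full_bitset * ranges;
    assert_eq!(worst_case, 606_076_928); // ~600 MB

    // Before widening the max codepoint past 0xFFFF: 256 ranges of up to 2048 u32.
    let old_worst_case = 2048 * 4 * 256;
    assert_eq!(old_worst_case, 2_097_152); // ~2 MB
}
```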
one thing we could try without any other changes:

```diff
 fn get_available_codepoints(face: &mut Face) -> Option<GetGlyphInfo> {
-    let mut codepoints = BitSet::with_capacity(MAX_UNICODE_CP);
+    let mut codepoints = BitSet::new(); // dynamic memory growth
     ...
     for cp in 0..=MAX_UNICODE_CP {
         ...
+        if count >= face.num_glyphs() { break; } // we know we won't find any more glyphs in this font
     }
 }
```
P.S. Oh, silly me -- we no longer need to even do that -- I added character iteration to the freetype lib 2 years ago -- PistonDevelopers/freetype-rs#257 so should just be able to call …
So maybe no masks? I was wondering how expensive it would be to iterate vs difference the set in the first place
freetype-rs doesn't support iteration from arbitrary location (PRs welcome i guess), so i think it might still be useful to have a bitmask for perf reasons, esp since it is much simpler now. Since you ran all the memory usage tests, could you try this implementation?

```rust
fn get_available_codepoints(face: &mut Face) -> Option<GetGlyphInfo> {
    let mut codepoints = BitSet::new();
    let mut spans = Vec::new();
    let mut first: Option<usize> = None;
    let mut last = 0;
    for (cp, _) in face.chars() {
        codepoints.insert(cp);
        if let Some(start) = first {
            if cp != last + 1 {
                spans.push((start, last));
                first = Some(cp);
            }
        } else {
            first = Some(cp);
        }
        last = cp;
    }
    if let Some(first) = first {
        spans.push((first, last));
        let start = spans[0].0;
        let end = spans[spans.len() - 1].1;
        Some((codepoints, usize::try_from(face.num_glyphs()).unwrap(), spans, start, end))
    } else {
        None
    }
}
```
(didn't test, but had to update the above a few times)
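The span-building part of that sketch can be exercised without freetype by feeding it a plain sorted sequence of codepoints (the `spans_of` helper is mine, extracted for illustration; the real code iterates `face.chars()`):

```rust
use std::collections::BTreeSet;

/// Collapse a sorted codepoint iterator into contiguous (start, end) spans,
/// mirroring the span logic from the `get_available_codepoints` sketch.
fn spans_of(cps: impl IntoIterator<Item = usize>) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut first: Option<usize> = None;
    let mut last = 0;
    for cp in cps {
        if let Some(start) = first {
            if cp != last + 1 {
                // Gap found: close the current span and start a new one.
                spans.push((start, last));
                first = Some(cp);
            }
        } else {
            first = Some(cp);
        }
        last = cp;
    }
    if let Some(start) = first {
        spans.push((start, last)); // close the final (possibly single-cp) span
    }
    spans
}

fn main() {
    // ASCII letters plus one lone high codepoint form three spans.
    let cps: BTreeSet<usize> = (0x41..=0x5A)
        .chain(0x61..=0x7A)
        .chain([0x10FFFF])
        .collect();
    assert_eq!(
        spans_of(cps),
        vec![(0x41, 0x5A), (0x61, 0x7A), (0x10FFFF, 0x10FFFF)]
    );
    assert!(spans_of(std::iter::empty::<usize>()).is_empty());
}
```

This also makes it easy to check the edge case mentioned below (a lone codepoint at the end of the range still closes its span).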
interestingly enough, my original implementation seemed to have had a bug - if a font file had a codepoint at the …
(note to self - github editor is NOT that great for ... editing code)

@Auspicus have you had a chance to memory profile it?
```diff
  /// Maximum Unicode codepoint range ID.
  ///
- /// Each range is 256 codepoints long, so the highest range ID is 0xFFFF / 256 = 255.
+ /// Each range is 256 codepoints long, so the highest range ID is 0x10FFFF / 256 = 255.
```
Should fix this number which is now 0x10FFFF/256 = 4352 (rounded up)
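A quick check of that integer arithmetic: with integer division the highest range ID is 4351, and the number of ranges is one more:

```rust
fn main() {
    const MAX_UNICODE_CP: usize = 0x10FFFF;
    const CP_RANGE_SIZE: usize = 256;

    // Integer division gives the highest range ID...
    let max_range_id = MAX_UNICODE_CP / CP_RANGE_SIZE;
    assert_eq!(max_range_id, 4351); // 0x10FF

    // ...while the total number of 256-codepoint ranges is one more.
    assert_eq!(max_range_id + 1, 4352);
}
```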
I am just starting my workday now but I'll review again after work today. I will also post the dhat file in the viewer as it really highlights the BitSet allocations.
…an BTreeSet at full capacity), lower lambda timeout and memory usage back to explicit default
```diff
  /// Returns `None` if the font contains no usable glyphs.
  fn get_available_codepoints(face: &mut Face) -> Option<GetGlyphInfo> {
-     let mut codepoints = BitSet::with_capacity(MAX_UNICODE_CP);
+     let mut codepoints = BitSet::new();
```
Without explicit capacity there will be some required resizing, but it allows this BitSet to be as small as 8 u32 words (256 bits) when the font only provides codepoints within the range 0-255. If the font provides even a single codepoint at the end of the range this will need to be resized to 139KB. So we're trading off a small amount of start up time here for a lower memory footprint per font.
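That tradeoff can be sketched with a minimal growable bitset (a hypothetical stand-in for the crate-backed BitSet, not the actual implementation):

```rust
/// Minimal growable bitset over u32 words, grown on demand
/// (hypothetical stand-in for the crate-backed BitSet).
struct GrowBitSet {
    words: Vec<u32>,
}

impl GrowBitSet {
    fn new() -> Self {
        GrowBitSet { words: Vec::new() }
    }

    fn insert(&mut self, bit: usize) {
        let word = bit / 32;
        if word >= self.words.len() {
            self.words.resize(word + 1, 0); // reallocation on growth
        }
        self.words[word] |= 1 << (bit % 32);
    }

    fn backing_bytes(&self) -> usize {
        self.words.len() * 4
    }
}

fn main() {
    let mut set = GrowBitSet::new();
    set.insert(0xFF); // a font covering only 0-255 stays tiny
    assert_eq!(set.backing_bytes(), 32);
    set.insert(0x10FFFF); // one high codepoint forces growth to ~139 KB
    assert_eq!(set.backing_bytes(), 139_264);
}
```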
Before / After: (memory profile screenshots)
I'm happy with where this is at now, let me know if you think we need to change anything @CommanderStorm / @nyurik
oops, we should have added to the subject / changelog that this also fixes incorrect font behavior - now the font glyph count will be reported correctly |
## 🤖 New release

* `martin-tile-utils`: 0.6.8 -> 0.6.9 (✓ API compatible changes)
* `mbtiles`: 0.15.0 -> 0.15.1 (✓ API compatible changes)
* `martin-core`: 0.2.5 -> 0.2.6 (✓ API compatible changes)
* `martin`: 1.2.0 -> 1.3.0

<details><summary><i><b>Changelog</b></i></summary><p>

## `mbtiles`
<blockquote>

## [0.15.1](mbtiles-v0.15.0...mbtiles-v0.15.1) - 2026-01-27

### Added
- add MLT decoding support ([#2512](#2512))
- migrate our log library to tracing ([#2494](#2494))

### Other
- unignore `diff_and_patch_bsdiff` test with unique SQLite database names ([#2480](#2480))
- *(mbtiles)* remove the prefix-ism around how files are named for binary diff copy and simpify their naming ([#2478](#2478))
- *(mbtiles)* add assertion messages what we are checking to the copy tests ([#2477](#2477))
</blockquote>

## `martin-core`
<blockquote>

## [0.2.6](martin-core-v0.2.5...martin-core-v0.2.6) - 2026-01-27

### Added
- migrate our log library to tracing ([#2494](#2494))
- *(martin-core)* Allow glyph ranges more than 0xFFFF ([#2438](#2438))

### Fixed
- *(perf)* Remove FontSources.masks as they were consuming large amounts of memory even when no font sources were set ([#2519](#2519))
- improve error message if no SVG sprite files are present ([#2516](#2516))

### Other
- move our imports to tracing ([#2500](#2500))
- *(deps)* shear our dependencys ([#2497](#2497))
</blockquote>

## `martin`
<blockquote>

## [1.3.0](martin-v1.2.0...martin-v1.3.0) - 2026-01-27

### Added
- *(srv)* Add `route_prefix` configuration for native subpath support without the need of a reverse proxy override ([#2523](#2523))
- add MLT decoding support ([#2512](#2512))
- migrate our log library to tracing ([#2494](#2494))
- improve martin-cp progress output time estimate ([#2491](#2491))
- *(pg)* include ID column info for tables ([#2485](#2485))
- *(pg)* support PostgreSQL materialized views ([#2279](#2279))
- *(martin-core)* Allow glyph ranges more than 0xFFFF ([#2438](#2438))

### Fixed
- *(ui)* clipboard copy for http://0.0.0.0:3000 and unify implementations ([#2487](#2487))
- the `Copy` icon displaying nicely, next to the text and with enough padding ot all items ([#2483](#2483))
- update copy text to include icon for better visibility ([#2482](#2482))
- *(perf)* Remove FontSources.masks as they were consuming large amounts of memory even when no font sources were set ([#2519](#2519))
- improve error message if no SVG sprite files are present ([#2516](#2516))

### Other
- move our request logging to tracing ([#2508](#2508))
- move our imports to tracing ([#2500](#2500))
- *(deps)* shear our dependencys ([#2497](#2497))
- *(ui)* adjust margin for copy icon in URL component ([#2489](#2489))
- unignore `diff_and_patch_bsdiff` test with unique SQLite database names ([#2480](#2480))
- *(mbtiles)* remove the prefix-ism around how files are named for binary diff copy and simpify their naming ([#2478](#2478))
- *(mbtiles)* add assertion messages what we are checking to the copy tests ([#2477](#2477))
</blockquote>
</p></details>

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>


fixes #2518
This drops the RSS value after the bitsets are computed. Unfortunately the bitset computation still spikes the memory usage quite high.
UPDATE: Swapping out BitSet for BTreeSet results in much lower peak memory usage after startup. I don't think BitSet is doing what we want here.
UPDATE: Replaced BTreeSet back with BitSet for the font codepoints for lower peak memory usage for a font that provides every available codepoint.