Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 39 additions & 4 deletions src/lib/vm/objects/string_.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,12 @@ use crate::vm::{
#[derive(Debug)]
pub struct String_ {
cls: Cell<Val>,
/// The number of Unicode characters in this string. This is initialised lazily: usize::MAX is
/// the marker that means both `this string contains usize::MAX characters` and `we don't know
/// how many characters this string has`. The former condition is rather unlikely, so we accept
/// that if anyone ever manages to make a string of usize::MAX characters, we won't cache its
/// length correctly, but will recalculate it each time.
chars_len: Cell<usize>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use 0 as a marker. This way we don't need to recalculate the length if someone manages to hit the worst case. Or is the call to self.s.chars().count() on an empty string that much more expensive than self.chars_len.get()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use 0 as a marker, it means that for 0 length strings, we still do a call of chars().count(). This seems a pretty common case to me. With usize::MAX, you'd need a string of 18014398509481984KiB or 17592186044416MiB or 17179869184GiB or 16777216TiB or 16384PiB or 16EiB to hit the worst case. That's a big string so I'm OK with being a bit slow in such cases since it's roughly 6-7 orders of magnitude bigger than any machine's RAM :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, you win. Though I do feel for whoever does hit that string someday as they'll have to wait a loooong time for their length function to return. ;-)

Out of curiosity though, how much more expensive is the call to self.s.chars().count() on an empty string, compared to the call to self.chars_len.get()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cell is a zero-cost wrapper that means "this value might have been mutated". Therefore self.chars_len.get() should translate, at worst, to a pure load from memory (but in some cases, with inlining, the value can probably be even left in a register).

s: SmartString,
}

Expand All @@ -42,6 +48,7 @@ impl Obj for String_ {
vm,
String_ {
cls: self.cls.clone(),
chars_len: Cell::new(usize::MAX),
s: self.s.clone(),
},
))
Expand All @@ -54,7 +61,12 @@ impl Obj for String_ {
}

fn length(self: Gc<Self>) -> usize {
self.s.chars().count()
if self.chars_len.get() < usize::MAX {
return self.chars_len.get();
}
let chars_len = self.s.chars().count();
self.chars_len.set(chars_len);
chars_len
}

fn ref_equals(self: Gc<Self>, vm: &mut VM, other: Val) -> Result<Val, Box<VMError>> {
Expand All @@ -80,10 +92,17 @@ impl StaticObjType for String_ {

impl String_ {
pub fn new_str(vm: &mut VM, s: SmartString) -> Val {
String_::new_str_chars_len(vm, s, usize::MAX)
}

/// Create a new `String_` whose number of Unicode characters is already known. Note that it is
/// safe to pass `usize::MAX` for `chars_len`.
fn new_str_chars_len(vm: &mut VM, s: SmartString, chars_len: usize) -> Val {
Val::from_obj(
vm,
String_ {
cls: Cell::new(vm.str_cls),
chars_len: Cell::new(chars_len),
s,
},
)
Expand All @@ -94,6 +113,7 @@ impl String_ {
vm,
String_ {
cls: Cell::new(vm.sym_cls),
chars_len: Cell::new(usize::MAX),
s,
},
)
Expand Down Expand Up @@ -136,7 +156,18 @@ impl String_ {
let mut new = String::new();
new.push_str(&self.s);
new.push_str(&other_str.s);
Ok(String_::new_str(vm, new.into()))
if self.chars_len.get() != usize::MAX && other_str.chars_len.get() != usize::MAX {
// If both strings have had their number of characters initialised then we can
// initialise the new string with its `chars_len` eagerly initialised.
let chars_len = self
.chars_len
.get()
.checked_add(other_str.chars_len.get())
.unwrap();
Ok(String_::new_str_chars_len(vm, new.into(), chars_len))
} else {
Ok(String_::new_str(vm, new.into()))
}
}

pub fn substring(&self, vm: &mut VM, start: usize, end: usize) -> Result<Val, Box<VMError>> {
Expand All @@ -156,11 +187,15 @@ impl String_ {
.take(end - start + 1)
.collect::<String>()
.into();
Ok(String_::new_str(vm, substr))
Ok(String_::new_str_chars_len(vm, substr, end - start))
}

pub fn to_string_(&self, vm: &mut VM) -> Result<Val, Box<VMError>> {
Ok(String_::new_str(vm, self.s.clone()))
Ok(String_::new_str_chars_len(
vm,
self.s.clone(),
self.chars_len.get(),
))
}

pub fn to_symbol(&self, vm: &mut VM) -> Result<Val, Box<VMError>> {
Expand Down