-
Thanks, @xunilrj, for starting this discussion. Doing some real-world measurements is certainly the right start. I want to briefly sum up the main concerns I had with the current approach and outline an alternative storage layout.

Extensibility

One consideration I want to add is the flexibility of the design. For example, what would it mean if we start distinguishing between single-line and multi-line comments, or if we introduce a new-line trivia? Introducing a new-line trivia would reduce the cases where we can use the optimized representation (tools/crates/rome_rowan/src/green/token.rs, lines 17 to 24 in 9b74b52).

Familiarity

The other thing I would consider is how familiar a design is for people working on [...]

Total memory consumption

Rowan sacrifices some performance in favour of total memory consumption by trying to re-use data structures like nodes and tokens. This is still true, but we have reduced how much can be cached: tokens now also include their leading and trailing trivia, which makes them less likely to be cached, and we don't cache the trivia on its own.

Alternative storage layout

An alternative approach would be to use a layout along the following lines:

#[repr(u8)]
#[derive(Debug, Clone, Copy)]
pub enum GreenTriviaPieceKind {
    Whitespace,
    NewLine,
    Comment,
}

#[derive(Debug, Clone)]
pub struct GreenTriviaPiece {
    width: usize,
    kind: GreenTriviaPieceKind,
}

#[derive(Debug, Clone)]
pub struct GreenTriviaHead {
    // full width of all trivia it contains so that we don't need to iterate over all trivia
    width: usize,
}

#[derive(Clone)]
pub struct GreenTrivia {
    ptr: ThinArc<GreenTriviaHead, GreenTriviaPiece>,
}

impl GreenTrivia {
    fn header(&self) -> &GreenTriviaHead {
        &self.ptr.header
    }

    pub fn width(&self) -> usize {
        self.header().width
    }

    pub fn slice(&self) -> &[GreenTriviaPiece] {
        &self.ptr.slice()
    }
}

impl std::fmt::Debug for GreenTrivia {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("GreenTrivia")
            .field("width", &self.width())
            .field("n_trivia", &self.slice().len())
            .finish()
    }
}

#[repr(u16)]
pub enum SyntaxKind {
    LIST,
    WHITESPACE,
    COMMENT
}

pub struct GreenToken {}

pub struct GreenTokenHead {
    kind: SyntaxKind,
    leading_trivia: Option<GreenTrivia>,
    trailing_trivia: Option<GreenTrivia>,
}

fn main() {
    println!("GreenTriviaPiece: {}", std::mem::size_of::<GreenTriviaPiece>());
    println!("GreenTrivia: {}", std::mem::size_of::<GreenTrivia>());
    println!("GreenTokenHead: {}", std::mem::size_of::<GreenTokenHead>());
}
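
The layout above depends on `ThinArc`, which is not in the standard library, so it cannot be run on its own. As a rough, self-contained approximation, the sketch below swaps in `std::sync::Arc` over a sized struct. This is an assumption made purely for illustration: the names `Piece`, `TriviaData`, `Trivia` and `count` are hypothetical, and unlike `ThinArc` this version needs a second allocation for the pieces. It shows the two properties the proposal relies on: the precomputed total width, and trivia that can be shared between tokens through a handle that is one pointer wide.

```rust
use std::mem::size_of;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PieceKind {
    Whitespace,
    NewLine,
    Comment,
}

#[derive(Debug, Clone, Copy)]
struct Piece {
    width: usize,
    kind: PieceKind,
}

/// Shared payload: total width plus the individual pieces.
#[derive(Debug)]
struct TriviaData {
    width: usize,
    pieces: Box<[Piece]>,
}

/// Stand-in for the `ThinArc`-backed `GreenTrivia` (assumption: std `Arc`
/// instead of `ThinArc`). The handle stays one pointer wide, at the cost of
/// a second allocation for `pieces` that `ThinArc` would avoid.
#[derive(Debug, Clone)]
struct Trivia(Arc<TriviaData>);

impl Trivia {
    fn new(pieces: Vec<Piece>) -> Self {
        // Precompute the full width so readers never iterate the pieces.
        let width: usize = pieces.iter().map(|p| p.width).sum();
        Trivia(Arc::new(TriviaData {
            width,
            pieces: pieces.into_boxed_slice(),
        }))
    }

    fn width(&self) -> usize {
        self.0.width
    }

    fn pieces(&self) -> &[Piece] {
        &self.0.pieces
    }

    /// Number of pieces of the given kind.
    fn count(&self, kind: PieceKind) -> usize {
        self.0.pieces.iter().filter(|p| p.kind == kind).count()
    }
}

fn main() {
    let indent = Trivia::new(vec![
        Piece { width: 1, kind: PieceKind::NewLine },
        Piece { width: 4, kind: PieceKind::Whitespace },
    ]);
    let comment = Trivia::new(vec![Piece { width: 9, kind: PieceKind::Comment }]);
    // Cloning shares the allocation, so two tokens can point at the same trivia.
    let shared = indent.clone();
    println!("indent width: {}, pieces: {}", indent.width(), indent.pieces().len());
    println!("new lines in indent: {}", indent.count(PieceKind::NewLine));
    println!("comment width: {}", comment.width());
    println!("same allocation: {}", Arc::ptr_eq(&indent.0, &shared.0));
    println!("Option<Trivia>: {} bytes", size_of::<Option<Trivia>>());
}
```

Because the handle is a single non-null pointer, wrapping it in `Option` for tokens without trivia should not add any space in practice, which is what makes the `leading_trivia` / `trailing_trivia` fields cheap.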
-
@xunilrj what are your plans around trivia?
-
Issue #1720 contains all the details of the decisions taken when we migrated the trivia to be attached to tokens.
Now we need to understand the performance impact and decide whether, and how, to improve it.
Today each token contains all its trivia. A statement like "\tlet a = 0;" is tokenized as: "[\tlet ][a ][= ][0][;]".
This means that "\tlet ", together with its SyntaxKind, is the key to the green cache.
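
To make the consequence for caching concrete, here is a minimal sketch of a token cache keyed by the kind plus the full text including trivia. It is a toy model, not Rowan's actual cache: `Kind`, `TokenCache` and `indentation_demo` are hypothetical names, and a plain `HashMap` stands in for whatever the real implementation uses. The point is only that the same keyword at a different indentation produces a different key.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical token kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Kind {
    LetKw,
    Ident,
}

/// Toy cache (not Rowan's): one shared allocation per distinct
/// (kind, text including trivia) pair.
#[derive(Default)]
struct TokenCache {
    tokens: HashMap<(Kind, String), Rc<str>>,
    hits: usize,
    misses: usize,
}

impl TokenCache {
    fn token(&mut self, kind: Kind, text_with_trivia: &str) -> Rc<str> {
        let key = (kind, text_with_trivia.to_string());
        if let Some(existing) = self.tokens.get(&key).cloned() {
            self.hits += 1;
            return existing;
        }
        self.misses += 1;
        let token: Rc<str> = Rc::from(text_with_trivia);
        self.tokens.insert(key, Rc::clone(&token));
        token
    }
}

/// Interns a few tokens and returns (hits, misses).
fn indentation_demo() -> (usize, usize) {
    let mut cache = TokenCache::default();
    cache.token(Kind::LetKw, "\tlet "); // miss: first occurrence
    cache.token(Kind::LetKw, "\tlet "); // hit: same kind, same trivia
    cache.token(Kind::LetKw, "\t\tlet "); // miss: deeper indentation
    cache.token(Kind::Ident, "a "); // miss: different kind and text
    (cache.hits, cache.misses)
}

fn main() {
    let (hits, misses) = indentation_demo();
    println!("hits: {hits}, misses: {misses}");
}
```

In this model every distinct indentation level of `let` occupies its own cache entry, which is the effect the questions below are trying to quantify.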
The improvements we are aiming for are:
1 - Can we use less memory?
2 - Can we be faster?
3 - Can we cache more?
But first we need to understand the following:
1 - GreenToken is 32 bytes longer than before. Has this increased the memory consumption?
2 - [...] Box<Vec<...>>. Is this double indirection creating performance problems?
3 - [...] GreenNode?
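
As a starting point for the double-indirection question, the sketch below contrasts `Box<Vec<T>>` with `Box<[T]>`. It assumes, without confirmation from the thread, that the `Box<Vec<...>>` in question holds a token's trivia pieces; `Piece` and the helper functions are hypothetical names. It only illustrates the shape of the two options and does not measure anything.

```rust
use std::mem::size_of;

/// Hypothetical trivia piece, for illustration only.
#[derive(Debug, Clone, Copy)]
struct Piece {
    width: usize,
}

/// Two hops to reach the data: Box pointer -> Vec header -> element buffer.
fn total_width_boxed_vec(pieces: &Box<Vec<Piece>>) -> usize {
    pieces.iter().map(|p| p.width).sum()
}

/// One hop: the boxed slice points straight at the element buffer.
fn total_width_boxed_slice(pieces: &Box<[Piece]>) -> usize {
    pieces.iter().map(|p| p.width).sum()
}

/// Handle sizes in bytes: (Box<Vec<Piece>>, Box<[Piece]>).
fn handle_sizes() -> (usize, usize) {
    (size_of::<Box<Vec<Piece>>>(), size_of::<Box<[Piece]>>())
}

fn main() {
    let pieces = vec![Piece { width: 1 }, Piece { width: 4 }];
    let nested: Box<Vec<Piece>> = Box::new(pieces.clone());
    let flat: Box<[Piece]> = pieces.into_boxed_slice();
    println!(
        "widths: {} {}",
        total_width_boxed_vec(&nested),
        total_width_boxed_slice(&flat)
    );
    println!("handle sizes: {:?}", handle_sizes());
}
```

The trade-off is that the boxed slice removes one pointer hop and one allocation but makes the handle two pointers wide instead of one. Whether that matters for tokens is the kind of thing a benchmark on real sources would need to settle.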