This repository has been archived by the owner on Aug 31, 2023. It is now read-only.
Semantic IR #1812
jamiebuilds started this conversation in General
Thanks, @jamiebuilds, for this excellent write-up. Regarding the goals:
Semantic IR
For Rome to fill the role of a complete language toolchain for JavaScript/TypeScript, from compiler to IDE, we'll likely need several semantic models of the language, used in different places.
Right now we only have lexical (tokens) and syntactic (CST) intermediate representations of JS/TS code. But there are a number of additional intermediate representations that we'll likely want to explore:
For now, I just want to focus on the Semantic IR (since it is the next thing in front of us to build).
Motivation
Because we don't have a semantic model today, if we attempted to build all the different IDE/compiler features at once, we'd end up repeating a lot of the logic for translating syntax into semantics. That opens us up to more bugs and maintenance work.
It would also mean that some of our semantic queries would be harder to cache, because they'd directly depend on our CST/AST, which gets invalidated on every keystroke because it contains lexical information (tokens and source locations).
By introducing a Semantic IR, we can build our semantic queries without being concerned with syntax. This will simplify their implementations, and make them more cacheable across keystrokes.
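To make the caching argument concrete, here is a minimal sketch in Rust. Every name in it (`SyntaxFunction`, `SemanticFunction`, `lower`, `fingerprint`) is hypothetical and not part of Rome; it only assumes that lowering drops source positions, so an edit that merely shifts a declaration leaves the lowered value, and therefore any cache key derived from it, unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical syntax-level node: it carries lexical detail (byte offsets),
/// so it changes whenever text before it is edited.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SyntaxFunction {
    pub name: String,
    pub params: Vec<String>,
    pub start: usize,
    pub end: usize,
}

/// Hypothetical semantic item: only the facts semantic queries care about.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SemanticFunction {
    pub name: String,
    pub params: Vec<String>,
}

/// Lowering discards the source positions and keeps the meaning.
pub fn lower(syntax: &SyntaxFunction) -> SemanticFunction {
    SemanticFunction {
        name: syntax.name.clone(),
        params: syntax.params.clone(),
    }
}

/// A stand-in for a cache key: queries keyed on this value can be reused
/// for as long as the value they depend on hashes the same.
pub fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

If a keystroke inserts a blank line above a function, its `SyntaxFunction` changes (new offsets) while `lower` still returns an equal `SemanticFunction`, so a query keyed on `fingerprint(&lower(..))` would not need to be recomputed.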
Requirements
I'm not going to jump into a design right away; instead, I'd like to consider the requirements we want to set for the Semantic IR.
These requirements are largely just intuition, and likely have tradeoffs with one another, so please poke any holes in this by considering how other parts of the tooling would create/make use of the Semantic IR.
That said, these are the goals I've come up with:
Some explicit non-goals:
Other details:
Inspiration
Rust Analyzer HIR
Rust Analyzer has a High-Level Intermediate Representation (HIR) that aims to fulfill similar needs. However, a significant part of it is designed around expanding Rust macros, which we don't need for JS/TS.
At its core, RA's HIR breaks down a source file into an ItemTree with all of the semantic elements broken out into indexed ECS-like (Entity-Component-System) arenas. The program is further broken down into function bodies, blocks, statements, and expressions, all using the same indexed arenas.
Some relevant bits of code:
ItemTreeData [Source]
Function [Source]
Body [Source]
Expr [Source]
Statement [Source]
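For readers who have not seen the indexed-arena pattern, the following is a simplified sketch of the idea in Rust. It is not Rust Analyzer's actual code: `Module`, `ExprId`, `FunctionId`, and the shape of `Expr` and `Function` are all assumptions made for illustration. The point is only that items refer to each other by small typed indexes into flat vectors rather than by pointers or syntax nodes.

```rust
/// Typed indexes (hypothetical names): cheap to copy, compare, and hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ExprId(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FunctionId(u32);

/// Expressions point at their children by index, not by reference.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Name(String),
    Binary { op: char, lhs: ExprId, rhs: ExprId },
    Call { callee: ExprId, args: Vec<ExprId> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Function {
    pub name: String,
    pub params: Vec<String>,
    /// Simplification for this sketch: the body is a single expression.
    pub body: ExprId,
}

/// One arena per kind of semantic element.
#[derive(Debug, Default)]
pub struct Module {
    exprs: Vec<Expr>,
    functions: Vec<Function>,
}

impl Module {
    /// Stores an expression and returns the index it can be found at.
    pub fn alloc_expr(&mut self, expr: Expr) -> ExprId {
        let id = ExprId(self.exprs.len() as u32);
        self.exprs.push(expr);
        id
    }

    /// Stores a function and returns the index it can be found at.
    pub fn alloc_function(&mut self, function: Function) -> FunctionId {
        let id = FunctionId(self.functions.len() as u32);
        self.functions.push(function);
        id
    }

    pub fn expr(&self, id: ExprId) -> &Expr {
        &self.exprs[id.0 as usize]
    }

    pub fn function(&self, id: FunctionId) -> &Function {
        &self.functions[id.0 as usize]
    }
}
```

Because the arenas hold no tokens or offsets, the same `Module` value describes a function regardless of how its source is formatted.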
I think there's a lot to like about this approach. The entity-component-system model allows for very rich data structures, and breaking the indexes down seems to make it easier to incrementally build.
The one major concern I have is that looking up syntax from one of the semantic items is slow: it requires iterating over all the HIR expressions and comparing their positions. However, we might be able to do something more optimized, since we don't have to do the macro expansion or nearly as much lowering as Rust Analyzer needs to.
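One possible direction, sketched here purely as an assumption and not as a description of how Rust Analyzer or Rome works: keep source ranges in a side table indexed by the same IDs as the semantic arena. The hypothetical `SourceMap` below makes the semantic-to-syntax direction a direct index lookup, while the position-to-item direction remains a scan in this naive version.

```rust
use std::ops::Range;

/// Hypothetical typed index shared by the semantic arena and the side table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExprId(u32);

/// Side table kept apart from the semantic data, so that shifting positions
/// invalidate only this table and not the semantic items themselves.
#[derive(Debug, Default)]
pub struct SourceMap {
    /// `ranges[i]` is the byte range of the expression with index `i`.
    ranges: Vec<Range<usize>>,
}

impl SourceMap {
    /// Records the range of the next expression and returns its index.
    pub fn record(&mut self, range: Range<usize>) -> ExprId {
        let id = ExprId(self.ranges.len() as u32);
        self.ranges.push(range);
        id
    }

    /// Semantic item to syntax: a single index operation, no iteration.
    pub fn range_of(&self, id: ExprId) -> Option<Range<usize>> {
        self.ranges.get(id.0 as usize).cloned()
    }

    /// Syntax to semantic item: scans every range and picks the smallest
    /// one that contains `offset` (the innermost expression).
    pub fn expr_at(&self, offset: usize) -> Option<ExprId> {
        self.ranges
            .iter()
            .enumerate()
            .filter(|(_, range)| range.contains(&offset))
            .min_by_key(|(_, range)| range.end - range.start)
            .map(|(index, _)| ExprId(index as u32))
    }
}
```

The scan in `expr_at` could be replaced by a sorted structure if it shows up in profiles; the sketch keeps it linear for brevity.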
One nice bit of this system is that you can implement fairly universal logic to traverse the data structure and get the control flow (and other semantic information) for free.
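As a rough illustration of what "universal" traversal logic could look like over an indexed arena, here is a small Rust sketch. The `Expr` variants, `walk`, and `referenced_names` are hypothetical and far simpler than a real JS/TS model. It does not compute control flow, but it shows how one generic walk can serve many queries.

```rust
/// Hypothetical typed index into a flat slice of expressions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExprId(pub usize);

#[derive(Debug)]
pub enum Expr {
    Name(String),
    Number(f64),
    Binary { lhs: ExprId, rhs: ExprId },
    If { condition: ExprId, then_branch: ExprId, else_branch: Option<ExprId> },
}

/// Generic pre-order walk: the only place that knows how each variant
/// nests. Every query below reuses it instead of re-matching on `Expr`.
pub fn walk(exprs: &[Expr], root: ExprId, visit: &mut dyn FnMut(ExprId, &Expr)) {
    let expr = &exprs[root.0];
    visit(root, expr);
    match expr {
        Expr::Name(_) | Expr::Number(_) => {}
        Expr::Binary { lhs, rhs } => {
            walk(exprs, *lhs, visit);
            walk(exprs, *rhs, visit);
        }
        Expr::If { condition, then_branch, else_branch } => {
            walk(exprs, *condition, visit);
            walk(exprs, *then_branch, visit);
            if let Some(else_branch) = else_branch {
                walk(exprs, *else_branch, visit);
            }
        }
    }
}

/// Example query built on the walk: names referenced under `root`,
/// in evaluation order.
pub fn referenced_names(exprs: &[Expr], root: ExprId) -> Vec<String> {
    let mut names = Vec::new();
    walk(exprs, root, &mut |_, expr| {
        if let Expr::Name(name) = expr {
            names.push(name.clone());
        }
    });
    names
}
```

A control-flow builder would be another consumer of the same `walk`, visiting the `If` arms as separate branches.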
The ECS model is also known for enabling highly parallel systems.
What now?
I'd like us all to keep adding to this exploration and refining the requirements. After that, we can start doing design and experimentation work.