This repository has been archived by the owner on Aug 31, 2023. It is now read-only.
//! Extremely fast, lossless, and error-tolerant JavaScript parser.
//!
//! The parser uses an abstraction over non-whitespace tokens.
//! This allows us to parse code losslessly or lossily without requiring explicit handling of whitespace.
//! The parser yields events, not an AST. The events are resolved into untyped syntax nodes, which can then
//! be cast into a typed AST.
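/*!

The event model described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the crate's implementation: the
`Event`, `Tree`, and `build_tree` names here are hypothetical stand-ins, and
the crate's real event and tree sink types carry more information.

```rust
// Hypothetical, simplified event type: the parser emits a flat stream of
// these instead of building a tree directly.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(&'static str),
    Token(&'static str),
    Finish,
}

// Hypothetical untyped tree that the events are resolved into.
#[derive(Debug, PartialEq)]
enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Token(&'static str),
}

// Resolves a flat event stream into a tree.
// Returns `None` if the stream is empty or not properly nested.
fn build_tree(events: &[Event]) -> Option<Tree> {
    // Nodes that have been started but not yet finished.
    let mut stack: Vec<(&'static str, Vec<Tree>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            Event::Start(kind) => stack.push((*kind, Vec::new())),
            // A token outside of any node is treated as malformed input.
            Event::Token(text) => stack.last_mut()?.1.push(Tree::Token(*text)),
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Tree::Node { kind, children };
                match stack.last_mut() {
                    Some(parent) => parent.1.push(node),
                    None => root = Some(node),
                }
            }
        }
    }
    if stack.is_empty() { root } else { None }
}
```

Keeping the parser's output flat like this is what lets a separate sink decide
how the final tree is built.
*/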
//!
//! The parser is able to produce a valid AST from **any** source code.
//! Erroneous productions are wrapped into `ERROR` syntax nodes, and the original source code
//! is completely represented in the final syntax nodes.
//!
//! You probably do not want to use the parser struct directly unless you want to parse fragments of JS source code or make your own productions.
//! Instead, use functions such as [`parse_text`] and [`parse_text_lossy`], which offer abstracted versions for parsing.
//!
//! Notable features of the parser are:
//! - Extremely fast parsing and lexing through [`rslint_lexer`].
//! - Ability to do lossy or lossless parsing on demand without explicit whitespace handling.
//! - Customizable, able to parse any fragment of JS code at your discretion.
//! - Completely error tolerant, able to produce an AST from any source code.
//! - Zero cost for converting untyped nodes to a typed AST.
//! - Ability to go from AST to SyntaxNodes to SyntaxTokens to source code and back very easily with nearly zero cost.
//! - Very easy tree traversal through [`SyntaxNode`](rome_rowan::SyntaxNode).
//! - Descriptive errors with multiple labels and notes.
//! - Very cheap cloning: cloning an AST node or syntax node costs the same as adding a reference to an Rc.
//! - Cheap incremental reparsing of changed text.
//!
//! The crate further includes utilities such as:
//! - ANSI syntax highlighting of nodes (through [`util`]) or text through [`rslint_lexer`].
//! - Rich utility functions for syntax nodes through [`SyntaxNodeExt`].
//!
//! It is inspired by the rust-analyzer parser but adapted for JavaScript.
//!
//! # Syntax Nodes vs AST Nodes
//! The crate relies on a concept of untyped [`SyntaxNode`]s vs typed [`AstNode`]s.
//! Syntax nodes represent the syntax tree in an untyped way. They represent a location in an immutable
//! tree with two pointers. The syntax tree is composed of [`SyntaxNode`]s and [`SyntaxToken`]s in a nested
//! tree structure. Each node can have parents, siblings, children, descendants, etc.
//!
//! [`AstNode`]s represent a typed version of a syntax node. They have the exact same representation as syntax nodes,
//! so converting between the two has zero runtime cost. Every piece of data of an AST node is optional
//! because the parser is completely error tolerant.
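/*!

As a rough illustration of the relationship between the two kinds of node,
here is a minimal sketch. It assumes a toy `SyntaxNode` backed by an `Rc`; the
crate's real nodes come from `rome_rowan`, and every type and function below
(`SyntaxKind`, `NodeData`, `node`, `CallExpr`) is a hypothetical stand-in.

```rust
use std::rc::Rc;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SyntaxKind {
    CallExpr,
    Name,
}

#[derive(Debug)]
struct NodeData {
    kind: SyntaxKind,
    text: String,
    children: Vec<SyntaxNode>,
}

// Untyped node: cloning it only bumps the reference count.
#[derive(Debug, Clone)]
struct SyntaxNode {
    data: Rc<NodeData>,
}

// Hypothetical helper for building toy nodes.
fn node(kind: SyntaxKind, text: &str, children: Vec<SyntaxNode>) -> SyntaxNode {
    SyntaxNode {
        data: Rc::new(NodeData { kind, text: text.to_string(), children }),
    }
}

// Typed node: a newtype over the untyped node, so casting only checks the kind.
#[derive(Debug, Clone)]
struct CallExpr(SyntaxNode);

impl CallExpr {
    fn cast(node: SyntaxNode) -> Option<Self> {
        if node.data.kind == SyntaxKind::CallExpr {
            Some(CallExpr(node))
        } else {
            None
        }
    }

    // Going back to the untyped node is a field access.
    fn syntax(&self) -> &SyntaxNode {
        &self.0
    }

    // Optional, because an error-tolerant parser may produce a call without a callee.
    fn callee(&self) -> Option<&SyntaxNode> {
        self.0.data.children.iter().find(|child| child.data.kind == SyntaxKind::Name)
    }
}
```

Accessors such as `callee` return `Option` so that callers are forced to
handle incomplete productions.
*/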
//!
//! Each representation has its advantages:
//!
//! ### SyntaxNodes
//! - Very simple traversal of the syntax tree through functions on them.
//! - Easy conversion to the underlying text, range, or tokens.
//! - Contain all whitespace bound to the underlying production (in the case of lossless parsing).
//! - Can be easily converted into their typed representation with zero cost.
//! - Can be turned into a pretty representation with `fmt::Debug`.
//!
//! ### AST Nodes
//! - Easy access to properties of the underlying production.
//! - Zero cost conversion to a syntax node.
//!
//! In conclusion, the use of both representations means we are not constrained to acting through
//! typed nodes, which makes traversal hard and often forces you to resort to autogenerated visitor patterns.
//! AST nodes are simply a way to easily access subproperties of a syntax node.
mod parser;
#[macro_use]
mod token_set;
mod event;
mod lossless_tree_sink;
mod lossy_tree_sink;
mod numbers;
mod parse;
mod state;
mod syntax_node;
mod token_source;
#[cfg(test)]
mod tests;
#[macro_use]
pub mod ast;
pub mod syntax;
pub mod util;
pub use crate::{
    ast::{AstNode, AstToken},
    event::{process, Event},
    lossless_tree_sink::LosslessTreeSink,
    lossy_tree_sink::LossyTreeSink,
    numbers::{parse_js_num, BigInt, JsNum},
    parse::*,
    parser::{Checkpoint, CompletedMarker, Marker, Parser},
    state::{ParserState, StrictMode},
    syntax_node::*,
    token_set::TokenSet,
    token_source::TokenSource,
    util::{SyntaxNodeExt, SyntaxTokenExt},
};
pub use rome_rowan::{SyntaxText, TextRange, TextSize, WalkEvent};
pub use rslint_syntax::*;
/// The type of error emitted by the parser. This includes warnings, notes, and errors.
/// It also includes labels and possibly notes.
pub type ParserError = rslint_errors::Diagnostic;
use std::ops::Range;
/// Abstracted token for `TokenSource`
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
pub struct Token {
    /// What kind of token it is
    pub kind: SyntaxKind,
    /// The range (in byte indices) of the token
    pub range: Range<usize>,
    /// How long the token is
    pub len: TextSize,
}

impl From<Token> for Range<usize> {
    fn from(token: Token) -> Self {
        token.range
    }
}
/// An abstraction for syntax tree implementations
pub trait TreeSink {
    /// Adds new token to the current branch.
    fn token(&mut self, kind: SyntaxKind);

    /// Start new branch and make it current.
    fn start_node(&mut self, kind: SyntaxKind);

    /// Finish current branch and restore previous
    /// branch as current.
    fn finish_node(&mut self);

    /// Emit errors
    fn errors(&mut self, errors: Vec<ParserError>);

    /// Consume multiple tokens and glue them into one kind
    fn consume_multiple_tokens(&mut self, amount: u8, kind: SyntaxKind);
}
/// Matches a `SyntaxNode` against an `ast` type.
///
/// # Example:
///
/// ```ignore
/// match_ast! {
///     match node {
///         ast::CallExpr(it) => { ... },
///         ast::BlockStmt(it) => { ... },
///         ast::Script(it) => { ... },
///         _ => None,
///     }
/// }
/// ```
#[macro_export]
macro_rules! match_ast {
    (match $node:ident { $($tt:tt)* }) => { match_ast!(match ($node) { $($tt)* }) };

    (match ($node:expr) {
        $( ast::$ast:ident($it:ident) => $res:expr, )*
        _ => $catch_all:expr $(,)?
    }) => {{
        $( if let Some($it) = ast::$ast::cast($node.clone()) { $res } else )*
        { $catch_all }
    }};
}
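/*
To show how `match_ast!` expands into a chain of `cast` attempts, here is a
minimal self-contained sketch. It assumes a toy `ast` module: `Kind`,
`SyntaxNode`, the two wrapper types, and `describe` are hypothetical stand-ins
and not the crate's real types. The macro body is repeated from this file
without `#[macro_export]` so that the sketch compiles on its own.

```rust
// Hypothetical stand-in for the crate's `ast` module. The untyped node lives
// here too, only to keep the sketch in one place.
mod ast {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum Kind {
        CallExpr,
        BlockStmt,
        Script,
    }

    #[derive(Debug, Clone)]
    pub struct SyntaxNode {
        pub kind: Kind,
        pub text: String,
    }

    pub struct CallExpr(pub SyntaxNode);
    pub struct BlockStmt(pub SyntaxNode);

    impl CallExpr {
        pub fn cast(node: SyntaxNode) -> Option<Self> {
            if node.kind == Kind::CallExpr { Some(CallExpr(node)) } else { None }
        }
    }

    impl BlockStmt {
        pub fn cast(node: SyntaxNode) -> Option<Self> {
            if node.kind == Kind::BlockStmt { Some(BlockStmt(node)) } else { None }
        }
    }
}

macro_rules! match_ast {
    (match $node:ident { $($tt:tt)* }) => { match_ast!(match ($node) { $($tt)* }) };

    (match ($node:expr) {
        $( ast::$ast:ident($it:ident) => $res:expr, )*
        _ => $catch_all:expr $(,)?
    }) => {{
        $( if let Some($it) = ast::$ast::cast($node.clone()) { $res } else )*
        { $catch_all }
    }};
}

// Each arm is tried in order; the first successful `cast` wins.
fn describe(node: &ast::SyntaxNode) -> Option<String> {
    match_ast! {
        match node {
            ast::CallExpr(it) => Some(format!("call `{}`", it.0.text)),
            ast::BlockStmt(it) => Some(format!("block `{}`", it.0.text)),
            _ => None,
        }
    }
}
```

Because the node is cloned once per attempted arm, the macro relies on node
clones being cheap.
*/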
/// A structure describing the syntax features the parser will accept. The
/// default is an ECMAScript 2021 Script without any proposals.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Syntax {
    pub file_kind: FileKind,
    pub top_level_await: bool,
    pub global_return: bool,
    pub class_fields: bool,
    pub decorators: bool,
}

impl Syntax {
    pub fn new(file_kind: FileKind) -> Self {
        let mut this = Self {
            file_kind,
            ..Default::default()
        };
        if file_kind == FileKind::TypeScript {
            this = this.typescript();
        }
        this
    }

    pub fn top_level_await(mut self) -> Self {
        self.top_level_await = true;
        self
    }

    pub fn global_return(mut self) -> Self {
        self.global_return = true;
        self
    }

    pub fn class_fields(mut self) -> Self {
        self.class_fields = true;
        self
    }

    pub fn decorators(mut self) -> Self {
        self.decorators = true;
        self
    }

    pub fn script(mut self) -> Self {
        self.file_kind = FileKind::Script;
        self
    }

    pub fn module(mut self) -> Self {
        self.file_kind = FileKind::Module;
        self
    }

    pub fn typescript(mut self) -> Self {
        self.file_kind = FileKind::TypeScript;
        self.class_fields().decorators().top_level_await()
    }
}
/// The kind of file we are parsing
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FileKind {
    Script,
    Module,
    TypeScript,
}

impl Default for FileKind {
    fn default() -> Self {
        FileKind::Script
    }
}

impl From<FileKind> for Syntax {
    fn from(kind: FileKind) -> Self {
        Syntax::new(kind)
    }
}
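/*
The `Syntax` builder methods take `self` by value and return it, so options
can be chained. The sketch below shows that usage. To stay self-contained it
repeats trimmed copies of `FileKind` and `Syntax` from this file (only the
methods it needs); `module_with_top_level_await` is a hypothetical helper
added for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FileKind {
    Script,
    Module,
    TypeScript,
}

impl Default for FileKind {
    fn default() -> Self {
        FileKind::Script
    }
}

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Syntax {
    pub file_kind: FileKind,
    pub top_level_await: bool,
    pub global_return: bool,
    pub class_fields: bool,
    pub decorators: bool,
}

impl Syntax {
    pub fn new(file_kind: FileKind) -> Self {
        let this = Self { file_kind, ..Default::default() };
        // TypeScript implies a fixed set of extra features.
        if file_kind == FileKind::TypeScript { this.typescript() } else { this }
    }

    pub fn top_level_await(mut self) -> Self {
        self.top_level_await = true;
        self
    }

    pub fn class_fields(mut self) -> Self {
        self.class_fields = true;
        self
    }

    pub fn decorators(mut self) -> Self {
        self.decorators = true;
        self
    }

    pub fn module(mut self) -> Self {
        self.file_kind = FileKind::Module;
        self
    }

    pub fn typescript(mut self) -> Self {
        self.file_kind = FileKind::TypeScript;
        self.class_fields().decorators().top_level_await()
    }
}

// Hypothetical helper: an ES module that also accepts top-level await.
fn module_with_top_level_await() -> Syntax {
    Syntax::default().module().top_level_await()
}
```

Starting from `Syntax::default()` gives a plain script with every optional
feature switched off, and each chained call opts into one more.
*/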