feat(rome_css_parser): CSS lexer number and ident #4682 #4712

denbezrukov · 2023-07-19T08:22:57Z

Summary

Lexes number literals and identifiers.

https://drafts.csswg.org/css-syntax/#consume-a-number
https://drafts.csswg.org/css-syntax/#consume-an-ident-sequence

https://github.com/swc-project/swc/blob/main/crates/swc_css_parser/src/lexer/mod.rs
https://github.com/servo/rust-cssparser/blob/master/src/tokenizer.rs

Test Plan

cargo test -p rome_css_parser

netlify · 2023-07-19T08:23:06Z

✅ Deploy Preview for docs-rometools canceled.

Name	Link
🔨 Latest commit	`f5e66b5`
🔍 Latest deploy log	https://app.netlify.com/sites/docs-rometools/deploys/64ba50265afa650008f172fa

github-actions · 2023-07-19T08:31:57Z

Parser conformance results on ubuntu-latest

js/262

Test result	`main` count	This PR count	Difference
Total	48863	48863	0
Passed	47810	47810	0
Failed	1053	1053	0
Panics	0	0	0
Coverage	97.84%	97.84%	0.00%

jsx/babel

Test result	`main` count	This PR count	Difference
Total	40	40	0
Passed	37	37	0
Failed	3	3	0
Panics	0	0	0
Coverage	92.50%	92.50%	0.00%

symbols/microsoft

Test result	`main` count	This PR count	Difference
Total	6212	6212	0
Passed	1764	1764	0
Failed	4448	4448	0
Panics	0	0	0
Coverage	28.40%	28.40%	0.00%

ts/babel

Test result	`main` count	This PR count	Difference
Total	639	639	0
Passed	573	573	0
Failed	66	66	0
Panics	0	0	0
Coverage	89.67%	89.67%	0.00%

ts/microsoft

Test result	`main` count	This PR count	Difference
Total	17224	17224	0
Passed	13121	13121	0
Failed	4103	4103	0
Panics	0	0	0
Coverage	76.18%	76.18%	0.00%

ematipico · 2023-07-21T08:24:54Z

crates/rome_css_parser/src/lexer/mod.rs

+    fn assert_at_char_boundary(&self, offset: usize) {
+        debug_assert!(self.source.is_char_boundary(self.position + offset));
+    }


Is the compiler smart enough to remove this function in production builds?

We need to check it.
We have the same function in other parsers:

tools/crates/rome_js_parser/src/lexer/mod.rs

Lines 542 to 546 in 22bca8e

/// Asserts that the lexer is at a UTF8 char boundary
#[inline]
fn assert_at_char_boundary(&self) {
    debug_assert!(self.source.is_char_boundary(self.position));
}

The Rust compiler and its LLVM backend can recognize when a function is empty or when its result is unused, and will usually inline or eliminate such calls during optimization. This is commonly referred to as "dead code elimination".

So in a release build, where debug_assert! is compiled out by default, assert_at_char_boundary has an empty body. Calls to it should therefore be removed and have no runtime cost.
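A minimal sketch of the point above, not the actual rome lexer: `Lexer` here is a hypothetical stand-in with only the two fields the assertion needs, and `boundary_check_panics` is an illustrative helper. It shows that the check only fires when `cfg!(debug_assertions)` is enabled, which is the default for debug builds and off by default for release builds.

```rust
/// Hypothetical stand-in for the lexer; only the fields the assertion needs.
pub struct Lexer<'src> {
    source: &'src str,
    position: usize,
}

impl<'src> Lexer<'src> {
    pub fn new(source: &'src str) -> Self {
        Self { source, position: 0 }
    }

    /// Asserts that `position + offset` falls on a UTF-8 char boundary.
    /// `debug_assert!` only evaluates its condition when debug assertions
    /// are enabled, so otherwise this body has nothing left to do.
    #[inline]
    pub fn assert_at_char_boundary(&self, offset: usize) {
        debug_assert!(self.source.is_char_boundary(self.position + offset));
    }
}

/// Illustrative helper: returns true if the boundary check panicked.
pub fn boundary_check_panics(source: &str, offset: usize) -> bool {
    let lexer = Lexer::new(source);
    std::panic::catch_unwind(|| lexer.assert_at_char_boundary(offset)).is_err()
}
```

With the input `"é"` (two bytes), offset 1 lands in the middle of a character: the check panics when debug assertions are on and is silently skipped when they are off. Whether the now-empty function call is actually removed is up to the optimizer, so inspecting the generated assembly is the way to confirm it for a given build.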

ematipico · 2023-07-21T08:26:06Z

crates/rome_css_parser/src/lexer/mod.rs

+            unsafe {
+                core::hint::unreachable_unchecked();
+            }


Isn't unreachable!() enough?

It's interesting.
Actually, I don't know :D
I've just copied it from:

tools/crates/rome_json_parser/src/lexer/mod.rs

Lines 177 to 196 in 22bca8e

fn current_char_unchecked(&self) -> char {
    // Precautionary measure for making sure the unsafe code below does not read over memory boundary
    debug_assert!(!self.is_eof());
    self.assert_at_char_boundary();

    // Safety: We know this is safe because we require the input to the lexer to be valid utf8 and we always call this when we are at a char
    let string = unsafe {
        std::str::from_utf8_unchecked(self.source.as_bytes().get_unchecked(self.position..))
    };
    let chr = if let Some(chr) = string.chars().next() {
        chr
    } else {
        // Safety: we always call this when we are at a valid char, so this branch is completely unreachable
        unsafe {
            core::hint::unreachable_unchecked();
        }
    };

    chr
}

tools/crates/rome_js_parser/src/lexer/mod.rs

Lines 508 to 527 in 22bca8e

fn current_char_unchecked(&self) -> char {
    // Precautionary measure for making sure the unsafe code below does not read over memory boundary
    debug_assert!(!self.is_eof());
    self.assert_at_char_boundary();

    // Safety: We know this is safe because we require the input to the lexer to be valid utf8 and we always call this when we are at a char
    let string = unsafe {
        std::str::from_utf8_unchecked(self.source.as_bytes().get_unchecked(self.position..))
    };
    let chr = if let Some(chr) = string.chars().next() {
        chr
    } else {
        // Safety: we always call this when we are at a valid char, so this branch is completely unreachable
        unsafe {
            core::hint::unreachable_unchecked();
        }
    };

    chr
}

Thank you :)

The main difference between the two lies in their behavior and purpose:

unreachable!(): This macro is used to indicate a section of code that should never be reached under normal conditions. If the unreachable!() code path is executed, the program will panic, providing an error message and backtrace. This is mainly used for situations where you are confident that the code will never be reached, but if it does get reached due to a logic error, you want to know about it. The macro can also optionally take a custom message, like unreachable!("Custom message").

core::hint::unreachable_unchecked(): This function, on the other hand, is used for telling the compiler that a certain piece of code will never be reached, allowing it to eliminate the code. However, if this code path does get executed, it results in undefined behavior, meaning that anything can happen. It could crash, it could corrupt data, it could keep working as if nothing happened. It's a way to make a promise to the compiler. If you break that promise, all bets are off. This should be used sparingly and only when you're absolutely sure that this code will never be reached and you need every bit of performance you can get.
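To make the contrast concrete, here is a small sketch with two hypothetical helpers (`first_char_checked` and `first_char_unchecked` are illustrative names, not functions from the rome codebase). Both return the first character of a string that the caller promises is non-empty; they differ only in what happens when that promise is broken.

```rust
use std::hint::unreachable_unchecked;

/// Safe variant: if the "impossible" branch is ever reached, the program
/// panics with a message instead of misbehaving.
pub fn first_char_checked(s: &str) -> char {
    match s.chars().next() {
        Some(c) => c,
        None => unreachable!("caller promised a non-empty string"),
    }
}

/// Unchecked variant.
///
/// # Safety
/// `s` must be non-empty. Calling this with an empty string is
/// undefined behavior.
pub unsafe fn first_char_unchecked(s: &str) -> char {
    // Catches a broken promise in builds with debug assertions enabled.
    debug_assert!(!s.is_empty());
    match s.chars().next() {
        Some(c) => c,
        // SAFETY: the caller guarantees `s` is non-empty, so `chars()`
        // always yields at least one item.
        None => unsafe { unreachable_unchecked() },
    }
}
```

The checked version keeps a branch that can panic. The unchecked version lets the optimizer assume the `None` arm never happens, at the price of making the function `unsafe` and pushing the proof obligation onto every caller.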

github-actions bot added the A-Tooling Area: our own build, development, and release tooling label Jul 19, 2023

denbezrukov force-pushed the feat/css-lexer-number-ident branch from 120bc9f to 28d4c33 Compare July 19, 2023 08:24

denbezrukov force-pushed the feat/css-lexer-number-ident branch 2 times, most recently from d822c40 to 7022e59 Compare July 19, 2023 09:52

denbezrukov requested a review from ematipico July 20, 2023 13:05

ematipico approved these changes Jul 21, 2023

feat(rome_css_parser): CSS lexer number and ident #4682

f5e66b5

denbezrukov force-pushed the feat/css-lexer-number-ident branch from 7022e59 to f5e66b5 Compare July 21, 2023 09:30

github-actions bot added L-CSS Language: CSS A-Parser Area: parser labels Jul 21, 2023

denbezrukov merged commit 9bc3630 into main Jul 21, 2023
18 checks passed

denbezrukov deleted the feat/css-lexer-number-ident branch July 21, 2023 16:53
