fix(codegen): escape backticks and ${} in template literal raw values#18102
fix(codegen): escape backticks and ${} in template literal raw values#18102CompuIves wants to merge 1 commit intooxc-project:mainfrom
${} in template literal raw values#18102Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds escaping logic for backticks (`), template literal interpolation markers (${), and HTML script close tags (</script) in template literal raw values during code generation. This prevents syntax errors when manually injecting or editing AST nodes that contain these special characters in template literals.
Changes:
- Added a new
print_template_literal_strmethod to handle proper escaping of template literal content - Added a helper function
is_preceded_by_odd_backslashesto detect already-escaped characters - Replaced calls to
print_str_escaping_script_close_tagwithprint_template_literal_strin template literal code generation - Added comprehensive unit tests for the new escaping functionality
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/oxc_codegen/src/lib.rs | Added print_template_literal_str method for escaping backticks, ${}, and </script in template literals, plus helper function and comprehensive unit tests |
| crates/oxc_codegen/src/gen.rs | Updated TemplateLiteral code generation to use the new print_template_literal_str method instead of print_str_escaping_script_close_tag |
Comments suppressed due to low confidence (1)
crates/oxc_codegen/src/gen.rs:3281
- The
TSTemplateLiteralTypealso uses template literal syntax and should useprint_template_literal_strfor consistency and to properly escape backticks,${}, and</scripttags. This is a similar fix to what was done forTemplateLiteralin this PR.
p.print_str(item.value.raw.as_str());
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| b'<' => { | ||
| // Check for `</script` (case-insensitive) | ||
| if i + 8 <= len && is_script_close_tag(&bytes[i..i + 8]) { | ||
| // Flush bytes up to and including `<`, then write `\/` | ||
| // SAFETY: `consumed` and `i + 1` are valid indices | ||
| unsafe { | ||
| self.code.print_bytes_unchecked(bytes.get_unchecked(consumed..=i)); | ||
| } | ||
| self.print_str("\\/"); | ||
| consumed = i + 2; // Skip past `</` | ||
| i += 1; // Extra increment to skip `/` | ||
| } | ||
| } |
There was a problem hiding this comment.
The new implementation uses a simple byte-by-byte loop for all patterns including </script. The original print_str_escaping_script_close_tag function uses SIMD optimizations to search for < in chunks of 16 bytes for better performance. Consider whether the simpler approach is acceptable for template literals, or if SIMD optimization should also be applied here since template literals are commonly used and can be quite long.
b3c55d2 to
848f273
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self.code.print_bytes_unchecked(bytes.get_unchecked(consumed..i)); | ||
| } | ||
| self.print_str("\\$"); | ||
| consumed = i + 1; |
There was a problem hiding this comment.
When escaping ${, only the $ is being replaced with \\$, but consumed is set to skip only the $ character (i + 1), leaving the { to be printed in the next flush. This will result in \\${ being output as \\$ followed by {, which produces the incorrect output \\${ when it should just be \\${. The logic should either skip both characters (set consumed = i + 2) or only output \\ and let the next iteration handle ${.
| consumed = i + 1; | |
| consumed = i + 2; |
| } | ||
| b'<' => { | ||
| // Check for `</script` (case-insensitive) | ||
| if i + 8 <= len && is_script_close_tag(&bytes[i..i + 8]) { |
There was a problem hiding this comment.
The </script escaping in template literals uses a simpler byte-by-byte scan compared to the SIMD-optimized approach in print_str_escaping_script_close_tag. While </script is rare in template literals, consider whether the performance difference is acceptable or if SIMD optimization should be applied here as well for consistency.
|
Duplicate of #18101 |
We're using OXC pretty extensively these days for custom transforms, lints and now also for code edits where users make edits in a UI and we update code. I'm a big fan of OXC, it's super fast, easy to use and also works great in the browser!
When manually injecting AST, or e.g. editing AST, I noticed that it's possible to inject backticks in template literals, leading to syntax errors. It seems like we already have escaping logic (single/double quotes, etc) for strings, so I thought it could be valuable to also add escaping to template literals in
codegen.