Skip to content

Conversation

@quininer
Copy link
Contributor

Description:

Following our previous discussion, I am trying to introduce a self-describing serialization encoding for ast to replace #11069 .

Since serde is currently used for json serialization, it is hard to modify serde to support both json and binary encodings (due to the untagged and flatten attr). So I made our own Encode and Decode proc macro, which also facilitate unknown variant support.

benchmark

I've done some simple benchmarks with a modified swc ast, and I believe we don't have an unacceptable performance loss relative to rkyv.
I'm using cbor4ii here, which I developed, so I'd prefer it if there's no performance difference. but it's easy to migrate to bincode or other scheme.

This also introduces an unknown feature for ecma_ast crate, which should only be enabled by the wasm plugin to ensure that old plugins can decode new AST.

BREAKING CHANGE:

This PR itself only introduces new encoding and will not cause any breakage. However, it may cause breakage when switch the wasm plugin communication scheme in future.

Related issue (if exists):

@changeset-bot
Copy link

changeset-bot bot commented Sep 18, 2025

🦋 Changeset detected

Latest commit: f23fdbf

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@quininer quininer force-pushed the r/self-desc-encode branch 2 times, most recently from c9d0a01 to de8261b Compare September 18, 2025 12:27
@codspeed-hq
Copy link

codspeed-hq bot commented Sep 18, 2025

CodSpeed Performance Report

Merging #11100 will not alter performance

Comparing quininer:r/self-desc-encode (f23fdbf) with main (dba23f5)

Summary

✅ 138 untouched

@socket-security
Copy link

socket-security bot commented Sep 19, 2025

No dependency changes detected. Learn more about Socket for GitHub.

👍 No dependency changes detected in pull request

@quininer quininer force-pushed the r/self-desc-encode branch 12 times, most recently from 1b60f1d to c5fab18 Compare September 22, 2025 12:10
@quininer quininer force-pushed the r/self-desc-encode branch 5 times, most recently from 4da0e40 to 815343f Compare September 23, 2025 09:47
kdy1 pushed a commit that referenced this pull request Oct 12, 2025
**Description:**

This is the split of #11100.
Since we need to add unknown variants to the ast enum, this requires a
lot of code changes to handle non-exhaustive matches. I'm doing this in a
separate PR for review.
@quininer
Copy link
Contributor Author

This also requires a serialization test, include unknown variants.

Since I'm busy with other work, it will be a little slower.

@quininer quininer force-pushed the r/self-desc-encode branch 2 times, most recently from c736814 to 46b5ab6 Compare October 15, 2025 10:35
@github-actions
Copy link
Contributor

github-actions bot commented Oct 30, 2025

Binary Sizes

File Size
swc.linux-x64-gnu.node 31M (32023392 bytes)

Commit: c4ed6d2

Copilot AI review requested due to automatic review settings October 31, 2025 07:30
@quininer quininer force-pushed the r/self-desc-encode branch 2 times, most recently from 6d0f5ff to 3737124 Compare October 31, 2025 07:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 107 out of 112 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pub s: String,
}

#[ast_node("Tuple")]
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from usize to u64 alters the semantics of the Tuple struct. On 32-bit platforms, usize is 32 bits while u64 is 64 bits, which could lead to different behavior. This appears to be unrelated to the encoding implementation and should be documented or justified.

Suggested change
#[ast_node("Tuple")]
#[ast_node("Tuple")]
// NOTE: We use `u64` instead of `usize` for cross-platform consistency and to avoid issues with serialization/deserialization on 32-bit vs 64-bit platforms. If you need platform-dependent sizing, consider using `usize` and document the rationale.

Copilot uses AI. Check for mistakes.
@@ -1,5 +1,5 @@
#![allow(deprecated)]
#![deprecated = "Not used by swc, and this will be removed with next breaking change"]

Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecation attribute and warning comment were removed without explanation. If this module is no longer deprecated and is being actively used for CBOR encoding support, this should be documented in the PR description or in a code comment explaining why the deprecation was lifted.

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +43
// This is not sound, maybe Wtf8Buf should make bytes operations safe
Ok(Self(hstr::Wtf8Atom::from(unsafe {
Wtf8Buf::from_bytes_unchecked(s.0.into())
})))
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment admits this is 'not sound' yet uses unsafe code without validation. CBOR decoding should validate that the bytes form valid WTF-8. Consider adding validation before the unsafe conversion or document why it's safe in this context.

Suggested change
// This is not sound, maybe Wtf8Buf should make bytes operations safe
Ok(Self(hstr::Wtf8Atom::from(unsafe {
Wtf8Buf::from_bytes_unchecked(s.0.into())
})))
// Validate that the bytes are valid WTF-8 before constructing Wtf8Buf
match Wtf8::from_bytes(s.0) {
Some(_) => Ok(Self(hstr::Wtf8Atom::from(Wtf8Buf::from_bytes_unchecked(s.0.into())))),
None => Err(cbor4ii::core::dec::Error::message("Invalid WTF-8 bytes")),
}

Copilot uses AI. Check for mistakes.
Copy link
Member

@kdy1 kdy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the hard work!

@kdy1 kdy1 changed the title refactor(ast): Introduce flexible serialization encoding for ast refactor(ast): Introduce flexible serialization encoding for AST Oct 31, 2025
@kdy1 kdy1 enabled auto-merge (squash) October 31, 2025 07:44
quininer and others added 4 commits October 31, 2025 15:58
move to ast_node

make encoding-impl optional

fix fmt

make css_ast

fix tuple

fix unknown

fix fmt

make html_ast

make xml_ast

make jsdoc

fix fmt

fix unit tag

update visit

fix parser

make regexp_ast

no unknown

fix snap

fix arbitrary

encoding: ignore unknown field

fix cfg

fix enum struct derive
Refactor AST serialization to support flexible encoding.
Copilot AI review requested due to automatic review settings October 31, 2025 07:58
auto-merge was automatically disabled October 31, 2025 07:58

Head branch was pushed to by a user without write access

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 108 out of 113 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@kdy1 kdy1 merged commit 8ad3647 into swc-project:main Oct 31, 2025
180 checks passed
@quininer quininer deleted the r/self-desc-encode branch October 31, 2025 10:02
kdy1 added a commit that referenced this pull request Nov 3, 2025
**Description:**

This is a follow up to #11100,
which will result in plugin abi break.

It currently passes the plugin tests, but I still need to do some code
and configuration cleanup, and add a regression test.

<img width="2053" height="336" alt="plugin tests"
src="https://github.com/user-attachments/assets/868b2ae8-105c-44ad-8d18-641a2dd33c9f"
/>

**BREAKING CHANGE:**

Since we switched serialization schemes, this will result in breaking
changes on the plugin ABI and api.

---------

Co-authored-by: Donny/강동윤 <[email protected]>
Co-authored-by: Donny/강동윤 <[email protected]>
Co-authored-by: Copilot <[email protected]>
gaoachao pushed a commit to lynx-family/lynx-stack that referenced this pull request Nov 6, 2025
<!--
  Thank you for submitting a pull request!

We appreciate the time and effort you have invested in making these
changes. Please ensure that you provide enough information to allow
others to review your pull request.

Upon submission, your pull request will be automatically assigned with
reviewers.

If you want to learn more about contributing to this project, please
visit:
https://github.com/lynx-family/lynx-stack/blob/main/CONTRIBUTING.md.
-->

<!-- The AI summary below will be auto-generated - feel free to replace
it with your own. -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Updated `swc_core` dependency to version 47.0.3.
  * Adjusted `serde_json` dependency configuration.

* **Refactor**
* Enhanced string handling across compiler plugins for robust text
processing.
  * Updated AST processing logic for improved compatibility.
  * Improved import alias handling across multiple plugin modules.
  * Refined attribute and literal value transformations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Related SWC breaking changes:

1. `Atom` -> `Wtf8Atom`: swc-project/swc#11144
> `Wtf8Atom` does not expose a `to_string` method like the old `Atom`
does. Internally, it stores the code points of the characters. You can
call `code_points()` to get an iterator of the code points or call
`to_string_lossy()` to get an lossy string in which all unpaired
surrogates are replaced with `U+FFFD`(Replacement Character).
2. flexible AST: swc-project/swc#11100

## Checklist

<!--- Check and mark with an "x" -->

- [ ] Tests updated (or not required).
- [ ] Documentation updated (or not required).
- [x] Changeset added, and when a BREAKING CHANGE occurs, it needs to be
clearly marked (or not required).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants