feat(ast/estree): raw transfer (experimental)#9516

Merged
graphite-app[bot] merged 1 commit intomainfrom
03-03-feat_ast_estree_raw_transfer_experimental_
Mar 4, 2025
Conversation


@overlookmotel overlookmotel commented Mar 3, 2025

First version of raw transfer (#2409).

Provides a solid speed-up transferring data from Rust to JS. Further iterations will speed it up further.

Tests check that output via raw transfer matches output via JSON transfer exactly for all of Test262 which Acorn is able to parse. It should also match for TypeScript and JSX, but that's not covered by tests as yet.

However, I think we should consider this experimental for now because there are a few rough edges (discussed below).

Therefore I've put it behind an "experimental" flag:

```js
const ret = parseSync(filename, code, { experimentalRawTransfer: true });
console.log(ret.program);
```

How it works

  • JS creates an ArrayBuffer and passes it to Rust.
  • Rust creates an Allocator using that buffer as its backing memory.
  • Rust parses the AST into that allocator (including comments).
  • Rust also converts module record and errors into arena types, and writes them into the allocator.
  • Rust writes the offset at which the data begins into the end of the buffer.
  • Control passes back to JS.
  • JS code decodes data in the buffer, and creates JS objects fitting the ESTree shape.

There is no serialization step on the Rust side, and no JSON encoding or decoding involved. These are the main sources of the speed gain.

Preconditions

The reason all this works is that all AST types (and other transferred types) are #[repr(C)]. So the memory layouts of those types are specified, and can be statically calculated.

oxc_ast_tools does those layout calculations, and generates the JS-side deserializer code based on its knowledge of what offsets struct fields are at, what the discriminants of enums are etc.

Rough edges

There are a few rough edges.

Allocator::from_raw_parts

This PR adds a method Allocator::from_raw_parts. It creates a bumpalo::Bump that uses an existing block of memory as the allocator's backing memory. This ability is required, but it's not an API that bumpalo offers (and the maintainer was not willing to add it).

The implementation of Allocator::from_raw_parts is extremely hacky. It depends on internal implementation details of bumpalo which are not specified. And the method used to determine the memory layout of Bump depends on unspecified details of Rust's memory model. So, while it does seem to work in practice, it is, strictly speaking, UB.

It could break in a future Rust version, or with esoteric compiler flags e.g. -Zrandomize-layout. And it could break if we update bumpalo from the version we're currently using. For this reason, this PR pins the version of bumpalo in Cargo.toml.

I think this is OK for now, but it's unpleasantly fragile.

We can resolve all these problems by replacing bumpalo with our own arena allocator (which I think we should do anyway, for other reasons).

In the meantime, Allocator::from_raw_parts is behind a cargo feature from_raw_parts, to avoid it being used anywhere else in our codebase.

Unspecified type layouts

As noted above, all AST types are #[repr(C)], so their layout is specified and stable. There are a few types which are outside of our control, though:

  1. Vec<T>. We use allocator_api2::vec::Vec, which is not #[repr(C)].
  2. &str. I don't believe the layout of this type is specified.
  3. Option<T>.

For production-grade stability, we need to try to work around these.

Vec - we should replace allocator_api2::vec::Vec with our own Vec type. This will also allow us to reduce its size (#9706).

&str - again, we need our own string slice type, to work around the problem of lone surrogates (#3526) and to make it more efficient (oxc-project/backlog#46). We can make its memory layout stable at the same time.

Option<T> is tricky. We don't want to replace Rust's Option because of the niche optimization benefits it gives. I'm not sure this one is 100% solvable, but Rust gives at least some guarantees about the layout of Option. Maybe we can avoid using Option in the AST in ways which go outside that specification.

Large buffers

For speed, raw transfer requires the entire AST to be in a single contiguous memory region, and for the start of that region to be aligned on 4 GiB.

JS does not support 64-bit integers, so offset calculations are much cheaper when the buffer is aligned on 4 GiB and no larger than 4 GiB - because then all pointers into it have the same value in their top 32 bits. So a pointer can be treated as a 32-bit value (bottom 32 bits only), and JS handles 32-bit integers without any problem.

When creating a large buffer on the JS side, it mostly ends up aligned on a 4 GiB boundary anyway, but occasionally it doesn't. So in order to ensure the buffer contains at least one region which is aligned on 4 GiB and is 2 GiB in size, we have to create a 6 GiB buffer.

I think this is OK. On systems with virtual memory, allocating 6 GiB only reserves 6 GiB of virtual memory. Physical memory is only consumed when the pages of that allocation are actually written to.

But I may be missing something here, and memory exhaustion might be a danger. I think we need some real-world usage to find out.

Possibly we could reduce the need for so much memory if the JS deserializer called into a small WASM module to do offset calculations. WASM can work with i64 values. Or there may be other solutions.

Endianness

Currently only little-endian systems are supported. Probably in practice this doesn't matter much, but it'd be ideal to cover big-endian too.

@github-actions github-actions bot added A-ast Area - AST A-ast-tools Area - AST tools C-enhancement Category - New feature or request labels Mar 3, 2025

codspeed-hq bot commented Mar 3, 2025

CodSpeed Performance Report

Merging #9516 will not alter performance

Comparing 03-03-feat_ast_estree_raw_transfer_experimental_ (d55dbe2) with main (fb4d0b6)

Summary

✅ 33 untouched benchmarks

@overlookmotel overlookmotel marked this pull request as ready for review March 3, 2025 15:33
@overlookmotel overlookmotel force-pushed the 03-03-feat_ast_estree_raw_transfer_experimental_ branch 3 times, most recently from 3893f4f to 9812974 Compare March 3, 2025 15:56
@overlookmotel
Member Author

Note: I've changed the NAPI CI task to build in release mode. The 40,000 Test262 tests take a long time to run, so building in release mode helps bring it down to a more reasonable run time.

Hopefully I can speed up these tests in a follow-on PR.

@overlookmotel overlookmotel force-pushed the 03-03-feat_ast_estree_raw_transfer_experimental_ branch from 4dab08b to e4459e3 Compare March 3, 2025 15:59
@overlookmotel overlookmotel force-pushed the 03-03-feat_ast_estree_raw_transfer_experimental_ branch 2 times, most recently from 1c49226 to 3e256d7 Compare March 3, 2025 19:44
@overlookmotel overlookmotel requested a review from Boshen March 4, 2025 12:30
@overlookmotel overlookmotel force-pushed the 03-03-feat_ast_estree_raw_transfer_experimental_ branch from 3e256d7 to 30732cc Compare March 4, 2025 12:30
@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Mar 4, 2025
@graphite-app
Contributor

graphite-app bot commented Mar 4, 2025

Merge activity


Labels

0-merge Merge with Graphite Merge Queue A-ast Area - AST A-ast-tools Area - AST tools C-enhancement Category - New feature or request
