spansy: support Transfer-Encoding #34

Open
sinui0 opened this issue Jul 1, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@sinui0
Member

sinui0 commented Jul 1, 2024

Our spansy crate currently does not support HTTP transfer encoding, which is very common. We should support it.

Err(ParseError(
"Transfer-Encoding not supported yet".to_string(),
))

One challenge is that it can cause an HTTP response payload to be chunked. This will cause the spans of the payload to be non-contiguous.
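To make the non-contiguity concrete, here is a minimal sketch that walks a chunked body and records where the payload bytes actually sit. The function name `chunk_data_ranges` is hypothetical and not part of spansy; the sketch ignores chunk extensions and does not handle trailer fields.

```rust
use std::ops::Range;

/// Hypothetical helper (not a spansy API): returns the byte ranges of the
/// chunk data inside a chunked message body. Returns `None` if the body is
/// malformed or truncated. Chunk extensions are skipped and trailer fields
/// are not handled.
fn chunk_data_ranges(body: &[u8]) -> Option<Vec<Range<usize>>> {
    let mut ranges = Vec::new();
    let mut pos = 0;
    loop {
        // Locate the CRLF that terminates the chunk-size line.
        let line_len = body.get(pos..)?.windows(2).position(|w| w == b"\r\n")?;
        let line = std::str::from_utf8(&body[pos..pos + line_len]).ok()?;
        // The size is hexadecimal; anything after ';' is a chunk extension.
        let size_str = line.split(';').next()?.trim();
        let size = usize::from_str_radix(size_str, 16).ok()?;
        pos += line_len + 2;
        if size == 0 {
            // The zero-sized chunk marks the end of the payload.
            return Some(ranges);
        }
        let end = pos.checked_add(size)?;
        // Each chunk's data must be followed by CRLF.
        if body.get(end..end.checked_add(2)?)? != b"\r\n" {
            return None;
        }
        ranges.push(pos..end);
        pos = end + 2;
    }
}
```

For the body `4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n`, the payload `Wikipedia` occupies two separate source ranges, `3..7` and `12..17`, with chunk framing in between.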

sinui0 added the enhancement label Jul 1, 2024
@TheFrozenFire

TheFrozenFire commented Jan 17, 2025

I'm working on this right now, and after a bit of work on it, I think I have a sane approach (which I'm still implementing).

When parsing the body source, build a RangeSet which encapsulates the chunk boundaries, such that the inversion of that rangeset is the body content. Collapse the body content down to remove the chunk boundaries, and then parse it as usual.

Once you have the body parsed, and its spans constructed, take the chunk boundary rangeset, and then iteratively offset each body content span's rangeset indices based on the data's original relative position within the source.

This is likely much simpler than trying to integrate the split ranges into the body content parsing logic.
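The collapse-then-offset idea could be sketched roughly as follows. This uses plain `Range<usize>` slices rather than the actual RangeSet type, and it tracks the chunk data ranges directly (the inverse of the boundary set described above). Both `collapse` and `to_source_offset` are hypothetical names for illustration only.

```rust
use std::ops::Range;

/// Concatenates the chunk data, dropping the chunk framing.
/// `data` holds the source ranges of each chunk's payload, in order.
fn collapse(src: &[u8], data: &[Range<usize>]) -> Vec<u8> {
    let mut out = Vec::new();
    for r in data {
        out.extend_from_slice(&src[r.clone()]);
    }
    out
}

/// Maps an offset in the collapsed body back to the corresponding offset
/// in the original source, or `None` if the offset is past the end.
fn to_source_offset(data: &[Range<usize>], mut offset: usize) -> Option<usize> {
    for r in data {
        if offset < r.len() {
            return Some(r.start + offset);
        }
        // Skip this chunk and keep looking in the next one.
        offset -= r.len();
    }
    None
}
```

With chunk data at `3..7` and `12..17`, collapsed offset 4 (the first byte of the second chunk) maps back to source offset 12.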

Thoughts?

@sinui0
Member Author

sinui0 commented Jan 17, 2025

Sounds directionally correct. The complexity will lie in handling when fields are split across the chunk boundaries (potentially >1 splits), which is more involved than simply offsetting them. We will need to add split_at support to our RangeSet type, compute where the field intersects with the boundaries, and perform the correct split -> offset -> union operations.
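As a rough illustration of the split -> offset -> union step, the sketch below maps one span of the collapsed body onto however many source ranges it touches. It works on plain `Range<usize>` values, not the real RangeSet type, and `to_source_ranges` is a hypothetical name; an actual implementation would presumably build on the `split_at` support mentioned above.

```rust
use std::ops::Range;

/// Maps a span in the collapsed body to the source ranges it covers,
/// splitting it wherever it crosses a chunk boundary.
/// `data` holds the source ranges of each chunk's payload, in order.
fn to_source_ranges(data: &[Range<usize>], span: Range<usize>) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    // Collapsed offset at which the current chunk's data begins.
    let mut base = 0;
    for r in data {
        let chunk_end = base + r.len();
        // Intersect the span with this chunk in collapsed coordinates.
        let start = span.start.max(base);
        let end = span.end.min(chunk_end);
        if start < end {
            // Offset the intersection back into source coordinates.
            let src_start = r.start + (start - base);
            let src_end = r.start + (end - base);
            out.push(src_start..src_end);
        }
        base = chunk_end;
    }
    out
}
```

For chunk data at `3..7` and `12..17`, the collapsed span `2..6` straddles the boundary and comes back as two source ranges, `5..7` and `12..14`. A span that crosses several boundaries yields one range per chunk it touches.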

The simplest place to start is to modify the Span type to account for the source data not always being contiguous. Even here there is some additional nuance to consider: collapsing the chunking then parsing means that the Span will be referencing a different memory region than the original source, which has some implications for some of its public API methods.

@sinui0
Member Author

sinui0 commented Jan 17, 2025

> The simplest place to start is to modify the Span type to account for the source data not always being contiguous.

Giving this another thought cycle, it would probably make more sense for the parsed types to hold multiple Spans instead of redefining that type.
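One possible shape for a parsed type that holds multiple spans, purely as a sketch: `SimpleSpan` and `ScatteredBody` are hypothetical stand-ins and do not reflect the actual Span type or its API.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a contiguous span: one byte range in the source.
#[derive(Debug, Clone, PartialEq)]
struct SimpleSpan(Range<usize>);

/// Hypothetical parsed body whose content may be scattered across chunks.
#[derive(Debug, Clone, PartialEq)]
struct ScatteredBody {
    /// One span per contiguous piece, in source order.
    spans: Vec<SimpleSpan>,
}

impl ScatteredBody {
    /// Total number of content bytes across all pieces.
    fn len(&self) -> usize {
        self.spans.iter().map(|s| s.0.len()).sum()
    }

    /// Copies the content out of the source into one contiguous buffer.
    fn gather(&self, src: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.len());
        for s in &self.spans {
            out.extend_from_slice(&src[s.0.clone()]);
        }
        out
    }
}
```

Each piece stays a simple contiguous range into the original source, so nothing needs to reference the collapsed copy; the cost is that reading the full content requires a gather step.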

@TheFrozenFire

TheFrozenFire commented Jan 17, 2025

Does Span not already support splits by virtue of using a RangeSet? Is there a known circumstance where a Span has multiple discontiguous ranges for its data, and that has a negative effect? When does a Span have a RangeSet with more than one element?

In keeping with the "Span" terminology, I think that the type we're looking for is Vector (Scatter/Gather) addressing (essentially the same as the Vec type in Rust). Sort of obvious to say, but the parsed types should probably just hold Vec<Span>.

Or, essentially, change the semantics of Span to those of a Vector, if the RangeSet works the way I'm thinking?

2 participants