spansy: support Transfer-Encoding #34

sinui0 · 2024-07-01T14:32:02Z

Our spansy crate currently does not support HTTP transfer encoding, which is very common. We should support it.

Lines 264 to 266 in 45370cc

    Err(ParseError(
        "Transfer-Encoding not supported yet".to_string(),
    ))

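One challenge is that chunked transfer encoding splits an HTTP payload into chunks, each framed by a size line and a trailing CRLF. The framing bytes sit between pieces of the payload, so the spans of the payload are no longer contiguous in the source.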
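To make the non-contiguity concrete, here is a minimal sketch that is not spansy code. It scans a chunked body and returns the byte ranges that hold payload data. The function name `chunk_data_ranges` is hypothetical, plain `std::ops::Range` stands in for the crate's range types, chunk extensions are ignored, and trailers are not parsed.

```rust
use std::ops::Range;

/// Returns the source byte ranges that contain chunk data, or `None` if the
/// body is malformed. Hypothetical helper, not part of spansy.
fn chunk_data_ranges(body: &[u8]) -> Option<Vec<Range<usize>>> {
    let mut ranges = Vec::new();
    let mut pos = 0;
    loop {
        // Locate the CRLF that terminates the chunk-size line.
        let line_len = body.get(pos..)?.windows(2).position(|w| w == b"\r\n")?;
        let line = std::str::from_utf8(&body[pos..pos + line_len]).ok()?;
        // The size is hexadecimal; anything after ';' is a chunk extension.
        let size_str = line.split(';').next()?.trim();
        let size = usize::from_str_radix(size_str, 16).ok()?;
        pos += line_len + 2;
        if size == 0 {
            // Last chunk. Trailer fields, if any, are not parsed in this sketch.
            return Some(ranges);
        }
        let end = pos.checked_add(size)?;
        // Chunk data must be followed by CRLF.
        if body.get(end..end.checked_add(2)?)? != b"\r\n" {
            return None;
        }
        ranges.push(pos..end);
        pos = end + 2;
    }
}
```

For the body `4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n` this yields the ranges `3..7` and `12..17`: the payload "Wikipedia" is interrupted by the framing bytes at `7..12`.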

TheFrozenFire · 2025-01-17T01:27:36Z

I'm working on this right now, and after a bit of work on it, I think I have a sane approach (which I'm still implementing).

When parsing the body source, build a RangeSet which encapsulates the chunk boundaries, such that the inversion of that rangeset is the body content. Collapse the body content down to remove the chunk boundaries, and then parse it as usual.

Once you have the body parsed, and its spans constructed, take the chunk boundary rangeset, and then iteratively offset each body content span's rangeset indices based on the data's original relative position within the source.
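A rough sketch of the collapse-then-offset idea, assuming the chunk data ranges are already known and sorted. `collapse` and `to_source_offset` are hypothetical names, and plain `Range<usize>` values are used instead of the crate's RangeSet, so this illustrates the mapping only and is not the proposed implementation.

```rust
use std::ops::Range;

/// Concatenates the chunk data, dropping the chunk-size lines and CRLFs.
fn collapse(src: &[u8], data: &[Range<usize>]) -> Vec<u8> {
    let mut out = Vec::new();
    for r in data {
        out.extend_from_slice(&src[r.clone()]);
    }
    out
}

/// Maps an offset in the collapsed body back to its offset in the source.
/// Returns `None` if the offset lies past the end of the collapsed body.
fn to_source_offset(data: &[Range<usize>], mut offset: usize) -> Option<usize> {
    for r in data {
        if offset < r.len() {
            return Some(r.start + offset);
        }
        // Skip this chunk and keep counting into the next one.
        offset -= r.len();
    }
    None
}
```

With data ranges `3..7` and `12..17`, collapsed offset 4 (the "p" in "Wikipedia") maps back to source offset 12. This per-offset mapping is only sufficient on its own when a parsed span does not straddle a chunk boundary.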

This is likely much simpler than trying to integrate the split ranges into the body content parsing logic.

Thoughts?

sinui0 · 2025-01-17T03:16:22Z

Sounds directionally correct. The complexity will lie in handling when fields are split across the chunk boundaries (potentially >1 splits), which is more involved than simply offsetting them. We will need to add split_at support to our RangeSet type, compute where the field intersects with the boundaries, and perform the correct split -> offset -> union operations.
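The split -> offset -> union sequence could look roughly like the sketch below. It assumes sorted, non-overlapping chunk data ranges and uses a `Vec<Range<usize>>` as a stand-in for a RangeSet; `to_source_ranges` is a hypothetical name, and no `split_at` method on the real RangeSet type is assumed to exist.

```rust
use std::ops::Range;

/// Maps a span of the collapsed body to the source ranges it covers.
/// A span that crosses N chunk boundaries produces N + 1 source ranges.
fn to_source_ranges(data: &[Range<usize>], span: Range<usize>) -> Vec<Range<usize>> {
    let mut out = Vec::new();
    // Collapsed offset at which the current chunk begins.
    let mut base = 0;
    for r in data {
        let chunk_end = base + r.len();
        // Split: intersect the span with this chunk in collapsed coordinates.
        let start = span.start.max(base);
        let end = span.end.min(chunk_end);
        if start < end {
            // Offset: translate the piece into source coordinates.
            // Union: collect the pieces in source order.
            out.push(r.start + (start - base)..r.start + (end - base));
        }
        base = chunk_end;
    }
    out
}
```

For data ranges `3..7` and `12..17`, the collapsed span `2..6` ("kipe") maps to the two source ranges `5..7` and `12..14`.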

The simplest place to start is to modify the Span type to account for the source data not always being contiguous. Even here there is some additional nuance to consider: collapsing the chunking then parsing means that the Span will be referencing a different memory region than the original source, which has some implications for some of its public API methods.

sinui0 · 2025-01-17T03:46:11Z

> The simplest place to start is to modify the Span type to account for the source data not always being contiguous.

Giving this another thought cycle, it would probably make more sense for the parsed types to hold multiple Spans instead of redefining that type.
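As a loose illustration of a parsed type that holds several spans, the sketch below uses a hypothetical `SplitValue` struct with plain source ranges in place of the crate's Span type. It is not spansy's actual API.

```rust
use std::ops::Range;

/// Hypothetical parsed value whose bytes may be split across chunks.
/// Each range indexes into the original, still-chunked source.
struct SplitValue {
    spans: Vec<Range<usize>>,
}

impl SplitValue {
    /// Reassembles the value by concatenating each span of `src`.
    /// Returns `None` if any span falls outside `src`.
    fn to_bytes(&self, src: &[u8]) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        for r in &self.spans {
            out.extend_from_slice(src.get(r.clone())?);
        }
        Some(out)
    }
}
```

One consequence of this shape is that the value's bytes can no longer be borrowed as a single slice of the source whenever more than one span is present, which is the kind of public API implication mentioned above.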

TheFrozenFire · 2025-01-17T19:16:51Z

Does Span not already support splits by virtue of using a RangeSet? Is there a known circumstance where a Span has multiple discontiguous ranges for its data, and that has a negative effect? When does a Span have a RangeSet with more than one element?

In keeping with the "Span" terminology, I think that the type we're looking for is Vector (Scatter/Gather) addressing (essentially the same as the Vec type in Rust). Sort of obvious to say, but the parsed types should probably just hold Vec<Span>.

Or, essentially, change the semantics of Span to those of a Vector, if the RangeSet works the way I'm thinking?

sinui0 added the enhancement New feature or request label Jul 1, 2024

lancenonce mentioned this issue Jul 4, 2024

spansy: support Transfer-Encoding #35

Closed
