Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support a foolproof(er) higher-level API for streaming #25

Open
golddranks opened this issue Jun 12, 2018 · 9 comments
Open

Support a foolproof(er) higher-level API for streaming #25

golddranks opened this issue Jun 12, 2018 · 9 comments
Assignees

Comments

@golddranks
Copy link
Contributor

golddranks commented Jun 12, 2018

Hi, I have wrapped the streaming (ring-buffer using) API of miniz_oxide behind a higher level API here: https://github.com/golddranks/stream_zipper/blob/master/src/deflate.rs

Is there any interest in supporting such a higher level API within miniz_oxide itself?

I've experimented with a state machine-like API:

pub enum State<'a> {
    HasMoreOutput(ChunkIter<'a>),
    NeedsMoreInput(InputSink),
    Stop(FinishedStream<'a>),
}

The idea is that every function that operates on the state takes self as value, and as such, parsing the stream can proceed as a state machine alternating between states "HasMoreOutput" and "NeedsMoreInput". The states own the internal state, including the ring buffer, which makes this a higher level API as the one provided by miniz_oxide at the moment.

Here's a roughly how it would be used:

        let mut state = start_deflate_stream();

        for chunk in pure_deflate_stream.chunks(500) {
            if let State::NeedsMoreInput(sink) = state {
                state = sink.feed(chunk)?;
            };
            while let State::HasMoreOutput(out) = state {
                println!("Got {:?}", out.get());
                state = out.next()?;
            }
        }

Another experiment I did was to have just InputSink, and then passing a callback that gets the uncompressed output. This is a kind of an internal iterator. Again, here's an example

        let mut state = start_deflate_stream();

        for chunk in pure_deflate_stream.chunks(50) {
            if let State::NeedsMoreInput(sink) = state {
                state = sink.inner_iter(chunk, |out| {
                    println!("Got {:?}", out);
                })?;
            };
        }

What do you think about introducing such a high level API?

@golddranks golddranks changed the title Support a foolproof higher-level API for streaming Support a foolproof(er) higher-level API for streaming Jun 12, 2018
@golddranks
Copy link
Contributor Author

Btw. I can send a PR and work with the maintainers of this library to get it into a shape they want, I just want to know if you are open to such an idea so I won't do needless work.

@Frommi
Copy link
Owner

Frommi commented Jun 15, 2018

Hi. Sorry for the delayed reply. Your State is cleaner than what we have now, I really like it. I think there is a place for the safe wrapper that does not lose streaming functionality. Tomorrow I will have some time to go over it in detail and say something more concrete.

@Frommi
Copy link
Owner

Frommi commented Jun 17, 2018

Okay, I read the code and I think that you are onto something. There are two things that stand out for me:

  • The need for two functions get and next make the HasMoreOutput case feel a bit clunky. Maybe make something like state = out.next(|out_slice| println!("Got {:?}", out_slice))?
  • I think inner_iter is a misnomer. Iter is something that iterates, not a once-executing function. It looks more like a map. And why return full State if that function executes closure multiple times until it becomes a sink again and then wrap it in a State?

So, a bit more generally:

  • For every input action there is one-or-more output actions.
  • If we want to have a out slice as a return value without copies and closures then there is no escaping two function calls, I think.
  • It's probably better to be a bit more explicit about whether a function call executes closure only once or multiple times.

I think the best base abstraction in this case will be this:

  • Initialization with two closures: one that returns next input slice and one that takes output slice.
  • One function next that executes exactly one of the closures.

Then there are quality-of-life things like execution-till-next-input or iterator instead of input closure, etc.

What do you think? Does this makes sense?

@golddranks
Copy link
Contributor Author

Sorry for the delayed reply. I'm on a vacation at the moment, I'll come back to this later when I have time!

@golddranks
Copy link
Contributor Author

Just two quick things: I called it inner_iter because there may be multiple outputs for single input and you have to iterate through them before feeding it more input. (Imagine having chunks of one 100 kilobytes, for example!)

The reason why I'd like to avoid callback-style closure in the "basic" hi-level API is that they obstruct error handling because you can't effectively use the ? operator inside them.

@Frommi
Copy link
Owner

Frommi commented Jun 28, 2018

The point about the ? operator is very good. I now think that your first approach with API is best and I can't see ways to improve it. I thought about maybe somehow statically make sure that get was called before next in HasMoreOutput case, but there is no way to do it not extremely clunky.

@golddranks
Copy link
Contributor Author

Do you want a PR then? I'm afraid that I'm currently quite busy, but I can try and work with it little by little?

@Frommi
Copy link
Owner

Frommi commented Jul 3, 2018

It's okay, I am mostly free on the weekends. Most of the work is already done anyway.

@Frommi Frommi self-assigned this Jul 3, 2018
@oyvindln
Copy link
Collaborator

Hoping we can look into this again, the current rust API is a bit inconsistent and was quickly cobbled together to provide a basis for a C API mirroring miniz. It's not all that easy to use, and it would be nice to have an alternative to using it through flate2 in areas with no std-support.

The original post have one suggestion which may work for how the streaming part could look, maybe there are other suggestions, or some existing API designes out there that would work as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants