Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recommend MultiGzDecoder over GzDecoder in docs #324

Merged
merged 8 commits into from
Jul 30, 2023
37 changes: 23 additions & 14 deletions src/gz/bufread.rs
Original file line number Diff line number Diff line change
Expand Up @@ -167,12 +167,22 @@ impl<R: BufRead + Write> Write for GzEncoder<R> {
}
}

/// A gzip streaming decoder
/// A decoder for the first member of a [gzip file].
Byron marked this conversation as resolved.
Show resolved Hide resolved
///
/// This structure consumes a [`BufRead`] interface, reading compressed data
/// This structure exposes a [`BufRead`] interface, reading compressed data
/// from the underlying reader, and emitting uncompressed data.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever the final text, it should stick to the RFC terminology of "member" rather than using "stream" in some places.

/// After reading the first member of a gzip file (which is often, but not
/// always, the only member), this reader will return Ok(0) even if there
/// are more bytes available in the underlying reader. If you want to be sure
/// not to drop bytes on the floor, call `into_inner()` after Ok(0) to
/// recover the underlying reader.
Byron marked this conversation as resolved.
Show resolved Hide resolved
///
/// To handle gzip files that may have multiple members, see [`MultiGzDecoder`]
/// or read more
/// [in the introduction](../index.html#about-multi-member-gzip-files).
///
/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
/// [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
///
/// # Examples
Expand Down Expand Up @@ -397,20 +407,19 @@ impl<R: BufRead + Write> Write for GzDecoder<R> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a [gzip file] that may have multiple members.
///
/// This structure exposes a [`BufRead`] interface that will consume compressed
/// data from the underlying reader and emit uncompressed data.
///
/// A gzip member consists of a header, compressed data and a trailer. The [gzip
/// specification](https://tools.ietf.org/html/rfc1952), however, allows multiple
/// gzip members to be joined in a single stream. `MultiGzDecoder` will
/// decode all consecutive members while [`GzDecoder`] will only decompress
/// the first gzip member. The multistream format is commonly used in
/// bioinformatics, for example when using the BGZF compressed data. It's also useful
/// to compress large amounts of data in parallel where each thread produces one stream
/// for a chunk of input data.
/// A gzip file consists of a series of *members* concatenated one after another.
/// MultiGzDecoder decodes all members of a file and returns Ok(0) once the
/// underlying reader does.
Byron marked this conversation as resolved.
Show resolved Hide resolved
///
/// This structure exposes a [`BufRead`] interface that will consume all gzip members
/// from the underlying reader and emit uncompressed data.
/// To handle members seperately, see [GzDecoder] or read more
/// [in the introduction](../index.html#about-multi-member-gzip-files).
///
/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
/// [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
///
/// # Examples
Expand Down
37 changes: 22 additions & 15 deletions src/gz/read.rs
Original file line number Diff line number Diff line change
Expand Up @@ -90,13 +90,22 @@ impl<R: Read + Write> Write for GzEncoder<R> {
}
}

/// A gzip streaming decoder
/// A decoder for the first member of a [gzip file].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same changes as for bufread

///
/// This structure exposes a [`Read`] interface that will consume compressed
/// data from the underlying reader and emit uncompressed data.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

/// [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html
/// After reading the first member of a gzip file (which is often, but not
/// always, the only member), this reader will return Ok(0) even if there
/// are more bytes available in the underlying reader. If you want to be sure
/// not to drop bytes on the floor, call `into_inner()` after Ok(0) to
/// recover the underlying reader.
Byron marked this conversation as resolved.
Show resolved Hide resolved
///
/// To handle gzip files that may have multiple members, see [`MultiGzDecoder`]
/// or read more
/// [in the introduction](../index.html#about-multi-member-gzip-files).
///
/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
///
/// # Examples
///
Expand Down Expand Up @@ -180,21 +189,19 @@ impl<R: Read + Write> Write for GzDecoder<R> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a [gzip file] that may have multiple members.
///
/// A gzip member consists of a header, compressed data and a trailer. The [gzip
/// specification](https://tools.ietf.org/html/rfc1952), however, allows multiple
/// gzip members to be joined in a single stream. `MultiGzDecoder` will
/// decode all consecutive members while [`GzDecoder`] will only decompress the
/// first gzip member. The multistream format is commonly used in bioinformatics,
/// for example when using the BGZF compressed data. It's also useful
/// to compress large amounts of data in parallel where each thread produces one stream
/// for a chunk of input data.
/// This structure exposes a [`Read`] interface that will consume compressed
/// data from the underlying reader and emit uncompressed data.
///
/// This structure exposes a [`Read`] interface that will consume all gzip members
/// from the underlying reader and emit uncompressed data.
/// A gzip file consists of a series of *members* concatenated one after another.
/// MultiGzDecoder decodes all members of a file and returns Ok(0) once the
/// underlying reader does.
///
/// [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html
/// To handle members seperately, see [GzDecoder] or read more
/// [in the introduction](../index.html#about-multi-member-gzip-files).
///
/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
///
/// # Examples
///
Expand Down
35 changes: 22 additions & 13 deletions src/gz/write.rs
Original file line number Diff line number Diff line change
Expand Up @@ -166,12 +166,19 @@ impl<W: Write> Drop for GzEncoder<W> {
}
}

/// A gzip streaming decoder
/// A decoder for the first member of a [gzip file].
Byron marked this conversation as resolved.
Show resolved Hide resolved
///
/// This structure exposes a [`Write`] interface that will emit uncompressed data
/// to the underlying writer `W`.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
/// This structure exposes a [`Write`] interface, receiving compressed data and
/// writing uncompressed data to the underlying writer.
///
/// After decoding the first member of a gzip file, this writer will return XXX
/// to all subsequent writes.
Byron marked this conversation as resolved.
Show resolved Hide resolved
///
/// To handle gzip files that may have multiple members, see [`MultiGzDecoder`]
/// or read more
/// [in the introduction](../index.html#about-multi-member-gzip-files).
///
Copy link
Member

@joshtriplett joshtriplett Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
/// [`Write`]: https://doc.rust-lang.org/std/io/trait.Write.html
///
/// # Examples
Expand Down Expand Up @@ -373,17 +380,19 @@ impl<W: Read + Write> Read for GzDecoder<W> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a [gzip file] with multiple members.
///
/// This structure exposes a [`Write`] interface that will consume compressed data and
/// write uncompressed data to the underlying writer.
///
/// A gzip file consists of a series of *members* concatenated one after another.
/// `MultiGzDecoder` decodes all members of a file and writes them to the
/// underlying writer one after another.
///
/// A gzip member consists of a header, compressed data and a trailer. The [gzip
/// specification](https://tools.ietf.org/html/rfc1952), however, allows multiple
/// gzip members to be joined in a single stream. `MultiGzDecoder` will
/// decode all consecutive members while `GzDecoder` will only decompress
/// the first gzip member. The multistream format is commonly used in
/// bioinformatics, for example when using the BGZF compressed data.
/// To handle members separately, see [GzDecoder] or read more
/// [in the introduction](../index.html#about-multi-member-gzip-files).
///
/// This structure exposes a [`Write`] interface that will consume all gzip members
/// from the written buffers and write uncompressed data to the writer.
/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
#[derive(Debug)]
pub struct MultiGzDecoder<W: Write> {
inner: GzDecoder<W>,
Expand Down
21 changes: 21 additions & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -65,12 +65,33 @@
//! `Write` trait if `T: Write`. That is, the "dual trait" is forwarded directly
//! to the underlying object if available.
//!
//! # About multi-member Gzip files
//!
//! While most `gzip` files one encounters will have a single *member* that can be read
//! with the [`GzDecoder`], there may be some files which have multiple members.
//!
//! If these are read with a [`GzDecoder`], only the first member will be consumed and
//! the rest will silently be left alone, which can be surprising.
Byron marked this conversation as resolved.
Show resolved Hide resolved
//!
//! The [`MultiGzDecoder`] on the other hand will decode all members of a `gzip` file
//! into one consecutive stream of bytes, which hides the underlying *members* entirely.
//! If a file contains contains non-gzip data after the gzip data, MultiGzDecoder will
//! emit an error after decoding the gzip data. This behavior matches the `gzip`,
//! `gunzip`, and `zcat` command line tools.
//!
//! Chrome and Firefox appear to implement behavior like `GzDecoder`, ignoring data
//! after the first member. `curl` appears to implement behavior somewhat like
//! `GzDecoder`, only decoding the first member, but emitting an error if there is
//! data after the first member, whether or not it is gzip data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting for determining common practice, but the documentation should describe flate2 behavior, not that of other tools.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and will remove the last paragraph.

//!
//! [`read`]: read/index.html
//! [`bufread`]: bufread/index.html
//! [`write`]: write/index.html
//! [read]: https://doc.rust-lang.org/std/io/trait.Read.html
//! [write]: https://doc.rust-lang.org/std/io/trait.Write.html
//! [bufread]: https://doc.rust-lang.org/std/io/trait.BufRead.html
//! [`GzDecoder`]: read/struct.GzDecoder.html
//! [`MultiGzDecoder`]: read/struct.MultiGzDecoder.html
#![doc(html_root_url = "https://docs.rs/flate2/0.2")]
#![deny(missing_docs)]
#![deny(missing_debug_implementations)]
Expand Down