Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recommend MultiGzDecoder over GzDecoder in docs #324

Merged
merged 8 commits into from
Jul 30, 2023
18 changes: 7 additions & 11 deletions src/gz/bufread.rs
Original file line number Diff line number Diff line change
Expand Up @@ -167,11 +167,11 @@ impl<R: BufRead + Write> Write for GzEncoder<R> {
}
}

/// A gzip streaming decoder
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
/// A decoder for a gzip file with a single member.

I'd agree that we shouldn't just present this as a decoder for gzip in general, and should capture the distinction in the summary, but I'd stop short of telling people to prefer MultiGzDecoder.

///
/// This structure consumes a [`BufRead`] interface, reading compressed data
/// from the underlying reader, and emitting uncompressed data.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever the final text, it should stick to the RFC terminology of "member" rather than using "stream" in some places.

/// [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
///
Expand Down Expand Up @@ -397,20 +397,16 @@ impl<R: BufRead + Write> Write for GzDecoder<R> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a complete [gzip file].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A gzip streaming decoder that decodes a complete [gzip file].
/// A gzip streaming decoder that decodes a [gzip file] with multiple members.

///
/// A gzip member consists of a header, compressed data and a trailer. The [gzip
/// specification](https://tools.ietf.org/html/rfc1952), however, allows multiple
/// gzip members to be joined in a single stream. `MultiGzDecoder` will
/// decode all consecutive members while [`GzDecoder`] will only decompress
/// the first gzip member. The multistream format is commonly used in
/// bioinformatics, for example when using the BGZF compressed data. It's also useful
/// to compress large amounts of data in parallel where each thread produces one stream
/// for a chunk of input data.
/// A gzip file consists of a series of "members" concatenated one after another.
/// MultiGzDecoder decodes all members of a file, while [GzDecoder] will only decode
/// the first member. MultiGzDecoder is preferable in most cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// the first member. MultiGzDecoder is preferable in most cases.
/// the first member. Many gzip files in the wild will typically have one member,
/// but not all. Use MultiGzDecoder if you need to handle multiple members, or
/// [GzDecoder] if you know you only need to handle a single member.

///
/// This structure exposes a [`BufRead`] interface that will consume all gzip members
/// from the underlying reader and emit uncompressed data.
///
/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
/// [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
///
/// # Examples
Expand Down
21 changes: 7 additions & 14 deletions src/gz/read.rs
Original file line number Diff line number Diff line change
Expand Up @@ -90,13 +90,11 @@ impl<R: Read + Write> Write for GzEncoder<R> {
}
}

/// A gzip streaming decoder
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
/// A decoder for a gzip file with a single member.

I'd agree that we shouldn't just present this as a decoder for gzip in general, and should capture the distinction in the summary, but I'd stop short of telling people to prefer MultiGzDecoder.

///
/// This structure exposes a [`Read`] interface that will consume compressed
/// data from the underlying reader and emit uncompressed data.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
///
/// [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html
///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

/// # Examples
///
Expand Down Expand Up @@ -180,21 +178,16 @@ impl<R: Read + Write> Write for GzDecoder<R> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a full [gzip file].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A gzip streaming decoder that decodes a full [gzip file].
/// A gzip streaming decoder that decodes a [gzip file] with multiple members.

///
/// A gzip member consists of a header, compressed data and a trailer. The [gzip
/// specification](https://tools.ietf.org/html/rfc1952), however, allows multiple
/// gzip members to be joined in a single stream. `MultiGzDecoder` will
/// decode all consecutive members while [`GzDecoder`] will only decompress the
/// first gzip member. The multistream format is commonly used in bioinformatics,
/// for example when using the BGZF compressed data. It's also useful
/// to compress large amounts of data in parallel where each thread produces one stream
/// for a chunk of input data.
/// A gzip file consists of a series of "members" concatenated one after another.
/// MultiGzDecoder decodes all members of a file, while [GzDecoder] will only decode
/// the first member. MultiGzDecoder is preferable in most cases.
Copy link
Member

@joshtriplett joshtriplett Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// the first member. MultiGzDecoder is preferable in most cases.
/// the first member. Many gzip files in the wild will typically have one member,
/// but not all. Use MultiGzDecoder if you need to handle multiple members, or
/// [GzDecoder] if you know you only need to handle a single member.

///
/// This structure exposes a [`Read`] interface that will consume all gzip members
/// from the underlying reader and emit uncompressed data.
///
/// [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html
/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
///
/// # Examples
///
Expand Down
17 changes: 8 additions & 9 deletions src/gz/write.rs
Original file line number Diff line number Diff line change
Expand Up @@ -166,11 +166,11 @@ impl<W: Write> Drop for GzEncoder<W> {
}
}

/// A gzip streaming decoder
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
/// A decoder for a gzip file with a single member.

I'd agree that we shouldn't just present this as a decoder for gzip in general, and should capture the distinction in the summary, but I'd stop short of telling people to prefer MultiGzDecoder.

///
/// This structure exposes a [`Write`] interface that will emit uncompressed data
/// to the underlying writer `W`.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
///
Copy link
Member

@joshtriplett joshtriplett Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

/// [`Write`]: https://doc.rust-lang.org/std/io/trait.Write.html
///
Expand Down Expand Up @@ -373,17 +373,16 @@ impl<W: Read + Write> Read for GzDecoder<W> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a full [gzip file].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A gzip streaming decoder that decodes a full [gzip file].
/// A gzip streaming decoder that decodes a [gzip file] with multiple members.

///
/// A gzip member consists of a header, compressed data and a trailer. The [gzip
/// specification](https://tools.ietf.org/html/rfc1952), however, allows multiple
/// gzip members to be joined in a single stream. `MultiGzDecoder` will
/// decode all consecutive members while `GzDecoder` will only decompress
/// the first gzip member. The multistream format is commonly used in
/// bioinformatics, for example when using the BGZF compressed data.
/// A gzip file consists of a series of "members" concatenated one after another.
/// MultiGzDecoder decodes all members of a file, while [GzDecoder] will only decode
/// the first member. MultiGzDecoder is preferable in most cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// the first member. MultiGzDecoder is preferable in most cases.
/// the first member. Many gzip files in the wild will typically have one member,
/// but not all. Use MultiGzDecoder if you need to handle multiple members, or
/// [GzDecoder] if you know you only need to handle a single member.

Copy link
Member

@Byron Byron Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have applied this change here and to the other 2 places where the same text appears.

Edit: I reverted the change as it is contended - din't read the messages below until now.

///
/// This structure exposes a [`Write`] interface that will consume all gzip members
/// from the written buffers and write uncompressed data to the writer.
///
/// [gzip file]: https://www.rfc-editor.org/rfc/rfc1952#page-5
#[derive(Debug)]
pub struct MultiGzDecoder<W: Write> {
inner: GzDecoder<W>,
Expand Down