name: streaming-bytestring
version: 0.1.4.6
synopsis: effectful byte streams, or: bytestring io done right.
description: This is an implementation of effectful, memory-constrained
bytestrings (byte streams) and functions for streaming
bytestring manipulation, adequate for non-lazy-io.
Some examples of the use of byte streams to implement simple
shell programs can be found
<https://gist.github.com/michaelt/6c6843e6dd8030e95d58 here>.
See also the illustrations of use with e.g. @attoparsec@,
@aeson@, @http-client@, @zlib@ etc. in the
<https://hackage.haskell.org/package/streaming-utils streaming-utils>
library. Usage is as close as possible to that of @ByteString@
and lazy @ByteString@.
.
A @ByteString IO ()@ is the most natural representation of
an effectful stream of bytes arising chunkwise from a handle.
Indeed, the implementation follows @Data.ByteString.Lazy@
and @Data.ByteString.Lazy.Char8@ in unrelenting detail,
omitting only transparently non-streaming
operations like @reverse@. It is just a question of replacing
the lazy bytestring type:
.
> data ByteString = Empty | Chunk Strict.ByteString ByteString
.
with the /minimal/ effectful variant:
.
> data ByteString m r = Empty r | Chunk Strict.ByteString (ByteString m r) | Go (m (ByteString m r))
.
(In both the @Lazy@ and the @Streaming@ libraries, the constructors are necessarily kept in internal modules.)
.
That's it. As a lazy bytestring is implemented internally
by a sort of list of strict bytestring chunks, a streaming bytestring is
simply implemented as a /producer/ or /generator/ of strict bytestring chunks.
Most operations are defined by simply adding a line to what we find in
@Data.ByteString.Lazy@, namely a case for the @Go@ constructor. The only
possible simplification would involve specializing to @IO@ throughout, but
this would block, e.g., the use of @ResourceT@ to manage handles and the like,
as well as a number of other convenient operations like @copy@, which permits
one to apply two operations simultaneously over the length of the byte stream.
.
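As a minimal sketch of the two points above, assuming a stand-alone copy of
the type rather than the internal module of the library, here is a producer
that reads a handle in chunks of at most 32k bytes, and a byte-counting fold
whose only addition to the lazy-bytestring version is the case for @Go@.
The names @hGetChunks@ and @lengthChunks@ are hypothetical, chosen for illustration.
.
> {-# LANGUAGE BangPatterns #-}
> import qualified Data.ByteString as Strict
> import System.IO (Handle, stdin)
> -- Stand-alone copy of the type above (hypothetical; not the internal module of the library).
> data ByteString m r = Empty r | Chunk Strict.ByteString (ByteString m r) | Go (m (ByteString m r))
> -- Hypothetical helper: read a handle as strict chunks of at most 32k bytes.
> hGetChunks :: Handle -> ByteString IO ()
> hGetChunks h = Go $ do
>   c <- Strict.hGetSome h 32768
>   return (if Strict.null c then Empty () else Chunk c (hGetChunks h))
> -- Hypothetical helper: count bytes, keeping the return value of the stream.
> -- The first two equations mirror the lazy-bytestring fold; the Go case is the added line.
> lengthChunks :: Monad m => ByteString m r -> m (Int, r)
> lengthChunks = go 0
>   where
>     go !n (Empty r)    = return (n, r)
>     go !n (Chunk c cs) = go (n + Strict.length c) cs
>     go !n (Go m)       = m >>= go n
> -- Count the bytes arriving on standard input.
> main :: IO ()
> main = lengthChunks (hGetChunks stdin) >>= print . fst
.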
Something like this alteration of type is of course obvious and mechanical, once the idea of
an effectful bytestring type is contemplated and lazy io is rejected.
Indeed it seems that this is the proper expression of what was
intended by lazy bytestrings to begin with. The documentation, after all,
reads
.
* \"A key feature of lazy ByteStrings is the means to manipulate large or
unbounded streams of data without requiring the entire sequence to be
resident in memory. To take advantage of this you have to write your
functions in a lazy streaming style, e.g. classic pipeline composition.
The default I/O chunk size is 32k, which should be good in most circumstances.\"
.
... which is very much the idea of this library: the default chunk size for
'hGetContents' and the like follows @Data.ByteString.Lazy@; operations
like @lines@ and @append@ and so on are tailored not to increase chunk size.
.
The present library is thus, if you like, nothing but /lazy bytestring done right/.
The authors of @Data.ByteString.Lazy@ must have supposed that
the directly monadic formulation of their type
would necessarily make things slower. This appears to be a prejudice.
For example, passing a large file of short lines through
this benchmark transformation
.
> Lazy.unlines . map (\bs -> "!" <> Lazy.drop 5 bs) . Lazy.lines
> Streaming.unlines . S.maps (\bs -> chunk "!" >> Streaming.drop 5 bs) . Streaming.lines
.
gives pleasing results like these
.
> $ time ./benchlines lazy >> /dev/null
> real 0m2.097s
> ...
> $ time ./benchlines streaming >> /dev/null
> real 0m1.930s
.
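For reference, here is a minimal sketch of the streaming side of the first
transformation as a complete filter from standard input to standard output. It
assumes that @Data.ByteString.Streaming@ exports @chunk@, @drop@, @stdin@ and
@stdout@, and that @Data.ByteString.Streaming.Char8@ exports @lines@ and
@unlines@; the @benchlines@ driver used for the timings is not reproduced here.
.
> {-# LANGUAGE OverloadedStrings #-}
> import qualified Data.ByteString.Streaming as Q
> import qualified Data.ByteString.Streaming.Char8 as Q8
> import qualified Streaming as S
> -- For each line of stdin: drop its first 5 bytes, prefix "!", and write it to stdout.
> main :: IO ()
> main = Q.stdout (Q8.unlines (S.maps (\bs -> Q.chunk "!" >> Q.drop 5 bs) (Q8.lines Q.stdin)))
.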
For a more sophisticated operation like
.
> Lazy.intercalate "!\n" . Lazy.lines
> Streaming.intercalate "!\n" . Streaming.lines
.
we get results like these:
.
> time ./benchlines lazy >> /dev/null
> real 0m1.250s
> ...
> time ./benchlines streaming >> /dev/null
> real 0m1.531s
.
The pipes environment would express the latter as
.
> Pipes.intercalates (Pipes.yield "!\n") . view Pipes.lines
.
meaning almost exactly what we mean above, but with results like this
.
> time ./benchlines pipes >> /dev/null
> real 0m6.353s
.
The difference, however, /is emphatically not intrinsic to pipes/;
it is just that
this library depends on the @streaming@ library, which is used in place
of @free@ to express the
<http://www.haskellforall.com/2013/09/perfect-streaming-using-pipes-bytestring.html "perfectly streaming">
splitting and iterated division or "chunking" of byte streams.
.
These concepts belong to the ABCs of streaming; @lines@ is just
a textbook example, and it is of course handled correctly in
@Data.ByteString.Lazy@.
But the concepts are /catastrophically mishandled/ in /all/ streaming io libraries
other than pipes. Already the @enumerator@ and @iteratee@ libraries
were completely defeated by @lines@:
see e.g. the @enumerator@ implementation of
<http://hackage.haskell.org/package/enumerator-0.4.20/docs/Data-Enumerator-Text.html#v:splitWhen splitWhen and lines>.
This will concatenate strict text forever, if that's what is coming
in. The rot spreads from there.
It is just a fact that in all of the general streaming io
frameworks other than pipes, it becomes torture to express elementary distinctions
that are transparently and immediately contained in any
idea of streaming whatsoever.
.
Though, as was said above, we barely alter signatures in @Data.ByteString.Lazy@
more than is required by the types, the point of view that emerges
is very much that of
@pipes-bytestring@ and @pipes-group@. In particular
we have these correspondences:
.
> Lazy.splitAt :: Int -> ByteString -> (ByteString, ByteString)
> Streaming.splitAt :: Int -> ByteString m r -> ByteString m (ByteString m r)
> Pipes.splitAt :: Int -> Producer ByteString m r -> Producer ByteString m (Producer ByteString m r)
.
and
.
> Lazy.lines :: ByteString -> [ByteString]
> Streaming.lines :: ByteString m r -> Stream (ByteString m) m r
> Pipes.lines :: Producer ByteString m r -> FreeT (Producer ByteString m) m r
.
where the @Stream@ type expresses the sequencing of @ByteString m _@ layers
with the usual \'free monad\' sequencing.
.
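To make these correspondences concrete, here is a hedged sketch on stand-alone
copies of the two types (hypothetical definitions that mirror the description
above, not the internals of either library): @splitAt@ hands back the remainder
of the stream as the return value of the first part, and iterating it with the
illustrative @chunksOf@ yields a @Stream@ of @ByteString m@ layers, much as
@lines@ does.
.
> import Prelude hiding (splitAt)
> import qualified Data.ByteString as Strict
> import qualified Data.ByteString.Char8 as Char8
> -- Stand-alone copies of the two types (illustrative, not the library internals).
> data ByteString m r = Empty r | Chunk Strict.ByteString (ByteString m r) | Go (m (ByteString m r))
> data Stream f m r = Return r | Step (f (Stream f m r)) | Effect (m (Stream f m r))
> instance Functor m => Functor (ByteString m) where
>   fmap f (Empty r)    = Empty (f r)
>   fmap f (Chunk c cs) = Chunk c (fmap f cs)
>   fmap f (Go m)       = Go (fmap (fmap f) m)
> -- The first n bytes; the rest of the stream is the return value.
> splitAt :: Functor m => Int -> ByteString m r -> ByteString m (ByteString m r)
> splitAt n bs | n <= 0 = Empty bs
> splitAt _ (Empty r) = Empty (Empty r)
> splitAt n (Chunk c cs)
>   | n < Strict.length c = Chunk (Strict.take n c) (Empty (Chunk (Strict.drop n c) cs))
>   | otherwise           = Chunk c (splitAt (n - Strict.length c) cs)
> splitAt n (Go m) = Go (fmap (splitAt n) m)
> -- Iterated splitting: each Step holds one layer of at most n bytes (n > 0 assumed).
> chunksOf :: Functor m => Int -> ByteString m r -> Stream (ByteString m) m r
> chunksOf _ (Empty r) = Return r
> chunksOf n (Go m)    = Effect (fmap (chunksOf n) m)
> chunksOf n bs        = Step (fmap (chunksOf n) (splitAt n bs))
> -- Demonstration only: run the layers, collecting the chunks of each one.
> collect :: Monad m => ByteString m r -> m ([Strict.ByteString], r)
> collect (Empty r)    = return ([], r)
> collect (Chunk c cs) = collect cs >>= \(xs, r) -> return (c : xs, r)
> collect (Go m)       = m >>= collect
> layers :: Monad m => Stream (ByteString m) m r -> m [[Strict.ByteString]]
> layers (Return _) = return []
> layers (Effect m) = m >>= layers
> layers (Step bs)  = collect bs >>= \(cs, rest) -> layers rest >>= \ls -> return (cs : ls)
> -- prints [["abc"],["def"],["gh"]]
> main :: IO ()
> main = layers (chunksOf 3 (Chunk (Char8.pack "abcdefgh") (Empty ()))) >>= print
.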
Interoperation with @pipes-bytestring@ uses this isomorphism:
.
> Streaming.ByteString.unfoldrChunks Pipes.next :: Monad m => Producer ByteString m r -> ByteString m r
> Pipes.unfoldr Streaming.ByteString.nextChunk :: Monad m => ByteString m r -> Producer ByteString m r
.
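Both directions rest on a one-step observer and an unfold with the shape of the
@next@ and @unfoldr@ functions of pipes. A minimal sketch on a stand-alone copy
of the type (illustrative definitions, not the code of the library) follows;
@unfoldrChunks nextChunk@ rebuilds the original stream, up to extra @Go@ layers.
.
> import qualified Data.ByteString as Strict
> import qualified Data.ByteString.Char8 as Char8
> -- Stand-alone copy of the type (illustrative, not the library internals).
> data ByteString m r = Empty r | Chunk Strict.ByteString (ByteString m r) | Go (m (ByteString m r))
> -- One step of observation, in the shape of Pipes.next.
> nextChunk :: Monad m => ByteString m r -> m (Either r (Strict.ByteString, ByteString m r))
> nextChunk (Empty r)    = return (Left r)
> nextChunk (Chunk c cs) = return (Right (c, cs))
> nextChunk (Go m)       = m >>= nextChunk
> -- Build a byte stream from a state and a pipes-style step function.
> unfoldrChunks :: Monad m => (s -> m (Either r (Strict.ByteString, s))) -> s -> ByteString m r
> unfoldrChunks step s = Go $ do
>   e <- step s
>   return $ case e of
>     Left r        -> Empty r
>     Right (c, s') -> Chunk c (unfoldrChunks step s')
> -- Round trip: rebuild a stream from its own observer, then inspect its first chunk.
> -- prints Just "hi"
> main :: IO ()
> main = do
>   e <- nextChunk (unfoldrChunks nextChunk (Chunk (Char8.pack "hi") (Empty ())))
>   print (either (const Nothing) (Just . fst) e)
.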
Interoperation with @io-streams@ is thus:
.
> IOStreams.unfoldM Streaming.ByteString.unconsChunk :: ByteString IO () -> IO (InputStream ByteString)
> Streaming.ByteString.reread IOStreams.read :: InputStream ByteString -> ByteString IO ()
.
and similarly for other rational streaming io libraries.
.
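The @io-streams@ direction works the same way, with @Maybe@ in place of
@Either@ because an @InputStream@ carries no final return value. Again this is
a sketch on a stand-alone copy of the type, with illustrative definitions; here
an @IORef@ holding a list of chunks stands in for an @InputStream@.
.
> import Data.IORef (newIORef, atomicModifyIORef)
> import qualified Data.ByteString as Strict
> import qualified Data.ByteString.Char8 as Char8
> -- Stand-alone copy of the type (illustrative, not the library internals).
> data ByteString m r = Empty r | Chunk Strict.ByteString (ByteString m r) | Go (m (ByteString m r))
> -- One step, io-streams style: the return value is dropped at the end.
> unconsChunk :: Monad m => ByteString m r -> m (Maybe (Strict.ByteString, ByteString m r))
> unconsChunk (Empty _)    = return Nothing
> unconsChunk (Chunk c cs) = return (Just (c, cs))
> unconsChunk (Go m)       = m >>= unconsChunk
> -- Re-read a source with an effectful read action until it reports Nothing.
> reread :: Monad m => (s -> m (Maybe Strict.ByteString)) -> s -> ByteString m ()
> reread get s = Go $ do
>   mc <- get s
>   return $ case mc of
>     Nothing -> Empty ()
>     Just c  -> Chunk c (reread get s)
> -- Collect all chunks, using only unconsChunk.
> drain :: Monad m => ByteString m r -> m [Strict.ByteString]
> drain bs = unconsChunk bs >>= maybe (return []) (\(c, rest) -> drain rest >>= \cs -> return (c : cs))
> -- prints ["ab","cd"]
> main :: IO ()
> main = do
>   ref <- newIORef (map Char8.pack ["ab", "cd"])
>   let pop r = atomicModifyIORef r (\xs -> case xs of { [] -> ([], Nothing); (y:ys) -> (ys, Just y) })
>   drain (reread pop ref) >>= print
.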
Problems and questions about the library can be put as issues on
the github page, or mailed to the
<https://groups.google.com/forum/#!forum/haskell-pipes pipes list>.
.
A tutorial module is in the works; for the moment,
<https://gist.github.com/michaelt/6c6843e6dd8030e95d58 here>
is a sequence of simplified implementations of familiar shell utilities.
The same programs are implemented at the end of the excellent
<http://hackage.haskell.org/package/io-streams-1.3.2.0/docs/System-IO-Streams-Tutorial.html io-streams tutorial>.
The byte-stream versions are generally much simpler; in some cases they are
simpler even than what you would write with lazy bytestrings.
<https://gist.github.com/michaelt/2dcea1ba32562c091357 Here>
is a simple GET request that returns a byte stream.
.
license: BSD3
license-file: LICENSE
author: michaelt
maintainer: [email protected]
-- copyright:
category: Data, Pipes, Streaming
build-type: Simple
extra-source-files: README.md
cabal-version: >=1.10
stability: Experimental
homepage: https://github.com/michaelt/streaming-bytestring
bug-reports: https://github.com/michaelt/streaming-bytestring/issues

source-repository head
  type:     git
  location: https://github.com/michaelt/streaming-bytestring

library
  exposed-modules:     Data.ByteString.Streaming
                     , Data.ByteString.Streaming.Char8
                     , Data.ByteString.Streaming.Internal
  -- other-modules:
  other-extensions:    CPP, BangPatterns, ForeignFunctionInterface, DeriveDataTypeable, Unsafe
  build-depends:       base <5.0
                     , deepseq
                     , bytestring
                     , mtl >=2.1 && <2.3
                     , mmorph >=1.0 && <1.2
                     , transformers >=0.3 && <0.6
                     , transformers-base
                     , streaming >= 0.1.4.0 && < 0.1.4.8
                     , resourcet
                     , exceptions
  if impl(ghc < 7.8)
    build-depends:
        bytestring < 0.10.4.0
      , bytestring-builder
  else
    build-depends:
        bytestring >= 0.10.4
  default-language:    Haskell2010
  ghc-options:         -O2

test-suite test
  default-language:
    Haskell2010
  type:
    exitcode-stdio-1.0
  hs-source-dirs:
    tests
  main-is:
    test.hs
  build-depends:
      base >= 4 && < 5
    , transformers
    , tasty >= 0.11.0.4
    , tasty-smallcheck >= 0.8.1
    , smallcheck >= 1.1.1
    , streaming
    , streaming-bytestring
    , bytestring