Buffer #816

shlevy · 2014-01-30T06:26:17Z

As proposed in #802, this implements Data.Buffer, a pure interface to contiguous chunks of memory. At the interface level, this consists of the following terms in Data.Buffer:

An abstract Buffer : Nat -> Type, where the Nat represents the number of valid (i.e. initialized) bytes in the Buffer
An allocate : ( sizeHint : Nat ) -> Buffer Z, where sizeHint gives a hint as to how much space should be reserved for future appends to the Buffer
A copy : Buffer n -> Buffer n. Multiple Buffer objects may point to the same underlying storage, which may mean that some relatively small Buffer is keeping a large chunk from being freed. copy is semantically equivalent to id except that the underlying storage should be initially uniquely pointed-to by the returned Buffer.
A series of append functions for appending values to the buffer in various endiannesses. They are named append{Type}{EN}, where Type is one of [ Bits8, Bits16, Bits32, Bits64, Buffer ] and EN is one of [ Native, BE, LE ]. Each such function has type Buffer n -> ( count : Nat ) -> ( val : {Type} ) -> Buffer (n + (sizeof {Type}) * count and adds count repetitions of val represented in the relevant byte-order. For example, appendBits32LE : Buffer n -> ( count : Nat ) -> ( val : Bits32 ) -> Buffer (n + 4 * count) will append the passed Bits32, in little-endian byte order, to the Buffer count times. For Buffer and Bits8 the endianness has no real meaning, but the BE and LE variants are provided for symmetry.
A series of peek functions for retrieving values from the buffer in various endiannesses. They are named peek{Type}{EN}, where Type is one of [ Bits8, Bits16, Bits32, Bits64, Buffer ] and EN is one of [ Native, BE, LE ]. Each such function has type Buffer (sizeof {Type} + n) -> ( offset : Fin S n ) -> ( val : {Type} ) and reads val represented in the relevant byte-order at the given byte offset in the Buffer. For example, peekBits16Native : Buffer (2 + n) -> ( offset : Fin S n ) -> ( val : Bits16 ) will read Bits16, in machine-native byte order from the Buffer starting at offset bytes. For Buffer and Bits8 the endianness has no real meaning, but the BE and LE variants are provided for symmetry.

Under the hood, each function is implemented on top of a primitive that takes Bits64s in place of Fin ns and Nats, some of which also explicitly take the Buffer size as a Bits64. As I commented in the Data.Buffer module, there are a few issues with this interface:

It may be theoretically nice to represent Buffer size as Fin (2 ^ WORD_BITS) instead of Nat
Primitives take Bits64 when really they should take the equivalent of C's size_t (ideally unboxed)
If we had access to host system information, we could reduce the needed primitives by implementing the LE/BE variants on top of the native variant plus a possible swab function
Would be nice to be able to peek/append Int, Char, and Float, all have fixed (though possibly implementation-dependent) widths. Currently not in place due to lack of host system introspection.
Would be nice to be able to peek/append the vector types, but for now I'm only touching the C backend which AFAICT doesn't support them.
Conversion from Fin to Bits64 (which, re 2, should eventually be a fixed-width implementation-dependent type) is likely inefficient relative to conversion from Nat to Bits64

For now, I've only implemented the Buffer primitives in the C codegen. There was a fair amount of duplication there, which might be reducible somewhat with macros. The implementation uses a fill value and power-of-2 allocation for efficient appends in many cases, see #802 or the code for details. I can try my hand at the LLVM codegen implementation once this is merged in some form.

Fixes #802

JanBessai · 2014-01-30T08:10:56Z

Very nice work!

I don't see any road blockers for a port to Java, which I will do next.

edwinb · 2014-01-30T10:16:31Z

Travis is complaining for reasons I don't understand, so I've restarted the build... otherwise this looks like a good thing to have, but I have a couple of questions (I might have missed things due to only looking over the code quickly):

It seems that append, copy, etc are making new buffers rather than mutating in place, is this right?
Some mutation could be useful (indeed, I have a possible use for this already...) by e.g. writing ints, chars, strings, other buffers, to a particular point in the buffer. But of course the interface to this ought to live in IO.
Why is there a need for the %assert_totals? At first glance it doesn't seem that they should be necessary.

shlevy · 2014-01-30T14:42:51Z

Conceptually, none of these mutate in place. Under the hood, in many cases append uses already-allocated storage and doesn't require allocating any new values (just bumping a fill value and filling in the appended data into previously-uninitialized space), and peekBuffer doesn't do any copying, just makes a new view over the existing data. copy is only useful exactly because it makes a new buffer (otherwise it's just id)
I agree that mutation could be very useful, I can take a look at that for future work. Note that the Memory effect already supports mutable buffers, though it requires explicit memory management
The primitive functions are not total as their types do not express the proper boundary rules. prim__NativeBuffer doesn't have a Nat index and, with the exception of allocate, all of the primitives take some sort of information about the size of the prim__NativeBuffer or just assume that there is enough space to do any needed access. When accessed via the Data.Buffer interface, though, all of these constraints are expressed in the types so there is no risk of out-of-bounds access or the like.

shlevy · 2014-01-30T16:09:35Z

With my last 2 commits, the

For Buffer and Bits8 the endianness has no real meaning, but the BE and LE variants are provided for symmetry.

comments no longer apply, and instead there is just peekBuffer, appendBuffer, peekBits8, and appendBits8

shlevy · 2014-01-30T16:12:21Z

@JanBessai Note that the list of primitives needed has changed slightly with my last two commits.

shlevy · 2014-01-30T17:34:06Z

Alright, the latest rebase moves the test into buffer001 to avoid clashes and disables the test on non-C platforms. I don't know of any issues blocking merge now. ( @edwinb anything else to check?)

edwinb · 2014-01-30T17:56:20Z

Should be okay to go now. And thanks for fixing the .gitignore - I forgot! There still appears to be a merge conflict though...

Also, don't forget to add yourself to the CONTRIBUTORS file, for credit where it is due :).

shlevy · 2014-01-30T18:17:12Z

@edwinb rebased to fix merge conflict and added myself to CONTRIBUTORS

shlevy · 2014-01-30T18:49:13Z

@edwinb tests seem to be failing due to problems in the distribution tarball. I can't seem to get cabal sdist working locally (complains about not being able to find Version_idris, but if cabal has decent wildcard support we might be able to change most of the dist lines in the .cabal file to test/*???/*.idr, test/*???/expected, and test/*???/run, right?

shlevy · 2014-01-30T18:49:36Z

Note that tests work successfully for all backends locally.

shlevy · 2014-01-31T03:50:26Z

@edwinb ok latest rebase fixes sdist. All green!

Implements Buffer, allocate, copy, append*, and peek* in terms of as-yet non-existent (unsafe) primitives.

Buffers don't have endianness, though values stored in buffers do.

It's just the show instance for Endianness

This simplifies the native code significantly, including removing GETTYs at every Buffer-related function call, at the cost of: * Needing to store an offset of 0 in cases where the old code could just refer directly to the struct Buffer * Boxing of the offset * When an append to a buffer requires an allocation + copy, all of the data before the offset needs to be allocated and copied too (as there doesn't seem to be a good way for the append functions to tell the idris code when the offset needs to be changed and when not) I think the simplification outweighs the cost, but I don't have hard numbers so YMMV.

Buffer

shlevy added 14 commits February 1, 2014 17:17

Add test case for simple Buffer functionality

47b3278

Add Data.Buffer module

c22e4ec

Implements Buffer, allocate, copy, append*, and peek* in terms of as-yet non-existent (unsafe) primitives.

Add Buffer type and primitives to the core

149e816

Replace Fin ( n + 1 ), which doesn't reduce, with Fin ( S n )

5750cb5

Add C codegen implementation of buffer primitives

f0fe3ff

Remove some unnecessary copies

ad58d9e

Missing comma

5b70c05

Rename functions implying Buffers have endianness

d6418d7

Buffers don't have endianness, though values stored in buffers do.

Endianness is redundant for Bits8

a2332b0

Remove redundant enName function

a47adaa

It's just the show instance for Endianness

Disable buffer tests on non-C backends for now

49bede9

Add self to contributors

dafb098

Add buffer tests to the distribution

55b241a

edwinb added a commit that referenced this pull request Feb 1, 2014

Merge pull request #816 from shlevy/buffer

17ddd68

Buffer

edwinb merged commit 17ddd68 into idris-lang:master Feb 1, 2014

shlevy deleted the buffer branch February 1, 2014 17:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Buffer #816

Buffer #816

shlevy commented Jan 30, 2014

JanBessai commented Jan 30, 2014

edwinb commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

edwinb commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 31, 2014

Buffer #816

Buffer #816

Conversation

shlevy commented Jan 30, 2014

JanBessai commented Jan 30, 2014

edwinb commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

edwinb commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 30, 2014

shlevy commented Jan 31, 2014