Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer #816

Merged
merged 14 commits into from
Feb 1, 2014
Merged

Buffer #816

merged 14 commits into from
Feb 1, 2014

Conversation

shlevy
Copy link
Contributor

@shlevy shlevy commented Jan 30, 2014

As proposed in #802, this implements Data.Buffer, a pure interface to contiguous chunks of memory. At the interface level, this consists of the following terms in Data.Buffer:

  • An abstract Buffer : Nat -> Type, where the Nat represents the number of valid (i.e. initialized) bytes in the Buffer
  • An allocate : ( sizeHint : Nat ) -> Buffer Z, where sizeHint gives a hint as to how much space should be reserved for future appends to the Buffer
  • A copy : Buffer n -> Buffer n. Multiple Buffer objects may point to the same underlying storage, which may mean that some relatively small Buffer is keeping a large chunk from being freed. copy is semantically equivalent to id except that the underlying storage should be initially uniquely pointed-to by the returned Buffer.
  • A series of append functions for appending values to the buffer in various endiannesses. They are named append{Type}{EN}, where Type is one of [ Bits8, Bits16, Bits32, Bits64, Buffer ] and EN is one of [ Native, BE, LE ]. Each such function has type Buffer n -> ( count : Nat ) -> ( val : {Type} ) -> Buffer (n + (sizeof {Type}) * count and adds count repetitions of val represented in the relevant byte-order. For example, appendBits32LE : Buffer n -> ( count : Nat ) -> ( val : Bits32 ) -> Buffer (n + 4 * count) will append the passed Bits32, in little-endian byte order, to the Buffer count times. For Buffer and Bits8 the endianness has no real meaning, but the BE and LE variants are provided for symmetry.
  • A series of peek functions for retrieving values from the buffer in various endiannesses. They are named peek{Type}{EN}, where Type is one of [ Bits8, Bits16, Bits32, Bits64, Buffer ] and EN is one of [ Native, BE, LE ]. Each such function has type Buffer (sizeof {Type} + n) -> ( offset : Fin S n ) -> ( val : {Type} ) and reads val represented in the relevant byte-order at the given byte offset in the Buffer. For example, peekBits16Native : Buffer (2 + n) -> ( offset : Fin S n ) -> ( val : Bits16 ) will read Bits16, in machine-native byte order from the Buffer starting at offset bytes. For Buffer and Bits8 the endianness has no real meaning, but the BE and LE variants are provided for symmetry.

Under the hood, each function is implemented on top of a primitive that takes Bits64s in place of Fin ns and Nats, some of which also explicitly take the Buffer size as a Bits64. As I commented in the Data.Buffer module, there are a few issues with this interface:

  1. It may be theoretically nice to represent Buffer size as Fin (2 ^ WORD_BITS) instead of Nat
  2. Primitives take Bits64 when really they should take the equivalent of C's size_t (ideally unboxed)
  3. If we had access to host system information, we could reduce the needed primitives by implementing the LE/BE variants on top of the native variant plus a possible swab function
  4. Would be nice to be able to peek/append Int, Char, and Float, all have fixed (though possibly implementation-dependent) widths. Currently not in place due to lack of host system introspection.
  5. Would be nice to be able to peek/append the vector types, but for now I'm only touching the C backend which AFAICT doesn't support them.
  6. Conversion from Fin to Bits64 (which, re 2, should eventually be a fixed-width implementation-dependent type) is likely inefficient relative to conversion from Nat to Bits64

For now, I've only implemented the Buffer primitives in the C codegen. There was a fair amount of duplication there, which might be reducible somewhat with macros. The implementation uses a fill value and power-of-2 allocation for efficient appends in many cases, see #802 or the code for details. I can try my hand at the LLVM codegen implementation once this is merged in some form.

Fixes #802

@JanBessai
Copy link
Contributor

Very nice work!

I don't see any road blockers for a port to Java, which I will do next.

@edwinb
Copy link
Contributor

edwinb commented Jan 30, 2014

Travis is complaining for reasons I don't understand, so I've restarted the build... otherwise this looks like a good thing to have, but I have a couple of questions (I might have missed things due to only looking over the code quickly):

  • It seems that append, copy, etc are making new buffers rather than mutating in place, is this right?
  • Some mutation could be useful (indeed, I have a possible use for this already...) by e.g. writing ints, chars, strings, other buffers, to a particular point in the buffer. But of course the interface to this ought to live in IO.
  • Why is there a need for the %assert_totals? At first glance it doesn't seem that they should be necessary.

@shlevy
Copy link
Contributor Author

shlevy commented Jan 30, 2014

  1. Conceptually, none of these mutate in place. Under the hood, in many cases append uses already-allocated storage and doesn't require allocating any new values (just bumping a fill value and filling in the appended data into previously-uninitialized space), and peekBuffer doesn't do any copying, just makes a new view over the existing data. copy is only useful exactly because it makes a new buffer (otherwise it's just id)
  2. I agree that mutation could be very useful, I can take a look at that for future work. Note that the Memory effect already supports mutable buffers, though it requires explicit memory management
  3. The primitive functions are not total as their types do not express the proper boundary rules. prim__NativeBuffer doesn't have a Nat index and, with the exception of allocate, all of the primitives take some sort of information about the size of the prim__NativeBuffer or just assume that there is enough space to do any needed access. When accessed via the Data.Buffer interface, though, all of these constraints are expressed in the types so there is no risk of out-of-bounds access or the like.

@shlevy
Copy link
Contributor Author

shlevy commented Jan 30, 2014

With my last 2 commits, the

For Buffer and Bits8 the endianness has no real meaning, but the BE and LE variants are provided for symmetry.

comments no longer apply, and instead there is just peekBuffer, appendBuffer, peekBits8, and appendBits8

@shlevy
Copy link
Contributor Author

shlevy commented Jan 30, 2014

@JanBessai Note that the list of primitives needed has changed slightly with my last two commits.

@shlevy
Copy link
Contributor Author

shlevy commented Jan 30, 2014

Alright, the latest rebase moves the test into buffer001 to avoid clashes and disables the test on non-C platforms. I don't know of any issues blocking merge now. ( @edwinb anything else to check?)

@edwinb
Copy link
Contributor

edwinb commented Jan 30, 2014

Should be okay to go now. And thanks for fixing the .gitignore - I forgot! There still appears to be a merge conflict though...

Also, don't forget to add yourself to the CONTRIBUTORS file, for credit where it is due :).

@shlevy
Copy link
Contributor Author

shlevy commented Jan 30, 2014

@edwinb rebased to fix merge conflict and added myself to CONTRIBUTORS

@shlevy
Copy link
Contributor Author

shlevy commented Jan 30, 2014

@edwinb tests seem to be failing due to problems in the distribution tarball. I can't seem to get cabal sdist working locally (complains about not being able to find Version_idris, but if cabal has decent wildcard support we might be able to change most of the dist lines in the .cabal file to test/*???/*.idr, test/*???/expected, and test/*???/run, right?

@shlevy
Copy link
Contributor Author

shlevy commented Jan 30, 2014

Note that tests work successfully for all backends locally.

@shlevy
Copy link
Contributor Author

shlevy commented Jan 31, 2014

@edwinb ok latest rebase fixes sdist. All green!

shlevy added 14 commits February 1, 2014 17:17
Implements Buffer, allocate, copy, append*, and peek* in terms of as-yet
non-existent (unsafe) primitives.
Buffers don't have endianness, though values stored in buffers do.
It's just the show instance for Endianness
This simplifies the native code significantly, including removing GETTYs
at every Buffer-related function call, at the cost of:

* Needing to store an offset of 0 in cases where the old code could just
  refer directly to the struct Buffer
* Boxing of the offset
* When an append to a buffer requires an allocation + copy, all of the
  data before the offset needs to be allocated and copied too (as
  there doesn't seem to be a good way for the append functions to tell
  the idris code when the offset needs to be changed and when not)

I think the simplification outweighs the cost, but I don't have hard
numbers so YMMV.
edwinb added a commit that referenced this pull request Feb 1, 2014
@edwinb edwinb merged commit 17ddd68 into idris-lang:master Feb 1, 2014
@shlevy shlevy deleted the buffer branch February 1, 2014 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement byte-arrays
3 participants