
Conversation

@Tim-Brooks
Contributor

This is related to #27051. Essentially, we want to use big arrays for
byte reuse when reading from and writing to channels. Unfortunately, big
arrays are currently designed to expand forever. That does not fit
reading from a channel, where you periodically want to drop bytes once a
message has been fully read.

First, this commit refactors big arrays to keep pages and recyclers in
a single array, rather than the two separate arrays used today. This
simplifies the array manipulation that is necessary when dropping bytes
from the head.

Second, it adds a dropFromHead method that removes and releases pages
up to the provided index. This also requires introducing optional
"offsets" for big arrays.
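To illustrate what the offset bookkeeping amounts to, here is a minimal sketch (the class name PagedByteBuffer, the 16 KB page size, and the list-based storage are all hypothetical, not the code in this PR):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the offset/dropFromHead idea, not the actual patch.
class PagedByteBuffer {
    static final int PAGE_SHIFT = 14;            // pageSize == 1 << PAGE_SHIFT (16 KB, assumed)
    static final int PAGE_SIZE = 1 << PAGE_SHIFT;
    static final int PAGE_MASK = PAGE_SIZE - 1;

    private final List<byte[]> pages = new ArrayList<>();
    private long offset;                         // logical index of the first byte still retained

    void set(long index, byte b) {
        long relative = index - offset;
        while ((relative >>> PAGE_SHIFT) >= pages.size()) {
            pages.add(new byte[PAGE_SIZE]);      // a real implementation would take pages from the recycler
        }
        pages.get((int) (relative >>> PAGE_SHIFT))[(int) (relative & PAGE_MASK)] = b;
    }

    byte get(long index) {
        long relative = index - offset;
        assert relative >= 0 : "index was already dropped from the head";
        return pages.get((int) (relative >>> PAGE_SHIFT))[(int) (relative & PAGE_MASK)];
    }

    /** Releases every page that lies entirely before {@code index}. */
    void dropFromHead(long index) {
        while (!pages.isEmpty() && index - offset >= PAGE_SIZE) {
            pages.remove(0);                     // a real implementation would return the page to the recycler
            offset += PAGE_SIZE;
        }
    }
}
```

The key point is that indices stay logical positions in the stream: dropFromHead releases whole pages and advances the offset, so later indices keep working without any copying.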

@Tim-Brooks
Contributor Author

I implemented a test for dropping bytes from a ByteArray. I still need to replicate this test for the other array types, but I wanted to push this PR up now to get some feedback in case anyone has an issue with my approach.

@jpountz
Contributor

jpountz commented Nov 2, 2017

To me big arrays are a bit too complex already so I'm a bit on the fence about making them even more complex. I'm wondering whether we should build another abstraction on top of the page cache recycler instead of reusing big arrays here?

@s1monw
Contributor

s1monw commented Nov 2, 2017

> To me big arrays are a bit too complex already so I'm a bit on the fence about making them even more complex. I'm wondering whether we should build another abstraction on top of the page cache recycler instead of reusing big arrays here?

I am ok with this, @tbrooks8 WDYT?

@Tim-Brooks
Contributor Author

> To me big arrays are a bit too complex already so I'm a bit on the fence about making them even more complex. I'm wondering whether we should build another abstraction on top of the page cache recycler instead of reusing big arrays here?

I have a few thoughts.

  1. The PageCacheRecycler is not really something that we expose. We pass BigArrays to the transport upon creation, so that is what I have access to. We can change that, but that is what exists now.

  2. I imagine that we are still going to want to use BigArrays for the outbound byte messages for now, which means that we still need access to BigArrays in the transport (although we can reconsider that at some point). One limitation of passing a ByteReference as the message is that we must wait until the message is completely written before we start releasing pages. Obviously, with a specialized nio transport and a different data structure, we could release pages incrementally as we write a message (maybe an improvement for large messages).

  3. I'm not sure I completely follow when you say this makes BigArrays more complicated. There are two parts to this PR: (1) some unification between the different big array types (same code paths for page allocation and resizing, same page array, etc.) in AbstractBigArray, which I'm not sure is more complicated so much as different; and (2) the introduction of offsets and dropping pages from the front of an array, which is more complicated.

I guess what I'm seeing is that I can create a different data structure, but there is still some stuff in AbstractBigArray that is valuable (all of the power-of-two indexing work). Do we prefer that I essentially rip out the common offset and "drop from head" work and implement a new AbstractBigArray (hypothetically CircularBigByteArray) that has the specialized offset and dropping logic? In this scenario I would need to keep the unification of the pages and recycler arrays, as the old AbstractBigArray is not designed to release pages in a world where the pages have moved around. Essentially I would want to keep some of the work related to part 1 of this PR.

Or should I create an accessor for the PageCacheRecycler from BigArrays and when BigArrays is passed to a transport I access the PageCacheRecycler and create a new allocator thing (hypothetically ChannelBuffers). In this scenario I would probably look to extract a new super class of AbstractBigArray that shares the power of two logic (hypothetically PowerOfTwoArray).
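As a very rough sketch of what that second option could look like (the recycler interface, the ChannelBuffers shape, and the page size below are all hypothetical placeholders, not the existing PageCacheRecycler API):

```java
import java.io.Closeable;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical minimal recycler interface, standing in for whatever we would actually pass around.
interface PageRecycler {
    Page acquire();                                  // closing the page returns it to the pool

    interface Page extends Closeable {
        byte[] bytes();
        @Override void close();
    }
}

// Hypothetical channel-side buffer built directly on recycled pages instead of BigArrays.
class ChannelBuffers {
    private static final int PAGE_SIZE = 1 << 14;    // assumed 16 KB pages

    private final PageRecycler recycler;
    private final Deque<PageRecycler.Page> pages = new ArrayDeque<>();
    private long released;                           // bytes already dropped from the head

    ChannelBuffers(PageRecycler recycler) {
        this.recycler = recycler;
    }

    /** Grow the tail by one recycled page before reading more bytes from the channel. */
    byte[] addTailPage() {
        PageRecycler.Page page = recycler.acquire();
        pages.addLast(page);
        return page.bytes();
    }

    /** Once a message has been fully decoded, return its pages to the recycler. */
    void releaseUpTo(long index) {
        while (!pages.isEmpty() && index - released >= PAGE_SIZE) {
            pages.pollFirst().close();
            released += PAGE_SIZE;
        }
    }
}
```

The structure only ever appends recycled pages at the tail and closes them from the head, which matches the read-side usage a channel needs.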

@jpountz
Contributor

jpountz commented Nov 2, 2017

> I'm not sure I completely follow when you say this makes BigArrays more complicated.

I think my main concern is the introduction of a new way of consuming big arrays via the addition of dropFromHead and offset/size to AbstractBigArray.

> Do we prefer that I essentially rip out the common offset and "drop from head" work and implement a new AbstractBigArray (hypothetically CircularBigByteArray) that has the specialized offset and dropping logic?

That would work for me.

> Or should I create an accessor for the PageCacheRecycler from BigArrays and when BigArrays is passed to a transport I access the PageCacheRecycler and create a new allocator thing (hypothetically ChannelBuffers).

I don't like exposing the internals of BigArrays. Could we pass the PageCacheRecycler in addition to BigArrays to the transport? If yes, then that would work for me too.

> In this scenario I would probably look to extract a new super class of AbstractBigArray that shares the power of two logic (hypothetically PowerOfTwoArray).

That logic is simple enough that I wouldn't mind it being duplicated.
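For reference, the power-of-two logic in question boils down to a shift and a mask; a minimal sketch, assuming a 16 KB page size rather than BigArrays' actual constants:

```java
// Hypothetical helper showing the power-of-two page indexing being discussed.
class PowerOfTwoIndexing {
    static final int PAGE_SHIFT = 14;                  // pageSize == 1 << PAGE_SHIFT (assumed)
    static final int PAGE_MASK = (1 << PAGE_SHIFT) - 1;

    static int pageIndex(long index) {
        return (int) (index >>> PAGE_SHIFT);           // which page holds the element
    }

    static int indexInPage(long index) {
        return (int) (index & PAGE_MASK);              // position of the element within that page
    }
}
```

With a power-of-two page size both helpers compile down to a single instruction each, which is why duplicating them is cheap.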

@Tim-Brooks
Contributor Author

Thanks @jpountz. Your last comment gives me some approaches to work with.

@s1monw
Contributor

s1monw commented Nov 22, 2017

@tbrooks8 is this still on or can we close it?

@Tim-Brooks
Contributor Author

Closed. We are going to go with a different approach.
