Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -95,23 +95,32 @@ private short put(byte[] array, int offset, int length) {
byte[] stored = new byte[length];
Bytes.putBytes(stored, 0, array, offset, length);

if (currSize < initSize) {
// There is space to add without evicting.
if (indexToNode[currSize] == null) {
indexToNode[currSize] = new Node();
}
indexToNode[currSize].setContents(stored, 0, stored.length);
setHead(indexToNode[currSize]);
short ret = (short) currSize++;
nodeToIndex.put(indexToNode[ret], ret);
return ret;
Node node = new Node();
node.setContents(stored, 0, stored.length);
if (nodeToIndex.containsKey(node)) {
short index = nodeToIndex.get(node);
node = indexToNode[index];
moveToHead(node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to discuss in the email, if here always use the previous node, then how can it ensure that the previous node is a completed one?

Copy link
Contributor Author

@thangTang thangTang Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't understand what "completed one" means.
But this patch does not actually change the logic of the previous use of this LRUCache:
RingBufferEventHandler#onEvent -> RingBufferEventHandler#append -> ProtobufLogWriter#append -> CompressedKvEncoder#write -> LRUDictionary#findEntry
For this actually used write link, findEntry uses the previously existing node.
We did find NPE on the read link, but the root reason is not the implementation of this LRUCache (this patch), but that the LRUCache is polluted.
I just unified the logic of addEntry with the actual logic on the write link, which I think is more elegant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call trace is WALEntryStream#tryAdvanceEntry->ProtobufLogReader#readNext->CompressedKVDecoder#readIntoArray->LRUDirectory#addEntry->then here the changed BidirectionalLRUMap#put.
Your change only makes the newly some node will not be added to the directory, but the old same node may have uncompleted data, e.g. tailing the WAL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh i got your point. As we discussed in email, to solve the problem you mentioned, we need to rebuild the LRUCache every time when we re-seek to somewhere(will done it in 26849), and in the future we could try to implement a 'versioned' cache for replication.
This patch is just for code optimization, not to solve the problem. So it's an "Improvement", not a "bug".
Could this answer your question?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the new value maybe a completed one, this improvement can not prove using the old value is always better than the new value, except the performance improvement.
I think an umbrella should be created to track the problem mentioned in the email, and this issue can be a child of it. So before the umbrella issue is completed, all the child codes can be tested together.
Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a stability or performance standpoint, I don't think it's a good or bad/right or wrong question since it doesn't change the existing logic.
But from a code architecture point of view, I think this way is better. The original implementation is to put the logic of "find the existing node and return" into findEntry, and directly expose addEntry to the outside, which leads to the possibility of inconsistent behavior between the two. So I think we can completely encapsulate the same logic in addEntry (although this does not bring any stability improvement for now).
But if you would like to wait for 26849 to finish and watch it together, I think it's OK~

Copy link
Contributor

@apurtell apurtell Apr 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow the discussion because in the proposed improvement the old node is not reused if the contents being stored are different.

  Node node = new Node();
  node.setContents(stored, 0, stored.length);
  if (nodeToIndex.containsKey(node)) {
     // new logic reusing existing entry and index
     // ...
  } else {
     // original logic adding new entry
     // ...
  }      

containsKey will use hashcode of Node, which is Bytes.hashCode over the contents. A previous short read and a current full read will have different contents so different hashcode, right? If so, this just reuses an entry that has equivalent data, which I agree is an improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old node is not reused if the contents being stored are different.

Completely correct.
This patch only reuse SAME node.
Actually, In the previous implementation, if the nodes are same, the existing nodes will also be reused too, the only difference is this logic were in findIdx:

private short findIdx(byte[] array, int offset, int length) {
      Short s;
      final Node comparisonNode = new Node();
      comparisonNode.setContents(array, offset, length);
      if ((s = nodeToIndex.get(comparisonNode)) != null) {
        moveToHead(indexToNode[s]);
        return s;
      } else {
        return -1;
      }
    }

For the write link:

CompressedKvEncoder#write
->
LRUDictionary#findEntry (LRUDictionary#findIdx)
->
LRUDictionary#addEntry

But for the read link:

CompressedKVDecoder#readIntoArray
->
LRUDirectory#addEntry

We could see, on the read link, it just addEntry directly, without findIdx(reuse the existing same node).
So, I just thought it would be more beautiful to write this way.

return index;
} else {
short s = nodeToIndex.remove(tail);
tail.setContents(stored, 0, stored.length);
// we need to rehash this.
nodeToIndex.put(tail, s);
moveToHead(tail);
return s;
if (currSize < initSize) {
// There is space to add without evicting.
if (indexToNode[currSize] == null) {
indexToNode[currSize] = new Node();
}
indexToNode[currSize].setContents(stored, 0, stored.length);
setHead(indexToNode[currSize]);
short ret = (short) currSize++;
nodeToIndex.put(indexToNode[ret], ret);
return ret;
} else {
short s = nodeToIndex.remove(tail);
tail.setContents(stored, 0, stored.length);
// we need to rehash this.
nodeToIndex.put(tail, s);
moveToHead(tail);
return s;
}
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -63,12 +63,25 @@ public void testPassingEmptyArrayToFindEntry() {
@Test
public void testPassingSameArrayToAddEntry() {
// Add random predefined byte array, in this case a random byte array from
// HConstants. Assert that when we add, we get new index. Thats how it
// works.
// HConstants.
// Assert that when we add, we get old index.
// Because we DO NOT need to write a new one.
int len = HConstants.CATALOG_FAMILY.length;
int index = testee.addEntry(HConstants.CATALOG_FAMILY, 0, len);
assertFalse(index == testee.addEntry(HConstants.CATALOG_FAMILY, 0, len));
assertFalse(index == testee.addEntry(HConstants.CATALOG_FAMILY, 0, len));
assertTrue(index == testee.addEntry(HConstants.CATALOG_FAMILY, 0, len));
assertTrue(index == testee.addEntry(HConstants.CATALOG_FAMILY, 0, len));
}

@Test
public void testPassingSameArrayToAddEntryThenEvict() {
testee.init(3);
byte[] byte0 = Bytes.toBytes(0);
byte[] byte1 = Bytes.toBytes(1);
assertEquals(0, testee.addEntry(byte0, 0, byte0.length));
assertEquals(0, testee.addEntry(byte0, 0, byte0.length));
assertEquals(0, testee.addEntry(byte0, 0, byte0.length));
assertEquals(1, testee.addEntry(byte1, 0, byte1.length));
assertEquals(0, testee.addEntry(byte0, 0, byte0.length));
}

@Test
Expand Down