Skip to content

Conversation

@adoroszlai
Copy link
Contributor

What changes were proposed in this pull request?

  1. Freon validators read each item to be validated completely into a byte[] buffer. This allows timing only the read (and buffer allocation), but not the subsequent digest calculation. However, it also means that memory required for running the validators is proportional to key size. I propose to add a command-line flag (-s / --stream) which, when specified, makes Freon calculate the digest while reading the input stream. This changes timing results a bit, since values will include the time required for digest calculation. On the other hand, Freon will be able to validate huge keys with limited memory.
  2. Reduce the memory requirement of the non-stream version by allocating a buffer exactly the size of the key. This adds a bit of overhead in time, since key info needs to be fetched, too. But it eliminates ByteArrayOutputStream, which allocates incrementally larger and larger buffers. The latter can lead to memory requirement twice the actual key size in the worst case (since 2^n > 2^n-1 + 2^n-2 + ...).
  3. Get rid of code duplication between SameKeyReader and OzoneClientKeyValidator.
  4. Allow OzoneClientKeyGenerator to create > 2GB keys.

https://issues.apache.org/jira/browse/HDDS-2467

How was this patch tested?

Created and validated keys using Freon. Verified that even 2.5GB key can be created and validated with --stream. Verified that streaming is forced for such a large key, since it won't fit any array. Verified that smaller keys can be validated both ways.

export HADOOP_OPTS='-Xmx1024M -XX:+HeapDumpOnOutOfMemoryError'
ozone freon ockg -t 1 -F ONE -n 1 -p 2_5GB -s 2684354560
ozone freon ockg -t 1 -F ONE -n 1 -p 256MB -s  268435456
ozone freon ockg -t 1 -F ONE -n 1 -p 128MB -s  134217728
ozone freon ockg -t 1 -F ONE -n 1 -p  64MB -s   67108864
ozone freon ockg -t 1 -F ONE -n 1 -p  10KB -s      10240

export HADOOP_OPTS='-Xmx128M -XX:+HeapDumpOnOutOfMemoryError'
ozone freon ockv -t 1 -n 1 -p  10KB
ozone freon ockv -t 1 -n 1 -p  64MB

export HADOOP_OPTS='-Xmx64M -XX:+HeapDumpOnOutOfMemoryError'
ozone freon ockv -t 1 -n 1 -p  10KB -s
ozone freon ockv -t 1 -n 1 -p  64MB -s
ozone freon ockv -t 1 -n 1 -p 128MB -s
ozone freon ockv -t 1 -n 1 -p 256MB -s
ozone freon ockv -t 1 -n 1 -p 2_5GB -s
ozone freon ockv -t 1 -n 1 -p 2_5GB

ozone freon ockg -t 1 -F ONE -n 100 -p 1KB -s 1024
ozone freon ockv -n 100 -p 1KB

ozone freon ocokr -t 4 -k  '64MB/0' -n 32 -s
ozone freon ocokr -t 8 -k '256MB/0' -n 16 -s

export HADOOP_OPTS='-Xmx1024M -XX:+HeapDumpOnOutOfMemoryError'
ozone freon ocokr -t 2 -k '256MB/0' -n 16


// force stream if key is too large for byte[]
// (limit taken from ByteArrayOutputStream)
if (referenceKeySize > Integer.MAX_VALUE - 8) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can force this much sooner, say 1 GB ? or even 100MB. Otherwise, the virtual memory reservation would be huge.
And practically, does it matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any lower limit would be arbitrary. Actual memory requirement depends on the number of threads. One can always add the flag explicitly. I just wanted to preserve existing behavior as a courtesy.

Hope that makes sense :)

Copy link
Contributor

@anuengineer anuengineer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. LGTM. I have a minor comment, you can fix that in some other JIRA too. if I commit this patch before you get a chance to see my comment.

Copy link
Contributor

@dineshchitlangia dineshchitlangia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai thanks for the contribution, @anuengineer thanks for the review.

@dineshchitlangia dineshchitlangia merged commit 2fea0af into apache:master Nov 20, 2019
@adoroszlai adoroszlai deleted the HDDS-2467 branch November 20, 2019 06:06
@adoroszlai
Copy link
Contributor Author

Thanks @anuengineer the review and @dineshchitlangia for review/merge.

ptlrs pushed a commit to ptlrs/ozone that referenced this pull request Mar 8, 2025
…e manager double buffer batch (apache#7188) (apache#152)

(cherry picked from commit 5feb9ea)
Change-Id: Iee80b0b1aef2a7585b45c60d3826c08a9c926247

Co-authored-by: Swaminathan Balachandran <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants