Skip to content

Conversation

@kaijchen
Copy link
Member

@kaijchen kaijchen commented Nov 23, 2022

What changes were proposed in this pull request?

Introduced a new flush thread in ECKeyOutputStream, pipelining the encode stage and flush stage.
The producer thread does the buffering and encoding, and the consumer thread does the flushing.
They are connected by a ArrayBlockingQueue, whose size is "ec.stripe.queue.size", which defaults to 2.
Necessary refactors has been done to seperate the two stages.

EOFDummyStripe is used to denote the end of the key, CheckpointDummyStripe is used for syncing in tests.
Synchronization was added in tests.

NOTE: Currently the exception handling needs another look. If exception were thrown in the flush thread,
the main thread will not get it until calling Future.get() in close().
This has been fixed in d710aff.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6615

How was this patch tested?

It's covered by existing test. Synchronization was added before checking state.
Mannual test of writing 1 key by freon shows ~20% performance improvement.

@kaijchen kaijchen changed the title HDDS-6615. EC: improve write performance by pipelining encode and flush HDDS-6615. EC: Improve write performance by pipelining encode and flush Nov 23, 2022
@adoroszlai adoroszlai added the EC label Nov 23, 2022
@adoroszlai adoroszlai requested a review from sodonnel November 24, 2022 21:04
@sodonnel
Copy link
Contributor

This looks like a good change. I need to review in more detail next week (out of the office on Friday).

Out of interest, have you benchmarked Ratis writes vs EC writes? How do they compare?

I also wonder if it would be possible to do the writes in writeDataCells and writeParityCells in parallel? Looks like we are writing to one DN, waiting, write to the next, etc. If this is possible to parallel, it would be better in a followup Jira. This one has enough in it already.

@kaijchen
Copy link
Member Author

I also wonder if it would be possible to do the writes in writeDataCells and writeParityCells in parallel? Looks like we are writing to one DN, waiting, write to the next, etc. If this is possible to parallel, it would be better in a followup Jira. This one has enough in it already.

@Override
public void write(byte[] b, int off, int len) throws IOException {
this.currentChunkRspFuture =
writeChunkToContainer(ChunkBuffer.wrap(ByteBuffer.wrap(b, off, len)));
updateWrittenDataLength(len);
}

BlockOutputStream#writeChunkToContainer() is async, and the result of future is checked in ECBlockOutputStreamEntry#getFailedStreams(). So we are syncing at each stripe instead of each cell.

@kaijchen
Copy link
Member Author

Out of interest, have you benchmarked Ratis writes vs EC writes? How do they compare?

I have tested RATIS/THREE vs EC/RS-10-4-1024K before, and here is the rough result:

Writing many keys in EC is slightly faster (~10%) than Ratis, but not as fast as in theory (because EC writes less data).
Reading many keys in EC is slightly slower (~8%) than Ratis, probably because of more connections and smaller blocks.
Comparing single key is unfair, because EC is utilizing more machines than Ratis.

When writing many large keys (1000x 5GB / 100 threads), EC works fine but Ratis cannot complete the write.

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for working on this, as it seems a good change.

@sodonnel sodonnel merged commit eeeb0b5 into apache:master Nov 30, 2022
@kaijchen
Copy link
Member Author

kaijchen commented Dec 1, 2022

Thanks @sodonnel for the review.

@kaijchen kaijchen deleted the HDDS-6615 branch December 1, 2022 02:42
Galsza pushed a commit to Galsza/ozone that referenced this pull request Dec 7, 2022
errose28 added a commit to errose28/ozone that referenced this pull request Dec 12, 2022
* master: (110 commits)
  HDDS-7472. EC: Fix NSSummaryEndpoint#getDiskUsage for EC keys (apache#3987)
  HDDS-5704. Ozone URI syntax description in help content needs to mention about ozone service id (apache#3862)
  HDDS-7555. Upgrade Ratis to 2.4.2-8b8bdda-SNAPSHOT. (apache#4028)
  HDDS-7541. FSO recursive delete directory with hierarchy takes much time for cleanup (apache#4008)
  HDDS-7581. Fix update-jar-report for snapshot (apache#4034)
  HDDS-7253. Fix exception when '/' in key name (apache#4038)
  HDDS-7579. Use Netty 4.1.77 for consistency (apache#4031)
  HDDS-7562. Suppress warning about long filenames in tar (apache#4017)
  HDDS-7563. Add a handler for under replicated Ratis containers in RM (apache#4025)
  HDDS-7497. Fix mkdir does not update bucket's usedNamespace (apache#3969)
  HDDS-7567. Invalid entries in LICENSE (apache#4020)
  HDDS-7575. Correct showing of RATIS-THREE icon in Recon UI (apache#4026)
  HDDS-7540. Let reusable workflow inherit secrets (apache#4012)
  HDDS-7568. Bump copyright year in NOTICE (apache#4018)
  HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric (apache#3878)
  HDDS-7510. Recon: Return number of open containers in `/clusterState` endpoint (apache#3989)
  HDDS-7561. Improve setquota, clrquota CLI usage (apache#4016)
  HDDS-6615. EC: Improve write performance by pipelining encode and flush (apache#3994)
  HDDS-7554. Recon UI should show DORMANT in pipeline status filter (apache#4010)
  HDDS-7540. Separate scheduled CI from push/PR workflows (apache#4004)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants