@guohao-rosicky
Contributor

What changes were proposed in this pull request?

XceiverClient cache entries are frequently evicted during EC reads.

I am running a large number of concurrent EC reads and see many XceiverClient cache removal messages in the client log.

Currently each read builds a new pipeline with a new pipeline ID. I think we can use the datanode ID as the pipeline ID, so that reads can reuse the XceiverClient already held in the client's cache.
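The effect of the cache key can be illustrated with a small, self-contained sketch. This is not the Ozone code or the patch itself: `PipelineIdCacheSketch`, `Client`, `ClientCache`, and `simulate` are hypothetical names, and the toy cache only assumes that clients are looked up by pipeline ID. It compares a fresh random ID per read against an ID derived from the (fixed) datanode ID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

final class PipelineIdCacheSketch {

    /** Hypothetical stand-in for an expensive-to-create client connection. */
    static final class Client {
        final UUID pipelineId;

        Client(UUID pipelineId) {
            this.pipelineId = pipelineId;
        }
    }

    /** Toy cache keyed by pipeline ID; counts hits and misses. */
    static final class ClientCache {
        private final Map<UUID, Client> clients = new HashMap<>();
        private int hits;
        private int misses;

        synchronized Client acquire(UUID pipelineId) {
            Client cached = clients.get(pipelineId);
            if (cached != null) {
                hits++;
                return cached;
            }
            // Unknown key: a new client has to be created and stored.
            misses++;
            Client created = new Client(pipelineId);
            clients.put(pipelineId, created);
            return created;
        }

        synchronized int hits() {
            return hits;
        }

        synchronized int misses() {
            return misses;
        }
    }

    /** Performs the given number of reads and returns {hits, misses}. */
    static int[] simulate(int reads, Supplier<UUID> pipelineIdSource) {
        ClientCache cache = new ClientCache();
        for (int i = 0; i < reads; i++) {
            cache.acquire(pipelineIdSource.get());
        }
        return new int[] {cache.hits(), cache.misses()};
    }

    public static void main(String[] args) {
        // A new random pipeline ID per read: every lookup misses.
        int[] random = simulate(100, UUID::randomUUID);

        // Assumption: the pipeline ID is taken from the datanode ID,
        // so repeated reads from the same datanode share one key.
        UUID datanodeId = UUID.randomUUID();
        int[] stable = simulate(100, () -> datanodeId);

        System.out.println("random IDs: hits=" + random[0] + " misses=" + random[1]);
        System.out.println("stable ID:  hits=" + stable[0] + " misses=" + stable[1]);
    }
}
```

With random IDs the toy cache never sees the same key twice, so every read creates a client. With the stable ID only the first read misses.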

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8922

How was this patch tested?

Existing test

@guohao-rosicky
Contributor Author

Hi @ChenSammi @sodonnel, what are your thoughts on this change?

Contributor

@adoroszlai adoroszlai left a comment

Thanks @guohao-rosicky for the patch.

Tested it locally using Freon, reading the same 3MB EC key 1024 times concurrently in 16 threads. Cache stats from XceiverClientManager confirm the fix.

Before:

CacheStats{hitCount=3075, missCount=3075, loadSuccessCount=3075, loadExceptionCount=0, totalLoadTime=3789988382, evictionCount=2819}

After:

CacheStats{hitCount=6147, missCount=3, loadSuccessCount=3, loadExceptionCount=0, totalLoadTime=324556550, evictionCount=0}
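To make the comparison concrete, the hit rate can be derived from the counters above: 3075 / (3075 + 3075) = 50% before, and 6147 / (6147 + 3), roughly 99.95%, after. The sketch below does that arithmetic in plain Java. `CacheStatsSketch` and its methods are hypothetical helpers, and the conversion of `totalLoadTime` to seconds assumes the value is in nanoseconds, as it is for Guava's `CacheStats`.

```java
import java.util.Locale;

final class CacheStatsSketch {

    /** Fraction of lookups served from the cache; 1.0 when there were no lookups. */
    static double hitRate(long hitCount, long missCount) {
        long requests = hitCount + missCount;
        return requests == 0 ? 1.0 : (double) hitCount / requests;
    }

    /** Assumption: totalLoadTime is reported in nanoseconds. */
    static double loadSeconds(long totalLoadTimeNanos) {
        return totalLoadTimeNanos / 1e9;
    }

    public static void main(String[] args) {
        // Counter values copied from the two CacheStats lines above.
        System.out.printf(Locale.ROOT, "before: hitRate=%.4f loadSeconds=%.2f%n",
            hitRate(3075, 3075), loadSeconds(3789988382L));
        System.out.printf(Locale.ROOT, "after:  hitRate=%.4f loadSeconds=%.2f%n",
            hitRate(6147, 3), loadSeconds(324556550L));
    }
}
```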

@adoroszlai adoroszlai changed the title from "HDDS-8922. The XceiverClient cache is often expired when using ec reads" to "HDDS-8922. Random EC pipeline ID causes XceiverClient cache churn" on Jun 25, 2023
@adoroszlai adoroszlai merged commit 12caaea into apache:master Jun 25, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Jun 26, 2023
* master: (79 commits)
  HDDS-8914. Datanode may fail to start due to duplicate VolumeInfoMetrics (apache#4966)
  HDDS-8921. Add support for EC in Freon SCM block generator (apache#4982)
  HDDS-8927. Metadata scanner should not scan unhealthy containers. (apache#4976)
  HDDS-8929. Avoid list allocation for pipeline search (apache#4980)
  HDDS-8778. Support recursive volume delete using Ozone sh command. (apache#4842)
  HDDS-8885. Quota repair count enable quota feature for old bucket/volume. (apache#4941)
  HDDS-8771. Refactor volume level tmp directory for generic usage. (apache#4838)
  HDDS-8922. Random EC read pipeline ID causes XceiverClient cache churn (apache#4971)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Jun 26, 2023
* master:
  HDDS-8914. Datanode may fail to start due to duplicate VolumeInfoMetrics (apache#4966)
  HDDS-8921. Add support for EC in Freon SCM block generator (apache#4982)
  HDDS-8927. Metadata scanner should not scan unhealthy containers. (apache#4976)
  HDDS-8929. Avoid list allocation for pipeline search (apache#4980)
  HDDS-8778. Support recursive volume delete using Ozone sh command. (apache#4842)
  HDDS-8885. Quota repair count enable quota feature for old bucket/volume. (apache#4941)
  HDDS-8771. Refactor volume level tmp directory for generic usage. (apache#4838)
  HDDS-8922. Random EC read pipeline ID causes XceiverClient cache churn (apache#4971)