@wenyuzhao
Member

@wenyuzhao wenyuzhao commented Dec 8, 2021

This PR adds a lock-free block queue and uses it for Immix's clean block allocation (as `BlockPageResource`) and recyclable block allocation (as a reusable block queue).

The lock-free block queue has per-thread producer endpoints. Only GC workers can add blocks to the queue. Each worker adds blocks to its thread-local queue first and flushes them to the global pool once the local queue is full.
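
The producer side described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Block`, `GlobalPool`, `LocalQueue`, and `LOCAL_CAPACITY` are hypothetical names, and a `Mutex` stands in for the lock-free global pool purely to keep the example short.

```rust
use std::sync::Mutex;

/// Hypothetical block handle; a real implementation would wrap an address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block(pub usize);

/// Capacity of each worker-local buffer (an assumed value for illustration).
pub const LOCAL_CAPACITY: usize = 4;

/// Global pool that receives whole buffers flushed by workers.
/// Assumption: a mutex is used here only for brevity; the PR describes
/// a lock-free structure.
pub struct GlobalPool {
    flushed: Mutex<Vec<Vec<Block>>>,
}

impl GlobalPool {
    pub fn new() -> Self {
        Self { flushed: Mutex::new(Vec::new()) }
    }

    /// Accept one worker-local buffer; empty buffers are ignored.
    pub fn flush(&self, buf: Vec<Block>) {
        if !buf.is_empty() {
            self.flushed.lock().unwrap().push(buf);
        }
    }

    /// Total number of blocks currently held in the pool.
    pub fn total_blocks(&self) -> usize {
        self.flushed.lock().unwrap().iter().map(Vec::len).sum()
    }
}

/// Per-worker producer endpoint: pushes touch only thread-local state
/// until the local buffer fills up.
pub struct LocalQueue<'a> {
    pool: &'a GlobalPool,
    buf: Vec<Block>,
}

impl<'a> LocalQueue<'a> {
    pub fn new(pool: &'a GlobalPool) -> Self {
        Self { pool, buf: Vec::with_capacity(LOCAL_CAPACITY) }
    }

    /// Add a block locally; hand the buffer to the global pool when full.
    pub fn push(&mut self, block: Block) {
        self.buf.push(block);
        if self.buf.len() == LOCAL_CAPACITY {
            self.flush();
        }
    }

    /// Move whatever is buffered locally into the global pool.
    pub fn flush(&mut self) {
        let full = std::mem::replace(&mut self.buf, Vec::with_capacity(LOCAL_CAPACITY));
        self.pool.flush(full);
    }

    /// Number of blocks still waiting in the local buffer.
    pub fn pending(&self) -> usize {
        self.buf.len()
    }
}
```

In this sketch a worker would create one `LocalQueue` per GC, call `push` for each released block, and call `flush` once at the end so that a partially filled buffer is not lost.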

Allocation from the queue happens only at block granularity, and the consumers still share a single endpoint. On the block-allocation fast path, however, the cost is a single atomic increment that advances the allocation cursor and pops a block. For `BlockPageResource`, given the current design of `Space.acquire`, a lock is still acquired before `pr.get_new_pages` is called.
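
The consumer fast path can be illustrated with a shared cursor over an array of blocks. Again this is only a sketch under assumptions: `Block` and `BlockArray` are hypothetical names, and the real queue in this PR may be structured differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical block handle; a real implementation would wrap an address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block(pub usize);

/// A fixed array of blocks with a shared allocation cursor.
/// All consumers use the same endpoint (the cursor).
pub struct BlockArray {
    blocks: Vec<Block>,
    cursor: AtomicUsize,
}

impl BlockArray {
    pub fn new(blocks: Vec<Block>) -> Self {
        Self { blocks, cursor: AtomicUsize::new(0) }
    }

    /// Fast path: one atomic increment claims a unique index.
    /// Returns `None` once the cursor has moved past the last block.
    /// (The cursor keeps growing after exhaustion; wrap-around is
    /// ignored in this sketch.)
    pub fn pop(&self) -> Option<Block> {
        let i = self.cursor.fetch_add(1, Ordering::Relaxed);
        self.blocks.get(i).copied()
    }
}
```

Because `fetch_add` hands each caller a distinct index, two threads can never receive the same block, and no lock or compare-and-swap loop is needed on this path.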

Performance

Zen 2 12/24 core:

http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/4HVaWb

http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/JYCRMu (NEW)

TODO

  • Fix h2
  • Bug fixes
  • Performance on shrew (16/32 core Zen 3)

@wenyuzhao wenyuzhao force-pushed the blockpageresource branch from 6049468 to 68fef66 Compare May 12, 2022 01:23
@wenyuzhao wenyuzhao added the PR-testing Run binding tests for the pull request (deprecated: use PR-extended-testing instead) label Oct 21, 2022
@wenyuzhao wenyuzhao marked this pull request as ready for review October 23, 2022 05:45
@wenyuzhao
Member Author

Pushing this PR forward as this will likely solve the Immix scalability issue @clairexhuang is looking at.

@wenyuzhao wenyuzhao requested a review from qinsoon October 23, 2022 05:48
Member

@qinsoon qinsoon left a comment

I am concerned with two things:

  1. Using drop() for executing some 'epilogue' code.
  2. We may be able to implement BlockPageResource on top of the other page resources, so we do not need to worry about coping with contiguous/discontiguous spaces.
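
To make the first point concrete, here is a small sketch of the general trade-off between running epilogue code in `Drop` and calling it explicitly. The review does not spell out its reasoning, so this is only an illustration: `ImplicitBuffer` and `ExplicitBuffer` are hypothetical types and are not taken from the PR.

```rust
use std::cell::Cell;

/// Hypothetical buffer whose epilogue (a flush) is hidden in `Drop`.
pub struct ImplicitBuffer<'a> {
    pending: usize,
    flushed: &'a Cell<usize>,
}

impl<'a> ImplicitBuffer<'a> {
    pub fn new(pending: usize, flushed: &'a Cell<usize>) -> Self {
        Self { pending, flushed }
    }
}

impl Drop for ImplicitBuffer<'_> {
    fn drop(&mut self) {
        // Runs implicitly whenever the value goes out of scope.
        self.flushed.set(self.flushed.get() + self.pending);
    }
}

/// Hypothetical buffer whose epilogue is an ordinary, visible method call.
pub struct ExplicitBuffer<'a> {
    pending: usize,
    flushed: &'a Cell<usize>,
}

impl<'a> ExplicitBuffer<'a> {
    pub fn new(pending: usize, flushed: &'a Cell<usize>) -> Self {
        Self { pending, flushed }
    }

    /// The caller decides exactly when the flush happens.
    pub fn flush(&mut self) {
        self.flushed.set(self.flushed.get() + self.pending);
        self.pending = 0;
    }
}
```

With the `Drop` version the flush point is wherever the value happens to go out of scope, and it is skipped entirely if the value is passed to `std::mem::forget`. With the explicit version the call site shows when the epilogue runs, at the cost of having to remember to call it.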

Member

@caizixian caizixian left a comment

@wenyuzhao could you run the experiments again with one GC thread? @clairexhuang is seeing results suggesting that the new implementation may be slower than the old one on some benchmarks at low thread counts.

@wenyuzhao
Member Author

> @wenyuzhao could you run the experiments again with one GC thread? @clairexhuang is seeing results suggesting that the new implementation may be slower than the old one on some benchmarks at low thread counts.

Do you know which benchmarks were slowed down?

@clairexhuang
Contributor

Here's the plotty: http://squirrel.anu.edu.au/plotty/claireh/barrier-performance/p/rP6fn3
I ran a normal mmtk build and a build of the blockpageresource branch on skunk (i9, 8/16 threads), using the new benchmarks with a 2x heap. jython, lusearch and tomcat seem to be particularly bad.

Member

@qinsoon qinsoon left a comment

I have a few comments related to the public API changes in this PR. We should try to keep our API unchanged unless a change is necessary.

@wenyuzhao
Member Author

wenyuzhao commented Nov 30, 2022

> I have a few comments related to the public API changes in this PR. We should try to keep our API unchanged unless a change is necessary.

The API check CI now passes. No public API changes were introduced.

Member

@qinsoon qinsoon left a comment

LGTM

@wenyuzhao wenyuzhao requested a review from caizixian November 30, 2022 07:29
@qinsoon qinsoon merged commit 117000f into master Nov 30, 2022
@qinsoon qinsoon deleted the blockpageresource branch November 30, 2022 22:51
wenyuzhao added a commit to wenyuzhao/mmtk-core that referenced this pull request Mar 20, 2023