Skip to content

Conversation

@kouvel
Copy link
Contributor

@kouvel kouvel commented Mar 7, 2024

Fixed the usage of ThreadpoolMgr::RecycledLists to use the same processor count upon access as what is used to allocate the array.

Customer Impact

The current processor number is used to access the recycled lists array in the thread pool, and the array is allocated using a processor count determined at initialization time. There are some cases where the current processor number may be greater than the predetermined processor count. An out-of-bounds access to the relevant array can lead to a hang, crash, or corruption. A hang was seen on Win11 by a 1p customer, where scheduling changes led to the initializing thread running on a different CPU group from the primary CPU group of the process, the processor count determined using GetSystemInfo returns the number of processors in the CPU group the calling thread is running on, and the primary CPU group had more processors.

Regression?

No

Testing

Verified that the issue is resolved on a multi-CPU-group machine.

Risk

Low, the fix ensures that all accesses into the relevant array are within bounds.

Fixed the usage of ThreadpoolMgr::RecycledLists to use the same processor count upon access as what is used to allocate the array.
@kouvel kouvel added this to the 6.0.x milestone Mar 7, 2024
@kouvel kouvel requested review from ChrisAhna and jkotas March 7, 2024 01:55
@kouvel kouvel self-assigned this Mar 7, 2024
@ghost
Copy link

ghost commented Mar 7, 2024

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixed the usage of ThreadpoolMgr::RecycledLists to use the same processor count upon access as what is used to allocate the array.

Customer Impact

The current processor number is used to access the recycled lists array in the thread pool, and the array is allocated using a processor count determined at initialization time. There are some cases where the current processor number may be greater than the predetermined processor count. An out-of-bounds access to the relevant array can lead to a hang, crash, or corruption. A hang was seen on Win11 by a 1p customer, where scheduling changes led to the initializing thread running on a different CPU group from the primary CPU group of the process, the processor count determined using GetSystemInfo returns the number of processors in the CPU group the calling thread is running on, and the primary CPU group had more processors.

Regression?

No

Testing

Verified that the issue is resolved on a multi-CPU-group machine.

Risk

Low, the fix ensures that all accesses into the relevant array are within bounds.

Author: kouvel
Assignees: kouvel
Labels:

area-System.Threading

Milestone: 6.0.x

Copy link
Member

@ChrisAhna ChrisAhna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved. we will take for consideration in 6.0.x

@jeffschwMSFT jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Mar 7, 2024
@rbhanda rbhanda modified the milestones: 6.0.x, 6.0.29 Mar 7, 2024
@rbhanda rbhanda added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Mar 7, 2024
@carlossanlop
Copy link
Contributor

@kouvel @jeffschwMSFT today is Code Complete for the April Release. If this is ready, please go ahead and merge it before 4pm PT, otherwise it will have to wait until the May Release.

@mangod9
Copy link
Member

mangod9 commented Mar 11, 2024

Merging since the single test failure is in eventsource

@mangod9 mangod9 merged commit 5dd6fcb into dotnet:release/6.0-staging Mar 11, 2024
@kouvel kouvel deleted the TpFix6 branch March 11, 2024 21:22
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Threading Servicing-approved Approved for servicing release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants