
Add in-process concurrency support to the PS worker #123

Merged · 11 commits into Azure:dev · Jan 7, 2019

Conversation

@daxian-dbw (Contributor) commented on Jan 5, 2019:

Add in-process concurrency support to the PS worker:

The major changes are:

  • Use PSThreadOptions.ReuseThread for the InitialSessionState when creating the Runspace, so that every PowerShellManager creates only one thread and then reuses it afterwards. The default behavior is to create a new thread every time PowerShell.Invoke is called. (A minimal sketch follows the data tables below.)
  • Update RequestProcessor to process InvocationRequest messages asynchronously via tasks.
  • Implement PowerShellManagerPool using a BlockingCollection (see the pool sketch after the tables below)
    • make the upper bound of the pool configurable via the environment variable PSWorkerInProcConcurrencyUpperBound
    • make the pool expand lazily
    • check out a PowerShellManager via CheckoutIdleWorker on the main thread. Once it gets an idle instance back, the main thread queues a task to process the invocation request on a thread-pool thread and forgets about it -- the main thread can then go ahead and process the next message.
  • Update the RpcLogger so that every PowerShellManager has its own logger instance.
    • also update the way the RequestId and InvocationId are set on the logger. The original way of setting up the context only works for a single-threaded design.
  • Update MessagingStream to use a BlockingCollection to hold all messages that are about to be written out, and use a single thread-pool thread to take items out and write them to the gRPC channel (a sketch of this single-writer pattern also follows the tables below).
    • currently, the way we write out response/log messages is completely task-based/async, using a semaphore for synchronization. However, this approach doesn't guarantee message order.
    • this is because multiple tasks could be blocked on the semaphore, and releasing the semaphore allows a blocked task to enter it, but there is no guaranteed order, such as first-in-first-out, in which blocked threads enter the semaphore.
    • so the unblocked task could be a random one, which can change the order in which messages are written to the gRPC channel.
  • Remove the two system log calls we have in our worker, because they drastically worsen the processing time per invocation request when there are a lot of incoming invocation requests.
    • the logging for the "TriggerMetadata" parameter is not that useful and should be removed
    • the execution-time logging is good to have but not necessary, especially when it impacts throughput.
    • below are some data comparing before/after removing the two system log calls. AveInvTime represents the average processing time (ms) for an invocation request; it drops from ~94 ms to ~2.6 ms, a huge improvement.

Before Removing

| Request | InvRequest | InvDone | AveGetReqTime | AveInvTime | AveFetchFromPoolTime | AveReturnToPoolTime | AveWriteResponseTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 6008 | 6000 | 5994 | 28.7243675099867 | 93.2944611277945 | 17.835 | 0.000166833500166834 | 0.0191858525191859 |
| 6108 | 6100 | 6095 | 28.2540929927963 | 94.2059064807219 | 18.02868852 | 0.000164068908941756 | 0.0188679245283019 |
| 6208 | 6200 | 6194 | 27.7989690721649 | 94.585728123991 | 18.11709678 | 0.000161446561188247 | 0.0185663545366484 |

After Removing

| Request | InvRequest | InvDone | AveGetReqTime | AveInvTime | AveFetchFromPoolTime | AveReturnToPoolTime | AveWriteResponseTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 7008 | 7000 | 6999 | 35.5720605022831 | 2.67166738105444 | 0.00557142857142857 | 0.000142877553936277 | 0.00257179597085298 |
| 7108 | 7100 | 7099 | 35.3432751828925 | 2.67333427243274 | 0.00549295774647887 | 0.000140864910550782 | 0.00253556838991407 |
| 7208 | 7200 | 7199 | 35.1571864594895 | 2.67356577302403 | 0.00541666666666667 | 0.000138908181691902 | 0.00250034727045423 |
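For reference, here are a few minimal sketches of the mechanisms described above. They illustrate the ideas under stated assumptions rather than reproducing the worker's actual code.

First, a sketch of creating a Runspace from an InitialSessionState with PSThreadOptions.ReuseThread, using the standard System.Management.Automation APIs:

```csharp
using System.Management.Automation;
using System.Management.Automation.Runspaces;

// Minimal sketch, not the actual worker code.
InitialSessionState iss = InitialSessionState.CreateDefault();

// With ReuseThread, the Runspace creates a single pipeline thread and reuses it
// for every PowerShell.Invoke call; the default behavior creates a new thread
// for each invocation.
iss.ThreadOptions = PSThreadOptions.ReuseThread;

Runspace runspace = RunspaceFactory.CreateRunspace(iss);
runspace.Open();

PowerShell pwsh = PowerShell.Create();
pwsh.Runspace = runspace;
pwsh.AddScript("$PID").Invoke();
```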
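Next, a rough sketch of the pool checkout. PowerShellManagerPool, PowerShellManager, CheckoutIdleWorker, and PSWorkerInProcConcurrencyUpperBound come from this PR; the method bodies and the ReclaimUsedWorker name are illustrative assumptions, not the exact implementation.

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative stub; the real PowerShellManager wraps a Runspace/PowerShell pair.
internal class PowerShellManager { }

internal class PowerShellManagerPool
{
    private readonly int _upperBound;
    private readonly BlockingCollection<PowerShellManager> _pool;
    private int _createdCount;

    internal PowerShellManagerPool()
    {
        // The pool's upper bound is configurable via an environment variable.
        string value = Environment.GetEnvironmentVariable("PSWorkerInProcConcurrencyUpperBound");
        _upperBound = int.TryParse(value, out int bound) && bound > 0 ? bound : 1;
        _pool = new BlockingCollection<PowerShellManager>(boundedCapacity: _upperBound);
    }

    // Called only from the main thread. The pool expands lazily: a new manager is
    // created only when no idle one is available and the upper bound has not been
    // reached; otherwise the call blocks until a manager is returned to the pool.
    internal PowerShellManager CheckoutIdleWorker()
    {
        if (_pool.TryTake(out PowerShellManager manager))
        {
            return manager;
        }

        if (_createdCount < _upperBound)
        {
            _createdCount++;
            return new PowerShellManager();
        }

        return _pool.Take(); // blocks until an idle instance shows up
    }

    // Called from a thread-pool thread once an invocation finishes.
    internal void ReclaimUsedWorker(PowerShellManager manager)
    {
        _pool.Add(manager);
    }
}
```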
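Finally, the single-writer idea behind the MessagingStream change could be sketched as follows. StreamingMessage is stubbed out here, and the gRPC write is a placeholder rather than the real duplex-stream API.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Stub standing in for the gRPC StreamingMessage type.
internal class StreamingMessage { }

internal class MessagingStream
{
    private readonly BlockingCollection<StreamingMessage> _msgQueue =
        new BlockingCollection<StreamingMessage>();

    internal MessagingStream()
    {
        // A single consumer drains the queue, so messages reach the gRPC channel in
        // exactly the order they were enqueued. With the semaphore approach, the
        // wake-up order of blocked writers is unspecified, so ordering could change.
        Task.Run(WriteLoopAsync);
    }

    // Producers on any thread simply enqueue; they never touch the gRPC stream.
    internal void Write(StreamingMessage message) => _msgQueue.Add(message);

    private async Task WriteLoopAsync()
    {
        foreach (StreamingMessage message in _msgQueue.GetConsumingEnumerable())
        {
            // Awaiting each write before taking the next item keeps the output ordered.
            await WriteToGrpcChannelAsync(message);
        }
    }

    private Task WriteToGrpcChannelAsync(StreamingMessage message)
    {
        // Placeholder for the real duplex-stream write (e.g. a client stream's WriteAsync).
        return Task.CompletedTask;
    }
}
```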

Throughput investigation

Design Change

The original design was to never block the main thread -- it reads a request message, starts a task for processing and forgets about it, then reads the next message.
With this design, every incoming request is handled immediately -- a task is created for it. As a result, a large number of tasks are created rapidly, and a correspondingly large number of thread-pool threads are assigned to run them. All those threads compete for an idle PowerShellManager from the pool, which results in lock contention in the pool. The average time for fetching an idle PowerShellManager instance gets dramatically worse once the rate of incoming requests reaches about 70/s -- it hits 994 ms in my measurements and becomes a huge bottleneck.

Therefore, I changed the design to make the main thread the only one that checks out idle PowerShellManager instances. When the main thread is able to check one out, it creates a task with that psManager to process the invocation request on a thread-pool thread. When it cannot check one out, it blocks on the pool until an idle instance becomes available.
With this design, the PS worker creates tasks only when a PowerShellManager instance is available, so the number of tasks running in parallel is limited to the size of the pool. This minimizes lock contention when checking instances out of the pool, and also minimizes the pressure on the TaskScheduler. A simplified sketch of the new main loop follows.
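The sketch below reuses the hypothetical PowerShellManagerPool and PowerShellManager types from the earlier pool sketch; the request and processor types are also illustrative stubs, not the worker's real classes.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative stubs; the real worker uses the gRPC StreamingMessage and its own RequestProcessor.
internal class InvocationRequest { }

internal class RequestProcessor
{
    internal void ProcessInvocationRequest(InvocationRequest request, PowerShellManager psManager)
    {
        // Placeholder: run the function through psManager and write the response message.
    }
}

internal static class WorkerLoop
{
    // The main thread is the only consumer of the pool.
    internal static void Run(IEnumerable<InvocationRequest> requests,
                             PowerShellManagerPool pool,
                             RequestProcessor processor)
    {
        foreach (InvocationRequest request in requests)
        {
            // Blocks until an idle PowerShellManager is available, which caps the number
            // of in-flight tasks at the pool size and avoids lock contention on the pool.
            PowerShellManager psManager = pool.CheckoutIdleWorker();

            // Fire-and-forget: the invocation runs on a thread-pool thread while the
            // main thread goes straight back to reading the next request.
            _ = Task.Run(() =>
            {
                try
                {
                    processor.ProcessInvocationRequest(request, psManager);
                }
                finally
                {
                    pool.ReclaimUsedWorker(psManager);
                }
            });
        }
    }
}
```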

Below are some data comparing before/after the design change. Pay attention to the AveFetchFromPoolTime column. With the original design, the average time to check out a PowerShellManager instance from the pool for an invocation request was ~900 ms when requests were arriving quickly. With the new design, the average time for the same operation drops drastically. Note that AveFetchFromPoolTime drops further, to 0.0054 ms, after removing the two system log calls, because each invocation request takes much less time to process, so a PowerShellManager instance becomes idle and is reclaimed by the pool much faster.

System Logging Mystery

In an AzF worker, a system log is in fact a log message written out by calling Console.WriteLine. The AzF host redirects the worker process's STDOUT and can therefore intercept the message and turn it into a system log.
I don't know why the system logging causes such a big impact on invocation processing time when requests arrive quickly. I will discuss this with @pragnagopa offline.

The Throughput Test Results

I use Locust as the infrastructure for load testing locally, so it's guaranteed that only one PS worker is involved (the app would auto-scale if testing against Azure).
Here is the benchmark I got for the C# script function, the JavaScript function, and the PowerShell (new design + no system logging) function:

| Runtime (simulating 300 users) | Requests Per Second |
| --- | --- |
| C# (csx) | 47.5 RPS |
| Node.js | 44.6 RPS |
| PowerShell, pool size 1 | 45.3 RPS |
| PowerShell, pool size 5 | 46.2 RPS |

As you can see, we are at the same level as the C# script function and the JavaScript function.
However, a pool of 5 instances doesn't show better concurrency performance compared with a pool of 1 instance. Based on the data I collected from the measurements, I believe that's because the AzF Host (Function Runtime) becomes the bottleneck once incoming requests arrive very fast (> 70/s), so the PS worker hasn't reached the point where a larger pool benefits from the extra PowerShellManager instances. Looking at the AveFetchFromPoolTime column of the last two tables below (pool-5 vs. pool-1), the average time for fetching an idle instance from the pool is 100x faster in the pool-5 case. However, even though it is 100x slower, the pool-1 case can still get an idle instance in about 0.28 ms on average, which is why we don't see a difference in RPS.

Again, I will have an in-depth discussion with @pragnagopa offline about the bottleneck in the AzF host.

Original Design

| InvRequest | InvDone | AveInvTime | AveFetchFromPoolTime |
| --- | --- | --- | --- |
| 5900 | 5788 | 94 | 884 |
| 6000 | 5886 | 95 | 919 |
| 6100 | 5980 | 96 | 955 |
| 6200 | 6080 | 97 | 994 |

New Design with the 2 System Log Calls (pool size 5)

| Request | InvRequest | InvDone | AveGetReqTime | AveInvTime | AveFetchFromPoolTime | AveReturnToPoolTime | AveWriteResponseTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 6008 | 6000 | 5994 | 28.7243675099867 | 93.2944611277945 | 17.835 | 0.000166833500166834 | 0.0191858525191859 |
| 6108 | 6100 | 6095 | 28.2540929927963 | 94.2059064807219 | 18.02868852 | 0.000164068908941756 | 0.0188679245283019 |
| 6208 | 6200 | 6194 | 27.7989690721649 | 94.585728123991 | 18.11709678 | 0.000161446561188247 | 0.0185663545366484 |

New Design without the 2 System Log Calls (pool size 5)

| Request | InvRequest | InvDone | AveGetReqTime | AveInvTime | AveFetchFromPoolTime | AveReturnToPoolTime | AveWriteResponseTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 7008 | 7000 | 6999 | 35.5720605022831 | 2.67166738105444 | 0.00557142857142857 | 0.000142877553936277 | 0.00257179597085298 |
| 7108 | 7100 | 7099 | 35.3432751828925 | 2.67333427243274 | 0.00549295774647887 | 0.000140864910550782 | 0.00253556838991407 |
| 7208 | 7200 | 7199 | 35.1571864594895 | 2.67356577302403 | 0.00541666666666667 | 0.000138908181691902 | 0.00250034727045423 |

New Design without the 2 System Log Calls (pool size 1)

| Request | InvRequest | InvDone | AveGetReqTime | AveInvTime | AveFetchFromPoolTime | AveReturnToPoolTime | AveWriteResponseTime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 7608 | 7600 | 7599 | 35.6027865404837 | 2.59323595209896 | 0.279473684210526 | 0.00013159626266614 | 0.00157915515199368 |
| 7708 | 7700 | 7699 | 35.4072392319668 | 2.59527211326146 | 0.280519480519481 | 0.000129886998311469 | 0.00155864397973763 |
| 7808 | 7800 | 7799 | 35.2155481557377 | 2.59648672906783 | 0.281666666666667 | 0.000128221566867547 | 0.00153865880241057 |

@TylerLeonhardt (Member) left a comment:
Looks good - just a couple questions

@TylerLeonhardt (Member) left a comment:
🚢 it

@daxian-dbw merged commit 5de474c into Azure:dev on Jan 7, 2019.
@daxian-dbw deleted the MinimizeNumberOfTasks branch on January 7, 2019.
@daxian-dbw (Contributor, Author) commented:

Thanks for the review, the in-proc concurrency support is now merged.
