CFE_ES_PerfLogAdd needs better mutual exclusion #456
Topic for CCB discussion: This is the tradeoff -- do we value the constant-time, nonblocking behavior, or do we value memory efficiency more? And how does the CCB feel about dynamic allocation?

Background: Historically this function used an interrupt lock for exclusion. If we simply change to using a conventional OSAL mutex, we no longer have a deterministic performance log function, because if two threads call it at the same time, one will block.

One option to preserve the non-blocking nature of the performance log is to give each task a dedicated block of entries in the log. Writes will be fast and efficient, but the major downside is that this is a much less efficient use of memory: either we do some dynamic allocation based on how often the task uses this API, or we divide the log into fixed-size segments, which means tasks that don't even use the API get memory and waste it. Or we can add a new API so that a task can control the size of its performance log at runtime (this could leverage the ES mempool API, not necessarily "malloc").

Looking for input from CCB.
Dynamic allocation after initialization is against the GSFC coding standard.
This would be ES memory pools, not malloc. Note that Software Bus and Table Services both currently allocate buffers from ES pools after initialization.

That being said, preferably this would happen during initialization, as part of task creation, not after it; otherwise you would lose the constant-time behavior, which is the whole point. During init, a task would have to indicate how large a performance monitoring buffer it needs, and the memory for that performance log would be pulled out of a pool at that time.

The biggest issue with that approach is of course that all existing tasks would have to be updated and/or be issued a minimal buffer by default.
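To make the timing argument concrete, here is a minimal sketch, assuming the cFE 6.x ES pool API (`CFE_ES_PoolCreate`, `CFE_ES_GetPoolBuf`); the per-task buffer hand-out and entry size are illustrative, not an existing interface:

```c
#include "cfe.h"

/* Sketch only: ES would own this pool; a task-creation path would carve
 * a per-task perf buffer out of it exactly once, at init time. */
static uint8              PerfPoolMem[16 * 1024];
static CFE_ES_MemHandle_t PerfPoolHandle;

int32 PerfLog_SetupPool(void)
{
    return CFE_ES_PoolCreate(&PerfPoolHandle, PerfPoolMem, sizeof(PerfPoolMem));
}

int32 PerfLog_IssueTaskBuffer(uint32 NumEntries, uint32 **BufPtrOut)
{
    /* Allocation happens during task creation only, so the per-sample
     * logging path stays constant-time and nonblocking. The 3-word
     * entry size here is purely illustrative. */
    return CFE_ES_GetPoolBuf(BufPtrOut, PerfPoolHandle,
                             NumEntries * 3 * sizeof(uint32));
}
```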
Alternatively, instead of using an ES memory pool, the app could theoretically declare its own storage buffer (however big it wants) as a global and then register it with ES as part of startup. That might be a viable alternative?
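As a concrete (and entirely hypothetical) sketch of that registration idea; `CFE_ES_PerfLogRegisterBuffer` and `PerfLogEntry_t` do not exist in cFE, they are placeholders for discussion:

```c
#include "cfe.h"

/* Hypothetical entry type, modeled loosely on the existing perf log. */
typedef struct
{
    uint32 Data;
    uint32 TimerUpper32;
    uint32 TimerLower32;
} PerfLogEntry_t;

#define MY_APP_PERF_ENTRIES 256

/* Static global storage, sized however the app wants -- no dynamic
 * allocation after initialization. */
static PerfLogEntry_t MY_APP_PerfBuf[MY_APP_PERF_ENTRIES];

void MY_APP_Init(void)
{
    /* Hypothetical call: ES records the buffer location/size so it can
     * include this task's data when reporting. */
    CFE_ES_PerfLogRegisterBuffer(MY_APP_PerfBuf, MY_APP_PERF_ENTRIES);
}
```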
Copy, memory pool is fine. An app declaring its own storage seems fine also; we just need a way to query it for reporting. Or is there a way to handle allocating and reporting atomically to avoid the race conditions? Atomically tag an entry with the task ID (or whatever), have tasks skip over tagged entries, and to avoid a race when reporting, set a field that skips processing from task calls (assumes task processing is higher priority than reporting)? Or something similar?
Never mind, register makes sense.
The more I think about it, the more I like your local-storage idea.
Any single-buffer (shared memory) architecture would necessitate a read-compare-write operation somewhere, which needs to be atomic for thread safety, so this would require either a mutex or an atomic compare-and-swap routine, which was not standardized until C11.
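For reference, a minimal sketch of the C11 route, assuming a shared wrapping write index like the perf log's; `LogWriteIndex` and `ClaimLogSlot` are illustrative names, not the actual cFE code:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative stand-in for the shared write index in the perf log. */
static _Atomic uint32_t LogWriteIndex;

/* Claim the next slot in a circular log of LogSize entries without a
 * mutex, using C11 compare-and-swap (the read-compare-write mentioned
 * above). Returns the slot index this thread now exclusively owns. */
uint32_t ClaimLogSlot(uint32_t LogSize)
{
    uint32_t Current = atomic_load(&LogWriteIndex);
    uint32_t Next;

    do
    {
        Next = (Current + 1) % LogSize;   /* wrap like the real log */
    } while (!atomic_compare_exchange_weak(&LogWriteIndex, &Current, Next));

    return Current;
}
```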
The biggest issue with local storage is that it is a breaking change without much of a backward compatibility option. Currently apps just register themselves with ES and get performance logging with no extra setup. With a mempool-based approach, apps could at least be issued a minimal performance buffer to (somewhat?) preserve the existing performance log behavior if they don't make any changes.

With a local allocation, an extra registration step would be required for each of the perf IDs that an app plans on using, which could be considered a good thing. If the app does not register the perf ID, then logs must be silently dropped because there is nowhere to store them. Maybe that's an OK tradeoff? In the end it just means that perf log operations become no-ops until the app is updated to register its own perf log buffer.
Can we make these silently dropped unless a debug print is enabled?
Getting further into this code, I noted that the performance log data is actually stored within the "PSP Reset Area", which is really just a block of memory provided by the PSP that is (theoretically) supposed to be maintained across a processor reset. Obviously, this won't be the case if each app provides its own memory.

Going back to CFE requirements, this log seems to be covered by cES1021 and cES1022, neither of which indicates any requirement for the performance analyzer log to be preserved across a processor reset.

Should there be any sort of requirement for preservation of this log in the context of a processor reset? (i.e. do we need to add a requirement) -OR- Can we move the structure to regular RAM and just accept that it will be purged with a processor reset?

ALSO NOTE - I don't see any guarantee that the underlying system tick counter provided by the PSP will be consistent across a reset. In fact, most likely it is not. Therefore perf samples acquired prior to a reset will not be comparable to samples acquired after a reset, even if the log itself is preserved. IMO this substantially reduces the value of trying to save the log, if it isn't really usable data after the reset.
Looking for input from @jwilmot and @acudmore - can you provide insight on whether it is important to store the performance log in the reset area memory provided by the PSP? Or can we move it to normal RAM?

The data in the log will still be comparable to other samples in the log from before the reset, so it does have some value in determining what happened before the processor reset occurred, even if new data that goes into the log is not comparable.

We are also changing the exception handling, so the other possibility is that the PSP simply takes a snapshot of the log when an exception occurs. That would still preserve it, and would be better as it keeps the old data isolated from new data coming in. Please advise on what you think is the best way forward!
My two cents - remove this undocumented functionality (there is no requirement to store performance data across resets).
That approach would certainly be simpler, but I do see value in at least having an interface where the PSP could capture and store the data as part of its exception handling. It could be valuable info when debugging why an unexpected exception/reset occurred. I assume that's the reason it was put into the reset area in the first place, although I'm not convinced it is the best way to accomplish that goal (assuming that was the goal). |
I don't think it's the best way, and it doesn't justify the extra complexity. Trace/context, sure... performance? Questionable.
Performance data is also context data, which is why I think it could be very relevant when debugging a failure. The data will show (for instance) if two tasks attempted to run at the same time and/or if one preempted the other prior to the exception/reset occurring. This is why I think it could be very useful and beneficial to at least have a method to capture the data as part of exception handling. |
Update - While I was initially gravitating toward giving each task its own dedicated lockless memory block for logging, this creates some challenges in implementing the "trigger" semantics, which need to operate at a global scope across all tasks. Although I think it is still possible to do this by implementing a background task to merge all the realtime task data and handle the trigger logic, it would be considerably more complex than just using a mutex in the perf data log and implementing the logic in-line as it is today, so I'm not convinced it meets the cost-benefit tradeoff.

So - my new thinking is to lean toward simplicity and use a mutex. The big downside is that with a mutex, the log function is no longer deterministic: if two tasks call it at the same time, one will block.

Still looking for feedback as to whether the user community would consider this an issue.
I prefer the mutex. We can document it as part of the limitations; historically I've found this good enough for general software timing in a generic sense for the majority of cases. There are other hardware-specific mechanisms projects can use if they need more accurate numbers (twiddle a register to control an IO line for a logic analyzer) if what we provide isn't sufficient. I don't think we need to implement anything heroic.
OK - will proceed with the mutex-based approach, which is a fairly straightforward change. |
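Roughly, the change amounts to the following (sketch only, using the OSAL mutex calls `OS_MutSemTake`/`OS_MutSemGive`; the mutex ID and function name are placeholders, not the final implementation):

```c
#include "osapi.h"

/* Placeholder: created once at ES init with OS_MutSemCreate(). */
extern uint32 PerfLogMutexId;

void PerfLogAdd_Sketch(uint32 Marker, uint32 EntryExit)
{
    /* Replaces the old OS_IntLock()/OS_IntUnlock() pair. A real mutex
     * excludes other tasks on all cores; the cost is that a second
     * concurrent caller briefly blocks here. */
    OS_MutSemTake(PerfLogMutexId);

    /* ... record Marker/EntryExit into the shared perf log structure ... */

    OS_MutSemGive(PerfLogMutexId);
}
```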
I agree that the mutex-based approach works. As for preserving the log: several GSFC missions have not used the performance log in flight, only for performance analysis during development. Is it worth a poll to ask the community?
Yes, a spinlock is better for situations like this, where multiple tasks may try to access a shared resource but only for short periods of time (i.e. atomic read-modify-write ops) and simply retrying the operation a second time is likely to avoid the conflict. Someday, if/when OSAL gets a spinlock API, that may be a good alternative. We just have to make sure whatever spinlock is used falls back to sane behavior if/when running on a single-core processor or when CFE is locked to a single core.
- Use a normal mutex for access control to the perf log structure
- Perform the file write in the background using a dedicated background task
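A condensed sketch of that split (all names are placeholders; `PerfLog_CopyOutChunk` is hypothetical): hold the mutex only while copying entries out, never across the file write:

```c
#include "osapi.h"

extern uint32 PerfLogMutexId;                          /* same mutex as above */
uint32 PerfLog_CopyOutChunk(uint32 *Dest, uint32 Max); /* hypothetical helper */

void PerfLog_BackgroundWrite(int32 FileDes)
{
    uint32 Chunk[32];
    uint32 NumWords;

    do
    {
        /* Lock held only for the brief copy-out... */
        OS_MutSemTake(PerfLogMutexId);
        NumWords = PerfLog_CopyOutChunk(Chunk, 32);
        OS_MutSemGive(PerfLogMutexId);

        /* ...so the (slow) file I/O never blocks PerfLogAdd callers. */
        if (NumWords > 0)
        {
            OS_write(FileDes, Chunk, NumWords * sizeof(uint32));
        }
    } while (NumWords > 0);
}
```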
@jphickey do you think this fix will be ready soon? I'm debating whether to wait for it or go ahead with the integration merge without nasa/osal#396 |
Just got some initial buy-in on the WIP branches and I need to complete the unit test updates to match the proposed implementation. It should be ready by the next CCB (4/15). |
My recommendation would be to push the current integration candidate and accept that it does not build with omit-deprecated. This is why other CI systems let one "quarantine" certain failures that are known to be temporarily broken while still watching for additional failures. We can't do that, so it is what it is.
Optionally, wrap the interrupt lock calls in the deprecated define (or just remove them) as a short-term fix... it's what I did to test, and the performance log has race issues either way until fully fixed by this change.
Fix #456, Perf log threading and concurrency issues
Describe the bug
The `CFE_ES_PerfLogAdd()` routine uses `OS_IntLock()` in an attempt to get exclusive access to a common global data structure to record performance metrics/state. This is insufficient.

The `OS_IntLock` function is generally not implemented on SMP platforms, and even if it is, it probably only affects the current core. Either way, it will not provide exclusivity, because the other cores can still access the data even when interrupts are disabled. This function is also a no-op in the POSIX OSAL.
To Reproduce
Enable performance monitoring on a POSIX system and observe that samples occasionally occur in the log out of order or otherwise appear corrupted. This is likely caused by concurrent writes to the same entry due to insufficient locking.
Expected behavior
The function should use some form of primitive that actually does provide exclusivity between threads (such as a mutex/spinlock) and not an interrupt lock.
Code snips
cFE/fsw/cfe-core/src/es/cfe_es_perf.c
Lines 439 to 440 in 2b27dfc
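The embedded snippet itself did not survive extraction; based on the description above, the pattern at those lines is roughly this (illustrative, not the verbatim source):

```c
/* Approximation of the problematic pattern: an interrupt lock guarding a
 * shared global. On SMP this does not exclude other cores, and in the
 * POSIX OSAL OS_IntLock() is a no-op. */
int32 IntFlags = OS_IntLock();
/* ... read-modify-write of the shared perf log structure ... */
OS_IntUnlock(IntFlags);
```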
System observed on:
Ubuntu 18.04 LTS 64-bit
Reporter Info
Joseph Hickey, Vantage Systems, Inc.