Fix the concurrent map access crashing problem #162
Codecov Report
@@ Coverage Diff @@
## main #162 +/- ##
==========================================
+ Coverage 92.17% 92.21% +0.03%
==========================================
Files 46 45 -1
Lines 3323 3339 +16
==========================================
+ Hits 3063 3079 +16
Misses 191 191
Partials 69 69
Removed the unnecessary lock, and added comments on the functions that do not need the lock.
Thanks. I think there may still be a problem, because the result data is still updated by every prober...
New Solution
LGTM
LGTM
After carefully checking the code, I found that the SLA Report has the same problem, although the probability of hitting it is very low. I will refactor the SLA Report after #158 is merged; otherwise there will be many conflicts. So this PR is on hold for now.
Note: force-pushed to sync up with the main branch.
Updated Solution
LGTM, the deep copy is a great way to fix the race condition.
/LGTM
I've tested this PR for 20 hours and everything has been fine so far, so I am going to merge it.
Is this problem resolved? I got the same error yesterday. The stack trace of the crash is as below:
Can I have the full stack trace?
The full stack trace is here: easeprobe.txt
@samanhappy thanks for reporting this issue. I can see the following stack
but line 102 of probe/data.go in the current source code does not match it, so it looks like you are using old source code. Could you please use the latest code?
Got it, thanks. I was using Docker, and the image is not as new as the code on GitHub.
Oh, this patch hasn't been officially released yet. You can build the Docker image yourself
Background
We have a map that collects the probe results of all probers. Every prober writes its result into this map concurrently, each under a different key (this is not a problem). However, another goroutine persists this map periodically, which causes the "concurrent map iteration and map write" problem and makes EaseProbe crash.
The stack trace of the crash is as below:
The line of code that crashes is as below:
Investigation
The problem here is that each prober's result (statistics data) is consumed by several modules: Saving, SLA Report, and Web Server.
All of these modules only read the statistics data, while the probers update it.
So this is a concurrent read-write problem.
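One way to take readers out of such a conflict, and the approach the review comments in this thread refer to as the deep copy, is to give each reader its own snapshot that shares no memory with the data still being updated. The following is only a minimal sketch: `Result`, its fields, and `Clone` are hypothetical names for illustration, not EaseProbe's actual types.

```go
package main

import "fmt"

// Result is a hypothetical stand-in for one prober's statistics data.
type Result struct {
	Name   string
	Total  int64
	Status map[string]int64 // a map is a reference type, so it must be copied explicitly
}

// Clone returns a deep copy: the returned value shares no map with r.
// It must be called by the goroutine that owns r (the writer), so the
// copy itself does not race with an update.
func (r *Result) Clone() *Result {
	c := &Result{
		Name:   r.Name,
		Total:  r.Total,
		Status: make(map[string]int64, len(r.Status)),
	}
	for k, v := range r.Status {
		c.Status[k] = v
	}
	return c
}

func main() {
	live := &Result{Name: "web", Total: 1, Status: map[string]int64{"up": 1}}

	snap := live.Clone() // snapshot handed to a reader

	// Later updates to the live data do not touch the snapshot.
	live.Total++
	live.Status["down"] = 1

	fmt.Println(snap.Total, len(snap.Status))
}
```

A shallow copy (`c := *r`) would not be enough here, because both structs would still point at the same `Status` map.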
Solution
We transfer all of the probers' statistics data to the Save Manager (via a channel), and all of the consumers (Saving, SLA Report, Web Server) retrieve the statistics data from the Save Manager instead of reading it from the probers directly.
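The idea can be sketched as a single goroutine that owns the map, receives updates over a channel, and answers read requests with a copy. This is a hedged illustration only: `Manager`, `Stat`, `Update`, `Snapshot`, and `Stop` are assumed names, and the real Save Manager in EaseProbe may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// Stat is a hypothetical statistics record; it holds only value types,
// so assigning it copies it completely.
type Stat struct {
	Name  string
	Total int64
}

// Manager owns the data map. Only its loop goroutine ever touches the map.
type Manager struct {
	updates chan Stat
	queries chan chan map[string]Stat
	done    chan struct{}
}

func NewManager() *Manager {
	m := &Manager{
		updates: make(chan Stat),
		queries: make(chan chan map[string]Stat),
		done:    make(chan struct{}),
	}
	go m.loop()
	return m
}

func (m *Manager) loop() {
	data := make(map[string]Stat)
	for {
		select {
		case s := <-m.updates: // a prober reported new statistics
			data[s.Name] = s
		case reply := <-m.queries: // a consumer asked for the data
			snap := make(map[string]Stat, len(data))
			for k, v := range data {
				snap[k] = v
			}
			reply <- snap // the consumer gets its own copy
		case <-m.done:
			return
		}
	}
}

// Update is called by probers.
func (m *Manager) Update(s Stat) { m.updates <- s }

// Snapshot is called by consumers such as saving or reporting.
func (m *Manager) Snapshot() map[string]Stat {
	reply := make(chan map[string]Stat)
	m.queries <- reply
	return <-reply
}

// Stop ends the manager goroutine.
func (m *Manager) Stop() { close(m.done) }

func main() {
	m := NewManager()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for n := int64(1); n <= 100; n++ {
				m.Update(Stat{Name: fmt.Sprintf("prober-%d", id), Total: n})
			}
		}(i)
	}
	wg.Wait()
	snap := m.Snapshot()
	fmt.Println(len(snap), snap["prober-0"].Total)
	m.Stop()
}
```

Because every map read and write happens inside `loop`, no lock is needed, and a consumer can iterate its snapshot for as long as it likes without blocking the probers.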