-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Move event emitting off the main thread to avoid deadlocks #1314
fix: Move event emitting off the main thread to avoid deadlocks #1314
Conversation
04e35fa
to
20ba91c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far, but I have a few suggestions
@chrfwow has been contributing significantly to our Java implementations: - finally hook change in SDK: open-feature/java-sdk#1246 - flag metadata support in the flagd java provider: open-feature/java-sdk-contrib#1122 And most recently identified a deadlock in the Java SDK, opened an issue describing it, and has been helping to guide the fix by a new contributor: open-feature/java-sdk#1314 I'm nominating them for Java approver. @chrfwow please approve or 👍 this PR for signal your interest. Signed-off-by: Todd Baert <[email protected]>
b9f3afe
to
8c36616
Compare
This all makes sense to me. I see 2 test failures which I'm not really surprised about. It's possible that there's actually bugs here, but more likely it's inter-test interference or test-only race conditions revealed by this change. You can try running the failing tests individually and seeing if they work as expected. |
public abstract class EventProvider implements FeatureProvider { | ||
private EventProviderListener eventProviderListener; | ||
private final ExecutorService emitterExecutor = Executors.newCachedThreadPool(runnable -> { | ||
final Thread thread = new Thread(runnable); | ||
thread.setDaemon(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
daemon can have it stopped before emitting the remaining events and lose data.
The shutdown() method which is later called can gracefully shut it down with a timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your changes look good, just make sure to fix the failing tests then this PR can be merged I believe 🚀
I am looking at the test failures locally but the seem to be passing, when run on their own, I continue to check what makes the CI run fail, I think there is some interaction going on. Just wanted to quickly update that I'm looking at this 👍 |
26df9e7
to
2b953e0
Compare
When stacking event emitting inside an EventProvider, when using sychronization the EventProvider can deadlock, to avoid this move the event emitting of the main thread. Signed-off-by: Philipp Fehre <[email protected]>
Test provider should respect the init-delay during all test Signed-off-by: Philipp Fehre <[email protected]>
With the events being executed on a different thread, we need to wait to make sure the thread is scheduled to have the events emitted. Signed-off-by: Philipp Fehre <[email protected]>
5c57681
to
121c99a
Compare
@@ -265,6 +265,8 @@ | |||
<artifactId>maven-surefire-plugin</artifactId> | |||
<version>3.5.2</version> | |||
<configuration> | |||
<forkCount>1</forkCount> | |||
<reuseForks>false</reuseForks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this is obviously not a solution, but I want to know what the reason for the test failures are, are they actual issues ore is it cross test interactions? It seems we are suffering from global setup issues due to the Singleton that is OpenFeatureApi. @toddbaert any suggestion how to get around this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tests are failing? Today we merged something which increases the stability of the spectests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a test annotation to run some tests in isolation, which may be the best solution here. If you cannot narrow down the exact suite I'd support you adding forkCount 1 for now, and I can look into it later (or somebody else).
If you want to keep digging at it yourself, you're more than welcome, but it's not 100% related to your improvement, so I don't want to force you to do it if you are sure it's a bad test interaction.
Up to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ Execution(SAME_THREAD) annotation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks! Yes it is always the same tests so I'm gonna annotate them and remove this global one. And then this should be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so I tried this out locally and got some fixes but still have some failing tests, with annotations it is not possible to get the same isolation as with no reuse for forks. I am gonna spent more time tomorrow on this and if I can't resolve it I would suggest to move this to a followup PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of merging as is and following up with improvements in a separate PR. @toddbaert @liran2000 any concerns about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objections.
Signed-off-by: Philipp Fehre <[email protected]>
121c99a
to
eb10367
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1314 +/- ##
============================================
- Coverage 93.51% 93.22% -0.30%
- Complexity 463 467 +4
============================================
Files 43 43
Lines 1110 1121 +11
Branches 89 90 +1
============================================
+ Hits 1038 1045 +7
- Misses 44 48 +4
Partials 28 28
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lines not covered by tests are fine for me
|
@sideshowcoder thanks for all your efforts on this! Well done. It will be available in our next release. Consider joining the org: open-feature/community#427 |
All code based on work by @chrfwow, based on #1299.
My current understanding is that when stacking event emitting inside an EventProvider, when using sychronization the EventProvider can deadlock, to avoid this move the event emitting of the main thread. Similar to how this is handled in EventSupport.
Still requires cleanup and moving the tests to a better location, I'm still not sure I 100% understand the reason for the deadlock, but wanted to get the work started by porting the testcase provided by @chrfwow.