-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix EventFileWriter deadlock on exception in background thread #6168
Conversation
@groszewn the issue with build seems to be unrelated and I fixed the linter warning. Could you please re-run the workflow? |
@groszewn sorry, now fixed the linter warning for real:
Could you please restart CI once again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @crassirostris, really appreciate the well-documented issue and PR! Left a few comments.
Signed-off-by: Mik Vyatskov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the thorough review, @groszewn! Fixed behavior for flush
, could you please take another look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate the fix!
…rflow#6168) ## Motivation for features / changes To address tensorflow#6167 ## Technical description of changes This is a bug fix for possible deadlock when writing events through `EventFileWriter`. The PR adds logic in `_AsyncWriterThread` to catch exception to propagate it to the calling thread and adds logic to `_AsyncWriter` to propagate exception raised in `_AsyncWriterThread` ## Detailed steps to verify changes work correctly (as executed by you) New unit test that is not passing on master ## Alternate designs / implementations considered * Instead of popping an item from the queue on exception, it's possible to make `wait`/`flush` methods re-check the status periodically * Instead of raising an exception in the foreground thread, it's possible to ignore the raised exception altogether and just start dropping events * It's possible to drop the data after it cannot be added to the queue for a certain period of time Signed-off-by: Mik Vyatskov <[email protected]>
…rflow#6168) ## Motivation for features / changes To address tensorflow#6167 ## Technical description of changes This is a bug fix for possible deadlock when writing events through `EventFileWriter`. The PR adds logic in `_AsyncWriterThread` to catch exception to propagate it to the calling thread and adds logic to `_AsyncWriter` to propagate exception raised in `_AsyncWriterThread` ## Detailed steps to verify changes work correctly (as executed by you) New unit test that is not passing on master ## Alternate designs / implementations considered * Instead of popping an item from the queue on exception, it's possible to make `wait`/`flush` methods re-check the status periodically * Instead of raising an exception in the foreground thread, it's possible to ignore the raised exception altogether and just start dropping events * It's possible to drop the data after it cannot be added to the queue for a certain period of time Signed-off-by: Mik Vyatskov <[email protected]>
Motivation for features / changes
To address #6167
Technical description of changes
This is a bug fix for possible deadlock when writing events through
EventFileWriter
. The PR adds logic in_AsyncWriterThread
to catch exception to propagate it to the calling thread and adds logic to_AsyncWriter
to propagate exception raised in_AsyncWriterThread
Detailed steps to verify changes work correctly (as executed by you)
New unit test that is not passing on master
Alternate designs / implementations considered
wait
/flush
methods re-check the status periodically