[python-client] Fixed crash when using Python logger #10981
Conversation
There's a theoretical chance of a memory leak here, though it might never happen in practice:

```cpp
// copy constructor
LoggerWrapper(const LoggerWrapper& other) {
    /* ... */
    fallbackLogger = other.fallbackLogger;
}

// copy assignment operator
LoggerWrapper& operator=(const LoggerWrapper& other) { /* ... */ }
```

If `fallbackLogger` is a raw pointer, copying it like this leaves two wrappers owning the same `Logger`, which can lead to a leak or a double delete. There are two ways to fix it. One way is simply to remove these two methods, because currently no copy of `LoggerWrapper` is ever made. The other way is to use a `std::shared_ptr`: change `Logger* fallbackLogger;` to `std::shared_ptr<Logger> fallbackLogger;`, change `fallbackLogger = factory->getLogger(filename);` to `fallbackLogger.reset(factory->getLogger(filename));`, and finally remove the manual `delete` of `fallbackLogger`.
### Motivation

In some cases, the Python client would crash when using the new `logger` option. This happens when a Pulsar message is sent asynchronously but the program exits soon after (and even then, not always). For example, when running Django migrations which include sending a message:

```
...
[2021-06-19 06:53:57.691] [INFO]: Created connection for pulsar://localhost:6650
[2021-06-19 06:53:57.693] [INFO]: [127.0.0.1:36536 -> 127.0.0.1:6650] Connected to broker
[2021-06-19 06:53:57.695] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Getting connection from pool
[2021-06-19 06:53:57.707] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Created producer on broker [127.0.0.1:36536 -> 127.0.0.1:6650]
...
[2021-06-19 06:53:57.728] [DEBUG]: Sending message to topic .....
  Applying dashboard.0001_initial... OK
  Applying templating.0001_initial... OK
Error in sys.excepthook:

Original exception was:

Failed to migrate dashboard! Return code was: -6
```

This happens because Pulsar tries to log messages after Python has already started finalizing, so the client can't acquire the GIL, which crashes the whole client.

### Modifications

Following the instructions at https://docs.python.org/3/c-api/init.html#c.PyGILState_Ensure, I added a check for whether Python is finalizing; if it is, we fall back to the default console logger (the log level is still respected correctly). A minimal sketch of this check is shown after this description. Now it looks like this:

```
...
[2021-06-19 06:45:15.561] [INFO]: Created connection for pulsar://localhost:6650
[2021-06-19 06:45:15.563] [INFO]: [127.0.0.1:35930 -> 127.0.0.1:6650] Connected to broker
[2021-06-19 06:45:15.568] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Getting connection from pool
[2021-06-19 06:45:15.586] [INFO]: [persistent://public/default/zaba-dashboard-global_context-emit, ] Created producer on broker [127.0.0.1:35930 -> 127.0.0.1:6650]
...
[2021-06-19 06:45:15.604] [DEBUG]: Sending message to topic .....
  Applying dashboard.0001_initial... OK
  Applying templating.0001_initial... OK
2021-06-19 06:45:16.200 INFO  [139853253269312] ClientConnection:1446 | [127.0.0.1:35930 -> 127.0.0.1:6650] Connection closed
2021-06-19 06:45:16.200 ERROR [139853099652672] ClientConnection:531 | [127.0.0.1:35930 -> 127.0.0.1:6650] Read failed: Operation canceled
2021-06-19 06:45:16.201 INFO  [139853253269312] ClientConnection:261 | [127.0.0.1:35930 -> 127.0.0.1:6650] Destroyed connection
2021-06-19 06:45:16.201 INFO  [139853253269312] ProducerImpl:561 | Producer - [persistent://public/default/dashboard-global_context-emit, standalone-0-120] , [batchMessageContainer = { BatchMessageContainer [size = 0] [bytes = 0] [maxSize = 1000] [maxBytes = 131072] [topicName = persistent://public/default/dashboard-global_context-emit] [numberOfBatchesSent_ = 1] [averageBatchSize_ = 1] }]
Successfully migrated dashboard
```

(cherry picked from commit fc8ce64)
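For reference, here is a minimal sketch of the kind of finalization check described under Modifications. It is not the actual patch: `logMessage`, `pyLogger`, and the level-name-as-method convention are illustrative assumptions, and the real change also keeps honoring the configured log level. `_Py_IsFinalizing()` is the spelling available before Python 3.13, which renamed it to `Py_IsFinalizing()`.

```cpp
#include <Python.h>

#include <iostream>
#include <string>

// Hypothetical helper: route one log line either to the user-supplied Python
// logger or, when the interpreter is unavailable, to a plain console fallback.
void logMessage(PyObject* pyLogger, const std::string& level, const std::string& message) {
    // PyGILState_Ensure() must not be called once Python has started
    // finalizing: the calling thread would be terminated (see the docs link
    // above). In that case, fall back to stderr instead of touching Python.
    if (!Py_IsInitialized() || _Py_IsFinalizing()) {
        std::cerr << "[" << level << "] " << message << std::endl;
        return;
    }

    PyGILState_STATE state = PyGILState_Ensure();
    // e.g. pyLogger.info(message), pyLogger.debug(message), ...
    PyObject* result = PyObject_CallMethod(pyLogger, level.c_str(), "s", message.c_str());
    if (result == nullptr) {
        PyErr_Clear();  // A logging failure must never crash the client.
    } else {
        Py_DECREF(result);
    }
    PyGILState_Release(state);
}
```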
### Verifying this change

It's very hard to write a test for this, or at least I have no idea how to do it.

### Does this pull request potentially affect one of the following parts?

It doesn't affect anything.

### Documentation

No documentation is needed; this is a bugfix.