
[python-client] Fixed crash when using Python logger #10981

Merged
merged 4 commits into apache:master from fix/python_logger on Jul 14, 2021

Conversation

lbenc135
Contributor

### Motivation

In some cases, the Python client would crash when using the new `logger` option. This happens when a Pulsar message is sent asynchronously and the program exits shortly afterwards (and even then, not always).

For example, when doing Django migrations which include sending a message:

```
...
[2021-06-19 06:53:57.691] [INFO]: Created connection for pulsar://localhost:6650
[2021-06-19 06:53:57.693] [INFO]: [127.0.0.1:36536 -> 127.0.0.1:6650] Connected to broker
[2021-06-19 06:53:57.695] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Getting connection from pool
[2021-06-19 06:53:57.707] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Created producer on broker [127.0.0.1:36536 -> 127.0.0.1:6650] 
...
[2021-06-19 06:53:57.728] [DEBUG]: Sending message to topic .....
  Applying dashboard.0001_initial... OK
  Applying templating.0001_initial... OK
Error in sys.excepthook:

Original exception was:
Failed to migrate dashboard! Return code was: -6
```

This happens because Pulsar tries to log messages after Python has already started finalizing, so the client can't acquire the GIL, which crashes the whole client.

### Modifications

Following the instructions at https://docs.python.org/3/c-api/init.html#c.PyGILState_Ensure, I added a check for whether Python is finalizing; if it is, we fall back to the default console logger (the log level is still respected correctly).
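For context, here is a minimal sketch of what such a check can look like inside the binding's `LoggerWrapper` (the class and its `fallbackLogger` member appear in the review comment below; the exact signature and the use of `_Py_IsFinalizing()`, available in CPython 3.7+ and public as `Py_IsFinalizing()` since 3.13, are assumptions here, not the PR's literal code):

```cpp
// Sketch only -- the merged implementation may differ in details.
// LoggerWrapper is the pulsar::Logger implementation in the Python binding
// that forwards log records to a user-supplied Python logger.
void LoggerWrapper::log(Logger::Level level, int line, const std::string& message) {
    if (_Py_IsFinalizing()) {
        // The interpreter is shutting down: acquiring the GIL now could
        // terminate this thread and abort the process, so write the record
        // through the default console logger instead of calling into Python.
        fallbackLogger->log(level, line, message);
        return;
    }
    PyGILState_STATE state = PyGILState_Ensure();
    // ... forward the record to the Python logger object ...
    PyGILState_Release(state);
}
```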

Now it looks like this:

```
...
[2021-06-19 06:45:15.561] [INFO]: Created connection for pulsar://localhost:6650
[2021-06-19 06:45:15.563] [INFO]: [127.0.0.1:35930 -> 127.0.0.1:6650] Connected to broker
[2021-06-19 06:45:15.568] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Getting connection from pool
[2021-06-19 06:45:15.586] [INFO]: [persistent://public/default/zaba-dashboard-global_context-emit, ] Created producer on broker [127.0.0.1:35930 -> 127.0.0.1:6650] 
...
[2021-06-19 06:45:15.604] [DEBUG]: Sending message to topic .....
  Applying dashboard.0001_initial... OK
  Applying templating.0001_initial... OK
2021-06-19 06:45:16.200 INFO  [139853253269312] ClientConnection:1446 | [127.0.0.1:35930 -> 127.0.0.1:6650] Connection closed
2021-06-19 06:45:16.200 ERROR [139853099652672] ClientConnection:531 | [127.0.0.1:35930 -> 127.0.0.1:6650] Read failed: Operation canceled
2021-06-19 06:45:16.201 INFO  [139853253269312] ClientConnection:261 | [127.0.0.1:35930 -> 127.0.0.1:6650] Destroyed connection
2021-06-19 06:45:16.201 INFO  [139853253269312] ProducerImpl:561 | Producer - [persistent://public/default/dashboard-global_context-emit, standalone-0-120] , [batchMessageContainer = { BatchMessageContainer [size = 0] [bytes = 0] [maxSize = 1000] [maxBytes = 131072] [topicName = persistent://public/default/dashboard-global_context-emit] [numberOfBatchesSent_ = 1] [averageBatchSize_ = 1] }]
Successfully migrated dashboard
```

### Verifying this change

- Make sure that the change passes the CI checks.

It's very hard to write a test for this, or at least I have no idea how to do it.

Does this pull request potentially affect one of the following parts:

Doesn't affect anything.

### Documentation

No documentation needed; it's a bug fix.

@BewareMyPower
Contributor

Theoretically there's a chance of a memory leak, but it might never happen.

```cpp
// copy constructor
LoggerWrapper(const LoggerWrapper& other) {
    /* ... */
    fallbackLogger = other.fallbackLogger;
}
// copy assignment operator
LoggerWrapper& operator=(const LoggerWrapper& other) { /* ... */ }
```

If you manage `fallbackLogger` as a raw pointer, these methods can leave two `LoggerWrapper` objects sharing the same `fallbackLogger` with no reference count recorded, so the pointer (`fallbackLogger`) could be deleted twice, once by each `LoggerWrapper`'s destructor.

There are two ways to fix it. One is simply to remove these two methods, because currently no copy of `Logger` happens in the Pulsar C++ library's implementation; these two methods would never be called, which is why I said the memory leak might never happen.

The other way is to use a `std::shared_ptr<Logger>` instead of the raw `Logger` pointer. `std::shared_ptr` maintains a reference count for the pointer, and the destructor is called only once, when the reference count drops to zero.

Just change

```cpp
Logger* fallbackLogger;
```

to

```cpp
std::shared_ptr<Logger> fallbackLogger;
```

and change

```cpp
fallbackLogger = factory->getLogger(filename);
```

to

```cpp
fallbackLogger.reset(factory->getLogger(filename));
```

Finally, remove the `delete fallbackLogger` in the destructor.
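
As a small standalone illustration (hypothetical stand-in types, not Pulsar's code) of why the `std::shared_ptr` version is safe to copy:

```cpp
#include <iostream>
#include <memory>

// Minimal stand-ins, only to show the reference-counting behaviour.
struct Logger {
    ~Logger() { std::cout << "Logger destroyed exactly once\n"; }
};

struct LoggerWrapper {
    std::shared_ptr<Logger> fallbackLogger;
};

int main() {
    LoggerWrapper a;
    a.fallbackLogger.reset(new Logger());  // adopt the raw pointer, count = 1
    LoggerWrapper b = a;                   // implicit copy: count = 2, same Logger shared
    std::cout << a.fallbackLogger.use_count() << "\n";  // prints 2
    return 0;
}  // both wrappers are destroyed; the Logger is deleted once, when the count reaches 0
```

With a raw pointer and the default copy semantics, the same copy would leave two owners of one `Logger`, and each destructor's `delete` would free it a second time.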

@lbenc135 lbenc135 requested a review from BewareMyPower June 23, 2021 07:15
@sijie sijie added this to the 2.9.0 milestone Jul 14, 2021
@sijie sijie merged commit fc8ce64 into apache:master Jul 14, 2021
@lbenc135 lbenc135 deleted the fix/python_logger branch July 14, 2021 13:56
codelipenghui pushed a commit that referenced this pull request Sep 2, 2021
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Sep 2, 2021
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022