
[python-client] Fixed crash when using Python logger #10981

Merged
merged 4 commits into apache:master from fix/python_logger on Jul 14, 2021

Conversation

lbenc135
Contributor

### Motivation

In some cases, the Python client would crash when using the new `logger` option. This happens when a Pulsar message is sent asynchronously and the program exits shortly afterwards (and even then, not always).

For example, when doing Django migrations which include sending a message:

```
...
[2021-06-19 06:53:57.691] [INFO]: Created connection for pulsar://localhost:6650
[2021-06-19 06:53:57.693] [INFO]: [127.0.0.1:36536 -> 127.0.0.1:6650] Connected to broker
[2021-06-19 06:53:57.695] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Getting connection from pool
[2021-06-19 06:53:57.707] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Created producer on broker [127.0.0.1:36536 -> 127.0.0.1:6650] 
...
[2021-06-19 06:53:57.728] [DEBUG]: Sending message to topic .....
  Applying dashboard.0001_initial... OK
  Applying templating.0001_initial... OK
Error in sys.excepthook:

Original exception was:
Failed to migrate dashboard! Return code was: -6
```

This happens because Pulsar tries to log messages after Python has already started finalizing, so the client can't acquire the GIL, which crashes the whole client.

### Modifications

Following the instructions at https://docs.python.org/3/c-api/init.html#c.PyGILState_Ensure, I added a check for whether Python is finalizing; if it is, we fall back to the default console logger (the log level is still respected correctly).
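For context, here is a minimal sketch of what such a check can look like inside the binding's `LoggerWrapper` (the class and its `fallbackLogger` member appear in the review comment below; the exact signature and the use of `_Py_IsFinalizing()`, available in CPython 3.7+ and public as `Py_IsFinalizing()` since 3.13, are assumptions here, not the PR's literal code):

```cpp
// Sketch only -- the merged implementation may differ in details.
// LoggerWrapper is the pulsar::Logger implementation in the Python binding
// that forwards log records to a user-supplied Python logger.
void LoggerWrapper::log(Logger::Level level, int line, const std::string& message) {
    if (_Py_IsFinalizing()) {
        // The interpreter is shutting down: acquiring the GIL now could
        // terminate this thread and abort the process, so write the record
        // through the default console logger instead of calling into Python.
        fallbackLogger->log(level, line, message);
        return;
    }
    PyGILState_STATE state = PyGILState_Ensure();
    // ... forward the record to the Python logger object ...
    PyGILState_Release(state);
}
```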

Now it looks like this:

```
...
[2021-06-19 06:45:15.561] [INFO]: Created connection for pulsar://localhost:6650
[2021-06-19 06:45:15.563] [INFO]: [127.0.0.1:35930 -> 127.0.0.1:6650] Connected to broker
[2021-06-19 06:45:15.568] [INFO]: [persistent://public/default/dashboard-global_context-emit, ] Getting connection from pool
[2021-06-19 06:45:15.586] [INFO]: [persistent://public/default/zaba-dashboard-global_context-emit, ] Created producer on broker [127.0.0.1:35930 -> 127.0.0.1:6650] 
...
[2021-06-19 06:45:15.604] [DEBUG]: Sending message to topic .....
  Applying dashboard.0001_initial... OK
  Applying templating.0001_initial... OK
2021-06-19 06:45:16.200 INFO  [139853253269312] ClientConnection:1446 | [127.0.0.1:35930 -> 127.0.0.1:6650] Connection closed
2021-06-19 06:45:16.200 ERROR [139853099652672] ClientConnection:531 | [127.0.0.1:35930 -> 127.0.0.1:6650] Read failed: Operation canceled
2021-06-19 06:45:16.201 INFO  [139853253269312] ClientConnection:261 | [127.0.0.1:35930 -> 127.0.0.1:6650] Destroyed connection
2021-06-19 06:45:16.201 INFO  [139853253269312] ProducerImpl:561 | Producer - [persistent://public/default/dashboard-global_context-emit, standalone-0-120] , [batchMessageContainer = { BatchMessageContainer [size = 0] [bytes = 0] [maxSize = 1000] [maxBytes = 131072] [topicName = persistent://public/default/dashboard-global_context-emit] [numberOfBatchesSent_ = 1] [averageBatchSize_ = 1] }]
Successfully migrated dashboard
```

### Verifying this change

- Make sure that the change passes the CI checks.

It's very hard to write a test for this, or at least I have no idea how to do it.

Does this pull request potentially affect one of the following parts:

Doesn't affect anything.

### Documentation

No documentation needed; it's a bug fix.

@BewareMyPower
Contributor

Theoretically there's a chance of a memory leak, but it might never happen.

```cpp
// copy constructor
LoggerWrapper(const LoggerWrapper& other) {
    /* ... */
    fallbackLogger = other.fallbackLogger;
}
// copy assignment operator
LoggerWrapper& operator=(const LoggerWrapper& other) { /* ... */ }
```

If you manage `fallbackLogger` as a raw pointer, these methods can leave two `LoggerWrapper` objects sharing the same `fallbackLogger` with no reference count recorded, so the pointer (`fallbackLogger`) could be deleted twice, once by each `LoggerWrapper`'s destructor.

There are two ways to fix it. One is simply to remove these two methods, because currently no copy of `Logger` happens in the Pulsar C++ library's implementation; these two methods would never be called, which is why I said the memory leak might never happen.

The other way is to use a `std::shared_ptr<Logger>` instead of the raw `Logger` pointer. `std::shared_ptr` maintains a reference count for the pointer, and the destructor is called only once, when the reference count drops to zero.

Just change

```cpp
Logger* fallbackLogger;
```

to

```cpp
std::shared_ptr<Logger> fallbackLogger;
```

and change

```cpp
fallbackLogger = factory->getLogger(filename);
```

to

```cpp
fallbackLogger.reset(factory->getLogger(filename));
```

Finally, remove the `delete fallbackLogger` in the destructor.
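
As a small standalone illustration (hypothetical stand-in types, not Pulsar's code) of why the `std::shared_ptr` version is safe to copy:

```cpp
#include <iostream>
#include <memory>

// Minimal stand-ins, only to show the reference-counting behaviour.
struct Logger {
    ~Logger() { std::cout << "Logger destroyed exactly once\n"; }
};

struct LoggerWrapper {
    std::shared_ptr<Logger> fallbackLogger;
};

int main() {
    LoggerWrapper a;
    a.fallbackLogger.reset(new Logger());  // adopt the raw pointer, count = 1
    LoggerWrapper b = a;                   // implicit copy: count = 2, same Logger shared
    std::cout << a.fallbackLogger.use_count() << "\n";  // prints 2
    return 0;
}  // both wrappers are destroyed; the Logger is deleted once, when the count reaches 0
```

With a raw pointer and the default copy semantics, the same copy would leave two owners of one `Logger`, and each destructor's `delete` would free it a second time.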

@lbenc135 lbenc135 requested a review from BewareMyPower June 23, 2021 07:15
@sijie sijie added this to the 2.9.0 milestone Jul 14, 2021
@sijie sijie merged commit fc8ce64 into apache:master Jul 14, 2021
@lbenc135 lbenc135 deleted the fix/python_logger branch July 14, 2021 13:56
codelipenghui pushed a commit that referenced this pull request Sep 2, 2021
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Sep 2, 2021
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022