Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread-safety in C API's PredictSingleRow #3771

Merged
merged 1 commit into from
Jan 21, 2021

Conversation

AlbertoEAF
Copy link
Contributor

@AlbertoEAF AlbertoEAF commented Jan 16, 2021

The single row predictor methods in the C APi are not thread safe as reported in #3751 and #3675. This fixes both reports. Thank you https://github.com/tarunreddy1018 for the detailed reports!

By using a unique lock instead of the shared lock the timings are very similar, but predictions are correct.

Remarks for future work:

  • Even so, by designing a small C++ benchmark with a very simple LGBM model, more threads on a simple model are slower than the single-thread case. This is probably due to very small work units, the lock contention overhead increases.
  • We should in the future benchmark with more complex models to see if supporting
    threading on these calls is worth it in performance gains.
  • If not, then we could choose to not to provide thread-safety and remove the locks altogether for maximal throughput (though people that didn't read the documentation carefully might be surprised that such methods were not thread-safe).
    • We should in that case use non-shared resources so there is no synchronization required and multiple calls can be issued in parallel with no lock contention.

See #3751 for timings of the small benchmark and the benchmark gist https://gist.github.com/AlbertoEAF/5972db15a27c294bab65b97e1bc4c315

By using a unique lock instead of the shared lock the timings are very similar,
but predictions are correct.

Even so, by designing a small C++ benchmark with a very simple LGBM model,more threads on a simple model are slower than
the single-thread case. This is probably due to very small work units,
the lock contention overhead increases.

We should in the future benchmark with more complex models to see if supporting
threading on these calls is worth it in performance gains.

If not, then we could choose to not to provide thread-safety and remove
the locks altogether for maximal throughput.

See microsoft#3751 for timings.

See gist for benchmark code:

  https://gist.github.com/AlbertoEAF/5972db15a27c294bab65b97e1bc4c315
@AlbertoEAF AlbertoEAF changed the title Fix thread-safety in PredictSingleRow Fix thread-safety in C API's PredictSingleRow Jan 16, 2021
@StrikerRUS StrikerRUS added the fix label Jan 16, 2021
@AlbertoEAF
Copy link
Contributor Author

Hello, can anyone merge this fix?

@StrikerRUS StrikerRUS merged commit 4ae4abb into microsoft:master Jan 21, 2021
@guolinke
Copy link
Collaborator

Hi @AlbertoEAF
Is that possible to expose single row related APIs in python/R package?

@AlbertoEAF
Copy link
Contributor Author

Hello @guolinke :)
It's totally possible, their API are very similar to the non-single row methods so it shouldn't be difficult at all.

In python's case we could replace the input pandas dataframe by something lighter like a numpy array.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants