Further optimisations for gpu_hist. #4283
A quick review of my optimisation work over the last month. My improvements have been:

- Fuse final update position functions into a single, more efficient kernel.
- Refactor gpu_hist with a more explicit ellpack matrix representation.
On a 10M × 100 dense input matrix, boosting for 500 iterations, the performance improvement is approximately:

In particular, multi-GPU scalability seems to have improved considerably.

    @@ -485,10 +485,10 @@ class LearnerImpl : public Learner {
         this->PerformTreeMethodHeuristic(train);

         monitor_.Start("PredictRaw");
    -    this->PredictRaw(train, &preds_);
    +    this->PredictRaw(train, &preds_[train]);

Given that DMatrix pointers are increasingly used as cache indices, the parameter should probably be changed from `DMatrix*` to `shared_ptr<DMatrix>` in all those places. We can then use `weak_ptr<DMatrix>` as the index into the cache. This can be done in another pull request, however.
src/tree/updater_gpu_hist.cu

         GradientSumT* d_node_hist,
         const GradientPair* d_gpair,
         size_t segment_begin, size_t n_elements) {
       extern __shared__ char smem[];
       GradientSumT* smem_arr = reinterpret_cast<GradientSumT*>(smem);  // NOLINT
    -  for (auto i : dh::BlockStrideRange(0, null_gidx_value)) {
    +  for (auto i : dh::BlockStrideRange(0, matrix.null_gidx_value)) {

Could you add a function like `matrix.BinCount()`, just to make the code more readable? `null_gidx_value` can then be used in cases where it means 'no value'.
51269ae to 8751221