This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Use omp threads for cpu data loader #15379

Merged
merged 3 commits on Jul 2, 2019
27 changes: 17 additions & 10 deletions src/io/iter_image_recordio_2.cc
@@ -134,19 +134,25 @@ inline void ImageRecordIOParser2<DType>::Init(
record_param_.InitAllowUnknown(kwargs);
batch_param_.InitAllowUnknown(kwargs);
normalize_param_.InitAllowUnknown(kwargs);
PrefetcherParam prefetch_param;
prefetch_param.InitAllowUnknown(kwargs);
n_parsed_ = 0;
overflow = false;
rnd_.seed(kRandMagic + record_param_.seed);
int maxthread, threadget;
#pragma omp parallel
{
// be conservative, set number of real cores
maxthread = std::max(omp_get_num_procs(), 1);
}
param_.preprocess_threads = std::min(maxthread, param_.preprocess_threads);
#pragma omp parallel num_threads(param_.preprocess_threads)
{
threadget = omp_get_num_threads();
if (prefetch_param.ctx == PrefetcherParam::CtxType::kCPU) {
threadget = engine::OpenMP::Get()->GetRecommendedOMPThreadCount();
Member
Since prefetching happens in parallel with op execution, won't this cause too many threads to be launched when we are also using OpenMP threads to execute operators?

Contributor Author
No. With the CPU-optimized data loader, prefetching doesn't happen. Prefetching would flush the cache, making CPU op execution much slower. So for the CPU-optimized data loader, the loader itself works as a normal operator: it is executed exclusively and utilizes all OMP threads.

Member
Ohh okay, I didn't realize that the code paths are different for the CPU and GPU iterators now. LGTM :)

} else {
#pragma omp parallel
{
// be conservative, set number of real cores
maxthread = std::max(omp_get_num_procs() / 2, 1);
}
param_.preprocess_threads = std::min(maxthread, param_.preprocess_threads);
#pragma omp parallel num_threads(param_.preprocess_threads)
{
threadget = omp_get_num_threads();
}
}
param_.preprocess_threads = threadget;
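
For reference, here is a minimal standalone sketch (not part of this PR) of the OpenMP probe-and-clamp pattern kept in the else branch above: an empty parallel region queries the processor count, the requested preprocess thread count is clamped to it, and a second parallel region reports how many threads OpenMP actually granted. In the CPU-context branch the PR instead takes the engine's recommended count via engine::OpenMP::Get()->GetRecommendedOMPThreadCount(). Variable names below mirror the diff, but the program itself is illustrative only.

// Minimal standalone sketch (not part of this PR) of the OpenMP
// probe-and-clamp pattern used in the else branch above.
#include <omp.h>
#include <algorithm>
#include <cstdio>

int main() {
  int preprocess_threads = 8;  // hypothetical requested thread count
  int maxthread = 1;
  int threadget = 1;
  #pragma omp parallel
  {
    // Be conservative: assume half of the logical processors are real cores.
    maxthread = std::max(omp_get_num_procs() / 2, 1);
  }
  preprocess_threads = std::min(maxthread, preprocess_threads);
  #pragma omp parallel num_threads(preprocess_threads)
  {
    // The runtime may grant fewer threads than requested; record the actual count.
    threadget = omp_get_num_threads();
  }
  std::printf("using %d preprocess threads\n", threadget);
  return 0;
}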

Expand Down Expand Up @@ -822,7 +828,8 @@ class ImageRecordIter2Wrapper : public IIterator<DataBatch> {
dtype = prefetch_param.dtype.value();
}
if (prefetch_param.ctx == PrefetcherParam::CtxType::kCPU) {
LOG(INFO) << "Create ImageRecordIter2 optimized for CPU backend.";
LOG(INFO) << "Create ImageRecordIter2 optimized for CPU backend."
<< "Use omp threads instead of preprocess_threads.";
switch (dtype) {
case mshadow::kFloat32:
record_iter_ = new ImageRecordIter2CPU<float>();
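
For context, the hunk above selects a CPU-optimized iterator per dtype. Below is a simplified, hypothetical sketch of that dispatch pattern; the enum and class names are stand-ins for illustration, not the actual MXNet declarations.

// Hypothetical sketch of dtype-based dispatch to a templated CPU iterator.
// DType, IterBase, and RecordIterCPU are simplified stand-ins, not MXNet types.
#include <cstdint>
#include <memory>
#include <stdexcept>

enum class DType { kFloat32, kUint8 };

struct IterBase {
  virtual ~IterBase() = default;
};

// Stand-in for a CPU-optimized record iterator templated on the element type.
template <typename ElemT>
struct RecordIterCPU : IterBase {};

std::unique_ptr<IterBase> MakeCPUIter(DType dtype) {
  switch (dtype) {
    case DType::kFloat32:
      return std::make_unique<RecordIterCPU<float>>();
    case DType::kUint8:
      return std::make_unique<RecordIterCPU<std::uint8_t>>();
    default:
      throw std::runtime_error("unsupported dtype for the CPU iterator");
  }
}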