-
Notifications
You must be signed in to change notification settings - Fork 14
Conversation
3e72fd7
to
ad43824
Compare
omniscidb/QueryEngine/Execute.cpp
Outdated
@@ -2790,66 +2790,109 @@ std::vector<std::unique_ptr<ExecutionKernel>> Executor::createHeterogeneousKerne | |||
|
|||
CHECK(!ra_exe_unit.input_descs.empty()); | |||
|
|||
const bool use_multifrag_kernel = eo.allow_multifrag && is_agg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads as group-bys would not use a multi-fragment policy. Needs cleanup I guess.
omniscidb/QueryEngine/Execute.cpp
Outdated
<< " for kernel per fragment execution path."; | ||
throw CompilationRetryNewScanLimit(max_frag_size); | ||
if (use_multifrag_kernel) { | ||
LOG(INFO) << "use_multifrag_kernel=" << use_multifrag_kernel; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is debug information. Same for others.
@@ -100,6 +100,56 @@ class QueryFragmentDescriptor { | |||
} | |||
} | |||
|
|||
template <typename DISPATCH_FCN> | |||
void assignFragsToMultiHeterogeneousDispatch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a separate method is a bit ugly (that was the reason we didn't merge it long time ago), but I think we can live with that for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dispatching becomes hard to follow, I think it needs more abstractions/simplifications.
omniscidb/QueryEngine/Execute.cpp
Outdated
const auto device_count = deviceCount(device_type); | ||
const int device_count = config_->exec.heterogeneous.enable_heterogeneous_execution | ||
? available_cpus + available_gpus.size() | ||
: deviceCount(query_mem_descs.begin()->first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks error-prone. Maybe a least add a check for the case when we expect only one query memory descriptor to be present in the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The device_count
differed before also only by this flag, but I can add a check :). It was also passed as a parameter with no real goal. Do we even need it except for >0
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove it if you make sure a device is always available.
omniscidb/QueryEngine/Execute.cpp
Outdated
// execution_kernels_per_device_. | ||
const ExecutorDeviceType intended_dt = intended_dt_itr.first; | ||
const ExecutorDeviceType actual_dt = intended_dt_itr.second->getDeviceType(); | ||
LOG(INFO) << "Query was inteded for " << intended_dt << ", will actually run on " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the situations in which we would want to reschedule some (not all) of the kernels to the CPU? In other words, do we need such logic at all? Maybe have a fallback policy instead?
omniscidb/QueryEngine/Execute.cpp
Outdated
<< actual_dt; | ||
|
||
if (actual_dt == ExecutorDeviceType::GPU && eo.allow_multifrag && | ||
(!uses_lazy_fetch || is_agg)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this condition come from? And what do these parameters have to do with multi-fragment dispatching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were there previously. And when they are ignored we have validation issues on some tests, so I guess they are there for a reason.
omniscidb/QueryEngine/Execute.cpp
Outdated
if (actual_dt == ExecutorDeviceType::GPU && eo.allow_multifrag && | ||
(!uses_lazy_fetch || is_agg)) { | ||
policy->devices_dispatch_modes.at(intended_dt) = | ||
ExecutorDispatchMode::MultifragmentKernel; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not modify the policy during the scheduling. The policy is decided by the cost model is then used by the scheduling algorithm to distribute work.
bool dispatch_finished = false; | ||
while (!dispatch_finished) { | ||
dispatch_finished = true; | ||
for (const auto& device_type_itr : execution_kernels_per_device_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this inner loop when you have the very same outer one?
for (const auto& device_type_itr : execution_kernels_per_device_) | ||
for (const auto& device_itr : device_type_itr.second) { | ||
auto& kernel_idx = | ||
execution_kernel_index[device_type_itr.first][device_itr.first]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why populate the execution_kernel_index
at all? Can't you just merge the two loops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are inside the while (!dispatch_finished){}
loop, so we may actually revisit for(const auto& device_itr : device_type_itr.second){}
, but to not schedule already scheduled kernels, we keep track of the last scheduled kernel idx to continue from that index on
b5e1067
to
4980599
Compare
omniscidb/QueryEngine/Execute.cpp
Outdated
query_mem_descs_owned.insert(std::make_pair(dt, std::move(query_mem_desc_owned))); | ||
const ExecutorDeviceType compiled_for_dt{query_comp_desc_owned->getDeviceType()}; | ||
if (!query_comp_descs_owned.count( | ||
compiled_for_dt)) { // Can a fallback to CPU generate different kernels? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the latest compilation (after the fallback)
4980599
to
1784a5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems OK, some minor comments.
@@ -19,7 +19,9 @@ | |||
namespace policy { | |||
|
|||
ProportionBasedExecutionPolicy::ProportionBasedExecutionPolicy( | |||
std::map<ExecutorDeviceType, unsigned>&& propotion) { | |||
std::map<ExecutorDeviceType, unsigned>&& propotion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
omniscidb/QueryEngine/Execute.cpp
Outdated
|
||
auto device_types_for_query = getDeviceTypesForQuery( | ||
ra_exe_unit, query_infos, co.device_type, max_groups_buffer_entry_guess, eo); | ||
CHECK(device_types_for_query.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK(device_types_for_query.size()); | |
CHECK_GT(device_types_for_query.size(), size_t(0)); |
const auto exe_policy = | ||
getExecutionPolicy(is_agg, query_mem_descs_owned, ra_exe_unit, query_infos, eo); | ||
const ExecutorDeviceType fallback_device{ | ||
exe_policy->hasDevice(co.device_type) ? co.device_type : ExecutorDeviceType::CPU}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be always CPU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now main has it this way: can return requested_device_type (i.e., co.device_type) as fallback_device. Is current main wrong?
omniscidb/QueryEngine/Execute.cpp
Outdated
devices_count += get_available_gpus(data_mgr_).size(); | ||
} | ||
} | ||
CHECK(devices_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK(devices_count); | |
CHECK_GT(devices_count, size_t(0)); |
// modes. | ||
const std::map<ExecutorDeviceType, ExecutorDispatchMode> devices_dispatch_modes_{ | ||
{ExecutorDeviceType::CPU, ExecutorDispatchMode::KernelPerFragment}, | ||
{ExecutorDeviceType::GPU, ExecutorDispatchMode::KernelPerFragment}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are set in constructors it seems there is no reason to have the default values here (unless we use default constructors somewhere which is likely incorrect). The comment together with const
is confusing.
omniscidb/QueryEngine/Execute.cpp
Outdated
for (const auto& dt_query_desc : query_mem_descs) { | ||
if (policy->getExecutionMode(dt_query_desc.first) == | ||
ExecutorDispatchMode::KernelPerFragment) { | ||
VLOG(1) << "Creating one execution kernel per fragment"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the log is out of place. Mb move it to where kernels are actually created.
1784a5d
to
89478b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This patch integrates the device multifragment policy (currently disabled) in its proper form for heterogeneous kernels, that is (depending on query):