Skip to content
This repository was archived by the owner on May 9, 2024. It is now read-only.

Dumping IR after every optimization pass #528

Merged
merged 1 commit into from
Aug 10, 2023
Merged

Conversation

akroviakov
Copy link
Contributor

This PR introduces a new config flag dump-after-all, which when set to 1 will dump module-level IR before and after optimizations (IR_UNOPT/IR_OPT). When set to 2 will dump IR before optimizations, after each transformation pass (IR_AFTER_#passCounter_#passName) and after all transformations.

The intent behind this feature was to make the debugging of problematic queries easier.

With the introduction of the new pass builder and pass managers (e.g., FunctionPassManager), the transformation passes can now run more often, but on a smaller granularity scale (e.g., earlier it was one pass for all functions, now it is one pass per function). This way we cannot "summarize" passes (e.g., after GVN, after SROAP), since now the entire functions transformation pipeline runs for each function fully, so expect many (really many) files for even seemingly simple queries when the severity is set to 2.

The example usage is in IntelGPUEnablingTest.
You can call it like e.g., ./IntelGPUEnablingTest --gtest_filter=JoinTest.SimpleJoin --dump-after-all=1

@akroviakov akroviakov requested a review from kurapov-peter June 14, 2023 17:17
Copy link
Contributor

@ienkovich ienkovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is nice in general to have the ability to dump LLVM IR. The existing IR channel wouldn't be very convenient if we dumped IR after each pass into the same file. On the other hand, it has a nice feature of creating a new file on each test run and it has a static counter so that I can easily find a proper module in logs.
How would these new dumps work in this situation? A simple scenario I'd like to have is to run the same test with different flags and then compare generated IRs.

@@ -83,6 +83,7 @@ struct CodegenConfig {
bool null_mod_by_zero = false;
bool hoist_literals = true;
bool enable_filter_function = true;
short dump_after_all_severity{0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose to move it to debug config structure. Also, rename it to dump_llvm_ir_after_each_pass or similar, simple dump is too generic since we have different IRs on various levels.

llvm::Any IR,
const llvm::PreservedAnalyses&) -> void {
std::error_code ec;
llvm::raw_fd_ostream os("IR_AFTER_" + std::to_string(pass_counter++) + "_" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it rewrite or append?

@@ -1001,6 +1001,13 @@ int main(int argc, char* argv[]) {
"Dump IR and PTX for all executed queries to file."
" Currently only supports single node tests.");

desc.add_options()("dump-after-all",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have it in ConfigBuilder so that it would be more widely available? It was supposed to replace all these options in each separate test binary.

Copy link
Contributor

@kurapov-peter kurapov-peter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, two minor comments

opt_desc.add_options()("dump-after-all",
po::value<short>(&config_->debug.dump_llvm_ir_after_each_pass)
->default_value(config_->debug.dump_llvm_ir_after_each_pass)
->implicit_value(0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect the implicit value to be a non-zero value because if you specify the option it means you want the dumps to be generated.

size_t pass_counter{1};
std::string ir_dump_dir{};
if (co.dump_llvm_ir_after_each_pass) {
work_unit_meta.w_unit_counter++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going to happen in the heterogeneous mode when you potentially have two compilations running simultaneously? Should the work_unit_meta live in the work unit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the timestamp can be saved in the module for debug purposes and then you can ask for it in optimize_ir(). But there are lots of ways to get into optimize_ir(), so where do you embed the timestamp into the module?
E.g., you can embed the timestamp in Executor::compileWorkUnit(), but CPU codegen path might go through Executor::reduceMultiDeviceResultSets() -> ResultSetReductionJIT::codegen() -> ..., omitting Executor::compileWorkUnit(), so how many other places should have code that embeds the timestamp? Right now it seems to me that this is the least amount of code that does the job, or am I missing something?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants