Skip to content

Conversation

@Fznamznon
Copy link
Contributor

Signed-off-by: Mariya Podchishchaeva [email protected]

@AGindinson AGindinson requested a review from kbobrovs September 12, 2019 12:37
Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great documentation!
But could you put an SVG version of your nice picture instead?

@Fznamznon
Copy link
Contributor Author

Great documentation!
But could you put an SVG version of your nice picture instead?

Yes, if you tell me how I can do this :)

BTW, What do you think about this feature and it's design? It's not implemented yet and we are finding the better ways how to implement it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitter pass is supposed to undo what llvm-link did at previous step.
This seems like an opportunity for optimization.

Did you consider using llvm-link to build the table of dependencies and importing them w/o "linking all LLVM modules using llvm-link"?
I'm not an expert here, but these llvm-link options seem relevant:

  --import=<function:filename>             - Pair of function name and filename, where function should be imported from bitcode in filename
  --import-all-index                       - Import all external functions in index.
  --only-needed                            - Link only needed symbols
  --summary-file=<string>                  - The summary file to use for function importing.
  --summary-index=<filename>               - Module summary index filename

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, linking all device modules then split them back to separate modules sounds strange, but we need it support SYCL_EXTERNAL.

Approach with using llvm-link to build the table of dependencies and importing them w/o doesn't sound flexible for extending kernels grouping scheme. For example we will still need some splitting mechanism if we decide leave one kernel per module.

I took a brief look into these options, but It looks like all of them do not allow building the table of dependencies, only using them, except only-needed which just links only needed symbols to the destination module.
So, with using llvm-link options we still need figure out how to tell llvm-link which symbols we want to import for each module.
Or run llvm-link -only-needed for all combinations of device modules which doesn't seem effective. And we will still need generate info about symbols which each module contains to feed it to clang-offload-wrapper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach with using llvm-link to build the table of dependencies and importing them w/o doesn't sound flexible for extending kernels grouping scheme. For example we will still need some splitting mechanism if we decide leave one kernel per module.

If user cares about separating kernels into stand-alone modules, it can be done by following "one kernel per translation unit" code organization rule. No compiler changes are required for that.

So, with using llvm-link options we still need figure out how to tell llvm-link which symbols we want to import for each module.
Or run llvm-link -only-needed for all combinations of device modules which doesn't seem effective. And we will still need generate info about symbols which each module contains to feed it to clang-offload-wrapper.

We can link all module into one big module upfront and use llvm-link -only-needed to import all the necessary dependencies. Although this doesn't sound optimal neither it doesn't require new passes - just changes to the driver, which seems to be needed for original proposed too.

Copy link
Contributor Author

@Fznamznon Fznamznon Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, with using llvm-link options we still need figure out how to tell llvm-link which symbols we want to import for each module.
Or run llvm-link -only-needed for all combinations of device modules which doesn't seem effective. And we will still need generate info about symbols which each module contains to feed it to clang-offload-wrapper.

We can link all module into one big module upfront and use llvm-link -only-needed to import all the necessary dependencies. Although this doesn't sound optimal neither it doesn't require new passes - just changes to the driver, which seems to be needed for original proposed too.

It's unclear how to import all the necessary dependencies from fully linked module. Could you please clarify?
I'm thinking about reusing llvm-extract tool or re-using GVExtractor pass, these solves problem of extraction of functions with dependencies from some single big module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider using llvm-link to build the table of dependencies and importing them w/o "linking all LLVM modules using llvm-link"?

I think the potential a solution based on llvm-link building export/import tables and their analysis can be considered, but at this point it is too raw, and we don't even know will it work or not.
Doing full llvm-link is, in contrast, known to give desired result, and the design based on it is very simple. So I think we should start with it. Besides, llvm-link is very fast, so it is not quite clear what the potential optimizations will give.

Yes, linking all device modules then split them back to separate modules sounds strange

This is not quite "splitting back". This is rather re-grouping based on callgraph knowledge, which produces very different modules compared to what was the input. Reliable building the callgraph based on just import/export tables w/o linking would be very tricky, I suppose.

Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great documentation!
But could you put an SVG version of your nice picture instead?

Yes, if you tell me how I can do this :)

I guess that if you are able to write a SYCL compiler by yourself, explaining you how to select the SVG format when you save your picture in your drawing software sounds like insulting you and I do not want to be rude... ;-)

BTW, What do you think about this feature and it's design? It's not implemented yet and we are finding the better ways how to implement it.

You are trying to solve a problem I have not thought about in this way, so I do not know.
Probably your method works for your problem.

My plan in triSYCL was to generate as many kernel IR as there are different targets by calling the device compiler with a specific preprocessor macro so the C++ code can be potentially different for each target to pick a different kernel implementation better fitted for each target. It solves the extension problem, but if you think to it you might need also a different host code to use or not these extensions, so it is not that simple.

This is why I presented an extension related somehow to this problem during the last SYCL F2F with this example
https://github.com/triSYCL/triSYCL/blob/master/tests/scope/queue.cpp
which, besides introducing the concept of scope, replace actually the runtime polymorphic SYCL object by compile-time static objects (which are actually C++ concepts) that allow you to have different host+device code for each device.

But all this is outside of the scope of this PR which is more related with a JIT approach.
I am not very inspired here since I only target embedded devices with AoT compilation flow...

keryell
keryell previously approved these changes Sep 16, 2019
Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the SVG picture.
It will be compatible with the high-definition displays of the future. :-)

kbobrovs
kbobrovs previously approved these changes Sep 20, 2019
@Fznamznon Fznamznon dismissed stale reviews from kbobrovs and keryell via c2d686e September 25, 2019 09:37
@Fznamznon Fznamznon force-pushed the private/mpodchis/device-code-split-design branch from c06f874 to c2d686e Compare September 25, 2019 09:37
@Fznamznon
Copy link
Contributor Author

Rebased.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How this ID is used later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ID will be used to group kernels per translation unit.
But I think it can be better clarified in this document, I'll made a change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How kernels are going to be split with this approach? One kernel (+ its dependencies) per module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we will group kernels per translation unit basis and I'd like to implement possibility to emit one kernel with dependencies per module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sycl-split"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl-post-link.

Signed-off-by: Mariya Podchishchaeva <[email protected]>
Signed-off-by: Mariya Podchishchaeva <[email protected]>
Move picture to images folder
Apply comment

Signed-off-by: Mariya Podchishchaeva <[email protected]>
@Fznamznon Fznamznon force-pushed the private/mpodchis/device-code-split-design branch from c2d686e to 41091ba Compare November 30, 2019 10:58
@Fznamznon Fznamznon changed the title [SYCL][Doc] Add device code splitting feature design [SYCL][Doc] Add device code split feature design Nov 30, 2019
@romanovvlad romanovvlad merged commit 3d37c91 into intel:sycl Nov 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants