Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Build][Docs] Improve build documentation #459

Open
hjabird opened this issue Mar 13, 2024 · 14 comments
Open

[Build][Docs] Improve build documentation #459

hjabird opened this issue Mar 13, 2024 · 14 comments
Labels
documentation A request to change/fix/improve the documentation

Comments

@hjabird
Copy link
Contributor

hjabird commented Mar 13, 2024

Summary

oneMKL can be difficult to build and use. The reasons for this are:

  • The build documentation is fragmented and at times outdated
  • The compiler ecosystem and compatibility is a little confusing

Problem statement

The current build documentation has the following problems:

  • It suggests oneMKL supports Conan. Conan is no longer supported since Should Conan package manager support be dropped from this oneMKL Interfaces project? #267
    • Consequently, you have scroll some way to find the CMake instructions.
  • In most cases, building oneMKL backends is a case of setting -DENABLE_XYZ_BACKEND=ON. This isn't clearly communicated by the documentation, that makes building look far more complicated.
  • The docs suggests setting TARGET_DOMAINS even thought it is set automatically.
  • At times the documentation is worded in a way that suggests that backends cannot be enabled simultaneously, even thought I think they can.
  • In most cases, building a backend with hipSYCL (which is now AdaptiveCpp) is similar to building with clang++ or icpx, but we have duplication of information instead.

We also support a range of compilers.

  • I believe that some backends build with ICPX and Codeplay's plugins for Nvidia and AMD GPUs, although this is undocumented. We should consider supporting and documenting these.
  • In cases where a backend is dependent on a particular compiler, we should consider trying to generate useful CMake errors.

Preferred solution

  • Consider rewriting building_the_project.rst to address the above problems.
    • Some aspects (e.g. additional options for building with AdaptiveCpp) could potentially move to new files.
  • Generate more errors in CMake to reflect compiler support for different backends.
@cdgarland
Copy link
Contributor

Hi @hjabird We are looking for community support for this project. Are you (or any others) willing to pitch in and help improve the documentation?

@Rbiessy
Copy link
Contributor

Rbiessy commented Mar 22, 2024

Hi Craig, yes we will try to help with some of these issues when time permits. It would be useful to have some feedback before we start spending time on it.

To continue the discussion from #458 (comment) I was thinking it could make sense to document some configurations that are expected to work but not regularly tested. This could help users try some configurations and report issues. I think the combination of all the domains, platforms, compilers and OS lead to a number of configurations too high to be regularly tested. I'm curious to have @mmeterel's opinion on this.

@mmeterel
Copy link
Contributor

Hi Craig, yes we will try to help with some of these issues when time permits. It would be useful to have some feedback before we start spending time on it.

To continue the discussion from #458 (comment) I was thinking it could make sense to document some configurations that are expected to work but not regularly tested. This could help users try some configurations and report issues. I think the combination of all the domains, platforms, compilers and OS lead to a number of configurations too high to be regularly tested. I'm curious to have @mmeterel's opinion on this.

@Rbiessy I support adding documentation for untested configurations as long as how to use them are clearly documented and reproducible by internal and external developers. Main concern with not testing is, as the repo keeps evolving, how will we ensure these untested configurations will continue working? What do you think for such cases?
Another point about documenting which configs are tested and which are not. I support being transparent to users by providing this information (like our AMDs backend are not tested in CI yet), but we should make plans to close the gaps in CI in a timely manner. (For example adding AMD backends in CI is in progress)

@Rbiessy
Copy link
Contributor

Rbiessy commented Mar 25, 2024

I don't have a good solution to ensure that untested configurations will continue working. I would say that if people are trying to use them we will eventually get an issue if something is broken. That is my understanding of a community-driven project.

Regarding the CI I have some doubts about how much details we should share. I believe that Alexey (@toxicscum) is working on closing the gap but I doubt everything in the README can be tested.

@mmeterel
Copy link
Contributor

I don't have a good solution to ensure that untested configurations will continue working. I would say that if people are trying to use them we will eventually get an issue if something is broken. That is my understanding of a community-driven project.

Regarding the CI I have some doubts about how much details we should share. I believe that Alexey (@toxicscum) is working on closing the gap but I doubt everything in the README can be tested.

Relying on user input when/if something is broken degrades the quality of the project/repo imho. One option would be, guaranteeing all the mentioned configurations work in every release. Then, at least we have a time stamp where the user can get a stable repo.

Regarding details about CI: When we mention about a configuration and say it is not tested regularly, we are already sharing this information. Ideally, in a repo like this, all CI should be public IMHO.

@Rbiessy
Copy link
Contributor

Rbiessy commented Mar 25, 2024

So we've had internal discussions about the release of oneMKL. We are planning to create releases of oneMKL interface and ship them with the oneAPI base toolkit. These will be tested by the oneAPI core team at Codeplay using icpx and the Codeplay Nvidia and AMD plugins. As I far as I know this is the only recommended way for users to get DPC++ running on all hardware. I am planning to start a separate discussion on the oneMKL interface release eventually.

I hope it makes more sense why we think we should document some configurations that are expected to work but may not be regularly tested yet.

@mmeterel
Copy link
Contributor

So we've had internal discussions about the release of oneMKL. We are planning to create releases of oneMKL interface and ship them with the oneAPI base toolkit. These will be tested by the oneAPI core team at Codeplay using icpx and the Codeplay Nvidia and AMD plugins. As I far as I know this is the only recommended way for users to get DPC++ running on all hardware. I am planning to start a separate discussion on the oneMKL interface release eventually.

I hope it makes more sense why we think we should document some configurations that are expected to work but may not be regularly tested yet.

This is news to me (releasing oneMKL interfaces as part of base tool kit) Is this in replacement of the releases done in oneMKL interfaces repo or in addition to it? Tagging Irina and Maria for their information and inputs. @mkrainiuk @itopinsk

@Rbiessy
Copy link
Contributor

Rbiessy commented Mar 25, 2024

I would prefer to keep the discussion on releases separate. I will try to start the discussion this week.

@hdelan
Copy link
Contributor

hdelan commented Mar 26, 2024

I agree that there should be better documentation for building and running tests in oneMKL.

It also seems that there isn't support (as far as I can tell) for building with non standard ROCm installation. If the build process was better documented, the documentation effort could potentially also act as a survey on what works and what doesn't. This would be valuable IMO

@hjabird
Copy link
Contributor Author

hjabird commented May 28, 2024

As part of this, the documentation for consuming oneMKL with CMake could also be improved. The documentation does not detail CMake targets. This was bought up in #501.

@vbm23
Copy link

vbm23 commented Jun 10, 2024

Hi Team @Rbiessy @mmeterel @hjabird @mkrainiuk @hdelan, is there timeline in place for the same as was expecting we could meet Q2 timeline of June 28th for uxlfoundation/open-source-working-group#38

@hjabird
Copy link
Contributor Author

hjabird commented Jun 10, 2024

HI @vbm23, there should be a PR in a few hours! Working branch here.

@vbm23
Copy link

vbm23 commented Jun 11, 2024

@hjabird Since #510 would partially help close this issue, would the other one be #501?
@Rbiessy Feel free to comment

@hjabird
Copy link
Contributor Author

hjabird commented Jun 11, 2024

@vbm23 I think the other thing we can do is generate useful errors.

  • [CMake] Remove FindRoc* files #490 resolved errors that you'd get from HIP_PATH being wrong.
  • [Docs] Rewrite build documentation #510 improves the build documentation
    • I also added documentation on consuming oneMKL with CMake as part of this PR.
  • Finally we can do more to generate more useful errors. For instance, using AdaptiveCpp with unsupported backends gives you compiler errors for features that not included in AdaptiveCpp. Ideally we'd generate an error along the lines of "AdaptiveCpp is not supported for this backend" instead.

I think we're getting there, but not quite there yet!

@Rbiessy Rbiessy added the documentation A request to change/fix/improve the documentation label Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation A request to change/fix/improve the documentation
Projects
None yet
Development

No branches or pull requests

6 participants