-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++20] [Modules] Should module partitions be embedded in the primary BMI? #62837
Comments
@llvm/issue-subscribers-clang-modules |
The length of command line may not be compelling to me. Given @tschuett mentions that we can use response file to include the command lines. You can also find the example by displaying the command line from cmake. Also the size of the pcm files is a main concern for me. More than one people complain the size of pcm files to me. So if we combine several pcm files together, the size will become larger which looks bad... And for distributed build, given the size of pcm files now, it is still unclear which one is better: to build source files in the remote or passing the pcm files. I'm worrying passing pcm files will run out all of the network bandwidths. Given it may require a non-trivial refactorization to embed module files into another, I don't see a big benefit from the proposal. |
Note that it came up that this "repack BMIs" process doesn't really have a place to "live" in a static build graph (as CMake uses it). As an example: module X:part; will implicitly import
While it may seem useful, I don't think it's something a build system without access to dynamic nodes (basically ruling out So while I think this might be a useful option, mandating it sounds like the wrong decision to me. |
Seems like this is too niche after all. It's not clear whether the benefits of reduced number cache queries and stronger module encapsulation in distributed builds actually outweigh the reduced parallelism and larger BMI size. I'll have to test it eventually and I'll let you know if there are any interesting results performance-wise, but at the moment I feel that this issue only distracts from more important issues related to modules. Thank you all for the feedback ❤️ |
After e22fa1d it is now required to explicitly specify all dependencies that make up a module. This is a good change in terms of correctness and ease of downstream build system integration, but it exposed some issues regarding target encapsulation and command line length. For instance, below is how the command line for a
hello world
looks like withlibcxx
modulestd
in rules_ll:Details
This is not an issue for build systems, but it makes it hard to write command lines by hand that depend on e.g. the
std
module. It's also not ideal if one wants to debug a command line since there is so much noise caused by all the partitions.My main gripe though is that this makes it hard to clearly impose boundaries on a target. A module partition, which I'd generally consider an implementation detail, will leak transitively into all downstream targets. It will always need to be present, it's path needs to be known, and it will always influence all downstream command lines.
A potential solution could be to embed partitions in the primary module interface unit by default. This would allow for shorter command lines while still being explicit. It would come at the cost of having the binary of every partition twice - once in the original partition BMI and once in the BMI of the primary module interface. This is somewhat similar to object/archive and object/shared-object relationships.
Intuitively, I believe that this is also beneficial for distributed builds, as we'd only need to query remote caches once for the existence of a module's primary BMI instead of having to query (in the case of
std
) 50 partitions. I haven't benchmarked this though, so I might be wrong, (the benefits of less duplication in caches might outweigh the queries?).Note that you can't
import
a module partition outside of the module it belongs to. The primary module interface can always be thought of as a bottleneck target regardless of whether it embeds its partitions or not. In other words, we always need all BMIs that make up a primary module interface unit if we want to use it in a downstream target. It's just that with embedded partitions the command lines of consumers become shorter and we'd have more intuitive perceived target boundaries at the level of primary module interfaces.Related:
cc @ChuanqiXu9 @mathstuf @dwblaikie @ruoso @vsapsai
The text was updated successfully, but these errors were encountered: