[C++20] [Modules] Don't import initializer/pending implicit instantiations from other named module#167468
Conversation
…tions from other named module Close llvm#166068 The cause of the problem is that we would import initializers and pending implicit instantiations from other named module. This is very bad and it may waste a lot of time. And we didn't observe it as the weak symbols can live together and the strong symbols would be removed by other mechanism. So we didn't observe the bad behavior for a long time. But it indeeds waste compilation time.
|
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-modules Author: Chuanqi Xu (ChuanqiXu9) ChangesClose #166068 The cause of the problem is that we would import initializers and pending implicit instantiations from other named module. This is very bad and it may waste a lot of time. And we didn't observe it as the weak symbols can live together and the strong symbols would be removed by other mechanism. So we didn't observe the bad behavior for a long time. But it indeeds waste compilation time. Full diff: https://github.com/llvm/llvm-project/pull/167468.diff 3 Files Affected:
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index a04041c10b4ba..634bf991b2aee 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -4087,10 +4087,14 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
std::errc::illegal_byte_sequence,
"Invalid PENDING_IMPLICIT_INSTANTIATIONS block");
- for (unsigned I = 0, N = Record.size(); I != N; /* in loop */) {
- PendingInstantiations.push_back(
- {ReadDeclID(F, Record, I),
- ReadSourceLocation(F, Record, I).getRawEncoding()});
+ // For standard C++20 module, we will only reads the instantiations
+ // if it is the main file.
+ if (!F.StandardCXXModule || F.Kind == MK_MainFile) {
+ for (unsigned I = 0, N = Record.size(); I != N; /* in loop */) {
+ PendingInstantiations.push_back(
+ {ReadDeclID(F, Record, I),
+ ReadSourceLocation(F, Record, I).getRawEncoding()});
+ }
}
break;
@@ -6438,10 +6442,13 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
case SUBMODULE_INITIALIZERS: {
if (!ContextObj)
break;
- SmallVector<GlobalDeclID, 16> Inits;
- for (unsigned I = 0; I < Record.size(); /*in loop*/)
- Inits.push_back(ReadDeclID(F, Record, I));
- ContextObj->addLazyModuleInitializers(CurrentModule, Inits);
+ // Standard C++ module has its own way to initialize variables.
+ if (!F.StandardCXXModule || F.Kind == MK_MainFile) {
+ SmallVector<GlobalDeclID, 16> Inits;
+ for (unsigned I = 0; I < Record.size(); /*in loop*/)
+ Inits.push_back(ReadDeclID(F, Record, I));
+ ContextObj->addLazyModuleInitializers(CurrentModule, Inits);
+ }
break;
}
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 821e7df1bce53..e4618d60a8acb 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -3247,7 +3247,7 @@ void ASTWriter::WriteSubmodules(Module *WritingModule, ASTContext *Context) {
// Emit the reachable initializers.
// The initializer may only be unreachable in reduced BMI.
- if (Context) {
+ if (Context && !GeneratingReducedBMI) {
RecordData Inits;
for (Decl *D : Context->getModuleInitializers(Mod))
if (wasDeclEmitted(D))
@@ -5827,17 +5827,19 @@ void ASTWriter::WriteSpecialDeclRecords(Sema &SemaRef) {
Stream.EmitRecord(UNUSED_LOCAL_TYPEDEF_NAME_CANDIDATES,
UnusedLocalTypedefNameCandidates);
- // Write the record containing pending implicit instantiations.
- RecordData PendingInstantiations;
- for (const auto &I : SemaRef.PendingInstantiations) {
- if (!wasDeclEmitted(I.first))
- continue;
+ if (!GeneratingReducedBMI) {
+ // Write the record containing pending implicit instantiations.
+ RecordData PendingInstantiations;
+ for (const auto &I : SemaRef.PendingInstantiations) {
+ if (!wasDeclEmitted(I.first))
+ continue;
- AddDeclRef(I.first, PendingInstantiations);
- AddSourceLocation(I.second, PendingInstantiations);
+ AddDeclRef(I.first, PendingInstantiations);
+ AddSourceLocation(I.second, PendingInstantiations);
+ }
+ if (!PendingInstantiations.empty())
+ Stream.EmitRecord(PENDING_IMPLICIT_INSTANTIATIONS, PendingInstantiations);
}
- if (!PendingInstantiations.empty())
- Stream.EmitRecord(PENDING_IMPLICIT_INSTANTIATIONS, PendingInstantiations);
// Write the record containing declaration references of Sema.
RecordData SemaDeclRefs;
diff --git a/clang/test/Modules/pr166068.cppm b/clang/test/Modules/pr166068.cppm
new file mode 100644
index 0000000000000..b6944b591d264
--- /dev/null
+++ b/clang/test/Modules/pr166068.cppm
@@ -0,0 +1,38 @@
+// RUN: rm -rf %t
+// RUN: mkdir -p %t
+// RUN: split-file %s %t
+//
+// RUN: %clang_cc1 -std=c++20 %t/flyweight.cppm -emit-reduced-module-interface -o %t/flyweight.pcm
+// RUN: %clang_cc1 -std=c++20 %t/account.cppm -emit-reduced-module-interface -o %t/account.pcm -fprebuilt-module-path=%t
+// RUN: %clang_cc1 -std=c++20 %t/core.cppm -emit-reduced-module-interface -o %t/core.pcm -fprebuilt-module-path=%t
+// RUN: %clang_cc1 -std=c++20 %t/core.cppm -fprebuilt-module-path=%t -emit-llvm -disable-llvm-passes -o - | FileCheck %t/core.cppm
+
+//--- flyweight.cppm
+module;
+template <typename> struct flyweight_core {
+ static bool init() { (void)__builtin_operator_new(2); return true; }
+ static bool static_initializer;
+};
+template <typename T> bool flyweight_core<T>::static_initializer = init();
+export module flyweight;
+export template <class> void flyweight() {
+ (void)flyweight_core<int>::static_initializer;
+}
+
+//--- account.cppm
+export module account;
+import flyweight;
+export void account() {
+ (void)::flyweight<int>;
+}
+
+//--- core.cppm
+export module core;
+import account;
+
+extern "C" void core() {}
+
+// Fine enough to check it won't crash.
+// CHECK-NOT: init
+// CHECK-NOT: static_initializer
+// CHECK: define {{.*}}@core(
|
|
I've tested this in internal workloads and it looks good. |
| ReadSourceLocation(F, Record, I).getRawEncoding()}); | ||
| // For standard C++20 module, we will only reads the instantiations | ||
| // if it is the main file. | ||
| if (!F.StandardCXXModule || F.Kind == MK_MainFile) { |
There was a problem hiding this comment.
So is there a way to consolidate this check into a function? This feels like something that may change and someone may forget to update both places.
There was a problem hiding this comment.
So is there a way to consolidate this check into a function?
If you mean have a function returning the condition "(!F.StandardCXXModule || F.Kind == MK_MainFile)", yes we can do that. But I don't have a good name/good abstraction model for it yet. I don't have a strong feeling but I have the current style looks straight forward.
This feels like something that may change and someone may forget to update both places.
I don't feel they may change as the C++20 modules goes stable nowadays. For updating, on the one hand, the current serialization/deserialization framework almost always asks us to update the both places. On the other hand, for the change itself, technically we don't have to update both places, as the change in the writer side says "we don't need to write it" and the reader side says "we don't need to read it". It is actually "fine" to omit either side.
Close #166068
The cause of the problem is that we would import initializers and pending implicit instantiations from other named module. This is very bad and it may waste a lot of time.
And we didn't observe it as the weak symbols can live together and the strong symbols would be removed by other mechanism. So we didn't observe the bad behavior for a long time. But it indeeds waste compilation time.