Skip to content

Comments

[C++20] [Modules] Don't import initializer/pending implicit instantiations from other named module#167468

Merged
ChuanqiXu9 merged 1 commit intollvm:mainfrom
ChuanqiXu9:DontImportInitOrPendingInstantiation
Nov 12, 2025
Merged

[C++20] [Modules] Don't import initializer/pending implicit instantiations from other named module#167468
ChuanqiXu9 merged 1 commit intollvm:mainfrom
ChuanqiXu9:DontImportInitOrPendingInstantiation

Conversation

@ChuanqiXu9
Copy link
Member

Close #166068

The cause of the problem is that we would import initializers and pending implicit instantiations from other named module. This is very bad and it may waste a lot of time.

And we didn't observe it as the weak symbols can live together and the strong symbols would be removed by other mechanism. So we didn't observe the bad behavior for a long time. But it indeeds waste compilation time.

…tions from other named module

Close llvm#166068

The cause of the problem is that we would import initializers and pending
implicit instantiations from other named module. This is very bad and it
may waste a lot of time.

And we didn't observe it as the weak symbols can live together and the
strong symbols would be removed by other mechanism. So we didn't observe
the bad behavior for a long time. But it indeeds waste compilation time.
@ChuanqiXu9 ChuanqiXu9 added clang:modules C++20 modules and Clang Header Modules skip-precommit-approval PR for CI feedback, not intended for review labels Nov 11, 2025
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Nov 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 11, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)

Changes

Close #166068

The cause of the problem is that we would import initializers and pending implicit instantiations from other named module. This is very bad and it may waste a lot of time.

And we didn't observe it as the weak symbols can live together and the strong symbols would be removed by other mechanism. So we didn't observe the bad behavior for a long time. But it indeeds waste compilation time.


Full diff: https://github.com/llvm/llvm-project/pull/167468.diff

3 Files Affected:

  • (modified) clang/lib/Serialization/ASTReader.cpp (+15-8)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+12-10)
  • (added) clang/test/Modules/pr166068.cppm (+38)
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index a04041c10b4ba..634bf991b2aee 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -4087,10 +4087,14 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
             std::errc::illegal_byte_sequence,
             "Invalid PENDING_IMPLICIT_INSTANTIATIONS block");
 
-      for (unsigned I = 0, N = Record.size(); I != N; /* in loop */) {
-        PendingInstantiations.push_back(
-            {ReadDeclID(F, Record, I),
-             ReadSourceLocation(F, Record, I).getRawEncoding()});
+      // For standard C++20 module, we will only reads the instantiations
+      // if it is the main file.
+      if (!F.StandardCXXModule || F.Kind == MK_MainFile) {
+        for (unsigned I = 0, N = Record.size(); I != N; /* in loop */) {
+          PendingInstantiations.push_back(
+              {ReadDeclID(F, Record, I),
+               ReadSourceLocation(F, Record, I).getRawEncoding()});
+        }
       }
       break;
 
@@ -6438,10 +6442,13 @@ llvm::Error ASTReader::ReadSubmoduleBlock(ModuleFile &F,
     case SUBMODULE_INITIALIZERS: {
       if (!ContextObj)
         break;
-      SmallVector<GlobalDeclID, 16> Inits;
-      for (unsigned I = 0; I < Record.size(); /*in loop*/)
-        Inits.push_back(ReadDeclID(F, Record, I));
-      ContextObj->addLazyModuleInitializers(CurrentModule, Inits);
+      // Standard C++ module has its own way to initialize variables.
+      if (!F.StandardCXXModule || F.Kind == MK_MainFile) {
+        SmallVector<GlobalDeclID, 16> Inits;
+        for (unsigned I = 0; I < Record.size(); /*in loop*/)
+          Inits.push_back(ReadDeclID(F, Record, I));
+        ContextObj->addLazyModuleInitializers(CurrentModule, Inits);
+      }
       break;
     }
 
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 821e7df1bce53..e4618d60a8acb 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -3247,7 +3247,7 @@ void ASTWriter::WriteSubmodules(Module *WritingModule, ASTContext *Context) {
 
     // Emit the reachable initializers.
     // The initializer may only be unreachable in reduced BMI.
-    if (Context) {
+    if (Context && !GeneratingReducedBMI) {
       RecordData Inits;
       for (Decl *D : Context->getModuleInitializers(Mod))
         if (wasDeclEmitted(D))
@@ -5827,17 +5827,19 @@ void ASTWriter::WriteSpecialDeclRecords(Sema &SemaRef) {
     Stream.EmitRecord(UNUSED_LOCAL_TYPEDEF_NAME_CANDIDATES,
                       UnusedLocalTypedefNameCandidates);
 
-  // Write the record containing pending implicit instantiations.
-  RecordData PendingInstantiations;
-  for (const auto &I : SemaRef.PendingInstantiations) {
-    if (!wasDeclEmitted(I.first))
-      continue;
+  if (!GeneratingReducedBMI) {
+    // Write the record containing pending implicit instantiations.
+    RecordData PendingInstantiations;
+    for (const auto &I : SemaRef.PendingInstantiations) {
+      if (!wasDeclEmitted(I.first))
+        continue;
 
-    AddDeclRef(I.first, PendingInstantiations);
-    AddSourceLocation(I.second, PendingInstantiations);
+      AddDeclRef(I.first, PendingInstantiations);
+      AddSourceLocation(I.second, PendingInstantiations);
+    }
+    if (!PendingInstantiations.empty())
+      Stream.EmitRecord(PENDING_IMPLICIT_INSTANTIATIONS, PendingInstantiations);
   }
-  if (!PendingInstantiations.empty())
-    Stream.EmitRecord(PENDING_IMPLICIT_INSTANTIATIONS, PendingInstantiations);
 
   // Write the record containing declaration references of Sema.
   RecordData SemaDeclRefs;
diff --git a/clang/test/Modules/pr166068.cppm b/clang/test/Modules/pr166068.cppm
new file mode 100644
index 0000000000000..b6944b591d264
--- /dev/null
+++ b/clang/test/Modules/pr166068.cppm
@@ -0,0 +1,38 @@
+// RUN: rm -rf %t
+// RUN: mkdir -p %t
+// RUN: split-file %s %t
+//
+// RUN: %clang_cc1 -std=c++20 %t/flyweight.cppm -emit-reduced-module-interface -o %t/flyweight.pcm
+// RUN: %clang_cc1 -std=c++20 %t/account.cppm -emit-reduced-module-interface -o %t/account.pcm -fprebuilt-module-path=%t
+// RUN: %clang_cc1 -std=c++20 %t/core.cppm -emit-reduced-module-interface -o %t/core.pcm -fprebuilt-module-path=%t
+// RUN: %clang_cc1 -std=c++20 %t/core.cppm -fprebuilt-module-path=%t -emit-llvm -disable-llvm-passes -o - | FileCheck %t/core.cppm
+
+//--- flyweight.cppm
+module;
+template <typename> struct flyweight_core {
+  static bool init() { (void)__builtin_operator_new(2); return true; }
+  static bool static_initializer;
+};
+template <typename T> bool flyweight_core<T>::static_initializer = init();
+export module flyweight;
+export template <class> void flyweight() {
+  (void)flyweight_core<int>::static_initializer;
+}
+
+//--- account.cppm
+export module account;
+import flyweight;
+export void account() {
+  (void)::flyweight<int>;
+}
+
+//--- core.cppm
+export module core;
+import account;
+
+extern "C" void core() {}
+
+// Fine enough to check it won't crash.
+// CHECK-NOT: init
+// CHECK-NOT: static_initializer
+// CHECK: define {{.*}}@core(

@ChuanqiXu9
Copy link
Member Author

I've tested this in internal workloads and it looks good.

@ChuanqiXu9 ChuanqiXu9 merged commit ae2b303 into llvm:main Nov 12, 2025
14 checks passed
ReadSourceLocation(F, Record, I).getRawEncoding()});
// For standard C++20 module, we will only reads the instantiations
// if it is the main file.
if (!F.StandardCXXModule || F.Kind == MK_MainFile) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is there a way to consolidate this check into a function? This feels like something that may change and someone may forget to update both places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is there a way to consolidate this check into a function?

If you mean have a function returning the condition "(!F.StandardCXXModule || F.Kind == MK_MainFile)", yes we can do that. But I don't have a good name/good abstraction model for it yet. I don't have a strong feeling but I have the current style looks straight forward.

This feels like something that may change and someone may forget to update both places.

I don't feel they may change as the C++20 modules goes stable nowadays. For updating, on the one hand, the current serialization/deserialization framework almost always asks us to update the both places. On the other hand, for the change itself, technically we don't have to update both places, as the change in the writer side says "we don't need to write it" and the reader side says "we don't need to read it". It is actually "fine" to omit either side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category skip-precommit-approval PR for CI feedback, not intended for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[clang] Assertion "predeclared global operator new/delete is missing" hit at CGExprCXX.cpp:1384 when using C++20 modules

3 participants