
[C++20][Modules] Improve namespace look-up performance for modules. (Attempt #2) #177255

Merged
mpark merged 3 commits into llvm:main from mpark:modules-perf-namespaces-2
Jan 22, 2026

@mpark mpark commented Jan 21, 2026

This is a reapplication of #171769, which was reverted in #174783. It adds clang/test/Modules/pr171769.cpp, a repro provided by @rupprecht.

TL;DR: forEachImportedKeyDecls is insufficient to iterate over the decls of each module we need to emit. This PR instead uses the existing CollectFirstDeclFromEachModule logic from ASTWriterDecl.cpp to collect all the decls we need from the modules involved.

Problem Surfaced in the Repro

The repro copied inline here:

// RUN: rm -rf %t
// RUN: mkdir -p %t
// RUN: split-file %s %t
// RUN: cd %t
//

// RUN: %clang_cc1 -fmodule-name=A -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules A.cppmap -o A.pcm
// RUN: %clang_cc1 -fmodule-name=B -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules -fmodule-file=A.pcm B.cppmap -o B.pcm
// RUN: %clang_cc1 -fmodule-name=C -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules C.cppmap -o C.pcm
// RUN: %clang_cc1 -fmodule-name=D -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules -fmodule-file=C.pcm D.cppmap -o D.pcm
// RUN: %clang_cc1 -fmodule-name=E -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules -fmodule-file=B.pcm E.cppmap -o E.pcm
// RUN: %clang_cc1 -fmodule-name=F -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules -fmodule-file=D.pcm F.cppmap -o F.pcm
// RUN: %clang_cc1 -fmodule-name=G -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules -fmodule-file=E.pcm -fmodule-file=F.pcm G.cppmap -o G.pcm
// RUN: %clang_cc1 -fno-cxx-modules -fmodules -fmodule-file=G.pcm src.cpp \
// RUN:   -o /dev/null

//--- A.cppmap
module "A" { header "A.h" }

//--- A.h
int x;

//--- B.cppmap
module "B" {}

//--- C.cppmap
module "C" { header "C.h" }

//--- C.h
namespace xyz {}

//--- D.cppmap
module "D" {}

//--- E.cppmap
module "E" {}

//--- F.cppmap
module "F" { header "F.h" }

//--- F.h
namespace xyz { inline void func() {} }

//--- G.cppmap
module "G" { header "G.h" }

//--- G.h
#include "F.h"
namespace { void func2() { xyz::func(); } }

//--- hdr.h
#include "F.h"
namespace xyz_ns = xyz;

//--- src.cpp
#include "hdr.h"

When building module G, the compiler emitted both C::xyz and F::xyz before #171769. After #171769, it emitted only C::xyz and dropped F::xyz. This is the core issue addressed in this PR.

The main strategy is still the same, but with a tweak to the AST writing half of the problem. In #171769 I wrote about how we can use the KeyDecls data structure:

The other half of the problem is to write out all of the external namespaces that we used to store in StoredDeclsList but no longer. For this, we take advantage of the KeyDecls data structure in ASTReader. KeyDecls is roughly a map<Decl *, vector<GlobalDeclID>>, and it stores a mapping from the canonical decl of a redeclarable decl to a list of GlobalDeclIDs where each ID represents a "key declaration" from each imported module. More to the point, if we read external namespaces N1, N2, N3 in ASTReader, we'll either have N1 mapped to [N2, N3], or some newly local canonical decl mapped to [N1, N2, N3]. Either way, we can visit N1, N2, and N3 by doing ASTReader::forEachImportedKeyDecls(N1, Visitor), and we leverage this to maintain the current behavior of writing out all of the imported namespace decls in ASTWriter.
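The two KeyDecls shapes described in the quote can be sketched with a toy model. This is only an illustration under stated assumptions: ToyDecl, ToyKeyDecls, and visitImportedKeyDecls are hypothetical names, the map stores pointers rather than GlobalDeclIDs, and none of this is the actual ASTReader code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a redeclarable clang::Decl (hypothetical, not Clang's type).
struct ToyDecl {
  std::string Name;   // label such as "N1"
  bool FromASTFile;   // true if the decl was read from an imported module
  ToyDecl *Canonical; // canonical decl of the redeclaration chain
};

// Toy stand-in for KeyDecls: canonical decl -> key declarations contributed
// by imported modules. The real map stores GlobalDeclIDs, not pointers.
using ToyKeyDecls = std::map<const ToyDecl *, std::vector<const ToyDecl *>>;

// Visit the canonical decl if it is imported, then every recorded key decl.
std::vector<std::string> visitImportedKeyDecls(const ToyKeyDecls &KeyDecls,
                                               const ToyDecl *D) {
  assert(D && D->Canonical && "expected a decl with a canonical decl");
  std::vector<std::string> Visited;
  const ToyDecl *Canon = D->Canonical;
  if (Canon->FromASTFile)
    Visited.push_back(Canon->Name);
  auto It = KeyDecls.find(Canon);
  if (It != KeyDecls.end())
    for (const ToyDecl *Key : It->second)
      Visited.push_back(Key->Name);
  return Visited;
}

// Case 1: N1 is canonical and imported; KeyDecls maps N1 -> [N2, N3].
std::vector<std::string> importedCanonicalCase() {
  ToyDecl N1{"N1", true, nullptr};
  N1.Canonical = &N1;
  ToyDecl N2{"N2", true, &N1}, N3{"N3", true, &N1};
  ToyKeyDecls KeyDecls;
  KeyDecls[&N1] = std::vector<const ToyDecl *>{&N2, &N3};
  return visitImportedKeyDecls(KeyDecls, &N2);
}

// Case 2: a local decl is canonical; KeyDecls maps Local -> [N1, N2, N3].
std::vector<std::string> localCanonicalCase() {
  ToyDecl Local{"Local", false, nullptr};
  Local.Canonical = &Local;
  ToyDecl N1{"N1", true, &Local}, N2{"N2", true, &Local},
      N3{"N3", true, &Local};
  ToyKeyDecls KeyDecls;
  KeyDecls[&Local] = std::vector<const ToyDecl *>{&N1, &N2, &N3};
  return visitImportedKeyDecls(KeyDecls, &N1);
}
```

In both cases the visitor reaches N1, N2, and N3, which is the property the original approach relied on.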

However, it turns out this is insufficient. Specifically, the population of KeyDecls does not properly account for modules in the dependency graph that are never explicitly imported. We need to use a strategy already used in ASTWriterDecl.cpp, which is to CollectFirstDeclFromEachModule and emit those instead. For example, in building module G, I would've expected forEachImportedKeyDecls to visit F::xyz as well as C::xyz. But because C is transitively baked into module F without actually being imported anywhere, it gets pulled in through F, where F::xyz is considered a redecl of C::xyz. So in building G, forEachImportedKeyDecls visits C::xyz but not F::xyz. The conclusion is that CollectFirstDeclFromEachModule does the correct thing and visits both C::xyz and F::xyz.

I moved CollectFirstDeclFromEachModule from ASTWriterDecl.cpp to ASTWriter::CollectFirstDeclFromEachModule. It is now used in ASTWriter.cpp as well as in the two existing use cases in ASTWriterDecl.cpp.
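As a rough illustration of why walking the redeclaration chain finds both decls, here is a self-contained toy version of the collection step. ToyRedecl and the string-keyed std::map are assumptions made for this sketch; the real function is keyed by ModuleFile * in an llvm::MapVector, which preserves insertion order instead of sorting by name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy redeclaration (hypothetical stand-in for clang::Decl).
struct ToyRedecl {
  std::string Label;         // e.g. "C::xyz"
  std::string OwningModule;  // empty string models a local (non-imported) decl
  const ToyRedecl *Previous; // previous declaration in the chain, or nullptr
};

// Walk from the most recent redeclaration back to the first one. Later
// iterations overwrite earlier ones, so each module ends up mapped to its
// first declaration in the chain.
std::map<std::string, std::string>
collectFirstDeclFromEachModule(const ToyRedecl *MostRecent, bool IncludeLocal) {
  assert(MostRecent && "expected a non-empty redeclaration chain");
  std::map<std::string, std::string> Firsts;
  for (const ToyRedecl *R = MostRecent; R; R = R->Previous) {
    if (!R->OwningModule.empty())
      Firsts[R->OwningModule] = R->Label;
    else if (IncludeLocal)
      Firsts["<local>"] = R->Label;
  }
  return Firsts;
}

// The xyz chain as described for module G in the repro: F::xyz is a
// redeclaration of C::xyz.
std::map<std::string, std::string> firstsForRepro() {
  ToyRedecl CXyz{"C::xyz", "C", nullptr};
  ToyRedecl FXyz{"F::xyz", "F", &CXyz};
  return collectFirstDeclFromEachModule(&FXyz, /*IncludeLocal=*/false);
}
```

Because the walk keys on the owning module of every redeclaration rather than on which modules were imported, both C and F get an entry in this model.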

The Issue of Merging MultiOnDiskHashTable

This repro is extremely finicky, however. For example, if we remove -fmodule-file=E.pcm, changing this:

// RUN: %clang_cc1 -fmodule-name=G -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules -fmodule-file=E.pcm -fmodule-file=F.pcm G.cppmap -o G.pcm

to

// RUN: %clang_cc1 -fmodule-name=G -fno-cxx-modules -xc++ -emit-module \
// RUN:   -fmodules -fmodule-file=F.pcm G.cppmap -o G.pcm

the test passes.

Module E is literally empty, and it depends on B which is literally empty, and depends on A which has a dummy int x; declaration... None of this should have anything to do with namespace xyz... What's going on here?

The issue here is that MultiOnDiskHashTable will merge after a specified number of modules are added. In our case, this number is 4. So without module E, module F, which depends on D, which depends on C, is under that merging threshold. This makes it such that N-ary look-up is performed on each module. Once E is introduced, we're past the threshold of 4, so merging occurs. Once merging occurs, the merged table is output in the PCM along with a list of "overridden files", which removes the tables provided by the modules before any table look-up. It gets even more complicated though, because the entries from the merged table are output only if there is not already an entry in the generator:

        // Add all merged entries from Base to the generator.
        for (auto &KV : Merged->Data) {
          if (!Gen.contains(KV.first, Info))  // <-- ignored if there is already an entry!
            Gen.insert(KV.first, Info.ImportData(KV.second), Info);

In our case, we had an entry in the generator already, which makes the merged table drop its entry. All of this is rather complicated machinery... I'm mostly leaving this here as a note.
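To make the dropped-entry behavior concrete, here is a minimal sketch of the rule in the quoted loop, using plain std::map tables. ToyTable, emitWithMergedBase, and the sample entries are hypothetical and only model the "skip if the generator already has the key" check; they are not the MultiOnDiskHashTable implementation, and the sample data is not taken from an actual PCM.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy look-up table: name -> decl labels (hypothetical model).
using ToyTable = std::map<std::string, std::vector<std::string>>;

// Mirrors the quoted loop: a merged entry is imported only when the
// generator has no entry for that key yet. Otherwise it is ignored whole.
ToyTable emitWithMergedBase(ToyTable Generator, const ToyTable &Merged) {
  for (const auto &KV : Merged)
    if (Generator.find(KV.first) == Generator.end())
      Generator.insert(KV);
  return Generator;
}

// Illustrative data: the generator already has an "xyz" entry, so the
// merged "xyz" entry, including F::xyz, never reaches the output.
ToyTable droppedEntryExample() {
  ToyTable Generator;
  Generator["xyz"] = std::vector<std::string>{"C::xyz"};

  ToyTable Merged;
  Merged["xyz"] = std::vector<std::string>{"C::xyz", "F::xyz"};
  Merged["x"] = std::vector<std::string>{"A::x"};

  ToyTable Out = emitWithMergedBase(Generator, Merged);
  assert(Out.count("xyz") == 1 && "the generator's own entry is kept");
  return Out;
}
```

In this model the output keeps "x" from the merged table but retains only the generator's version of "xyz", which matches the failure mode described above.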

@mpark mpark requested a review from ChuanqiXu9 January 21, 2026 22:05
@llvmbot llvmbot added the labels clang (Clang issues not falling into any other category) and clang:modules (C++20 modules and Clang Header Modules) Jan 21, 2026
llvmbot commented Jan 21, 2026

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Michael Park (mpark)

Changes


Patch is 20.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/177255.diff

7 Files Affected:

  • (modified) clang/include/clang/Serialization/ASTWriter.h (+6-1)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+47-11)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+37-5)
  • (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+10-28)
  • (added) clang/test/Modules/pr171769.cpp (+63)
  • (modified) clang/unittests/Serialization/CMakeLists.txt (+1)
  • (added) clang/unittests/Serialization/NamespaceLookupTest.cpp (+247)
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index d3029373ed2f7..1c4025f7482d9 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -773,13 +773,18 @@ class ASTWriter : public ASTDeserializationListener,
   /// Is this a local declaration (that is, one that will be written to
   /// our AST file)? This is the case for declarations that are neither imported
   /// from another AST file nor predefined.
-  bool IsLocalDecl(const Decl *D) {
+  bool IsLocalDecl(const Decl *D) const {
     if (D->isFromASTFile())
       return false;
     auto I = DeclIDs.find(D);
     return (I == DeclIDs.end() || I->second >= clang::NUM_PREDEF_DECL_IDS);
   };
 
+  /// Collect the first declaration from each module file that provides a
+  /// declaration of D.
+  llvm::MapVector<serialization::ModuleFile *, const Decl *>
+  CollectFirstDeclFromEachModule(const Decl *D, bool IncludeLocal);
+
   void AddLookupOffsets(const LookupBlockOffsets &Offsets,
                         RecordDataImpl &Record);
 
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 17801b59963a0..f3902d57e3d1f 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -555,7 +555,25 @@ namespace {
 
 using MacroDefinitionsMap =
     llvm::StringMap<std::pair<StringRef, bool /*IsUndef*/>>;
-using DeclsMap = llvm::DenseMap<DeclarationName, SmallVector<NamedDecl *, 8>>;
+
+class DeclsSet {
+  SmallVector<NamedDecl *, 64> Decls;
+  llvm::SmallPtrSet<NamedDecl *, 8> Found;
+
+public:
+  operator ArrayRef<NamedDecl *>() const { return Decls; }
+
+  bool empty() const { return Decls.empty(); }
+
+  bool insert(NamedDecl *ND) {
+    auto [_, Inserted] = Found.insert(ND);
+    if (Inserted)
+      Decls.push_back(ND);
+    return Inserted;
+  }
+};
+
+using DeclsMap = llvm::DenseMap<DeclarationName, DeclsSet>;
 
 } // namespace
 
@@ -8729,14 +8747,23 @@ bool ASTReader::FindExternalVisibleDeclsByName(const DeclContext *DC,
     return false;
 
   // Load the list of declarations.
-  SmallVector<NamedDecl *, 64> Decls;
-  llvm::SmallPtrSet<NamedDecl *, 8> Found;
+  DeclsSet DS;
 
   auto Find = [&, this](auto &&Table, auto &&Key) {
     for (GlobalDeclID ID : Table.find(Key)) {
       NamedDecl *ND = cast<NamedDecl>(GetDecl(ID));
-      if (ND->getDeclName() == Name && Found.insert(ND).second)
-        Decls.push_back(ND);
+      if (ND->getDeclName() != Name)
+        continue;
+      // Special case for namespaces: There can be a lot of redeclarations of
+      // some namespaces, and we import a "key declaration" per imported module.
+      // Since all declarations of a namespace are essentially interchangeable,
+      // we can optimize namespace look-up by only storing the key declaration
+      // of the current TU, rather than storing N key declarations where N is
+      // the # of imported modules that declare that namespace.
+      // TODO: Try to generalize this optimization to other redeclarable decls.
+      if (isa<NamespaceDecl>(ND))
+        ND = cast<NamedDecl>(getKeyDeclaration(ND));
+      DS.insert(ND);
     }
   };
 
@@ -8771,8 +8798,8 @@ bool ASTReader::FindExternalVisibleDeclsByName(const DeclContext *DC,
     Find(It->second.Table, Name);
   }
 
-  SetExternalVisibleDeclsForName(DC, Name, Decls);
-  return !Decls.empty();
+  SetExternalVisibleDeclsForName(DC, Name, DS);
+  return !DS.empty();
 }
 
 void ASTReader::completeVisibleDeclsMap(const DeclContext *DC) {
@@ -8790,7 +8817,16 @@ void ASTReader::completeVisibleDeclsMap(const DeclContext *DC) {
 
     for (GlobalDeclID ID : It->second.Table.findAll()) {
       NamedDecl *ND = cast<NamedDecl>(GetDecl(ID));
-      Decls[ND->getDeclName()].push_back(ND);
+      // Special case for namespaces: There can be a lot of redeclarations of
+      // some namespaces, and we import a "key declaration" per imported module.
+      // Since all declarations of a namespace are essentially interchangeable,
+      // we can optimize namespace look-up by only storing the key declaration
+      // of the current TU, rather than storing N key declarations where N is
+      // the # of imported modules that declare that namespace.
+      // TODO: Try to generalize this optimization to other redeclarable decls.
+      if (isa<NamespaceDecl>(ND))
+        ND = cast<NamedDecl>(getKeyDeclaration(ND));
+      Decls[ND->getDeclName()].insert(ND);
     }
 
     // FIXME: Why a PCH test is failing if we remove the iterator after findAll?
@@ -8800,9 +8836,9 @@ void ASTReader::completeVisibleDeclsMap(const DeclContext *DC) {
   findAll(ModuleLocalLookups, NumModuleLocalVisibleDeclContexts);
   findAll(TULocalLookups, NumTULocalVisibleDeclContexts);
 
-  for (DeclsMap::iterator I = Decls.begin(), E = Decls.end(); I != E; ++I) {
-    SetExternalVisibleDeclsForName(DC, I->first, I->second);
-  }
+  for (auto &[Name, DS] : Decls)
+    SetExternalVisibleDeclsForName(DC, Name, DS);
+
   const_cast<DeclContext *>(DC)->setHasExternalVisibleStorage(false);
 }
 
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 2e9f19eb0f51d..07c93187e0a47 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -4399,20 +4399,20 @@ class ASTDeclContextNameLookupTrait
 
   template <typename Coll> data_type getData(const Coll &Decls) {
     unsigned Start = DeclIDs.size();
-    for (NamedDecl *D : Decls) {
+    auto AddDecl = [this](NamedDecl *D) {
       NamedDecl *DeclForLocalLookup =
           getDeclForLocalLookup(Writer.getLangOpts(), D);
 
       if (Writer.getDoneWritingDeclsAndTypes() &&
           !Writer.wasDeclEmitted(DeclForLocalLookup))
-        continue;
+        return;
 
       // Try to avoid writing internal decls to reduced BMI.
       // See comments in ASTWriter::WriteDeclContextLexicalBlock for details.
       if (Writer.isGeneratingReducedBMI() &&
           !DeclForLocalLookup->isFromExplicitGlobalModule() &&
           IsInternalDeclFromFileContext(DeclForLocalLookup))
-        continue;
+        return;
 
       auto ID = Writer.GetDeclRef(DeclForLocalLookup);
 
@@ -4426,7 +4426,7 @@ class ASTDeclContextNameLookupTrait
             ModuleLocalDeclsMap.insert({Key, DeclIDsTy{ID}});
           else
             Iter->second.push_back(ID);
-          continue;
+          return;
         }
         break;
       case LookupVisibility::TULocal: {
@@ -4435,7 +4435,7 @@ class ASTDeclContextNameLookupTrait
           TULocalDeclsMap.insert({D->getDeclName(), DeclIDsTy{ID}});
         else
           Iter->second.push_back(ID);
-        continue;
+        return;
       }
       case LookupVisibility::GenerallyVisibile:
         // Generally visible decls go into the general lookup table.
@@ -4443,6 +4443,25 @@ class ASTDeclContextNameLookupTrait
       }
 
       DeclIDs.push_back(ID);
+    };
+    ASTReader *Chain = Writer.getChain();
+    for (NamedDecl *D : Decls) {
+      if (Chain && isa<NamespaceDecl>(D) && D->isFromASTFile() &&
+          D == Chain->getKeyDeclaration(D)) {
+        // In ASTReader, we stored only the key declaration of a namespace decl
+        // for this TU rather than storing all of the key declarations from each
+        // imported module. If we have an external namespace decl, this is that
+        // key declaration and we need to re-expand it to write out all of the
+        // key declarations from each imported module again.
+        //
+        // See comment 'ASTReader::FindExternalVisibleDeclsByName' for details.
+        auto Firsts =
+            Writer.CollectFirstDeclFromEachModule(D, /*IncludeLocal=*/false);
+        for (const auto &[_, First] : Firsts)
+          AddDecl(cast<NamedDecl>(const_cast<Decl *>(First)));
+      } else {
+        AddDecl(D);
+      }
     }
     return std::make_pair(Start, DeclIDs.size());
   }
@@ -6916,6 +6935,19 @@ TypeID ASTWriter::GetOrCreateTypeID(ASTContext &Context, QualType T) {
   });
 }
 
+llvm::MapVector<ModuleFile *, const Decl *>
+ASTWriter::CollectFirstDeclFromEachModule(const Decl *D, bool IncludeLocal) {
+  llvm::MapVector<ModuleFile *, const Decl *> Firsts;
+  // FIXME: We can skip entries that we know are implied by others.
+  for (const Decl *R = D->getMostRecentDecl(); R; R = R->getPreviousDecl()) {
+    if (R->isFromASTFile())
+      Firsts[Chain->getOwningModuleFile(R)] = R;
+    else if (IncludeLocal)
+      Firsts[nullptr] = R;
+  }
+  return Firsts;
+}
+
 void ASTWriter::AddLookupOffsets(const LookupBlockOffsets &Offsets,
                                  RecordDataImpl &Record) {
   Record.push_back(Offsets.LexicalOffset);
diff --git a/clang/lib/Serialization/ASTWriterDecl.cpp b/clang/lib/Serialization/ASTWriterDecl.cpp
index df24a12271a16..7646d5d5efe00 100644
--- a/clang/lib/Serialization/ASTWriterDecl.cpp
+++ b/clang/lib/Serialization/ASTWriterDecl.cpp
@@ -194,30 +194,13 @@ namespace clang {
       Record.AddSourceLocation(typeParams->getRAngleLoc());
     }
 
-    /// Collect the first declaration from each module file that provides a
-    /// declaration of D.
-    void CollectFirstDeclFromEachModule(
-        const Decl *D, bool IncludeLocal,
-        llvm::MapVector<ModuleFile *, const Decl *> &Firsts) {
-
-      // FIXME: We can skip entries that we know are implied by others.
-      for (const Decl *R = D->getMostRecentDecl(); R; R = R->getPreviousDecl()) {
-        if (R->isFromASTFile())
-          Firsts[Writer.Chain->getOwningModuleFile(R)] = R;
-        else if (IncludeLocal)
-          Firsts[nullptr] = R;
-      }
-    }
-
     /// Add to the record the first declaration from each module file that
     /// provides a declaration of D. The intent is to provide a sufficient
     /// set such that reloading this set will load all current redeclarations.
     void AddFirstDeclFromEachModule(const Decl *D, bool IncludeLocal) {
-      llvm::MapVector<ModuleFile *, const Decl *> Firsts;
-      CollectFirstDeclFromEachModule(D, IncludeLocal, Firsts);
-
-      for (const auto &F : Firsts)
-        Record.AddDeclRef(F.second);
+      auto Firsts = Writer.CollectFirstDeclFromEachModule(D, IncludeLocal);
+      for (const auto &[_, First] : Firsts)
+        Record.AddDeclRef(First);
     }
 
     template <typename T> bool shouldSkipWritingSpecializations(T *Spec) {
@@ -272,18 +255,17 @@ namespace clang {
       assert((isa<ClassTemplateSpecializationDecl>(D) ||
               isa<VarTemplateSpecializationDecl>(D) || isa<FunctionDecl>(D)) &&
              "Must not be called with other decls");
-      llvm::MapVector<ModuleFile *, const Decl *> Firsts;
-      CollectFirstDeclFromEachModule(D, /*IncludeLocal*/ true, Firsts);
-
-      for (const auto &F : Firsts) {
-        if (shouldSkipWritingSpecializations(F.second))
+      auto Firsts =
+          Writer.CollectFirstDeclFromEachModule(D, /*IncludeLocal=*/true);
+      for (const auto &[_, First] : Firsts) {
+        if (shouldSkipWritingSpecializations(First))
           continue;
 
         if (isa<ClassTemplatePartialSpecializationDecl,
-                VarTemplatePartialSpecializationDecl>(F.second))
-          PartialSpecsInMap.push_back(F.second);
+                VarTemplatePartialSpecializationDecl>(First))
+          PartialSpecsInMap.push_back(First);
         else
-          SpecsInMap.push_back(F.second);
+          SpecsInMap.push_back(First);
       }
     }
 
diff --git a/clang/test/Modules/pr171769.cpp b/clang/test/Modules/pr171769.cpp
new file mode 100644
index 0000000000000..c28ce890b715b
--- /dev/null
+++ b/clang/test/Modules/pr171769.cpp
@@ -0,0 +1,63 @@
+// RUN: rm -rf %t
+// RUN: mkdir -p %t
+// RUN: split-file %s %t
+// RUN: cd %t
+//
+
+// RUN: %clang_cc1 -fmodule-name=A -fno-cxx-modules -xc++ -emit-module \
+// RUN:   -fmodules A.cppmap -o A.pcm
+// RUN: %clang_cc1 -fmodule-name=B -fno-cxx-modules -xc++ -emit-module \
+// RUN:   -fmodules -fmodule-file=A.pcm B.cppmap -o B.pcm
+// RUN: %clang_cc1 -fmodule-name=C -fno-cxx-modules -xc++ -emit-module \
+// RUN:   -fmodules C.cppmap -o C.pcm
+// RUN: %clang_cc1 -fmodule-name=D -fno-cxx-modules -xc++ -emit-module \
+// RUN:   -fmodules -fmodule-file=C.pcm D.cppmap -o D.pcm
+// RUN: %clang_cc1 -fmodule-name=E -fno-cxx-modules -xc++ -emit-module \
+// RUN:   -fmodules -fmodule-file=B.pcm E.cppmap -o E.pcm
+// RUN: %clang_cc1 -fmodule-name=F -fno-cxx-modules -xc++ -emit-module \
+// RUN:   -fmodules -fmodule-file=D.pcm F.cppmap -o F.pcm
+// RUN: %clang_cc1 -fmodule-name=G -fno-cxx-modules -xc++ -emit-module \
+// RUN:   -fmodules -fmodule-file=E.pcm -fmodule-file=F.pcm G.cppmap -o G.pcm
+// RUN: %clang_cc1 -fno-cxx-modules -fmodules -fmodule-file=G.pcm src.cpp \
+// RUN:   -o /dev/null
+
+//--- A.cppmap
+module "A" { header "A.h" }
+
+//--- A.h
+int x;
+
+//--- B.cppmap
+module "B" {}
+
+//--- C.cppmap
+module "C" { header "C.h" }
+
+//--- C.h
+namespace xyz {}
+
+//--- D.cppmap
+module "D" {}
+
+//--- E.cppmap
+module "E" {}
+
+//--- F.cppmap
+module "F" { header "F.h" }
+
+//--- F.h
+namespace xyz { inline void func() {} }
+
+//--- G.cppmap
+module "G" { header "G.h" }
+
+//--- G.h
+#include "F.h"
+namespace { void func2() { xyz::func(); } }
+
+//--- hdr.h
+#include "F.h"
+namespace xyz_ns = xyz;
+
+//--- src.cpp
+#include "hdr.h"
diff --git a/clang/unittests/Serialization/CMakeLists.txt b/clang/unittests/Serialization/CMakeLists.txt
index 6782e6b4d7330..a5cc1ed83af49 100644
--- a/clang/unittests/Serialization/CMakeLists.txt
+++ b/clang/unittests/Serialization/CMakeLists.txt
@@ -2,6 +2,7 @@ add_clang_unittest(SerializationTests
   ForceCheckFileInputTest.cpp
   InMemoryModuleCacheTest.cpp
   ModuleCacheTest.cpp
+  NamespaceLookupTest.cpp
   NoCommentsTest.cpp
   PreambleInNamedModulesTest.cpp
   LoadSpecLazilyTest.cpp
diff --git a/clang/unittests/Serialization/NamespaceLookupTest.cpp b/clang/unittests/Serialization/NamespaceLookupTest.cpp
new file mode 100644
index 0000000000000..eefa4be9fbee5
--- /dev/null
+++ b/clang/unittests/Serialization/NamespaceLookupTest.cpp
@@ -0,0 +1,247 @@
+//== unittests/Serialization/NamespaceLookupOptimizationTest.cpp =======//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Driver/CreateInvocationFromArgs.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendAction.h"
+#include "clang/Frontend/FrontendActions.h"
+#include "clang/Parse/ParseAST.h"
+#include "clang/Serialization/ASTReader.h"
+#include "clang/Tooling/Tooling.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+using namespace clang;
+using namespace clang::tooling;
+
+namespace {
+
+class NamespaceLookupTest : public ::testing::Test {
+  void SetUp() override {
+    ASSERT_FALSE(
+        sys::fs::createUniqueDirectory("namespace-lookup-test", TestDir));
+  }
+
+  void TearDown() override { sys::fs::remove_directories(TestDir); }
+
+public:
+  SmallString<256> TestDir;
+
+  void addFile(StringRef Path, StringRef Contents) {
+    ASSERT_FALSE(sys::path::is_absolute(Path));
+
+    SmallString<256> AbsPath(TestDir);
+    sys::path::append(AbsPath, Path);
+
+    ASSERT_FALSE(
+        sys::fs::create_directories(llvm::sys::path::parent_path(AbsPath)));
+
+    std::error_code EC;
+    llvm::raw_fd_ostream OS(AbsPath, EC);
+    ASSERT_FALSE(EC);
+    OS << Contents;
+  }
+
+  std::string GenerateModuleInterface(StringRef ModuleName,
+                                      StringRef Contents) {
+    std::string FileName = llvm::Twine(ModuleName + ".cppm").str();
+    addFile(FileName, Contents);
+
+    IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS =
+        llvm::vfs::createPhysicalFileSystem();
+    DiagnosticOptions DiagOpts;
+    IntrusiveRefCntPtr<DiagnosticsEngine> Diags =
+        CompilerInstance::createDiagnostics(*VFS, DiagOpts);
+    CreateInvocationOptions CIOpts;
+    CIOpts.Diags = Diags;
+    CIOpts.VFS = VFS;
+
+    std::string CacheBMIPath =
+        llvm::Twine(TestDir + "/" + ModuleName + ".pcm").str();
+    std::string PrebuiltModulePath =
+        "-fprebuilt-module-path=" + TestDir.str().str();
+    const char *Args[] = {"clang++",
+                          "-std=c++20",
+                          "--precompile",
+                          PrebuiltModulePath.c_str(),
+                          "-working-directory",
+                          TestDir.c_str(),
+                          "-I",
+                          TestDir.c_str(),
+                          FileName.c_str(),
+                          "-o",
+                          CacheBMIPath.c_str()};
+    std::shared_ptr<CompilerInvocation> Invocation =
+        createInvocation(Args, CIOpts);
+    EXPECT_TRUE(Invocation);
+
+    CompilerInstance Instance(std::move(Invocation));
+    Instance.setDiagnostics(Diags);
+    Instance.getFrontendOpts().OutputFile = CacheBMIPath;
+    // Avoid memory leaks.
+    Instance.getFrontendOpts().DisableFree = false;
+    GenerateModuleInterfaceAction Action;
+    EXPECT_TRUE(Instance.ExecuteAction(Action));
+    EXPECT_FALSE(Diags->hasErrorOccurred());
+
+    return CacheBMIPath;
+  }
+};
+
+struct NamespaceLookupResult {
+  int NumLocalNamespaces = 0;
+  int NumExternalNamespaces = 0;
+};
+
+class NamespaceLookupConsumer : public ASTConsumer {
+  NamespaceLookupResult &Result;
+
+public:
+  explicit NamespaceLookupConsumer(NamespaceLookupResult &Result)
+      : Result(Result) {}
+
+  void HandleTranslationUnit(ASTContext &Context) override {
+    TranslationUnitDecl *TU = Context.getTranslationUnitDecl();
+    ASSERT_TRUE(TU);
+    ASTReader *Chain = dyn_cast_or_null<ASTReader>(Context.getExternalSource());
+    ASSERT_TRUE(Chain);
+    for (const Decl *D :
+         TU->lookup(DeclarationName(&Context.Idents.get("N")))) {
+      if (!isa<NamespaceDecl>(D))
+        continue;
+      if (!D->isFromASTFile()) {
+        ++Result.NumLocalNamespaces;
+      } else {
+        ++Result.NumExternalNamespaces;
+        EXPECT_EQ(D, Chain->getKeyDeclaration(D));
+      }
+    }
+  }
+};
+
+class NamespaceLookupAction : public ASTFrontendAction {
+  NamespaceLookupResult &Result;
+
+public:
+  explicit NamespaceLookupAction(NamespaceLookupResult &Result)
+      : Result(Result) {}
+
+  std::unique_ptr<ASTConsumer>
+  CreateASTConsumer(CompilerInstance &CI, StringRef /*Unused*/) override {
+    return std::make_unique<NamespaceLookupConsumer>(Result);
+  }
+};
+
+TEST_F(NamespaceLookupTest, ExternalNamespacesOnly) {
+  GenerateModuleInterface("M1", R"cpp(
+export module M1;
+namespace N {}
+  )cpp");
+  GenerateModuleInterface("M2", R"cpp(
+export module M2;
+namespace N {}
+  )cpp");
+  GenerateModuleInterface("M3", R"cpp(
+export module M3;
+namespace N {}
+  )cpp");
+  const char *test_file_contents = R"cpp(
+import M1;
+import M2;
+import M3;
+  )cpp";
+  std::string DepArg = "-fprebuilt-module-path=" + TestDir.str().str();
+  NamespaceLookupResult Result;
+  EXPECT_TRUE(runToolOnCodeWithArgs(
+      std::make_unique<NamespaceLookupAction>(Result), test_file_contents,
+      {
+          "-std=c++20",
+          DepArg.c_str(),
+          "-I",
+          TestDir.c_str(),
+      },
+      "main.cpp"));
+
+  EXPECT_EQ(0, Result.NumLocalNamespaces);
+  EXPECT_EQ(1, Result.NumExternalNamespaces);
+}
+
+TEST_F(NamespaceLookupTest, ExternalReplacedByLocal) {
+  GenerateModuleInterface("M1", R"cpp(
+export module M1;
+namespace N {}
+  )cpp");
+  GenerateModuleInterface("M2", R"cpp(
+export module M2;
+namespace N {}
+  )cpp");
+  GenerateModuleInterface("M3", R"cpp(
+export module M3;
+namespace N {}
+  )cpp");
+  const char *test_file_contents = R"cpp(
+import M1;
+import M2;
+import M3;
+
+namespace N {}
+  )cpp";
+  std::string DepArg = "-fprebuilt-module-path=" + TestDir.str().str();
+  NamespaceLookupResult Result;
+  EXPECT_TRUE(runToolOnCodeWithArgs(
+      std::make_unique<NamespaceLookupAction>(Result), test_file_contents,
+      {
+          "-std=c++20",
+          DepArg.c_str(),
+          "-I",
+          TestDir.c_str(),
+      },
+      "main.cpp"));
+
+  EXPECT_EQ(1, Result.NumLocalNamespaces);
+  EXPECT_EQ(0, Result.NumExternalNamespaces);
+}
+
+TEST_F(NamespaceLookupTest, LocalAndExternalInt...
[truncated]
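
As a reading aid for the two complete tests above (ExternalNamespacesOnly and ExternalReplacedByLocal), here is a minimal sketch of the counts they assert. It is a toy model, not Clang's lookup implementation: `ToyNamespaceDecl`, `ToyCounts`, `toyLookup` and `countResults` are hypothetical names, and the rule "a local declaration hides the imported ones, otherwise a single imported declaration is returned" is an assumption inferred only from the expected values in those tests. Which imported declaration ends up being the key declaration is not stated in the diff, so the sketch arbitrarily picks the first one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a namespace declaration; this is not a Clang type.
struct ToyNamespaceDecl {
  std::string Owner; // e.g. "M1"; "main.cpp" for the importing file
  bool FromASTFile;  // plays the role of Decl::isFromASTFile()
};

struct ToyCounts {
  int NumLocal = 0;
  int NumExternal = 0;
};

// Assumed model: a local declaration of N hides the imported ones;
// otherwise one imported declaration stands in for all of them.
// Picking the first imported one is an arbitrary choice for this sketch.
std::vector<ToyNamespaceDecl>
toyLookup(const std::vector<ToyNamespaceDecl> &Redecls) {
  for (const ToyNamespaceDecl &D : Redecls)
    if (!D.FromASTFile)
      return {D};
  if (Redecls.empty())
    return {};
  return {Redecls.front()};
}

// Mirrors the counting loop in NamespaceLookupConsumer.
ToyCounts countResults(const std::vector<ToyNamespaceDecl> &Found) {
  ToyCounts Counts;
  for (const ToyNamespaceDecl &D : Found) {
    if (D.FromASTFile)
      ++Counts.NumExternal;
    else
      ++Counts.NumLocal;
  }
  return Counts;
}
```

Under these assumptions, three imported declarations of `N` give 0 local and 1 external result, and adding a declaration in the importing file gives 1 local and 0 external, matching the `EXPECT_EQ` lines in the two tests.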

@mpark mpark force-pushed the modules-perf-namespaces-2 branch from 3e484e7 to 28585e0 Compare January 21, 2026 22:09
@mpark mpark changed the title [C++20][Modules] Improve namespace look-up performance for modules. (Take 2) [C++20][Modules] Improve namespace look-up performance for modules. (Attempt #2) Jan 21, 2026
@rupprecht
Collaborator

I can't say anything about the implementation, but I patched this in, and everything that failed to build last time now builds with it. So correctness-wise, this should be fine.

I did see a longer compilation time on one file (151s -> 225s), but it's a pathologically large input (a generated file), so that may have been a fluke, e.g. the compile ran on a slow build machine.

Member

@ChuanqiXu9 ChuanqiXu9 left a comment

LGTM

@mpark mpark merged commit 688b01a into llvm:main Jan 22, 2026
11 checks passed
@mpark mpark deleted the modules-perf-namespaces-2 branch January 22, 2026 02:19
@llvm-ci

llvm-ci commented Jan 22, 2026

LLVM Buildbot has detected a new failure on builder bolt-aarch64-ubuntu-clang running on bolt-worker-aarch64 while building clang at step 5 "build-clang-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/128/builds/10261

Here is the relevant piece of the build log for reference:
Step 5 (build-clang-bolt) failure: build (failure)
...
53.835 [7/3/3373] Linking CXX static library lib/libclangExtractAPI.a
53.868 [7/2/3374] Linking CXX static library lib/libclangDriver.a
53.906 [6/2/3375] Linking CXX static library lib/libclangCrossTU.a
54.021 [5/2/3376] Linking CXX static library lib/libclangCodeGen.a
54.049 [5/1/3377] Linking CXX static library lib/libclangStaticAnalyzerCore.a
54.358 [4/1/3378] Linking CXX static library lib/libclangStaticAnalyzerCheckers.a
54.403 [3/1/3379] Linking CXX static library lib/libclangStaticAnalyzerFrontend.a
54.440 [2/1/3380] Linking CXX static library lib/libclangFrontendTool.a
55.288 [1/1/3381] Linking CXX executable bin/clang-23
201.856 [0/1/3382] Creating executable symlink bin/clang
FAILED: bin/clang 
/usr/bin/cmake -E cmake_symlink_executable bin/clang-23 bin/clang && cd /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/tools/clang/tools/driver && /usr/bin/cmake -E create_symlink clang-23 /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/./bin/clang++ && cd /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/tools/clang/tools/driver && /usr/bin/cmake -E create_symlink clang-23 /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/./bin/clang-cl && cd /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/tools/clang/tools/driver && /usr/bin/cmake -E create_symlink clang-23 /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/./bin/clang-cpp && cd /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/tools/clang/tools/driver && /usr/bin/python3 /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/llvm-project/clang/tools/driver/../../utils/perf-training/perf-helper.py bolt-optimize --method INSTRUMENT --input /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/bin/clang-23 --instrumented-output /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/./bin/clang-bolt.inst --fdata /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/tools/clang/tools/driver/../../utils/perf-training/prof.fdata --perf-training-binary-dir /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/tools/clang/tools/driver/../../utils/perf-training --readelf /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/bin/llvm-readobj --bolt /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/bin/llvm-bolt --lit /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/./bin/llvm-lit --merge-fdata /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/bin/merge-fdata
Running: /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/bin/llvm-bolt /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/bin/clang-23 -o /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/./bin/clang-bolt.inst -instrument --instrumentation-file-append-pid --instrumentation-file=/home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/tools/clang/tools/driver/../../utils/perf-training/prof.fdata
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: <unknown>
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x8e00000, offset 0x8e00000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: PointerAuthCFIAnalyzer ran on 2 functions. Ignored 0 functions (0.00%) because of CFI inconsistencies
BOLT-INFO: number of removed linker-inserted veneers: 0
BOLT-INFO: 0 out of 158198 functions in the binary (0.0%) have non-empty execution profile
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 62038
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 155954
BOLT-INSTRUMENTER: Number of function descriptors: 155940
BOLT-INSTRUMENTER: Number of branch counters: 2053000
BOLT-INSTRUMENTER: Number of ST leaf node counters: 1044633
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 3097633
BOLT-INSTRUMENTER: Total size of counters: 24781064 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 16371869 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 229109660 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file /home/buildbot/workspace/bolt-aarch64-ubuntu-clang/build/tools/clang/tools/driver/../../utils/perf-training/prof.fdata
BOLT-INFO: removed 20805 empty blocks
BOLT-INFO: merged 6 duplicate CFG edges
BOLT-INFO: Starting stub-insertion pass
BOLT-INFO: Inserted 1489 stubs in the hot area and 0 stubs in the cold area. Shared 188365 times, iterated 3 times.
BOLT-INFO: rewritten pac-ret DWARF info in 2 out of 158501 functions (0.00%).
BOLT-INFO: padding code to 0x13800000 to accommodate hot text
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0x165188d0
BOLT-INFO: clear procedure is 0x16517550
BOLT-INFO: patched build-id (flipped last bit)
BOLT-INFO: setting __bolt_runtime_start to 0x165188d0
BOLT-INFO: setting __bolt_runtime_fini to 0x16518964
BOLT-INFO: setting __hot_start to 0x9000000
BOLT-INFO: setting __hot_end to 0x13693d8c
BOLT-INFO: runtime library finalization was hooked via DT_FINI, set to 0x16518964
BOLT-INFO: runtime library initialization was hooked via ELF Header Entry Point, set to 0x165188d0

Harrish92 pushed a commit to Harrish92/llvm-project that referenced this pull request Jan 23, 2026
Harrish92 pushed a commit to Harrish92/llvm-project that referenced this pull request Jan 24, 2026

Labels

clang:modules (C++20 modules and Clang Header Modules)
clang (Clang issues not falling into any other category)

5 participants