Skip to content

[clang][modules] Stop uniquing implicit modules via FileEntry#185765

Open
jansvoboda11 wants to merge 11 commits intollvm:mainfrom
jansvoboda11:no-file-entry-module-cache-1
Open

[clang][modules] Stop uniquing implicit modules via FileEntry#185765
jansvoboda11 wants to merge 11 commits intollvm:mainfrom
jansvoboda11:no-file-entry-module-cache-1

Conversation

@jansvoboda11
Copy link
Contributor

This PR changes how ModuleManager deduplicates module files.

Previously, ModuleManager used FileEntry for assigning unique identity to module files. This works fine for explicitly-built modules because they don't change during the lifetime of a single Clang instance. For implicitly-built modules however, there are two issues:

  1. The FileEntry objects are deduplicated by FileManager based on the inode number. Some file systems reuse inode numbers of previously removed files. Because implicitly-built module files are rapidly removed and created, this deduplication breaks and compilations may fail spuriously when inode numbers are recycled during the lifetime of a single Clang instance.
  2. The first thing ModuleManager does when loading a module file is consulting the FileManager and checking the file size and modification time match the expectation of the importer. This is done even when such module file already lives in the InMemoryModuleCache. This introduces racy behavior into the mechanism that explicitly tries to solve race conditions, and may lead into spurious compilation failures.

This PR identifies implicitly-built module files by a pair of DirectoryEntry of the module cache path and the path suffix <context-hash>/<module-name>-<module-map-path-hash>.pcm. This gives us canonicalization of the user-provided module cache path without turning to FileEntry for the PCM file. The path suffix is Clang-generated and is already canonical.

Some tests needed to be updated because the module cache path directory was also used as an include directory. This PR relies on not caching the non-existence of the module cache directory in the FileManager. When other parts of Clang are trying to look up the same path and cache its non-existence, things break. This is probably very specific to some of our tests and not how users are setting up their compilations.

@jansvoboda11 jansvoboda11 force-pushed the no-file-entry-module-cache-1 branch from e71185b to e558b32 Compare March 11, 2026 22:37
@jansvoboda11 jansvoboda11 marked this pull request as ready for review March 11, 2026 22:39
@llvmbot llvmbot added clang-tools-extra clangd clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Mar 11, 2026
@llvmbot
Copy link
Member

llvmbot commented Mar 11, 2026

@llvm/pr-subscribers-clangd

@llvm/pr-subscribers-clang-tools-extra

Author: Jan Svoboda (jansvoboda11)

Changes

This PR changes how ModuleManager deduplicates module files.

Previously, ModuleManager used FileEntry for assigning unique identity to module files. This works fine for explicitly-built modules because they don't change during the lifetime of a single Clang instance. For implicitly-built modules however, there are two issues:

  1. The FileEntry objects are deduplicated by FileManager based on the inode number. Some file systems reuse inode numbers of previously removed files. Because implicitly-built module files are rapidly removed and created, this deduplication breaks and compilations may fail spuriously when inode numbers are recycled during the lifetime of a single Clang instance.
  2. The first thing ModuleManager does when loading a module file is consulting the FileManager and checking the file size and modification time match the expectation of the importer. This is done even when such module file already lives in the InMemoryModuleCache. This introduces racy behavior into the mechanism that explicitly tries to solve race conditions, and may lead into spurious compilation failures.

This PR identifies implicitly-built module files by a pair of DirectoryEntry of the module cache path and the path suffix &lt;context-hash&gt;/&lt;module-name&gt;-&lt;module-map-path-hash&gt;.pcm. This gives us canonicalization of the user-provided module cache path without turning to FileEntry for the PCM file. The path suffix is Clang-generated and is already canonical.

Some tests needed to be updated because the module cache path directory was also used as an include directory. This PR relies on not caching the non-existence of the module cache directory in the FileManager. When other parts of Clang are trying to look up the same path and cache its non-existence, things break. This is probably very specific to some of our tests and not how users are setting up their compilations.


Patch is 84.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/185765.diff

34 Files Affected:

  • (modified) clang-tools-extra/clangd/ModulesBuilder.cpp (+2-2)
  • (modified) clang/include/clang/Basic/Module.h (+114)
  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+2-6)
  • (modified) clang/include/clang/Lex/HeaderSearch.h (+27-18)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+1-1)
  • (modified) clang/include/clang/Serialization/ASTReader.h (+5-8)
  • (modified) clang/include/clang/Serialization/ModuleFile.h (+3)
  • (modified) clang/include/clang/Serialization/ModuleManager.h (+16-28)
  • (modified) clang/lib/Basic/Module.cpp (+17)
  • (modified) clang/lib/DependencyScanning/DependencyScannerImpl.cpp (+2-3)
  • (modified) clang/lib/Frontend/ASTUnit.cpp (+14-14)
  • (modified) clang/lib/Frontend/ChainedIncludesSource.cpp (+2-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+26-39)
  • (modified) clang/lib/Frontend/FrontendAction.cpp (+5-2)
  • (modified) clang/lib/Frontend/FrontendActions.cpp (+15-11)
  • (modified) clang/lib/Lex/HeaderSearch.cpp (+48-17)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+56-49)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+14-8)
  • (modified) clang/lib/Serialization/GlobalModuleIndex.cpp (+1)
  • (modified) clang/lib/Serialization/ModuleManager.cpp (+114-97)
  • (modified) clang/test/Modules/DebugInfoNamespace.cpp (+1-1)
  • (modified) clang/test/Modules/ExtDebugInfo.cpp (+1-1)
  • (modified) clang/test/Modules/ExtDebugInfo.m (+1-1)
  • (modified) clang/test/Modules/ModuleDebugInfo.cpp (+1-1)
  • (modified) clang/test/Modules/ModuleDebugInfo.m (+1-1)
  • (modified) clang/test/Modules/ModuleModuleDebugInfo.cpp (+1-1)
  • (modified) clang/test/Modules/cxx20-hu-04.cpp (+1-1)
  • (modified) clang/test/Modules/debug-info-moduleimport.m (+4-4)
  • (modified) clang/test/Modules/global_index.m (+5-5)
  • (modified) clang/test/Modules/load-after-failure.m (+2-2)
  • (modified) clang/test/Modules/module-debuginfo-compdir.m (+1-1)
  • (modified) clang/test/Modules/module-debuginfo-prefix.m (+1-1)
  • (modified) clang/test/Modules/relocatable-modules.cpp (+3-5)
  • (modified) clang/unittests/Serialization/ForceCheckFileInputTest.cpp (+3-3)
diff --git a/clang-tools-extra/clangd/ModulesBuilder.cpp b/clang-tools-extra/clangd/ModulesBuilder.cpp
index 524ec620c4076..b5a0127d48b7e 100644
--- a/clang-tools-extra/clangd/ModulesBuilder.cpp
+++ b/clang-tools-extra/clangd/ModulesBuilder.cpp
@@ -275,8 +275,8 @@ bool IsModuleFileUpToDate(PathRef ModuleFilePath,
   // listener.
   Reader.setListener(nullptr);
 
-  if (Reader.ReadAST(ModuleFilePath, serialization::MK_MainFile,
-                     SourceLocation(),
+  if (Reader.ReadAST(ModuleFileName::make_explicit(ModuleFilePath),
+                     serialization::MK_MainFile, SourceLocation(),
                      ASTReader::ARR_None) != ASTReader::Success)
     return false;
 
diff --git a/clang/include/clang/Basic/Module.h b/clang/include/clang/Basic/Module.h
index 69a1de6f79b35..c512249e5e945 100644
--- a/clang/include/clang/Basic/Module.h
+++ b/clang/include/clang/Basic/Module.h
@@ -54,6 +54,101 @@ class TargetInfo;
 /// Describes the name of a module.
 using ModuleId = SmallVector<std::pair<std::string, SourceLocation>, 2>;
 
+namespace serialization {
+class ModuleManager;
+} // namespace serialization
+
+/// Deduplication key for a loaded module file in \c ModuleManager.
+///
+/// For implicitly-built modules, this is the \c DirectoryEntry of the module
+/// cache and the module file name with the (optional) context hash.
+/// This enables using \c FileManager's inode-based canonicalization of the
+/// user-provided module cache path without hitting issues on file systems that
+/// recycle inodes for recompiled module files.
+///
+/// For explicitly-built modules, this is \c FileEntry.
+/// This uses \c FileManager's inode-based canonicalization of the user-provided
+/// module file path. Because input explicitly-built modules do not change
+/// during the lifetime of the compiler, inode recycling is not of concern here.
+class ModuleFileKey {
+  /// The FileManager entity used for deduplication.
+  const void *Ptr;
+  /// The path relative to the module cache path for implicit module file, empty
+  /// for other kinds of module files.
+  std::string PathSuffix;
+
+  friend class serialization::ModuleManager;
+  friend class ModuleFileName;
+  friend llvm::DenseMapInfo<ModuleFileKey>;
+
+  ModuleFileKey(const void *Ptr) : Ptr(Ptr) {}
+
+  ModuleFileKey(const FileEntry *ModuleFile) : Ptr(ModuleFile) {}
+
+  ModuleFileKey(const DirectoryEntry *ModuleCacheDir, StringRef PathSuffix)
+      : Ptr(ModuleCacheDir), PathSuffix(PathSuffix) {}
+
+public:
+  bool operator==(const ModuleFileKey &Other) const {
+    return Ptr == Other.Ptr && PathSuffix == Other.PathSuffix;
+  }
+
+  bool operator!=(const ModuleFileKey &Other) const {
+    return !operator==(Other);
+  }
+};
+
+/// Identifies a module file to be loaded.
+///
+/// For implicitly-built module files, the path is split into the module cache
+/// path and the module file name with the (optional) context hash. For all
+/// other types of module files, this is just the file system path.
+class ModuleFileName {
+  std::string Path;
+  std::optional<uint32_t> Separator;
+
+public:
+  /// Creates an empty module file name.
+  ModuleFileName() = default;
+
+  /// Creates a file name for an explicit module.
+  static ModuleFileName make_explicit(std::string Name) {
+    ModuleFileName File;
+    File.Path = std::move(Name);
+    return File;
+  }
+
+  /// Creates a file name for an explicit module.
+  static ModuleFileName make_explicit(StringRef Name) {
+    return make_explicit(Name.str());
+  }
+
+  /// Creates a file name for an implicit module.
+  static ModuleFileName make_implicit(std::string Name, uint32_t Separator) {
+    ModuleFileName File;
+    File.Path = std::move(Name);
+    File.Separator = Separator;
+    return File;
+  }
+
+  /// Creates a file name for an implicit module.
+  static ModuleFileName make_implicit(StringRef Name, uint32_t Separator) {
+    return make_implicit(Name.str(), Separator);
+  }
+
+  /// Returns the plain module file name.
+  StringRef str() const { return Path; }
+
+  /// Converts to StringRef representing the plain module file name.
+  operator StringRef() const { return Path; }
+
+  /// Checks whether the module file name is empty.
+  bool empty() const { return Path.empty(); }
+
+  /// Creates the deduplication key for use in \c ModuleManager.
+  std::optional<ModuleFileKey> makeKey(FileManager &FileMgr) const;
+};
+
 /// The signature of a module, which is a hash of the AST content.
 struct ASTFileSignature : std::array<uint8_t, 20> {
   using BaseT = std::array<uint8_t, 20>;
@@ -926,4 +1021,23 @@ class VisibleModuleSet {
 
 } // namespace clang
 
+template <> struct llvm::DenseMapInfo<clang::ModuleFileKey> {
+  static clang::ModuleFileKey getEmptyKey() {
+    return DenseMapInfo<const void *>::getEmptyKey();
+  }
+
+  static clang::ModuleFileKey getTombstoneKey() {
+    return DenseMapInfo<const void *>::getTombstoneKey();
+  }
+
+  static unsigned getHashValue(const clang::ModuleFileKey &Val) {
+    return hash_combine(Val.Ptr, Val.PathSuffix);
+  }
+
+  static bool isEqual(const clang::ModuleFileKey &LHS,
+                      const clang::ModuleFileKey &RHS) {
+    return LHS == RHS;
+  }
+};
+
 #endif // LLVM_CLANG_BASIC_MODULE_H
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 266e0826b38f4..0d684d5c7f9fe 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -17,6 +17,7 @@
 #include "clang/Frontend/PCHContainerOperations.h"
 #include "clang/Frontend/Utils.h"
 #include "clang/Lex/DependencyDirectivesScanner.h"
+#include "clang/Lex/HeaderSearch.h"
 #include "clang/Lex/HeaderSearchOptions.h"
 #include "clang/Lex/ModuleLoader.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -746,11 +747,6 @@ class CompilerInstance : public ModuleLoader {
     GetDependencyDirectives = std::move(Getter);
   }
 
-  std::string getSpecificModuleCachePath(StringRef ContextHash);
-  std::string getSpecificModuleCachePath() {
-    return getSpecificModuleCachePath(getInvocation().computeContextHash());
-  }
-
   /// Create the AST context.
   void createASTContext();
 
@@ -872,7 +868,7 @@ class CompilerInstance : public ModuleLoader {
 
   void createASTReader();
 
-  bool loadModuleFile(StringRef FileName,
+  bool loadModuleFile(ModuleFileName FileName,
                       serialization::ModuleFile *&LoadedModuleFile);
 
   /// Configuration object for making the result of \c cloneForModuleCompile()
diff --git a/clang/include/clang/Lex/HeaderSearch.h b/clang/include/clang/Lex/HeaderSearch.h
index d7000da682c6e..fd52a0ad5b007 100644
--- a/clang/include/clang/Lex/HeaderSearch.h
+++ b/clang/include/clang/Lex/HeaderSearch.h
@@ -282,6 +282,10 @@ class HeaderSearch {
   /// The specific module cache path containing ContextHash (unless suppressed).
   std::string SpecificModuleCachePath;
 
+  /// The length of the normalized module cache path at the start of \c
+  /// SpecificModuleCachePath.
+  size_t NormalizedModuleCachePathLen = 0;
+
   /// All of the preprocessor-specific data about files that are
   /// included, indexed by the FileEntry's UID.
   mutable std::vector<HeaderFileInfo> FileInfo;
@@ -467,20 +471,20 @@ class HeaderSearch {
     return {};
   }
 
-  /// Set the context hash to use for module cache paths.
-  void setContextHash(StringRef Hash) { ContextHash = std::string(Hash); }
+  /// Initialize the module cache path.
+  void initializeModuleCachePath(std::string ContextHash);
 
-  /// Set the module cache path with the context hash (unless suppressed).
-  void setSpecificModuleCachePath(StringRef Path) {
-    SpecificModuleCachePath = std::string(Path);
+  /// Retrieve the module cache path with the context hash (unless suppressed).
+  StringRef getSpecificModuleCachePath() const {
+    return SpecificModuleCachePath;
   }
 
   /// Retrieve the context hash.
   StringRef getContextHash() const { return ContextHash; }
 
-  /// Retrieve the module cache path with the context hash (unless suppressed).
-  StringRef getSpecificModuleCachePath() const {
-    return SpecificModuleCachePath;
+  /// Retrieve the normalized module cache path.
+  StringRef getNormalizedModuleCachePath() const {
+    return getSpecificModuleCachePath().substr(0, NormalizedModuleCachePathLen);
   }
 
   /// Forget everything we know about headers so far.
@@ -657,7 +661,7 @@ class HeaderSearch {
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getCachedModuleFileName(Module *Module);
+  ModuleFileName getCachedModuleFileName(Module *Module);
 
   /// Retrieve the name of the prebuilt module file that should be used
   /// to load a module with the given name.
@@ -669,8 +673,8 @@ class HeaderSearch {
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getPrebuiltModuleFileName(StringRef ModuleName,
-                                        bool FileMapOnly = false);
+  ModuleFileName getPrebuiltModuleFileName(StringRef ModuleName,
+                                           bool FileMapOnly = false);
 
   /// Retrieve the name of the prebuilt module file that should be used
   /// to load the given module.
@@ -679,7 +683,7 @@ class HeaderSearch {
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getPrebuiltImplicitModuleFileName(Module *Module);
+  ModuleFileName getPrebuiltImplicitModuleFileName(Module *Module);
 
   /// Retrieve the name of the (to-be-)cached module file that should
   /// be used to load a module with the given name.
@@ -691,8 +695,8 @@ class HeaderSearch {
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getCachedModuleFileName(StringRef ModuleName,
-                                      StringRef ModuleMapPath);
+  ModuleFileName getCachedModuleFileName(StringRef ModuleName,
+                                         StringRef ModuleMapPath);
 
   /// Lookup a module Search for a module with the given name.
   ///
@@ -808,13 +812,13 @@ class HeaderSearch {
   /// \param ModuleMapPath A path that when combined with \c ModuleName
   /// uniquely identifies this module. See Module::ModuleMap.
   ///
-  /// \param CachePath A path to the module cache.
+  /// \param NormalizedCachePath The normalized path to the module cache.
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getCachedModuleFileNameImpl(StringRef ModuleName,
-                                          StringRef ModuleMapPath,
-                                          StringRef CachePath);
+  ModuleFileName getCachedModuleFileNameImpl(StringRef ModuleName,
+                                             StringRef ModuleMapPath,
+                                             StringRef NormalizedCachePath);
 
   /// Retrieve a module with the given name, which may be part of the
   /// given framework.
@@ -1042,6 +1046,11 @@ void ApplyHeaderSearchOptions(HeaderSearch &HS,
 void normalizeModuleCachePath(FileManager &FileMgr, StringRef Path,
                               SmallVectorImpl<char> &NormalizedPath);
 
+std::string createSpecificModuleCachePath(FileManager &FileMgr,
+                                          StringRef ModuleCachePath,
+                                          bool DisableModuleHash,
+                                          std::string ContextHash);
+
 } // namespace clang
 
 #endif // LLVM_CLANG_LEX_HEADERSEARCH_H
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index 752e7fd288aa6..4e8fe1d32d42e 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -44,7 +44,7 @@ namespace serialization {
 /// Version 4 of AST files also requires that the version control branch and
 /// revision match exactly, since there is no backward compatibility of
 /// AST files at this time.
-const unsigned VERSION_MAJOR = 35;
+const unsigned VERSION_MAJOR = 37;
 
 /// AST file minor version number supported by this version of
 /// Clang.
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index f254459ce933d..e9706d0ea2f2b 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -185,8 +185,7 @@ class ASTReaderListener {
   /// otherwise.
   virtual bool ReadHeaderSearchOptions(const HeaderSearchOptions &HSOpts,
                                        StringRef ModuleFilename,
-                                       StringRef SpecificModuleCachePath,
-                                       bool Complain) {
+                                       StringRef ContextHash, bool Complain) {
     return false;
   }
 
@@ -304,8 +303,7 @@ class ChainedASTReaderListener : public ASTReaderListener {
                              bool Complain) override;
 
   bool ReadHeaderSearchOptions(const HeaderSearchOptions &HSOpts,
-                               StringRef ModuleFilename,
-                               StringRef SpecificModuleCachePath,
+                               StringRef ModuleFilename, StringRef ContextHash,
                                bool Complain) override;
   bool ReadPreprocessorOptions(const PreprocessorOptions &PPOpts,
                                StringRef ModuleFilename, bool ReadMacros,
@@ -349,8 +347,7 @@ class PCHValidator : public ASTReaderListener {
                                bool Complain,
                                std::string &SuggestedPredefines) override;
   bool ReadHeaderSearchOptions(const HeaderSearchOptions &HSOpts,
-                               StringRef ModuleFilename,
-                               StringRef SpecificModuleCachePath,
+                               StringRef ModuleFilename, StringRef ContextHash,
                                bool Complain) override;
   void ReadCounter(const serialization::ModuleFile &M, uint32_t Value) override;
 };
@@ -1552,7 +1549,7 @@ class ASTReader
         : Mod(Mod), ImportedBy(ImportedBy), ImportLoc(ImportLoc) {}
   };
 
-  ASTReadResult ReadASTCore(StringRef FileName, ModuleKind Type,
+  ASTReadResult ReadASTCore(ModuleFileName FileName, ModuleKind Type,
                             SourceLocation ImportLoc, ModuleFile *ImportedBy,
                             SmallVectorImpl<ImportedModule> &Loaded,
                             off_t ExpectedSize, time_t ExpectedModTime,
@@ -1888,7 +1885,7 @@ class ASTReader
   /// NewLoadedModuleFile would refer to the address of the new loaded top level
   /// module. The state of NewLoadedModuleFile is unspecified if the AST file
   /// isn't loaded successfully.
-  ASTReadResult ReadAST(StringRef FileName, ModuleKind Type,
+  ASTReadResult ReadAST(ModuleFileName FileName, ModuleKind Type,
                         SourceLocation ImportLoc,
                         unsigned ClientLoadCapabilities,
                         ModuleFile **NewLoadedModuleFile = nullptr);
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 519a74d920129..4915ad0634fcd 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -159,6 +159,9 @@ class ModuleFile {
   /// The file name of the module file.
   std::string FileName;
 
+  /// All keys ModuleManager used for the module file.
+  llvm::DenseSet<ModuleFileKey> Keys;
+
   /// The name of the module.
   std::string ModuleName;
 
diff --git a/clang/include/clang/Serialization/ModuleManager.h b/clang/include/clang/Serialization/ModuleManager.h
index 8ab70b6630f47..162856f2f14c0 100644
--- a/clang/include/clang/Serialization/ModuleManager.h
+++ b/clang/include/clang/Serialization/ModuleManager.h
@@ -16,6 +16,7 @@
 
 #include "clang/Basic/LLVM.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Lex/HeaderSearch.h"
 #include "clang/Serialization/ModuleFile.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/STLExtras.h"
@@ -56,8 +57,8 @@ class ModuleManager {
   // to implement short-circuiting logic when running DFS over the dependencies.
   SmallVector<ModuleFile *, 2> Roots;
 
-  /// All loaded modules, indexed by name.
-  llvm::DenseMap<const FileEntry *, ModuleFile *> Modules;
+  /// All loaded modules.
+  llvm::DenseMap<ModuleFileKey, ModuleFile *> Modules;
 
   /// FileManager that handles translating between filenames and
   /// FileEntry *.
@@ -172,14 +173,22 @@ class ModuleManager {
   ModuleFile &operator[](unsigned Index) const { return *Chain[Index]; }
 
   /// Returns the module associated with the given file name.
+  /// Soon to be removed.
   ModuleFile *lookupByFileName(StringRef FileName) const;
 
   /// Returns the module associated with the given module name.
   ModuleFile *lookupByModuleName(StringRef ModName) const;
 
   /// Returns the module associated with the given module file.
+  /// Soon to be removed.
   ModuleFile *lookup(const FileEntry *File) const;
 
+  /// Returns the module associated with the given module file name.
+  ModuleFile *lookupByFileName(ModuleFileName FileName) const;
+
+  /// Returns the module associated with the given module file key.
+  ModuleFile *lookup(ModuleFileKey Key) const;
+
   /// Returns the in-memory (virtual file) buffer with the given name
   std::unique_ptr<llvm::MemoryBuffer> lookupBuffer(StringRef Name);
 
@@ -237,14 +246,13 @@ class ModuleManager {
   ///
   /// \return A pointer to the module that corresponds to this file name,
   /// and a value indicating whether the module was loaded.
-  AddModuleResult addModule(StringRef FileName, ModuleKind Type,
-                            SourceLocation ImportLoc,
-                            ModuleFile *ImportedBy, unsigned Generation,
-                            off_t ExpectedSize, time_t ExpectedModTime,
+  AddModuleResult addModule(ModuleFileName FileName, ModuleKind Type,
+                            SourceLocation ImportLoc, ModuleFile *ImportedBy,
+                            unsigned Generation, off_t ExpectedSize,
+                            time_t ExpectedModTime,
                             ASTFileSignature ExpectedSignature,
                             ASTFileSignatureReader ReadSignature,
-                            ModuleFile *&Module,
-                            std::string &ErrorStr);
+                            ModuleFile *&Module, std::string &ErrorStr);
 
   /// Remove the modules starting from First (to the end).
   void removeModules(ModuleIterator First);
@@ -282,26 +290,6 @@ class ModuleManager {
   void visit(llvm::function_ref<bool(ModuleFile &M)> Visitor,
              llvm::SmallPtrSetImpl<ModuleFile *> *ModuleFilesHit = nullptr);
 
-  /// Attempt to resolve the given module file name to a file entry.
-  ///
-  /// \param FileName The name of the module file.
-  ///
-  /// \param ExpectedSize The size that the module file is expected to have.
-  /// If the actual size differs, the resolver should return \c true.
-  ///
-  /// \param ExpectedModTime The modification time that the module file is
-  /// expected to have. If the actual modification time differs, the resolver
-  /// should return \c true.
-  ///
-  /// \param File Will be set to the file if there is one, or null
-  /// otherwise.
-  ///
-  /// \returns True if a file exists but does not meet the size/
-  /// modification time criteria, false if the file is either available and
-  /// suitable, or is missing.
-  bool lookupModuleFile(StringRef FileName, off_t ExpectedSize,
-                        time_t ExpectedModTime, OptionalFileEntryRef &File);
-
   /// View the graphviz representation of the module graph.
   void viewGraph();
 
diff --git a/clang/lib/Basic/Module.cpp b/clang/lib/Basic/Module.cpp
index 97f742d292224..bfc29e32c3672 100644
--- a/clang/lib/Basic/Module.cpp
+++ b/clang/lib/Basic/Module.cpp
@@ -33,6 +33,23 @@
 
 using namespace clang;
 
+s...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Mar 11, 2026

@llvm/pr-subscribers-clang-modules

Author: Jan Svoboda (jansvoboda11)

Changes

This PR changes how ModuleManager deduplicates module files.

Previously, ModuleManager used FileEntry for assigning unique identity to module files. This works fine for explicitly-built modules because they don't change during the lifetime of a single Clang instance. For implicitly-built modules however, there are two issues:

  1. The FileEntry objects are deduplicated by FileManager based on the inode number. Some file systems reuse inode numbers of previously removed files. Because implicitly-built module files are rapidly removed and created, this deduplication breaks and compilations may fail spuriously when inode numbers are recycled during the lifetime of a single Clang instance.
  2. The first thing ModuleManager does when loading a module file is consulting the FileManager and checking the file size and modification time match the expectation of the importer. This is done even when such module file already lives in the InMemoryModuleCache. This introduces racy behavior into the mechanism that explicitly tries to solve race conditions, and may lead into spurious compilation failures.

This PR identifies implicitly-built module files by a pair of DirectoryEntry of the module cache path and the path suffix &lt;context-hash&gt;/&lt;module-name&gt;-&lt;module-map-path-hash&gt;.pcm. This gives us canonicalization of the user-provided module cache path without turning to FileEntry for the PCM file. The path suffix is Clang-generated and is already canonical.

Some tests needed to be updated because the module cache path directory was also used as an include directory. This PR relies on not caching the non-existence of the module cache directory in the FileManager. When other parts of Clang are trying to look up the same path and cache its non-existence, things break. This is probably very specific to some of our tests and not how users are setting up their compilations.


Patch is 84.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/185765.diff

34 Files Affected:

  • (modified) clang-tools-extra/clangd/ModulesBuilder.cpp (+2-2)
  • (modified) clang/include/clang/Basic/Module.h (+114)
  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+2-6)
  • (modified) clang/include/clang/Lex/HeaderSearch.h (+27-18)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+1-1)
  • (modified) clang/include/clang/Serialization/ASTReader.h (+5-8)
  • (modified) clang/include/clang/Serialization/ModuleFile.h (+3)
  • (modified) clang/include/clang/Serialization/ModuleManager.h (+16-28)
  • (modified) clang/lib/Basic/Module.cpp (+17)
  • (modified) clang/lib/DependencyScanning/DependencyScannerImpl.cpp (+2-3)
  • (modified) clang/lib/Frontend/ASTUnit.cpp (+14-14)
  • (modified) clang/lib/Frontend/ChainedIncludesSource.cpp (+2-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+26-39)
  • (modified) clang/lib/Frontend/FrontendAction.cpp (+5-2)
  • (modified) clang/lib/Frontend/FrontendActions.cpp (+15-11)
  • (modified) clang/lib/Lex/HeaderSearch.cpp (+48-17)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+56-49)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+14-8)
  • (modified) clang/lib/Serialization/GlobalModuleIndex.cpp (+1)
  • (modified) clang/lib/Serialization/ModuleManager.cpp (+114-97)
  • (modified) clang/test/Modules/DebugInfoNamespace.cpp (+1-1)
  • (modified) clang/test/Modules/ExtDebugInfo.cpp (+1-1)
  • (modified) clang/test/Modules/ExtDebugInfo.m (+1-1)
  • (modified) clang/test/Modules/ModuleDebugInfo.cpp (+1-1)
  • (modified) clang/test/Modules/ModuleDebugInfo.m (+1-1)
  • (modified) clang/test/Modules/ModuleModuleDebugInfo.cpp (+1-1)
  • (modified) clang/test/Modules/cxx20-hu-04.cpp (+1-1)
  • (modified) clang/test/Modules/debug-info-moduleimport.m (+4-4)
  • (modified) clang/test/Modules/global_index.m (+5-5)
  • (modified) clang/test/Modules/load-after-failure.m (+2-2)
  • (modified) clang/test/Modules/module-debuginfo-compdir.m (+1-1)
  • (modified) clang/test/Modules/module-debuginfo-prefix.m (+1-1)
  • (modified) clang/test/Modules/relocatable-modules.cpp (+3-5)
  • (modified) clang/unittests/Serialization/ForceCheckFileInputTest.cpp (+3-3)
diff --git a/clang-tools-extra/clangd/ModulesBuilder.cpp b/clang-tools-extra/clangd/ModulesBuilder.cpp
index 524ec620c4076..b5a0127d48b7e 100644
--- a/clang-tools-extra/clangd/ModulesBuilder.cpp
+++ b/clang-tools-extra/clangd/ModulesBuilder.cpp
@@ -275,8 +275,8 @@ bool IsModuleFileUpToDate(PathRef ModuleFilePath,
   // listener.
   Reader.setListener(nullptr);
 
-  if (Reader.ReadAST(ModuleFilePath, serialization::MK_MainFile,
-                     SourceLocation(),
+  if (Reader.ReadAST(ModuleFileName::make_explicit(ModuleFilePath),
+                     serialization::MK_MainFile, SourceLocation(),
                      ASTReader::ARR_None) != ASTReader::Success)
     return false;
 
diff --git a/clang/include/clang/Basic/Module.h b/clang/include/clang/Basic/Module.h
index 69a1de6f79b35..c512249e5e945 100644
--- a/clang/include/clang/Basic/Module.h
+++ b/clang/include/clang/Basic/Module.h
@@ -54,6 +54,101 @@ class TargetInfo;
 /// Describes the name of a module.
 using ModuleId = SmallVector<std::pair<std::string, SourceLocation>, 2>;
 
+namespace serialization {
+class ModuleManager;
+} // namespace serialization
+
+/// Deduplication key for a loaded module file in \c ModuleManager.
+///
+/// For implicitly-built modules, this is the \c DirectoryEntry of the module
+/// cache and the module file name with the (optional) context hash.
+/// This enables using \c FileManager's inode-based canonicalization of the
+/// user-provided module cache path without hitting issues on file systems that
+/// recycle inodes for recompiled module files.
+///
+/// For explicitly-built modules, this is \c FileEntry.
+/// This uses \c FileManager's inode-based canonicalization of the user-provided
+/// module file path. Because input explicitly-built modules do not change
+/// during the lifetime of the compiler, inode recycling is not of concern here.
+class ModuleFileKey {
+  /// The FileManager entity used for deduplication.
+  const void *Ptr;
+  /// The path relative to the module cache path for implicit module file, empty
+  /// for other kinds of module files.
+  std::string PathSuffix;
+
+  friend class serialization::ModuleManager;
+  friend class ModuleFileName;
+  friend llvm::DenseMapInfo<ModuleFileKey>;
+
+  ModuleFileKey(const void *Ptr) : Ptr(Ptr) {}
+
+  ModuleFileKey(const FileEntry *ModuleFile) : Ptr(ModuleFile) {}
+
+  ModuleFileKey(const DirectoryEntry *ModuleCacheDir, StringRef PathSuffix)
+      : Ptr(ModuleCacheDir), PathSuffix(PathSuffix) {}
+
+public:
+  bool operator==(const ModuleFileKey &Other) const {
+    return Ptr == Other.Ptr && PathSuffix == Other.PathSuffix;
+  }
+
+  bool operator!=(const ModuleFileKey &Other) const {
+    return !operator==(Other);
+  }
+};
+
+/// Identifies a module file to be loaded.
+///
+/// For implicitly-built module files, the path is split into the module cache
+/// path and the module file name with the (optional) context hash. For all
+/// other types of module files, this is just the file system path.
+class ModuleFileName {
+  std::string Path;
+  std::optional<uint32_t> Separator;
+
+public:
+  /// Creates an empty module file name.
+  ModuleFileName() = default;
+
+  /// Creates a file name for an explicit module.
+  static ModuleFileName make_explicit(std::string Name) {
+    ModuleFileName File;
+    File.Path = std::move(Name);
+    return File;
+  }
+
+  /// Creates a file name for an explicit module.
+  static ModuleFileName make_explicit(StringRef Name) {
+    return make_explicit(Name.str());
+  }
+
+  /// Creates a file name for an implicit module.
+  static ModuleFileName make_implicit(std::string Name, uint32_t Separator) {
+    ModuleFileName File;
+    File.Path = std::move(Name);
+    File.Separator = Separator;
+    return File;
+  }
+
+  /// Creates a file name for an implicit module.
+  static ModuleFileName make_implicit(StringRef Name, uint32_t Separator) {
+    return make_implicit(Name.str(), Separator);
+  }
+
+  /// Returns the plain module file name.
+  StringRef str() const { return Path; }
+
+  /// Converts to StringRef representing the plain module file name.
+  operator StringRef() const { return Path; }
+
+  /// Checks whether the module file name is empty.
+  bool empty() const { return Path.empty(); }
+
+  /// Creates the deduplication key for use in \c ModuleManager.
+  std::optional<ModuleFileKey> makeKey(FileManager &FileMgr) const;
+};
+
 /// The signature of a module, which is a hash of the AST content.
 struct ASTFileSignature : std::array<uint8_t, 20> {
   using BaseT = std::array<uint8_t, 20>;
@@ -926,4 +1021,23 @@ class VisibleModuleSet {
 
 } // namespace clang
 
+template <> struct llvm::DenseMapInfo<clang::ModuleFileKey> {
+  static clang::ModuleFileKey getEmptyKey() {
+    return DenseMapInfo<const void *>::getEmptyKey();
+  }
+
+  static clang::ModuleFileKey getTombstoneKey() {
+    return DenseMapInfo<const void *>::getTombstoneKey();
+  }
+
+  static unsigned getHashValue(const clang::ModuleFileKey &Val) {
+    return hash_combine(Val.Ptr, Val.PathSuffix);
+  }
+
+  static bool isEqual(const clang::ModuleFileKey &LHS,
+                      const clang::ModuleFileKey &RHS) {
+    return LHS == RHS;
+  }
+};
+
 #endif // LLVM_CLANG_BASIC_MODULE_H
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 266e0826b38f4..0d684d5c7f9fe 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -17,6 +17,7 @@
 #include "clang/Frontend/PCHContainerOperations.h"
 #include "clang/Frontend/Utils.h"
 #include "clang/Lex/DependencyDirectivesScanner.h"
+#include "clang/Lex/HeaderSearch.h"
 #include "clang/Lex/HeaderSearchOptions.h"
 #include "clang/Lex/ModuleLoader.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -746,11 +747,6 @@ class CompilerInstance : public ModuleLoader {
     GetDependencyDirectives = std::move(Getter);
   }
 
-  std::string getSpecificModuleCachePath(StringRef ContextHash);
-  std::string getSpecificModuleCachePath() {
-    return getSpecificModuleCachePath(getInvocation().computeContextHash());
-  }
-
   /// Create the AST context.
   void createASTContext();
 
@@ -872,7 +868,7 @@ class CompilerInstance : public ModuleLoader {
 
   void createASTReader();
 
-  bool loadModuleFile(StringRef FileName,
+  bool loadModuleFile(ModuleFileName FileName,
                       serialization::ModuleFile *&LoadedModuleFile);
 
   /// Configuration object for making the result of \c cloneForModuleCompile()
diff --git a/clang/include/clang/Lex/HeaderSearch.h b/clang/include/clang/Lex/HeaderSearch.h
index d7000da682c6e..fd52a0ad5b007 100644
--- a/clang/include/clang/Lex/HeaderSearch.h
+++ b/clang/include/clang/Lex/HeaderSearch.h
@@ -282,6 +282,10 @@ class HeaderSearch {
   /// The specific module cache path containing ContextHash (unless suppressed).
   std::string SpecificModuleCachePath;
 
+  /// The length of the normalized module cache path at the start of \c
+  /// SpecificModuleCachePath.
+  size_t NormalizedModuleCachePathLen = 0;
+
   /// All of the preprocessor-specific data about files that are
   /// included, indexed by the FileEntry's UID.
   mutable std::vector<HeaderFileInfo> FileInfo;
@@ -467,20 +471,20 @@ class HeaderSearch {
     return {};
   }
 
-  /// Set the context hash to use for module cache paths.
-  void setContextHash(StringRef Hash) { ContextHash = std::string(Hash); }
+  /// Initialize the module cache path.
+  void initializeModuleCachePath(std::string ContextHash);
 
-  /// Set the module cache path with the context hash (unless suppressed).
-  void setSpecificModuleCachePath(StringRef Path) {
-    SpecificModuleCachePath = std::string(Path);
+  /// Retrieve the module cache path with the context hash (unless suppressed).
+  StringRef getSpecificModuleCachePath() const {
+    return SpecificModuleCachePath;
   }
 
   /// Retrieve the context hash.
   StringRef getContextHash() const { return ContextHash; }
 
-  /// Retrieve the module cache path with the context hash (unless suppressed).
-  StringRef getSpecificModuleCachePath() const {
-    return SpecificModuleCachePath;
+  /// Retrieve the normalized module cache path.
+  StringRef getNormalizedModuleCachePath() const {
+    return getSpecificModuleCachePath().substr(0, NormalizedModuleCachePathLen);
   }
 
   /// Forget everything we know about headers so far.
@@ -657,7 +661,7 @@ class HeaderSearch {
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getCachedModuleFileName(Module *Module);
+  ModuleFileName getCachedModuleFileName(Module *Module);
 
   /// Retrieve the name of the prebuilt module file that should be used
   /// to load a module with the given name.
@@ -669,8 +673,8 @@ class HeaderSearch {
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getPrebuiltModuleFileName(StringRef ModuleName,
-                                        bool FileMapOnly = false);
+  ModuleFileName getPrebuiltModuleFileName(StringRef ModuleName,
+                                           bool FileMapOnly = false);
 
   /// Retrieve the name of the prebuilt module file that should be used
   /// to load the given module.
@@ -679,7 +683,7 @@ class HeaderSearch {
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getPrebuiltImplicitModuleFileName(Module *Module);
+  ModuleFileName getPrebuiltImplicitModuleFileName(Module *Module);
 
   /// Retrieve the name of the (to-be-)cached module file that should
   /// be used to load a module with the given name.
@@ -691,8 +695,8 @@ class HeaderSearch {
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getCachedModuleFileName(StringRef ModuleName,
-                                      StringRef ModuleMapPath);
+  ModuleFileName getCachedModuleFileName(StringRef ModuleName,
+                                         StringRef ModuleMapPath);
 
   /// Lookup a module Search for a module with the given name.
   ///
@@ -808,13 +812,13 @@ class HeaderSearch {
   /// \param ModuleMapPath A path that when combined with \c ModuleName
   /// uniquely identifies this module. See Module::ModuleMap.
   ///
-  /// \param CachePath A path to the module cache.
+  /// \param NormalizedCachePath The normalized path to the module cache.
   ///
   /// \returns The name of the module file that corresponds to this module,
   /// or an empty string if this module does not correspond to any module file.
-  std::string getCachedModuleFileNameImpl(StringRef ModuleName,
-                                          StringRef ModuleMapPath,
-                                          StringRef CachePath);
+  ModuleFileName getCachedModuleFileNameImpl(StringRef ModuleName,
+                                             StringRef ModuleMapPath,
+                                             StringRef NormalizedCachePath);
 
   /// Retrieve a module with the given name, which may be part of the
   /// given framework.
@@ -1042,6 +1046,11 @@ void ApplyHeaderSearchOptions(HeaderSearch &HS,
 void normalizeModuleCachePath(FileManager &FileMgr, StringRef Path,
                               SmallVectorImpl<char> &NormalizedPath);
 
+std::string createSpecificModuleCachePath(FileManager &FileMgr,
+                                          StringRef ModuleCachePath,
+                                          bool DisableModuleHash,
+                                          std::string ContextHash);
+
 } // namespace clang
 
 #endif // LLVM_CLANG_LEX_HEADERSEARCH_H
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index 752e7fd288aa6..4e8fe1d32d42e 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -44,7 +44,7 @@ namespace serialization {
 /// Version 4 of AST files also requires that the version control branch and
 /// revision match exactly, since there is no backward compatibility of
 /// AST files at this time.
-const unsigned VERSION_MAJOR = 35;
+const unsigned VERSION_MAJOR = 37;
 
 /// AST file minor version number supported by this version of
 /// Clang.
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index f254459ce933d..e9706d0ea2f2b 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -185,8 +185,7 @@ class ASTReaderListener {
   /// otherwise.
   virtual bool ReadHeaderSearchOptions(const HeaderSearchOptions &HSOpts,
                                        StringRef ModuleFilename,
-                                       StringRef SpecificModuleCachePath,
-                                       bool Complain) {
+                                       StringRef ContextHash, bool Complain) {
     return false;
   }
 
@@ -304,8 +303,7 @@ class ChainedASTReaderListener : public ASTReaderListener {
                              bool Complain) override;
 
   bool ReadHeaderSearchOptions(const HeaderSearchOptions &HSOpts,
-                               StringRef ModuleFilename,
-                               StringRef SpecificModuleCachePath,
+                               StringRef ModuleFilename, StringRef ContextHash,
                                bool Complain) override;
   bool ReadPreprocessorOptions(const PreprocessorOptions &PPOpts,
                                StringRef ModuleFilename, bool ReadMacros,
@@ -349,8 +347,7 @@ class PCHValidator : public ASTReaderListener {
                                bool Complain,
                                std::string &SuggestedPredefines) override;
   bool ReadHeaderSearchOptions(const HeaderSearchOptions &HSOpts,
-                               StringRef ModuleFilename,
-                               StringRef SpecificModuleCachePath,
+                               StringRef ModuleFilename, StringRef ContextHash,
                                bool Complain) override;
   void ReadCounter(const serialization::ModuleFile &M, uint32_t Value) override;
 };
@@ -1552,7 +1549,7 @@ class ASTReader
         : Mod(Mod), ImportedBy(ImportedBy), ImportLoc(ImportLoc) {}
   };
 
-  ASTReadResult ReadASTCore(StringRef FileName, ModuleKind Type,
+  ASTReadResult ReadASTCore(ModuleFileName FileName, ModuleKind Type,
                             SourceLocation ImportLoc, ModuleFile *ImportedBy,
                             SmallVectorImpl<ImportedModule> &Loaded,
                             off_t ExpectedSize, time_t ExpectedModTime,
@@ -1888,7 +1885,7 @@ class ASTReader
   /// NewLoadedModuleFile would refer to the address of the new loaded top level
   /// module. The state of NewLoadedModuleFile is unspecified if the AST file
   /// isn't loaded successfully.
-  ASTReadResult ReadAST(StringRef FileName, ModuleKind Type,
+  ASTReadResult ReadAST(ModuleFileName FileName, ModuleKind Type,
                         SourceLocation ImportLoc,
                         unsigned ClientLoadCapabilities,
                         ModuleFile **NewLoadedModuleFile = nullptr);
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 519a74d920129..4915ad0634fcd 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -159,6 +159,9 @@ class ModuleFile {
   /// The file name of the module file.
   std::string FileName;
 
+  /// All keys ModuleManager used for the module file.
+  llvm::DenseSet<ModuleFileKey> Keys;
+
   /// The name of the module.
   std::string ModuleName;
 
diff --git a/clang/include/clang/Serialization/ModuleManager.h b/clang/include/clang/Serialization/ModuleManager.h
index 8ab70b6630f47..162856f2f14c0 100644
--- a/clang/include/clang/Serialization/ModuleManager.h
+++ b/clang/include/clang/Serialization/ModuleManager.h
@@ -16,6 +16,7 @@
 
 #include "clang/Basic/LLVM.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Lex/HeaderSearch.h"
 #include "clang/Serialization/ModuleFile.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/STLExtras.h"
@@ -56,8 +57,8 @@ class ModuleManager {
   // to implement short-circuiting logic when running DFS over the dependencies.
   SmallVector<ModuleFile *, 2> Roots;
 
-  /// All loaded modules, indexed by name.
-  llvm::DenseMap<const FileEntry *, ModuleFile *> Modules;
+  /// All loaded modules.
+  llvm::DenseMap<ModuleFileKey, ModuleFile *> Modules;
 
   /// FileManager that handles translating between filenames and
   /// FileEntry *.
@@ -172,14 +173,22 @@ class ModuleManager {
   ModuleFile &operator[](unsigned Index) const { return *Chain[Index]; }
 
   /// Returns the module associated with the given file name.
+  /// Soon to be removed.
   ModuleFile *lookupByFileName(StringRef FileName) const;
 
   /// Returns the module associated with the given module name.
   ModuleFile *lookupByModuleName(StringRef ModName) const;
 
   /// Returns the module associated with the given module file.
+  /// Soon to be removed.
   ModuleFile *lookup(const FileEntry *File) const;
 
+  /// Returns the module associated with the given module file name.
+  ModuleFile *lookupByFileName(ModuleFileName FileName) const;
+
+  /// Returns the module associated with the given module file key.
+  ModuleFile *lookup(ModuleFileKey Key) const;
+
   /// Returns the in-memory (virtual file) buffer with the given name
   std::unique_ptr<llvm::MemoryBuffer> lookupBuffer(StringRef Name);
 
@@ -237,14 +246,13 @@ class ModuleManager {
   ///
   /// \return A pointer to the module that corresponds to this file name,
   /// and a value indicating whether the module was loaded.
-  AddModuleResult addModule(StringRef FileName, ModuleKind Type,
-                            SourceLocation ImportLoc,
-                            ModuleFile *ImportedBy, unsigned Generation,
-                            off_t ExpectedSize, time_t ExpectedModTime,
+  AddModuleResult addModule(ModuleFileName FileName, ModuleKind Type,
+                            SourceLocation ImportLoc, ModuleFile *ImportedBy,
+                            unsigned Generation, off_t ExpectedSize,
+                            time_t ExpectedModTime,
                             ASTFileSignature ExpectedSignature,
                             ASTFileSignatureReader ReadSignature,
-                            ModuleFile *&Module,
-                            std::string &ErrorStr);
+                            ModuleFile *&Module, std::string &ErrorStr);
 
   /// Remove the modules starting from First (to the end).
   void removeModules(ModuleIterator First);
@@ -282,26 +290,6 @@ class ModuleManager {
   void visit(llvm::function_ref<bool(ModuleFile &M)> Visitor,
              llvm::SmallPtrSetImpl<ModuleFile *> *ModuleFilesHit = nullptr);
 
-  /// Attempt to resolve the given module file name to a file entry.
-  ///
-  /// \param FileName The name of the module file.
-  ///
-  /// \param ExpectedSize The size that the module file is expected to have.
-  /// If the actual size differs, the resolver should return \c true.
-  ///
-  /// \param ExpectedModTime The modification time that the module file is
-  /// expected to have. If the actual modification time differs, the resolver
-  /// should return \c true.
-  ///
-  /// \param File Will be set to the file if there is one, or null
-  /// otherwise.
-  ///
-  /// \returns True if a file exists but does not meet the size/
-  /// modification time criteria, false if the file is either available and
-  /// suitable, or is missing.
-  bool lookupModuleFile(StringRef FileName, off_t ExpectedSize,
-                        time_t ExpectedModTime, OptionalFileEntryRef &File);
-
   /// View the graphviz representation of the module graph.
   void viewGraph();
 
diff --git a/clang/lib/Basic/Module.cpp b/clang/lib/Basic/Module.cpp
index 97f742d292224..bfc29e32c3672 100644
--- a/clang/lib/Basic/Module.cpp
+++ b/clang/lib/Basic/Module.cpp
@@ -33,6 +33,23 @@
 
 using namespace clang;
 
+s...
[truncated]

@jansvoboda11 jansvoboda11 force-pushed the no-file-entry-module-cache-1 branch from e558b32 to 2a23310 Compare March 12, 2026 21:06
@llvm llvm deleted a comment from github-actions bot Mar 12, 2026
@llvm llvm deleted a comment from github-actions bot Mar 12, 2026
Copy link
Contributor

@Bigcheese Bigcheese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this todo, but the change otherwise looks good.

if (ModuleEntry)
ModuleEntry->Keys.insert(*FileKey);

// TODO: Remove this.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this code for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time I thought there are multiple places that may look for the same file with inconsistent explicit/implicit path, so this seemed necessary to temporarily support both. But now that I look back, I might've been observing the fact that some tests used %t for both an include path and the module cache, so the DirectoryEntry got cached as non-existing. This made ModuleManager lookups fail with the implicit path, and turning it explicit and just looking up the full path in FileManager would succeed. I've removed this in 86ee991 and also removed the storage for multiple keys on ModuleFile in a109526, since that was there only to support this hack.

LMK if you're happy with how it looks now.

@jansvoboda11 jansvoboda11 requested a review from Bigcheese March 13, 2026 16:04
@jansvoboda11
Copy link
Contributor Author

I made couple of changes in 7b6c696..2323be8:

  1. ModuleFileName now stores the length of the "<context-hash>/<filename>.pcm" suffix instead of its offset from the start of the path. This enabled me to revert some changes to ASTWriter/ASTReader that made the import paths non-relocatable.
  2. ModuleFileName is now stored on ModuleFile, which simplified the serialization code from (from checking StringRef(M.FileName).starts_with(NormalizedModuleCache) to checking if M.FileName.getSuffixLength() is non-zero.
  3. Renamed ModuleFileName::make_{implicit,explicit}() to follow LLVM casing conventions.
  4. Added new test that checks that regardless of the spelling of the module cache, as long as it resolves to the same directory, the behavior stays consistent.

Can you PTAL?

@github-actions
Copy link

github-actions bot commented Mar 16, 2026

🪟 Windows x64 Test Results

  • 54904 tests passed
  • 2361 tests skipped

✅ The build succeeded and all tests passed.

// expected-remark{{importing module 'a' into 'b'}}

// Module cache via symlink.
// RUN: ln -s %t/cache %t/cache-symlink
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's this bit that's making it fail on Windows. Maybe split this part out and add a requires to skip it on Windows? I don't think we have a good way to make symlinks on Windows yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this command succeeds, but I'm not sure if it ends up creating a hardlink.

Copy link
Contributor

@Bigcheese Bigcheese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except the Windows issue. I'm not sure if that's a real issue with the code, or just with ln -s on Windows.

@jansvoboda11
Copy link
Contributor Author

Yeah, that's just ln -s not working on Windows. Rather than adding REQUIRES: symlinks to the new test and disabling it entirely on Windows, I removed the symlink case which I think is partially covered by Modules/module-symlink.m.

@jansvoboda11 jansvoboda11 requested a review from Bigcheese March 17, 2026 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang-tools-extra clangd

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants