[C++20][Modules] Implement P1857R3 Modules Dependency Discovery#107168

Merged
yronglin merged 63 commits intollvm:mainfrom
yronglin:modules_dependency_discovery
Dec 19, 2025

Conversation

@yronglin
Contributor

@yronglin yronglin commented Sep 4, 2024

This PR implements the following papers:
P1857R3 Modules Dependency Discovery.
P3034R1 Module Declarations Shouldn’t be Macros.
CWG2947.

At the start of phase 4, an import or module token is treated as starting a directive and is converted to its respective keyword iff:

  • After skipping horizontal whitespace, it is
    • at the start of a logical line, or
    • preceded by an export at the start of the logical line.
  • It is followed by an identifier pp token (before macro expansion), or
    • <, ", or : (but not ::) pp tokens for import, or
    • ; for module

Otherwise the token is treated as an identifier.
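
The introducer rule above can be sketched as a small classifier over the raw pp-tokens of one logical line. This is only an illustration of the rule as quoted in this description, not the code in the patch: `PPTok`, `Kind`, `Introducer`, and `classifyLine` are hypothetical names, and tokenization is assumed to have happened already.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified pp-token: a kind plus its spelling.
enum class Kind { Identifier, Less, StringLiteral, Colon, ColonColon, Semi, Other };

struct PPTok {
  Kind K;
  std::string Text;
};

enum class Introducer { None, Module, Import };

// Classify one logical line of raw pp-tokens (horizontal whitespace already
// skipped, no macro expansion performed) using the rule quoted above.
inline Introducer classifyLine(const std::vector<PPTok> &Line) {
  std::size_t I = 0;
  // An 'export' at the start of the logical line may precede the introducer.
  if (!Line.empty() && Line[0].K == Kind::Identifier && Line[0].Text == "export")
    I = 1;
  // We need both the candidate token and the token that follows it.
  if (I + 1 >= Line.size() || Line[I].K != Kind::Identifier)
    return Introducer::None;

  const Kind Next = Line[I + 1].K;
  if (Line[I].Text == "import") {
    // identifier, '<', '"', or ':' (but not '::').
    const bool Ok = Next == Kind::Identifier || Next == Kind::Less ||
                    Next == Kind::StringLiteral || Next == Kind::Colon;
    return Ok ? Introducer::Import : Introducer::None;
  }
  if (Line[I].Text == "module") {
    // identifier or ';', as listed in the description.
    const bool Ok = Next == Kind::Identifier || Next == Kind::Semi;
    return Ok ? Introducer::Module : Introducer::None;
  }
  return Introducer::None;
}

// Usage on two lines taken from the example later in this description.
inline void classifyLineExample() {
  // export module m;
  std::vector<PPTok> Decl = {{Kind::Identifier, "export"},
                             {Kind::Identifier, "module"},
                             {Kind::Identifier, "m"},
                             {Kind::Semi, ";"}};
  assert(classifyLine(Decl) == Introducer::Module);

  // EMPTY import foo;  ('import' is not at the start of the logical line)
  std::vector<PPTok> Macro = {{Kind::Identifier, "EMPTY"},
                              {Kind::Identifier, "import"},
                              {Kind::Identifier, "foo"},
                              {Kind::Semi, ";"}};
  assert(classifyLine(Macro) == Introducer::None);
}
```

Because the check looks only at unexpanded tokens, a macro that expands to nothing in front of `import` does not turn the line into a directive.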

Additionally:

  • The entire import or module directive (including the closing ;) must be on a single logical line, and a module directive must not come from an #include.
  • The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.
  • A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment).
  • Preprocessor conditionals shall not span a module declaration.
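
As a rough illustration of the last constraint, the check can be modeled as tracking conditional nesting depth and rejecting a module declaration seen while any conditional is open. This is a sketch under simplifying assumptions, not the patch's logic: `LineKind` and `conditionalSpansModuleDecl` are hypothetical names, and each logical line is assumed to be pre-classified.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pre-classification of each logical line in a file.
enum class LineKind { If, EndIf, ModuleDecl, Other };

// Returns true if some module declaration appears while a preprocessor
// conditional (#if/#ifdef/#ifndef ... #endif) is still open.
inline bool conditionalSpansModuleDecl(const std::vector<LineKind> &Lines) {
  std::size_t Depth = 0; // number of currently open conditionals
  for (LineKind L : Lines) {
    switch (L) {
    case LineKind::If:
      ++Depth;
      break;
    case LineKind::EndIf:
      if (Depth > 0)
        --Depth;
      break;
    case LineKind::ModuleDecl:
      if (Depth > 0)
        return true; // declaration sits inside an open conditional
      break;
    case LineKind::Other:
      break;
    }
  }
  return false;
}
```

With this model, `#if X` / `export module m;` / `#endif` is rejected, while a conditional that closes before the module declaration is accepted.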

After this patch, we handle C++ module-import and module-declaration as real pp-directives in the preprocessor. Additionally, we refactor module name lexing: the complex state machine is removed, and the full module name is read during module/import directive handling. We could introduce a tok::annot_module_name token in the future to avoid parsing the module name twice, in both the preprocessor and the parser, but that makes error recovery much more difficult (e.g. import a; import b; on the same line).

This patch also introduces two new keywords, __preprocessed_module and __preprocessed_import. These two keywords are generated in -E mode. This avoids confusion with the module and import keywords in preprocessed output:

export module m;
struct import {};
#define EMPTY
EMPTY import foo;
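
One way to read the paragraph above: in `-E` output, only tokens that the preprocessor actually recognized as directive introducers get the `__preprocessed_` spelling. Under that reading, the `import` in `struct import {};` and in `EMPTY import foo;` stays a plain identifier even after `EMPTY` disappears from the output. A minimal sketch of that choice follows. `PrintedTok` and `spellingForPreprocessedOutput` are hypothetical names, and this does not reproduce the code in PrintPreprocessedOutput.cpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical token summary: the spelling plus whether the preprocessor
// recognized the token as a module/import directive introducer.
struct PrintedTok {
  std::string Spelling;
  bool IsDirectiveIntroducer;
};

// Pick the spelling to emit in preprocessed (-E) output.
inline std::string spellingForPreprocessedOutput(const PrintedTok &T) {
  const bool IsModuleWord = T.Spelling == "module" || T.Spelling == "import";
  if (T.IsDirectiveIntroducer && IsModuleWord)
    return "__preprocessed_" + T.Spelling; // unambiguous when re-lexed
  return T.Spelling;                       // ordinary identifier, unchanged
}
```

The point of the distinct spelling is that the output can be fed back to the compiler without an identifier being mistaken for a directive at the start of a line.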

Fixes #54047

@github-actions

github-actions bot commented Sep 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

yronglin added a commit that referenced this pull request Apr 16, 2025
…tructures to `IdentifierLoc` (#135808)

I found this issue while working on
#107168.

Currently we have many similar data structures, such as:
 - `std::pair<IdentifierInfo *, SourceLocation>`.
 - Element type of `ModuleIdPath`.
 - `IdentifierLocPair`.
 - `IdentifierLoc`.

This PR unifies these data structures into `IdentifierLoc`, moves the
`IdentifierLoc` definition to SourceLocation.h, and deletes the other
similar data structures.
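
To illustrate the motivation, here is a minimal sketch of a single named pair type replacing ad-hoc `std::pair` uses. It is not Clang's definition: `IdentInfo`, `SrcLoc`, `IdentLoc`, and `joinPath` are hypothetical stand-ins, and only the accessor names `getLoc()` and `getIdentifierInfo()` are taken from the diff in #107168.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct IdentInfo { std::string Name; };  // stand-in for clang::IdentifierInfo
struct SrcLoc { unsigned Offset = 0; };  // stand-in for clang::SourceLocation

// One named type instead of several pair-like spellings: call sites read
// getLoc()/getIdentifierInfo() rather than .first/.second.
class IdentLoc {
  SrcLoc Loc;
  IdentInfo *II = nullptr;

public:
  IdentLoc() = default;
  IdentLoc(SrcLoc L, IdentInfo *I) : Loc(L), II(I) {}

  SrcLoc getLoc() const { return Loc; }
  IdentInfo *getIdentifierInfo() const { return II; }
};

// Join a dotted module path such as {a, b, c} into "a.b.c".
inline std::string joinPath(const std::vector<IdentLoc> &Path) {
  std::string Out;
  for (std::size_t I = 0; I < Path.size(); ++I) {
    assert(Path[I].getIdentifierInfo() && "path component without identifier");
    if (I != 0)
      Out += '.';
    Out += Path[I].getIdentifierInfo()->Name;
  }
  return Out;
}
```

A module path then becomes a plain sequence of `IdentLoc` values, which is the shape the `ModuleIdPath` element type takes after the unification.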

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
yronglin added a commit that referenced this pull request Apr 17, 2025
… data structures to `IdentifierLoc` (#136077)

This PR relands #135808 and fixes
some missed changes in LLDB.
I found this issue while working on
#107168.

Currently we have many similar data structures, such as:
- std::pair<IdentifierInfo *, SourceLocation>.
- Element type of ModuleIdPath.
- IdentifierLocPair.
- IdentifierLoc.

This PR unifies these data structures into IdentifierLoc, moves the
IdentifierLoc definition to SourceLocation.h, and deletes the other similar
data structures.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
@yronglin yronglin force-pushed the modules_dependency_discovery branch from dbc377e to d300a2b Compare May 31, 2025 07:36
@yronglin yronglin marked this pull request as ready for review May 31, 2025 07:43
@llvmbot llvmbot added the clang, clang:frontend, and clang:modules labels May 31, 2025
@llvmbot
Member

llvmbot commented May 31, 2025

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang-modules

Author: None (yronglin)

Changes

Implement P1857R3 Modules Dependency Discovery.

  • Handle C++ module and import directives like other preprocessor directives.

At the start of phase 4, an import or module token is treated as starting a directive and is converted to its respective keyword iff:

  • After skipping horizontal whitespace, it is

    • ✅at the start of a logical line, or

    • ✅preceded by an export at the start of the logical line.

  • It is followed by an identifier pp token (before macro expansion), or

    • ✅<, ", or : (but not ::) pp tokens for import, or

    • ✅; for module

Otherwise the token is treated as an identifier.

Additionally:

  • ✅The entire import or module directive (including the closing ;) must be on a single logical line, and a module directive must not come from an #include.

  • ✅The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.

  • ❌**[TODO]** A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment).

  • ✅Preprocessor conditionals shall not span a module declaration.

More tests need to be added.


Patch is 134.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107168.diff

43 Files Affected:

  • (modified) clang/examples/AnnotateFunctions/AnnotateFunctions.cpp (+1-1)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+15-1)
  • (modified) clang/include/clang/Basic/DiagnosticParseKinds.td (+2-4)
  • (modified) clang/include/clang/Basic/IdentifierTable.h (+22-4)
  • (modified) clang/include/clang/Basic/TokenKinds.def (+6)
  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+1-1)
  • (modified) clang/include/clang/Lex/CodeCompletionHandler.h (+8)
  • (modified) clang/include/clang/Lex/Lexer.h (+5-5)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+95-26)
  • (modified) clang/include/clang/Lex/Token.h (+7)
  • (modified) clang/include/clang/Lex/TokenLexer.h (+3-4)
  • (modified) clang/include/clang/Parse/Parser.h (+2)
  • (modified) clang/include/clang/Sema/Sema.h (+4-2)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+3-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+7-3)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+8-1)
  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+20-8)
  • (modified) clang/lib/Lex/Lexer.cpp (+46-18)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+264-5)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+15-17)
  • (modified) clang/lib/Lex/Preprocessor.cpp (+171-188)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+5-3)
  • (modified) clang/lib/Lex/TokenLexer.cpp (+7-6)
  • (modified) clang/lib/Parse/Parser.cpp (+30-63)
  • (modified) clang/lib/Sema/SemaModule.cpp (+30-45)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+1-1)
  • (modified) clang/test/CXX/basic/basic.link/p1.cpp (+109-36)
  • (modified) clang/test/CXX/basic/basic.link/p3.cpp (+40-27)
  • (modified) clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp (+56-26)
  • (modified) clang/test/CXX/lex/lex.pptoken/p3-2a.cpp (+10-5)
  • (modified) clang/test/CXX/module/basic/basic.def.odr/p6.cppm (+134-40)
  • (modified) clang/test/CXX/module/basic/basic.link/module-declaration.cpp (+35-29)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm (+27-11)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.interface/p1.cppm (+18-21)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p1.cpp (+30-14)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p5.cpp (+48-17)
  • (modified) clang/test/CXX/module/module.interface/p1.cpp (+24-18)
  • (modified) clang/test/CXX/module/module.interface/p2.cpp (+12-14)
  • (modified) clang/test/CXX/module/module.unit/p8.cpp (+28-20)
  • (modified) clang/test/Modules/pr121066.cpp (+1-2)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp (+1-1)
  • (modified) clang/unittests/Lex/DependencyDirectivesScannerTest.cpp (+5-5)
  • (modified) clang/unittests/Lex/ModuleDeclStateTest.cpp (+1-1)
diff --git a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
index d872020c2d8a3..22a3eb97f938b 100644
--- a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
+++ b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
@@ -65,7 +65,7 @@ class PragmaAnnotateHandler : public PragmaHandler {
     Token Tok;
     PP.LexUnexpandedToken(Tok);
     if (Tok.isNot(tok::eod))
-      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "pragma";
+      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "#pragma";
 
     if (HandledDecl) {
       DiagnosticsEngine &D = PP.getDiagnostics();
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 723f5d48b4f5f..f975a63b369b5 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -466,6 +466,8 @@ def err_pp_embed_device_file : Error<
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
+def ext_pp_extra_tokens_at_module_directive_eol : ExtWarn<
+  "extra tokens at end of '%0' directive">, InGroup<ExtraTokens>;
 
 def ext_pp_comma_expr : Extension<"comma operator in operand of #if">;
 def ext_pp_bad_vaargs_use : Extension<
@@ -496,7 +498,7 @@ def warn_cxx98_compat_variadic_macro : Warning<
 def ext_named_variadic_macro : Extension<
   "named variadic macros are a GNU extension">, InGroup<VariadicMacros>;
 def err_embedded_directive : Error<
-  "embedding a #%0 directive within macro arguments is not supported">;
+  "embedding a %select{#|C++ }0%1 directive within macro arguments is not supported">;
 def ext_embedded_directive : Extension<
   "embedding a directive within macro arguments has undefined behavior">,
   InGroup<DiagGroup<"embedded-directive">>;
@@ -983,6 +985,18 @@ def warn_module_conflict : Warning<
   InGroup<ModuleConflict>;
 
 // C++20 modules
+def err_pp_expected_module_name_or_header_name : Error<
+  "expected module name or header name">;
+def err_pp_expected_semi_after_module_or_import : Error<
+  "'%select{module|import}0' directive must end with a ';' on the same line">;
+def err_module_decl_in_header : Error<
+  "module declaration must not come from an #include directive">;
+def err_pp_cond_span_module_decl : Error<
+  "preprocessor conditionals shall not span a module declaration">;
+def err_pp_module_expected_ident : Error<
+  "expected a module name after '%select{module|import}0'">;
+def err_pp_unsupported_module_partition : Error<
+  "module partitions are only supported for C++20 onwards">;
 def err_header_import_semi_in_macro : Error<
   "semicolon terminating header import declaration cannot be produced "
   "by a macro">;
diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 3aa36ad59d0b9..c06e2f090b429 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1760,8 +1760,8 @@ def ext_bit_int : Extension<
 } // end of Parse Issue category.
 
 let CategoryName = "Modules Issue" in {
-def err_unexpected_module_decl : Error<
-  "module declaration can only appear at the top level">;
+def err_unexpected_module_import_decl : Error<
+  "%select{module|import}0 declaration can only appear at the top level">;
 def err_module_expected_ident : Error<
   "expected a module name after '%select{module|import}0'">;
 def err_attribute_not_module_attr : Error<
@@ -1782,8 +1782,6 @@ def err_module_fragment_exported : Error<
 def err_private_module_fragment_expected_semi : Error<
   "expected ';' after private module fragment declaration">;
 def err_missing_before_module_end : Error<"expected %0 at end of module">;
-def err_unsupported_module_partition : Error<
-  "module partitions are only supported for C++20 onwards">;
 def err_import_not_allowed_here : Error<
   "imports must immediately follow the module declaration">;
 def err_partition_import_outside_module : Error<
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 54540193cfcc0..add6c6ac629a1 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -179,6 +179,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsModulesImport : 1;
 
+  // True if this is the 'module' contextual keyword.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsModulesDecl : 1;
+
   // True if this is a mangled OpenMP variant name.
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsMangledOpenMPVariantName : 1;
@@ -215,8 +219,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
         IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false),
         IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false),
         RevertedTokenID(false), OutOfDate(false), IsModulesImport(false),
-        IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false),
-        IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {}
+        IsModulesDecl(false), IsMangledOpenMPVariantName(false),
+        IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false),
+        IsKeywordInCpp(false) {}
 
 public:
   IdentifierInfo(const IdentifierInfo &) = delete;
@@ -528,6 +533,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
       RecomputeNeedsHandleIdentifier();
   }
 
+  /// Determine whether this is the contextual keyword \c module.
+  bool isModulesDeclaration() const { return IsModulesDecl; }
+
+  /// Set whether this identifier is the contextual keyword \c module.
+  void setModulesDeclaration(bool I) {
+    IsModulesDecl = I;
+    if (I)
+      NeedsHandleIdentifier = true;
+    else
+      RecomputeNeedsHandleIdentifier();
+  }
+
   /// Determine whether this is the mangled name of an OpenMP variant.
   bool isMangledOpenMPVariantName() const { return IsMangledOpenMPVariantName; }
 
@@ -745,10 +762,11 @@ class IdentifierTable {
     // contents.
     II->Entry = &Entry;
 
-    // If this is the 'import' contextual keyword, mark it as such.
+    // If this is the 'import' or 'module' contextual keyword, mark it as such.
     if (Name == "import")
       II->setModulesImport(true);
-
+    else if (Name == "module")
+      II->setModulesDeclaration(true);
     return *II;
   }
 
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 94e72fea56a68..7750c84dbef78 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -133,6 +133,9 @@ PPKEYWORD(pragma)
 // C23 & C++26 #embed
 PPKEYWORD(embed)
 
+// C++20 Module Directive
+PPKEYWORD(module)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -1023,6 +1026,9 @@ ANNOTATION(module_include)
 ANNOTATION(module_begin)
 ANNOTATION(module_end)
 
+// Annotations for C++, Clang and Objective-C named modules.
+ANNOTATION(module_name)
+
 // Annotation for a header_name token that has been looked up and transformed
 // into the name of a header unit.
 ANNOTATION(header_unit)
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 0ae490f0e8073..112d3b00160fd 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -863,7 +863,7 @@ class CompilerInstance : public ModuleLoader {
   /// load it.
   ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName,
                                                  SourceLocation ImportLoc,
-                                                 SourceLocation ModuleNameLoc,
+                                                 SourceRange ModuleNameRange,
                                                  bool IsInclusionDirective);
 
   /// Creates a \c CompilerInstance for compiling a module.
diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h
index bd3e05a36bb33..2ef29743415ae 100644
--- a/clang/include/clang/Lex/CodeCompletionHandler.h
+++ b/clang/include/clang/Lex/CodeCompletionHandler.h
@@ -13,12 +13,15 @@
 #ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 #define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/SourceLocation.h"
 #include "llvm/ADT/StringRef.h"
 
 namespace clang {
 
 class IdentifierInfo;
 class MacroInfo;
+using ModuleIdPath = ArrayRef<IdentifierLoc>;
 
 /// Callback handler that receives notifications when performing code
 /// completion within the preprocessor.
@@ -70,6 +73,11 @@ class CodeCompletionHandler {
   /// file where we expect natural language, e.g., a comment, string, or
   /// \#error directive.
   virtual void CodeCompleteNaturalLanguage() { }
+
+  /// Callback invoked when performing code completion inside the module name
+  /// part of an import directive.
+  virtual void CodeCompleteModuleImport(SourceLocation ImportLoc,
+                                        ModuleIdPath Path) {}
 };
 
 }
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index bb65ae010cffa..a595cda1eaa77 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer {
   //===--------------------------------------------------------------------===//
   // Context that changes as the file is lexed.
   // NOTE: any state that mutates when in raw mode must have save/restore code
-  // in Lexer::isNextPPTokenLParen.
+  // in Lexer::peekNextPPToken.
 
   // BufferPtr - Current pointer into the buffer.  This is the next character
   // to be lexed.
@@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer {
     BufferPtr = TokEnd;
   }
 
-  /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a
-  /// tok::l_paren token, 0 if it is something else and 2 if there are no more
-  /// tokens in the buffer controlled by this lexer.
-  unsigned isNextPPTokenLParen();
+  /// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+  /// buffer controlled by this lexer, otherwise return the next unexpanded
+  /// token.
+  std::optional<Token> peekNextPPToken();
 
   //===--------------------------------------------------------------------===//
   // Lexer character reading interfaces.
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index f2dfd3a349b8b..79a75a116c418 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -48,6 +48,7 @@
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Registry.h"
+#include "llvm/Support/TrailingObjects.h"
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
@@ -82,6 +83,7 @@ class PreprocessorLexer;
 class PreprocessorOptions;
 class ScratchBuffer;
 class TargetInfo;
+class ModuleNameLoc;
 
 namespace Builtin {
 class Context;
@@ -332,8 +334,9 @@ class Preprocessor {
   /// lexed, if any.
   SourceLocation ModuleImportLoc;
 
-  /// The import path for named module that we're currently processing.
-  SmallVector<IdentifierLoc, 2> NamedModuleImportPath;
+  /// The source location of the \c module contextual keyword we just
+  /// lexed, if any.
+  SourceLocation ModuleDeclLoc;
 
   llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints;
   unsigned CheckPointCounter = 0;
@@ -344,6 +347,21 @@ class Preprocessor {
   /// Whether the last token we lexed was an '@'.
   bool LastTokenWasAt = false;
 
+  /// Whether we're importing a standard C++20 named Modules.
+  bool ImportingCXXNamedModules = false;
+
+  /// Whether we're declaring a standard C++20 named Modules.
+  bool DeclaringCXXNamedModules = false;
+
+  struct ExportContextualKeywordInfo {
+    Token ExportTok;
+    bool TokAtPhysicalStartOfLine;
+  };
+
+  /// Whether the last token we lexed was an 'export' keyword.
+  std::optional<ExportContextualKeywordInfo> LastTokenWasExportKeyword =
+      std::nullopt;
+
   /// A position within a C++20 import-seq.
   class StdCXXImportSeq {
   public:
@@ -547,12 +565,7 @@ class Preprocessor {
         reset();
     }
 
-    void handleIdentifier(IdentifierInfo *Identifier) {
-      if (isModuleCandidate() && Identifier)
-        Name += Identifier->getName().str();
-      else if (!isNamedModule())
-        reset();
-    }
+    void handleModuleName(ModuleNameLoc *Path);
 
     void handleColon() {
       if (isModuleCandidate())
@@ -561,13 +574,6 @@ class Preprocessor {
         reset();
     }
 
-    void handlePeriod() {
-      if (isModuleCandidate())
-        Name += ".";
-      else if (!isNamedModule())
-        reset();
-    }
-
     void handleSemi() {
       if (!Name.empty() && isModuleCandidate()) {
         if (State == InterfaceCandidate)
@@ -622,10 +628,6 @@ class Preprocessor {
 
   ModuleDeclSeq ModuleDeclState;
 
-  /// Whether the module import expects an identifier next. Otherwise,
-  /// it expects a '.' or ';'.
-  bool ModuleImportExpectsIdentifier = false;
-
   /// The identifier and source location of the currently-active
   /// \#pragma clang arc_cf_code_audited begin.
   IdentifierLoc PragmaARCCFCodeAuditedInfo;
@@ -1759,6 +1761,19 @@ class Preprocessor {
   /// Lex the parameters for an #embed directive, returns nullopt on error.
   std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current,
                                                              bool ForHasEmbed);
+  bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+                             SmallVectorImpl<IdentifierLoc> &Path,
+                             bool AllowMacroExpansion = true);
+  void HandleCXXImportDirective(Token Import);
+  void HandleCXXModuleDirective(Token Module);
+  
+  /// Callback invoked when the lexer sees one of export, import or module token
+  /// at the start of a line.
+  ///
+  /// This consumes the import, module directive, modifies the
+  /// lexer/preprocessor state, and advances the lexer(s) so that the next token
+  /// read is the correct one.
+  bool HandleModuleContextualKeyword(Token &Result, bool TokAtPhysicalStartOfLine);
 
   bool LexAfterModuleImport(Token &Result);
   void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks);
@@ -2282,7 +2297,9 @@ class Preprocessor {
   /// Determine whether the next preprocessor token to be
   /// lexed is a '('.  If so, consume the token and return true, if not, this
   /// method should have no observable side-effect on the lexed tokens.
-  bool isNextPPTokenLParen();
+  bool isNextPPTokenLParen() {
+    return peekNextPPToken().value_or(Token{}).is(tok::l_paren);
+  }
 
 private:
   /// Identifiers used for SEH handling in Borland. These are only
@@ -2342,7 +2359,7 @@ class Preprocessor {
   ///
   /// \return The location of the end of the directive (the terminating
   /// newline).
-  SourceLocation CheckEndOfDirective(const char *DirType,
+  SourceLocation CheckEndOfDirective(StringRef DirType,
                                      bool EnableMacros = false);
 
   /// Read and discard all tokens remaining on the current line until
@@ -2424,11 +2441,12 @@ class Preprocessor {
   }
 
   /// If we're importing a standard C++20 Named Modules.
-  bool isInImportingCXXNamedModules() const {
-    // NamedModuleImportPath will be non-empty only if we're importing
-    // Standard C++ named modules.
-    return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules &&
-           !IsAtImport;
+  bool isImportingCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && ImportingCXXNamedModules;
+  }
+
+  bool isDeclaringCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && DeclaringCXXNamedModules;
   }
 
   /// Allocate a new MacroInfo object with the provided SourceLocation.
@@ -2661,6 +2679,10 @@ class Preprocessor {
 
   void removeCachedMacroExpandedTokensOfLastLexer();
 
+  /// Peek the next token. If so, return the token, if not, this
+  /// method should have no observable side-effect on the lexed tokens.
+  std::optional<Token> peekNextPPToken();
+
   /// After reading "MACRO(", this method is invoked to read all of the formal
   /// arguments specified for the macro invocation.  Returns null on error.
   MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI,
@@ -3078,6 +3100,53 @@ struct EmbedAnnotationData {
   StringRef FileName;
 };
 
+/// Represents module name annotation data.
+///
+///     module-name:
+///           module-name-qualifier[opt] identifier
+///
+///     partition-name: [C++20]
+///           : module-name-qualifier[opt] identifier
+///
+///     module-name-qualifier
+///           module-name-qualifier[opt] identifier .
+class ModuleNameLoc final
+    : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> {
+  friend TrailingObjects;
+  unsigned NumIdentifierLocs;
+
+  unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const {
+    return getNumIdentifierLocs();
+  }
+
+  ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) {
+    (void)llvm::copy(Path, getTrailingObjects<IdentifierLoc>());
+  }
+
+public:
+  static std::string stringFromModuleIdPath(ModuleIdPath Path);
+  static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path);
+  static Token CreateAnnotToken(Preprocessor &PP, ModuleIdPath Path);
+  unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; }
+  ModuleIdPath getModuleIdPath() const {
+    return {getTrailingObjects<IdentifierLoc>(), getNumIdentifierLocs()};
+  }
+
+  SourceLocation getBeginLoc() const {
+    return getModuleIdPath().front().getLoc();
+  }
+  SourceLocation getEndLoc() const {
+    auto &Last = getModuleIdPath().back();
+    return Last.getLoc().getLocWithOffset(
+        Last.getIdentifierInfo()->getLength());
+  }
+  SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; }
+
+  std::string str() const;
+  void print(llvm::raw_ostream &OS) const;
+  void dump() const { print(llvm::errs()); }
+};
+
 /// Registry of pragma handlers added by plugins
 using PragmaHandlerRegistry = llvm::Registry<PragmaHandler>;
 
diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h
index 4f29fb7d11415..8e81207ddf8d7 100644
--- a/clang/include/clang/Lex/Token.h
+++ b/clang/include/clang/Lex/Token.h
@@ -231,6 +231,9 @@ class Token {
     PtrData = const_cast<char*>(Ptr);
   }
 
+  template <class T> T getAnnotationValueAs() const {
+    return static_cast<T>(getAnnotationValue());
+  }
   void *getAnnotationValue() const {
     assert(isAnnotation() && "Used AnnotVal on non-annotation token");
     return PtrData;
@@ -289,6 +292,10 @@ class Token {
   /// Return the ObjC keyword kind.
   tok::ObjCKeywordKind getObjCKeywordID() const;
 
+  /// Return true if we have an C++20 Modules contextual keyword(export, import
+  /// or module).
+  bool isModuleContextualKeyword(bool AllowExport = true) const;
+
   bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const;
 
   /// Return true if this token has trigraphs or escaped newlines in it.
diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h
index 4d229ae610674..777b4e6266c71 100644
--- a/clang/include/clang/Lex/TokenLexer.h
+++ b/clang/include/clang/Lex/TokenLexer.h
@@ -139,10 +139,9 @@ class TokenLexer {
   void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion,
             bool OwnsTokens, bool IsReinject);
 
-  /// If the next token lexed will pop this macro off the
-  /// expansion stack, return 2.  If the next unexpanded token is a '(', return
-  /// 1, otherwise return 0.
-  unsigned isNextTokenLParen() const;
+  /// If the next token lexed will pop this macro off the expansion stack,
+  /// return std::nullopt, otherwise return the next unexpanded token.
+  std::optional<Token> peekNextPPToken() const;
 
   /// Lex and return a token from this macro stream.
   bool Lex(Token &Tok);
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index c4bef4729fd36..a59a99bbac7c6 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -1079,6 +1079,8 @@ class Parser : public CodeCompletionHandler {
                                  unsigned ArgumentIndex) override;
   ...
[truncated]
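
The diff above replaces `isNextPPTokenLParen`, which returned 0, 1, or 2, with `peekNextPPToken`, which returns `std::optional<Token>`, and rebuilds the old query on top of it. The pattern can be sketched in isolation as follows. `MiniLexer`, `Tok`, and `TokKind` are hypothetical stand-ins rather than Clang classes, and the real lexer does considerably more work (raw-mode lexing, state save and restore).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

enum class TokKind { Unknown, LParen, Identifier };

// Stand-in token: a default-constructed Tok has kind Unknown.
struct Tok {
  TokKind Kind = TokKind::Unknown;
  bool is(TokKind K) const { return Kind == K; }
};

class MiniLexer {
  std::vector<Tok> Toks;
  std::size_t Pos = 0;

public:
  explicit MiniLexer(std::vector<Tok> T) : Toks(std::move(T)) {}

  // Return the next token without consuming it, or std::nullopt when the
  // buffer is exhausted (the old interface signalled this case with 2).
  std::optional<Tok> peekNextPPToken() const {
    if (Pos >= Toks.size())
      return std::nullopt;
    return Toks[Pos];
  }

  // The old yes/no question, expressed through the general peek. An empty
  // optional falls back to a default Tok, which is never an LParen.
  bool isNextPPTokenLParen() const {
    return peekNextPPToken().value_or(Tok{}).is(TokKind::LParen);
  }

  // Consume one token, if any remain.
  void advance() {
    if (Pos < Toks.size())
      ++Pos;
  }
};
```

Returning the token itself lets callers ask about any token kind, which the module and import handling needs when it inspects what follows the introducer.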

@llvmbot
Member

llvmbot commented May 31, 2025

@llvm/pr-subscribers-clang

Author: None (yronglin)

Changes

Implement P1857R3 Modules Dependency Discovery.

  • Handle C++ module and import directive like other preprocessor directive.

At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:

  • After skipping horizontal whitespace are

    • ✅at the start of a logical line, or

    • ✅preceded by an export at the start of the logical line.

  • Are followed by an identifier pp token (before macro expansion), or

    • ✅<, ", or : (but not ::) pp tokens for import, or

    • ✅; for module

Otherwise the token is treated as an identifier.

Additionally:

  • ✅The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include.

  • ✅The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.

  • ❌**[TODO]** A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)

  • ✅Preprocessor conditionals shall not span a module declaration.
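
As a rough illustration of the introducer rules listed above, here is a minimal sketch in C++. It is not the code from this patch: `Kind`, `PPToken`, and `startsModuleOrImportDirective` are hypothetical names invented for the example, and the input is assumed to be the pp-tokens of a single logical line before macro expansion. The "Additionally" constraints (single logical line, no `#include` origin, conditionals) are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified pp-token kinds; not Clang's tok::TokenKind.
enum class Kind { Identifier, Less, Quote, Colon, ColonColon, Semi, Other };

struct PPToken {
  Kind kind;
  std::string text;
};

// Returns true if `line` (the pp-tokens of one logical line, before macro
// expansion, with horizontal whitespace already skipped) begins a module or
// import directive under the rules listed above.
bool startsModuleOrImportDirective(const std::vector<PPToken> &line) {
  std::size_t i = 0;
  // An optional 'export' may precede the introducer at the start of the line.
  if (i < line.size() && line[i].kind == Kind::Identifier &&
      line[i].text == "export")
    ++i;
  if (i >= line.size() || line[i].kind != Kind::Identifier)
    return false;
  const bool isImport = line[i].text == "import";
  const bool isModule = line[i].text == "module";
  if (!isImport && !isModule)
    return false;
  // The introducer must be followed by a suitable pp-token.
  if (i + 1 >= line.size())
    return false;
  const Kind next = line[i + 1].kind;
  if (next == Kind::Identifier)
    return true;
  if (isImport)
    return next == Kind::Less || next == Kind::Quote || next == Kind::Colon;
  return next == Kind::Semi; // 'module ;' opens the global module fragment
}
```

Under this sketch, `import foo;`, `export module m;`, and `module;` are classified as directives, while `import ::x` and `int import = 0;` leave `import` as an ordinary identifier.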

More tests need to be added.


Patch is 134.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107168.diff

43 Files Affected:

  • (modified) clang/examples/AnnotateFunctions/AnnotateFunctions.cpp (+1-1)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+15-1)
  • (modified) clang/include/clang/Basic/DiagnosticParseKinds.td (+2-4)
  • (modified) clang/include/clang/Basic/IdentifierTable.h (+22-4)
  • (modified) clang/include/clang/Basic/TokenKinds.def (+6)
  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+1-1)
  • (modified) clang/include/clang/Lex/CodeCompletionHandler.h (+8)
  • (modified) clang/include/clang/Lex/Lexer.h (+5-5)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+95-26)
  • (modified) clang/include/clang/Lex/Token.h (+7)
  • (modified) clang/include/clang/Lex/TokenLexer.h (+3-4)
  • (modified) clang/include/clang/Parse/Parser.h (+2)
  • (modified) clang/include/clang/Sema/Sema.h (+4-2)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+3-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+7-3)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+8-1)
  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+20-8)
  • (modified) clang/lib/Lex/Lexer.cpp (+46-18)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+264-5)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+15-17)
  • (modified) clang/lib/Lex/Preprocessor.cpp (+171-188)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+5-3)
  • (modified) clang/lib/Lex/TokenLexer.cpp (+7-6)
  • (modified) clang/lib/Parse/Parser.cpp (+30-63)
  • (modified) clang/lib/Sema/SemaModule.cpp (+30-45)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+1-1)
  • (modified) clang/test/CXX/basic/basic.link/p1.cpp (+109-36)
  • (modified) clang/test/CXX/basic/basic.link/p3.cpp (+40-27)
  • (modified) clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp (+56-26)
  • (modified) clang/test/CXX/lex/lex.pptoken/p3-2a.cpp (+10-5)
  • (modified) clang/test/CXX/module/basic/basic.def.odr/p6.cppm (+134-40)
  • (modified) clang/test/CXX/module/basic/basic.link/module-declaration.cpp (+35-29)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm (+27-11)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.interface/p1.cppm (+18-21)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p1.cpp (+30-14)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p5.cpp (+48-17)
  • (modified) clang/test/CXX/module/module.interface/p1.cpp (+24-18)
  • (modified) clang/test/CXX/module/module.interface/p2.cpp (+12-14)
  • (modified) clang/test/CXX/module/module.unit/p8.cpp (+28-20)
  • (modified) clang/test/Modules/pr121066.cpp (+1-2)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp (+1-1)
  • (modified) clang/unittests/Lex/DependencyDirectivesScannerTest.cpp (+5-5)
  • (modified) clang/unittests/Lex/ModuleDeclStateTest.cpp (+1-1)
diff --git a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
index d872020c2d8a3..22a3eb97f938b 100644
--- a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
+++ b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
@@ -65,7 +65,7 @@ class PragmaAnnotateHandler : public PragmaHandler {
     Token Tok;
     PP.LexUnexpandedToken(Tok);
     if (Tok.isNot(tok::eod))
-      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "pragma";
+      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "#pragma";
 
     if (HandledDecl) {
       DiagnosticsEngine &D = PP.getDiagnostics();
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 723f5d48b4f5f..f975a63b369b5 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -466,6 +466,8 @@ def err_pp_embed_device_file : Error<
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
+def ext_pp_extra_tokens_at_module_directive_eol : ExtWarn<
+  "extra tokens at end of '%0' directive">, InGroup<ExtraTokens>;
 
 def ext_pp_comma_expr : Extension<"comma operator in operand of #if">;
 def ext_pp_bad_vaargs_use : Extension<
@@ -496,7 +498,7 @@ def warn_cxx98_compat_variadic_macro : Warning<
 def ext_named_variadic_macro : Extension<
   "named variadic macros are a GNU extension">, InGroup<VariadicMacros>;
 def err_embedded_directive : Error<
-  "embedding a #%0 directive within macro arguments is not supported">;
+  "embedding a %select{#|C++ }0%1 directive within macro arguments is not supported">;
 def ext_embedded_directive : Extension<
   "embedding a directive within macro arguments has undefined behavior">,
   InGroup<DiagGroup<"embedded-directive">>;
@@ -983,6 +985,18 @@ def warn_module_conflict : Warning<
   InGroup<ModuleConflict>;
 
 // C++20 modules
+def err_pp_expected_module_name_or_header_name : Error<
+  "expected module name or header name">;
+def err_pp_expected_semi_after_module_or_import : Error<
+  "'%select{module|import}0' directive must end with a ';' on the same line">;
+def err_module_decl_in_header : Error<
+  "module declaration must not come from an #include directive">;
+def err_pp_cond_span_module_decl : Error<
+  "preprocessor conditionals shall not span a module declaration">;
+def err_pp_module_expected_ident : Error<
+  "expected a module name after '%select{module|import}0'">;
+def err_pp_unsupported_module_partition : Error<
+  "module partitions are only supported for C++20 onwards">;
 def err_header_import_semi_in_macro : Error<
   "semicolon terminating header import declaration cannot be produced "
   "by a macro">;
diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 3aa36ad59d0b9..c06e2f090b429 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1760,8 +1760,8 @@ def ext_bit_int : Extension<
 } // end of Parse Issue category.
 
 let CategoryName = "Modules Issue" in {
-def err_unexpected_module_decl : Error<
-  "module declaration can only appear at the top level">;
+def err_unexpected_module_import_decl : Error<
+  "%select{module|import}0 declaration can only appear at the top level">;
 def err_module_expected_ident : Error<
   "expected a module name after '%select{module|import}0'">;
 def err_attribute_not_module_attr : Error<
@@ -1782,8 +1782,6 @@ def err_module_fragment_exported : Error<
 def err_private_module_fragment_expected_semi : Error<
   "expected ';' after private module fragment declaration">;
 def err_missing_before_module_end : Error<"expected %0 at end of module">;
-def err_unsupported_module_partition : Error<
-  "module partitions are only supported for C++20 onwards">;
 def err_import_not_allowed_here : Error<
   "imports must immediately follow the module declaration">;
 def err_partition_import_outside_module : Error<
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 54540193cfcc0..add6c6ac629a1 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -179,6 +179,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsModulesImport : 1;
 
+  // True if this is the 'module' contextual keyword.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsModulesDecl : 1;
+
   // True if this is a mangled OpenMP variant name.
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsMangledOpenMPVariantName : 1;
@@ -215,8 +219,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
         IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false),
         IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false),
         RevertedTokenID(false), OutOfDate(false), IsModulesImport(false),
-        IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false),
-        IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {}
+        IsModulesDecl(false), IsMangledOpenMPVariantName(false),
+        IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false),
+        IsKeywordInCpp(false) {}
 
 public:
   IdentifierInfo(const IdentifierInfo &) = delete;
@@ -528,6 +533,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
       RecomputeNeedsHandleIdentifier();
   }
 
+  /// Determine whether this is the contextual keyword \c module.
+  bool isModulesDeclaration() const { return IsModulesDecl; }
+
+  /// Set whether this identifier is the contextual keyword \c module.
+  void setModulesDeclaration(bool I) {
+    IsModulesDecl = I;
+    if (I)
+      NeedsHandleIdentifier = true;
+    else
+      RecomputeNeedsHandleIdentifier();
+  }
+
   /// Determine whether this is the mangled name of an OpenMP variant.
   bool isMangledOpenMPVariantName() const { return IsMangledOpenMPVariantName; }
 
@@ -745,10 +762,11 @@ class IdentifierTable {
     // contents.
     II->Entry = &Entry;
 
-    // If this is the 'import' contextual keyword, mark it as such.
+    // If this is the 'import' or 'module' contextual keyword, mark it as such.
     if (Name == "import")
       II->setModulesImport(true);
-
+    else if (Name == "module")
+      II->setModulesDeclaration(true);
     return *II;
   }
 
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 94e72fea56a68..7750c84dbef78 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -133,6 +133,9 @@ PPKEYWORD(pragma)
 // C23 & C++26 #embed
 PPKEYWORD(embed)
 
+// C++20 Module Directive
+PPKEYWORD(module)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -1023,6 +1026,9 @@ ANNOTATION(module_include)
 ANNOTATION(module_begin)
 ANNOTATION(module_end)
 
+// Annotations for C++, Clang and Objective-C named modules.
+ANNOTATION(module_name)
+
 // Annotation for a header_name token that has been looked up and transformed
 // into the name of a header unit.
 ANNOTATION(header_unit)
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 0ae490f0e8073..112d3b00160fd 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -863,7 +863,7 @@ class CompilerInstance : public ModuleLoader {
   /// load it.
   ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName,
                                                  SourceLocation ImportLoc,
-                                                 SourceLocation ModuleNameLoc,
+                                                 SourceRange ModuleNameRange,
                                                  bool IsInclusionDirective);
 
   /// Creates a \c CompilerInstance for compiling a module.
diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h
index bd3e05a36bb33..2ef29743415ae 100644
--- a/clang/include/clang/Lex/CodeCompletionHandler.h
+++ b/clang/include/clang/Lex/CodeCompletionHandler.h
@@ -13,12 +13,15 @@
 #ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 #define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/SourceLocation.h"
 #include "llvm/ADT/StringRef.h"
 
 namespace clang {
 
 class IdentifierInfo;
 class MacroInfo;
+using ModuleIdPath = ArrayRef<IdentifierLoc>;
 
 /// Callback handler that receives notifications when performing code
 /// completion within the preprocessor.
@@ -70,6 +73,11 @@ class CodeCompletionHandler {
   /// file where we expect natural language, e.g., a comment, string, or
   /// \#error directive.
   virtual void CodeCompleteNaturalLanguage() { }
+
+  /// Callback invoked when performing code completion inside the module name
+  /// part of an import directive.
+  virtual void CodeCompleteModuleImport(SourceLocation ImportLoc,
+                                        ModuleIdPath Path) {}
 };
 
 }
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index bb65ae010cffa..a595cda1eaa77 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer {
   //===--------------------------------------------------------------------===//
   // Context that changes as the file is lexed.
   // NOTE: any state that mutates when in raw mode must have save/restore code
-  // in Lexer::isNextPPTokenLParen.
+  // in Lexer::peekNextPPToken.
 
   // BufferPtr - Current pointer into the buffer.  This is the next character
   // to be lexed.
@@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer {
     BufferPtr = TokEnd;
   }
 
-  /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a
-  /// tok::l_paren token, 0 if it is something else and 2 if there are no more
-  /// tokens in the buffer controlled by this lexer.
-  unsigned isNextPPTokenLParen();
+  /// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+  /// buffer controlled by this lexer, otherwise return the next unexpanded
+  /// token.
+  std::optional<Token> peekNextPPToken();
 
   //===--------------------------------------------------------------------===//
   // Lexer character reading interfaces.
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index f2dfd3a349b8b..79a75a116c418 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -48,6 +48,7 @@
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Registry.h"
+#include "llvm/Support/TrailingObjects.h"
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
@@ -82,6 +83,7 @@ class PreprocessorLexer;
 class PreprocessorOptions;
 class ScratchBuffer;
 class TargetInfo;
+class ModuleNameLoc;
 
 namespace Builtin {
 class Context;
@@ -332,8 +334,9 @@ class Preprocessor {
   /// lexed, if any.
   SourceLocation ModuleImportLoc;
 
-  /// The import path for named module that we're currently processing.
-  SmallVector<IdentifierLoc, 2> NamedModuleImportPath;
+  /// The source location of the \c module contextual keyword we just
+  /// lexed, if any.
+  SourceLocation ModuleDeclLoc;
 
   llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints;
   unsigned CheckPointCounter = 0;
@@ -344,6 +347,21 @@ class Preprocessor {
   /// Whether the last token we lexed was an '@'.
   bool LastTokenWasAt = false;
 
+  /// Whether we're importing a standard C++20 named Modules.
+  bool ImportingCXXNamedModules = false;
+
+  /// Whether we're declaring a standard C++20 named Modules.
+  bool DeclaringCXXNamedModules = false;
+
+  struct ExportContextualKeywordInfo {
+    Token ExportTok;
+    bool TokAtPhysicalStartOfLine;
+  };
+
+  /// Whether the last token we lexed was an 'export' keyword.
+  std::optional<ExportContextualKeywordInfo> LastTokenWasExportKeyword =
+      std::nullopt;
+
   /// A position within a C++20 import-seq.
   class StdCXXImportSeq {
   public:
@@ -547,12 +565,7 @@ class Preprocessor {
         reset();
     }
 
-    void handleIdentifier(IdentifierInfo *Identifier) {
-      if (isModuleCandidate() && Identifier)
-        Name += Identifier->getName().str();
-      else if (!isNamedModule())
-        reset();
-    }
+    void handleModuleName(ModuleNameLoc *Path);
 
     void handleColon() {
       if (isModuleCandidate())
@@ -561,13 +574,6 @@ class Preprocessor {
         reset();
     }
 
-    void handlePeriod() {
-      if (isModuleCandidate())
-        Name += ".";
-      else if (!isNamedModule())
-        reset();
-    }
-
     void handleSemi() {
       if (!Name.empty() && isModuleCandidate()) {
         if (State == InterfaceCandidate)
@@ -622,10 +628,6 @@ class Preprocessor {
 
   ModuleDeclSeq ModuleDeclState;
 
-  /// Whether the module import expects an identifier next. Otherwise,
-  /// it expects a '.' or ';'.
-  bool ModuleImportExpectsIdentifier = false;
-
   /// The identifier and source location of the currently-active
   /// \#pragma clang arc_cf_code_audited begin.
   IdentifierLoc PragmaARCCFCodeAuditedInfo;
@@ -1759,6 +1761,19 @@ class Preprocessor {
   /// Lex the parameters for an #embed directive, returns nullopt on error.
   std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current,
                                                              bool ForHasEmbed);
+  bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+                             SmallVectorImpl<IdentifierLoc> &Path,
+                             bool AllowMacroExpansion = true);
+  void HandleCXXImportDirective(Token Import);
+  void HandleCXXModuleDirective(Token Module);
+  
+  /// Callback invoked when the lexer sees one of export, import or module token
+  /// at the start of a line.
+  ///
+  /// This consumes the import, module directive, modifies the
+  /// lexer/preprocessor state, and advances the lexer(s) so that the next token
+  /// read is the correct one.
+  bool HandleModuleContextualKeyword(Token &Result, bool TokAtPhysicalStartOfLine);
 
   bool LexAfterModuleImport(Token &Result);
   void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks);
@@ -2282,7 +2297,9 @@ class Preprocessor {
   /// Determine whether the next preprocessor token to be
   /// lexed is a '('.  If so, consume the token and return true, if not, this
   /// method should have no observable side-effect on the lexed tokens.
-  bool isNextPPTokenLParen();
+  bool isNextPPTokenLParen() {
+    return peekNextPPToken().value_or(Token{}).is(tok::l_paren);
+  }
 
 private:
   /// Identifiers used for SEH handling in Borland. These are only
@@ -2342,7 +2359,7 @@ class Preprocessor {
   ///
   /// \return The location of the end of the directive (the terminating
   /// newline).
-  SourceLocation CheckEndOfDirective(const char *DirType,
+  SourceLocation CheckEndOfDirective(StringRef DirType,
                                      bool EnableMacros = false);
 
   /// Read and discard all tokens remaining on the current line until
@@ -2424,11 +2441,12 @@ class Preprocessor {
   }
 
   /// If we're importing a standard C++20 Named Modules.
-  bool isInImportingCXXNamedModules() const {
-    // NamedModuleImportPath will be non-empty only if we're importing
-    // Standard C++ named modules.
-    return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules &&
-           !IsAtImport;
+  bool isImportingCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && ImportingCXXNamedModules;
+  }
+
+  bool isDeclaringCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && DeclaringCXXNamedModules;
   }
 
   /// Allocate a new MacroInfo object with the provided SourceLocation.
@@ -2661,6 +2679,10 @@ class Preprocessor {
 
   void removeCachedMacroExpandedTokensOfLastLexer();
 
+  /// Peek the next token. If so, return the token, if not, this
+  /// method should have no observable side-effect on the lexed tokens.
+  std::optional<Token> peekNextPPToken();
+
   /// After reading "MACRO(", this method is invoked to read all of the formal
   /// arguments specified for the macro invocation.  Returns null on error.
   MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI,
@@ -3078,6 +3100,53 @@ struct EmbedAnnotationData {
   StringRef FileName;
 };
 
+/// Represents module name annotation data.
+///
+///     module-name:
+///           module-name-qualifier[opt] identifier
+///
+///     partition-name: [C++20]
+///           : module-name-qualifier[opt] identifier
+///
+///     module-name-qualifier
+///           module-name-qualifier[opt] identifier .
+class ModuleNameLoc final
+    : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> {
+  friend TrailingObjects;
+  unsigned NumIdentifierLocs;
+
+  unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const {
+    return getNumIdentifierLocs();
+  }
+
+  ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) {
+    (void)llvm::copy(Path, getTrailingObjects<IdentifierLoc>());
+  }
+
+public:
+  static std::string stringFromModuleIdPath(ModuleIdPath Path);
+  static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path);
+  static Token CreateAnnotToken(Preprocessor &PP, ModuleIdPath Path);
+  unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; }
+  ModuleIdPath getModuleIdPath() const {
+    return {getTrailingObjects<IdentifierLoc>(), getNumIdentifierLocs()};
+  }
+
+  SourceLocation getBeginLoc() const {
+    return getModuleIdPath().front().getLoc();
+  }
+  SourceLocation getEndLoc() const {
+    auto &Last = getModuleIdPath().back();
+    return Last.getLoc().getLocWithOffset(
+        Last.getIdentifierInfo()->getLength());
+  }
+  SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; }
+
+  std::string str() const;
+  void print(llvm::raw_ostream &OS) const;
+  void dump() const { print(llvm::errs()); }
+};
+
 /// Registry of pragma handlers added by plugins
 using PragmaHandlerRegistry = llvm::Registry<PragmaHandler>;
 
diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h
index 4f29fb7d11415..8e81207ddf8d7 100644
--- a/clang/include/clang/Lex/Token.h
+++ b/clang/include/clang/Lex/Token.h
@@ -231,6 +231,9 @@ class Token {
     PtrData = const_cast<char*>(Ptr);
   }
 
+  template <class T> T getAnnotationValueAs() const {
+    return static_cast<T>(getAnnotationValue());
+  }
   void *getAnnotationValue() const {
     assert(isAnnotation() && "Used AnnotVal on non-annotation token");
     return PtrData;
@@ -289,6 +292,10 @@ class Token {
   /// Return the ObjC keyword kind.
   tok::ObjCKeywordKind getObjCKeywordID() const;
 
+  /// Return true if we have an C++20 Modules contextual keyword(export, import
+  /// or module).
+  bool isModuleContextualKeyword(bool AllowExport = true) const;
+
   bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const;
 
   /// Return true if this token has trigraphs or escaped newlines in it.
diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h
index 4d229ae610674..777b4e6266c71 100644
--- a/clang/include/clang/Lex/TokenLexer.h
+++ b/clang/include/clang/Lex/TokenLexer.h
@@ -139,10 +139,9 @@ class TokenLexer {
   void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion,
             bool OwnsTokens, bool IsReinject);
 
-  /// If the next token lexed will pop this macro off the
-  /// expansion stack, return 2.  If the next unexpanded token is a '(', return
-  /// 1, otherwise return 0.
-  unsigned isNextTokenLParen() const;
+  /// If the next token lexed will pop this macro off the expansion stack,
+  /// return std::nullopt, otherwise return the next unexpanded token.
+  std::optional<Token> peekNextPPToken() const;
 
   /// Lex and return a token from this macro stream.
   bool Lex(Token &Tok);
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index c4bef4729fd36..a59a99bbac7c6 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -1079,6 +1079,8 @@ class Parser : public CodeCompletionHandler {
                                  unsigned ArgumentIndex) override;
   ...
[truncated]
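
The diff above replaces the tri-state `unsigned` result of `isNextPPTokenLParen` / `isNextTokenLParen` with `std::optional<Token>` from `peekNextPPToken`, and rebuilds the `'('` check on top of it with `value_or`. The following is a small stand-alone sketch of that pattern only; `TokKind`, `Tok`, and `TokenStream` are hypothetical stand-ins and do not reflect Clang's actual `Token`, `Lexer`, or `TokenLexer` classes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical miniature token type, for illustration only.
enum class TokKind { Unknown, LParen, Identifier };

struct Tok {
  TokKind kind = TokKind::Unknown;
  bool is(TokKind k) const { return kind == k; }
};

class TokenStream {
  std::vector<Tok> toks;
  std::size_t pos = 0;

public:
  explicit TokenStream(std::vector<Tok> t) : toks(std::move(t)) {}

  // Returns std::nullopt when no tokens remain, otherwise the next token.
  // The stream position is not advanced.
  std::optional<Tok> peekNext() const {
    if (pos >= toks.size())
      return std::nullopt;
    return toks[pos];
  }

  // The '(' query becomes a thin wrapper: an exhausted stream falls back to
  // a default-constructed token, which is never an l_paren.
  bool isNextLParen() const {
    return peekNext().value_or(Tok{}).is(TokKind::LParen);
  }
};
```

One benefit of this shape is that callers who need a different lookahead, such as the `<`, `"`, `:` or `;` checks described for `import` and `module`, can inspect the peeked token directly instead of adding another special-purpose query.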

@yronglin yronglin marked this pull request as draft May 31, 2025 08:02
Signed-off-by: yronglin <yronglin777@gmail.com>
@yronglin yronglin force-pushed the modules_dependency_discovery branch from d300a2b to 04ddbf6 Compare June 2, 2025 12:33
@yronglin yronglin marked this pull request as ready for review June 2, 2025 12:33
@llvmbot llvmbot added the clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' label Jun 2, 2025
@yronglin
Contributor Author

yronglin commented Dec 19, 2025

The failure does look related to this change.

Yes, I'm working on a fix PR. The '.' has an unexpected atStartOfLine flag.

#define str(s) # s 
#define xstr(s) str(s) 
#define INCFILE(n) vers ## n 

include xstr(INCFILE(2).h) 
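
For readers unfamiliar with the macros in the snippet above, here is a small sketch of what `xstr(INCFILE(2).h)` expands to under the standard stringizing and token-pasting rules. It only shows where the `.` token comes from (the macro argument) and does not attempt to reproduce the reported failure; `expandedHeaderName` is a hypothetical helper added for the example.

```cpp
#include <cassert>
#include <string>

#define str(s) # s
#define xstr(s) str(s)
#define INCFILE(n) vers ## n

// xstr first fully macro-expands its argument: INCFILE(2).h becomes the
// tokens vers2 . h. str then stringizes those tokens into "vers2.h".
inline std::string expandedHeaderName() { return xstr(INCFILE(2).h); }

// Undefine the short macro names so they do not leak into later code.
#undef INCFILE
#undef xstr
#undef str
```

Calling `expandedHeaderName()` yields the string `vers2.h`.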

@yronglin
Contributor Author

Created a new PR (#173052) to fix this.

@ilovepi
Contributor

ilovepi commented Dec 19, 2025

Not entirely sure if this is the same issue, but we're seeing a crash on our Mac builders with this patch.

Error Message:

FAILED: CMakeFiles/clang_rt.builtins_i386_osx.dir/emutls.c.o 
/Volumes/Work/s/w/ir/x/w/llvm_build/./bin/clang --target=x86_64-apple-darwin25.1.0 --sysroot=/Volumes/Work/s/w/ir/cache/macos_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk  -I/Volumes/Work/s/w/ir/x/w/llvm-llvm-project/compiler-rt/lib/builtins/../../../third-party/siphash/include -O3 -DNDEBUG -arch i386 -isysroot /Volumes/Work/s/w/ir/cache/macos_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -isysroot /Volumes/Work/s/w/ir/cache/macos_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=10.7 -fPIC -O3 -fvisibility=hidden -DVISIBILITY_HIDDEN -Wall -fomit-frame-pointer -Werror=format-nonliteral -DDONT_DEFINE_EPRINTF -arch i386 -MD -MT CMakeFiles/clang_rt.builtins_i386_osx.dir/emutls.c.o -MF CMakeFiles/clang_rt.builtins_i386_osx.dir/emutls.c.o.d -o CMakeFiles/clang_rt.builtins_i386_osx.dir/emutls.c.o -c /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/compiler-rt/lib/builtins/emutls.c
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /Volumes/Work/s/w/ir/x/w/llvm_build/./bin/clang --target=x86_64-apple-darwin25.1.0 --sysroot=/Volumes/Work/s/w/ir/cache/macos_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -I/Volumes/Work/s/w/ir/x/w/llvm-llvm-project/compiler-rt/lib/builtins/../../../third-party/siphash/include -O3 -DNDEBUG -arch i386 -isysroot /Volumes/Work/s/w/ir/cache/macos_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -isysroot /Volumes/Work/s/w/ir/cache/macos_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=10.7 -fPIC -O3 -fvisibility=hidden -DVISIBILITY_HIDDEN -Wall -fomit-frame-pointer -Werror=format-nonliteral -DDONT_DEFINE_EPRINTF -arch i386 -MD -MT CMakeFiles/clang_rt.builtins_i386_osx.dir/emutls.c.o -MF CMakeFiles/clang_rt.builtins_i386_osx.dir/emutls.c.o.d -o CMakeFiles/clang_rt.builtins_i386_osx.dir/emutls.c.o -c /Volumes/Work/s/w/ir/x/w/llvm-llvm-project/compiler-rt/lib/builtins/emutls.c
1.	/Volumes/Work/s/w/ir/cache/macos_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk/usr/include/pthread/sched.h:35:1: current parser token 'struct'
 #0 0x0000000107ee3c18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x101d03c18)
 #1 0x0000000107ee163f llvm::sys::RunSignalHandlers() (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x101d0163f)
 #2 0x0000000107ee3176 llvm::sys::CleanupOnSignal(unsigned long) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x101d03176)
 #3 0x0000000107e5c9fe CrashRecoverySignalHandler(int) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x101c7c9fe)
 #4 0x00007ff80dd6637d (/usr/lib/system/libsystem_platform.dylib+0x7ff802bc137d)
 #5 0x000e7dee00000001
 #6 0x000000010b271643 clang::Lexer::Lex(clang::Token&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x105091643)
 #7 0x000000010b2e7f7d clang::Preprocessor::Lex(clang::Token&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x105107f7d)
 #8 0x000000010b2d67b7 clang::Preprocessor::Handle_Pragma(clang::Token&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x1050f67b7)
 #9 0x000000010b2c7a4c clang::Preprocessor::HandleMacroExpandedIdentifier(clang::Token&, clang::MacroDefinition const&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x1050e7a4c)
#10 0x000000010b2e7bca clang::Preprocessor::HandleIdentifier(clang::Token&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x105107bca)
#11 0x000000010b26e215 clang::Lexer::LexIdentifierContinue(clang::Token&, char const*) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10508e215)
#12 0x000000010b274ee9 clang::Lexer::LexTokenInternal(clang::Token&, bool) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x105094ee9)
#13 0x000000010b271643 clang::Lexer::Lex(clang::Token&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x105091643)
#14 0x000000010b2e7f7d clang::Preprocessor::Lex(clang::Token&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x105107f7d)
#15 0x0000000109ff5d2c clang::Parser::ExpectAndConsumeSemi(unsigned int, llvm::StringRef) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x103e15d2c)
#16 0x0000000109fb8234 clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::ParsedAttributes&, clang::Parser::ParsedTemplateInfo&, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x103dd8234)
#17 0x0000000109ffbf8b clang::Parser::ParseDeclOrFunctionDefInternal(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x103e1bf8b)
#18 0x0000000109ffb856 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x103e1b856)
#19 0x0000000109ffa018 clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x103e1a018)
#20 0x0000000109ff8521 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x103e18521)
#21 0x0000000109f0a4be clang::ParseAST(clang::Sema&, bool, bool) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x103d2a4be)
#22 0x0000000108b7e1ba clang::FrontendAction::Execute() (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10299e1ba)
#23 0x0000000108aed7fd clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10290d7fd)
#24 0x0000000108c7becb clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x102a9becb)
#25 0x0000000106214215 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x100034215)
#26 0x00000001062110b7 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x1000310b7)
#27 0x000000010621342c int llvm::function_ref<int (llvm::SmallVectorImpl<char const*>&)>::callback_fn<clang_main(int, char**, llvm::ToolContext const&)::$_0>(long, llvm::SmallVectorImpl<char const*>&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10003342c)
#28 0x000000010894c86e void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__2::optional<llvm::StringRef>>, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>*, bool*) const::$_0>(long) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10276c86e)
#29 0x0000000107e5c73e llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x101c7c73e)
#30 0x000000010894bbcf clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__2::optional<llvm::StringRef>>, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>*, bool*) const (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10276bbcf)
#31 0x000000010890bff6 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10272bff6)
#32 0x000000010890c24f clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__2::pair<int, clang::driver::Command const*>>&, bool) const (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10272c24f)
#33 0x000000010892bbb0 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__2::pair<int, clang::driver::Command const*>>&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x10274bbb0)
#34 0x00000001062107f2 clang_main(int, char**, llvm::ToolContext const&) (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x1000307f2)
#35 0x0000000106220150 main (/Volumes/Work/s/w/ir/x/w/llvm_build/bin/clang-22+0x100040150)
#36 0x00007ff80d985781
clang: error: clang frontend command failed with exit code 139 (use -v to see invocation)
Fuchsia clang version 22.0.0git (https://llvm.googlesource.com/llvm-project d2e62d902438bb5860f2376e818d797bf20daa7d)
Target: i386-apple-darwin25.1.0
Thread model: posix
InstalledDir: /Volumes/Work/s/w/ir/x/w/llvm_build/bin
Build config: +assertions
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /Volumes/Work/s/w/ir/x/w/llvm_build/clang-crashreports/emutls-034ae5.c
clang: note: diagnostic msg: /Volumes/Work/s/w/ir/x/w/llvm_build/clang-crashreports/emutls-034ae5.sh
clang: note: diagnostic msg: Crash backtrace is located in
clang: note: diagnostic msg: /Volumes/Work/s/w/ir/x/t/Library/Logs/DiagnosticReports/clang-22_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang: note: diagnostic msg: 

********************

Bot: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-mac-arm64/b8695039956773730673/overview

Crash reproducer: https://storage.cloud.google.com/fuchsia-artifacts/builds/8695039956773730673/emutls-ada33d.tar.gz

@shafik
Collaborator

shafik commented Dec 19, 2025

@ilovepi can you verify whether the fix PR fixes your issue?

@ilovepi
Contributor

ilovepi commented Dec 19, 2025

@ilovepi can you verify whether the fix PR fixes your issue?

It would take me some time to access an appropriate piece of Mac hardware. I can look at launching a custom CI job but that isn't always simple. If the PR fixes the other failure, then I'd say land it and our CI will pick it up relatively soon (sooner than I can get access to such a machine).

I'd also hope the linked crash reproducer would fail the same way, even on non-Mac hardware.

@dyung
Collaborator

dyung commented Dec 19, 2025

My Mac bot is also hitting a crash, which might be the same issue as the one reported by @ilovepi.
https://lab.llvm.org/buildbot/#/builders/190/builds/33105

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Dec 19, 2025
…#107168)

This PR implements the following papers:
[P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3).
[P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1).
[CWG2947](https://cplusplus.github.io/CWG/issues/2947.html).

At the start of phase 4, an import or module token is treated as starting
a directive and is converted to its respective keyword iff it:

- After skipping horizontal whitespace, is
    - at the start of a logical line, or
    - preceded by an export at the start of the logical line.
- Is followed by an identifier pp token (before macro expansion), or
    - <, ", or : (but not ::) pp tokens for import, or
    - ; for module

Otherwise the token is treated as an identifier.

Additionally:

- The entire import or module directive (including the closing ;) must
be on a single logical line and, for module, must not come from an
#include.
- The expansion of macros must not result in an import or module
directive introducer that was not there prior to macro expansion.
- A module directive may only appear as the first preprocessing tokens
in a file (excluding the global module fragment).
- Preprocessor conditionals shall not span a module declaration.
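
The recognition rules listed above can be sketched as a small line classifier. This is only an illustration under simplifying assumptions, not the code from this patch: it inspects raw characters instead of real pp tokens, ignores comments, line splices, and the macro-expansion and file-position rules, and every name in it (`LineKind`, `classifyLogicalLine`, and the helpers) is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical result type for this sketch (not a Clang type).
enum class LineKind { Text, ImportDirective, ModuleDirective };

// Skip spaces and tabs (horizontal whitespace) starting at Pos.
inline std::size_t skipHorizontalSpace(std::string_view Line, std::size_t Pos) {
  while (Pos < Line.size() && (Line[Pos] == ' ' || Line[Pos] == '\t'))
    ++Pos;
  return Pos;
}

inline bool isIdentStart(char C) {
  return std::isalpha(static_cast<unsigned char>(C)) || C == '_';
}

inline bool isIdentBody(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// Read the identifier that begins at Pos; returns an empty view if none.
inline std::string_view readIdentifier(std::string_view Line, std::size_t Pos) {
  if (Pos >= Line.size() || !isIdentStart(Line[Pos]))
    return {};
  std::size_t End = Pos;
  while (End < Line.size() && isIdentBody(Line[End]))
    ++End;
  return Line.substr(Pos, End - Pos);
}

// Classify one logical line according to the bullet list in the description.
inline LineKind classifyLogicalLine(std::string_view Line) {
  std::size_t Pos = skipHorizontalSpace(Line, 0);
  std::string_view Word = readIdentifier(Line, Pos);

  // An optional leading `export` may precede `import` or `module`.
  if (Word == "export") {
    Pos = skipHorizontalSpace(Line, Pos + Word.size());
    Word = readIdentifier(Line, Pos);
  }
  if (Word != "import" && Word != "module")
    return LineKind::Text;

  const bool IsImport = (Word == "import");
  Pos = skipHorizontalSpace(Line, Pos + Word.size());
  if (Pos >= Line.size())
    return LineKind::Text;

  const char Next = Line[Pos];
  // Both forms accept a following identifier.
  if (isIdentStart(Next))
    return IsImport ? LineKind::ImportDirective : LineKind::ModuleDirective;

  if (IsImport) {
    // `<`, `"`, or a single `:` (but not `::`) may follow `import`.
    if (Next == '<' || Next == '"')
      return LineKind::ImportDirective;
    const bool IsScope = Pos + 1 < Line.size() && Line[Pos + 1] == ':';
    if (Next == ':' && !IsScope)
      return LineKind::ImportDirective;
    return LineKind::Text;
  }
  // `;` may follow `module` (the global module fragment introducer).
  return Next == ';' ? LineKind::ModuleDirective : LineKind::Text;
}

// Small usage check mirroring the example lines in this description.
inline void checkClassifierExamples() {
  assert(classifyLogicalLine("export module m;") == LineKind::ModuleDirective);
  assert(classifyLogicalLine("struct import {};") == LineKind::Text);
  assert(classifyLogicalLine("EMPTY import foo;") == LineKind::Text);
}
```

In this sketch, `EMPTY import foo;` is classified as ordinary text because `import` is not at the start of the logical line before macro expansion, which matches the rule that macro expansion must not create a directive introducer.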

After this patch, we handle C++ module-import and module-declaration as
real pp-directives in the preprocessor. Additionally, we refactor module
name lexing, remove the complex state machine, and read the full module
name during module/import directive handling. In the future we could
introduce a tok::annot_module_name token to avoid parsing the module name
twice, in both the preprocessor and the parser, but that makes error
recovery much more difficult (e.g. import a; import b; on the same line).

This patch also introduces two new keywords, `__preprocessed_module` and
`__preprocessed_import`. These keywords are generated in `-E` mode.
This is useful to avoid confusion with the `module` and `import` keywords
in preprocessed output:
```cpp
export module m;
struct import {};
#define EMPTY
EMPTY import foo;
```

Fixes llvm#54047

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
@nikic
Contributor

nikic commented Dec 19, 2025

It looks like this change has significant compile time impact (+0.5% at O0): https://llvm-compile-time-tracker.com/compare.php?from=469e5245987b3250a9ca675275d59c7c5d6bbb00&to=d2e62d902438bb5860f2376e818d797bf20daa7d&stat=instructions:u

Is that expected?

@ilovepi
Contributor

ilovepi commented Dec 19, 2025

I managed to get ahold of an arm64 MacBook. With this patch, the build crashes while building libunwind with the new compiler. If I revert this patch, the build completes just fine.

I think this needs to be reverted until you can investigate the cause of the failure. Maybe some Apple folks have an idea of what's going wrong. I'll try to see if I can use lldb to point you in the right direction, but our kernel CI is on fire and I'm investigating other potential miscompiles.

My CMake invocation was:

cmake -GNinja -DLLVM_ENABLE_PROJECTS="clang;llvm;lld;compiler-rt" -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" ../llvm -DCMAKE_BUILD_TYPE=RelWithDebInfo

@ilovepi
Contributor

ilovepi commented Dec 19, 2025

Seems like LangOpts is invalid on Mac builds? I don't understand why, but that seems to be the source of the error.

Process 93144 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x15)
    frame #0: 0x00000001045b1db0 clang++`clang::Token::isModuleContextualKeyword(this=0x0000000152957810, LangOpts=0x0000000000000000, AllowExport=true) const at Lexer.cpp:78:7 [opt] [inlined]
   75
   76  	bool Token::isModuleContextualKeyword(const LangOptions &LangOpts,
   77  	                                      bool AllowExport) const {
-> 78  	  if (!LangOpts.CPlusPlusModules)
   79  	    return false;
   80  	  if (AllowExport && is(tok::kw_export))
   81  	    return true;
Target 0: (clang++) stopped.
warning: clang++ was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) p LangOpts.CPlusPlusModules

error: supposed to interpret, but failed: Interpreter couldn't read from memory
(lldb) p LangOpts
(const clang::LangOptions &) 0x0000000000000000
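
One reading consistent with the lldb output above is that a reference was formed by dereferencing a null pointer, so the load of a member faults at the member's offset from address 0. This is an assumption, not something confirmed in this thread, and optimized debug info can also misreport values. The sketch below uses a hypothetical `Options` struct whose layout is chosen purely for illustration; it is not the real `clang::LangOptions` layout, and `flagAddress` and `modulesEnabled` are made-up helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in; NOT the real clang::LangOptions layout.
// The padding is chosen only so the flag lands at offset 0x15.
struct Options {
  unsigned char Padding[0x15];
  bool ModulesEnabled;
};

// Address that a load of ModulesEnabled would touch for an object at Base.
// With Base == 0 (a reference bound through a null pointer), the faulting
// address equals the member offset.
inline std::uintptr_t flagAddress(std::uintptr_t Base) {
  return Base + offsetof(Options, ModulesEnabled);
}

// Defensive variant: taking a pointer makes "no options" checkable,
// whereas a reference parameter cannot legally be tested for null.
inline bool modulesEnabled(const Options *Opts) {
  return Opts != nullptr && Opts->ModulesEnabled;
}

// Small usage check for the two helpers.
inline void checkNullOptionsSketch() {
  assert(flagAddress(0) == 0x15);
  assert(!modulesEnabled(nullptr));
}
```

Whether the real crash follows this pattern would need to be confirmed by finding where the `LangOptions` reference passed to `isModuleContextualKeyword` originates.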

@dyung
Collaborator

dyung commented Dec 19, 2025

Given the issues with this patch, unless some fixes are coming in soon, I also feel we need to revert this.

@qinkunbao
Member

Hi,

+1 to reverting this PR. Could you please take a look at the buildbot failures?

https://lab.llvm.org/buildbot/#/builders/94/builds/13727
https://lab.llvm.org/buildbot/#/builders/169/builds/18192

@yronglin
Contributor Author

https://lab.llvm.org/buildbot/#/builders/94/builds/13727

Hi, thanks for reporting this. Do you know how to reproduce this sanitizer build locally?

@yronglin
Contributor Author

It looks like this change has significant compile time impact (+0.5% at O0): https://llvm-compile-time-tracker.com/compare.php?from=469e5245987b3250a9ca675275d59c7c5d6bbb00&to=d2e62d902438bb5860f2376e818d797bf20daa7d&stat=instructions:u

Is that expected?

I suspect the compilation time increased because we've added more code to the Lexer, which could affect the critical path. Thank you for reporting this issue; I will continue to monitor it and try to reduce the compilation time.

@aemerson
Contributor

I know we don't have any official policy about this, but I'd like to point out that it's the holiday season for many people. Landing such a big patch that could have some teething problems (which I'm not criticizing; that's a normal part of development) today, on a Friday, when a lot of folks will not be returning until the new year, isn't ideal.

At Apple, and I'd guess in most other organizations, there'll be a skeleton crew trying to keep things ticking along over the break, and we'd appreciate some stability while there are few eyes on quality. Just m2c.

valadaptive pushed a commit to valadaptive/llvm-project that referenced this pull request Dec 24, 2025
…#107168)

yronglin added a commit that referenced this pull request Dec 25, 2025
…ery" (#173130)

This PR reapplies #107168.

---------

Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Dec 25, 2025
…ency Discovery" (#173130)

This PR reapplies llvm/llvm-project#107168.

---------

Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jan 6, 2026
…#107168)

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jan 6, 2026
…ery" (llvm#173130)

This PR reapplies llvm#107168.

---------

Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
navaneethshan pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request Jan 8, 2026
…ery" (#173130)

This PR reapplies llvm/llvm-project#107168.

---------

Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
navaneethshan pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request Jan 9, 2026
…ery" (#173130)

This PR reapplies llvm/llvm-project#107168.

---------

Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>

(cherry picked from commit 0d1c396)

Labels

clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category


Development

Successfully merging this pull request may close these issues.

Clang C++20 Feature: P1857R3 - Modules Dependency Discovery