
[clang] Optimize Lexer hot path to reduce compile time#177153

Merged
yronglin merged 4 commits into llvm:main from yronglin:optimize_cpp_compile_time
Feb 16, 2026

Conversation

@yronglin
Contributor

@yronglin yronglin commented Jan 21, 2026

This patch fixes the compile-time regression introduced in #173789.

  • Introduce a TokenFlag::PhysicalStartOfLine flag to replace IsAtPhysicalStartOfLine in a bunch of Lexer member functions, and remove the ExportContextualKeywordInfo struct.
  • Handle the import, module, and export keywords in HandleIdentifier instead of in a Lexer hot path.
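
As a rough illustration of the two bullets above, the sketch below stores the "physical start of line" fact as a bit on the token and tests the module keywords only for identifiers that were marked in advance. Every type and function here (`SketchToken`, `SketchIdentifierInfo`, `formIdentifier`, `isDirectiveIntroducer`) is hypothetical and is not clang's API; only the flag name `PhysicalStartOfLine` comes from the description. The real logic (for example, the token that follows `export`) is more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an identifier table entry; not clang's
// IdentifierInfo. NeedsHandling is set once when the identifier is
// registered, so the per-token check is a single boolean test.
struct SketchIdentifierInfo {
  std::string Name;
  bool NeedsHandling;
};

// Hypothetical stand-in for a lexed token; not clang's Token.
struct SketchToken {
  enum Flag : std::uint16_t {
    StartOfLine = 1u << 0,
    PhysicalStartOfLine = 1u << 1, // name taken from the PR description
  };

  std::uint16_t Flags = 0;
  const SketchIdentifierInfo *II = nullptr;

  void setFlag(Flag F) { Flags |= F; }
  bool isAtPhysicalStartOfLine() const {
    return (Flags & PhysicalStartOfLine) != 0;
  }
};

// The lexer stamps the bit once when it forms the token, so later code
// no longer needs a separate bool parameter threaded through each call.
inline SketchToken formIdentifier(const SketchIdentifierInfo &II,
                                  bool AtPhysicalStartOfLine) {
  SketchToken Tok;
  Tok.II = &II;
  if (AtPhysicalStartOfLine)
    Tok.setFlag(SketchToken::PhysicalStartOfLine);
  return Tok;
}

// Simplified identifier handler: ordinary identifiers leave on the first
// test; only pre-marked keywords reach the start-of-line check.
inline bool isDirectiveIntroducer(const SketchToken &Tok) {
  if (!Tok.II || !Tok.II->NeedsHandling)
    return false;
  return Tok.isAtPhysicalStartOfLine();
}
```

In this sketch, a token built with `formIdentifier(Import, true)` for an `Import` entry marked `NeedsHandling` is reported as a directive introducer, while the same identifier in the middle of a line is not.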

@yronglin yronglin requested review from cor3ntin and nikic January 21, 2026 12:37
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jan 21, 2026
@llvmbot
Member

llvmbot commented Jan 21, 2026

@llvm/pr-subscribers-clang

Author: None (yronglin)

Changes

This patch fixes the compile-time regression introduced in #173789. The regression was fixed by adding an early-exit optimization in the hot Lexer path.


Full diff: https://github.com/llvm/llvm-project/pull/177153.diff

2 Files Affected:

  • (modified) clang/include/clang/Lex/Preprocessor.h (+12)
  • (modified) clang/lib/Lex/Lexer.cpp (+14-6)
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index 5adc45a19ca79..f286e0d8bb348 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -1871,6 +1871,18 @@ class Preprocessor {
   /// read is the correct one.
   bool HandleModuleContextualKeyword(Token &Result,
                                      bool TokAtPhysicalStartOfLine);
+  /// Quick check whether current token at physical start of line or previous
+  /// export tok was at physical start of line. This is used as an early-exit
+  /// optimization in the hot Lexer::Lex path.
+  //
+  // Returns true if the current token could potentially be a module directive
+  // introducer.
+  bool isModuleDirectiveIntroducerAtPhysicalStartOfLine(
+      bool TokAtPhysicalStartOfLine) {
+    return TokAtPhysicalStartOfLine ||
+           (LastTokenWasExportKeyword.isValid() &&
+            LastTokenWasExportKeyword.isAtPhysicalStartOfLine());
+  }
 
   /// Get the start location of the first pp-token in main file.
   SourceLocation getMainFileFirstPPTokenLoc() const {
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index 2c4ba70551fab..c10ca8925586e 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -4058,10 +4058,14 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // so it's safe to access member variables after this call returns.
     bool returnedToken = LexIdentifierContinue(Result, CurPtr);
 
-    if (returnedToken && !LexingRawMode && !Is_PragmaLexer &&
-        !ParsingPreprocessorDirective && LangOpts.CPlusPlusModules &&
-        Result.isModuleContextualKeyword() &&
-        PP->HandleModuleContextualKeyword(Result, TokAtPhysicalStartOfLine))
+    if (LLVM_UNLIKELY(returnedToken && !LexingRawMode && !Is_PragmaLexer &&
+                      !ParsingPreprocessorDirective &&
+                      LangOpts.CPlusPlusModules &&
+                      PP->isModuleDirectiveIntroducerAtPhysicalStartOfLine(
+                          TokAtPhysicalStartOfLine) &&
+                      Result.isModuleContextualKeyword() &&
+                      PP->HandleModuleContextualKeyword(
+                          Result, TokAtPhysicalStartOfLine)))
       goto HandleDirective;
     return returnedToken;
   }
@@ -4637,8 +4641,12 @@ bool Lexer::LexDependencyDirectiveToken(Token &Result) {
     Result.setRawIdentifierData(TokPtr);
     if (!isLexingRawMode()) {
       const IdentifierInfo *II = PP->LookUpIdentifierInfo(Result);
-      if (LangOpts.CPlusPlusModules && Result.isModuleContextualKeyword() &&
-          PP->HandleModuleContextualKeyword(Result, Result.isAtStartOfLine())) {
+      if (LLVM_UNLIKELY(LangOpts.CPlusPlusModules &&
+                        PP->isModuleDirectiveIntroducerAtPhysicalStartOfLine(
+                            Result.isAtStartOfLine()) &&
+                        Result.isModuleContextualKeyword() &&
+                        PP->HandleModuleContextualKeyword(
+                            Result, Result.isAtStartOfLine()))) {
         PP->HandleDirective(Result);
         return false;
       }
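
The guard added in the diff above depends on short-circuit evaluation: a cheap predicate is placed ahead of the costlier calls, so most identifiers never reach them. A minimal sketch of that ordering follows. It assumes a hypothetical `SketchState`, a counter that only makes the effect observable, and a local `SKETCH_UNLIKELY` macro standing in for `LLVM_UNLIKELY`; none of these names are part of clang.

```cpp
#include <cassert>
#include <string>

// Local stand-in for a branch hint macro (an assumption of this sketch).
#if defined(__GNUC__)
#define SKETCH_UNLIKELY(X) __builtin_expect(static_cast<bool>(X), false)
#else
#define SKETCH_UNLIKELY(X) (X)
#endif

// Hypothetical preprocessor state; not clang's Preprocessor.
struct SketchState {
  bool LastExportAtPhysicalStartOfLine = false;
  unsigned SlowPathCalls = 0; // counts how often the costly check runs
};

// Cheap early-exit predicate, mirroring the shape of the helper in the
// diff: true if this token, or a preceding export token, started a
// physical line.
inline bool mayIntroduceDirective(const SketchState &S,
                                  bool TokAtPhysicalStartOfLine) {
  return TokAtPhysicalStartOfLine || S.LastExportAtPhysicalStartOfLine;
}

// Stand-in for the more expensive keyword handling.
inline bool handleKeywordSlow(SketchState &S, const std::string &Spelling) {
  ++S.SlowPathCalls;
  return Spelling == "import" || Spelling == "module" ||
         Spelling == "export";
}

// The cheap predicate comes first, so && skips the slow call whenever
// the token cannot introduce a directive.
inline bool shouldHandleAsDirective(SketchState &S,
                                    const std::string &Spelling,
                                    bool TokAtPhysicalStartOfLine) {
  return SKETCH_UNLIKELY(
      mayIntroduceDirective(S, TokAtPhysicalStartOfLine) &&
      handleKeywordSlow(S, Spelling));
}
```

With this ordering, an identifier in the middle of a line costs a single boolean test in the sketch, and `SlowPathCalls` stays unchanged for it.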

@nikic
Contributor

nikic commented Jan 21, 2026

@yronglin
Contributor Author

…ontextual keyword in HandleIdentifier

Signed-off-by: yronglin <yronglin777@gmail.com>
@yronglin yronglin force-pushed the optimize_cpp_compile_time branch from 528d0a6 to 0e6ee2c Compare February 12, 2026 03:28
@github-actions

github-actions bot commented Feb 12, 2026

✅ With the latest revision this PR passed the C/C++ code formatter.

Signed-off-by: yronglin <yronglin777@gmail.com>
@yronglin
Contributor Author

@nikic Could you help verify that this PR reduces the compile time? Many thanks!

@nikic
Contributor

nikic commented Feb 12, 2026

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=680124ca9a06bd7413cf4cc1dab80f929c7d0bca&to=00edb53c7e3e79eeab95f4f4990c6d9041cd2a76&stat=instructions:u

This doesn't recover the original regression, but is an improvement.

Contributor

@cor3ntin cor3ntin left a comment


Independently of the small performance improvement, I think the code is much cleaner this way. Thanks!

Signed-off-by: yronglin <yronglin777@gmail.com>
@github-actions

github-actions bot commented Feb 16, 2026

🐧 Linux x64 Test Results

  • 113547 tests passed
  • 4688 tests skipped

✅ The build succeeded and all tests passed.

@yronglin
Contributor Author

Thanks for the review! I'll land the patch first and look for further optimizations in future patches.

@yronglin yronglin merged commit badb215 into llvm:main Feb 16, 2026
10 checks passed
manasij7479 pushed a commit to manasij7479/llvm-project that referenced this pull request Feb 18, 2026
This patch fixes the compile-time regression introduced in
llvm#173789.
- Introduce a `TokenFlag::PhysicalStartOfLine` flag to replace
`IsAtPhysicalStartOfLine` in a bunch of `Lexer` member functions, and
remove the `ExportContextualKeywordInfo` struct.
- Handle the `import`, `module`, and `export` keywords in `HandleIdentifier`
instead of in a `Lexer` hot path.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category


5 participants