Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)"#173789
Reapply "[C++20][Modules] Implement P1857R3 Modules Dependency Discovery" (#173130)"#173789
Conversation
…ery" (llvm#173130)" Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
|
@llvm/pr-subscribers-clang-modules @llvm/pr-subscribers-clang Author: None (yronglin) ChangesThe patch reapply #173130. The previous patch has an use-after-free issue in Lexer::LexTokenInternal function. Since C++20, the Fix the use-after-free issue in Lexer. Also this is just a workaround; we should introduce a more reasonable mechanism to delay the destruction of bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
......
// C99 6.4.2: Identifiers.
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
case 'H': case 'I': case 'J': case 'K': /*'L'*/case 'M': case 'N':
case 'O': case 'P': case 'Q': /*'R'*/case 'S': case 'T': /*'U'*/
case 'V': case 'W': case 'X': case 'Y': case 'Z':
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
case 'o': case 'p': case 'q': case 'r': case 's': case 't': /*'u'*/
case 'v': case 'w': case 'x': case 'y': case 'z':
case '_': {
// Notify MIOpt that we read a non-whitespace/non-comment token.
MIOpt.ReadToken();
// The reason for saving and using these members here is that EOF may be
// encountered in LexIdentifierContinue and current Lexer might be
// destructed in HandleEndOfFile, If the code after LexIdentifierContinue
// try to access LangOps in this Lexer, it will hit undefined behavior.
//
// FIXME: we introduce a new mechanism to delay the destruction of CurLexer,
// at least until the current Lexer::Lex function ends.
bool isCPlusPlusModules = LangOpts.CPlusPlusModules;
bool lexingRawMode = LexingRawMode;
bool is_PragmaLexer = Is_PragmaLexer;
bool parsingPreprocessorDirective = ParsingPreprocessorDirective;
auto *ThePP = PP;
bool returnedToken = LexIdentifierContinue(Result, CurPtr);
if (returnedToken && !lexingRawMode && !is_PragmaLexer &&
!parsingPreprocessorDirective && isCPlusPlusModules &&
Result.isModuleContextualKeyword() &&
ThePP->HandleModuleContextualKeyword(Result, TokAtPhysicalStartOfLine))
goto HandleDirective;
return returnedToken;
}
......
}Patch is 143.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/173789.diff 44 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 2319ff13f7864..7bf003f700448 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -209,6 +209,7 @@ C++20 Feature Support
- Clang now normalizes constraints before checking whether they are satisfied, as mandated by the standard.
As a result, Clang no longer incorrectly diagnoses substitution failures in template arguments only
used in concept-ids, and produces better diagnostics for satisfaction failure. (#GH61811) (#GH135190)
+- Clang now supports `P1857R3 <https://wg21.link/p1857r3>`_ Modules Dependency Discovery. (#GH54047)
C++17 Feature Support
^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/docs/StandardCPlusPlusModules.rst b/clang/docs/StandardCPlusPlusModules.rst
index 71988d0fced98..f6ab17ede46fa 100644
--- a/clang/docs/StandardCPlusPlusModules.rst
+++ b/clang/docs/StandardCPlusPlusModules.rst
@@ -1384,33 +1384,6 @@ declarations which use it. Thus, the preferred name will not be displayed in
the debugger as expected. This is tracked by
`#56490 <https://github.com/llvm/llvm-project/issues/56490>`_.
-Don't emit macros about module declaration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This is covered by `P1857R3 <https://wg21.link/P1857R3>`_. It is mentioned here
-because we want users to be aware that we don't yet implement it.
-
-A direct approach to write code that can be compiled by both modules and
-non-module builds may look like:
-
-.. code-block:: c++
-
- MODULE
- IMPORT header_name
- EXPORT_MODULE MODULE_NAME;
- IMPORT header_name
- EXPORT ...
-
-The intent of this is that this file can be compiled like a module unit or a
-non-module unit depending on the definition of some macros. However, this usage
-is forbidden by P1857R3 which is not yet implemented in Clang. This means that
-is possible to write invalid modules which will no longer be accepted once
-P1857R3 is implemented. This is tracked by
-`#54047 <https://github.com/llvm/llvm-project/issues/54047>`_.
-
-Until then, it is recommended not to mix macros with module declarations.
-
-
Inconsistent filename suffix requirement for importable module units
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index a72d3f37b1b72..77feea9f869e9 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -503,8 +503,8 @@ def warn_cxx98_compat_variadic_macro : Warning<
InGroup<CXX98CompatPedantic>, DefaultIgnore;
def ext_named_variadic_macro : Extension<
"named variadic macros are a GNU extension">, InGroup<VariadicMacros>;
-def err_embedded_directive : Error<
- "embedding a #%0 directive within macro arguments is not supported">;
+def err_embedded_directive : Error<"embedding a %select{#|C++ }0%1 directive "
+ "within macro arguments is not supported">;
def ext_embedded_directive : Extension<
"embedding a directive within macro arguments has undefined behavior">,
InGroup<DiagGroup<"embedded-directive">>;
@@ -998,6 +998,21 @@ def warn_module_conflict : Warning<
InGroup<ModuleConflict>;
// C++20 modules
+def err_pp_module_name_is_macro : Error<
+ "%select{module|partition}0 name component %1 cannot be a object-like macro">;
+def err_pp_module_expected_ident : Error<
+ "expected %select{identifier after '.' in |}0module name">;
+def err_pp_unexpected_tok_after_module_name : Error<
+ "unexpected preprocessing token '%0' after module name, "
+ "only ';' and '[' (start of attribute specifier sequence) are allowed">;
+def warn_pp_extra_tokens_at_module_directive_eol
+ : Warning<"extra tokens after semicolon in '%0' directive">,
+ InGroup<ExtraTokens>;
+def err_pp_module_decl_in_header
+ : Error<"module declaration must not come from an #include directive">;
+def err_pp_cond_span_module_decl
+ : Error<"module directive lines are not allowed on lines controlled "
+ "by preprocessor conditionals">;
def err_header_import_semi_in_macro : Error<
"semicolon terminating header import declaration cannot be produced "
"by a macro">;
diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 662fe16d965b6..83d4ce3ca278c 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1792,10 +1792,8 @@ def ext_bit_int : Extension<
} // end of Parse Issue category.
let CategoryName = "Modules Issue" in {
-def err_unexpected_module_decl : Error<
- "module declaration can only appear at the top level">;
-def err_module_expected_ident : Error<
- "expected a module name after '%select{module|import}0'">;
+def err_unexpected_module_or_import_decl : Error<
+ "%select{module|import}0 declaration can only appear at the top level">;
def err_attribute_not_module_attr : Error<
"%0 attribute cannot be applied to a module">;
def err_keyword_not_module_attr : Error<
@@ -1806,6 +1804,10 @@ def err_keyword_not_import_attr : Error<
"%0 cannot be applied to a module import">;
def err_module_expected_semi : Error<
"expected ';' after module name">;
+def err_expected_semi_after_module_or_import
+ : Error<"%0 directive must end with a ';'">;
+def note_module_declared_here : Note<
+ "%select{module|import}0 directive defined here">;
def err_global_module_introducer_not_at_start : Error<
"'module;' introducing a global module fragment can appear only "
"at the start of the translation unit">;
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 043c184323876..1131727ed23ee 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -231,6 +231,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
LLVM_PREFERRED_TYPE(bool)
unsigned IsModulesImport : 1;
+ // True if this is the 'module' contextual keyword.
+ LLVM_PREFERRED_TYPE(bool)
+ unsigned IsModulesDecl : 1;
+
// True if this is a mangled OpenMP variant name.
LLVM_PREFERRED_TYPE(bool)
unsigned IsMangledOpenMPVariantName : 1;
@@ -267,8 +271,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false),
IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false),
RevertedTokenID(false), OutOfDate(false), IsModulesImport(false),
- IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false),
- IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {}
+ IsModulesDecl(false), IsMangledOpenMPVariantName(false),
+ IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false),
+ IsKeywordInCpp(false) {}
public:
IdentifierInfo(const IdentifierInfo &) = delete;
@@ -569,12 +574,24 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
}
/// Determine whether this is the contextual keyword \c import.
- bool isModulesImport() const { return IsModulesImport; }
+ bool isImportKeyword() const { return IsModulesImport; }
/// Set whether this identifier is the contextual keyword \c import.
- void setModulesImport(bool I) {
- IsModulesImport = I;
- if (I)
+ void setKeywordImport(bool Val) {
+ IsModulesImport = Val;
+ if (Val)
+ NeedsHandleIdentifier = true;
+ else
+ RecomputeNeedsHandleIdentifier();
+ }
+
+ /// Determine whether this is the contextual keyword \c module.
+ bool isModuleKeyword() const { return IsModulesDecl; }
+
+ /// Set whether this identifier is the contextual keyword \c module.
+ void setModuleKeyword(bool Val) {
+ IsModulesDecl = Val;
+ if (Val)
NeedsHandleIdentifier = true;
else
RecomputeNeedsHandleIdentifier();
@@ -629,7 +646,7 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
void RecomputeNeedsHandleIdentifier() {
NeedsHandleIdentifier = isPoisoned() || hasMacroDefinition() ||
isExtensionToken() || isFutureCompatKeyword() ||
- isOutOfDate() || isModulesImport();
+ isOutOfDate() || isImportKeyword();
}
};
@@ -797,10 +814,11 @@ class IdentifierTable {
// contents.
II->Entry = &Entry;
- // If this is the 'import' contextual keyword, mark it as such.
+ // If this is the 'import' or 'module' contextual keyword, mark it as such.
if (Name == "import")
- II->setModulesImport(true);
-
+ II->setKeywordImport(true);
+ else if (Name == "module")
+ II->setModuleKeyword(true);
return *II;
}
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 3d955095b07a8..a3d286fdb81a7 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -133,6 +133,11 @@ PPKEYWORD(pragma)
// C23 & C++26 #embed
PPKEYWORD(embed)
+// C++20 Module Directive
+PPKEYWORD(module)
+PPKEYWORD(__preprocessed_module)
+PPKEYWORD(__preprocessed_import)
+
// GNU Extensions.
PPKEYWORD(import)
PPKEYWORD(include_next)
@@ -1030,6 +1035,9 @@ ANNOTATION(module_include)
ANNOTATION(module_begin)
ANNOTATION(module_end)
+// Annotations for C++, Clang and Objective-C named modules.
+ANNOTATION(module_name)
+
// Annotation for a header_name token that has been looked up and transformed
// into the name of a header unit.
ANNOTATION(header_unit)
diff --git a/clang/include/clang/Basic/TokenKinds.h b/clang/include/clang/Basic/TokenKinds.h
index a801113c57715..c0316257d9d97 100644
--- a/clang/include/clang/Basic/TokenKinds.h
+++ b/clang/include/clang/Basic/TokenKinds.h
@@ -76,6 +76,10 @@ const char *getPunctuatorSpelling(TokenKind Kind) LLVM_READNONE;
/// tokens like 'int' and 'dynamic_cast'. Returns NULL for other token kinds.
const char *getKeywordSpelling(TokenKind Kind) LLVM_READNONE;
+/// Determines the spelling of simple Objective-C keyword tokens like '@import'.
+/// Returns NULL for other token kinds.
+const char *getObjCKeywordSpelling(ObjCKeywordKind Kind) LLVM_READNONE;
+
/// Returns the spelling of preprocessor keywords, such as "else".
const char *getPPKeywordSpelling(PPKeywordKind Kind) LLVM_READNONE;
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index ded5f55d180aa..6db1be7f0ed58 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -905,7 +905,7 @@ class CompilerInstance : public ModuleLoader {
/// load it.
ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName,
SourceLocation ImportLoc,
- SourceLocation ModuleNameLoc,
+ SourceRange ModuleNameRange,
bool IsInclusionDirective);
/// Creates a \c CompilerInstance for compiling a module.
diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h
index bd3e05a36bb33..2ef29743415ae 100644
--- a/clang/include/clang/Lex/CodeCompletionHandler.h
+++ b/clang/include/clang/Lex/CodeCompletionHandler.h
@@ -13,12 +13,15 @@
#ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
#define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/SourceLocation.h"
#include "llvm/ADT/StringRef.h"
namespace clang {
class IdentifierInfo;
class MacroInfo;
+using ModuleIdPath = ArrayRef<IdentifierLoc>;
/// Callback handler that receives notifications when performing code
/// completion within the preprocessor.
@@ -70,6 +73,11 @@ class CodeCompletionHandler {
/// file where we expect natural language, e.g., a comment, string, or
/// \#error directive.
virtual void CodeCompleteNaturalLanguage() { }
+
+ /// Callback invoked when performing code completion inside the module name
+ /// part of an import directive.
+ virtual void CodeCompleteModuleImport(SourceLocation ImportLoc,
+ ModuleIdPath Path) {}
};
}
diff --git a/clang/include/clang/Lex/DependencyDirectivesScanner.h b/clang/include/clang/Lex/DependencyDirectivesScanner.h
index f9fec3998ca53..b21da166a96e5 100644
--- a/clang/include/clang/Lex/DependencyDirectivesScanner.h
+++ b/clang/include/clang/Lex/DependencyDirectivesScanner.h
@@ -135,6 +135,22 @@ void printDependencyDirectivesAsSource(
ArrayRef<dependency_directives_scan::Directive> Directives,
llvm::raw_ostream &OS);
+/// Scan an input source buffer for C++20 named module usage.
+///
+/// \param Source The input source buffer.
+///
+/// \returns true if any C++20 named modules related directive was found.
+bool scanInputForCXX20ModulesUsage(StringRef Source);
+
+/// Scan an input source buffer, and check whether the input source is a
+/// preprocessed output.
+///
+/// \param Source The input source buffer.
+///
+/// \returns true if any '__preprocessed_module' or '__preprocessed_import'
+/// directive was found.
+bool isPreprocessedModuleFile(StringRef Source);
+
/// Functor that returns the dependency directives for a given file.
class DependencyDirectivesGetter {
public:
diff --git a/clang/include/clang/Lex/ModuleLoader.h b/clang/include/clang/Lex/ModuleLoader.h
index a58407200c41c..042a5ab1f4a57 100644
--- a/clang/include/clang/Lex/ModuleLoader.h
+++ b/clang/include/clang/Lex/ModuleLoader.h
@@ -159,6 +159,7 @@ class ModuleLoader {
/// \returns Returns true if any modules with that symbol found.
virtual bool lookupMissingImports(StringRef Name,
SourceLocation TriggerLoc) = 0;
+ static std::string getFlatNameFromPath(ModuleIdPath Path);
bool HadFatalFailure = false;
};
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index b1c648e647f41..c8356b1dd45e4 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -48,6 +48,7 @@
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/Registry.h"
+#include "llvm/Support/TrailingObjects.h"
#include <cassert>
#include <cstddef>
#include <cstdint>
@@ -136,6 +137,64 @@ struct CXXStandardLibraryVersionInfo {
std::uint64_t Version;
};
+/// Record the previous 'export' keyword info.
+///
+/// Since P1857R3, the standard introduced several rules to determine whether
+/// the 'module', 'export module', 'import', 'export import' is a valid
+/// directive introducer. This class is used to record the previous 'export'
+/// keyword token, and then handle 'export module' and 'export import'.
+class ExportContextualKeywordInfo {
+ Token ExportTok;
+ bool AtPhysicalStartOfLine = false;
+
+public:
+ ExportContextualKeywordInfo() = default;
+ ExportContextualKeywordInfo(const Token &Tok, bool AtPhysicalStartOfLine)
+ : ExportTok(Tok), AtPhysicalStartOfLine(AtPhysicalStartOfLine) {}
+
+ bool isValid() const { return ExportTok.is(tok::kw_export); }
+ bool isAtPhysicalStartOfLine() const { return AtPhysicalStartOfLine; }
+ Token getExportTok() const { return ExportTok; }
+ void reset() {
+ ExportTok.startToken();
+ AtPhysicalStartOfLine = false;
+ }
+};
+
+class ModuleNameLoc final
+ : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> {
+ friend TrailingObjects;
+ unsigned NumIdentifierLocs;
+ unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const {
+ return getNumIdentifierLocs();
+ }
+
+ ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) {
+ (void)llvm::copy(Path, getTrailingObjectsNonStrict<IdentifierLoc>());
+ }
+
+public:
+ static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path);
+ unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; }
+ ModuleIdPath getModuleIdPath() const {
+ return {getTrailingObjectsNonStrict<IdentifierLoc>(),
+ getNumIdentifierLocs()};
+ }
+
+ SourceLocation getBeginLoc() const {
+ return getModuleIdPath().front().getLoc();
+ }
+ SourceLocation getEndLoc() const {
+ auto &Last = getModuleIdPath().back();
+ return Last.getLoc().getLocWithOffset(
+ Last.getIdentifierInfo()->getLength());
+ }
+ SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; }
+ std::string str() const {
+ return ModuleLoader::getFlatNameFromPath(getModuleIdPath());
+ }
+};
+
/// Engages in a tight little dance with the lexer to efficiently
/// preprocess tokens.
///
@@ -339,8 +398,9 @@ class Preprocessor {
/// lexed, if any.
SourceLocation ModuleImportLoc;
- /// The import path for named module that we're currently processing.
- SmallVector<IdentifierLoc, 2> NamedModuleImportPath;
+ /// The source location of the \c module contextual keyword we just
+ /// lexed, if any.
+ SourceLocation ModuleDeclLoc;
llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints;
unsigned CheckPointCounter = 0;
@@ -351,6 +411,12 @@ class Preprocessor {
/// Whether the last token we lexed was an '@'.
bool LastTokenWasAt = false;
+ /// Whether we're importing a standard C++20 named Modules.
+ bool ImportingCXXNamedModules = false;
+
+ /// Whether the last token we lexed was an 'export' keyword.
+ ExportContextualKeywordInfo LastTokenWasExportKeyword;
+
/// First pp-token source location in current translation unit.
SourceLocation FirstPPTokenLoc;
@@ -562,9 +628,9 @@ class Preprocessor {
reset();
}
- void handleIdentifier(IdentifierInfo *Identifier) {
- if (isModuleCandidate() && Identifier)
- Name += Identifier->getName().str();
+ void handleModuleName(ModuleNameLoc *NameLoc) {
+ if (isModuleCandidate() && NameLoc)
+ Name += NameLoc->str();
else if (!isNamedModule())
reset();
}
@@ -576,13 +642,6 @@ class Preprocessor {
reset();
}
- void handlePeriod() {
- if (isModuleCandidate())
- Name += ".";
- else if (!isNamedModule())
- reset();
- }
-
void handleSemi() {
if (!Name.empty() && isModuleCandidate()) {
if (State == InterfaceCandidate)
@@ -639,10 +698,6 @@ class Preprocessor {
ModuleDeclSeq ModuleDeclState;
- /// Whether the module import expects an identifier next. Otherwise,
- /// it expects a '.' or ';'.
- bool ModuleImportExpectsIdentifier = false;
-
/// The identifier and source location of the currently-active
/// \#pragma clang arc_cf_code_audited begin.
IdentifierLoc PragmaARCCFCodeAuditedInfo;
@@ -1125,6 +1180,9 @@ class Preprocessor {
/// Whether tokens are being skipped until the through header is seen.
bool SkippingUntilPCHThroughHeader = false;
+ /// Whether the main file is preprocessed module file.
+ bool MainFileIsPreprocessedModuleFile = false;
+
/// \{
/// Cache of macro expanders to reduce malloc traffic.
enum { TokenLexerCacheSize = 8 };
@@ -1778,6 +1836,36 @@ class Preprocessor {
std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current,
bool ForHasEmbed);
+ /// Whether the main file is preprocessed module file.
+ bool isPreprocessedModuleFile() const {
+ return MainFileIsPreprocessedModuleFile;
+ }
+
+ /// Mark the main file as a preprocessed module file, then the 'module' and
+ /// 'import' directive recognition will be suppressed. Only
+ /// '__preprocessed_moduke' and '__preprocessed_import' are allowed.
+ void markMainFileAsPreprocessedModuleFile() {
+ MainFileIsPreprocessedModuleFile = true;
+ }
+
+ bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+ SmallVectorImpl<Token> &Suffix,
+ SmallVectorImpl<IdentifierLoc> &Path,
+ bool AllowMacroExpansion = true,
+ bool IsPartition = false);
+ void EnterModuleSuffixTokenStream(ArrayRef<Token> Toks);
+ void HandleCXXImportDirective(Token Import);
+ void HandleCXXModuleDirective(Token Module);
+
+ /// Callback invoked when the lexer sees one of export, import or module token
+ /// at the start of a line.
+ ///
+ /// This consumes the...
[truncated]
|
|
Clang 22 will branch in less than a week (13th of Jan). It would be nice if this PR could still make it in its third attempt |
|
Given that this failed to land twice, I think it needs more time to bake in |
|
The fact that it landed twice already could also be interpreted as having already flushed out most potential issues. And if it's still turns out to be a major problem, it's still possible to revert during the RC phase (but the opposite is not true, i.e. landing it during that time) |
I don't think the feature is so important. It has been reverted twice doesn't mean it is stable now. It shouldn't matter to land it later. |
|
+1 to not landing until after the branch point; having more bake time (particularly for something that's been reverted a few times) is not a bad thing. |
Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
|
Thanks for the review! I have found the root case and the memory issue was fixed by last 2 commits. I have run asan local, and it's no Lexer related failures. Could you help review this approach make sense? |
h-vetinari
left a comment
There was a problem hiding this comment.
now that clang 22 has branched off
Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
|
Lets land this this week, so it doesn't miss clang 23 :) Thanks for the very hard work! |
|
Thanks for the review! |
…ery" (llvm#173130)" (llvm#173789) The patch reapply llvm#173130. This patch implement the following papers: [P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3). [P3034R1 Module Declarations Shouldn’t be Macros](https://wg21.link/P3034R1). [CWG2947](https://cplusplus.github.io/CWG/issues/2947.html). At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff: - After skipping horizontal whitespace are - at the start of a logical line, or - preceded by an export at the start of the logical line. - Are followed by an identifier pp token (before macro expansion), or - <, ", or : (but not ::) pp tokens for import, or - ; for module Otherwise the token is treated as an identifier. Additionally: - The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include. - The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion. - A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.) - Preprocessor conditionals shall not span a module declaration. After this patch, we handle C++ module-import and module-declaration as a real pp-directive in preprocessor. Additionally, we refactor module name lexing, remove the complex state machine and read full module name during module/import directive handling. Possibly we can introduce a tok::annot_module_name token in the future, avoid duplicatly parsing module name in both preprocessor and parser, but it's makes error recovery much diffcult(eg. import a; import b; in same line). This patch also introduce 2 new keyword `__preprocessed_module` and `__preprocessed_import`. These 2 keyword was generated during `-E` mode. This is useful to avoid confusion with `module` and `import` keyword in preprocessed output: ```cpp export module m; struct import {}; #define EMPTY EMPTY import foo; ``` Fixes llvm#54047 The previous patch has an use-after-free issue in Lexer::LexTokenInternal function. Since C++20, the `export`, `import` and `module` identifiers may be an introducer of a C++ module declaration/importing directive, and the directive will handled in `LexIdentifierContinue`. Unfortunately, the EOF may be encountered in `LexIdentifierContinue` and `CurLexer` might be destructed in `HandleEndOfFile`, If the code after `LexIdentifierContinue` try to access `LangOps` or other class members in this Lexer, it will hit undefined behavior. This patch also fix the use-after-free issue in Lexer by introduce a mechanism to delay the destruction of `CurLexer` in `Preprocessor` class. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
|
FYI this patch still causes compile-time regressions: https://llvm-compile-time-tracker.com/compare.php?from=3ca2a5fc0b84762f0e7d8a0e613fd69f7e344219&to=1da403937eb9f48b2de9c27ba1aa0eba50bfdf5f&stat=instructions:u |
Thanks, I'll try to reduce the compile-time |
Mark [P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3) as implemented. This paper was implemented in #173789. Signed-off-by: yronglin <yronglin777@gmail.com>
Mark [P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3) as implemented. This paper was implemented in llvm/llvm-project#173789. Signed-off-by: yronglin <yronglin777@gmail.com>
…ery" (llvm#173130)" (llvm#173789) The patch reapply llvm#173130. This patch implement the following papers: [P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3). [P3034R1 Module Declarations Shouldn’t be Macros](https://wg21.link/P3034R1). [CWG2947](https://cplusplus.github.io/CWG/issues/2947.html). At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff: - After skipping horizontal whitespace are - at the start of a logical line, or - preceded by an export at the start of the logical line. - Are followed by an identifier pp token (before macro expansion), or - <, ", or : (but not ::) pp tokens for import, or - ; for module Otherwise the token is treated as an identifier. Additionally: - The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include. - The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion. - A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.) - Preprocessor conditionals shall not span a module declaration. After this patch, we handle C++ module-import and module-declaration as a real pp-directive in preprocessor. Additionally, we refactor module name lexing, remove the complex state machine and read full module name during module/import directive handling. Possibly we can introduce a tok::annot_module_name token in the future, avoid duplicatly parsing module name in both preprocessor and parser, but it's makes error recovery much diffcult(eg. import a; import b; in same line). This patch also introduce 2 new keyword `__preprocessed_module` and `__preprocessed_import`. These 2 keyword was generated during `-E` mode. This is useful to avoid confusion with `module` and `import` keyword in preprocessed output: ```cpp export module m; struct import {}; #define EMPTY EMPTY import foo; ``` Fixes llvm#54047 The previous patch has an use-after-free issue in Lexer::LexTokenInternal function. Since C++20, the `export`, `import` and `module` identifiers may be an introducer of a C++ module declaration/importing directive, and the directive will handled in `LexIdentifierContinue`. Unfortunately, the EOF may be encountered in `LexIdentifierContinue` and `CurLexer` might be destructed in `HandleEndOfFile`, If the code after `LexIdentifierContinue` try to access `LangOps` or other class members in this Lexer, it will hit undefined behavior. This patch also fix the use-after-free issue in Lexer by introduce a mechanism to delay the destruction of `CurLexer` in `Preprocessor` class. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
Mark [P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3) as implemented. This paper was implemented in llvm#173789. Signed-off-by: yronglin <yronglin777@gmail.com>
Mark [P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3) as implemented. This paper was implemented in llvm#173789. Signed-off-by: yronglin <yronglin777@gmail.com>
Mark [P1857R3 Modules Dependency Discovery](https://wg21.link/p1857r3) as implemented. This paper was implemented in llvm/llvm-project#173789. Signed-off-by: yronglin <yronglin777@gmail.com>
This patch fix this compile time regression that introduced in #173789. - Introduce a `TokenFlag::PhysicalStartOfLine` flag to replace `IsAtPhysicalStartOfLine` in a brunch of `Lexer` member functions and remove `ExportContextualKeywordInfo` struct. - Handle `import`, `module` and `export` keyword in `HandleIdentifier` instead of in a `Lexer` hot path. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
…77153) This patch fix this compile time regression that introduced in llvm/llvm-project#173789. - Introduce a `TokenFlag::PhysicalStartOfLine` flag to replace `IsAtPhysicalStartOfLine` in a brunch of `Lexer` member functions and remove `ExportContextualKeywordInfo` struct. - Handle `import`, `module` and `export` keyword in `HandleIdentifier` instead of in a `Lexer` hot path. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
This patch fix this compile time regression that introduced in llvm#173789. - Introduce a `TokenFlag::PhysicalStartOfLine` flag to replace `IsAtPhysicalStartOfLine` in a brunch of `Lexer` member functions and remove `ExportContextualKeywordInfo` struct. - Handle `import`, `module` and `export` keyword in `HandleIdentifier` instead of in a `Lexer` hot path. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
| // Consume the pp-import-suffix and expand any macros in it now. We'll add | ||
| // it back into the token stream later. | ||
| CollectPPImportSuffix(DirToks); | ||
| End = DirToks.back().getLocation(); |
There was a problem hiding this comment.
This is effectively dead code, following this block the next if/else statement sets End in both branches.
As far as I can tell the first assignment to End is also dead.
We should move the declaration of End down below here and set it using an IIFE to avoid having to declare it and then set it.
The patch reapply #173130.
This patch implement the following papers:
P1857R3 Modules Dependency Discovery.
P3034R1 Module Declarations Shouldn’t be Macros.
CWG2947.
At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:
Otherwise the token is treated as an identifier.
Additionally:
After this patch, we handle C++ module-import and module-declaration as a real pp-directive in preprocessor. Additionally, we refactor module name lexing, remove the complex state machine and read full module name during module/import directive handling. Possibly we can introduce a tok::annot_module_name token in the future, avoid duplicatly parsing module name in both preprocessor and parser, but it's makes error recovery much diffcult(eg. import a; import b; in same line).
This patch also introduce 2 new keyword
__preprocessed_moduleand__preprocessed_import. These 2 keyword was generated during-Emode. This is useful to avoid confusion withmoduleandimportkeyword in preprocessed output:Fixes #54047
The previous patch has an use-after-free issue in Lexer::LexTokenInternal function. Since C++20, the
export,importandmoduleidentifiers may be an introducer of a C++ module declaration/importing directive, and the directive will handled inLexIdentifierContinue. Unfortunately, the EOF may be encountered inLexIdentifierContinueandCurLexermight be destructed inHandleEndOfFile, If the code afterLexIdentifierContinuetry to accessLangOpsor other class members in this Lexer, it will hit undefined behavior.This patch also fix the use-after-free issue in Lexer by introduce a mechanism to delay the destruction of
CurLexerinPreprocessorclass.