From aa0df91bad545448decdcf01caefe3dde2c77df3 Mon Sep 17 00:00:00 2001
From: Theresa Foley
Date: Wed, 6 Aug 2025 13:21:23 -0700
Subject: [PATCH] Enable on-demand deserialization of AST decls

Overview
--------

This change basically just flips a `#define` switch to enable the changes that were already checked in with PR #7482. That earlier change added the infrastructure required to do on-demand deserialization, but it couldn't be enabled at the time due to problematic interactions with the approach to AST node deduplication that was in place.

PR #8072 introduced a new approach to AST node deduplication that eliminates the problematic interaction, and thus unblocks this feature.

Impact
------

Let's look at some anecdotal performance numbers, collected on my dev box using a `hello-world.exe` from a Release x64 Windows build.

The key performance stats from a build before this change are:

```
[*] loadBuiltinModule           1    254.29ms
[*] checkAllTranslationUnits    1      6.14ms
```

After this change, we see:

```
[*] loadBuiltinModule           1     91.75ms
[*] checkAllTranslationUnits    1     11.40ms
```

This change reduces the time spent in `loadBuiltinModule()` by just over 162ms, and increases the time spent in `checkAllTranslationUnits()` by about 5.25ms (the time spent in other compilation steps seems to be unaffected).

Because `loadBuiltinModule()` is the most expensive step for trivial one-and-done compiles like this, reducing its execution time by over 60% is a big gain.

For this example, the time spent in `checkAllTranslationUnits()` has almost doubled, due to operations that force AST declarations from the core module to be deserialized. Note, however, that in cases where multiple modules are compiled using the same global session, that extra work should eventually amortize out, because each declaration from the core module can only be demand-loaded once (after which the in-memory version will be used).
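
To make the amortization argument concrete, here is a minimal sketch of the load-once pattern. It is not the code in `slang-serialize-ast.cpp`: `Decl`, `LazyDeclTable`, `addSerialized()`, `findDecl()`, and `decodeCount()` are hypothetical names invented for illustration, and a string copy stands in for the real deserialization work.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a deserialized AST declaration.
struct Decl
{
    std::string name;
};

// Hypothetical table whose entries are decoded only when first requested.
class LazyDeclTable
{
public:
    // Register the serialized form of a declaration without decoding it.
    void addSerialized(const std::string& name, const std::string& payload)
    {
        _serialized[name] = payload;
    }

    // Return the declaration, decoding it on the first request only.
    const Decl* findDecl(const std::string& name)
    {
        auto loaded = _loaded.find(name);
        if (loaded != _loaded.end())
            return loaded->second.get(); // already in memory: no decode cost

        auto serialized = _serialized.find(name);
        if (serialized == _serialized.end())
            return nullptr; // unknown declaration

        // Stand-in for the (expensive) deserialization step.
        auto decl = std::make_unique<Decl>();
        decl->name = serialized->second;
        ++_decodeCount;

        const Decl* result = decl.get();
        _loaded.emplace(name, std::move(decl));
        return result;
    }

    // Number of declarations that have been decoded so far.
    std::size_t decodeCount() const { return _decodeCount; }

    // Number of declarations available in serialized form.
    std::size_t totalCount() const { return _serialized.size(); }

private:
    std::unordered_map<std::string, std::string> _serialized;
    std::unordered_map<std::string, std::unique_ptr<Decl>> _loaded;
    std::size_t _decodeCount = 0;
};
```

In this sketch, a second `findDecl()` call for the same name returns the cached pointer and leaves `decodeCount()` unchanged, which is the property the amortization claim relies on. The ratio of `decodeCount()` to `totalCount()` corresponds to the "fraction of declarations deserialized" figure discussed in this message.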

Because of some unrelated design choices in the compiler, loading of the core module causes approximately 17% of its top-level declarations to be demand-loaded. After compiling the code for the `hello-world` example, approximately 20% of the top-level declarations have been demand-loaded. Further work could be done to reduce the number of core-module declarations that must always be deserialized, potentially reducing the time spent in `loadBuiltinModule()` further. The data above also implies that `loadBuiltinModule()` may include large fixed overheads, which should also be scrutinized further.

Relationship to PR #7935
------------------------

PR #7935, which at this time hasn't yet been merged, implements several optimizations to overall deserialization performance. On a branch with those optimizations in place (but not this change), the corresponding timings are:

```
[*] loadBuiltinModule           1    176.62ms
[*] checkAllTranslationUnits    1      6.04ms
```

It remains to be seen how performance fares when this change and the optimizations in PR #7935 are combined. In principle, the two approaches are orthogonal, each attacking a different aspect of the performance problem. We thus expect the combination of the two to be better than either alone but, of course, testing will be required.
---
 source/slang/slang-serialize-ast.cpp | 45 ++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/source/slang/slang-serialize-ast.cpp b/source/slang/slang-serialize-ast.cpp
index 261436e38a2..a288b3bd2b7 100644
--- a/source/slang/slang-serialize-ast.cpp
+++ b/source/slang/slang-serialize-ast.cpp
@@ -13,8 +13,49 @@
 //
 #include "slang-serialize-ast.cpp.fiddle"
 
+// By default, the declarations in a serialized AST module will be
+// deserialized on-demand, in order to improve startup times.
+//
+// The on-demand loading logic understandably introduces more
+// complexity, and it is possible that there will be debugging
+// (or even deployment) scenarios where it is desirable
+// to be sure that all the AST nodes for a given module are
+// fully deserialized by the time `readSerializedModuleAST()`
+// returns. For those cases, we provide a macro that can be
+// used to force up-front loading.
+//
+// Note: this macro does *not* disable most of the infrastructure
+// code related to on-demand loading; things like lookup on
+// a `ContainerDecl` will still check for the on-demand loading
+// case at runtime. All that setting this flag to `1` does is
+// extend the "fixup" logic that runs when an AST node has been
+// deserialized to also force deserialization of any direct
+// member declarations of a `ContainerDecl` that has just been
+// deserialized.
+//
+// The macro is being defined conditionally here, so that we
+// can later introduce a build setting to control its
+// value as part of configuration for the build of the compiler
+// itself (if that ever becomes relevant).
+//
+#ifndef SLANG_DISABLE_ON_DEMAND_AST_DESERIALIZATION
+#define SLANG_DISABLE_ON_DEMAND_AST_DESERIALIZATION 0
+#endif
+
+// In the case where on-demand deserialization is enabled, it
+// can be helpful to know what fraction of the declarations
+// from any given module end up getting deserialized (e.g.,
+// at the time this comment was written, compiling a small
+// `.slang` file typically causes about 17-20% of the
+// top-level declarations in the core module to get deserialized).
+//
+// Enabling this flag causes a message to be emitted every
+// time a new top-level declaration gets deserialized for *any*
+// module, so it generates a lot of output and is best seen
+// as just a debugging option for use when trying to reduce
+// the fraction of declarations that must be deserialized.
+//
 #define SLANG_ENABLE_AST_DESERIALIZATION_STATS 0
-#define SLANG_DISABLE_ON_DEMAND_AST_DESERIALIZATION 1
 
 FIDDLE()
 namespace Slang
@@ -1645,7 +1686,7 @@ void ASTSerialReadContext::_cleanUpASTNode(NodeBase* node)
 #if SLANG_ENABLE_AST_DESERIALIZATION_STATS
         if (auto moduleDecl = as<ModuleDecl>(decl->parentDecl))
         {
-            auto& deserializedCount = _sharedContext->_deserializedTopLevelDeclCount;
+            auto& deserializedCount = _deserializedTopLevelDeclCount;
             deserializedCount++;
 
             Count totalCount = moduleDecl->getDirectMemberDeclCount();