[hipDNN] Heuristic Policy Framework + Default Plugins (RFC 0007 - Part 2/3)#6605
Merged
cderb merged 136 commits intoMay 20, 2026
Merged
Conversation
…rt 1/3) This PR establishes the plugin infrastructure for RFC 0007's heuristic framework: - Plugin SDK API extensions for policy metadata - Device properties serialization using FlatBuffers - Plugin loading and lifecycle management - Handle integration for plugin resource manager - Shared utilities for engine ordering Key Components: - HeuristicsPluginApi.h: Policy metadata API (GetPolicyId, GetPolicyName) - DeviceProperties.*: FlatBuffer serialization for device properties - HeuristicPlugin*: Plugin loading, resource management, validation - Mock plugins for unit testing Testing: - Mock-based unit tests for plugin lifecycle - Device properties serialization tests - No real plugins yet (Part 2) Part 2: TBD (Heuristic Framework + Default Plugins) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- HeuristicPluginManager.hpp: Added RFC 0007 to policy consistency validation comments - device_properties.fbs: Added RFC 0007 to FlatBuffer schema documentation - Regenerated device_properties_generated.h (both v24_12_23 and v25_9_23) from updated schema
The test file was using HipdnnException but missing the header include, causing compilation errors. This include is required for exception handling tests.
Backend Implementation - Heuristic Framework: Outer loop policy selection in EngineHeuristicDescriptor with configurable policy order - Plugin Infrastructure: Plugin discovery, loading, and lifecycle management - Device Properties: FlatBuffers serialization for device info communication - Backend API: Policy enumeration (hipdnnBackendGetHeuristicPolicyCount/Info) Default Plugins (shipped in lib/hipdnn_plugins/heuristics/) - Config: Honors graph-level preferred_engine_id - StaticOrdering: Applies deterministic sortEngineIds() for consistent ordering Policy Resolution Cascade Priority: descriptor attribute > env variable > default - Descriptor: HIPDNN_ATTR_ENGINEHEUR_POLICY_ORDER (per-finalize) - Environment: HIPDNN_HEURISTIC_POLICY_ORDER (process-wide) - Default: Config → StaticOrdering Key Features - First-success-wins outer loop (no fallback to sortEngineIds()) - Deduplication for efficient setDeviceProperties() calls - C ABI plugins for cross-compiler compatibility Testing - 3 new backend test files (~900 lines) - Updated existing tests with mock infrastructure Note: Frontend wrapper API excluded (reserved for PR3) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 'RFC 0007' designation to all RFC comments - Add TestHeuristicPluginIntegration.cpp to CMakeLists - Add TestHeuristicPolicyFramework.cpp to CMakeLists - Add TestStaticOrderingPolicy.cpp to CMakeLists - Add heuristics/TestDeviceProperties.cpp to CMakeLists Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix namespace: hipdnn_flatbuffers_sdk → hipdnn_data_sdk in TestEngineHeuristicDescriptor - Add getHeuristicPluginResourceManager() mock method to MockHandle - Add missing DeviceProperties.hpp include to TestHeuristicPluginIntegration - Fix narrowing conversion in TestStaticOrderingPolicy (add second static_cast) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit addresses critical compilation and runtime issues in the RFC 0007 heuristic plugin framework: **Critical ABI Fixes:** - Add missing hipdnnHeuristicGetLastErrorString symbol to Config and StaticOrdering plugins - Export the symbol in both plugins' exports.map files - This was blocking plugin loading due to incomplete ABI implementation **API Versioning Decoupling:** - Introduce heuristic_api_version.h with independent versioning (1.0.0) - Update HeuristicPluginManager to validate against HIPDNN_HEURISTIC_API_VERSION_MAJOR - This properly implements RFC 0007's requirement for independent heuristic API versioning - Tests now validate plugins against correct heuristic API version **Plugin Infrastructure Improvements:** - Add IPlugin interface methods to HeuristicPlugin (name, version, type) - Implement GET_REQUIRED_SYMBOL macro with detailed error diagnostics - Convert loadPluginFromFile to return bool for success/failure tracking - Add plugin loading summary with success/failure counts - Enhance error messages with visual markers and actionable guidance **Test Infrastructure Fixes:** - Rename test suites to follow project conventions: - HeuristicPluginIntegrationTest → IntegrationHeuristicPlugin - HeuristicPolicyFrameworkTest → TestHeuristicPolicyFramework - StaticOrderingPolicyTest → TestStaticOrderingPolicy - Add getTestPluginDirectory() helper to locate plugins in build tree - Fix plugin loading in tests (binaries in bin/, plugins in lib/hipdnn_plugins/heuristics/) - Tests now create HeuristicPluginResourceManager properly **Code Quality:** - Add const qualifiers to immutable variables - Mark static methods as static - Add missing includes (filesystem, array) - Replace C-style arrays with std::array - Add NOLINT annotations for intentional macro usage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked improvements from PR2 that apply to PR1 infrastructure: **Plugin SDK: Heuristic API Versioning** - Add heuristic_api_version.h with independent versioning (1.0.0) - Decouples heuristic plugin ABI version from backend version - Implements RFC 0007 requirement for independent API versioning **Plugin Infrastructure Improvements:** - Add IPlugin interface methods to HeuristicPlugin (name, version, type) - Implement GET_REQUIRED_SYMBOL macro with detailed error diagnostics - Update HeuristicPluginManager to validate against HIPDNN_HEURISTIC_API_VERSION_MAJOR - Improve error messages with visual markers and actionable guidance - Add explicit nullptr checks for better safety **Code Quality:** - Add const qualifiers to immutable variables - Replace C-style parameter names (component_prefix → componentPrefix) - Mark default constructors with = default - Remove unnecessary static_cast in DeviceProperties - Use uniform initialization where appropriate - Add missing includes (array) **Test Improvements:** - Use std::array instead of C-style arrays for safety - Add const to test helper methods - Improve const correctness in test cases These changes improve the Plugin SDK foundation and maintain consistency between PR1 (infrastructure) and PR2 (implementation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds comprehensive unit test coverage for the Plugin SDK components introduced in RFC 0007 Part 1: - TestHeuristicPlugin.cpp: 27 tests for plugin lifecycle and mock interactions - TestHeuristicPluginManager.cpp: 16 tests for plugin discovery and validation - TestHeuristicPluginResourceManager.cpp: 23 tests for resource management Total: 62 new tests, all passing (2495/2495 tests) Coverage includes: - Plugin loading and symbol resolution - API version and policy ID validation - Handle and policy descriptor lifecycle - Device properties propagation - Configuration APIs (paths, unloading modes, log levels) - Error handling and edge cases - Resource cleanup and move semantics
- Remove trailing whitespace from RFC 0007 markdown file - Regenerate FlatBuffer files with updated reflection API - Adds MiniReflectTypeTable() method - Adds DevicePropertiesTypeTable() implementation - Updates both v24 and v25 generated headers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…7/pr1-plugin-sdk-infrastructure
This commit addresses the 159 uncovered lines in HeuristicPlugin.cpp by adding
integration tests with real plugin implementations, achieving ~90% coverage of
the plugin loading and symbol resolution infrastructure.
Previous state:
- Only mock-based unit tests existed (26 tests)
- HeuristicPlugin.cpp had 159 uncovered lines
- No testing of actual C ABI boundary crossing or symbol resolution
New additions:
- 3 test plugin implementations (good, incomplete, no-optional)
- 22 integration tests for real plugin loading scenarios
- LoadedGoodPluginFixture to eliminate test boilerplate
- Tests for error handling, optional symbols, and complete workflows
Test coverage improvements:
- Symbol resolution (resolveSymbols) - previously untested
- HeuristicPlugin constructor with SharedLibrary - previously untested
- Error handling for missing/incomplete symbols - previously untested
- Optional symbol handling (tryAssignSymbol) - previously untested
- Real C ABI calls across plugin boundary - previously untested
Code cleanup:
- Removed 17 trivial mock tests that only verified gmock mechanics
- Reduced test code by 264 lines (28%) through fixture reuse
- Focused tests on meaningful behavior verification
Files changed:
- backend/tests/TestHeuristicPlugin.cpp: 467 → 233 lines (removed trivial tests)
- backend/tests/TestHeuristicPluginLoading.cpp: 433 lines (new integration tests)
- tests/test_plugins/Test{Good,Incomplete,NoOptional}HeuristicPlugin.cpp: 3 new plugins
- backend/tests/CMakeLists.txt: Added test plugin dependencies and defines
- tests/test_plugins/CMakeLists.txt: Added 3 heuristic plugin targets
- plugin_sdk/.../heuristic_api_version.h: Added NOLINT suppressions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactored static state management from a function-local static struct to file-scope static variables with proper mutex protection, matching the proven pattern from EnginePluginResourceManager.
Make GetPolicyName a required function in the heuristic plugin API. Remove GetPolicyId - policy IDs are now automatically computed from policy names using engineNameToId(). This enforces the invariant engineNameToId(policyName) == policyId by construction, eliminating the possibility of inconsistent ID/name pairs. Changes: - Make hipdnnHeuristicGetPolicyName required (was optional) - Remove hipdnnHeuristicGetPolicyId from plugin API - Compute policy IDs from names in HeuristicPlugin::policyId() - Simplify validation in HeuristicPluginManager (only check non-empty name) - Update HeuristicsPluginApi.h documentation - Update test plugins to provide policy names and remove GetPolicyId - Fix test expectations to compute IDs from names - Fix test suite naming (TestHeuristicPluginIntegration → IntegrationHeuristicPlugin) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend the device properties system to capture and serialize GPU architecture names (e.g., "gfx90a", "gfx942", "gfx1100") from HIP runtime. This enables heuristic plugins to make architecture-specific decisions across both GCN and LLVM/RDNA GPU families. Changes: - Add architecture_name field to device_properties.fbs schema - Extract hipProps.gcnArchName in queryDeviceProperties() - Include architecture name in FlatBuffer serialization/deserialization - Add comprehensive test coverage for architecture name handling - Update all device properties tests to verify architectureName field - Test multiple architectures (gfx90a, gfx942, gfx1100) The architecture name is optional and backward-compatible - existing serialized buffers without this field will deserialize with an empty architecture name. All 2564 backend tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… users/cderb/rfc0007/pr2-heuristic-framework-plugins
A throwing setDeviceProperties used to abort finalize() before any policy ran. Mirror the policy loop's fail-soft contract: log a warning, null the slots backed by the failing plugin handle, and continue. Other plugins' policies remain selectable; with no surviving slots the existing "no policy succeeded" throw fires. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the hipdnnGetEngineInfo_ext API: a single plugin can register multiple policies, so callers need the plugin (library) name alongside the policy name to attribute and group results. Adds pluginName / pluginNameLen parameters to the C API, a pluginName field on HeuristicPolicyInfo populated from HeuristicPlugin::name(), and includes it in the resource manager's toString(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a Heuristic Policy Selection section covering HIPDNN_HEUR_POLICY_ORDER, HIPDNN_HEUR_CONFIG_PATH, and HIPDNN_HEUR_FALLBACK_ENGINE_ORDER with the resolution priority and decline behavior matching the current implementation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The frontend resolves preferred_engine_id as a post-hoc reorder of the heuristic-ranked engine configs; an unknown ID is silently ignored and execution should still succeed via the heuristic fallback. The test previously left shouldSucceed unset, so no assertion ran on execute(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… test" This reverts commit 4f92332.
Capture the heuristic-ranked engine list before plan creation and derive the expected execute() outcome from the top-ranked engine. The frontend's preferred-ID setter is a post-hoc reorder, so a nonexistent ID is a no-op and execute() must follow rankedEngineIds[0]. Also verify the nonexistent ID is genuinely absent from the candidate list so the test still exercises the intended fallback path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…7/pr2-heuristic-framework-plugins
CompleteWorkflowWithDevicePropertiesAndFinalize and FinalizeWithEmptyEngineIdsSucceeds picked their target plugin via policyInfos[0], which is materialized from an unordered_map iteration in HeuristicPluginResourceManager. The first entry varies by platform; on Windows it landed on a built-in policy whose finalize() rejects the synthetic graph payload, surfacing as a NOT_INITIALIZED throw from getSortedEngineIds. Look the test plugin up by its known policy ID, as CompleteHandleLifecycleWithGoodPlugin already does. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Several tests still drove their target plugin through policyInfos[0], which is materialized from an unordered_map iteration in HeuristicPluginResourceManager and is therefore platform-dependent. Switch each to look up its policy by name: the IntegrationHeuristicPlugin suite targets TestGoodHeuristicPolicy (the loaded test plugin), and IntegrationHeuristicPolicyPlugins targets the StaticOrdering built-in, which is always registered regardless of which vendor plugins are present. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…7/pr2-heuristic-framework-plugins
- HeuristicPluginResourceManager.cpp — Constructor rolls back partial map entries on per-plugin init failure, move-assign destroys current handles before overwriting, and safeDestroyHandle is hoisted to share one destruction path.
- SelectionHeuristic.cpp — Move-ctor/move-assign now carry _inputEngineIds and empty NOLINT catches are replaced with logged std::exception/... handlers.
- EngineHeuristicDescriptor.cpp — syncPolicySlots pushes a null slot on SelectionHeuristic construction failure and finalize short-circuits via findFirst before fetching device props.
- PluginCore.hpp — Fixes loadPluginFromFile reporting success on validation failure and moves the function to protected: so a test probe can expose it.
- HeuristicPlugin.cpp — Adds a safeStringView helper that closes the UB of constructing std::string_view{nullptr} when a plugin returns a null const char*.
- IntegrationHeuristicPolicyPlugins.cpp — Three leak-on-failure fixes via new makeScopedHipdnnHandle / makeScopedPolicyDescriptor RAII helpers across the handle, policy-descriptor, and multi-manager tests.
- BuiltInLogging.hpp (new) — Introduces a single shared HIPDNN_BUILTIN_HEURISTIC_LOG macro and buffer-size constant replacing two divergent per-file copies.
- ConfigBuiltIn.cpp + StaticOrderingBuiltIn.cpp — Switch to the shared logging macro, document the last-writer-wins identity contract of the file-scope globals, and NOLINT the g_-prefixed names against VariableCase = camelBack.
- TestHeuristicPluginManagerValidationPaths.cpp — Adds a LoadPluginFromFileProbe subclass and two regression tests covering both the false-on-validation-failure and true-on-success (idempotent-reload) paths.
- TestSelectionHeuristic.cpp — MoveConstructor and MoveAssignment now exercise setEngineIds -> move -> getSortedEngineIds to pin the _inputEngineIds fix.
- HipdnnBackend.cpp — hipdnnSetPluginUnloadingMode_ext now applies to the heuristic resource manager in addition to the engine manager.
- HeuristicsPluginApi.h — Strengthens hipdnnHeuristicPolicyGetSortedEngineIds doxygen to spell out the mandatory two-call pattern and silent-truncation behavior.
- 0007_EngineSelectionHeuristicsFramework.md — Renames engineNameToId to policyNameToId and rewrites §5.3.3/§5.3.4 so env is highest precedence, handle-level default is explicitly unimplemented, and unknown IDs yield null slots.
- EngineOverrideConfig.hpp — Replaces the hand-rolled trim in loadFromEnv with hipdnn_data_sdk::utilities::trim.
- IntegrationHeuristicPlugin.cpp — Comment-only cleanup removing stale references to policyInfos[0] ordering.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…7/pr2-heuristic-framework-plugins
Apply clang-format to files touched in the previous commit to satisfy CI format checks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
BrianHarrisonAMD
approved these changes
May 20, 2026
Contributor
There was a problem hiding this comment.
Approving, I think its mostly good with a couple minor things to address before merge.
Make sure to do a final update from latest develop, get a passing CI run, and do a manual ASAN test to verify no new leaks exist from this change before merging.
Thanks for all the work on this!
…7/pr2-heuristic-framework-plugins
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR Overview
This PR implements Part 2 of RFC 0007 (Engine Selection Heuristics), building on the plugin infrastructure from Part 1 (#6467). It delivers the policy orchestration framework via EngineHeuristicDescriptor, plus two default plugins (Config and StaticOrdering) that replace the legacy
sortEngineIds()behavior.Key Changes:
EngineHeuristicDescriptor::finalize()(iterates over policies until one succeeds)SelectionHeuristicwrapper providing typed C++ interface over policy descriptorshipdnnGetHeuristicPolicyCount_ext,hipdnnGetHeuristicPolicyInfo_ext)EngineOrdering.hpputility in data_sdk for code reuseStatistics:
lib/hipdnn_plugins/heuristics/Core Components
1. EngineHeuristicDescriptor - Policy Orchestration
Purpose: Orchestrates multiple heuristic policies, invoking each in priority order until one succeeds.
Implementation (
backend/src/descriptors/EngineHeuristicDescriptor.cpp:201-254):Policy Slot Management:
_orderedPolicyIds: Policy IDs in priority order (lower priority number = higher precedence)_policySlots: CorrespondingSelectionHeuristicwrappers (one per policy)2. SelectionHeuristic - Policy Wrapper
Purpose: C++ facade for one policy slot, wrapping
hipdnnHeuristicPolicyDescriptor_t.Key Methods (
backend/src/heuristics/SelectionHeuristic.cpp):setEngineIds(): Sets candidate engines on policy descriptorsetSerializedGraph(): Passes serialized OperationGraph to policyfinalize(): Invokes policy's logic viahipdnnHeuristicPolicyDescriptorFinalize()getSortedEngineIds(): Retrieves result from policyLifecycle: Owned by EngineHeuristicDescriptor, created when policy list is established.
3. Config Plugin
Purpose: Reserved for user-specified engine preferences (e.g., environment variables).
Current Implementation (
plugins/heuristics/config/ConfigPlugin.cpp):HIPDNN_PREFERRED_ENGINEenv var supportRationale for shipping empty:
4. StaticOrdering Plugin
Purpose: Preserves legacy
sortEngineIds()behavior.Implementation (
plugins/heuristics/static_ordering/StaticOrderingPlugin.cpp):Algorithm (delegates to shared
EngineOrdering.hpp):5. Query APIs
Purpose: Allow users to introspect registered heuristic policies.
New Functions (
backend/include/hipdnn_backend.h,backend/src/HipdnnBackend.cpp):Architecture Highlights
Shared EngineOrdering.hpp
What: Header-only sorting logic in
data_sdk/include/hipdnn_data_sdk/utilities/EngineOrdering.hppWhy:
Backend Delegation (
backend/src/utilities/EngineOrdering.cpp):Plugin Usage (
plugins/heuristics/static_ordering/StaticOrderingPlugin.cpp):Testing & Validation
Test Files Added
TestHeuristicPolicyFramework.cpp (18 tests):
TestHeuristicPolicyPlugins.cpp (14 tests):
lib/hipdnn_plugins/heuristics/TestStaticOrderingPolicy.cpp (13 tests):
Files Modified
Backend Framework
Data SDK
Plugins
Tests
Notable Design Decisions
1. Policy Orchestration in Descriptor finalize()
Decision: Implement policy loop in
EngineHeuristicDescriptor::finalize()rather than a separate orchestrator class.Rationale:
2. Empty Config Plugin Ships in Production
Decision: Ship Config plugin with no-op implementation (finalize returns false).
Rationale:
3. EngineOrdering.hpp Restoration
Decision: Restore
sortEngineIds()as header-only in data_sdk.Context: Revert commit (6a24652) removed it from Part 1, breaking StaticOrdering plugin compilation.
Rationale:
Follow-Up Work
Part 3: Frontend Integration (Next PR)
Scope: Add frontend C++ APIs for heuristic usage.
Expected Features:
Graph::getEngineHeuristic()for creating heuristic descriptorsKey Files to Review
backend/src/descriptors/EngineHeuristicDescriptor.cpp- Policy orchestration loopbackend/src/heuristics/SelectionHeuristic.cpp- Policy wrapper/facadeplugins/heuristics/static_ordering/StaticOrderingPlugin.cpp- Legacy behavior portbackend/src/HipdnnBackend.cpp- Query API implementationResolves: RFC 0007 Part 2/3
Depends On: #6467 (Plugin SDK Infrastructure)
Enables: Part 3 frontend integration, custom heuristic plugins