[hipDNN] Plugin SDK + Device Properties Infrastructure (RFC 0007 - Part 1/3)#6467

Merged
cderb merged 44 commits into develop from users/cderb/rfc0007/pr1-plugin-sdk-infrastructure on Apr 23, 2026

@cderb cderb commented Apr 15, 2026

Motivation

hipDNN currently uses hard-coded engine selection heuristics embedded in the backend, making it difficult for users to customize selection behavior or experiment with new ordering strategies. This inflexibility limits performance tuning opportunities and prevents domain-specific optimizations.

RFC 0007 introduces a pluggable heuristics framework that allows users to:

  • Write custom engine selection policies as loadable plugins
  • Override default ordering logic without modifying hipDNN source
  • Experiment with performance tuning strategies (e.g., architecture-specific ordering, tuning database lookups, ML-based selection)
  • Chain multiple policies with fallback behavior

This PR (Part 1/3) establishes the foundational plugin infrastructure required for the heuristics framework: the complete C ABI, backend wrappers, resource management, the device property serialization format, and test coverage. Integration tests with real plugins are included, although Part 1 was originally scoped without them. The full framework logic and the default plugins remain out of scope.

Technical Overview

Architecture

The heuristics plugin system introduces a separate C ABI distinct from the existing engine plugin API:

  • Engine plugins provide compute implementations (e.g., convolution kernels)
  • Heuristic plugins provide engine ordering/selection logic
  • A single plugin library (.so on Linux, .dll on Windows) is either an engine plugin OR a heuristic plugin, never both

Plugin Infrastructure Sharing:

  • Both plugin types share common management code via PluginResourceManagerBase<> (CRTP pattern)
  • Separate static storage (paths, configs, plugin managers) for each type
  • Unified patterns for loading, unloading, path configuration, log level propagation
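
The combination of shared code and separate static storage follows from how CRTP interacts with static state: each instantiation of the base template gets its own copy. The sketch below is a simplified illustration, not the PR's code. The names `ResourceManagerBaseSketch`, `EngineManagerSketch`, and `HeuristicManagerSketch` are hypothetical, the real base takes two further template parameters, and function-local statics are used here only for brevity.

```cpp
#include <cassert>
#include <filesystem>
#include <mutex>
#include <set>
#include <utility>

// Hypothetical, simplified stand-in for PluginResourceManagerBase<>.
template <typename Derived>
class ResourceManagerBaseSketch {
public:
    static void setPluginPaths(std::set<std::filesystem::path> paths) {
        std::lock_guard<std::mutex> lock(pathsMutex());
        storage() = std::move(paths);
    }

    static std::set<std::filesystem::path> getPluginPaths() {
        std::lock_guard<std::mutex> lock(pathsMutex());
        return storage();
    }

private:
    // One instance of each static per Derived type, because every
    // ResourceManagerBaseSketch<Derived> is a distinct class.
    static std::set<std::filesystem::path>& storage() {
        static std::set<std::filesystem::path> paths;
        return paths;
    }
    static std::mutex& pathsMutex() {
        static std::mutex m;
        return m;
    }
};

class EngineManagerSketch : public ResourceManagerBaseSketch<EngineManagerSketch> {};
class HeuristicManagerSketch : public ResourceManagerBaseSketch<HeuristicManagerSketch> {};

// Configuring one plugin type leaves the other untouched.
inline void separateStorageDemo() {
    EngineManagerSketch::setPluginPaths({std::filesystem::path("engines")});
    assert(EngineManagerSketch::getPluginPaths().size() == 1);
    assert(HeuristicManagerSketch::getPluginPaths().empty());
}
```

Both derived classes reuse the same method bodies, yet a path set on the engine side never appears on the heuristic side.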

Key Components

1. Plugin SDK API (plugin_sdk/include/hipdnn_plugin_sdk/HeuristicsPluginApi.h)

Terminology Note: This document uses "policy name" in prose, policy_name for C API parameters, and policyName() for C++ method names.

Exported Functions (C ABI):

// Module metadata
hipdnnPluginStatus_t hipdnnHeuristicGetApiVersion(const char** version);
hipdnnPluginStatus_t hipdnnHeuristicGetPolicyName(const char** name); // Required (not optional)
hipdnnPluginStatus_t hipdnnHeuristicGetPluginVersion(const char** version);

// NOTE: Policy ID is computed by backend via engineNameToId(policy_name), not exported by plugin

// Logging (shared with engine plugins)
hipdnnPluginStatus_t hipdnnHeuristicSetLoggingCallback(hipdnnCallback_t callback);
hipdnnPluginStatus_t hipdnnHeuristicSetLogLevel(hipdnnSeverity_t level); // Optional
void hipdnnHeuristicGetLastErrorString(const char** error_str);

// Session lifecycle
hipdnnPluginStatus_t hipdnnHeuristicHandleCreate(hipdnnHeuristicHandle_t* handle);
hipdnnPluginStatus_t hipdnnHeuristicHandleDestroy(hipdnnHeuristicHandle_t handle);
hipdnnPluginStatus_t hipdnnHeuristicHandleSetDeviceProperties(
    hipdnnHeuristicHandle_t handle,
    const hipdnnPluginConstData_t* serialized_device_properties);

// Policy descriptor lifecycle
hipdnnPluginStatus_t hipdnnHeuristicPolicyDescriptorCreate(
    hipdnnHeuristicHandle_t handle,
    hipdnnHeuristicPolicyDescriptor_t* descriptor);
hipdnnPluginStatus_t hipdnnHeuristicPolicyDescriptorDestroy(
    hipdnnHeuristicPolicyDescriptor_t descriptor);

// Policy execution
hipdnnPluginStatus_t hipdnnHeuristicPolicySetEngineIds(
    hipdnnHeuristicPolicyDescriptor_t descriptor,
    const int64_t* engine_ids,
    size_t count);
hipdnnPluginStatus_t hipdnnHeuristicPolicySetSerializedGraph(
    hipdnnHeuristicPolicyDescriptor_t descriptor,
    const hipdnnPluginConstData_t* serialized_graph);
hipdnnPluginStatus_t hipdnnHeuristicPolicyFinalize(
    hipdnnHeuristicPolicyDescriptor_t descriptor,
    int32_t* out_applied);
hipdnnPluginStatus_t hipdnnHeuristicPolicyGetSortedEngineIds(
    hipdnnHeuristicPolicyDescriptor_t descriptor,
    int64_t* engine_ids,
    size_t* count);  // Type-consistent with SetEngineIds

Design Decisions:

  • Two-phase execution: SetEngineIds + SetSerializedGraph → Finalize → GetSortedEngineIds
  • Session handles per hipdnnHandle store plugin state (caches, tuning data)
  • Policy descriptors per EngineHeuristicDescriptor isolate per-graph selection state
  • FlatBuffer serialization for device properties and operation graphs (cross-ABI compatibility)
  • One policy per plugin module: Each plugin library exports exactly one policy name. Policy ID is computed by backend using engineNameToId(policy_name) (RFC 0007 §5.3, §8.2)
  • Type consistency: Both SetEngineIds and GetSortedEngineIds use size_t for counts
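
The call sequence can be illustrated with a toy, in-process model. Everything below is a sketch under stated assumptions: the `Sketch*` types and `sketch*` functions are hypothetical stand-ins that only mirror the shape of the `hipdnnHeuristicPolicy*` entry points, the serialized-graph step is omitted, the "policy" simply prefers lower engine IDs, and the contract that `*count` carries capacity on entry and the number of IDs written on return is an assumption, not something this PR description states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; the real types live in HeuristicsPluginApi.h.
enum SketchStatus { SKETCH_OK = 0, SKETCH_BAD_PARAM = 1 };

struct SketchPolicyDescriptor {
    std::vector<std::int64_t> candidates;
    std::vector<std::int64_t> sorted;
    bool finalized = false;
};

// Phase 1: record the candidate engine IDs.
SketchStatus sketchSetEngineIds(SketchPolicyDescriptor* d, const std::int64_t* ids,
                                std::size_t count) {
    if (d == nullptr || (ids == nullptr && count != 0)) return SKETCH_BAD_PARAM;
    d->candidates.assign(ids, ids + count);
    d->finalized = false;
    return SKETCH_OK;
}

// Finalize: run the toy policy; outApplied reports whether it produced an ordering.
SketchStatus sketchFinalize(SketchPolicyDescriptor* d, std::int32_t* outApplied) {
    if (d == nullptr || outApplied == nullptr) return SKETCH_BAD_PARAM;
    d->sorted = d->candidates;
    std::sort(d->sorted.begin(), d->sorted.end());
    d->finalized = true;
    *outApplied = d->sorted.empty() ? 0 : 1;
    return SKETCH_OK;
}

// Phase 2: copy the ordering out (assumed contract: *count is capacity in, size out).
SketchStatus sketchGetSortedEngineIds(const SketchPolicyDescriptor* d, std::int64_t* ids,
                                      std::size_t* count) {
    if (d == nullptr || ids == nullptr || count == nullptr || !d->finalized) {
        return SKETCH_BAD_PARAM;
    }
    if (*count < d->sorted.size()) return SKETCH_BAD_PARAM;
    std::copy(d->sorted.begin(), d->sorted.end(), ids);
    *count = d->sorted.size();
    return SKETCH_OK;
}

// Host-side sequence; falls back to the input order if the policy does not apply.
std::vector<std::int64_t> sketchOrderEngines(const std::vector<std::int64_t>& candidates) {
    SketchPolicyDescriptor desc;
    std::int32_t applied = 0;
    if (sketchSetEngineIds(&desc, candidates.data(), candidates.size()) != SKETCH_OK ||
        sketchFinalize(&desc, &applied) != SKETCH_OK || applied == 0) {
        return candidates;
    }
    std::vector<std::int64_t> out(candidates.size());
    std::size_t count = out.size();
    if (sketchGetSortedEngineIds(&desc, out.data(), &count) != SKETCH_OK) return candidates;
    assert(count <= out.size());
    out.resize(count);
    return out;
}
```

Keeping the inputs, the decision, and the readout as separate calls lets the host treat "policy did not apply" as a normal outcome and move on to a fallback.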

2. Device Properties (FlatBuffers Schema: flatbuffers_sdk/schemas/device_properties.fbs)

Purpose: Define serialization format for device characteristics to be passed to heuristic plugins.

FlatBuffer Schema (flatbuffers_sdk/schemas/device_properties.fbs):

table DeviceProperties {
    device_id: int = -1;               // HIP device ID
    multi_processor_count: int = 0;    // Number of compute units
    total_global_mem: ulong = 0;       // Total GPU memory (bytes)
    architecture_name: string;         // GPU arch (e.g., "gfx90a", "gfx942")
    // Future optional fields (additive evolution):
    // wavefront_size: int;
    // max_threads_per_block: int;
}

Versioning: Pre-generated headers for FlatBuffers v24.12.23 and v25.9.23 ensure compatibility across PyTorch and hipDNN builds.

Part 1 Includes:

  • FlatBuffer schema and generated headers (40 lines schema, ~270 lines per generated header)
  • ABI entry point: hipdnnHeuristicHandleSetDeviceProperties()
  • Backend method: setDevicePropertiesOnAllHandles(const hipdnnPluginConstData_t*)
  • Integration tests manually create DevicePropertiesT objects and serialize them using FlatBuffers API to validate the format

Revision During Review: Early commits included C++ helper functions in backend/src/heuristics/DeviceProperties.{cpp,hpp} with dedicated unit tests, removed in commit 1b97bb9. Current approach: Part 1 uses the FlatBuffers SDK DevicePropertiesT type directly (flatbuffers_sdk/schemas/device_properties.fbs). Integration tests demonstrate the serialization format works correctly. Future work may add helper functions for convenience, but automatic device property querying is deferred to Part 2.

3. Plugin Resource Management

Base Class (backend/src/plugin/PluginResourceManagerBase.hpp):

Purpose: CRTP base class providing shared infrastructure for both engine and heuristic plugin management.

Template Signature:

template <typename Derived, typename PluginManagerType, typename PluginType>
class PluginResourceManagerBase;

Shared Static Methods (via CRTP):

  • setPluginPaths(paths, loadingMode) - Configure plugin search directories
  • getPluginPaths() - Retrieve current search paths
  • setPluginUnloadingMode(mode) - Configure EAGER/LAZY unloading
  • setPluginLogLevel(level) - Propagate log level to all loaded plugins
  • getOrCreatePluginManager() - Lazy initialization with weak_ptr/persistent_ptr pattern

Shared Instance Method:

  • getLoadedPluginFiles(numPlugins, pluginPaths, maxStringLen) - Query loaded plugin file paths
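
One way to read the weak_ptr/persistent_ptr pattern behind getOrCreatePluginManager() is sketched below. This is an interpretation, not the PR's implementation: the names are hypothetical, the cache is an ordinary object rather than static state so it can be exercised in isolation, and the meaning assigned to the two modes (EAGER releases the manager when the last user drops it, LAZY pins it with an extra owning pointer) is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-ins for the plugin manager and the unloading mode.
struct PluginManagerSketch {};
enum class UnloadingModeSketch { Eager, Lazy };

class ManagerCacheSketch {
public:
    explicit ManagerCacheSketch(UnloadingModeSketch mode) : _mode(mode) {}

    std::shared_ptr<PluginManagerSketch> getOrCreate() {
        std::lock_guard<std::mutex> lock(_mutex);
        std::shared_ptr<PluginManagerSketch> mgr = _weak.lock();
        if (!mgr) {
            // No live manager: create one and remember it weakly.
            mgr = std::make_shared<PluginManagerSketch>();
            _weak = mgr;
            ++_creations;
        }
        if (_mode == UnloadingModeSketch::Lazy) {
            _persistent = mgr;  // Assumed LAZY behavior: keep the manager alive.
        }
        assert(mgr != nullptr);
        return mgr;
    }

    int creations() const { return _creations; }

private:
    UnloadingModeSketch _mode;
    std::mutex _mutex;
    std::weak_ptr<PluginManagerSketch> _weak;
    std::shared_ptr<PluginManagerSketch> _persistent;
    int _creations = 0;
};
```

Under these assumptions, an eager cache rebuilds the manager after every user has released it, while a lazy cache creates it once and reuses it.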

Derived Class (backend/src/plugin/HeuristicPluginResourceManager.{hpp,cpp}):

Responsibilities:

  • Load heuristic plugin libraries from HIPDNN_HEURISTIC_PLUGIN_DIR
  • Validate plugin ABI compatibility and metadata
  • Manage plugin lifecycle per hipdnnHandle
  • Create hipdnnHeuristicHandle_t session objects
  • Resolve policy IDs to plugin modules
  • Enforce policy ID ↔ policy name consistency (RFC 0007 §11, §5.3.1)

Key Methods:

class HeuristicPluginResourceManager
    : public PluginResourceManagerBase<HeuristicPluginResourceManager,
                                       HeuristicPluginManager,
                                       HeuristicPlugin> {
    // Instance methods
    virtual hipdnnHeuristicHandle_t getHeuristicHandleForPolicyId(int64_t policyId) const;
    virtual const HeuristicPlugin* getPluginForPolicyId(int64_t policyId) const;
    virtual std::vector<HeuristicPolicyInfo> getHeuristicPolicyInfos() const;
    virtual void setDevicePropertiesOnAllHandles(const hipdnnPluginConstData_t* devicePropsSerialized) const;
    virtual std::string toString() const;  // Includes plugin paths and policy metadata

    // Inherited static methods (via CRTP base):
    static void setPluginPaths(...);
    static std::set<std::filesystem::path> getPluginPaths();
    static void setPluginUnloadingMode(...);
    static void setPluginLogLevel(...);

    // Compatibility aliases:
    static void setHeuristicPluginPaths(...) { setPluginPaths(...); }
    static std::set<std::filesystem::path> getHeuristicPluginPaths() { return getPluginPaths(); }
};

Error Handling:

  • Missing plugins: Warning logged, continues loading other plugins
  • ABI mismatch: Plugin rejected, logged as error
  • Duplicate policy IDs: Plugin rejected with HipdnnException
  • Destructor safety: Catches all exceptions (std::exception + ...) when destroying handles, logs warnings (not errors)
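
The destructor-safety rule can be sketched as follows. The class and its members are hypothetical: each `std::function` stands in for one handle-destroy call that might throw, and a vector of strings stands in for the logger.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical owner of plugin session handles.
class HandleOwnerSketch {
public:
    HandleOwnerSketch(std::vector<std::function<void()>> destroyers,
                      std::vector<std::string>& warnings)
        : _destroyers(std::move(destroyers)), _warnings(warnings) {}

    ~HandleOwnerSketch() {
        for (auto& destroy : _destroyers) {
            try {
                destroy();
            } catch (const std::exception& e) {
                // Known exception type: log its message as a warning and continue.
                _warnings.push_back(std::string("warning: ") + e.what());
            } catch (...) {
                // Anything else: still never let it escape the destructor.
                _warnings.push_back("warning: unknown exception");
            }
        }
    }

private:
    std::vector<std::function<void()>> _destroyers;
    std::vector<std::string>& _warnings;  // stand-in for the logger
};

// Destroys four handles, two of which throw; returns the number of warnings logged.
inline std::size_t destructorSafetyDemo() {
    std::vector<std::string> warnings;
    int destroyed = 0;
    {
        HandleOwnerSketch owner({[&destroyed] { ++destroyed; },
                                 [] { throw std::runtime_error("plugin failed"); },
                                 [] { throw 42; },
                                 [&destroyed] { ++destroyed; }},
                                warnings);
    }
    assert(destroyed == 2);
    return warnings.size();
}
```

A failure in one plugin's cleanup does not prevent the remaining handles from being destroyed, and nothing propagates out of the destructor.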

4. Plugin Wrapper (backend/src/plugin/HeuristicPlugin.{hpp,cpp})

Purpose: C++ RAII wrapper around C ABI plugin functions.

Lifecycle:

class HeuristicPlugin {
    SharedLibrary _lib;  // RAII dlopen/dlclose wrapper
    // Function pointers to plugin exports

    explicit HeuristicPlugin(SharedLibrary&& lib);
    ~HeuristicPlugin() override = default;

    // Metadata queries
    std::string_view apiVersion() const override;
    virtual int64_t policyId() const;  // Computed via engineNameToId(policyName()), cached at load
    virtual std::string_view policyName() const;  // Calls hipdnnHeuristicGetPolicyName
    virtual std::string_view pluginVersion() const;

    // Session management
    virtual hipdnnHeuristicHandle_t createHandle() const;
    virtual void destroyHandle(hipdnnHeuristicHandle_t handle) const;
    virtual void setDeviceProperties(hipdnnHeuristicHandle_t handle, const hipdnnPluginConstData_t* devicePropsSerialized) const;

    // Policy descriptor management
    virtual hipdnnHeuristicPolicyDescriptor_t createPolicyDescriptor(hipdnnHeuristicHandle_t pluginHandle) const;
    virtual void destroyPolicyDescriptor(hipdnnHeuristicPolicyDescriptor_t desc) const;

    // Policy execution
    virtual void setEngineIds(hipdnnHeuristicPolicyDescriptor_t desc, const int64_t* engineIds, size_t engineIdCount) const;
    virtual void setSerializedGraph(hipdnnHeuristicPolicyDescriptor_t desc, const hipdnnPluginConstData_t* serializedGraph) const;
    virtual bool finalize(hipdnnHeuristicPolicyDescriptor_t desc) const;
    virtual std::vector<int64_t> getSortedEngineIds(hipdnnHeuristicPolicyDescriptor_t desc) const;

    // Logging
    hipdnnPluginStatus_t setLoggingCallback(hipdnnCallback_t) const;
    hipdnnPluginStatus_t setLogLevel(hipdnnSeverity_t) const;  // May be missing
};
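
The "May be missing" note on setLogLevel reflects that the export is optional. A minimal sketch of how a wrapper might handle an absent symbol is shown below. The types are hypothetical stand-ins for the SDK status and severity enums, and the invalid-value result for a missing symbol follows the behavior described in the test plan.

```cpp
#include <cassert>

// Hypothetical stand-ins for hipdnnPluginStatus_t and hipdnnSeverity_t.
enum class StatusSketch { Success, InvalidValue };
enum class SeveritySketch { Info, Warning, Error };

using SetLogLevelFn = StatusSketch (*)(SeveritySketch);

class OptionalSymbolSketch {
public:
    // fn is nullptr when the library does not export the optional symbol.
    explicit OptionalSymbolSketch(SetLogLevelFn fn) : _setLogLevel(fn) {}

    StatusSketch setLogLevel(SeveritySketch level) const {
        if (_setLogLevel == nullptr) {
            return StatusSketch::InvalidValue;  // Optional export is absent.
        }
        return _setLogLevel(level);
    }

private:
    SetLogLevelFn _setLogLevel;
};

// Example export that accepts every level.
inline StatusSketch acceptAnyLevel(SeveritySketch) { return StatusSketch::Success; }

inline void optionalSymbolDemo() {
    const OptionalSymbolSketch missing(nullptr);
    assert(missing.setLogLevel(SeveritySketch::Info) == StatusSketch::InvalidValue);
}
```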

5. Plugin Manager (backend/src/plugin/HeuristicPluginManager.hpp)

Purpose: Manages loading and validation of heuristic plugins, inherits from PluginManagerBase<HeuristicPlugin>.

Responsibilities:

  • Scan directories for plugin libraries
  • Load and validate plugins via HeuristicPlugin wrapper
  • Validate API version compatibility (major version must match)
  • Enforce policy ID uniqueness
  • Provide plugin lookup by policy ID

Key Validation (in validateBeforeAdding()):

// Validate heuristic C ABI major version
if (Version{plugin.apiVersion()}.major != HIPDNN_HEURISTIC_API_VERSION_MAJOR) {
    throw HipdnnException(HIPDNN_STATUS_PLUGIN_ERROR, "ABI version mismatch");
}

// Validate unique policy ID
if (_policyIds.find(policyId) != _policyIds.end()) {
    throw HipdnnException(HIPDNN_STATUS_PLUGIN_ERROR, "Duplicate policy ID");
}

// Validate policy name is non-empty (RFC 0007 requirement)
if (policyName.empty()) {
    throw HipdnnException(HIPDNN_STATUS_PLUGIN_ERROR, "Policy name required");
}
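
The three checks can be exercised in isolation with the self-contained sketch below. All names are hypothetical: `nameToIdSketch` is only a stand-in for engineNameToId (whose real mapping is not shown here), `std::runtime_error` replaces HipdnnException, and the expected major version of 0 is an assumption taken from the "0.0.1" version string mentioned under API Stability. The name check runs before the ID check because the sketch derives the ID from the name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>

constexpr int kApiMajorSketch = 0;  // Assumption: matches "0.0.1".

// Parses the leading integer of "major.minor.patch"; throws if it is not numeric.
inline int parseMajorSketch(const std::string& version) { return std::stoi(version); }

// Stand-in for engineNameToId(); any stable name-to-ID mapping works for the sketch.
inline std::int64_t nameToIdSketch(const std::string& name) {
    return static_cast<std::int64_t>(std::hash<std::string>{}(name) & 0x7fffffffffffffffULL);
}

class RegistrySketch {
public:
    void validateAndAdd(const std::string& apiVersion, const std::string& policyName) {
        if (parseMajorSketch(apiVersion) != kApiMajorSketch) {
            throw std::runtime_error("ABI version mismatch");
        }
        if (policyName.empty()) {
            throw std::runtime_error("Policy name required");
        }
        // insert() reports whether the ID was new, so lookup and insert are one step.
        if (!_policyIds.insert(nameToIdSketch(policyName)).second) {
            throw std::runtime_error("Duplicate policy ID");
        }
    }

    std::size_t size() const { return _policyIds.size(); }

private:
    std::set<std::int64_t> _policyIds;
};

// Returns true when the registry rejects the plugin metadata.
inline bool rejectsSketch(RegistrySketch& reg, const std::string& apiVersion,
                          const std::string& policyName) {
    try {
        reg.validateAndAdd(apiVersion, policyName);
        return false;
    } catch (const std::exception&) {
        return true;
    }
}

inline void validationDemo() {
    RegistrySketch reg;
    reg.validateAndAdd("0.0.1", "PolicyA");
    assert(rejectsSketch(reg, "0.0.1", "PolicyA"));  // Same name, same ID.
}
```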

6. Handle Integration (backend/src/handle/Handle.{hpp,cpp})

Changes:

  • Handle::Handle() now instantiates HeuristicPluginResourceManager and loads plugins
  • Handle::~Handle() cleans up plugin resources via RAII
  • Added getHeuristicPluginResourceManager() accessor

Note: Part 1 provides the plugin resource manager infrastructure and device properties API (setDevicePropertiesOnAllHandles). Automatic device property querying (e.g., calling hipGetDeviceProperties) and propagation during handle initialization is not included in Part 1.

7. Environment Variables (docs/Environment.md)

Documented Variables:

  • HIPDNN_PLUGIN_DIR - Search directory for engine plugins (default: hipdnn_plugins/engines/)
  • HIPDNN_HEURISTIC_PLUGIN_DIR - Search directory for heuristic plugins (default: hipdnn_plugins/heuristics/)
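
A minimal sketch of resolving the heuristic plugin directory from the environment follows. The function names are hypothetical, treating an empty variable as unset is an assumption, and the sketch does not model what the relative default path is resolved against.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>

// envValue is what std::getenv returned (nullptr when the variable is unset).
inline std::filesystem::path resolveDirSketch(const char* envValue,
                                              const std::filesystem::path& fallback) {
    if (envValue != nullptr && envValue[0] != '\0') {
        return std::filesystem::path(envValue);
    }
    return fallback;  // Unset or empty: use the documented default.
}

// Looks up HIPDNN_HEURISTIC_PLUGIN_DIR and falls back to the default directory.
inline std::filesystem::path heuristicPluginDirSketch() {
    return resolveDirSketch(std::getenv("HIPDNN_HEURISTIC_PLUGIN_DIR"),
                            "hipdnn_plugins/heuristics/");
}

inline void pluginDirDemo() {
    assert(resolveDirSketch(nullptr, "hipdnn_plugins/heuristics/") ==
           std::filesystem::path("hipdnn_plugins/heuristics/"));
}
```

Separating the lookup from the fallback decision keeps the logic testable without modifying the process environment.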

Files Added/Modified

Plugin SDK (3 files):

  • plugin_sdk/include/hipdnn_plugin_sdk/HeuristicsPluginApi.h (404 lines, complete C ABI with full documentation)
  • plugin_sdk/include/hipdnn_plugin_sdk/heuristic_api_version.h (API version macro HIPDNN_HEURISTIC_API_VERSION)
  • plugin_sdk/include/hipdnn_plugin_sdk/PluginApiDataTypes.h (Modified: added HIPDNN_PLUGIN_TYPE_HEURISTIC and two new status codes)

Backend - Plugin Infrastructure (6 new files + 2 modified):

New files:

  • backend/src/plugin/HeuristicPlugin.{hpp,cpp} (158 + 234 = 392 lines, C++ wrapper around C ABI)
  • backend/src/plugin/HeuristicPluginManager.hpp (96 lines, inherits from PluginManagerBase)
  • backend/src/plugin/HeuristicPluginResourceManager.{hpp,cpp} (181 + 312 = 493 lines, inherits from PluginResourceManagerBase)
  • backend/src/plugin/PluginResourceManagerBase.hpp (337 lines, CRTP base for common plugin management)

Modified files:

  • backend/src/plugin/EnginePluginResourceManager.{hpp,cpp} (now inherits from PluginResourceManagerBase)
  • backend/src/handle/Handle.{hpp,cpp} (added HeuristicPluginResourceManager member)

FlatBuffers SDK (3 files, device properties schema migrated from data_sdk in commit 1b97bb9):

  • flatbuffers_sdk/schemas/device_properties.fbs (40 lines)
  • flatbuffers_sdk/include/.../v25_9_23/.../device_properties_generated.h (~269 lines)
  • flatbuffers_sdk/include/.../v24_12_23/.../device_properties_generated.h (~269 lines)

Backend Tests (6 files):

  • backend/tests/TestHeuristicPlugin.cpp (233 lines, mock-based unit tests)
  • backend/tests/TestHeuristicPluginManager.cpp (392 lines, manager unit tests)
  • backend/tests/TestHeuristicPluginResourceManager.cpp (475 lines, resource manager unit tests)
  • backend/tests/TestHeuristicPluginIntegration.cpp (957 lines, integration tests with real plugins)
  • backend/tests/descriptors/mocks/MockHeuristicPlugin.hpp (71 lines, gmock-based mock plugin)
  • backend/tests/descriptors/mocks/MockHeuristicPluginResourceManager.hpp (33 lines, gmock-based mock resource manager)

Test Plugins (3 files):

  • tests/test_plugins/TestGoodHeuristicPlugin.cpp (307 lines, complete working plugin)
  • tests/test_plugins/TestNoOptionalHeuristicPlugin.cpp (254 lines, plugin without optional functions)
  • tests/test_plugins/TestIncompleteHeuristicApiPlugin.cpp (137 lines, negative test missing required functions)

Documentation (1 file):

  • docs/Environment.md (Modified: documented HIPDNN_PLUGIN_DIR and HIPDNN_HEURISTIC_PLUGIN_DIR)

Total: ~5,000+ lines across 26 files (expanded from initial 24 files with addition of integration tests and test plugins)

Test Plan

Unit Tests (120 heuristic plugin tests)

TestHeuristicPlugin (9 tests)

  • Mock plugin lifecycle (create/destroy handles, descriptors)
  • Policy ID caching
  • Empty/long policy names handling
  • Return value handling (null handles/descriptors)

TestHeuristicPluginManager (30 tests)

  • Plugin loading from directories
  • Multiple plugin instances independence
  • Plugin validation (API version, policy IDs, policy names)
  • Environment variable path handling (HIPDNN_HEURISTIC_PLUGIN_DIR)
  • Policy ID uniqueness enforcement
  • Error handling (missing files, invalid plugins)
  • Default search paths
  • Multiple load/unload cycles

TestHeuristicPluginResourceManager (37 tests)

  • Factory method (create())
  • Move semantics (constructor/assignment)
  • Policy lookup by ID (handle and plugin pointer)
  • Device properties propagation API
  • Plugin path configuration (static methods via CRTP base)
  • Unloading modes (EAGER/LAZY)
  • Log level propagation
  • getLoadedPluginFiles() implementation
  • toString() with plugin information and file paths
  • Multiple instances coexistence
  • Destructor cleanup

IntegrationHeuristicPlugin (31 tests)

  • Real plugin loading from test plugins
  • Complete handle lifecycle
  • Policy descriptor workflow:
    • Create descriptor
    • Set engine IDs (candidate list)
    • Set serialized graph
    • Finalize policy
    • Get sorted engine IDs
  • Device properties serialization:
    • Create mock DevicePropertiesT objects
    • Serialize using FlatBuffers
    • Pass as hipdnnPluginConstData_t* to plugins
    • Validate no exceptions thrown
  • Optional function handling:
    • Plugins without hipdnnHeuristicGetPolicyName (returns empty string)
    • Plugins without hipdnnHeuristicSetLogLevel (returns INVALID_VALUE)
  • Error handling:
    • Plugin with missing required functions (negative test)
    • Invalid device properties buffers
    • Empty engine ID lists

TestHeuristicPluginLoadedGood (13 tests)

  • Loaded plugin metadata queries
  • Policy workflow with real plugin
  • Policy ID consistency
  • Complete handle and descriptor lifecycle

Mock Classes: GoogleMock-based mocks (MockHeuristicPlugin, MockHeuristicPluginResourceManager) are provided for descriptor-level unit tests, though integration tests primarily use real test plugins.

Integration Testing (Included in Part 1)

Test Plugins Provided:

  1. TestGoodHeuristicPlugin - Complete working implementation

    • Implements all required and optional functions
    • Validates device properties format
    • Implements simple sorting logic
    • Policy name: "TestGoodHeuristicPolicy"
  2. TestNoOptionalHeuristicPlugin - Plugin without optional functions

    • Missing hipdnnHeuristicGetPolicyName (uses generated name from policy ID)
    • Missing hipdnnHeuristicSetLogLevel
    • Tests graceful degradation when optional functions absent
  3. TestIncompleteHeuristicApiPlugin - Negative test

    • Missing required function hipdnnHeuristicPolicyDescriptorCreate
    • Tests that backend rejects incomplete plugins

Part 1 Test Coverage:

  • Plugin library loading from HIPDNN_HEURISTIC_PLUGIN_DIR
  • Policy metadata queries (policy ID, name, version)
  • Session handle creation and destruction
  • Device properties serialization format (manually created DevicePropertiesT objects passed to plugins)
  • Engine ID sorting workflow (SetEngineIds → Finalize → GetSortedEngineIds)
  • Optional function handling
  • Error handling (incomplete plugins, invalid inputs)

Validation Checklist

Before merging, verify:

  • All backend tests pass - 2542 tests passed (+120 heuristic plugin tests, -11 removed TestDeviceProperties tests from baseline 2433)
  • Mock classes compile and link - Successfully built with hipdnn_backend_tests
  • FlatBuffer headers identical - v24.12.23 and v25.9.23 generate same code (modulo version strings)
  • Plugin SDK C/C++ compatible - Proper extern "C" guards, no C++ features in ABI
  • No symbol collisions - hipdnnHeuristic* vs hipdnnEnginePlugin* namespaces are distinct
  • RFC 0007 references consistent - All references include "RFC 0007" prefix
  • ASAN validation - All 2542 tests pass with AddressSanitizer, NO MEMORY LEAKS detected
  • CRTP base class tested - PluginResourceManagerBase shared between Engine and Heuristic managers
  • Type consistency - GetSortedEngineIds uses size_t* matching SetEngineIds
  • Error handling - Plugin destructors catch all exceptions (std::exception + ...), log warnings
  • Environment variables documented - HIPDNN_PLUGIN_DIR and HIPDNN_HEURISTIC_PLUGIN_DIR in docs/Environment.md
  • Integration tests with real plugins - 3 test plugins, 31 integration tests validating complete workflows
  • Device properties schema defined - FlatBuffer schema with pre-generated headers for both FlatBuffers versions

Migration Impact

Backward Compatibility: Full

  • Existing hipdnnHandle code works unchanged
  • Plugin loading is automatic and transparent
  • Missing HIPDNN_HEURISTIC_PLUGIN_DIR logs warning but continues
  • No changes to existing engine plugin behavior

API Stability:

  • Plugin SDK ABI: Versioned via hipdnnHeuristicGetApiVersion() (current: HIPDNN_HEURISTIC_API_VERSION = "0.0.1")
  • Internal C++ APIs: Subject to change (not user-facing)

Scope

This PR (Part 1/3) provides:

  • Complete heuristic plugin C ABI definition
  • Backend plugin loading and management infrastructure
  • Device property serialization format (FlatBuffers schema)
  • Comprehensive test coverage with 120 tests including integration tests with real plugins

This PR does NOT include:

  • EngineHeuristicDescriptor::finalize() implementation (policy chaining)
  • Default heuristic plugins (StaticOrdering, Config, etc.)
  • Frontend API wrappers
  • Automatic device property querying or propagation

Follow-Up Work (Parts 2 & 3)

Part 2: Default Plugins

  • EngineHeuristicDescriptor::finalize() - Policy chaining logic (iterate plugins, merge results)
  • Default heuristic plugin implementations:
    • StaticOrdering plugin (user-specified engine order)
    • Config plugin (load policies from configuration files)
    • Additional default plugins TBD based on RFC 0007 scope
  • Integration with engine configuration workflow
  • End-to-end backend tests validating policy execution during graph finalization

Part 3: Frontend API & Documentation

  • Frontend C++ API wrappers for heuristic plugin management
  • Public API for configuring plugin paths and loading modes
  • Plugin development guide and tutorials
  • Example plugin implementations demonstrating common patterns
  • Performance benchmarking framework for comparing policies

References

  • RFC 0007: Engine Selection Heuristics Framework (internal documentation)
  • Section References:
    • §5.3: Policy order resolution
    • §6: Device properties
    • §7: Plugin API design
    • §11: Policy metadata and validation
    • §13: Serialization protocols

cderb and others added 3 commits April 15, 2026 11:21
…rt 1/3)

This PR establishes the plugin infrastructure for RFC 0007's heuristic framework:
- Plugin SDK API extensions for policy metadata
- Device properties serialization using FlatBuffers
- Plugin loading and lifecycle management
- Handle integration for plugin resource manager
- Shared utilities for engine ordering

Key Components:
- HeuristicsPluginApi.h: Policy metadata API (GetPolicyId, GetPolicyName)
- DeviceProperties.*: FlatBuffer serialization for device properties
- HeuristicPlugin*: Plugin loading, resource management, validation
- Mock plugins for unit testing

Testing:
- Mock-based unit tests for plugin lifecycle
- Device properties serialization tests
- No real plugins yet (Part 2)

Part 2: TBD (Heuristic Framework + Default Plugins)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- HeuristicPluginManager.hpp: Added RFC 0007 to policy consistency validation comments
- device_properties.fbs: Added RFC 0007 to FlatBuffer schema documentation
- Regenerated device_properties_generated.h (both v24_12_23 and v25_9_23) from updated schema

The test file was using HipdnnException but missing the header include,
causing compilation errors. This include is required for exception
handling tests.

codecov-commenter commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 65.29032% with 269 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...ers_sdk/data_objects/device_properties_generated.h | 29.61% | 106 Missing and 1 partial ⚠️
...kend/src/plugin/HeuristicPluginResourceManager.cpp | 75.94% | 44 Missing and 7 partials ⚠️
...ects/hipdnn/backend/src/plugin/HeuristicPlugin.cpp | 77.24% | 14 Missing and 19 partials ⚠️
...pdnn/backend/src/plugin/HeuristicPluginManager.hpp | 29.27% | 26 Missing and 3 partials ⚠️
...n/backend/src/plugin/PluginResourceManagerBase.hpp | 79.29% | 23 Missing and 6 partials ⚠️
...backend/src/plugin/EnginePluginResourceManager.cpp | 80.56% | 7 Missing ⚠️
...ects/hipdnn/backend/src/plugin/HeuristicPlugin.hpp | 73.08% | 6 Missing and 1 partial ⚠️
projects/hipdnn/backend/src/handle/Handle.cpp | 25.00% | 3 Missing ⚠️
...ojects/hipdnn/backend/src/plugin/SharedLibrary.cpp | 60.00% | 2 Missing ⚠️
.../include/hipdnn_plugin_sdk/heuristic_api_version.h | 50.00% | 1 Missing ⚠️

❌ Your project status has failed because the head coverage (47.50%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6467      +/-   ##
===========================================
+ Coverage    65.87%   69.85%   +3.98%     
===========================================
  Files         1472     1828     +356     
  Lines       255621   270797   +15176     
  Branches     36134    35629     -505     
===========================================
+ Hits        168374   189140   +20766     
+ Misses       72731    69758    -2973     
+ Partials     14516    11899    -2617     
Flag | Coverage Δ | *Carryforward flag
hipBLAS | 90.67% <ø> (+0.02%) ⬆️ | Carriedforward from 72a83c0
hipBLASLt | 39.95% <ø> (-0.05%) ⬇️ | Carriedforward from 72a83c0
hipCUB | 82.21% <ø> (ø) | Carriedforward from 72a83c0
hipDNN | 80.27% <65.29%> (-0.20%) ⬇️ |
hipFFT | 55.00% <ø> (?) | Carriedforward from 72a83c0
hipRAND | 76.12% <ø> (?) | Carriedforward from 72a83c0
hipSOLVER | ? |
rocBLAS | ? |
rocFFT | 47.50% <ø> (-2.42%) ⬇️ | Carriedforward from 72a83c0
rocPRIM | 38.96% <ø> (?) | Carriedforward from 72a83c0
rocRAND | 57.09% <ø> (?) | Carriedforward from 72a83c0
rocSPARSE | 71.52% <ø> (-0.13%) ⬇️ | Carriedforward from 72a83c0
rocThrust | 91.37% <ø> (?) | Carriedforward from 72a83c0

*This pull request uses carry forward flags. Click here to find out more.

Files with missing lines | Coverage Δ
projects/hipdnn/backend/src/handle/Handle.hpp | 25.00% <ø> (ø)
projects/hipdnn/backend/src/logging/Logging.hpp | 88.24% <ø> (ø)
...kend/src/plugin/HeuristicPluginResourceManager.hpp | 100.00% <100.00%> (ø)
...nclude/hipdnn_plugin_sdk/PluginDataTypeHelpers.hpp | 100.00% <100.00%> (ø)
.../include/hipdnn_plugin_sdk/heuristic_api_version.h | 50.00% <50.00%> (ø)
...ojects/hipdnn/backend/src/plugin/SharedLibrary.cpp | 73.81% <60.00%> (-1.19%) ⬇️
projects/hipdnn/backend/src/handle/Handle.cpp | 69.70% <25.00%> (-6.17%) ⬇️
...backend/src/plugin/EnginePluginResourceManager.cpp | 71.88% <80.56%> (-1.36%) ⬇️
...ects/hipdnn/backend/src/plugin/HeuristicPlugin.hpp | 73.08% <73.08%> (ø)
...pdnn/backend/src/plugin/HeuristicPluginManager.hpp | 29.27% <29.27%> (ø)
... and 4 more

... and 1200 files with indirect coverage changes


Comment thread projects/hipdnn/backend/src/handle/Handle.cpp
Comment thread projects/hipdnn/backend/src/heuristics/DeviceProperties.hpp Outdated
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread projects/hipdnn/backend/src/heuristics/DeviceProperties.hpp Outdated
cderb and others added 2 commits April 15, 2026 23:53
Cherry-picked improvements from PR2 that apply to PR1 infrastructure:

**Plugin SDK: Heuristic API Versioning**
- Add heuristic_api_version.h with independent versioning (1.0.0)
- Decouples heuristic plugin ABI version from backend version
- Implements RFC 0007 requirement for independent API versioning

**Plugin Infrastructure Improvements:**
- Add IPlugin interface methods to HeuristicPlugin (name, version, type)
- Implement GET_REQUIRED_SYMBOL macro with detailed error diagnostics
- Update HeuristicPluginManager to validate against HIPDNN_HEURISTIC_API_VERSION_MAJOR
- Improve error messages with visual markers and actionable guidance
- Add explicit nullptr checks for better safety

**Code Quality:**
- Add const qualifiers to immutable variables
- Replace C-style parameter names (component_prefix → componentPrefix)
- Mark default constructors with = default
- Remove unnecessary static_cast in DeviceProperties
- Use uniform initialization where appropriate
- Add missing includes (array)

**Test Improvements:**
- Use std::array instead of C-style arrays for safety
- Add const to test helper methods
- Improve const correctness in test cases

These changes improve the Plugin SDK foundation and maintain consistency
between PR1 (infrastructure) and PR2 (implementation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Adds comprehensive unit test coverage for the Plugin SDK components
  introduced in RFC 0007 Part 1:

  - TestHeuristicPlugin.cpp: 27 tests for plugin lifecycle and mock interactions
  - TestHeuristicPluginManager.cpp: 16 tests for plugin discovery and validation
  - TestHeuristicPluginResourceManager.cpp: 23 tests for resource management

  Total: 62 new tests, all passing (2495/2495 tests)

  Coverage includes:
  - Plugin loading and symbol resolution
  - API version and policy ID validation
  - Handle and policy descriptor lifecycle
  - Device properties propagation
  - Configuration APIs (paths, unloading modes, log levels)
  - Error handling and edge cases
  - Resource cleanup and move semantics
Comment thread projects/hipdnn/backend/src/plugin/HeuristicPluginManager.hpp Outdated
Comment thread projects/hipdnn/backend/src/utilities/EngineOrdering.cpp
cderb and others added 2 commits April 16, 2026 11:09
- Remove trailing whitespace from RFC 0007 markdown file
- Regenerate FlatBuffer files with updated reflection API
  - Adds MiniReflectTypeTable() method
  - Adds DevicePropertiesTypeTable() implementation
  - Updates both v24 and v25 generated headers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cderb cderb force-pushed the users/cderb/rfc0007/pr1-plugin-sdk-infrastructure branch from a583877 to 853b064 Compare April 16, 2026 17:12
cderb added 7 commits April 16, 2026 12:48
  This commit addresses the 159 uncovered lines in HeuristicPlugin.cpp by adding
  integration tests with real plugin implementations, achieving ~90% coverage of
  the plugin loading and symbol resolution infrastructure.

  Previous state:
  - Only mock-based unit tests existed (26 tests)
  - HeuristicPlugin.cpp had 159 uncovered lines
  - No testing of actual C ABI boundary crossing or symbol resolution

  New additions:
  - 3 test plugin implementations (good, incomplete, no-optional)
  - 22 integration tests for real plugin loading scenarios
  - LoadedGoodPluginFixture to eliminate test boilerplate
  - Tests for error handling, optional symbols, and complete workflows

  Test coverage improvements:
  - Symbol resolution (resolveSymbols) - previously untested
  - HeuristicPlugin constructor with SharedLibrary - previously untested
  - Error handling for missing/incomplete symbols - previously untested
  - Optional symbol handling (tryAssignSymbol) - previously untested
  - Real C ABI calls across plugin boundary - previously untested

  Code cleanup:
  - Removed 17 trivial mock tests that only verified gmock mechanics
  - Reduced test code by 264 lines (28%) through fixture reuse
  - Focused tests on meaningful behavior verification

  Files changed:
  - backend/tests/TestHeuristicPlugin.cpp: 467 → 233 lines (removed trivial tests)
  - backend/tests/TestHeuristicPluginLoading.cpp: 433 lines (new integration tests)
  - tests/test_plugins/Test{Good,Incomplete,NoOptional}HeuristicPlugin.cpp: 3 new plugins
  - backend/tests/CMakeLists.txt: Added test plugin dependencies and defines
  - tests/test_plugins/CMakeLists.txt: Added 3 heuristic plugin targets
  - plugin_sdk/.../heuristic_api_version.h: Added NOLINT suppressions

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
 Refactored static state management from a function-local static struct to file-scope static variables with proper mutex protection, matching the proven pattern from EnginePluginResourceManager.
cderb added 7 commits April 21, 2026 14:12
    loading modes for HeuristicPluginResourceManager
    eager caching for heuristic policy id
    add HIPDNN_PLUGIN_TYPE_HEURISTIC
    type consistency for hipdnnHeuristicPolicyGetSortedEngineIds
unify implementation of common methods in base manager; EnginePluginResourceManager and HeuristicPluginResourceManager now extend it
HeuristicPluginResourceManager refinements
  made getLoadedPluginFiles part of the base class and removed getLoadedHeuristicPluginFiles
  added path info to toString
  updated safeDestroyHandle implementation
added env documentation
Contributor

@BrianHarrisonAMD BrianHarrisonAMD left a comment

LGTM.

Feedback addressed.

Some minor clean-up to consider:

  • HeuristicsPluginApi.h:82-89 — empty @defgroup HeuristicPluginExtensions block (leftover; remove or populate).
  • HeuristicsPluginApi.h:297 — header names parameter num_engines, implementations name it count — consistent type,
    inconsistent naming.

@cderb
Contributor Author

cderb commented Apr 23, 2026

Some minor clean-up to consider:

  • HeuristicsPluginApi.h:82-89 — empty @defgroup HeuristicPluginExtensions block (leftover; remove or populate).
  • HeuristicsPluginApi.h:297 — header names parameter num_engines, implementations name it count — consistent type,
    inconsistent naming.

I'll roll this into the part2 PR

@cderb cderb merged commit 9a8c2d8 into develop Apr 23, 2026
23 checks passed
@cderb cderb deleted the users/cderb/rfc0007/pr1-plugin-sdk-infrastructure branch April 23, 2026 20:11
aledudek pushed a commit that referenced this pull request May 20, 2026
…rt 1/3) (#6467)

## Motivation

hipDNN currently uses hard-coded engine selection heuristics embedded in
the backend, making it difficult for users to customize selection
behavior or experiment with new ordering strategies. This inflexibility
limits performance tuning opportunities and prevents domain-specific
optimizations.

**RFC 0007** introduces a **pluggable heuristics framework** that allows
users to:
- Write custom engine selection policies as loadable plugins
- Override default ordering logic without modifying hipDNN source
- Experiment with performance tuning strategies (e.g.,
architecture-specific ordering, tuning database lookups, ML-based
selection)
- Chain multiple policies with fallback behavior

This PR (Part 1/3) establishes the **foundational plugin
infrastructure** required for the heuristics framework, including the
complete C ABI, backend wrappers, resource management, device property
serialization format, and comprehensive test coverage. Integration tests
with real plugins are included (originally scoped without full framework
logic or default plugins).

## Technical Overview

### Architecture

The heuristics plugin system introduces a **separate C ABI** distinct
from the existing engine plugin API:
- **Engine plugins** provide compute implementations (e.g., convolution
kernels)
- **Heuristic plugins** provide engine ordering/selection logic
- A single plugin library (`.so` on Linux, `.dll` on Windows) is either
an engine plugin OR a heuristic plugin, never both

**Plugin Infrastructure Sharing**:
- Both plugin types share common management code via
`PluginResourceManagerBase<>` (CRTP pattern)
- Separate static storage (paths, configs, plugin managers) for each
type
- Unified patterns for loading, unloading, path configuration, log level
propagation

### Key Components

#### 1. Plugin SDK API
(`plugin_sdk/include/hipdnn_plugin_sdk/HeuristicsPluginApi.h`)

**Terminology Note**: This document uses "policy name" in prose,
`policy_name` for C API parameters, and `policyName()` for C++ method
names.

**Exported Functions** (C ABI):
```c
// Module metadata
hipdnnPluginStatus_t hipdnnHeuristicGetApiVersion(const char** version);
hipdnnPluginStatus_t hipdnnHeuristicGetPolicyName(const char** name); // Required (not optional)
hipdnnPluginStatus_t hipdnnHeuristicGetPluginVersion(const char** version);

// NOTE: Policy ID is computed by backend via engineNameToId(policy_name), not exported by plugin

// Logging (shared with engine plugins)
hipdnnPluginStatus_t hipdnnHeuristicSetLoggingCallback(hipdnnCallback_t callback);
hipdnnPluginStatus_t hipdnnHeuristicSetLogLevel(hipdnnSeverity_t level); // Optional
void hipdnnHeuristicGetLastErrorString(const char** error_str);

// Session lifecycle
hipdnnPluginStatus_t hipdnnHeuristicHandleCreate(hipdnnHeuristicHandle_t* handle);
hipdnnPluginStatus_t hipdnnHeuristicHandleDestroy(hipdnnHeuristicHandle_t handle);
hipdnnPluginStatus_t hipdnnHeuristicHandleSetDeviceProperties(
    hipdnnHeuristicHandle_t handle,
    const hipdnnPluginConstData_t* serialized_device_properties);

// Policy descriptor lifecycle
hipdnnPluginStatus_t hipdnnHeuristicPolicyDescriptorCreate(
    hipdnnHeuristicHandle_t handle,
    hipdnnHeuristicPolicyDescriptor_t* descriptor);
hipdnnPluginStatus_t hipdnnHeuristicPolicyDescriptorDestroy(
    hipdnnHeuristicPolicyDescriptor_t descriptor);

// Policy execution
hipdnnPluginStatus_t hipdnnHeuristicPolicySetEngineIds(
    hipdnnHeuristicPolicyDescriptor_t descriptor,
    const int64_t* engine_ids,
    size_t count);
hipdnnPluginStatus_t hipdnnHeuristicPolicySetSerializedGraph(
    hipdnnHeuristicPolicyDescriptor_t descriptor,
    const hipdnnPluginConstData_t* serialized_graph);
hipdnnPluginStatus_t hipdnnHeuristicPolicyFinalize(
    hipdnnHeuristicPolicyDescriptor_t descriptor,
    int32_t* out_applied);
hipdnnPluginStatus_t hipdnnHeuristicPolicyGetSortedEngineIds(
    hipdnnHeuristicPolicyDescriptor_t descriptor,
    int64_t* engine_ids,
    size_t* count);  // Type-consistent with SetEngineIds
```

**Design Decisions**:
- **Two-phase execution**: `SetEngineIds` + `SetSerializedGraph` →
`Finalize` → `GetSortedEngineIds`
- **Session handles** per `hipdnnHandle` store plugin state (caches,
tuning data)
- **Policy descriptors** per `EngineHeuristicDescriptor` isolate
per-graph selection state
- **FlatBuffer serialization** for device properties and operation
graphs (cross-ABI compatibility)
- **One policy per plugin module**: Each plugin library exports exactly
one policy name. Policy ID is computed by backend using
`engineNameToId(policy_name)` (RFC 0007 §5.3, §8.2)
- **Type consistency**: Both `SetEngineIds` and `GetSortedEngineIds` use
`size_t` for counts
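
The two-phase flow listed above can be sketched from the caller's side. This is a minimal, self-contained illustration rather than the hipDNN implementation: `SketchPolicy`, the `sketch*` functions, and the ascending-ID "policy" are hypothetical stand-ins for the opaque descriptor and the `hipdnnHeuristicPolicy*` entry points. The size-query convention (pass a null buffer to learn the count) is an assumption that the real header may not share, and the `SetSerializedGraph` step is omitted for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the state behind hipdnnHeuristicPolicyDescriptor_t.
struct SketchPolicy {
    std::vector<std::int64_t> candidates; // phase 1 input
    std::vector<std::int64_t> sorted;     // phase 2 output
    bool finalized = false;
};

// Phase 1: hand the candidate engine IDs to the policy.
inline bool sketchSetEngineIds(SketchPolicy& p, const std::int64_t* ids, std::size_t count) {
    if (ids == nullptr && count != 0) return false;
    p.candidates.assign(ids, ids + count);
    p.finalized = false;
    return true;
}

// Finalize: run the policy. outApplied mirrors the out_applied flag of the ABI;
// this toy policy declines (0) when it has no candidates.
inline bool sketchFinalize(SketchPolicy& p, std::int32_t* outApplied) {
    if (outApplied == nullptr) return false;
    p.sorted = p.candidates;
    std::sort(p.sorted.begin(), p.sorted.end()); // assumed ordering: ascending ID
    p.finalized = true;
    *outApplied = p.sorted.empty() ? 0 : 1;
    return true;
}

// Phase 2: copy the result out. Assumed convention: *count is the buffer
// capacity on entry and the number of sorted IDs on return.
inline bool sketchGetSortedEngineIds(const SketchPolicy& p, std::int64_t* ids, std::size_t* count) {
    if (!p.finalized || count == nullptr) return false;
    const std::size_t needed = p.sorted.size();
    if (ids == nullptr) { *count = needed; return true; } // size query
    if (*count < needed) return false;                     // buffer too small
    std::copy(p.sorted.begin(), p.sorted.end(), ids);
    *count = needed;
    return true;
}

// Caller side: SetEngineIds -> Finalize -> GetSortedEngineIds.
inline std::vector<std::int64_t> runSketchPolicy(const std::vector<std::int64_t>& candidates) {
    SketchPolicy policy;
    std::int32_t applied = 0;
    if (!sketchSetEngineIds(policy, candidates.data(), candidates.size())) return {};
    if (!sketchFinalize(policy, &applied) || applied == 0) return {};
    std::size_t count = 0;
    if (!sketchGetSortedEngineIds(policy, nullptr, &count)) return {};
    std::vector<std::int64_t> result(count);
    if (!sketchGetSortedEngineIds(policy, result.data(), &count)) return {};
    return result;
}
```

Using `size_t` for both the input count and the output count, as the type-consistency bullet describes, lets the caller reuse one variable for sizing the buffer and reading back the result length.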

#### 2. Device Properties
~~(`backend/src/heuristics/DeviceProperties.{hpp,cpp}`)~~ (FlatBuffers
Schema: `flatbuffers_sdk/schemas/device_properties.fbs`)

**Purpose**: Define serialization format for device characteristics to
be passed to heuristic plugins.

**FlatBuffer Schema** (`flatbuffers_sdk/schemas/device_properties.fbs`):
```fbs
table DeviceProperties {
    device_id: int = -1;               // HIP device ID
    multi_processor_count: int = 0;    // Number of compute units
    total_global_mem: ulong = 0;       // Total GPU memory (bytes)
    architecture_name: string;         // GPU arch (e.g., "gfx90a", "gfx942")
    // Future optional fields (additive evolution):
    // wavefront_size: int;
    // max_threads_per_block: int;
}
```

**Versioning**: Pre-generated headers for FlatBuffers v24.12.23 and
v25.9.23 ensure compatibility across PyTorch and hipDNN builds.

**Part 1 Includes**:
- FlatBuffer schema and generated headers (40 lines schema, ~270 lines
per generated header)
- ABI entry point: `hipdnnHeuristicHandleSetDeviceProperties()`
- Backend method: `setDevicePropertiesOnAllHandles(const
hipdnnPluginConstData_t*)`
- Integration tests manually create `DevicePropertiesT` objects and
serialize them using FlatBuffers API to validate the format

**Revision During Review**: Early commits included C++ helper functions
in `backend/src/heuristics/DeviceProperties.{cpp,hpp}` with dedicated
unit tests, removed in commit 1b97bb9. **Current approach**: Part 1
uses the FlatBuffers SDK `DevicePropertiesT` type directly
(`flatbuffers_sdk/schemas/device_properties.fbs`). Integration tests
demonstrate the serialization format works correctly. Future work may
add helper functions for convenience, but automatic device property
querying is deferred to Part 2.

#### 3. Plugin Resource Management

**Base Class** (`backend/src/plugin/PluginResourceManagerBase.hpp`):

**Purpose**: CRTP base class providing shared infrastructure for both
engine and heuristic plugin management.

**Template Signature**:
```cpp
template <typename Derived, typename PluginManagerType, typename PluginType>
class PluginResourceManagerBase;
```

**Shared Static Methods** (via CRTP):
- `setPluginPaths(paths, loadingMode)` - Configure plugin search
directories
- `getPluginPaths()` - Retrieve current search paths
- `setPluginUnloadingMode(mode)` - Configure EAGER/LAZY unloading
- `setPluginLogLevel(level)` - Propagate log level to all loaded plugins
- `getOrCreatePluginManager()` - Lazy initialization with
weak_ptr/persistent_ptr pattern

**Shared Instance Method**:
- `getLoadedPluginFiles(numPlugins, pluginPaths, maxStringLen)` - Query
loaded plugin file paths
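
To illustrate how a CRTP base can give each plugin type its own static storage, here is a minimal sketch. It is an assumption-laden simplification: the base takes one template parameter instead of three, all `Sketch*` names are hypothetical, and locking is omitted even though the real managers protect their static state with a mutex.

```cpp
#include <set>
#include <string>
#include <utility>

// Simplified CRTP base. Each Derived type instantiates its own copy of the
// function-local static below, so engine and heuristic paths never mix.
template <typename Derived>
class SketchResourceManagerBase {
public:
    static void setPluginPaths(std::set<std::string> paths) {
        storage() = std::move(paths);
    }

    static const std::set<std::string>& getPluginPaths() { return storage(); }

private:
    static std::set<std::string>& storage() {
        static std::set<std::string> paths; // one instance per Derived
        return paths;
    }
};

// Hypothetical derived managers; they inherit the static API unchanged.
class SketchEngineManager : public SketchResourceManagerBase<SketchEngineManager> {};
class SketchHeuristicManager : public SketchResourceManagerBase<SketchHeuristicManager> {};
```

Setting paths through `SketchEngineManager` leaves `SketchHeuristicManager::getPluginPaths()` untouched, which is the separation the shared base is meant to preserve.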

**Derived Class**
(`backend/src/plugin/HeuristicPluginResourceManager.{hpp,cpp}`):

**Responsibilities**:
- Load heuristic plugin libraries from `HIPDNN_HEURISTIC_PLUGIN_DIR`
- Validate plugin ABI compatibility and metadata
- Manage plugin lifecycle per `hipdnnHandle`
- Create `hipdnnHeuristicHandle_t` session objects
- Resolve policy IDs to plugin modules
- Enforce policy ID ↔ policy name consistency (RFC 0007 §11, §5.3.1)

**Key Methods**:
```cpp
class HeuristicPluginResourceManager
    : public PluginResourceManagerBase<HeuristicPluginResourceManager,
                                       HeuristicPluginManager,
                                       HeuristicPlugin> {
    // Instance methods
    virtual hipdnnHeuristicHandle_t getHeuristicHandleForPolicyId(int64_t policyId) const;
    virtual const HeuristicPlugin* getPluginForPolicyId(int64_t policyId) const;
    virtual std::vector<HeuristicPolicyInfo> getHeuristicPolicyInfos() const;
    virtual void setDevicePropertiesOnAllHandles(const hipdnnPluginConstData_t* devicePropsSerialized) const;
    virtual std::string toString() const;  // Includes plugin paths and policy metadata

    // Inherited static methods (via CRTP base):
    static void setPluginPaths(...);
    static std::set<std::filesystem::path> getPluginPaths();
    static void setPluginUnloadingMode(...);
    static void setPluginLogLevel(...);

    // Compatibility aliases:
    static void setHeuristicPluginPaths(...) { setPluginPaths(...); }
    static std::set<std::filesystem::path> getHeuristicPluginPaths() { return getPluginPaths(); }
};
```

**Error Handling**:
- Missing plugins: Warning logged, continues loading other plugins
- ABI mismatch: Plugin rejected, logged as error
- Duplicate policy IDs: Plugin rejected with `HipdnnException`
- **Destructor safety**: Catches all exceptions (`std::exception` +
`...`) when destroying handles, logs warnings (not errors)
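
The destructor-safety rule can be sketched as follows. This is an illustration under stated assumptions, not the backend code: `SketchHandleOwner` is hypothetical, plain `int` values stand in for `hipdnnHeuristicHandle_t`, and a vector of strings stands in for the logger.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical owner of plugin session handles. destroyFn stands in for the
// plugin's handle-destroy call and may throw.
class SketchHandleOwner {
public:
    SketchHandleOwner(std::vector<int> handles,
                      std::function<void(int)> destroyFn,
                      std::vector<std::string>& warnings)
        : _handles(std::move(handles)), _destroy(std::move(destroyFn)), _warnings(warnings) {}

    SketchHandleOwner(const SketchHandleOwner&) = delete;
    SketchHandleOwner& operator=(const SketchHandleOwner&) = delete;

    ~SketchHandleOwner() {
        for (int handle : _handles) {
            try {
                _destroy(handle);
            } catch (const std::exception& e) {
                // Known exception type: record a warning and keep going.
                _warnings.push_back(std::string("destroy failed: ") + e.what());
            } catch (...) {
                // Anything else: still never let it escape the destructor.
                _warnings.push_back("destroy failed: unknown exception");
            }
        }
    }

private:
    std::vector<int> _handles;
    std::function<void(int)> _destroy;
    std::vector<std::string>& _warnings;
};

// Demo: handle 1 throws std::runtime_error, handle 2 throws a non-std type,
// handle 3 is destroyed normally. Returns the collected warnings.
inline std::vector<std::string> sketchDestroyDemo() {
    std::vector<std::string> warnings;
    {
        SketchHandleOwner owner({1, 2, 3},
                                [](int handle) {
                                    if (handle == 1) throw std::runtime_error("boom");
                                    if (handle == 2) throw 42;
                                },
                                warnings);
    } // destructor runs here and must not propagate
    return warnings;
}
```

Catching per handle, rather than around the whole loop, means one failing handle does not prevent the remaining handles from being destroyed.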

#### 4. Plugin Wrapper (`backend/src/plugin/HeuristicPlugin.{hpp,cpp}`)

**Purpose**: C++ RAII wrapper around C ABI plugin functions.

**Lifecycle**:
```cpp
class HeuristicPlugin {
    SharedLibrary _lib;  // RAII dlopen/dlclose wrapper
    // Function pointers to plugin exports

    explicit HeuristicPlugin(SharedLibrary&& lib);
    ~HeuristicPlugin() override = default;

    // Metadata queries
    std::string_view apiVersion() const override;
    virtual int64_t policyId() const;  // Computed via engineNameToId(policyName()), cached at load
    virtual std::string_view policyName() const;  // Calls hipdnnHeuristicGetPolicyName
    virtual std::string_view pluginVersion() const;

    // Session management
    virtual hipdnnHeuristicHandle_t createHandle() const;
    virtual void destroyHandle(hipdnnHeuristicHandle_t handle) const;
    virtual void setDeviceProperties(hipdnnHeuristicHandle_t handle, const hipdnnPluginConstData_t* devicePropsSerialized) const;

    // Policy descriptor management
    virtual hipdnnHeuristicPolicyDescriptor_t createPolicyDescriptor(hipdnnHeuristicHandle_t pluginHandle) const;
    virtual void destroyPolicyDescriptor(hipdnnHeuristicPolicyDescriptor_t desc) const;

    // Policy execution
    virtual void setEngineIds(hipdnnHeuristicPolicyDescriptor_t desc, const int64_t* engineIds, size_t engineIdCount) const;
    virtual void setSerializedGraph(hipdnnHeuristicPolicyDescriptor_t desc, const hipdnnPluginConstData_t* serializedGraph) const;
    virtual bool finalize(hipdnnHeuristicPolicyDescriptor_t desc) const;
    virtual std::vector<int64_t> getSortedEngineIds(hipdnnHeuristicPolicyDescriptor_t desc) const;

    // Logging
    hipdnnPluginStatus_t setLoggingCallback(hipdnnCallback_t) const;
    hipdnnPluginStatus_t setLogLevel(hipdnnSeverity_t) const;  // May be missing
};
```
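
The "May be missing" note on `setLogLevel` reflects the split between required and optional exports. A minimal sketch of that split, assuming a map in place of the dynamic loader and hypothetical names throughout (`SketchSymbolTable`, `SketchPlugin`, `SketchStatus`), might look like this:

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Stand-in for a loaded library: symbol name -> address. A real wrapper
// would query the dynamic loader instead of a map.
using SketchSymbolTable = std::unordered_map<std::string, void*>;

enum class SketchStatus { Ok, InvalidValue };
using SketchSetLogLevelFn = SketchStatus (*)(int);

// Required symbol: absence is a load error.
inline void* resolveRequired(const SketchSymbolTable& lib, const std::string& name) {
    const auto it = lib.find(name);
    if (it == lib.end() || it->second == nullptr) {
        throw std::runtime_error("missing required symbol: " + name);
    }
    return it->second;
}

// Optional symbol: absence yields nullptr and the caller degrades gracefully.
inline void* resolveOptional(const SketchSymbolTable& lib, const std::string& name) {
    const auto it = lib.find(name);
    return it == lib.end() ? nullptr : it->second;
}

// Example optional export that a plugin might provide.
inline SketchStatus sketchAcceptLogLevel(int) { return SketchStatus::Ok; }

class SketchPlugin {
public:
    explicit SketchPlugin(const SketchSymbolTable& lib) {
        resolveRequired(lib, "hipdnnHeuristicHandleCreate"); // throws if absent
        _setLogLevel = reinterpret_cast<SketchSetLogLevelFn>(
            resolveOptional(lib, "hipdnnHeuristicSetLogLevel"));
    }

    // Reports InvalidValue when the plugin did not export the optional symbol.
    SketchStatus setLogLevel(int level) const {
        return _setLogLevel != nullptr ? _setLogLevel(level) : SketchStatus::InvalidValue;
    }

private:
    SketchSetLogLevelFn _setLogLevel = nullptr;
};
```

Resolving everything once in the constructor means an incomplete plugin is rejected at load time instead of failing later in the middle of a policy call.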

#### 5. Plugin Manager (`backend/src/plugin/HeuristicPluginManager.hpp`)

**Purpose**: Manages loading and validation of heuristic plugins,
inherits from `PluginManagerBase<HeuristicPlugin>`.

**Responsibilities**:
- Scan directories for plugin libraries
- Load and validate plugins via `HeuristicPlugin` wrapper
- Validate API version compatibility (major version must match)
- Enforce policy ID uniqueness
- Provide plugin lookup by policy ID

**Key Validation** (in `validateBeforeAdding()`):
```cpp
// Validate heuristic C ABI major version
if (Version{plugin.apiVersion()}.major != HIPDNN_HEURISTIC_API_VERSION_MAJOR) {
    throw HipdnnException(HIPDNN_STATUS_PLUGIN_ERROR, "ABI version mismatch");
}

// Validate unique policy ID
if (_policyIds.find(policyId) != _policyIds.end()) {
    throw HipdnnException(HIPDNN_STATUS_PLUGIN_ERROR, "Duplicate policy ID");
}

// Validate policy name is non-empty (RFC 0007 requirement)
if (policyName.empty()) {
    throw HipdnnException(HIPDNN_STATUS_PLUGIN_ERROR, "Policy name required");
}
```
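
The major-version check above relies on a `Version` helper whose implementation is not shown here. As a rough, self-contained sketch of the same idea, the following parses the leading component of a `major.minor.patch` string. `parseMajorVersion`, `isAbiCompatible`, and `kSketchApiMajor` are hypothetical names, and the value `0` is an assumption taken from the `"0.0.1"` version string quoted in this description.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Extracts the leading numeric component of a "major.minor.patch" string.
// Returns std::nullopt when that component is empty or not a plain integer.
inline std::optional<int> parseMajorVersion(std::string_view version) {
    const std::size_t dot = version.find('.');
    const std::string_view head = version.substr(0, dot);
    int value = 0;
    const char* const end = head.data() + head.size();
    const auto [ptr, ec] = std::from_chars(head.data(), end, value);
    if (ec != std::errc{} || ptr != end) return std::nullopt;
    return value;
}

// Hypothetical stand-in for HIPDNN_HEURISTIC_API_VERSION_MAJOR.
constexpr int kSketchApiMajor = 0;

// A plugin is accepted only when its major version matches exactly.
inline bool isAbiCompatible(std::string_view pluginApiVersion) {
    const std::optional<int> pluginMajor = parseMajorVersion(pluginApiVersion);
    return pluginMajor.has_value() && *pluginMajor == kSketchApiMajor;
}
```

Treating an unparseable version as incompatible keeps malformed plugins on the same rejection path as mismatched ones.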

#### 6. Handle Integration (`backend/src/handle/Handle.{hpp,cpp}`)

**Changes**:
- `Handle::Handle()` now instantiates `HeuristicPluginResourceManager`
and loads plugins
- `Handle::~Handle()` cleans up plugin resources via RAII
- Added `getHeuristicPluginResourceManager()` accessor

**Note**: Part 1 provides the plugin resource manager infrastructure and
device properties API (`setDevicePropertiesOnAllHandles`). Automatic
device property querying (e.g., calling `hipGetDeviceProperties`) and
propagation during handle initialization is not included in Part 1.

#### 7. Environment Variables (`docs/Environment.md`)

**Documented Variables**:
- `HIPDNN_PLUGIN_DIR` - Search directory for engine plugins (default:
`hipdnn_plugins/engines/`)
- `HIPDNN_HEURISTIC_PLUGIN_DIR` - Search directory for heuristic plugins
(default: `hipdnn_plugins/heuristics/`)

### Files Added/Modified

**Plugin SDK** (3 files):
- `plugin_sdk/include/hipdnn_plugin_sdk/HeuristicsPluginApi.h` (404
lines, complete C ABI with full documentation)
- `plugin_sdk/include/hipdnn_plugin_sdk/heuristic_api_version.h` (API
version macro `HIPDNN_HEURISTIC_API_VERSION`)
- `plugin_sdk/include/hipdnn_plugin_sdk/PluginApiDataTypes.h` (Modified:
added `HIPDNN_PLUGIN_TYPE_HEURISTIC` and two new status codes)

**Backend - Plugin Infrastructure** (6 new files + 2 modified):

New files:
- `backend/src/plugin/HeuristicPlugin.{hpp,cpp}` (158 + 234 = 392 lines,
C++ wrapper around C ABI)
- `backend/src/plugin/HeuristicPluginManager.hpp` (96 lines, inherits
from `PluginManagerBase`)
- `backend/src/plugin/HeuristicPluginResourceManager.{hpp,cpp}` (181 +
312 = 493 lines, inherits from `PluginResourceManagerBase`)
- `backend/src/plugin/PluginResourceManagerBase.hpp` (337 lines, CRTP
base for common plugin management)

Modified files:
- `backend/src/plugin/EnginePluginResourceManager.{hpp,cpp}` (now
inherits from `PluginResourceManagerBase`)
- `backend/src/handle/Handle.{hpp,cpp}` (added
`HeuristicPluginResourceManager` member)

**FlatBuffers SDK** (3 files, device properties schema migrated from
data_sdk in commit 1b97bb9):
- `flatbuffers_sdk/schemas/device_properties.fbs` (40 lines)
- `flatbuffers_sdk/include/.../v25_9_23/.../device_properties_generated.h`
(~269 lines)
- `flatbuffers_sdk/include/.../v24_12_23/.../device_properties_generated.h`
(~269 lines)

**Backend Tests** (6 files):
- `backend/tests/TestHeuristicPlugin.cpp` (233 lines, mock-based unit
tests)
- `backend/tests/TestHeuristicPluginManager.cpp` (392 lines, manager
unit tests)
- `backend/tests/TestHeuristicPluginResourceManager.cpp` (475 lines,
resource manager unit tests)
- `backend/tests/TestHeuristicPluginIntegration.cpp` (957 lines,
integration tests with real plugins)
- `backend/tests/descriptors/mocks/MockHeuristicPlugin.hpp` (71 lines,
gmock-based mock plugin)
- `backend/tests/descriptors/mocks/MockHeuristicPluginResourceManager.hpp`
(33 lines, gmock-based mock resource manager)

**Test Plugins** (3 files):
- `tests/test_plugins/TestGoodHeuristicPlugin.cpp` (307 lines, complete
working plugin)
- `tests/test_plugins/TestNoOptionalHeuristicPlugin.cpp` (254 lines,
plugin without optional functions)
- `tests/test_plugins/TestIncompleteHeuristicApiPlugin.cpp` (137 lines,
negative test missing required functions)

**Documentation** (1 file):
- `docs/Environment.md` (Modified: documented `HIPDNN_PLUGIN_DIR` and
`HIPDNN_HEURISTIC_PLUGIN_DIR`)

**Total**: ~5,000+ lines across 26 files (expanded from initial 24 files
with addition of integration tests and test plugins)

## Test Plan

### Unit Tests (120 heuristic plugin tests)

#### TestHeuristicPlugin (9 tests)
- Mock plugin lifecycle (create/destroy handles, descriptors)
- Policy ID caching
- Empty/long policy names handling
- Return value handling (null handles/descriptors)

#### TestHeuristicPluginManager (30 tests)
- Plugin loading from directories
- Multiple plugin instances independence
- Plugin validation (API version, policy IDs, policy names)
- Environment variable path handling (`HIPDNN_HEURISTIC_PLUGIN_DIR`)
- Policy ID uniqueness enforcement
- Error handling (missing files, invalid plugins)
- Default search paths
- Multiple load/unload cycles

#### TestHeuristicPluginResourceManager (37 tests)
- Factory method (`create()`)
- Move semantics (constructor/assignment)
- Policy lookup by ID (handle and plugin pointer)
- Device properties propagation API
- Plugin path configuration (static methods via CRTP base)
- Unloading modes (EAGER/LAZY)
- Log level propagation
- `getLoadedPluginFiles()` implementation
- `toString()` with plugin information and file paths
- Multiple instances coexistence
- Destructor cleanup

#### IntegrationHeuristicPlugin (31 tests)
- Real plugin loading from test plugins
- Complete handle lifecycle
- Policy descriptor workflow:
  - Create descriptor
  - Set engine IDs (candidate list)
  - Set serialized graph
  - Finalize policy
  - Get sorted engine IDs
- **Device properties serialization**:
  - Create mock `DevicePropertiesT` objects
  - Serialize using FlatBuffers
  - Pass as `hipdnnPluginConstData_t*` to plugins
  - Validate no exceptions thrown
- Optional function handling:
  - Plugins without `hipdnnHeuristicGetPolicyName` (returns empty string)
  - Plugins without ~~`hipdnnPluginSetLogLevel`~~
    `hipdnnHeuristicSetLogLevel` (returns INVALID_VALUE)
- Error handling:
  - Plugin with missing required functions (negative test)
  - Invalid device properties buffers
  - Empty engine ID lists

#### TestHeuristicPluginLoadedGood (13 tests)
- Loaded plugin metadata queries
- Policy workflow with real plugin
- Policy ID consistency
- Complete handle and descriptor lifecycle

**Mock Classes**: GoogleMock-based mocks (`MockHeuristicPlugin`,
`MockHeuristicPluginResourceManager`) are provided for descriptor-level
unit tests, though integration tests primarily use real test plugins.

### Integration Testing (Included in Part 1)

**Test Plugins Provided**:
1. **TestGoodHeuristicPlugin** - Complete working implementation
   - Implements all required and optional functions
   - Validates device properties format
   - Implements simple sorting logic
   - Policy name: "TestGoodHeuristicPolicy"

2. **TestNoOptionalHeuristicPlugin** - Plugin without optional functions
   - Missing `hipdnnHeuristicGetPolicyName` (uses generated name from
     policy ID)
   - Missing `hipdnnHeuristicSetLogLevel`
   - Tests graceful degradation when optional functions absent

3. **TestIncompleteHeuristicApiPlugin** - Negative test
   - Missing required function `hipdnnHeuristicPolicyDescriptorCreate`
   - Tests that backend rejects incomplete plugins

**Part 1 Test Coverage**:
- Plugin library loading from `HIPDNN_HEURISTIC_PLUGIN_DIR`
- Policy metadata queries (policy ID, name, version)
- Session handle creation and destruction
- Device properties serialization format (manually created
`DevicePropertiesT` objects passed to plugins)
- Engine ID sorting workflow (SetEngineIds → Finalize →
GetSortedEngineIds)
- Optional function handling
- Error handling (incomplete plugins, invalid inputs)

### Validation Checklist

Before merging, verify:
- [x] **All backend tests pass** - 2542 tests passed (+120 heuristic
plugin tests, -11 removed TestDeviceProperties tests from baseline 2433)
- [x] **Mock classes compile and link** - Successfully built with
hipdnn_backend_tests
- [x] **FlatBuffer headers identical** - v24.12.23 and v25.9.23 generate
same code (modulo version strings)
- [x] **Plugin SDK C/C++ compatible** - Proper `extern "C"` guards, no
C++ features in ABI
- [x] **No symbol collisions** - `hipdnnHeuristic*` vs
`hipdnnEnginePlugin*` namespaces are distinct
- [x] **RFC 0007 references consistent** - All references include "RFC
0007" prefix
- [x] **ASAN validation** - All 2542 tests pass with AddressSanitizer,
**NO MEMORY LEAKS** detected
- [x] **CRTP base class tested** - `PluginResourceManagerBase` shared
between Engine and Heuristic managers
- [x] **Type consistency** - `GetSortedEngineIds` uses `size_t*`
matching `SetEngineIds`
- [x] **Error handling** - Plugin destructors catch all exceptions
(`std::exception` + `...`), log warnings
- [x] **Environment variables documented** - `HIPDNN_PLUGIN_DIR` and
`HIPDNN_HEURISTIC_PLUGIN_DIR` in `docs/Environment.md`
- [x] **Integration tests with real plugins** - 3 test plugins, 31
integration tests validating complete workflows
- [x] **Device properties schema defined** - FlatBuffer schema with
pre-generated headers for both FlatBuffers versions

## Migration Impact

**Backward Compatibility**: Full
- Existing `hipdnnHandle` code works unchanged
- Plugin loading is automatic and transparent
- Missing `HIPDNN_HEURISTIC_PLUGIN_DIR` logs warning but continues
- No changes to existing engine plugin behavior

**API Stability**:
- **Plugin SDK ABI**: Versioned via `hipdnnHeuristicGetApiVersion()`
(current: ~~1~~ `HIPDNN_HEURISTIC_API_VERSION = "0.0.1"`)
- **Internal C++ APIs**: Subject to change (not user-facing)

## Scope

This PR (Part 1/3) provides:
- Complete heuristic plugin C ABI definition
- Backend plugin loading and management infrastructure
- Device property serialization format (FlatBuffers schema)
- Comprehensive test coverage with 120 tests including integration tests
with real plugins

This PR does NOT include:
- `EngineHeuristicDescriptor::finalize()` implementation (policy
chaining)
- Default heuristic plugins (StaticOrdering, Config, etc.)
- Frontend API wrappers
- Automatic device property querying or propagation

## Follow-Up Work (Parts 2 & 3)

**Part 2: Default Plugins**
- `EngineHeuristicDescriptor::finalize()` - Policy chaining logic
(iterate plugins, merge results)
- Default heuristic plugin implementations:
  - StaticOrdering plugin (user-specified engine order)
  - Config plugin (load policies from configuration files)
  - Additional default plugins TBD based on RFC 0007 scope
- Integration with engine configuration workflow
- End-to-end backend tests validating policy execution during graph
finalization

**Part 3: Frontend API & Documentation**
- Frontend C++ API wrappers for heuristic plugin management
- Public API for configuring plugin paths and loading modes
- Plugin development guide and tutorials
- Example plugin implementations demonstrating common patterns
- Performance benchmarking framework for comparing policies

## References

- **RFC 0007**: Engine Selection Heuristics Framework (internal
documentation)
- **Section References**:
  - §5.3: Policy order resolution
  - §6: Device properties
  - §7: Plugin API design
  - §11: Policy metadata and validation
  - §13: Serialization protocols

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Jason Campbell <jascampb@amd.com>
cderb added a commit that referenced this pull request May 20, 2026
…t 2/3) (#6605)

## PR Overview

This PR implements **Part 2 of RFC 0007** (Engine Selection Heuristics),
building on the plugin infrastructure from Part 1 (#6467). It delivers
the **policy orchestration framework** via EngineHeuristicDescriptor,
plus two **default plugins** (Config and StaticOrdering) that replace
the legacy `sortEngineIds()` behavior.

**Key Changes:**
- Policy orchestration logic in `EngineHeuristicDescriptor::finalize()`
(iterates over policies until one succeeds)
- `SelectionHeuristic` wrapper providing typed C++ interface over policy
descriptors
- Two default heuristic plugins: Config (placeholder for user
preferences) and StaticOrdering (legacy behavior)
- Query APIs for introspecting registered policies
(`hipdnnGetHeuristicPolicyCount_ext`,
`hipdnnGetHeuristicPolicyInfo_ext`)
- Shared `EngineOrdering.hpp` utility in data_sdk for code reuse

**Statistics:**
- **29 files changed**, 3,488 insertions(+), 55 deletions(-) (relative
to PR1)
- **45 new tests** across 3 test files (policy framework, plugins,
static ordering)
- **2 production plugins** shipping in `lib/hipdnn_plugins/heuristics/`

---

## Core Components

### 1. EngineHeuristicDescriptor - Policy Orchestration

**Purpose**: Orchestrates multiple heuristic policies, invoking each in
priority order until one succeeds.

**Implementation**
(`backend/src/descriptors/EngineHeuristicDescriptor.cpp:201-254`):
```cpp
bool success = false;
for (size_t i = 0; i < _policySlots.size(); ++i) {
    auto& selection = _policySlots[i];
    if (selection == nullptr) continue;  // Plugin not loaded

    try {
        selection->setEngineIds(candidates);
        selection->setSerializedGraph(&serializedGraph);

        if (!selection->finalize()) {
            continue;  // Policy declined
        }

        candidates = selection->getSortedEngineIds();
        success = true;
        break;  // First policy wins
    } catch (const HipdnnException& e) {
        continue;  // Policy failed, try next
    }
}

if (!success) {
    throw HipdnnException(..., "No heuristic policy succeeded.");
}
```

**Policy Slot Management**:
- `_orderedPolicyIds`: Policy IDs in priority order (lower priority
number = higher precedence)
- `_policySlots`: Corresponding `SelectionHeuristic` wrappers (one per
policy)
- Default ordering: Config (priority 100) before StaticOrdering
(priority 1000)
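
The priority-ordered fallback can be sketched independently of the backend types. In this hedged illustration, `SketchPolicySlot` and `selectWithFallback` are hypothetical, a policy "declines" by returning `std::nullopt`, and the sketch catches `std::exception` where the real code catches `HipdnnException`.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

using SketchEngineIds = std::vector<std::int64_t>;

// Hypothetical policy slot: a priority plus a callable that either returns a
// sorted list (policy applied) or std::nullopt (policy declined).
struct SketchPolicySlot {
    std::string name;
    int priority;
    std::function<std::optional<SketchEngineIds>(const SketchEngineIds&)> run;
};

inline SketchEngineIds selectWithFallback(std::vector<SketchPolicySlot> slots,
                                          const SketchEngineIds& candidates) {
    // Lower priority number means higher precedence.
    std::stable_sort(slots.begin(), slots.end(),
                     [](const SketchPolicySlot& a, const SketchPolicySlot& b) {
                         return a.priority < b.priority;
                     });
    for (const auto& slot : slots) {
        if (!slot.run) continue; // plugin not loaded
        try {
            if (auto result = slot.run(candidates)) {
                return *result; // first policy wins
            }
        } catch (const std::exception&) {
            // policy failed, try the next one
        }
    }
    throw std::runtime_error("No heuristic policy succeeded.");
}
```

With a declining slot at priority 100 and an always-succeeding slot at priority 1000, the result comes from the second slot, which mirrors the Config and StaticOrdering defaults described above.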

---

### 2. SelectionHeuristic - Policy Wrapper

**Purpose**: C++ facade for one policy slot, wrapping
`hipdnnHeuristicPolicyDescriptor_t`.

**Key Methods** (`backend/src/heuristics/SelectionHeuristic.cpp`):
- `setEngineIds()`: Sets candidate engines on policy descriptor
- `setSerializedGraph()`: Passes serialized OperationGraph to policy
- `finalize()`: Invokes policy's logic via
`hipdnnHeuristicPolicyFinalize()`
- `getSortedEngineIds()`: Retrieves result from policy

**Lifecycle**: Owned by EngineHeuristicDescriptor, created when policy
list is established.

---

### 3. Config Plugin

**Purpose**: Reserved for user-specified engine preferences (e.g.,
environment variables).

**Current Implementation**
(`plugins/heuristics/config/ConfigPlugin.cpp`):
- **No-op placeholder**: Policy finalize returns false (declines all
requests)
- **Priority 100**: First in default ordering
- **Future roadmap**: Will implement `HIPDNN_PREFERRED_ENGINE` env var
support

**Rationale for shipping empty**:
- Validates plugin loading infrastructure
- Reserves priority slot for future implementation

---

### 4. StaticOrdering Plugin

**Purpose**: Preserves legacy `sortEngineIds()` behavior.

**Implementation**
(`plugins/heuristics/static_ordering/StaticOrderingPlugin.cpp`):
- **Direct port** of old backend sorting logic
- **Engine ordering**: MIOPEN_ENGINE first, MIOPEN_ENGINE_DETERMINISTIC
last, others middle
- **Stable sort**: Preserves relative order of engines with same
priority
- **Priority 1000**: Last in default ordering (deterministic fallback)
- **Always succeeds**: Returns sorted list, finalize() always returns
true

**Algorithm** (delegates to shared `EngineOrdering.hpp`):
```cpp
auto getPriority = [](int64_t engineId) -> int {
    if (engineId == MIOPEN_ENGINE_ID) return 0;
    if (engineId == MIOPEN_ENGINE_DETERMINISTIC_ID) return 2;
    return 1; // Other engines
};
```
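
For context, the priority lambda only produces the described ordering when paired with a stable sort. The sketch below shows that pairing in a self-contained form. The ID constants are placeholders chosen for illustration (the real engine IDs come from the backend), and `sketchSortEngineIds` is a hypothetical name, not the `EngineOrdering.hpp` function.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Placeholder IDs for illustration only.
constexpr std::int64_t kSketchMiopenId = 100;
constexpr std::int64_t kSketchMiopenDeterministicId = 200;

// Same three-bucket scheme as above: MIOpen first, deterministic last.
inline int sketchPriority(std::int64_t engineId) {
    if (engineId == kSketchMiopenId) return 0;
    if (engineId == kSketchMiopenDeterministicId) return 2;
    return 1; // other engines
}

// std::stable_sort keeps engines with equal priority in their input order.
inline void sketchSortEngineIds(std::vector<std::int64_t>& engineIds) {
    std::stable_sort(engineIds.begin(), engineIds.end(),
                     [](std::int64_t a, std::int64_t b) {
                         return sketchPriority(a) < sketchPriority(b);
                     });
}
```

Given the input `{7, 200, 5, 100, 9}`, this yields `{100, 7, 5, 9, 200}`: the three "other" engines keep their relative order between the two special cases.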

---

### 5. Query APIs

**Purpose**: Allow users to introspect registered heuristic policies.

**New Functions** (`backend/include/hipdnn_backend.h`,
`backend/src/HipdnnBackend.cpp`):
```cpp
// Get number of registered policies
hipdnnStatus_t hipdnnGetHeuristicPolicyCount_ext(
    hipdnnHandle_t handle,
    int64_t* count);

// Get metadata for specific policy (name, ID, priority)
hipdnnStatus_t hipdnnGetHeuristicPolicyInfo_ext(
    hipdnnHandle_t handle,
    int64_t policyIndex,
    hipdnnHeuristicPolicyInfo_t* policyInfo);
```

---

## Architecture Highlights

### Shared EngineOrdering.hpp

**What**: Header-only sorting logic in
`data_sdk/include/hipdnn_data_sdk/utilities/EngineOrdering.hpp`

**Why**:
- Both backend and StaticOrdering plugin need sortEngineIds()
- Plugins cannot link against backend (circular dependency)
- Header-only = compile-time inclusion, zero runtime dependencies

**Backend Delegation** (`backend/src/utilities/EngineOrdering.cpp`):
```cpp
void sortEngineIds(std::vector<int64_t>& engineIds) {
    hipdnn_data_sdk::utilities::sortEngineIds(engineIds);
}
```

**Plugin Usage**
(`plugins/heuristics/static_ordering/StaticOrderingPlugin.cpp`):
```cpp
#include <hipdnn_data_sdk/utilities/EngineOrdering.hpp>
// ...
hipdnn_data_sdk::utilities::sortEngineIds(d->sortedEngineIds);
```


---

## Testing & Validation

### Test Files Added

**TestHeuristicPolicyFramework.cpp** (18 tests):
- Multi-policy orchestration and fallback behavior
- Policy slot ordering (priority-based)
- EngineHeuristicDescriptor lifecycle

**TestHeuristicPolicyPlugins.cpp** (14 tests):
- Config and StaticOrdering plugin loading
- Plugin discovery from `lib/hipdnn_plugins/heuristics/`
- Policy behavior validation

**TestStaticOrderingPolicy.cpp** (13 tests):
- Engine ordering permutations (MIOPEN/DETERMINISTIC/OTHER)
- Stable sort validation
- Edge cases (empty lists, single engine)

---

## Files Modified

### Backend Framework
```
backend/include/
  └── hipdnn_backend.h                      (+2 query APIs)

backend/src/
  ├── HipdnnBackend.cpp                     (+96: query API impl)
  ├── descriptors/
  │   ├── EngineHeuristicDescriptor.cpp    (+347: policy orchestration)
  │   └── EngineHeuristicDescriptor.hpp    (+27: policy slots)
  ├── heuristics/
  │   ├── SelectionHeuristic.cpp           (+121: policy wrapper)
  │   └── SelectionHeuristic.hpp           (+137: wrapper interface)
  └── utilities/EngineOrdering.cpp         (-36: delegate to data_sdk)
```

### Data SDK
```
data_sdk/include/hipdnn_data_sdk/
  └── utilities/
      └── EngineOrdering.hpp                (+66: header-only sortEngineIds)
```

### Plugins
```
plugins/
  ├── CMakeLists.txt                        (new: plugin build root)
  └── heuristics/
      ├── CMakeLists.txt                    (new: heuristics plugin build)
      ├── config/
      │   ├── ConfigPlugin.cpp              (+379: no-op policy)
      │   ├── README.md                     (plugin documentation)
      │   └── CMakeLists.txt                (build config)
      └── static_ordering/
          ├── StaticOrderingPlugin.cpp      (+410: legacy sorting)
          ├── README.md                     (plugin documentation)
          └── CMakeLists.txt                (build config)
```

### Tests
```
backend/tests/
  ├── TestHeuristicPolicyFramework.cpp      (+316: 18 tests)
  ├── TestHeuristicPolicyPlugins.cpp        (+368: 14 tests)
  └── TestStaticOrderingPolicy.cpp          (+264: 13 tests)
```

---

## Notable Design Decisions

### 1. Policy Orchestration in Descriptor finalize()

**Decision**: Implement policy loop in
`EngineHeuristicDescriptor::finalize()` rather than a separate
orchestrator class.

**Rationale**:
- Descriptor already owns policy slots and ordering
- Keeps orchestration logic with state
- Simpler API (no additional orchestrator object)
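
A rough sketch of such a policy loop follows. It is not the code in `EngineHeuristicDescriptor.cpp`. `PolicySlot` and `runPolicies` are hypothetical names, and two behaviors are assumptions inferred from this description: slots with a lower priority value run first (Config at 100, StaticOrdering at 1000 as the last fallback), and the first policy whose `finalize` returns true decides the ordering.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical policy slot: a name, a priority, and a finalize callback that
// may reorder the engine list and reports whether it produced a result.
struct PolicySlot {
    std::string name;
    int priority; // assumption: lower values are tried first
    std::function<bool(std::vector<int64_t>&)> finalize;
};

// Tries each policy in priority order. The first one that succeeds decides the
// ordering; returns its name, or an empty string if every policy declined.
inline std::string runPolicies(std::vector<PolicySlot> slots,
                               std::vector<int64_t>& engineIds) {
    std::stable_sort(slots.begin(), slots.end(),
                     [](const PolicySlot& a, const PolicySlot& b) {
                         return a.priority < b.priority;
                     });
    for (const PolicySlot& slot : slots) {
        // Work on a copy so a declining policy cannot leave partial changes behind.
        std::vector<int64_t> candidate = engineIds;
        if (slot.finalize(candidate)) {
            engineIds = std::move(candidate);
            return slot.name;
        }
    }
    return std::string();
}
```

Under these assumptions, a no-op policy that returns false simply hands control to the next slot, and a policy that always returns true acts as the final fallback.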

---

### 2. Empty Config Plugin Ships in Production

**Decision**: Ship the Config plugin with a no-op implementation
(`finalize()` returns false).

**Rationale**:
- Validates plugin loading infrastructure
- Reserves priority slot (100) in default ordering
- A future PR can add environment-variable logic without framework changes

---

### 3. EngineOrdering.hpp Restoration

**Decision**: Restore `sortEngineIds()` as header-only in data_sdk.

**Context**: A revert commit (6a24652) removed it from Part 1,
which broke compilation of the StaticOrdering plugin.

**Rationale**:
- Plugins need sorting but cannot link against backend
- Header-only = zero runtime dependencies
- Backend delegates to avoid duplication

---

## Follow-Up Work

### Part 3: Frontend Integration (Next PR)

**Scope**: Add frontend C++ APIs for heuristic usage.

**Expected Features**:
- `Graph::getEngineHeuristic()` for creating heuristic descriptors
- Policy registration from user code
- High-level engine selection workflows

---

### Key Files to Review

1. `backend/src/descriptors/EngineHeuristicDescriptor.cpp` - Policy
orchestration loop
2. `backend/src/heuristics/SelectionHeuristic.cpp` - Policy
wrapper/facade
3. `plugins/heuristics/static_ordering/StaticOrderingPlugin.cpp` -
Legacy behavior port
4. `backend/src/HipdnnBackend.cpp` - Query API implementation

---

- **Resolves**: RFC 0007 Part 2/3
- **Depends On**: #6467 (Plugin SDK Infrastructure)
- **Enables**: Part 3 frontend integration, custom heuristic plugins

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Jason Campbell <jascampb@amd.com>
Co-authored-by: Claude Filter Strategist <claude@anthropic.com>