[RFC] Integrated Boot Configuration System (build, provisioning, runtime)

_TL;DR: Please see #76903, which exemplifies most of the concepts laid out in this RFC in a PR that is much easier to review than this detailed RFC. This continues the discussion started in #68127 in more detail._

# Introduction



Currently, we have no satisfactorily integrated solution to configure in-memory software component instances (as opposed to software features) at build and provisioning time.

Existing approaches like [Zephyr's current flavor of Devicetree (DT), Kconfig](https://docs.zephyrproject.org/latest/build/dts/dt-vs-kconfig.html) or the [settings subsystem](https://docs.zephyrproject.org/latest/services/settings/index.html) provide partial solutions but, as generic software component instance configuration systems, they fall short in overall structure, flexibility, scope or scalability.

Examples:
* Network Stack Configuration (#68127) including IEEE 802.15.4 subsystem configuration (#63484).
* Possibly: Configuration of initialization priorities (#73836)
* Possibly: Provisioning a PSA secure storage target (#75275)
* Arch WG: "Sensing needs something similar."
* Building or provisioning directly to the settings subsystem (e.g. for BLE and BLE Mesh configuration).
* Support of alternative structurally compatible source formats like protobuf, Thrift, a configuration database or YAML/JSON over a network.

Apart from these pending functional requirements, we have created a rather artificial and complicated (from a user perspective) distinction of the relative domains of applicability of Kconfig and DT due to an insufficiently precise ontological hardware/software divide and structural deficiencies of Kconfig.

The Kconfig/DT distinction SHALL be made more precise, practically useful and enforceable and the resulting software component instance configuration SHALL be represented in a more maintainable and scalable unified format with currently required Zephyr-specific Kconfig/DT "quirks" removed.

## Problem description


Note: See motivating use cases for the following requirements in "Exemplary Use Cases" below.

A. This RFC addresses the following specific problems:
1. The proposed solution SHALL support easy and intuitive (i.e. user-centric) configuration of multi-instance software components.
1. Single and multi-instance software component configuration SHALL be united based on criteria of usability and maintainability rather than structural or "philosophical" constraints of DT vs. Kconfig.
1. The hard-to-enforce, imprecise and confusing ontological hardware/software criterion SHALL be replaced by an easy-to-enforce, user-centric software feature (Kconfig) vs. software component instance (CT) distinction. This SHALL improve the overall configuration design on objective engineering measures of encapsulation, maintainability and data model normalization, ultimately leading to improved usability and developer experience.
1. A unified and normalized abstract conceptual configuration data model SHALL be defined and decoupled from its sources and targets of serialization. Users' build and provisioning time requirements as well as requirements for different representations like YAML, protobuf, Thrift, the settings subsystem, secure keystores, etc. SHALL be based on the same self-documenting, intuitive and maintainable abstract data model w/o unnecessary incompatibilities.
1. Configuration artifacts (partial serialization snippets) SHALL be grouped based on precise and easy-to-enforce user-centric criteria of modularization like the information hiding principle, the principle of least surprise, deployment context (e.g. security requirements), rate-of-change, deployment time (e.g. build time vs. provisioning time), etc. eventually leading to a more maintainable and more intuitive configuration system.
1. Eventually all Zephyr-specific "quirks", extensions and exceptions SHALL be removed from both Kconfig and DT: Kconfig SHALL be re-focussed on its original purpose of feature selection rather than in-memory software component instance configuration. DT SHALL be re-focussed on its original purpose of inter-OS portability (namely compatibility with Linux). Both not so much for technical or ontological reasons but to recover their full usefulness in letting users coming from Linux transfer existing knowledge w/o compromises.

Of course such a transition requires time and effort. Therefore this RFC proposes a gradual migration path from the current state to the target state, maintaining long-term backwards compatibility at every step forward w/o introducing further inconsistencies. No one SHALL be distracted from their actual goals or invest extra effort in the migration of configuration if not on their own demand to satisfy their own needs and requirements.

## Proposed Change



This RFC proposes an abstract conceptual data model, serialized - in a first step - to a backwards compatible, semantic extension of the DT format. This DT superset is called "configtree" (CT) in the following.

The solution will be exemplified for network interface settings but will be extensible to all subsystems and applications as laid out in the problem description.

The proposed architecture allows for later addition of alternative source or target serializations, e.g. settings subsystem key/value pairs, property, protobuf IDL or Thrift files, integration with externally managed databases or secure key stores, JSON or YAML files provided locally or retrieved from a network location.

_Note:_ See [System Device Tree's simplified YAML serialization of DT](https://github.com/devicetree-org/lopper/blob/master/specification/source/chapter6-simplified-yaml.rst) (and CT) as one option to represent CT as YAML.

CT is proposed as a first serialization format for pragmatic reasons of usability, simplicity, initial effort and long-term maintainability. It will be shown that it is entirely capable of representing the proposed abstract data model in an - as we find - rather intuitive way. It satisfies all technical, logical and business requirements of a serialization source and intermediate unified format within the proposed overall configuration approach.

The proposed migration path consists of the following steps (not necessarily in this order):
* Introducing, documenting and exemplifying CT for the network subsystem. This includes migrating single- and multi-instance network subsystem specific Kconfig software component configuration parameters (`NET_CONFIG_*`) to CT and deprecating them in Kconfig. This exemplifies an improved "feature selection" vs. "configuration" Kconfig/CT divide and properly encapsulates CT network defaults and bindings in the file system tree as near as possible to actual usage sites. For backwards compatibility, deprecated Kconfig parameters will be regarded as just another source serialization format and Linux DTSpec compliant DT will be recoverable from CT for network devices w/o any Zephyr specific exceptions or extensions.
* Extending the CT approach to drivers and other subsystems as needed including deprecation of Kconfig software instance configuration parameters.
* Adding support for the settings subsystem as a (partial) configuration target. This requires additional tooling to _split_ the unified intermediate configuration space among targets and translate hierarchical properties to a flat settings list.
* Adding support for secure configuration storage sources and targets. This requires additional tooling to _merge_ different configuration sources into the unified intermediate configuration space.
* Adding additional source serialization formats or transports on an as-needed basis.
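
The translation of hierarchical properties to a flat settings list mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the proposed tooling: the nested dictionary stands in for a parsed CT fragment, the node and property names (`iface0`, `ipv6`, `hop-limit`, `mtu`) are hypothetical, and `/` is assumed as separator because the settings subsystem uses it for hierarchical names.

```python
def flatten(node, prefix=""):
    """Translate a nested property tree into flat key/value pairs."""
    flat = {}
    for name, value in node.items():
        # Build the hierarchical key, e.g. "iface0/ipv6/hop-limit".
        key = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            # Subnode: recurse and merge its flattened properties.
            flat.update(flatten(value, key))
        else:
            # Leaf property: store it under its full path.
            flat[key] = value
    return flat


# Hypothetical CT fragment, already parsed into nested dictionaries.
ct_fragment = {
    "iface0": {
        "mtu": 1280,
        "ipv6": {"hop-limit": 64, "address": "2001:db8::1"},
    }
}

settings = flatten(ct_fragment)
```

A real splitter would additionally need to decide which subtrees go to which target (build-time macros vs. settings storage); that selection is left out here.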

_Note:_ Splitting and merging CT could be achieved with the [Lopper tool](https://github.com/devicetree-org/lopper) from the [System Device Tree](https://github.com/devicetree-org/lopper/tree/master/specification/source) project. It allows manipulating DT (and CT) files based on a syntax [similar to XPath](https://github.com/devicetree-org/lopper/blob/master/README-architecture.md).

# Detailed RFC


This RFC specifies an improved overall hardware, software feature and software component configuration for Zephyr as existing configuration approaches are lacking:
* Kconfig currently mixes software feature selection (include/exclude subsystems, drivers, feature switches, etc.) with singleton software component instance configuration. Kconfig was not conceived as software component instance configuration space and therefore conceptually lacks the ability to configure collections of structured software object _instances_.
* Devicetree covers configuration for driver instances (peripheral-to-driver mapping, clock frequencies, interrupt lines, driver subsystem configuration, etc.) including Zephyr-specific `zephyr,...` and `<vendor>,...` extensions. It was designed to represent hardware _independently of any specific operating system_. Its current tree structure and usage rules _in Zephyr_ do not represent a normalized graph of distributed configuration object instances and break encapsulation rules.
* The settings subsystem addresses boot-time key/value configuration but it cannot efficiently handle a large build-time graph of structured instances of configuration objects in the context of a low-power/low-resource RTOS as its name space is rather limited and its currently available backends require settings to consume non-volatile memory. OTOH, the settings subsystem goes beyond build-time configuration by allowing for provisioning time or runtime configuration in persistent device storage. At present, however, the settings property structure cannot be integrated with the configuration property structure, which requires redundant code and makes it harder to maintain a consistent and self-validating configuration data model.

A few solutions for specific application/subsystem configuration problems exist:
* The standard DT `/chosen` node (DTSpec v0.4, section 3.6) allows referring to other DT nodes to configure global switches related to/referring to hardware/driver configuration. In Zephyr these are mostly used to configure samples, basic OS features or choose hardware for specific use cases (e.g. the console target or the settings partition). This approach only allows setting `<phandle>`s or aliases and therefore does not scale.
* The custom DT [`/zephyr,user`](https://docs.zephyrproject.org/latest/build/dts/zephyr-user-node.html) node allows application developers to define simple key/value pairs. It is conceived as an ad-hoc configuration mechanism, though, that does not scale to the required structures.
* There are still a few Kconfig (e.g. `CONFIG_SOMETHING_0/1/2/...`) "hacks" that work around Kconfig's lack of object instance support. This approach does not scale and it can only be applied to fixed multiplicities.

None of the existing approaches scales to the levels required in Zephyr today. In the absence of a proper configuration system they tend to be (ab)used for properties that should better be represented in a well-defined application/subsystem configuration framework. This RFC lays out the requirements of such a system and proposes a specific implementation and migration approach.


## Exemplary Use Cases

The following use cases illustrate and motivate detailed requirements.

**Note**: These use cases don't necessarily cover all features of the proposed configuration approach. If some requirement is neither self-evident nor covered by a corresponding use case, please comment and let me know.

### Scalable, Resource-Optimized Build and Provisioning Time Boot Configuration

As an embedded application developer I want to configure immutable boot defaults across all enabled subsystems consistently at build-time w/o incurring avoidable resource usage (e.g. CPU cycles, RAM or ROM). I want only such boot configuration to consume non-volatile memory that needs to be injected at provisioning time and/or changed at runtime. I also want to scale effortlessly from a single instance to a multi instance software component configuration or promote build-time to a provisioning-time configuration or vice-versa w/o having to migrate properties between independent configuration approaches (e.g. from Kconfig to DT to the Settings Subsystem and back).

### Extensible and Re-Usable Configuration of Samples

As a maintainer or contributor I want to create driver- or subsystem-specific samples that can as effortlessly as possible be _combined and extended_ by embedded application developers into fully-functional customized solutions. The sample build boot configuration should therefore use the same format and tools required for single instance and multi-instance build-time or provisioning-time software component boot configuration as a scaled custom application.

### Build Time Injection of Boot Configuration

As a large-scale application developer I want to be able to define large amounts of build time configuration variants externally to Zephyr. I want to use my own custom configuration format (e.g. Thrift or protobuf), possibly editable and sourced dynamically from a database or network location independently from Zephyr and application code repositories.

### Provisioning Time (e.g. End-of-Line) Boot Configuration

As a production engineer I want to be able to provision device specific settings as fast as possible to target devices w/o having to re-compile the device's firmware, e.g. as a separate settings image via JTAG or an SPI flash tool to an EEPROM, flash partition or dedicated flash storage. To develop or debug end-of-line configuration, as a firmware application developer, I want to be able to simulate end-of-line configurations at build time w/o having to use complex production-specific tooling or migrate configuration properties between separate configuration approaches.

### Runtime Boot Configuration

As an end user of a device, I want to be able to change provisioned boot-time defaults of my device persistently at runtime (e.g. to configure custom network details if Zephyr is powering a typical home router device). As an application firmware developer, I do not want to incur extra effort to provide provisioning and runtime configuration through separate configuration approaches.

### Declare initialization and reverse dependencies between software component instances

As a maintainer or contributor, I want to declare default initialization dependencies and sequences of related software component instances ("services"). As a firmware application developer, I want to be able to override default initialization dependencies and sequences. As a maintainer, contributor or firmware application developer, I want to specify and configure arbitrary lifetime hooks in addition to the default initialization callback. These hooks should respect the inversion-of-control principle w/o the software component instance having to "know" (i.e. depend on) the caller.

### Supply security material from secure sources to secure targets

As a production engineer I want to be able to inject confidential security material directly from a secure key vault to a secure embedded storage at the end of my production line.


## Detailed Requirements

This section describes detailed requirements in addition to the main functional requirements A.1 through A.6.

B. Scope:
1. The configuration design SHALL enable injection of persisted boot-time configuration at build time, provisioning time and runtime.
1. The configuration subsystem SHALL support and be useful to all Zephyr hardware, subsystems, samples, libraries, modules and applications including out-of-tree software components and applications.

C. Source and Target Serializations:
1. Configuration SHALL be (partially) serializable to and from every source or target representation capable of holding sufficiently typed hierarchically organized key/value pairs. This includes - but is not limited to - the following specific formats: CT/CT bindings as specified in this RFC, YAML/YAML Schema, Thrift/Thrift Schema, protobuf/protobuf IDL, the settings subsystem, JSON/JSON Schema including remote and local sources and targets.
1. All source and target representations SHALL be completely decoupled by a merged, fully normalized, human-readable canonical intermediate serialization artifact. This RFC proposes CT encoding as intermediate representation for the time being. Should it turn out at a later point in time that this format is lacking, CT SHALL be downgraded to a source or target serialization and the infrastructure SHALL be switched to whatever improved intermediate representation is considered more suitable at that time.
1. The default source serialization SHALL enable a hierarchy of configuration layers that MAY override each other in the same way as is currently possible in the Kconfig and DT implementations.
1. Any source serialization SHALL allow for easy-to-read internal or external pointers (references) between any two represented entities including - but not limited to - all driver, subsystem and application software component instances, etc.
1. Any source serialization SHALL be accompanied by schema definition files that determine at least a fully C-typed target representation including C primitive types, types from `<stdint.h>`, structs and pointers. Additional target type systems as used in Rust or C++ SHOULD additionally be supported, at least in principle.
1. Zephyr's default target serialization SHALL be representable and usable in C code without additional runtime resource usage (notably CPU cycles, RAM, NVM), e.g. as macros.
1. Users SHOULD eventually be able to develop and contribute their own standard-based or custom source or target representations while being able to re-use the intermediate infrastructure.
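
The layering required by C.3 (later layers override earlier ones, as with Kconfig fragments and DT overlays today) could be sketched as a deep merge over the intermediate representation. This is a minimal illustration, not the proposed implementation: the layers are modeled as nested dictionaries and all node and property names are hypothetical.

```python
def merge_layers(*layers):
    """Deep-merge configuration layers; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are nodes: merge their properties recursively.
                merged[key] = merge_layers(merged[key], value)
            elif isinstance(value, dict):
                # New node: copy it so that the input layers stay untouched.
                merged[key] = merge_layers(value)
            else:
                # Plain property: the later layer wins.
                merged[key] = value
    return merged


# Hypothetical layers, from most generic to most specific.
subsystem_defaults = {"iface0": {"mtu": 1280, "ipv6": {"hop-limit": 64}}}
board_overlay = {"iface0": {"ipv6": {"hop-limit": 32}}}
app_overlay = {"iface0": {"mtu": 1500}, "iface1": {"mtu": 127}}

config = merge_layers(subsystem_defaults, board_overlay, app_overlay)
```

Because the merge operates on the intermediate representation only, the same function would apply regardless of whether a layer originated from CT, YAML or a deprecated Kconfig parameter.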

D. Maintainability:
1. Zephyr's default source serialization SHALL and non-default serializations SHOULD be divisible in arbitrary configuration snippets to be merged at build time. Notably the overall data model represented by the configuration SHALL be fully decoupled from its division into arbitrary deployment artifacts (files and folders).
1. Zephyr's default source serialization SHALL be self-validating and self-consistent. By current modeling standards, this can best be achieved by normalizing the conceptual data model while keeping the physical (i.e. serialized) data model as close as possible to the conceptual data model.
1. If several proposed configuration approaches fulfil all functional requirements, then we SHALL prefer the one that re-uses most of the existing in-tree infrastructure and out-of-tree community investment and requires the least initial and long-term maintenance effort (total cost of ownership).

E. Documentation:
1. Zephyr's default source and target serializations as well as related tools and processes SHALL NOT require considerable new syntactical or conceptual learning effort by users in addition to the current configuration approach.
1. Zephyr's default source and target serializations, as well as related tools and processes SHALL be well documented.
1. Schema definition files SHALL allow for machine-readable inline documentation of all entities and properties. Any complete schema definition SHALL thereby provide a full specification and documentation of the underlying abstract conceptual data model.
1. Schema description and model metadata SHALL be accessible by Zephyr's automated documentation system and be included in the documentation build.

F. Machine-readable metadata describing configuration data and schemas:
1. It SHALL be possible to attach machine-readable documentary or technical model-metadata to both configuration files and schema files, e.g. to consistently document hardware-specific driver capabilities.
1. Metadata schemas SHALL be enforceable by the same means as the model data itself.

G. CT-specific requirements:
1. A fully compliant DT SHALL be recoverable at any moment from the CT w/o manual interaction.

Note: CT as default source and intermediate serialization format together with existing DT macros, tooling and corresponding documentation satisfy almost all of these requirements out-of-the-box with minimal initial implementation effort.

Note: Initialization dependency properties MAY be modeled as just another kind of composable binding schema that MAY be applied to certain CT nodes according to CT normalization rules. Nodes representing initializable software component instances declare initialization dependencies to other initializable software component instances via hierarchy or `<phandle>`. All default initializations may be accumulated in a single file or distributed over subsystems according to CT encapsulation rules.

## Proposed change (Detailed)

### Configtree (CT) Specification

CT is a natural semantic superset of DTSpec (and the upcoming System DT). CT SHALL use the same syntax as DTSpec without hardware specific properties in non-device/non-hardware nodes. Allowed standard properties in non-device/non-hardware nodes are "status" and "compatible". CT MAY introduce additional `<prop-encoded-array>` if required. Currently no such requirement is known, though.

CT SHALL be backed by a well-defined Zephyr-specific abstract conceptual configuration data model (the "Zephyr configuration space") that includes existing DT entities and attributes as well as CT-specific extensions. The abstract data model SHOULD be documented in the Zephyr user documentation using adequate textual graphing techniques (e.g. based on mermaid) for easy review. Alternatively the model MAY be generated automatically from improved binding sources that not only specify properties but also relations.

CT introduces additional nodes and properties into the device tree (called "configtree" for CT) that structurally relate 1-to-n or n-to-m to existing driver or hardware nodes. Software component instance related configuration properties SHALL be introduced into existing DT nodes if they structurally relate 1-to-1 (bijectively) to existing nodes.

Nodes that structurally relate 1-to-n to existing nodes SHALL be n-side subnodes of the 1-side node. A collection of 1-to-1 related properties inside the same node MAY be grouped in their own subnode for improved encapsulation (e.g. for separate subsystems or larger semantically related properties), similarly to the structures currently generated in Kconfig. It SHALL at all times be clearly specified, though, how nodes map to the abstract unified configuration space in order to prove normalization of the CT model representation.

Nodes that structurally relate n-to-m to device/peripheral-related nodes require an additional top-level sub-space to be introduced. The structure of CT-specific top-level subspaces SHALL follow the file structure of the drivers or subsystems that require the additional node. DT or CT SHALL NOT introduce additional top-level nodes based on other custom encapsulation criteria. Existing non-standard top-level nodes other than those explicitly defined in DT or CT SHALL be regarded as "modeling bugs" and corresponding issues SHALL be opened to document and fix them.

References between n-to-m related nodes and nodes inside DT or CT-specific DT extensions SHALL be made explicitly using a DTSpec `<phandle>`. References using alternative custom primary keys (e.g. driver or interface names) or logic ("the first matching interface") SHALL NOT be used. 1-to-1 references SHALL NOT be allowed as they breach CT normalization requirements: 1-to-1 related properties belong inside the node they relate to.

CT SHALL use the same Zephyr-specific .yaml binding files and macro targets as DT. CT-specific macro targets MAY be added. They are prefixed with "CT_" unless they can also be applied to DT.

CT introduces strict encapsulation rules. Files representing CT (including DT) and corresponding binding files SHALL be modularized into files according to the following rules:
* Subsystem or driver specific default properties and binding files that are _exclusively_ used in Zephyr (namely unavailable in Linux) SHALL be placed inside Zephyr's directory tree based on the "least visibility" encapsulation rule, i.e. as deep as possible in the directory tree and as close as possible to in-tree usage sites.
* Alternatively we MAY consider having non-Linux default properties and bindings side-by-side with corresponding public header files of drivers or subsystems that use them if we consider them to be part of the public user API.
* Vendor, architecture, hardware or application specific configuration default property and binding files SHALL be placed as close as possible to their respective usage sites, too, e.g. inside vendor-, architecture-, hardware- or application-specific directories.
* Default properties that are semantically defined and used by Linux and their corresponding binding files SHALL be placed in a shared top-level folder structure separate from their usage sites and from all other default property and binding files. This enables us to automatically recover a fully Linux-compatible DT from CT at any time.
* File names SHALL be chosen to place default property and binding files as close to related source files inside a folder when ordered alphanumerically.
* No files SHALL be placed based on imprecisely or subjectively defined and hard-to-enforce ontologies like "hardware" vs. "software" properties.
* Existing deviations from these encapsulation rules SHALL be documented as issues when found and fixed accordingly over time.

### The Zephyr Configuration Space

The following diagram proposes an initial abstract conceptual data model of the Zephyr Configuration Space. See #76903 which demos the model in CT serialization.

![zephyr-config](https://github.com/user-attachments/assets/14f42b35-2ab4-4877-a194-688b203b7591)

Also see https://drive.google.com/file/d/1sQuen1Y0bAIS5PX_kKRmSTNA_g4gd-gT/view?usp=sharing (requires the Google Drive draw.io plugin) for a possibly updated version.

This model SHALL be updated based on rules of normalization whenever additional entities need to be added. Property documentation MAY be added to illustrate normalization. Any serialization SHALL be validated against this conceptual data model and SHALL be rejected if not matched. Binding files SHALL document the (collection of) entities to which they can be applied. These binding file restrictions SHALL be verified during build.

### Additional/Improved Binding File Semantics

**Composition over Inheritance**

Currently binding files only allow for inheritance of types. Composition of types (mix-ins) cannot be defined. This makes it unnecessarily hard (and sometimes impossible) to properly design a well encapsulated design hierarchy.

The following example shows current binding file design practice in Zephyr:

Example:
```yaml
compatible: "adi,ad559x-adc"

include: adc-controller.yaml
```

This binds a specific device to a driver-specific software programming model _in practice_. We justified this in the past by asserting that `adc-controller.yaml` would be exclusively determined by an objectively correct hardware-only ontology of an abstract ADC hardware model sufficient to all imaginable driver implementations.

This promise was of course rarely kept in practice. Driver internals regularly leak into supposedly hardware only "base types" which breaks encapsulation. This is not surprising as DT properties are largely determined by their usage inside Zephyr, not by an independent commonly accepted shared industry standard outside Zephyr except for properties introduced by Linux for which it may be argued that they represent a de-facto standard.

In practice our inheritance tree forces a client programming model onto the peripheral which is what DT was originally conceived for but may not always be compatible with Zephyr's claim to be "vendor agnostic" and "customizable". OTOH Zephyr has no requirement to be OS agnostic. So adding Zephyr-specific additions to CT is not a problem as they can be easily ignored by custom or vendor-specific driver or subsystem implementations. But if doing so, they need to be composable so as not to pollute the inheritance hierarchy and they need to be properly encapsulated.

From a data model perspective, drivers are related (=chosen) to a combined hardware instance + application key, i.e. the correct combined abstract normalized "key" to such a compositional configuration class would be the `(app-id, peripheral-id)` tuple. This means that in the above example there should be some `zephyr,adc-controller` compatible mixed into the peripheral's node as default driver programming model which could be overridden on application level. The `zephyr,adc-controller` compatible then matches a `zephyr,adc-controller.yaml` file which is placed near the corresponding adc.h or adc driver folder where all the drivers reside that follow its programming model. The application would provide a partial DTS fragment that overrides or extends the driver's node "compatible" with its own custom driver client programming model if required, possibly in a directory hierarchy that again closely couples with the driver hierarchy. By introducing a proper naming convention, such rules could be verified automatically during build with resolution-based error messages.

The composition of bindings for _typing_ should not be confused with the actual driver selection at build time. Driver selection follows the logic described in DTSpec: From left to right, drivers matching one of the compatible strings are being located in the build (as configured by Kconfig and cmake). If none or more than one matches _per peripheral_, a warning or error message is generated and the build possibly stops. This means that the `app-id` part of the conceptual driver implementation key above will be provided by app-specific Kconfig feature-inclusion mechanisms while the `peripheral-id` part will be specified by the first matching compatible of the corresponding CT node.
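
The selection logic described in the previous paragraph could be sketched as follows. This is only an illustration under stated assumptions: `enabled_drivers` is a hypothetical mapping from compatible strings to the drivers that Kconfig and cmake placed in the build, the driver names are invented, and ambiguity is assumed to be checked per compatible string.

```python
def select_driver(compatibles, enabled_drivers):
    """Return (compatible, driver) for the first compatible with a driver in the build."""
    for compatible in compatibles:  # left to right, as listed in the node
        candidates = enabled_drivers.get(compatible, [])
        if len(candidates) > 1:
            # More than one driver in the build claims this compatible.
            raise LookupError(f"ambiguous drivers for '{compatible}': {candidates}")
        if candidates:
            return compatible, candidates[0]
    # No driver in the build matches any compatible of this peripheral.
    raise LookupError(f"no enabled driver matches any of {compatibles}")


# Hypothetical build in which only a generic ADC driver is enabled.
node_compatibles = ["adi,ad559x-adc", "zephyr,adc-controller"]
build = {"zephyr,adc-controller": ["generic_adc"]}

selected = select_driver(node_compatibles, build)
```

Whether the error cases should stop the build or only emit a warning is left open, as in the text above.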

This rationale results in the following additional requirements for the Zephyr binding system:

* The Zephyr bindings system SHALL be able to match more than one binding file per node, up to one per string in the compatible string list. This allows us to apply the "composition over inheritance" heuristic to the CT type system.
* Nodes SHALL be validated against _all_ matching binding files in the build and the corresponding target serialization(s) be generated. This allows us to compose arbitrary "hardware programming models" (used to match drivers based on hardware as defined in DTSpec) with Zephyr or application specific "client programming models" (used to match alternative driver implementations for the same hardware). This was not required in DTSpec which is explicitly conceived as being "client agnostic" but makes sense for Zephyr of course.
* Custom drivers MAY extend the Zephyr driver programming model by inheriting from the corresponding Zephyr default driver subsystem binding files. This SHOULD however respect rules of encapsulation as defined elsewhere and therefore the existing hard-coded inheritance hierarchy SHALL be migrated to a compositional model over time on an as-needed basis.
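
Validation against all matching binding files, as required above, could look roughly like the following sketch. It is not based on the existing bindings tooling: bindings are modeled as plain dictionaries instead of parsed `.yaml` files, the `zephyr,adc-controller` binding is the hypothetical mix-in from the example above, and rejecting properties that no composed binding declares is an assumption.

```python
def validate_node(node, bindings):
    """Check a node against every binding named in its compatible list."""
    errors = []
    known = {"compatible", "status"}  # standard properties, always allowed
    for compatible in node["compatible"]:
        binding = bindings.get(compatible)
        if binding is None:
            continue  # no binding file for this string in the build
        for prop, spec in binding.items():
            known.add(prop)
            if prop not in node:
                if spec.get("required"):
                    errors.append(f"{compatible}: missing required property '{prop}'")
            elif not isinstance(node[prop], spec["type"]):
                errors.append(f"{compatible}: property '{prop}' has wrong type")
    # Properties declared by none of the composed bindings are rejected.
    errors += [f"unknown property '{p}'" for p in sorted(node) if p not in known]
    return errors


# Hypothetical bindings: a hardware model and a client programming model.
BINDINGS = {
    "adi,ad559x-adc": {"reg": {"type": int, "required": True}},
    "zephyr,adc-controller": {"#io-channel-cells": {"type": int, "required": True}},
}

node = {
    "compatible": ["adi,ad559x-adc", "zephyr,adc-controller"],
    "reg": 0,
    "#io-channel-cells": 1,
}
errors = validate_node(node, BINDINGS)
```

The point of the sketch is that neither binding includes the other: the set of allowed properties is the union of what the individually matched bindings declare.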

**Nomenclature and Directory Structure**

Currently we require binding files to reside in separate top-level directories. This places binding files far from corresponding default DT source files and from usage sites and thereby breaks the above formulated CT encapsulation requirements.

This RFC therefore proposes an alternative naming schema `<[vendor,]programming-model>.binding.ya[m]l`. Files following this nomenclature MAY be placed anywhere in Zephyr's directory tree. They SHALL be placed as closely to their usage sites as possible, see CT encapsulation naming rules. The vendor part is optional when it is clear from context. Namely inside the Zephyr source tree the `zephyr,` prefix SHALL be left out, to ensure that files can be placed near to other similarly named files based on the programming model (i.e. API).
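
A build-time check of this naming schema could be sketched with a regular expression. This is an assumption-laden illustration: the RFC does not define the allowed character set, so lowercase letters, digits and hyphens are assumed for both the vendor and the programming-model part, and `parse_binding_filename` is a hypothetical helper.

```python
import re

# <[vendor,]programming-model>.binding.ya[m]l
BINDING_FILE = re.compile(
    r"^(?:(?P<vendor>[a-z0-9-]+),)?(?P<model>[a-z0-9-]+)\.binding\.ya?ml$"
)


def parse_binding_filename(filename):
    """Return (vendor, programming_model), or None if the name does not match."""
    match = BINDING_FILE.match(filename)
    if match is None:
        return None
    # The vendor group is None when the optional prefix is left out.
    return match.group("vendor"), match.group("model")


in_tree = parse_binding_filename("adc-controller.binding.yaml")
vendor_specific = parse_binding_filename("adi,ad559x-adc.binding.yml")
legacy = parse_binding_filename("adc-controller.yaml")
```

Files that return `None`, such as bindings following the current naming, would simply not be picked up by the new discovery mechanism.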

**Additional Restrictions placed on Tree Structures**

Similarly to JSON Schema and YAML Schema, we SHOULD be able to not only validate properties based on compatible strings but to also restrict node names for certain bindings (e.g. `channel` in ADC or `iface`, `ipv6`, etc. for well-defined network configuration nodes).

We SHOULD be able to restrict subnodes to certain parents: e.g. `iface` SHALL be explicitly whitelisted as an allowable child node by network peripherals, and `channel` as a child node of ADC peripherals, possibly including multiplicity in both cases. Therefore the `iface` and `channel` bindings SHALL be marked as "whitelist-only" and the corresponding parent nodes (e.g. network driver nodes for `iface`) will have to include them explicitly in their "subnode-whitelist".

Similarly it SHOULD be possible to place restrictions on allowable parent nodes, e.g. to let `ipv6` or `ieee802154` declare that they SHALL only be subnodes of `iface`. Here the encapsulation requirements are reversed, so it suffices to add a "parent-whitelist" property to such bindings, possibly including allowed multiplicity ranges.
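
A minimal sketch of how the two whitelist directions could be checked is given below. The keys `whitelist-only`, `subnode-whitelist` and `parent-whitelist` are taken from the text above; representing bindings as plain dictionaries with a `name` key, the function name `check_child` and the omission of multiplicity checks are assumptions made for brevity.

```python
def check_child(parent_binding, child_binding):
    """Return a list of violations for placing `child` under `parent`.

    Bindings are simplified to dictionaries; multiplicity is not checked.
    """
    errors = []
    parent = parent_binding["name"]
    child = child_binding["name"]

    # Parent-side rule: a "whitelist-only" child must be listed explicitly
    # in the parent's "subnode-whitelist".
    if child_binding.get("whitelist-only", False):
        if child not in parent_binding.get("subnode-whitelist", []):
            errors.append(f"'{parent}' does not whitelist subnode '{child}'")

    # Child-side rule: a child with a "parent-whitelist" may only appear
    # below one of the listed parents.
    allowed_parents = child_binding.get("parent-whitelist")
    if allowed_parents is not None and parent not in allowed_parents:
        errors.append(f"'{child}' is not allowed below '{parent}'")

    return errors


# Hypothetical bindings mirroring the examples in the text.
radio = {"name": "radio", "subnode-whitelist": ["iface"]}
iface = {"name": "iface", "whitelist-only": True}
ipv6 = {"name": "ipv6", "parent-whitelist": ["iface"]}
```

Here `check_child(radio, iface)` and `check_child(iface, ipv6)` report no violations, while placing `ipv6` directly below `radio` is rejected by the child-side rule.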


### Configtree vs. Devicetree vs. Kconfig

**Single instance vs. multi instance software component configuration**

All software component instance configuration properties SHALL be deprecated in Kconfig and migrated to CT under the above rule sets.

Kconfig SHALL be exclusively responsible to select features, while all software component instance configuration SHALL be reserved to CT (including DT).

To make this more precise, the following rules SHALL apply:
* Features (Kconfig) are represented as code while instance configuration will at some time be represented in memory (CT):
  * Use Kconfig to include or exclude code or enable or disable coded logic in the build.
  * Use CT to configure runtime software parameters, be they singletons (e.g. subsystem-level parameters or singleton drivers) or multiple instances (e.g. multiple instances of the same driver, multiple protocol instances, etc.).
* Contributors can roughly use the following heuristic: _Will the switch that I'm requiring mainly configure the content of flashed .text sections (Kconfig) or the content of boot-time stack/heap memory structures as initially represented by .data/.bss sections (CT)?_

Kconfig SHOULD thereby be re-focused on its original intent to describe, compose and configure software features (in terms of included source code or logic) and software feature dependencies. This is not so much required as an end in itself (Kconfig "conformance") but has the following practical advantages:
* Users coming from Linux will recognize more familiar Kconfig patterns.
* Kconfig configuration will be more concise and focused.
* Newly introduced configuration variables will no longer fall into the "initially single instance" trap that has often hindered maintainability and the evolution of drivers from single to multi-instance, as it led to considerable redundant development effort (see the "old" and "new" USB driver approach or L1/2 and L3+ network subsystem configuration). All software components should be conceived as intrinsically instantiable in the future.
* Most importantly: The artificial and arbitrary distinction between single and multi instance software component configuration due to purely technical restrictions will be replaced with the more intuitive and precise feature vs. memory distinction for improved usability and a more level learning curve.

Configuring runtime software components, be it "in-memory" or as global runtime parameters, SHOULD be migrated to CT, especially such parameters that strictly belong to one of the CT abstract modeling concepts by normalization rules.

Backwards compatibility to deprecated Kconfig MAY be maintained as long as required as laid out in the requirements section.

**Hardware vs. Software Configuration**

The hardware vs. software distinction SHALL be dropped in favor of the following, more precise and easier-to-enforce rules:
* Properties defined and used by Linux DT SHALL be part of the DT tree as specified above for CT.
* Software component instance configuration properties defined and used exclusively by Zephyr, vendors or users SHALL be placed in CT and distributed across the file system based on normalization and encapsulation rules as specified above for CT.
* Software features SHALL be selected via Kconfig switches.

## Dependencies

Direct dependencies exist on Kconfig, DT and the settings subsystem. Indirect dependencies exist on all configurable drivers or subsystems.

## Concerns and Unresolved Questions

This section answers questions and evaluates concerns brought forward while discussing the aptitude of DT as a configuration source.

Concerns are responded to based on Zephyr-specific requirements and pragmatic engineering approaches, namely the concepts of data model normalization (similarly to 3NF for relational data models) and encapsulation/modularization.

Work-in-progress - please comment, I'll collect all concerns and questions here.

**Is DT syntax capable of addressing all our software configuration requirements?**

Yes. DT is just a tree of nodes with key/property values and references (phandles) that can easily be mapped via bindings to any primitive C type, `<stdint.h>` type, struct and pointer. Any normalized data model can obviously be mapped to DT. This should be good enough under all reasonable circumstances and is theoretically very well founded. We have a semantic modeling challenge before us, not a syntax or serialization challenge.

Also compare the DTSpec archeology section below.

**People don't like DT or cannot understand DT, DT is awkward.**

As we will not dispose of DT to use YAML everywhere, no matter how bad DT is, everyone who uses Zephyr has to know and work with it anyway. From a usability pov it doesn't matter what serialization we choose as long as we choose a single one, fix the quirks and document it well.

**On a Linux box you have to deal with many different config files, too.**

"Because Linux does it" is not a requirement or an engineering argument as such. We have no Zephyr-specific requirement that forces us to use many distinct config formats. There are good usability arguments that favor an integrated approach. Note that this RFC favors distribution of configuration over many files (see the encapsulation/modularization argument), just not many distinct semantics and syntax variants.

**We should probably start solving domain-specific problems.**

We have an obvious requirement, which we should not ignore, to design something that can be extended to other subsystems, integrated with the settings subsystem, used for provisioning and serialized to other formats like protobuf IDL or Thrift. Above all we have to be able to serialize to any syntax based on some abstract conceptual data model.

**A YAML-based solution is easier to understand and maintain.**

As laid out in the "Alternatives" section, a YAML-based solution is going to be a huge maintenance and documentation nightmare. We have to re-invent every wheel that has been invented for DT: type binding, inline documentation, integration with the doc system, mappings to macros, overlay mechanisms, naming patterns, etc. Just matching and syncing with the existing DT macrobatics will be a huge effort initially and over time. The problem we're facing is not syntax but semantics and the surrounding infrastructure and tooling.

**It is easy to distinguish between HW and SW properties, that's how we should separate configuration.**

It is not. This is a perceptual bias: We tend to confuse our internal models and heuristics with what is out there in the world. The reality is: We fight over each and every addition to DT because some say "it's SW" others say "it's HW". If not even _we_ are able to precisely define the line between SW and HW, how will our users? If we have to explain to our users that what they find intuitive is wrong then _we_ are wrong.

**Devicetree was derived from the [...] Open Firmware project.**

Nope. See DTSpec, section 1.2:
> The text of this document was derived from ePAPR.

But it's not entirely wrong either as [ePAPR](https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference) itself was derived from the Open Firmware spec (aka IEEE 1275-1994).

**DTSpec was designed to describe hardware only.**

It is important to insist on ePAPR rather than Open Firmware as the main DTSpec predecessor because it was only ePAPR that removed user configuration from DT and restricted its applicability to hardware, due to its changed focus on backing the Power ISA boot firmware. DTSpec then re-generalized this again.

IEEE 1275-1994 specified allowable contents of the Device Tree in section 3.2 as:
> The device tree [...] describes [hardware and] _user configuration choices_ [...among other things unrelated to our discussion...]. [...]

Section 3.3.1 adds:
> The list of configuration variables varies from system to system.

IEEE 1275-1994 had an `/options` root node specifically reserved to store such non-volatile user configuration which received a default at _build time_ and could be updated at _provisioning or runtime_ by the end user. So exactly the use case I'm envisioning for DT.

Note: U-Boot uses DT for user configuration, too. They seem to have used the IEEE 1275-1994 `/options` node first but now introduced a custom `/config` node. Of course they _are_ a bootloader, so they need less user config than an application development platform like Zephyr.

Saying that DTSpec was designed to describe "hardware" is therefore at least misleading. DTSpec was designed to back OS-independent bootloaders, see DTSpec, section 1.1:
> The Devicetree Specification provides a complete _boot_ program to _client_ program interface definition.

In other words: DT is a simple HAL, but a HAL is of course as much influenced by its client as by the abstracted hardware itself. And Zephyr is _not_ a bootloader, nor is Linux. So the "abuse" (or as I'd say "pragmatic re-interpretation") started when DTSpec was refocused on describing OS-specific device abstractions in order to become vendor- and architecture-independent, which reversed the original intent of DTSpec, i.e. abstracting OS differences away.

This shows that the simplified conventional wisdom "DT is for hardware only" has never been as "pure" as one might have thought and there is no need to protect its "purity" either. Such an argument proves nothing and should be replaced by requirements analysis: Being OS-independent was _their_ requirement but it was never _ours_ which explains why we never truly enforced it (e.g. in the build infrastructure) except for improved knowledge transfer from Linux (see below). _Our_ main requirement always was vendor-agnosticism.

DTSpec is careful to introduce HW specifics in a separate section after laying out a general hierarchical key/value store with generic typing. We can trivially keep all HW specific parts out of nodes that don't need it by extracting `status` and `compatible` into a separate generic `node.yaml` binding file which will replace `base.yaml` for those nodes.

In the end it doesn't even matter that much anyway. Our discussion re software/hardware is mostly academic: Structurally (i.e. by normalization criteria) the large majority of our subsystem config requirements map to existing device tree structures naturally (1-to-1 or 1-to-n). The remaining m-to-n related nodes can be isolated into top-level namespaces as inspired by IEEE 1275-1994 and referred to from inside the actual device-specific tree, see https://github.com/fgrandel/zephyr/blob/rfc/76902-systree-config/samples/net/sockets/echo/app.overlay as an example.

**Wherever it made sense, we've tried to be compliant with Linux devicetree bindings.**

This rule continues to be applied and even fortified by this RFC as laid out in the CT specification section. Not to ensure OS independence of Zephyr's DT (which never was a sensible requirement) but because it helps people who know Linux. They will find it easier to learn Zephyr which again is a real requirement of _ours_. Still we have deviated far enough from Linux (for good reasons) that it can hardly be argued that we're still "compatible" in any sensible way. That's why the above CT specification re-establishes and distinguishes much more precisely between Linux-compatible and Zephyr-specific DT parts.

**The HW/SW split is "cleaner" or at some time in the past was "cleaner" than mixing up hardware and software properties in the same DT nodes.**

Our use of DT has broken basic data modeling practices from day one, namely normalization and encapsulation. Both are precisely defined design rules:
* Normalization defines with mathematical precision that HW and SW properties **SHALL** be kept in the same abstract entity if they both functionally depend on it (i.e. 1-to-1) to ensure model integrity and validity.
* Physical deployment artifacts OTOH **SHALL** be split up along rules of encapsulation and modularization, an argument as precise as a search/find operation over our code base.

Our DTS and bindings are mostly kept far apart from usage sites instead. We have invented Zephyr-specific (but vendor agnostic) "hardware properties" that neither exist in datasheets nor in Linux and put them where the hardware lives based on imprecise ontological assumptions of what is "hardware". This is wrong: By DDD rules and Conway's law we should know that any context-agnostic ontology is doomed to fail. And by the encapsulation argument we should place Zephyr-driver-specific DT snippets near the drivers that use them exclusively while keeping shared concerns at as central a place as required but still as local as possible.

Further de-normalized and de-modularized configuration will inevitably lead to more modeling inconsistencies and less readability/maintainability _in practice_, as the model is not self-validating and consistency cannot be automatically enforced with sensible effort (examples of which abound in our own partially de-normalized DT variant today).

We have to distinguish between the global conceptual data model and its local physical representation instead. YAML doesn't determine a data model. But the model is much more relevant to usability and maintainability than the syntax. This shows how far our discussion has strayed from the real problem so far. Zephyr is an application development platform, as such application architecture concepts are to be applied.

**While DT has been promoted as a great solution to many problems, to me, it has several drawbacks on the way it is implemented in Zephyr.**

This is true. It is due to Zephyr-specific architectural and implementation deficiencies (many of which have been laid out in this RFC) that our use of DT feels awkward. Not due to its syntax. This can be fixed.

**If I had to start Zephyr again, I'd probably stay away from DT.**

Maybe, but that's not an option in practice.

**If we start diverging from [DT], we either define our own spec, or it'll just organically grow into a mess.**

True. This is why CT is specified much more precisely than our current use of DT while acknowledging additional practical requirements that had not been systematically covered by DT so far.

DT and DT bindings have come a long way. Let's focus our resources on making DT more intuitive by fixing a few "quirks" rather than starting from zero, because this will immediately benefit us doubly: on the hardware and on the software modeling side. As soon as the cracks in the YAML approach inevitably appear, everyone will wish that we had not opened another Pandora's box.


# Alternatives


An alternative, separate YAML-based approach has been considered and rejected in this RFC for the following reasons:
* It would further fragment and complicate Zephyr's boot-time configuration system which already is considered rather complex and hard-to-learn by users today.
* Even if we manage to come up with a good definition of what goes where in a combined Kconfig, DT, YAML approach, the community will inevitably misunderstand and misuse it because no one will read, let alone understand, such an artificial definition. This would cause additional review effort for maintainers and collaborators.
* The Kconfig/DT divide regularly causes confusion for newcomers. A YAML based approach would not contribute to making this distinction more intuitive and natural.
* A separate YAML-based approach would have to duplicate many of the existing DT structures, processes and tools which would cause a multiple of the initial development and long-term maintenance effort required for the above proposed CT approach.
* It would be very hard to keep separate YAML-based tooling and macros in sync with DT tooling and macros for users to transfer knowledge between the two configuration subsystems.
* These would be big future problems for a separate approach, whereas CT already solves them in a way that everyone in the community understands well. Building on top of the DT infrastructure and knowledge is a huge advantage and gives us a considerable head start wrt existing tooling.
* The obvious community pressure to add more and more software component instance configuration to DT shows that it is an intuitive target where everyone expects application config, too. We should channel this energy productively rather than risking organic uncontrolled growth of DT based on misleading ontologies and assumptions.

The settings subsystem was considered as an exclusive configuration target but was then conceived as optional part of this more general RFC because it would be lacking as a general configuration subsystem as laid out in the "Detailed RFC" section.

Thrift and protobuf were proposed as exclusive configuration sources but were then conceived as optional part of this more general RFC as convergence could hardly be achieved in the community to a single binary source. Apart from that all arguments listed under the YAML approach apply to these source serializations as well.

Kconfig-based approaches are not adequate due to Kconfig's structural limitations as laid out in the "Detailed RFC" section.
