6 changes: 5 additions & 1 deletion docs/design/datacontracts/contract-descriptor.md
@@ -86,13 +86,17 @@ a JSON integer constant.
"s_pThreadStore": [ 0 ], // indirect from pointer data offset 0
"RuntimeID": "win-x64" // string value
},
"sub-descriptors":
{
"GCDescriptor": [ 1 ]

Member

@jkotas jkotas Jul 28, 2025

I do not think we want to have the sub-descriptors in the json. We do not necessarily know how many of them we are going to have and what their names are going to be at build time. (We happen to know for GC that motivated this change, but it would be nice to allow for optional dynamically loaded components.)

Member Author

@max-charlamb max-charlamb Jul 28, 2025

I thought having a separate section of pointers to sub-descriptors would be the cleanest way to implement them on the parser side. It would allow the parser to read the complete set of data descriptors without outside information.

The name here isn't strictly required, but I left it in to help with debugging and to match the global spec. The parser machinery looks at the listed sub-descriptor pointers and if the values are non-null would recursively read in and merge the sub-descriptor. This would allow us to have sub-descriptor 'slots' that are not always used.

The alternative design I considered was to have the sub-descriptors be standard global values which are well-known to the relevant contracts. These contracts would use a new API on the Target to fetch this additional data. This would require a name and add more complexity to the Target, as its datastores would be mutable after creation.

The drawback is that the sub-descriptors couldn't be dynamically loaded (as you mention). I'm trying to understand if that would be an issue. Given the cDAC operates on a paused target, the memory between data descriptor initialization and contract use should not change (except for writes initiated by the cDAC). If there is a data descriptor JSON that can be loaded (i.e. no conflicts), would there be a benefit to pre-reading it?
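
A minimal C# sketch of the parser behavior described in this comment, for illustration only. `ParsedDescriptor`, `SubDescriptorLoader`, and the `readDescriptor` callback are hypothetical names, not existing cDAC types or APIs. The sketch walks the listed sub-descriptor pointers, skips null slots, and recurses into the rest; the cycle guard is an extra assumption of the sketch rather than something proposed in the thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory shape of one parsed descriptor; not the real cDAC type.
public sealed record ParsedDescriptor(
    string Name,
    IReadOnlyDictionary<string, ulong> SubDescriptorPointers);

public static class SubDescriptorLoader
{
    // readDescriptor stands in for "read and parse the contract descriptor at this
    // target address"; it is an assumed callback, not an existing API.
    public static List<ParsedDescriptor> LoadAll(
        ulong rootAddress,
        Func<ulong, ParsedDescriptor> readDescriptor)
    {
        var loaded = new List<ParsedDescriptor>();
        var visited = new HashSet<ulong>();
        Visit(rootAddress, readDescriptor, loaded, visited);
        return loaded;
    }

    private static void Visit(
        ulong address,
        Func<ulong, ParsedDescriptor> readDescriptor,
        List<ParsedDescriptor> loaded,
        HashSet<ulong> visited)
    {
        // A null pointer marks an unused slot; the visited set guards against cycles.
        if (address == 0 || !visited.Add(address))
            return;

        ParsedDescriptor descriptor = readDescriptor(address);
        loaded.Add(descriptor);

        // Recurse into every sub-descriptor slot the descriptor lists.
        foreach (ulong subAddress in descriptor.SubDescriptorPointers.Values)
            Visit(subAddress, readDescriptor, loaded, visited);
    }
}
```

Because every slot is resolved up front, the caller ends up with the full list of descriptors before any contract runs, which is the "no outside information" property argued for above.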

Member

> The parser machinery looks at the listed sub-descriptor pointers and if the values are non-null would recursively read in and merge the sub-descriptor. This would allow us to have sub-descriptor 'slots' that are not always used.

It requires us to know all types of sub-descriptors that we may possibly reference upfront (when we are generating the json at build time). After giving it more thought, it should not be a problem in practice. It is very unlikely that we will allow extending the runtime in unknown ways. Consider this feedback resolved.

> The drawback is that the sub-descriptors couldn't be dynamically loaded (as you mention).

My concern was about dynamic loading at runtime. The difference is whether the runtime can load arbitrary unknown components dynamically, or whether the runtime can only load a known set of components dynamically. As I have said, I think it is fine to limit the runtime to known components.

> Given the cDAC operates on a paused target, the memory between data descriptor initialization and contract use should not change

Yes, this should not be a problem with what we have now. (My gut feel is that we may need to evolve the cDAC architecture to cache more and be less eager with pre-computing once we get to scenarios like single stepping, but that is a problem for the future.)

Member

@noahfalk noahfalk Jul 30, 2025

Formalizing sub-descriptors seems like unnecessary complexity and constraint to me. The existing cDAC code seems amenable enough to dynamically loading new descriptors.

For example a runtime contract can declare any arbitrary field of a data structure to be a contract descriptor pointer:

CDAC_TYPE_FIELD(GCDacVars, /*pointer*/, StandaloneGcContractDesc, ...)

And cDAC code can load that contract descriptor on the fly if it needs to use it:

class GCContract : IGCContract
{
    IGCContract _standaloneGC;
    GCContract(Target t)
    {
        // do normal cDac stuff to read the value of a field
        ulong standaloneContractDesc = GetDescriptorFromGCDacVars(t);
        
        // assumes TryCreate follows the usual Try pattern and returns false on failure
        if (standaloneContractDesc != 0
            && ContractDescriptorTarget.TryCreate(
                standaloneContractDesc,
                GetReadDelegate(t),
                GetGetThreadContextDelegate(t),
                out ContractDescriptorTarget standaloneGcTarget))
        {
            _standaloneGC = standaloneGcTarget.ContractRegistry.GCContract;
        }
    }
    
    public void EnumerateHeap()
    {
        if(_standaloneGC != null)
        {
            _standaloneGC.EnumerateHeap();
            return;
        }
        // do the normal built-in GC enumerate heap algorithm
    }
}

We could simplify this further; it just shows the basic idea without diverging too far from how the code is currently structured. Is there any significant issue with not treating ContractDescriptorTarget as a singleton?

Member Author

Currently, the contracts don't know about ContractDescriptorTarget or the read/write delegates. They interact with the target through the abstract Target class.

This change would be possible but would require adding some complexity to the managed side: either having multiple targets (and dealing with properly flushing and using the correct one) or merging the globals/types together.

The current plan is to always load a sub-descriptor for the GC contract, even when we use the default GC.

Member

> The current plan is to always load a sub-descriptor for the GC contract, even when we use the default GC.

I think we'd be better off changing that plan and making light changes to the managed code instead. It's harder to evolve interfaces between components (the contract descriptor format) than it is to change their internal implementation details, so if complexity needs to be added somewhere I think we should bias towards putting it in the managed cDAC implementation. Sub-descriptors also add constraints that we might find awkward later:

  • They prevent dynamically loaded DLLs from defining new contracts. For example, if one day we thought the GC contract would be better factored as two separate contracts, it would be a breaking change for standalone GC that prevents it from being used on downlevel runtimes.
  • They prevent dynamically loaded DLLs from having patterns other than singletons. For example, imagine we have a JIT plugin interface and we'd like the built-in JIT to compile some methods while the plugin JIT compiles others. We might want to access two instances of some JIT contract at the same time, not have one replace the other.

},
"contracts": {"Thread": 1, "GCHandle": 1, "ThreadStore": 1}
}
```

## Contract symbol

- To aid in discovery, the contract descriptor should be exported by the module hosting the .NET
+ To aid in discovery, the main contract descriptor should be exported by the module hosting the .NET
runtime with the name `DotNetRuntimeContractDescriptor` using the C symbol naming conventions of the
target platform.

21 changes: 19 additions & 2 deletions docs/design/datacontracts/data_descriptor.md
@@ -31,6 +31,7 @@ endianness. The types `nint`, `nuint` and `pointer` have target architecture po
The data descriptor consists of:
* a collection of type structure descriptors
* a collection of global value descriptors
* an optional collection of pointers to sub-descriptors

## Types

@@ -92,6 +93,15 @@ The value must be an integral constant within the range of its type. Signed val
natural encoding. Pointer values need not be aligned and need not point to addressable target
memory.

## Sub-descriptor descriptors

Each sub-descriptor descriptor is effectively a global with a type of `pointer`. It consists of:
* a name
* a pointer value

If the value is non-null, the pointer points to another [contract descriptor](contract-descriptor.md#contract-descriptor-1).

When parsing a data descriptor with sub-descriptors, each sub-descriptor should be parsed, and then its type, global, and contract values should be merged in. If any conflicts arise when merging in sub-descriptor data, this is an error and behavior is undefined.

Member

> If any conflicts arise when merging in sub-descriptor data, this is an error and behavior is undefined.

This design means that the components involved need to be aware of each other to avoid conflicts. Just pointing it out.
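
To make the merge rule concrete, here is a hedged C# sketch of merging one sub-descriptor map into its parent. `DescriptorMerge` and `MergeInto` are hypothetical names, and throwing on a conflict is this sketch's choice: the spec text only says a conflict is an error with undefined behavior.

```csharp
using System;
using System.Collections.Generic;

public static class DescriptorMerge
{
    // Merges the entries of a sub-descriptor map (types, globals, or contracts)
    // into the parent map in place. Throwing on conflict is an assumption of this
    // sketch; the spec leaves the behavior undefined.
    public static void MergeInto<TValue>(
        IDictionary<string, TValue> parent,
        IReadOnlyDictionary<string, TValue> sub,
        string kind)
    {
        // First pass: detect conflicts so a failed merge leaves the parent untouched.
        foreach (string name in sub.Keys)
        {
            if (parent.ContainsKey(name))
                throw new InvalidOperationException(
                    $"Conflicting {kind} '{name}' while merging sub-descriptor.");
        }

        // Second pass: no conflicts were found, so copy every entry over.
        foreach (KeyValuePair<string, TValue> entry in sub)
            parent.Add(entry.Key, entry.Value);
    }
}
```

Any name that appears on both sides triggers the error, which is why the runtime and each component providing a sub-descriptor have to coordinate their names ahead of time.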


## Physical descriptors

@@ -129,6 +139,7 @@ The toplevel dictionary will contain:
* optional `"baseline": "BASELINE_ID"` see below
* `"types": TYPES_DESCRIPTOR` see below
* `"globals": GLOBALS_DESCRIPTOR` see below
* optional `"sub-descriptors": SUB_DESCRIPTORS_DESCRIPTOR` see below

Additional toplevel keys may be present. For example, the in-memory data descriptor will contain a
`"contracts"` key (see [contract descriptor](./contract_descriptor.md#Compatible_contracts)) for the
@@ -233,7 +244,9 @@ Note that a two element array is unambiguously "type and value", whereas a one-e
unambiguously "indirect value".


**Both formats**
### Sub-descriptor Values

Sub-descriptor values will be an additional array, with the same specification as [global values](#Global-values) with the exception that the only valid value type is a `pointer`.
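
As an illustration only (not part of the spec), the following C# sketch resolves the entries of a `"sub-descriptors"` section to target addresses. `SubDescriptorValues`, `Resolve`, and the `pointerData` parameter are hypothetical; `pointerData` stands in for the contract descriptor's auxiliary pointer array. The sketch accepts only the one-element "indirect value" form and rejects everything else.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class SubDescriptorValues
{
    // Resolves each "sub-descriptors" entry to a target address.
    // pointerData is an assumed stand-in for the auxiliary pointer array.
    public static Dictionary<string, ulong> Resolve(string json, ulong[] pointerData)
    {
        // The in-memory examples carry // comments, so skip them while parsing.
        var options = new JsonDocumentOptions { CommentHandling = JsonCommentHandling.Skip };
        using JsonDocument document = JsonDocument.Parse(json, options);

        var resolved = new Dictionary<string, ulong>();
        if (!document.RootElement.TryGetProperty("sub-descriptors", out JsonElement section))
            return resolved; // the section is optional

        foreach (JsonProperty entry in section.EnumerateObject())
        {
            JsonElement value = entry.Value;

            // This sketch handles only the one-element "indirect value" array.
            if (value.ValueKind != JsonValueKind.Array || value.GetArrayLength() != 1)
                throw new FormatException(
                    $"Unsupported value for sub-descriptor '{entry.Name}'.");

            // The single element is an index into the pointer array.
            int index = value[0].GetInt32();
            resolved[entry.Name] = pointerData[index];
        }

        return resolved;
    }
}
```

A resolved address of zero would correspond to an unused slot, which a reader can skip instead of trying to parse a descriptor there.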

#### Specification Appendix

@@ -284,7 +297,7 @@ string. For pointers, the address can be stored at a known offset in an in-proc
array of pointers and the offset written into the constant JSON string.

The indirection array is not part of the data descriptor spec. It is part of the [contract
- descriptor](./contract_descriptor.md#Contract_descriptor).
+ descriptor](./contract-descriptor.md#Contract_descriptor).


## Example
@@ -345,6 +358,10 @@ The following is an example of an in-memory descriptor that references the above
"FEATURE_COMINTEROP": 0,
"s_pThreadStore": [ 0 ], // indirect from aux data offset 0
"RuntimeID": "windows-x64"
},
"sub-descriptors":
{
"GC": [ 1 ] // indirect from aux data offset 1
}
}
```
6 changes: 5 additions & 1 deletion docs/design/datacontracts/datacontracts_design.md
@@ -26,7 +26,7 @@ More details are provided in the [data descriptor spec](./data_descriptor.md).

#### Global Values

- Global values which can be either primitive integer constants or pointers.
+ Global values which can be either primitive integer constants, pointers, or strings.
All global values have a string describing their name, a type, and a value of one of the above types.

#### Data Structure Layout
@@ -41,6 +41,10 @@ The determinate size of a structure may be larger than the sum of the sizes of t
in the data descriptor (that is, the data descriptor does not include every field and may not
include padding bytes).

#### (Optional) Sub-descriptor pointers

Sub-descriptors are special global values which contain a pointer to another data descriptor. These are used when data definitions are not known by the runtime at compile time but may be known by an external component. In that case the data descriptor defers to the external component to describe its data.

### Compatible Contract

Each compatible contract is described by a string naming the contract, and a uint32 version. It is an ERROR if multiple versions of a contract are specified in the contract descriptor.