
Fix DataUriParser to default to text/plain;charset=US-ASCII per RFC 2397 #7247

Merged
stephentoub merged 5 commits into main from copilot/fix-datauriparser-default-behaviour
Feb 3, 2026

Copilot AI commented Jan 30, 2026

  • Analyze the issue and RFC 2397 specification
  • Confirm RFC 2397 states: when media type is omitted, default to text/plain;charset=US-ASCII
  • Find the relevant code in DataUriParser.cs and DataContent.cs
  • Find existing test file at DataContentTests.cs
  • Fix the DataUriParser.Parse method to return default media type when omitted
  • Add DefaultMediaType => DefaultMediaType to top of known media types switch (per review feedback)
  • Refactor Ctor_OmittedMediaType_DefaultsToTextPlain test to use [Theory] with [InlineData]
  • Refactor Ctor_OmittedMediaType_CanBeOverridden test to use [Theory] with [InlineData]
  • Refactor test assertions to use static local function (per review feedback)
  • Build and run tests to verify the fix (113 DataContent tests pass)
  • Run code review and address feedback
  • Run CodeQL security check (no security issues found)
Original prompt

This section details on the original issue you should resolve

<issue_title>Extensions.AI: DataUriParser does not honour RFC 2397 default behaviour when media type omitted</issue_title>
<issue_description>### Description

The Microsoft.Extensions.AI.Abstractions includes a DataUriParser used by DataContent to parse data URIs as part of ChatMessages. The parser is documented as being a minimal data URI parser based on RFC 2397 (see comment at the top of the class).

However, it does not conform to RFC 2397 when the media type is omitted. According to the RFC, omitting the media type should default to text/plain;charset=US-ASCII. Instead, the parser throws an exception when the media type is missing.

This prevents valid RFC-compliant data URIs from being parsed successfully, causing errors.

RFC Reference

RFC 2397 states:

If <mediatype> is omitted, it defaults to text/plain;charset=US-ASCII.

Example of valid URI per RFC:

data:;base64,77u/QWER...

The link to this very RFC doc is included in the comment inside the DataUriParser code itself:
https://datatracker.ietf.org/doc/html/rfc2397

Reproduction Steps

using Microsoft.Extensions.AI;

var uri = new Uri("data:;base64,SGVsbG8=");

var content = new DataContent(uri);

Expected behavior

Successfully parse the URI using the default media type text/plain;charset=US-ASCII if none is present between the data: and ;base64 parts.

Actual behavior

Instead, we are receiving an error: uri did not contain a media type, and mediaType was not provided. (Parameter 'mediaType')

This comes from this exact line in the DataContent.cs#L99

Regression?

No response

Known Workarounds

Before passing the URI to the DataContent class, check whether the media type is missing between the data: and ;base64 parts and, if so, insert text/plain as per the RFC.
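
For illustration, the workaround could look roughly like the sketch below. `DataUriDefaults` and `WithDefaultMediaType` are hypothetical names that are not part of Microsoft.Extensions.AI, and the handling of the `data:;charset=...` shorthand is an assumption based on RFC 2397 rather than anything stated in this issue.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Extensions.AI.
// It rewrites a data URI so that an omitted media type is spelled out explicitly.
static class DataUriDefaults
{
    private const string Scheme = "data:";
    private const string DefaultMediaType = "text/plain;charset=US-ASCII";

    public static string WithDefaultMediaType(string dataUri)
    {
        if (dataUri is null) throw new ArgumentNullException(nameof(dataUri));
        if (!dataUri.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Not a data URI.", nameof(dataUri));

        // Everything between "data:" and the first ',' is metadata.
        int comma = dataUri.IndexOf(',', Scheme.Length);
        if (comma < 0)
            throw new ArgumentException("Data URI is missing the ',' separator.", nameof(dataUri));

        string metadata = dataUri.Substring(Scheme.Length, comma - Scheme.Length);
        string rest = dataUri.Substring(comma); // includes the ','

        // "data:,..." has no metadata at all: use the full RFC default.
        if (metadata.Length == 0)
            return Scheme + DefaultMediaType + rest;

        // Metadata that starts with ';' has parameters but no type.
        if (metadata[0] == ';')
        {
            // "data:;base64,..." gets the full default; for "data:;charset=utf-8,..."
            // only the type is missing, so just "text/plain" is prepended (assumption).
            bool onlyBase64 = metadata.Equals(";base64", StringComparison.OrdinalIgnoreCase);
            return Scheme + (onlyBase64 ? DefaultMediaType : "text/plain") + metadata + rest;
        }

        // A media type is already present: leave the URI untouched.
        return dataUri;
    }
}
```

A caller would then pass the normalized string on, for example `new DataContent(DataUriDefaults.WithDefaultMediaType("data:;base64,SGVsbG8="))`.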

Configuration

.NET v10
C# 14
Microsoft.Extensions.AI v10.2.0

Other information

We discovered this because we are using the built-in parser from Dart in our Frontend which seems to conform to the RFC standard and therefore omits the text/plain mimetype when parsing txt files. We'd prefer not to have to hack in the text/plain ourselves and expect the DataUriParser to conform to the RFC standards as it says.</issue_description>

<agent_instructions>Please double-check the validity of the claims made in this issue as well as the current DataContent / DataUriParser implementation. If there's a bug to be fixed, please fix it and ensure there are appropriate tests added / changed in the existing test files to fully cover it.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] Fix DataUriParser to honour RFC 2397 default behaviour" to "Fix DataUriParser to default to text/plain;charset=US-ASCII per RFC 2397" Jan 30, 2026
Copilot AI requested a review from stephentoub January 30, 2026 15:23
@stephentoub stephentoub marked this pull request as ready for review January 30, 2026 15:30
@stephentoub stephentoub requested a review from a team as a code owner January 30, 2026 15:30
Copilot AI review requested due to automatic review settings January 30, 2026 15:30

Copilot AI left a comment


Pull request overview

This PR updates the AI abstractions data URI handling to conform to RFC 2397 by defaulting omitted media types to text/plain;charset=US-ASCII, and adds tests to cover the new behavior.

Changes:

  • Introduces DataUriParser.DefaultMediaType (text/plain;charset=US-ASCII) and uses it in Parse when the media-type metadata section is empty.
  • Extends IsValidMediaType’s fast-path table to recognize text/plain;charset=US-ASCII without invoking MediaTypeHeaderValue.TryParse.
  • Adds unit tests verifying that omitted media types in data URIs default correctly and that an explicitly supplied media type still overrides the URI’s default.
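
The defaulting step described in the first bullet can be sketched in isolation as follows. This is a simplified stand-in under stated assumptions, not the library's `DataUriParser`: `MiniDataUriParser` is a hypothetical name, it does no media type validation, and the `text/plain` prefix for the `;charset=...` shorthand is taken from RFC 2397 rather than confirmed by this PR.

```csharp
using System;

// Hypothetical, simplified stand-in for the parsing step; not the library's DataUriParser.
static class MiniDataUriParser
{
    public const string DefaultMediaType = "text/plain;charset=US-ASCII";

    public static (string MediaType, bool IsBase64, string Data) Parse(string dataUri)
    {
        const string Scheme = "data:";
        const string Base64Suffix = ";base64";

        if (dataUri is null) throw new ArgumentNullException(nameof(dataUri));
        if (!dataUri.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Expected the 'data:' scheme.");

        int comma = dataUri.IndexOf(',', Scheme.Length);
        if (comma < 0) throw new FormatException("Expected ',' before the data.");

        // Metadata is everything between "data:" and the first ','.
        ReadOnlySpan<char> metadata = dataUri.AsSpan(Scheme.Length, comma - Scheme.Length);
        string data = dataUri.Substring(comma + 1);

        // A trailing ";base64" marks the encoding and is not part of the media type.
        bool isBase64 = metadata.EndsWith(Base64Suffix.AsSpan(), StringComparison.OrdinalIgnoreCase);
        if (isBase64) metadata = metadata.Slice(0, metadata.Length - Base64Suffix.Length);

        // Empty metadata: fall back to the RFC 2397 default instead of failing.
        // Parameters without a type (";charset=..."): assume "text/plain" (RFC shorthand).
        string mediaType =
            metadata.IsEmpty ? DefaultMediaType :
            metadata[0] == ';' ? "text/plain" + metadata.ToString() :
            metadata.ToString();

        return (mediaType, isBase64, data);
    }
}
```

With this sketch, both `data:;base64,SGVsbG8=` and `data:,Hello` yield the default media type, while `data:image/png;base64,AAAA` keeps `image/png`.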

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataUriParser.cs | Adds a default media type constant, applies RFC 2397 defaulting logic when the metadata span is empty, and recognizes the default in the known media types fast-path table. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Contents/DataContentTests.cs | Adds tests to validate defaulting behavior for omitted media types (including base64 and non-base64 cases, URI vs string constructors) and that an explicit mediaType parameter overrides the default. |

@stephentoub stephentoub merged commit 99b3272 into main Feb 3, 2026
6 checks passed
@stephentoub stephentoub deleted the copilot/fix-datauriparser-default-behaviour branch February 3, 2026 15:26
This was referenced Feb 11, 2026
ptr727 added a commit to ptr727/LanguageTags that referenced this pull request Feb 12, 2026
Updated [csharpier](https://github.com/belav/csharpier) from 1.2.5 to
1.2.6.

<details>
<summary>Release notes</summary>

_Sourced from [csharpier's
releases](https://github.com/belav/csharpier/releases)._

## 1.2.6

## What's Changed
### [Bug]: XML with DOCTYPE results in "invalid xml" warning
[#​1809](belav/csharpier#1809)
CSharpier was not formatting xml that included a doctype and instead
reporting that it was invalid xml.
```xml
<?xml version="1.0"?>
<!DOCTYPE staff SYSTEM "staff.dtd"[
    <!ENTITY ent1 "es">
]>
<staff></staff>
```
### [Bug]: Initializing a span using `stackalloc` leads to different
formatting compared to `new`
[#​1808](belav/csharpier#1808)
When initializing a span using stackalloc, it was not being formatted
consistently with other code
```c#
// input & expected output
Span<int> metatable = new int[]
{
    00000000000000000000000001,
    00000000000000000000000002,
    00000000000000000000000003,
};

Span<int> metatable = stackalloc int[]
{
    00000000000000000000000001,
    00000000000000000000000002,
    00000000000000000000000003,
};

// 1.2.5
Span<int> metatable = new int[]
{
    00000000000000000000000001,
    00000000000000000000000002,
    00000000000000000000000003,
};

Span<int> metatable =
    stackalloc int[] {
        00000000000000000000000001,
        00000000000000000000000002,
        00000000000000000000000003,
    };

```
### [Bug]: Comments in otherwise empty object pattern disappear when
formatting [#​1804](belav/csharpier#1804)
CSharpier was removing comments if they were the only content of an
object pattern.
```c#
// input & expected output
var match = obj is {
    //Property: 123
 ... (truncated)

Commits viewable in [compare view](belav/csharpier@1.2.5...1.2.6).
</details>

Updated [Microsoft.Extensions.Http.Resilience](https://github.com/dotnet/extensions) from 10.2.0 to 10.3.0.

<details>
<summary>Release notes</summary>

_Sourced from [Microsoft.Extensions.Http.Resilience's releases](https://github.com/dotnet/extensions/releases)._

## 10.3.0

## What's Changed
* Bump version to 10.3.0 for next development cycle by @​Copilot in dotnet/extensions#7197
* Fix race condition in UnreliableL2Tests.WriteFailureInvisible by @​Copilot in dotnet/extensions#7075
* Set Microsoft.McpServer.ProjectTemplates version to align with MCP packages by @​jeffhandley in dotnet/extensions#7170
* ToChatResponse: Merge AdditionalProperties into ChatMessage instead of ChatResponse by @​Copilot in dotnet/extensions#7194
* Fix NRT resolution for AIFunction parameters. by @​eiriktsarpalis in dotnet/extensions#7200
* Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript by @​dependabot[bot] in dotnet/extensions#7198
* Add .npmrc next to package.json and add lockfile for PublishAIEvaluationReport by @​akoeplinger in dotnet/extensions#7108
* Bump qs from 6.14.0 to 6.14.1 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript by @​dependabot[bot] in dotnet/extensions#7189
* Bump js-yaml from 4.1.0 to 4.1.1 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript by @​dependabot[bot] in dotnet/extensions#7054
* Bump validator from 13.15.20 to 13.15.23 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript by @​dependabot[bot] in dotnet/extensions#7103
* Update AI changelogs by @​stephentoub in dotnet/extensions#7206
* Merge changes from internal after 10.2 release by @​joperezr in dotnet/extensions#7205
* Merge changes from release/10.2 to main by @​joperezr in dotnet/extensions#7209
* Categorize MEAI001 experimental APIs by @​Copilot in dotnet/extensions#7116
* [main] Update dependencies from dotnet/arcade by @​dotnet-maestro[bot] in dotnet/extensions#7212
* Update Package Validation Baseline to 10.2.0 by @​Copilot in dotnet/extensions#7208
* Enable package validation for M.E.AmbientMetadata.Build by @​evgenyfedorov2 in dotnet/extensions#7213
* [5752] FakeLogCollector waiting capabilities by @​Demo30 in dotnet/extensions#6228
* Set network isolation policy for extensions-ci by @​wtgodbe in dotnet/extensions#7221
* Fix FunctionInvokingChatClient invoke_agent span detection with exact match or space delimiter by @​Copilot in dotnet/extensions#7224
* Add Ordinal into ordering by @​cincuranet in dotnet/extensions#7225
* Remove AIFunctionDeclaration tools on last iteration in FunctionInvokingChatClient by @​Copilot in dotnet/extensions#7207
* Remove unnecessary description tags by @​gewarren in dotnet/extensions#7226
* Fix FunctionInvokingChatClient to respect ChatOptions.Tools modifications by function tools by @​Copilot in dotnet/extensions#7218
* Add LoadFromAsync and SaveToAsync helper methods to DataContent by @​Copilot in dotnet/extensions#7159
* Bump lodash from 4.17.21 to 4.17.23 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript by @​dependabot[bot] in dotnet/extensions#7227
* Add logging to FunctionInvokingChatClient for approval flow, error handling, and loop control by @​Copilot in dotnet/extensions#7228
* [main] Update dependencies from dotnet/arcade by @​dotnet-maestro[bot] in dotnet/extensions#7230
* Allow FunctionResultContent pass-through when CallId matches by @​Copilot in dotnet/extensions#7229
* Propagate CachedInputTokenCount in OpenTelemetry telemetry by @​Copilot in dotnet/extensions#7234
* Add InvocationRequired property to FunctionCallContent by @​Copilot in dotnet/extensions#7126
* Escape the JSON data before embedding in Evaluation reports by @​peterwald in dotnet/extensions#7238
* Update mcpserver template to ModelContextProtocol 0.7.0-preview.1 by @​Copilot in dotnet/extensions#7236
* Update aiagent-webapi template to Agent Framework 1.0.0-preview.260127.1 by @​Copilot in dotnet/extensions#7237
* Fix token metric unit to use UCUM format {token} by @​stephentoub in dotnet/extensions#7241
* Add server tool call support to OpenTelemetryChatClient per semantic conventions by @​Copilot in dotnet/extensions#7240
* Preserve extra JSON schema properties in ToolJson serialization by @​Copilot in dotnet/extensions#7250
* Bring new cpu.requests formula from Kubernetes by @​amadeuszl in dotnet/extensions#7239
* Update M.E.AI changelogs with recent changes by @​stephentoub in dotnet/extensions#7242
* Fix DataUriParser to default to text/plain;charset=US-ASCII per RFC 2397 by @​Copilot in dotnet/extensions#7247
* Fix deadlock in ServiceEndpointWatcher when disposing change token registration by @​ReubenBond in dotnet/extensions#7255
* Rename FunctionCallContent.InvocationRequired to InformationalOnly with inverted polarity by @​Copilot in dotnet/extensions#7262
* Fix approval request/response correlation in FunctionInvokingChatClient by @​Copilot in dotnet/extensions#7261
* Add ReasoningOptions to ChatOptions by @​Copilot in dotnet/extensions#7252

## New Contributors
* @​cincuranet made their first contribution in dotnet/extensions#7225
* @​ReubenBond made their first contribution in dotnet/extensions#7255

 ... (truncated)

Commits viewable in [compare view](dotnet/extensions@v10.2.0...v10.3.0).
</details>

Updated [Microsoft.Extensions.Logging.Abstractions](https://github.com/dotnet/dotnet) from 10.0.2 to 10.0.3.

<details>
<summary>Release notes</summary>

_Sourced from [Microsoft.Extensions.Logging.Abstractions's releases](https://github.com/dotnet/dotnet/releases)._

## 10.0.3

[Release](https://github.com/dotnet/core/releases/tag/v10.0.3)

Commits viewable in [compare view](https://github.com/dotnet/dotnet/commits/v10.0.3).
</details>

Updated [Microsoft.SourceLink.GitHub](https://github.com/dotnet/dotnet) from 10.0.102 to 10.0.103.

<details>
<summary>Release notes</summary>

_Sourced from [Microsoft.SourceLink.GitHub's releases](https://github.com/dotnet/dotnet/releases)._

## 10.0.103

You can build .NET 10.0 from the repository by cloning the release tag `v10.0.103` and following the build instructions in the [main README.md](https://github.com/dotnet/dotnet/blob/v10.0.103/README.md#building).

Alternatively, you can build from the sources attached to this release directly.
More information on this process can be found in the [dotnet/dotnet repository](https://github.com/dotnet/dotnet/blob/v10.0.103/README.md#building-from-released-sources).

Attached is the PGP signature for the GitHub generated tarball. You can find the public key at https://dot.net/release-key-2023

Commits viewable in [compare view](https://github.com/dotnet/dotnet/commits/v10.0.103).
</details>

Updated [Serilog](https://github.com/serilog/serilog) from 4.3.0 to 4.3.1.

<details>
<summary>Release notes</summary>

_Sourced from [Serilog's releases](https://github.com/serilog/serilog/releases)._

## 4.3.1

## What's Changed
* Remove SourceLink by @​SimonCropp in serilog/serilog#2183
* Handle Exception.ToString failures in text formatter by @​krisbiradar in serilog/serilog#2197
* Remove char[] allocation by @​karpinsn in serilog/serilog#2198
* Remove backpressure from XMLDoc by @​timothycoleman in serilog/serilog#2203
* Don't enable XDOC for tests by @​nblumhardt in serilog/serilog#2205
* Target and test on net10 by @​SimonCropp in serilog/serilog#2206
* Fix trimming error when Serilog is a transitive dependency by @​Numpsy in serilog/serilog#2214
* Inline TraceId and SpanId JSON string formatting by @​SimonCropp in serilog/serilog#2215

## New Contributors
* @​krisbiradar made their first contribution in serilog/serilog#2197
* @​karpinsn made their first contribution in serilog/serilog#2198
* @​timothycoleman made their first contribution in serilog/serilog#2203
* @​Numpsy made their first contribution in serilog/serilog#2214

**Full Changelog**: serilog/serilog@v4.3.0...v4.3.1

Commits viewable in [compare view](serilog/serilog@v4.3.0...v4.3.1).
</details>

Updated [System.CommandLine](https://github.com/dotnet/dotnet) from 2.0.2 to 2.0.3.

<details>
<summary>Release notes</summary>

_Sourced from [System.CommandLine's releases](https://github.com/dotnet/dotnet/releases)._

No release notes found for this version range.

Commits viewable in [compare view](https://github.com/dotnet/dotnet/commits).
</details>


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pieter Viljoen <[email protected]>

Development

Successfully merging this pull request may close these issues.

Extensions.AI: DataUriParser does not honour RFC 2397 default behaviour when media type omitted

3 participants