-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API Proposal: Non-validated HttpHeaders enumeration #35126
Comments
Tagging subscribers to this area: @dotnet/ncl |
What exactly does the validation do? I am assuming its primarily mapping the header name to the specific properties on the headers class? Or is this to split them up into content and request collections? |
It parses the header value according to the rules for that header, creating the relevant data structure to hold the pieces; in some cases that's just the raw string, in some cases it's a primitive like an int or a DateTimeOffset, in some cases it's a dedicated type for that header, such as a MediaTypeHeaderValue. See all of the XxParser and XxValue types in https://github.com/dotnet/runtime/tree/master/src/libraries/System.Net.Http/src/System/Net/Http/Headers. |
Does SocketsHttpHandler need the validated headers, or is that more because of legacy API design choices for the headers collection? Could we have something like
That would expose all the headers as a dictionary that could be enumerated, inserted into, removed, modified without having to validate or know whether its a content header or not? |
Legacy design choices. The default enumerator was supposed to filter out invalid headers to stay consistent with the strongly typed properties. |
The ones it needs it will have already validated, e.g. Content-Length. It won't have done anything with the rest.
It would. It would also be an additional allocation for the
Request headers are already not validated by the implementation. If someone adds one with HttpHeaders.Add, the Add method itself is validating them, but if you call TryAddWithoutValidation, then it obviously doesn't, and SocketsHttpHandler isn't either (other than ones it needs, e.g. Content-Length). Similarly, SocketsHttpHandler isn't doing any general validation of response headers, just storing them into the relevant response headers collection with TryAddWithoutValidation. It's the act of taking them out that's validating them, whether via the enumerator or via the strongly-typed properties. This issue is about a mechanism to get them without that validation. |
Yuck - IMHO the current WithoutValidation functions are a band-aid on what is already becoming a clunky API - its not clear what the validation is actually doing. It may force an allocation, but putting a new collection on the request/response is probably cleaner, and could lead to eventual deprecation of the typed/split collections we have today. |
That suggests you think they were added later? They were not. |
The initial design emphasized protocol correctness and strongly typed object models representing aspects of that protocol. That's quite at odds with the current emphasis on performance.
That's unlikely, and would be a step backwards from a general usability perspective. Developers use the strongly typed APIs all the time. They just don't work as well for the bulk copy scenario we're focused on at the moment. Yes we likely need some new APIs for this copy scenario, but they'll need to interop with the old ones indefinitely. |
Triage:
Useful for reverse proxy and gRPC scenarios. |
@alnikola can you please take a look at this one as well? |
I wonder if we could get away with simply changing the enumerator here to not parse and return invalid values. It's a change in behavior, but I wonder how many people are actually relying on this. That wouldn't solve all of the problems enumerated above, but it would be a big improvement, and it wouldn't require any new API. |
What we need for YARP is the ability to read and write headers as fast as possible as we transfer data from the source request to the proxied call and back again. When looking at perf graphs of RPS/size, the per-request overhead of YARP seems to be a weak point compared to the other proxies - and headers are going to be an important part of that. |
Agree, it looks like the best solution from cost/benefit perspective. |
Triage: Plan AThe simplest approach that is to disable validation on condition is a safe change. Plan BWe decided to proceed with one of the following options for .NET 5. foreach (var h in resp.Headers.GetWithoutValidation()) { }
foreach (var h in resp.Content.Headers.GetWithoutValidation()) { } foreach (var h in resp.Headers.NonValidated) { }
foreach (var h in resp.Content.Headers.NonValidated) { } |
What is the static type returned from these? |
Something like this: public struct HeadersWithoutValidation : IEnumerable<string, IEnumerable<string>> {} |
Should it implement IDictionary and/or expose such lookup methods? |
(And as long as we're exposing this, we don't also want to avoid the per header allocation for the value (s)?) |
True. We could have something like: struct HeaderValuesWithoutValidation : IEnumerable<string> {}
struct HeadersWithoutValidation : IEnumerable<KeyValuePair<string, HeaderValuesWithoutValidation>> {} |
That does seem like goodness, but then, HttpHeaders itself does not implement IDictionary, so it would be weird to do it here.
I can certainly see some value here... Methods on HttpHeaders like GetValues, TryGetValues, Contains, etc presumably don't operate on invalid header values today, so it makes sense to have equivalents of these on HeadersWithoutValidation (or whatever this type is called). We already have HttpHeaders.TryAddWithoutValidation, but it would make sense to have an equivalent Add or TryAdd method on HeadersWithoutValidation. This all kinda makes me want to avoid creating new API here and just change so we no longer validate (i.e. Plan A). |
Would this contain a single value in the case we have the value as a single string, or still try to split it into values? |
I would leave these as minimal design -- only enumeration -- and we can add more APIs (lookup etc.) later if we want. |
How does one construct an |
Testing what? The implementation Isn't currently tied to the internal details of HttpHeaders, but it easily could be (and I've considered doing so). |
I meant consumers testing their own header processing code. I guess the answer is: write your own structs or use |
Or if you mean they want to unit test their code that consumes HeaderStringValues from an HttpResponseMessage, something like: private static HeaderStringValues Create(string headerName, string[] values)
{
var m = new HttpResponseMessage();
m.Headers.TryAddWithoutValidation(headerName, values);
return m.Headers.NonValidated[headerName];
} |
The proposed amendments look good. The constructors can always be added later when there's a compelling scenario. The net approved shape is now namespace System.Net.Http.Headers
{
public partial class HttpHeaders
{
public HttpHeadersNonValidated NonValidated { get; }
}
public readonly struct HttpHeadersNonValidated :
IReadOnlyDictionary<string, HeaderStringValues>
{
public int Count { get; }
public bool Contains(string headerName);
public bool TryGetValues(string headerName, out HeaderStringValues values);
public HeaderStringValues this[string headerName];
public Enumerator GetEnumerator();
public readonly struct Enumerator : IEnumerator<KeyValuePair<string, HeaderStringValues>>
{
public bool MoveNext();
public KeyValuePair<string, HeaderStringValues> Current { get; }
public void Dispose();
... // explicitly implemented interface members
}
... // explicitly implemented interface members
}
public readonly struct HeaderStringValues :
IReadOnlyCollection<string>
{
public int Count { get; }
public override string ToString();
public Enumerator GetEnumerator();
public readonly struct Enumerator : IEnumerator<string>
{
public bool MoveNext();
public string Current { get; }
public void Dispose();
... // explicitly implemented interface members
}
... // explicitly implemented interface members
}
} |
HeaderStringValues is basically a clone of StringValues but in the System.Net.Http.Headers namespace right? |
Seems like, and YARP will have to convert it to a StringValues as efficiently as possible. |
At the moment implementation is very similar, a wrapper for a string or a string[]. I've toyed with instead having it be a wrapper for the internal HeaderItemInfoStore, but the benefit of doing that is primarily in the "the header has been parsed into multiple values" case, which should be relatively rare in this scenario, such that the 99% case should just be HeaderStringValues is wrapping a string. The ToString in HeaderStringValues also factors in the right separator based on the internal HeaderDescriptor used by HttpHeaders, whereas StringValues always uses ",". And then of course there's the layering, namespace, etc. We discussed this a bit in the API review (noting that this re-review was just about adding the additional members to the struct that was already approved a year ago), and @halter73 is on point to start a discussion about what it would mean to move StringValues lower in the stack. |
We should spike this as soon as the new APIs are available. |
If you bring |
I've started the discussion with @davidfowl. If YARP can convert efficiently enough, it's probably not worth the time and effort to get a Microsoft.Extensions API into the core runtime for .NET 6. I'm assuming if we don't do it in .NET 6, we probably will never move StringValues. The redundancy is a little sad, but not a huge deal. |
The thing is that we have to live with decisions like this for decades. Lets try and clean this up in 6, or define an interface for the pattern so the different classes can be interoperable. |
That would defeat the non-allocating nature in the common case. |
If we're really concerned (and don't move down StringValues), we could add a (FWIW, I expect the HeaderStringValues to StringValues conversion cost to not be more than a blip either way. ) |
Both are structs. |
It could, but that would be about usability rather than efficiency, in terms of getting the data out of HeaderStringValues. |
Right, I kind of assumed that even if we were to TF |
At the moment we would: as with StringValues, HeaderStringValues just wraps a string or a string[]. If they remain separate, it's possible in the future that could change. Functionally right now the main difference is that HeaderStringValues.ToString() uses the proper separator for the represented header (rather than always using ","), which means it also needs to store that. |
EDIT June 1, 2021 by @stephentoub: This was previously approved, but we'd like to make some tweaks: #35126 (comment)
Proposed API
New
NonValidated
property returning all headers received by the client without any validation is to be added toHttpHeaders
collection.Usage example
Original proposal
Today to enumerate the response headers from an
HttpClient
HTTP request, you can write code like this:There are a few issues, here:
TryAddWithoutValidation
can be used to add request headers to be sent, and those headers will never be validated. And the handler can similarly useTryAddWithoutValidation
to store response headers into the response headers collection. But enumerating the headers in the above manner will force validation of all the headers, which incurs measurable overhead.response.Headers
andresponse.Content.Headers
.We can address (1) and (2) by adding an API like this:
As a test, I looked to see what headers my Edge browser is sending when connecting to a particular service on the net, and what response headers I got in response, ~12 headers of varying lengths in each direction, some standard, some non-standard, and I put that into a benchmark that hits a local server. The benchmark makes the request and enumerates all the response and response content headers:
With .NET Core 3.1, I get results like this:
We've already made very measurable perf improvements for .NET 5, so running against master I get:
When I then change the test to use an implementation of the above proposal:
I get:
Note that very similar numbers are also possible without a new API but with a (small?) behavioral change: we could change
HttpHeaders.GetEnumerator
to a) not validate and b) to return arrays that are only valid untilMoveNext
is called again, such that we can reuse the same array for all headers in the enumeration.Another variation on this would also address the 3rd cited issue around verbosity: we could instead expose such an API on
HttpResponseMessage
:in which case it internally would enumerate both
Headers
andContent.Headers
. If we did that, to go along with it we could add:which would determine whether the specified header should go into
Headers
orContent.Headers
.If we did want to add new API here, there's a question around the
HeaderStringValues
type above. ASP.NET uses theStringValues
type from Microsoft.Extensions.Primitives. A variety of options exist:Microsoft.
namespace.System.
namespace that's similar in nature, albeit with a few things cleaned up.cc: @scalablecory, @davidsh, @Tratcher, @samsp-msft
The text was updated successfully, but these errors were encountered: