
Add an explainer for traffic advice. #10

Merged
4 commits merged on Apr 13, 2021

Conversation

jeremyroman
Contributor

@buettner @domenic PTAL? If this clears this round of bikeshedding I can expand this into a more detailed spec in Bikeshed or something (though honestly it won't be a very long spec anyhow).

```

Each agent has a series of identifiers it recognizes, in order of specificity:
* its own agent name
```

Contributor

What happens if I say {"user_agent": "Chrome", "disallow": true}? Will Chrome block user-initiated traffic to the origin?

Contributor Author

In my view, no, Chrome would not respect (or even request) this advice with regard to user-initiated traffic. Chrome could do so with regard to traffic which is not immediately user-visible, like prefetch traffic. This declaration, supposing Chrome implemented it (separately from the prefetch proxy), would advise Chrome to shed any traffic it was willing and able to shed. I think Chrome should not consider a user navigation sheddable, though.

Contributor

The latest draft is pretty clear now, but the relationship between the "client" and the "agent" could use a bit more explanation. I'd suggest:

  • Adding ", e.g. ExamplePrefetchProxy" here (perhaps linked to the below section)?
  • Adding an example of "a client of the proxy service": something like "(e.g. a web browser)" would be accurate, I think?


Each agent has a series of identifiers it recognizes, in order of specificity:
Contributor

I think "agent" needs much more definition. Is Chrome a user agent? Is curl? Is googlebot?

It sounds like the intent is to cover "agents" which don't send "direct user traffic". Those don't sound like user_agents to me...

Contributor Author

I mean the term in the HTTP sense: the application which is constructing HTTP requests and sending them to a server, i.e. the application which would ordinarily identify itself with the User-Agent request header.

I don't love the term in this context, but I think it's consistent with RFC 7231 and the robots.txt format, both of which would consider curl and googlebot to be user agents.
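
To make the "series of identifiers, in order of specificity" concrete: a minimal sketch (not from the explainer) of how such an agent might pick the applicable entry, assuming the advice document is an array of entries shaped like the `{"user_agent": ..., "disallow": ...}` example discussed above, and that a wildcard identifier like `"*"` (as in robots.txt) is the least specific. The identifier names below are hypothetical.

```typescript
interface TrafficAdviceEntry {
  user_agent: string;
  disallow?: boolean;
}

// Hypothetical identifiers this agent recognizes, most specific first:
// its own name, then a broader category, then a wildcard.
const recognizedIdentifiers = ["ExamplePrefetchProxy", "prefetch-proxy", "*"];

// Return the entry for the most specific identifier mentioned in the advice,
// or null if none of the agent's identifiers appear.
function selectAdvice(
  advice: TrafficAdviceEntry[],
  identifiers: string[] = recognizedIdentifiers,
): TrafficAdviceEntry | null {
  for (const id of identifiers) {
    const entry = advice.find((e) => e.user_agent === id);
    if (entry !== undefined) {
      return entry;
    }
  }
  return null;
}

// A "disallow": true entry advises the agent to shed whatever traffic it is
// willing and able to shed (not, per the discussion above, user navigations).
const applicable = selectAdvice([{ user_agent: "*", disallow: true }]);
const shouldShedTraffic = applicable?.disallow === true;
```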


Currently the only advice is the key `"disallow"`, which specifies a boolean which, if present and `true`, advises the agent not to establish connections to the origin. In the future other advice may be added.

If the response has a `404 Not Found` status, on the other hand, the agent should apply its default behavior.
Contributor

404 Not Found or another error condition (e.g. invalid JSON, network error, 5xx, 409, ...). You cover some of this above, so just making this less specific will do the trick.

Contributor Author

I was trying to focus on the obvious error case at the explainer level of depth.

There are a few details here that I think might merit some detailed discussion but which don't really affect the high-level proposal. For instance:

  • should 3XX redirects be followed? should they be cached?
  • perhaps a network error or 503 Service Unavailable (or maybe all 5XX codes) indicates that the server is busy and a prefetch proxy should prefer not to send it additional traffic at that time (either at the agent's discretion, or after the Retry-After if present), rather than assuming it's okay -- and is this a behavior that generalizes to all agents?

Agreed that this will require more detail to specify, but is this the right place for it?

Contributor

Well, my general philosophy is the explainer doesn't need to be exhaustive, but should also avoid being inaccurate. So this sentence in particular is troubling because it's over-specific, and implies that e.g. 409s will not get the same treatment.

Contributor Author

I've added "or a similar status" to hopefully imply that I'm not saying 404 need be the only status with this treatment. I have difficulty seeing why a 409 response would make sense here (can GET requests generally generate conflicts?), and some statuses like 429 Too Many Requests might conceptually make sense to parse as "disallow" (if there are too many requests for what is basically a static resource, should we really be sending more non-essential traffic?).

It definitely is the case that handling of the various 3XX, 4XX and 5XX statuses will need slightly more text than I want here. I only meant to say what happens in the two cases likely to happen in practice.
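
Purely as an illustration of the cases discussed in this thread (none of this is pinned down by the explainer), an agent's handling might look roughly like the sketch below: a parsable 2xx body yields advice, 404-like responses fall back to the default behavior, and a network error or 5xx is read as "the server is busy, back off", optionally honoring Retry-After. The path follows the draft quoted below.

```typescript
type AdviceOutcome =
  | { kind: "advice"; entries: unknown }                 // body parsed successfully
  | { kind: "default" }                                  // e.g. 404: apply default behavior
  | { kind: "back-off"; retryAfterSeconds?: number };    // server busy or unreachable

async function fetchTrafficAdvice(origin: string): Promise<AdviceOutcome> {
  let response: Response;
  try {
    response = await fetch(new URL("/.well-known/traffic-advice.json", origin));
  } catch {
    // Network error: treat as "busy" rather than "allowed".
    return { kind: "back-off" };
  }

  if (response.status >= 500) {
    // Only the delta-seconds form of Retry-After is handled in this sketch.
    const retryAfter = response.headers.get("Retry-After");
    return {
      kind: "back-off",
      retryAfterSeconds:
        retryAfter !== null && /^\d+$/.test(retryAfter) ? Number(retryAfter) : undefined,
    };
  }

  if (!response.ok) {
    // 404 or a similar status: no advice published; use the default behavior.
    return { kind: "default" };
  }

  try {
    return { kind: "advice", entries: await response.json() };
  } catch {
    // Invalid JSON also falls back to the default behavior.
    return { kind: "default" };
  }
}
```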


## Proposal

Agents which respect traffic advice should fetch the well-known path `/.well-known/traffic-advice.json`. If it returns a `200 OK` response with the `application/json` MIME type, the response body should contain valid JSON like the following:
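
The explainer's own example is not reproduced in this excerpt; as a hypothetical illustration only (shown as a TypeScript literal to match the other sketches in this thread), a document using the keys discussed above might look like this:

```typescript
// Hypothetical body of /.well-known/traffic-advice.json; the explainer, not
// this sketch, defines the actual shape.
const exampleBody = `[
  { "user_agent": "ExamplePrefetchProxy", "disallow": true }
]`;

// An agent would parse the body as JSON and look for the entry addressed to
// the most specific identifier it recognizes.
const entries = JSON.parse(exampleBody) as Array<{
  user_agent: string;
  disallow?: boolean;
}>;
```
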
Contributor

Will the request for this resource go through an applicable service worker?

Contributor Author

This probably arises from a lack of clarity in how I'm using the term "agent" here, which I mean to be closer to "HTTP user agent" than "web browser". Many agents, including web crawlers and the proposed proxy servers, don't implement HTML (and thus service workers) at all.

Contributor

Yeah, right now it's not clear whether the web browser would consult this file before using a proxy, versus the proxy doing this without the web browser in the loop.

Contributor Author

I've added an example at the bottom. For a prefetch proxy, it makes the most sense for the proxy to do this (as it can easily and transparently cache the conclusion).
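
A rough sketch of that division of labor, assuming (hypothetically) that the proxy keys a small in-memory cache by origin and identifies itself as "ExamplePrefetchProxy"; the browser never sees any of this.

```typescript
// Proxy-side cache of the per-origin conclusion, transparent to the browser.
const disallowCache = new Map<string, { disallow: boolean; expiresAt: number }>();

async function proxyShouldConnect(origin: string): Promise<boolean> {
  const cached = disallowCache.get(origin);
  if (cached !== undefined && cached.expiresAt > Date.now()) {
    return !cached.disallow;
  }

  let disallow = false; // default: no advice means traffic is acceptable
  try {
    const response = await fetch(new URL("/.well-known/traffic-advice.json", origin));
    if (response.ok) {
      const entries: Array<{ user_agent?: string; disallow?: boolean }> =
        await response.json();
      // Simplified matching for illustration: a real agent would walk its
      // specificity-ordered identifier list as described earlier.
      disallow = entries.some(
        (e) =>
          (e.user_agent === "ExamplePrefetchProxy" || e.user_agent === "*") &&
          e.disallow === true,
      );
    }
  } catch {
    // This sketch treats network errors as "no advice"; see the error-handling
    // discussion above for why a real proxy might instead back off.
  }

  // The lifetime here is arbitrary; a real proxy might follow HTTP caching
  // semantics for the advice response instead.
  disallowCache.set(origin, { disallow, expiresAt: Date.now() + 5 * 60 * 1000 });
  return !disallow;
}
```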


## Why not robots.txt?

`robots.txt` is designed for crawlers, especially search engine crawlers, and so site owners have likely already established robots rules because they wish to limit traffic from crawlers -- even though they have no such concern about prefetch proxy traffic. The `robots.txt` format is also designed to limit traffic by path, which isn't appropriate for agents which do not know the path of the requests they are responsible for throttling (as with a CONNECT proxy carrying TLS traffic).
Contributor

This kind of ties into my above feedback about agents. I think you want to divide the universe up into three categories:

  • Whatever agents this applies to
  • Whatever agents robots.txt applies to
  • Agents which won't care about either (I think normal navigations from browsers fall into this category)

Contributor Author

Added a short explanation of who should consider being an "agent which respects traffic advice". (I do consider it a politeness thing more than anything else; HTTP clients can and will send traffic anyway if they think it's more important than the server's advice.)


## Proposal

Agents which respect traffic advice should fetch the well-known path `/.well-known/traffic-advice.json`. If it returns a `200 OK` response with the `application/json` MIME type, the response body should contain valid JSON like the following:
Contributor

Any OK response should be allowed I think, not just 200

Contributor Author

I think so, possibly minus 204 and 205, though honestly any except 200 is pretty silly here. (201 Created? What did I create?) I'm trying to explain the "good" behavior here; it wasn't clear to me that going through the exact handling of this edge case made the idea clearer.
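
If the looser rule suggested above were adopted, the check could be as small as this sketch (hypothetical helper name), with 204 and 205 excluded because they carry no body to parse:

```typescript
// Accept any OK (2xx) status except 204 No Content and 205 Reset Content.
function hasUsableAdviceBody(response: Response): boolean {
  return response.ok && response.status !== 204 && response.status !== 205;
}
```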


## Proposal

Agents which respect traffic advice should fetch the well-known path `/.well-known/traffic-advice.json`. If it returns a `200 OK` response with the `application/json` MIME type, the response body should contain valid JSON like the following:
Contributor

Should this have a custom MIME type? Seems like most modern JSON formats do.

Contributor Author

I didn't know of a reason to do so here (if there is one, I'm happy to consider it). The usual justification (e.g. for application/importmap+json) seems to be to prevent some JSON not intended for use as an import map from being used as one to modify browser behavior.

That doesn't seem to apply here, though, since no resource besides /.well-known/traffic-advice.json could be parsed as such.


## Proposal

Agents which respect traffic advice should fetch the well-known path `/.well-known/traffic-advice.json`. If it returns a `200 OK` response with the `application/json` MIME type, the response body should contain valid JSON like the following:
Contributor

Including file extensions is generally not good for well-known URLs, from what I understand.

Contributor Author

I haven't seen this mentioned in RFC 8615 or the registry. I do note that most registrations do not use one; however, many of those correspond to types without a well-recognized file extension or are not intended to serve a response body.

My primary motivation for including a file extension and MIME type which are well established is that it enables a publisher to create this file in common HTTP servers like Apache and IIS by simply adding it to the web root, without needing to otherwise modify server configuration to set the correct MIME type.

Contributor

This would be the first web spec that uses file extensions from what I understand. I'd really strongly recommend we not do this. That means you do need server configuration access, but you also should need server configuration access to modify /.well-known anyway.

Contributor Author

Okay. I suspect modifying response headers is more work than adding a file in the web root, but probably at the same privilege level and hopefully not too difficult.


## Proposal

Agents which respect traffic advice should fetch the well-known path `/.well-known/traffic-advice.json`. If it returns a `200 OK` response with the `application/json` MIME type, the response body should contain valid JSON like the following:
Contributor

Somewhere you should mention that we'll always decode this as UTF-8 (ignoring charset) like other modern web platform features.

Contributor Author

Happy to do this; do you have a reference handy where this has been done elsewhere? We ignore charset even if specified as non-UTF-8?

Contributor

Import maps, WebSockets, server-sent events, module scripts, worker scripts, and Fetch's json() come to mind.
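
For illustration, the decoding behavior being referenced (always UTF-8, ignoring any charset parameter on the Content-Type) is roughly the following:

```typescript
// Decode the body as UTF-8 regardless of any declared charset, in the spirit
// of Fetch's json() and the other features listed above.
async function parseAdviceBody(response: Response): Promise<unknown> {
  const bytes = new Uint8Array(await response.arrayBuffer());
  const text = new TextDecoder("utf-8").decode(bytes);
  return JSON.parse(text);
}
```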

@domenic left a comment
Contributor

LGTM with a bit more clarity on the client/agent distinction.

@jeremyroman
Contributor Author

Resolved latest comments.

@mnot commented Mar 28, 2022

Please register the .well-known location if you're going to use this -- see https://github.com/protocol-registries/well-known-uris

@jeremyroman
Contributor Author

Provisional registration request submitted: protocol-registries/well-known-uris#22
