# Add an explainer for traffic advice. #10

**Merged** Apr 13, 2021 (4 commits)
**README.md** (6 changes: 3 additions & 3 deletions)
Users can opt-out of the feature at any time. Furthermore, users can temporarily opt-out of the feature by using their browser’s private browsing mode.

#### Publisher opt-out
Publishers can opt out by disallowing connections in their [traffic advice](traffic-advice.md). The proxy fetches and caches this advice, so publishers need only add a single resource to their origin at a well-known path.

Another option for origin-wide opt-out is to leverage the publisher's DNS record:
* Publishers specify in their DNS entry that they are opting out of proxied prefetching (completely or with some TBD granularity if necessary).
* The DNS check would be done by the proxy for privacy reasons; issuing a DNS request from the browser before navigation would share prefetch information with the DNS resolver and potentially the target host.


Ideally, the browser would fetch the opt-out signal *before* making a connection to the proxy. While there are proposals to enable anonymous fetching of both DNS records ([Oblivious DNS](https://tools.ietf.org/html/draft-pauly-dprive-oblivious-doh-00)) and HTTP resources ([Oblivious HTTP](https://tools.ietf.org/html/draft-thomson-http-oblivious-00)), neither is well-supported yet. If either of those proposals gains traction, we may want to revisit the publisher opt-out design to take advantage of Oblivious fetching.

In addition, publishers can opt out of individual requests, for example, when dealing with temporary traffic spikes or other issues. For these, publishers should look for the `Purpose: prefetch` request header and reject requests accordingly (see [Geolocation](https://github.com/buettner/private-prefetch-proxy#geolocation) for an example use case).
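The per-request opt-out above could be implemented server-side as a small request filter. Here is a minimal sketch as WSGI middleware; the `Purpose: prefetch` header name comes from the proposal, while the load-shedding toggle and all names are illustrative assumptions:

```python
# Illustrative toggle an operator might flip during a traffic spike
# (in practice this could come from a config store or load metric).
SHED_PREFETCH = True

def reject_prefetch(app):
    """Wrap a WSGI app so proxied prefetch requests get a 503 while shedding."""
    def middleware(environ, start_response):
        # WSGI exposes the "Purpose" request header as HTTP_PURPOSE.
        if SHED_PREFETCH and environ.get("HTTP_PURPOSE") == "prefetch":
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain")])
            return [b"Prefetch traffic temporarily rejected\n"]
        return app(environ, start_response)
    return middleware
```

Rejecting with an error status is safe here precisely because prefetches affect performance, not correctness: the browser simply falls back to a normal fetch at navigation time.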
**traffic-advice.md** (new file, 47 additions)
# Traffic Advice

Publishers may wish not to accept traffic from [private prefetch proxies](README.md) and other sources besides direct user traffic, for instance to reduce server load due to speculative prefetch activity.

We propose a well-known "traffic advice" resource, analogous to `/robots.txt` (for web crawlers), which allows an HTTP server to declare that implementing agents should stop sending traffic to it for some time.

## Proposal

HTTP request activity can broadly be divided into:
* activity on behalf of a user interaction (e.g., a web browser loading a web page requested by the user), or which for another reason cannot easily be discarded
* activity for which there is an existing specialized mechanism for throttling traffic (e.g. web crawlers respecting `robots.txt`)
* activity which can easily be discarded (e.g., because it corresponds to a prefetch which improves loading performance but not correctness) at the server's request (e.g., because it is under load or the operator otherwise does not wish to serve non-essential traffic)

Applications in the third category should consider acting as *agents which respect traffic advice*, honoring the server operator's wishes with minimal resource impact.

Agents which respect traffic advice should fetch the well-known path `/.well-known/traffic-advice`. If it returns a response with an [ok status](https://fetch.spec.whatwg.org/#ok-status) and an `application/trafficadvice+json` MIME type, the response body should contain valid UTF-8 encoded JSON like the following:

```json
[
{"user_agent": "prefetch-proxy", "disallow": true}
]
```

Each agent has a series of identifiers it recognizes, in order of specificity:
* its own agent name (e.g. `"ExamplePrivatePrefetchProxy"`)
* decreasingly specific generic categories that describe it, like `"prefetch-proxy"`
* `"*"` (which applies to every implementing agent)

> **Review discussion**
>
> **Contributor:** I think "agent" needs much more definition. Is Chrome a user agent? Is curl? Is googlebot? It sounds like the intent is to cover "agents" which don't send "direct user traffic". Those don't sound like user_agents to me...
>
> **Author:** I mean the term in the HTTP sense; the application which is constructing HTTP requests and sending them to a server, i.e. the application which would ordinarily identify itself with the `User-Agent` request header. I don't love the term in this context, but I think it's consistent with RFC 7231 and the robots.txt format, both of which would consider curl and googlebot to be user agents.

The agent finds the most specific matching element of the response and applies the corresponding advice (currently only a boolean which advises disallowing all traffic) to its behavior. The agent should respect the cache-related response headers to minimize the frequency of such requests and to revalidate the resource when it is stale.

Currently the only advice is the `"disallow"` key: a boolean which, if present and `true`, advises the agent not to establish connections to the origin. Other advice may be added in the future.

If, on the other hand, the response has a `404 Not Found` (or similar) status, the agent should apply its default behavior.
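The selection rules above can be sketched as follows, assuming the agent supplies its identifiers from most to least specific (ending with `"*"`). The function and variable names are illustrative, not part of the proposal:

```python
import json

def select_advice(body, identifiers):
    """Return the advice entry matching the most specific identifier, or None."""
    entries = json.loads(body)
    # Index entries by their "user_agent" identifier.
    by_agent = {e["user_agent"]: e for e in entries if isinstance(e, dict)}
    for ident in identifiers:  # ordered most- to least-specific
        if ident in by_agent:
            return by_agent[ident]
    return None  # no match: apply the agent's default behavior

# Using the sample response body from above:
advice = select_advice(
    '[{"user_agent": "prefetch-proxy", "disallow": true}]',
    ["ExamplePrivatePrefetchProxy", "prefetch-proxy", "*"],
)
disallowed = bool(advice and advice.get("disallow"))
```

A non-ok status, wrong MIME type, or unparseable body would be handled the same way as no match: the agent falls back to its default behavior.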

## Why not robots.txt?

`robots.txt` is designed for crawlers, especially search engine crawlers, and so site owners have likely already established robots rules because they wish to limit traffic from crawlers, even though they have no such concern about prefetch proxy traffic. The `robots.txt` format is also designed to limit traffic by path, which isn't appropriate for agents which do not know the path of the requests they are responsible for throttling (as with a CONNECT proxy carrying TLS traffic).
> **Review discussion**
>
> **Contributor:** This kind of ties into my above feedback about agents. I think you want to divide the universe up into three categories:
>
> * Whatever agents this applies to
> * Whatever agents robots.txt applies to
> * Agents which won't care about either (I think normal navigations from browsers fall into this category)
>
> **Author:** Added a short explanation of who should consider being an "agent which respects traffic advice". (I do consider it a politeness thing more than anything else; HTTP clients can and will send traffic anyway if they think it's more important than the server's advice.)

A textual format more similar to `robots.txt` would be possible, but the format for parsing `robots.txt` is not consistently specified and implemented. By contrast, JSON implementations are widely available on a wide variety of platforms used by site owners and authors.

## Application to private prefetch proxies

For example, suppose a private prefetch proxy, `ExamplePrivatePrefetchProxy`, would like to respect traffic advice in order to allow site owners to limit inbound traffic from the proxy.

When a client of the proxy service (e.g., a web browser) requests a connection to `https://www.example.com`, the proxy server issues an HTTP request for `https://www.example.com/.well-known/traffic-advice`. It receives the sample response body from above. It recognizes `"prefetch-proxy"` as the most specific advice to apply to itself.

It caches this result (traffic is presently disallowed) at the proxy server (or even across multiple proxy server instances run by the same operator), and refuses client connections to `https://www.example.com` until an updated `/.well-known/traffic-advice` resource no longer disallows traffic. Even if a large number of proxy clients request connections to `https://www.example.com`, the site operator and its CDN do not receive traffic from the proxy except for infrequent requests to revalidate the traffic advice (which may be, for example, once per hour).
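The caching behavior described above can be sketched as a small per-origin cache on the proxy side. This is a sketch under stated assumptions: the cache lifetime stands in for whatever `Cache-Control` max-age the proxy's HTTP client surfaces, `fetch_advice` stands in for the real advice request, and all names are hypothetical:

```python
import time

class AdviceCache:
    """Per-origin traffic-advice cache gating proxy connections."""

    def __init__(self, fetch_advice):
        # fetch_advice(origin) -> (disallow: bool, max_age_seconds: int)
        self.fetch_advice = fetch_advice
        self.entries = {}  # origin -> (disallow, expires_at)

    def may_connect(self, origin, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(origin)
        if entry is None or now >= entry[1]:
            # Missing or stale: revalidate the advice resource.
            disallow, max_age = self.fetch_advice(origin)
            entry = (disallow, now + max_age)
            self.entries[origin] = entry
        return not entry[0]
```

With this shape, any number of client connection requests within the cache lifetime cost the origin nothing: the proxy answers them from the cached advice and only revalidates (e.g., once per hour) when the entry expires.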