
Cross-Origin Read Blocking (CORB) #681

Closed
anforowicz opened this issue Mar 8, 2018 · 35 comments
Labels
security/privacy There are security or privacy implications topic: orb

Comments

@anforowicz
Contributor

Historically, browsers had rather lax Content-Type checking. We’ve been able to introduce stricter checks in some cases (e.g. blocking mislabeled scripts and stylesheets in presence of the nosniff header [1]) and unfortunately failed in some other cases (e.g. Firefox’s attempt to block mislabeled images in presence of the nosniff header [2, 3]).

Given Spectre, lax handling of mislabeled cross-origin responses carries new, significant security risks. We've developed a proposal called Cross-Origin Read Blocking (CORB), which increases the strictness of cross-origin fetching semantics while trying to stay web-compatible. CORB reduces the risk of leaking sensitive data by keeping it further from cross-origin web pages. In most browsers, it keeps such data out of untrusted script execution contexts. In browsers with Site Isolation, it can keep such data out of untrusted renderer processes entirely, helping even against speculative side-channel attacks.

We're looking to collaborate with everyone on an interoperable set of changes to the web platform, so that blocking of cross-origin responses can be done consistently across all the browsers. Please take a look at the proposal and its compatibility impact in the CORB explainer and provide feedback in this thread on the algorithm itself, as well as on the next steps for trying to encode CORB into the relevant specs for web standards.

We believe that CORB has a reasonably low risk of breaking existing websites (see the “CORB and web compatibility” section in the explainer). We’ve spent a considerable amount of time trying to tweak CORB to minimize compatibility risk (e.g. introducing confirmation sniffing and skipping sniffing for HTML comments since JS can have them too) and are continuing to consider additional tweaks to minimize the risk further (e.g. we are trying to gather data that might inform how to handle text/plain and range requests). The remaining risk is mostly for nosniff responses labeled with a wrong MIME type - as pointed out above, stricter handling of such responses has always been desirable, but the Spectre threat makes this more urgent.

[1] https://fetch.spec.whatwg.org/#should-response-to-request-be-blocked-due-to-nosniff?
[2] #395
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1302539
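
The blocking decision sketched in the proposal above can be modeled roughly as follows. This is a hedged sketch pieced together from the explainer's summary, not Chromium's actual implementation; the function name, the exact type list, and the `sniffConfirmsType` input are all illustrative.

```javascript
// Illustrative list of CORB-protected MIME types (the explainer names
// HTML, XML, and JSON; image/svg+xml is explicitly exempt).
const CORB_PROTECTED_TYPES = [
  "text/html", "text/xml", "application/xml", "application/json", "text/json",
];

function corbShouldBlock({ contentType, nosniff, sniffConfirmsType }) {
  // image/svg+xml is exempt even though it is an XML type.
  if (contentType === "image/svg+xml") return false;
  const isProtected =
    CORB_PROTECTED_TYPES.includes(contentType) ||
    contentType.endsWith("+json") ||
    contentType.endsWith("+xml");
  if (!isProtected) return false;
  // With `X-Content-Type-Options: nosniff`, the declared type is trusted.
  if (nosniff) return true;
  // Otherwise CORB only blocks after confirmation sniffing agrees that the
  // body really is HTML/XML/JSON.
  return Boolean(sniffConfirmsType);
}
```

The key web-compatibility lever is the last step: without nosniff, a mislabeled response is only blocked once sniffing confirms the label, which is why the confirmation-sniffing tweaks discussed above matter so much.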

@annevk annevk added the security/privacy There are security or privacy implications label Mar 9, 2018
@annevk
Member

annevk commented Mar 9, 2018

cc @cdumez @youennf @travisleithead @evilpie @ckerschb @whatwg/security (please tell me or whatwg/meta if you want to be added to this team; it's basically for notification purposes of issues that need security input)

@wanderview
Member

When service workers cache actual cross-origin responses (e.g. in ‘no-cors’ request mode), the responses are ‘opaque’ and therefore CORB can block such responses without changing the service worker's behavior (‘opaque’ responses have a non-accessible body even without CORB).

Opaque responses have a hidden body, yes, but the Cache entry still contains the filtered body. It must still be there for the service worker to use the Cache to service no-cors requests (e.g. from tags like <img>, etc.).

It seems CORB would require that Cache.match() not send the opaque body data to the renderer process until after the CORB checking is performed. The CORB checking would depend on what the service worker does with the opaque response. (If I understand correctly the CORB check requires knowing the destination of the response within the browser.)

FWIW, this part does seem doable to me from an implementation perspective. Gecko's cache waits to open the body file descriptor until body consumption begins. So as long as we can perform the CORB check in the renderer process as part of the respondWith() call, it seems possible to achieve this.

@annevk
Member

annevk commented Apr 3, 2018

Having read through https://chromium.googlesource.com/chromium/src/+/master/services/network/cross_origin_read_blocking_explainer.md in more detail I wonder why it doesn't call out fetch() a bit more prominently. fetch() supports "no-cors" and therefore can get at cross-origin resources without CORS. Those resources are opaque to most callers, but it seems that special care needs to be taken to not leak them to the wrong process until explicitly called for, including when they get persisted to disk...
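
The "opaque" filtering that the Fetch spec applies to no-cors responses can be modeled as below. This is a simplified plain-object model for illustration, not the real Response class; the point is that the script-visible view is stripped even though the full response still exists internally (and, per the concern above, may be persisted to disk), which is why the internal copy must be kept away from untrusted processes.

```javascript
// Simplified model of an "opaque filtered response": the caller keeps the
// internal response elsewhere (ideally outside the renderer process) and
// only this stripped view is ever handed to script.
function toOpaqueFilteredResponse(internalResponse) {
  return {
    type: "opaque",
    url: "",        // URL list is hidden
    status: 0,      // status is zeroed
    statusText: "",
    headers: {},    // header list is empty
    body: null,     // body is null to script
  };
}
```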

@csreis

csreis commented Apr 3, 2018

Maybe @anforowicz or Nick Carter can chime in more, but in terms of persisting no-cors to disk, Chrome's implementation is still able to write the response to disk without giving it to the renderer process that made the request. (I think we're using DetachableResourceHandler for that, FWIW.) That may be worth mentioning in the explainer, since I think it matters for preload and ServiceWorkers as well. Were there other cases you were concerned about?

@annevk
Member

annevk commented Apr 4, 2018

@csreis storing the response from a fetch() in the Cache API can be done outside service workers too, but yeah, that's roughly what we want to have in the specification around "opaque filtered responses" I think. To make it very clear these need to remain out-of-process for as long as possible.

(I'm not entirely sure where we should put the canonical description of the class of attacks, standards-wise. Either here or in HTML I suppose.)

@jakearchibald
Collaborator

#144 (comment) - see "Attack 4".

It looks like CORB will handle this attack for particular MIME types, but I think it still makes sense to apply the extra blocking I proposed, since it'll cover all MIME types.

Let me know if that's wrong.

@evilpie
Contributor

evilpie commented Apr 4, 2018

I am a bit concerned about the "Mislabeled image (nosniff)" case. Do you have any data on how common text/html is for images with nosniff? At least for JavaScript this number was quite high, and even higher in the HTTP Archive report. That number, however, doesn't take nosniff into account. Do you have that data, or should we maybe ask the HTTP Archive people again?

@anforowicz
Contributor Author

@evilpie, we have some data in the "Quantifying CORB impact on existing websites" section of the explainer. After excluding responses that had an explicit "Content-Length: 0" response header, we see that 0.115% of all CORB-eligible responses might have been observably blocked due to a nosniff header or range request.

The real question here is: how many of these 0.115% contained images (and were undesirably disrupted by CORB) vs. non-images (which were non-decodable with and without CORB)? At this point we only have anecdotal data - we were only able to repro one such case in the wild, and it turned out to be a tracking pixel that returned an HTML doc as a response.

@jakearchibald
Collaborator

I'm currently looking to enable range requests to pass through a service worker safely, and later I'll specify how various web APIs should make range requests and validate responses.

Although CORB operates in the same area, its goals are different, so we should be aware of the overlap 😄.

Here's a summary of the similarities and differences, as I understand them:

CORB's goal is to prevent bringing data into the content process, whereas I'm aiming to prevent exposing data to script. CORB is best-effort, with compatibility in mind, whereas I need to strictly avoid exposing opaque data to script.

CORB will filter opaque partial responses if they match particular content types. This prevents an audio/video element being used to bring data that's potentially sensitive into the content process.

#560 prevents Attack 4, where a <script> is given a partial response that may contain private data. CORB will make this a lot harder for particular content types, but #560 prevents this particular attack for all content types.

CORB recommends against multipart range requests. Currently range requests aren't specced from that API's point of view, but I'm trying to define it. I don't plan to use multiple ranges in a single response, and once specced, browsers shouldn't make kinds of range requests that aren't explicitly allowed.

I intend to make media elements reject responses that would result in a mix of opaque and visible data being treated as the same media resource. This prevents Attack 1.

I intend to make media elements reject responses that would result in opaque data from multiple URLs being treated as the same media resource. This prevents Attack 2.

I intend to make range supporting APIs fail if the partial response starts at an offset other than the requested range. This prevents Attack 3.

I intend to make downloads fail/restart if content-identifying headers change between requests, such as the total length in Content-Range, Content-Type, ETag, or Last-Modified.
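
The offset check described above (the Attack 3 defense) can be sketched as a small validation step. This is a hedged sketch under stated assumptions: `parseContentRange` is a simplified helper, not a full RFC 7233 parser, and only the single-range `bytes=N-` request form is handled.

```javascript
// Parse a Content-Range header, e.g. "bytes 100-199/5000" -> { start: 100, end: 199 }.
function parseContentRange(value) {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value);
  return m ? { start: Number(m[1]), end: Number(m[2]) } : null;
}

// Accept a 206 response only if it starts exactly at the requested offset.
function partialResponseMatchesRequest(rangeHeader, contentRangeHeader) {
  const requested = /^bytes=(\d+)-/.exec(rangeHeader || "");
  const got = parseContentRange(contentRangeHeader || "");
  return Boolean(requested && got && Number(requested[1]) === got.start);
}
```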

@jakearchibald
Collaborator

Why does CORB blocking filter the response? Wouldn't it be more robust to replace the response with a generic empty response?

Although they're less sensitive, CORS safelisted headers and status codes also leak data.

@jakearchibald
Collaborator

I don't think the cache API is part of this. If responses are filtered/blocked as part of fetch, then only correctly filtered/blocked responses will go into the cache.

@wanderview
Member

wanderview commented Apr 10, 2018

I don't think the cache API is part of this. If responses are filtered/blocked as part of fetch, then only correctly filtered/blocked responses will go into the cache.

Doesn't this depend on how cache.add() is implemented internally?

Edit: Oh, you mean at the spec level. Nevermind.

@csreis

csreis commented Apr 14, 2018

Regarding range requests:

Thanks for covering the overlap here, @jakearchibald! I agree that CORB overlaps with the attack 4 defense, but only for certain content types, so your original plans still seem relevant.

CORB's goal is to prevent bringing data into the content process, whereas I'm aiming to prevent exposing data to script. CORB is best-effort, with compatibility in mind, whereas I need to strictly avoid exposing opaque data to script.

Correct.

CORB will filter opaque partial responses if they match particular content types. This prevents an audio/video element being used to bring data that's potentially sensitive into the content process.

Correct.

#560 prevents Attack 4, where a <script> is given a partial response that may contain private data. CORB will make this a lot harder for particular content types, but #560 prevents this particular attack for all content types.

Correct.

CORB recommends against multipart range requests. Currently range requests aren't specced from that API's point of view, but I'm trying to define it. I don't plan to use multiple ranges in a single response, and once specced, browsers shouldn't make kinds of range requests that aren't explicitly allowed.

Just to clarify, are you saying that multipart range requests wouldn't generate responses with multipart/byteranges content types after your changes? In general or just for service worker? We would love it if the content type reflected what was in the response, since we weren't eager about parsing the multipart response in the browser process to determine what was in it. For now, we're just recommending against supporting it for sensitive data.

I intend to make media elements reject responses that would result in a mix of opaque and visible data being treated as the same media resource. This prevents Attack 1.

I intend to make media elements reject responses that would result in opaque data from multiple URLs being treated as the same media resource. This prevents Attack 2.

I intend to make range supporting APIs fail if the partial response starts at an offset other than the requested range. This prevents Attack 3.

I intend to make downloads fail/restart if content-identifying headers change between requests, such as the total length in Content-Range, Content-Type, ETag, or Last-Modified.

Sounds good to me (and orthogonal to CORB).

@csreis

csreis commented Apr 14, 2018

Regarding service worker and cache API:

It seems CORB would require that Cache.match() not send the opaque body data to the renderer process until after the CORB checking is performed. The CORB checking would depend on what the service worker does with the opaque response. (If I understand correctly the CORB check requires knowing the destination of the response within the browser.)

I don't think that's right. CORB shouldn't depend on what the destination of the request is, nor what service worker is going to do with the response. The intention is to not expose the data of the opaque response to the service worker in the renderer process at all. My understanding from discussions with @mattto, @anforowicz, and @nick-chromium was that service worker could still handle opaque responses without exposing that data to the renderer process, though it's worth clarifying the details on things like cache.add.

FWIW, this part does seem doable to me from an implementation perspective. Gecko's cache waits to open the body file descriptor until body consumption begins. So as long as we can perform the CORB check in the renderer process as part of the respondWith() call, it seems possible to achieve this.

It might be possible for you to do the check in the renderer process, but that defeats some of the benefit of CORB. Hopefully we can make that unnecessary.

@csreis storing the response from a fetch() in the Cache API can be done outside service workers too, but yeah, that's roughly what we want to have in the specification around "opaque filtered responses" I think. To make it very clear these need to remain out-of-process for as long as possible.

Again, I think the intention is to keep them out of process entirely, rather than for as long as possible. Otherwise an attacker could use the cache API to pull whatever they want into their process, correct? Maybe we can talk with @anforowicz, @nick-chromium, and @mattto about how we're handling it in Chrome.

I don't think the cache API is part of this. If responses are filtered/blocked as part of fetch, then only correctly filtered/blocked responses will go into the cache.

To be clear, I think we're ok with having blocked responses end up in the cache in general, as long as they're not in the renderer process.

(Sorry if I'm misunderstanding here.)

@jakearchibald
Collaborator

@csreis

Just to clarify, are you saying that multipart range requests wouldn't generate responses with multipart/byteranges content types after your changes?

I'm saying that the browser shouldn't ever ask for multiple ranges in a single request. Is there anywhere we do this today? Or am I misunderstanding what you mean by multipart request?

To be clear, I think we're ok with having blocked responses end up in the cache in general

I think it's easiest if the body is replaced with an empty body long before it ends up in the cache API.

I'm happy with these empty-body responses ending up in the cache.

@csreis

csreis commented Apr 20, 2018

Regarding service worker and cache API:

CORB shouldn't depend on what the destination of the request is, nor what service worker is going to do with the response. The intention is to not expose the data of the opaque response to the service worker in the renderer process at all. My understanding from discussions with @mattto, @anforowicz, and @nick-chromium was that service worker could still handle opaque responses without exposing that data to the renderer process, though it's worth clarifying the details on things like cache.add.

I met with @jakearchibald and @mattto this week to discuss the cache API and we agreed that CORB won't disrupt it, since the cache API is origin-specific. It's important to note that CORB doesn't take anything about the request into account, which means that if CORB blocks a response for a given origin, then it would be blocked no matter how that origin asked for it (even when retrieving it later from the cache API). Thus, it's fine for a ServiceWorker (or a page) to put an empty value for an opaque response into the cache API, since that response will always be opaque for that origin.

(This is different from preload and the network cache, where we do want CORB-blocked responses to end up on disk, so that they're fast after navigating to a cross-origin page. That doesn't require sending the data to the renderer process, though.)

FWIW, this part does seem doable to me from an implementation perspective. Gecko's cache waits to open the body file descriptor until body consumption begins. So as long as we can perform the CORB check in the renderer process as part of the respondWith() call, it seems possible to achieve this.

Given the above, hopefully the renderer process check is not necessary?

@anforowicz
Contributor Author

Quick status update:

  • Making progress on a PR with Fetch spec changes covering nosniff and 206 behavior
  • For the last 2+ weeks we've been running a 5% field trial of site-per-process on the Chrome 66 stable channel - this also gives coverage for CORB. So far no issues have been reported, but through telemetry (e.g. looking at some of the top 100 sites most often impacted by CORB) we've discovered previously unknown cases of HTML/JS polyglots (see https://crbug.com/839945 and https://crbug.com/839425). Fixing handling of the polyglots will require small tweaks to CORB's confirmation sniffing for HTML (and probably reinforces the earlier decision to start with spec-ing only the nosniff/206 cases first).

@anforowicz
Contributor Author

I thought that I'd also share a link to the middle of an I/O '18 session where CORB was discussed: https://youtu.be/yIaYQGPuZbM?t=2614

annevk pushed a commit that referenced this issue May 17, 2018
CORB is an additional filter for responses of cross-origin "no-cors" 
fetches. It aims to provide defense-in-depth protection for JSON, 
HTML, XML (though not image/svg+xml), and (sometimes) text/plain 
resources against cross-process CPU exploits. It also makes it harder 
to use incorrectly labeled resources as scripts, images, fonts, etc.

Discussion and further work is tracked by #681 and #721.

Tests are in web-platform-tests's fetch/corb directory.

@annevk
Member

annevk commented Sep 20, 2018

@csreis @anforowicz what's the timeline for defining the remainder of CORB? Other browsers would like to implement it as well so it'd help if it was fully defined.

@anforowicz
Contributor Author

@annevk, I think that the only part of CORB that still requires an official description is the sniffing algorithm that CORB uses to say with high confidence that a response really contains an HTML / XML / JSON document. This sniffing differs slightly from the sniffing algorithms in the mimesniff spec, because of the need to avoid accidentally sniffing JavaScript (allowed in cross-origin responses) as HTML (blocked by CORB in cross-origin responses).

Q: Is description of the sniffing algorithm the main/only blocker for implementing CORB in other browsers?

AFAIR, I've tried to argue that differences in sniffing implementations would not be (*) observable by web content (assuming that the sniffing correctly classifies a response as HTML/XML/JSON only if the response really is HTML/XML/JSON and not one of the cross-origin-allowed types like JavaScript or CSS). This led me to further argue that sniffing shouldn't be described in a normative part of a spec (but could possibly still be described in a non-normative spec section or in a separate document). So - I think describing Chromium's CORB sniffing algorithm in the CORB explainer might be a good first step here. WDYT?

Q: WDYT? Where should the sniffing algorithm's description go (in the short term and in the long term)?

(*) OTOH, maybe the presence of the wpt/fetch/corb/script-html-js-polyglot.sub.html test is a counter-example here - incorrect sniffing can lead to observable/incorrect behavior that this test is supposed to catch.
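
The polyglot concern above can be illustrated with a very rough sketch of why CORB's HTML confirmation sniffing must differ from mimesniff: "<!--" is also legal at the start of a JavaScript file (HTML-like comments), so seeing it proves nothing. The prefix list and function name here are illustrative assumptions, not Chromium's actual algorithm.

```javascript
// Prefixes that strongly indicate HTML and are not plausible JavaScript.
const HTML_ONLY_PREFIXES = ["<!doctype html", "<html", "<head", "<body"];

function corbConfirmsHtml(bodyPrefix) {
  const s = bodyPrefix.trimStart().toLowerCase();
  // Unlike mimesniff, an HTML comment is NOT treated as confirmation,
  // because a valid JS/HTML polyglot can begin with one.
  if (s.startsWith("<!--")) return false;
  return HTML_ONLY_PREFIXES.some((p) => s.startsWith(p));
}
```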

@annevk
Member

annevk commented Sep 25, 2018

My understanding from Firefox is that a complete description of CORB would help, for implementation, for analysis, and for looking at potential further expansion.

Having a non-normative description first would be a good first step. I wonder if https://mimesniff.spec.whatwg.org/ might be a good long term place. I see some potential for sharing there. E.g., if we detect a PDF, ZIP, or RAR resource we could also deny access straight away.

And unless we expect CPU architectures to fix Spectre within the next five to ten years, I think we need a normative definition as well, since it defines the effective security boundary and it's good to be as clear and accurate about that as possible.


@ziyadparekh

I just ran into an issue with CORB using the Fetch API. I understand the security implications of blocking third-party extensions/JavaScript from reading sensitive MIME types coming into the client's browser. My questions are:

  1. If we shift the request server side and then send the response back to the client via the same origin, how does that stop third party js from intercepting and reading the response?

  2. How do ad scripts still load json/html/js on pages even though they are obviously cross origin requests?

Would appreciate any help in shining light on these

@anforowicz
Contributor Author

If we shift the request server side and then send the response back to the client via the same origin, how does that stop third party js from intercepting and reading the response?

I am not sure I understand the scenario above (e.g. I don't understand what is meant by "shift the request server side" and "[have the server] send the response back to the client via the same origin"). Do you mean a request initiated from https://bar.com to https://foo.com/secret.json, with the foo.com server redirecting to https://bar.com/secret.json?

At any rate, CORB is a client-side security feature and it can't protect against information disclosure problems on the server side.

How do ad scripts still load json/html/js on pages even though they are obviously cross origin requests?

CORB only blocks responses that cannot possibly be included in <img> or <script> or similar legacy tags - therefore CORB would not block JavaScript (unless it is served with a wrong Content-Type).

CORB would block json and/or html, but these only make sense in responses to fetch/XHR (and so would also be blocked by CORS). If ad scripts depend on cross-origin responses then either the responses are allowed by CORS (and so are not blocked by CORB) or the ad scripts are already broken (because the responses are blocked by CORS even before CORB looks at them).

@ziyadparekh

To the first point: if a page running on https://foo.com makes a cross-origin request to https://bar.com (and bar.com doesn't send an Access-Control-Allow-Origin: * header), that request would be blocked by CORS and subsequently by CORB if the response is of type json/html (right?). If https://foo.com makes a request to https://foo.com/api/resource, and foo.com proxies that request to https://bar.com, sending the response back, CORS and CORB would not block the response (right?), making the response available to be read by third-party JavaScript?

So I guess ad scripts/tags have set the Access-Control-Allow-Origin header on their side and therefore are not blocked by CORB or CORS?

@anforowicz
Contributor Author

I don't know what "foo.com proxies that request to https://bar.com" means. If the foo.com server trusts bar.com, then it can share its data with bar.com (via FTP / HTTP / REST / phone calls / etc.). However, because of CORB the browser won't share foo.com's data with bar.com.

So I guess ad scripts/tags have set the Access-Control-Allow-Origin header on their side and therefore are not blocked by CORB or CORS?

If an ad script wants to read cross-origin data from foo.com, then foo.com (not the ad script) has to agree to giving the data to the ad (by sending back appropriate CORS headers in the http response).

@ziyadparekh

Sorry if I'm not explaining it correctly.

Let's say you load visa.com in your browser and it loads the front-end app. Now the front-end app needs to make a request to (for example) https://secure.com, which returns text/html (to load a secure form, for instance). If https://secure.com doesn't have the Access-Control-Allow-Origin header set on the response, it will be blocked by CORS and subsequently by CORB (right?)

My question is: what is the best practice for showing the HTML returned from https://secure.com to the user? Should visa.com send a request to the visa.com backend, which would then request https://secure.com server-side and return the HTML to be shown to the user?

Or is there another best practice to achieve this?

@anforowicz
Contributor Author

One way to embed a secure form from https://secure.com in a document from https://visa.com is by using iframes.

@csreis

csreis commented Dec 8, 2018

As @anforowicz mentions, CORS and CORB apply to a document's subresource requests, but not to iframes. In your example, visa.com can load https://secure.com's text/html response in an iframe without being blocked by either CORS or CORB. It cannot use fetch or XHR to get https://secure.com's text/html response without an Access-Control-Allow-Origin header. Also, if visa.com tried to request the URL via an img or script tag, CORB would filter the response (though it wouldn't have been usable in those contexts anyway).

As for the proxying question, foo.com could indeed proxy data from bar.com, but this isn't a security risk to the user because the request to bar.com won't have the user's cookies or other credentials if it's being made from foo.com's server. There's no need to use this proxying for iframes. (Most ads load in iframes, giving them access to whatever data they need from their own origin.)

Hope that clarifies things.

@annevk
Member

annevk commented Dec 8, 2018

https://annevankesteren.nl/2015/02/same-origin-policy might help here. In particular, note that secure.com might only be available on the user's local network, so you couldn't proxy the request.

@M0n0kr

M0n0kr commented Jan 8, 2020

I'm just an end-user but I wanted to inform you that your system has blocked a very simple link on a small-town news site, a link to police reports that would help end a debate that is fanning a fire surrounding a protest at a local police station. It may have nothing to do with you and it may seem like one case has nothing to do with the other, but in small towns like ours, a shooting and an elderly woman in jail ...well all roads (and all fingers) point back to the police station. https://www.koamnewsnow.com/additional-reports-detail-investigation-into-well-being-of-man-months-before-he-was-found-in-dead/
People here have little to do but follow things like this, so when the only evidence we have disappears because Chromium decides it should... small town talk thinks it's a conspiracy, silly I know.
Please help stop violence before boredom helps escalate it.


@Malvoz

Malvoz commented Jan 8, 2020

@mingcatsandra the issue you're experiencing is not related to CORB. And it also isn't a link, it's a simple image, which is nonexistent. Use the website's contact information to inform the owners if you want them to address an issue.

@M0n0kr

M0n0kr commented Jan 19, 2020

Actually, it's a non-existent image NOW. Before the previous update of Google Chrome it was a very long, VERY detailed series of police reports that took me and my fiancé an hour and a half to get through. My fiancé is an ex-law-enforcement officer for that city, so he was able to help me understand the lingo. Sorry it took so long to get back to you, but I was very thoroughly hacked directly after making my first comment here and have just regained access to my account.

@annevk
Member

annevk commented May 17, 2022

I suggest we close this when #1441 lands.

@annevk annevk closed this as completed in 78f9bdd May 25, 2022