redact location.ancestorOrigins according to Referrer Policy #1918

hillbrad · 2016-10-17T20:32:47Z

@bzbarsky @dakami and I had a hallway discussion at the end of TPAC about the possibility of adding location.ancestorOrigins to Firefox. bz has had longstanding concerns about the information this leaks to child frames. We arrived at a local consensus that any leakage is roughly equivalent to what already happens with the referrer, so it would make sense to redact ancestorOrigins according to referrer policy. (This could also resolve that objection to a Mozilla implementation of ancestorOrigins.)

/cc @smaug---- @annevk

domenic · 2016-10-17T21:06:04Z

One big question, which I asked in the PR, is what does "redact" mean. Since it's an origin instead of a URL, several of the referrer policies don't really apply (e.g. maybe they're no-ops). If it gets censored completely (e.g. if the referrer policy is "no-referrer"), then does the resulting array contain null? The empty string? Or is that entry just missing, so that the number of entries in the array is less than the number of ancestor browsing contexts? We'll need a comprehensive spec for (origin, referrer policy) -> censored origin.

Otherwise, I think we'd need to get a sense of what other user agents besides Firefox would be interested in this spec change. I guess only Chrome implements both referrer policy and ancestorOrigins, so... @mikewest, perhaps?

As for WebKit and Edge, which don't implement referrer policy but do implement ancestorOrigins: does this sound reasonable to you, as something you would do if/when you eventually implemented referrer policy? Leaving aside any commitments to implementing referrer policy. Tagging the usual suspects... @cdumez @travisleithead. Please route to more appropriate people as necessary.

bzbarsky · 2016-10-17T21:12:32Z

The idea is that if the referrer policy allows the origin to leak out via the referrer (which I believe all policies except "no-referrer" do) then we should just go ahead and return the origin in ancestorOrigins. So this is really about the "no-referrer" case, plus any browser configuration that has equivalent effects.

As for what value should be used in the "no-referrer" case, I don't have a strong opinion. Obvious options are "", null, "null" (this last as if the actual origin were a unique origin). Using "null" feels somewhat nice to me in that it's a situation that could arise even without the referrer policy business, so pages should be ready for it anyway. Using null would worry me in terms of pages getting exceptions when trying to string-manipulate the array entries.

hillbrad · 2016-10-17T21:14:48Z

I should write some test cases, but isn't the null case already possible today with GUID URL schemes? (data:, file:, etc.) And implicitly handled, as with CORS, by serializing to the string literal "null" according to RFC6454?

domenic · 2016-10-17T21:16:39Z

"null" sounds pretty good. (And it's according to the Unicode serialization of an origin, not some RFC ;).) But yeah, the PR as written just asks for the origin of the URL no-referrer, so we gotta straighten that out.

hillbrad · 2016-10-17T21:17:06Z

Well, this could be defined as basically a switch on the referrer policy states (which might be the most logical internal implementation choice), but I thought that calling out to the algorithm to produce a referrer and then extracting the origin via URL parsing would be more future compatible with new policy states that might be defined. I can revisit if that seems preferable.

domenic · 2016-10-17T21:18:29Z

IMO a switch makes the most sense, but adding it to the Referrer Policy spec would be best, since that ensures that whenever they add new policies they'll see that they need to update that algorithm as well.
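A minimal sketch of the switch-based mapping discussed above, assuming (as suggested earlier in the thread) that only "no-referrer" censors the origin and that the censored value is the string "null". The helper name `redactOrigin` is hypothetical and not part of any spec; the policy tokens are those of the Referrer Policy specification.

```typescript
// Policy tokens as defined in the Referrer Policy specification.
type ReferrerPolicy =
  | "no-referrer"
  | "no-referrer-when-downgrade"
  | "same-origin"
  | "origin"
  | "strict-origin"
  | "origin-when-cross-origin"
  | "strict-origin-when-cross-origin"
  | "unsafe-url";

// Hypothetical helper: maps (serialized origin, referrer policy) to the
// string that would be exposed in ancestorOrigins.
// Assumption from the thread: every policy except "no-referrer" may already
// leak the origin via the referrer, so only "no-referrer" is censored.
function redactOrigin(origin: string, policy: ReferrerPolicy): string {
  switch (policy) {
    case "no-referrer":
      // Same serialization an opaque (unique) origin would produce.
      return "null";
    case "no-referrer-when-downgrade":
    case "same-origin":
    case "origin":
    case "strict-origin":
    case "origin-when-cross-origin":
    case "strict-origin-when-cross-origin":
    case "unsafe-url":
      return origin;
    default: {
      // Exhaustiveness check: adding a new policy token to the union above
      // makes this assignment a compile-time error until it is handled.
      const unhandled: never = policy;
      throw new Error("unhandled referrer policy: " + String(unhandled));
    }
  }
}
```

The `never` check is one way to get the property mentioned above: whoever adds a new policy is forced to revisit this mapping.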

bzbarsky · 2016-10-17T21:31:23Z

The referrer may or may not be related to the origin in general (e.g. for a sandboxed iframe the referrer is based on its URL but the origin is a unique origin). So going via some sort of "extract the referrer" algorithm to get a value to use in ancestorOrigins, as is done in this PR, isn't right.

hillbrad · 2016-10-18T00:54:40Z

Take a look at: w3c/webappsec-referrer-policy#77 ?

bzbarsky · 2016-10-19T01:00:48Z

One thing that I'd like to check on, actually. What should happen if a page at origin A loads a subframe from origin A which then loads a page from origin B, if the original page is sending full referrers but the subframe is using the no-referrer policy?

hillbrad · 2016-10-19T01:26:32Z

I haven't spec'd it as a barrier or ratchet, but as an individual query from a Location to each ancestor, independent of any intermediate contexts and their policy states.

bzbarsky · 2016-10-19T01:32:55Z

OK, but that will leak the origin of the topmost page in this case, when it should be able to have a reasonable expectation of no such leakage occurring, right?

hillbrad · 2016-10-19T03:35:33Z

Is that a reasonable expectation? Or should it set its own policy if it is concerned? Specifying a ratchet is much more difficult, btw, as the referrer policy options don't have a strict ordering.

bzbarsky · 2016-10-19T04:02:57Z

> Is that a reasonable expectation?

As long as it's only loading things it controls, I think it is, yes. This way the decision as to whether to allow the origin to escape only has to be made in the page that actually loads cross-site things.

> Specifying a ratchet is much more difficult, btw

I'm not sure what you mean by "ratchet" here, but two simple things to specify would be that once you hit no-referrer you either insert a single "null" and terminate or insert "null" for everything else up the frame chain. This isn't as nice as doing more complicated checks about same-originness, I agree.
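The three behaviours discussed so far can be compared with a small sketch. Everything here is illustrative: the `Ancestor` shape and the function names are hypothetical, the chain is ordered from the nearest parent up to the top-level page, and "no-referrer" is assumed to be the only policy that censors an origin.

```typescript
// Hypothetical model of one ancestor browsing context.
interface Ancestor {
  origin: string; // serialized origin, e.g. "https://a.example"
  policy: string; // referrer policy token in effect for that document
}

// Independent per-ancestor query: each ancestor is redacted on its own,
// regardless of what intermediate frames have chosen.
function perAncestor(chain: Ancestor[]): string[] {
  return chain.map((a) => (a.policy === "no-referrer" ? "null" : a.origin));
}

// Option 1: on the first "no-referrer" ancestor, insert one "null" and stop.
function nullAndStop(chain: Ancestor[]): string[] {
  const out: string[] = [];
  for (const a of chain) {
    if (a.policy === "no-referrer") {
      out.push("null");
      break;
    }
    out.push(a.origin);
  }
  return out;
}

// Option 2: from the first "no-referrer" ancestor upward, every entry is "null".
function nullForRest(chain: Ancestor[]): string[] {
  let hit = false;
  return chain.map((a) => {
    if (a.policy === "no-referrer") hit = true;
    return hit ? "null" : a.origin;
  });
}

// The scenario from the thread, seen from the innermost origin-B frame:
// its parent is an origin-A subframe using no-referrer, and the top-level
// origin-A page sends full referrers.
const scenario: Ancestor[] = [
  { origin: "https://a.example", policy: "no-referrer" },
  { origin: "https://a.example", policy: "unsafe-url" },
];
```

With this input, `perAncestor(scenario)` still exposes the top-level origin in its second entry, while `nullAndStop` yields a single "null" and `nullForRest` yields "null" for both ancestors. Only the first option changes the length of the list.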

bzbarsky · 2017-01-10T04:50:38Z

Note the more clearly articulated proposal I made for this in w3c/webappsec-referrer-policy#77 (comment). I thought @hillbrad was going to convert that to an HTML spec issue, but that didn't seem to happen...

Anyway, I would love feedback from Blink and WebKit on whether the change I propose is something they would implement, and feedback from Edge on whether they're interested in implementing this at all, and if so under what conditions.

annevk · 2017-02-20T12:38:39Z

Copying @RByers, @cdumez, @travisleithead to get input from Blink, WebKit, and Edge. Would be nice to make some progress here.

foolip · 2017-03-07T04:46:03Z

For Blink, perhaps @dominiccooney or @mikewest could comment?

mikewest · 2017-03-07T09:48:22Z

@jeisinger and @estark37 are Blink's referrer policy folks, and will likely have opinions.

jeisinger · 2017-03-07T12:57:33Z

What I like about @bzbarsky's proposal is that it only indirectly uses referrer policy - referrer policy ideally should only affect the referrer. Of course using the referrer afterwards for whatever is fine.

I think we'd implement this if that means that Firefox will ship ancestorOrigins, and the API is still good enough to achieve the kind of protection @hillbrad et al. need.

jeffreytgilbert · 2018-08-29T22:34:50Z

For what it's worth, the idea to respect a referrer policy set by the domains in the ancestry chain is great, but neither ancestorOrigins nor the requested change goes far enough in either direction. A full URL should be available in ancestorOrigins: a domain on its own is no more or less secure, because information about a person can be grokked from the domain plus some number of other data points, so truncating it doesn't make much sense for user privacy if we're being strict here. Conversely, a domain (cnn.com) may be considered OK, but a page on that domain (cnn.com/vegas-shooting-kills-dozens-etc) may be considered not OK in a specific context.

On the other hand, the user has not indicated, and cannot indicate, via a referrer policy set by the middlemen that they don't want to leak information about the ancestor chain, which raises the question: should there be user-level controls for turning this information flow on or off?

In my opinion, this requires a multi-part solution where the user has the ability to turn off a behavior, as do sites (content providers) who manage relationships between one another, but the location.href chain should be opened up fully where no restrictions are explicitly called for. The primary case FOR doing this from a supply chain perspective is being assured the message and markup you're delivering is not being framed in an inappropriate context. Advertisers, for instance, may have strict policies against placing their brand next to content related to pornography or extreme violence. This information, when locked away through cross-origin chains of iframes, becomes unknowable.

On the other hand, if a user jumps into "in private" mode and disables this information from leaking to chains of iframes, a disabled chain of unknowable origins should be enough information for an advertiser to use as an indicator that maybe the risk isn't worth the buy opportunity, and the end user's experience and privacy are preserved.

dliebner · 2019-05-14T01:05:09Z

The current WebKit implementation is helpful to ad tech as it helps determine the validity of the embed. It's possible for an advertisement to be chained from the original site through multiple intermediary iframes before finally rendering the bottom-level ad content. This is normal if an ad request goes through multiple ad networks before finally arriving at a served ad. What ad tech wants to detect is when an ad is being served on an unwanted domain, or if something else is generally amiss in the chain of ancestors. Failure to make this information available makes it easier for bad actors to commit ad fraud.

bzbarsky · 2019-05-14T04:01:39Z

Sure, and ad tech could just treat "no available ancestorOrigins" as "bad actor" for its purposes. Then sites can decide whether they want to leak their origin to their subframes (and allow ad tech in there) or not, right?
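As a rough illustration of that suggestion, an embedded script could refuse to treat a chain as valid whenever any ancestor is unavailable. This is only a sketch: `isAcceptableChain` and the allowlist are hypothetical names, and it assumes a censored ancestor shows up as the string "null".

```typescript
// Hypothetical check an embed might run on the list of ancestor origins.
// A chain is accepted only if it is non-empty and every entry is a known,
// allowlisted origin; a missing or censored ("null") entry rejects it.
function isAcceptableChain(
  ancestorOrigins: readonly string[],
  allowed: ReadonlySet<string>,
): boolean {
  if (ancestorOrigins.length === 0) {
    return false; // nothing available to verify
  }
  return ancestorOrigins.every((o) => o !== "null" && allowed.has(o));
}

// Illustrative allowlist; the origins are placeholders.
const allowedOrigins: ReadonlySet<string> = new Set([
  "https://publisher.example",
  "https://network.example",
]);
```

In a browser that exposes the attribute, the input could be obtained with `Array.from(location.ancestorOrigins)`; where the attribute is not exposed at all, the caller would have to treat the chain as unavailable.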

dliebner · 2019-05-14T10:18:51Z

I'm a little confused by the attitude that a parent frame should remain anonymous to its subframes. If a site is being embedded by another site, don't they deserve to know by who? In what legitimate scenario does a site embed an iframe (or a chain of iframes) and need to be anonymous?

annevk · 2019-05-14T11:06:29Z

As a reminder, there's an HTML PR for this at #2480 and a WPT PR at web-platform-tests/wpt#5402.

@othermaciej @johnwilander I suspect Safari picking this up would make it more likely for Firefox to ship this too (it currently does not expose this attribute at all).

bzbarsky · 2019-05-14T11:52:58Z

> If a site is being embedded by another site, don't they deserve to know by who?

Imo, no. If it doesn't want to be framed, it has ways to avoid being framed, yes?

My usual go-to example here is that imo a site should be able to embed a video from a video hosting site without exposing information about itself to a video hosting site. Under the assumption that the video hosting site allows such framing, of course.

opyh · 2019-05-21T15:01:41Z

> What ad tech wants to detect is when an ad is being served on an unwanted domain, or if something else is generally amiss in the chain of ancestors. Failure to make this information available makes it easier for bad actors to commit ad fraud.

A person who visits a political or health blog doesn't want these URLs to be shared with giphy, facebook, and every adtech company on the planet.

While it's understandable that adtech companies want to know my political views and if I have cancer or not (and as a side effect, can prevent ad fraud more easily), as a user I'd like to have a choice if my browser sends this very personal information. Embed providers are not entitled to it. They should be able to choose who can embed them (possible with frame-ancestors), and users should be able to choose whom they want to share information with.

dliebner · 2019-05-21T16:09:20Z

My counter point is that blocking-by-default will effectively block the majority of ancestor data to ad tech because you can't expect developers to go out of their way to add/enable allow-policies. From the ad tech point of view, if you can't reliably see the ancestors, you can't reliably detect fraud.

With regard to your privacy concerns, 1) Not all ad tech companies are interested in invading your privacy (although sure probably most are) and 2) If that's something you're worried about, ad block is fairly effective and 3) If the sites you're visiting are of a sensitive nature and are embedding advertisements and you're concerned about your privacy, perhaps you should be evaluating those sites and their choice of ad partners.

I am someone who is building an ad tech company who is not interested in tracking individual users, and I need tools to detect, prevent and deter ad fraud.

opyh · 2019-05-22T12:47:55Z

I have worked in adtech myself, on several sides of the ecosystem – adtech developers are used to much more painful things than adding allow policies to websites ;) So you can expect developers to do this.

You can’t demand from a normal person using a browser to know what's going on behind the scenes. If I, as a software developer, have no means to see which health site tracks me and which doesn’t, how is a non-IT person supposed to understand this?

It's the standard’s job to help create browsers that protect me from bad actors, whether or not I have an ad blocker.

If ad fraud can't be detected without complete surveillance, so be it? The ad industry is free to adopt business models that don’t simplify privacy fraud. If a user explicitly wants to be tracked in exchange for freebies, they'd still be free to configure their browser accordingly.

Thanks for your counter arguments – I'm out of this discussion, and I hope that this issue can be solved in a way that doesn't hand my browser history over to random companies as a default.

michael-oneill · 2019-05-22T15:54:52Z

Browsers can determine if the user is a bot or not, at least as well as any external service. If this is communicated in a privacy-preserving way, then fraud could be detected more effectively without having to rely on surveillance.
https://github.com/w3c/web-advertising/blob/master/admetrics.md

dliebner · 2019-05-22T16:00:53Z

> Browsers can determine if the user is a bot or not, at least as well as any external service. If this is communicated in a privacy-preserving way, then fraud could be detected more effectively without having to rely on surveillance.
> https://github.com/w3c/web-advertising/blob/master/admetrics.md

That is useful, but the issue I'm talking about is running ads that are supposed to only be served on one site and running them on another site. The people seeing the ads will be legitimate users, but how will the ad tech know if the ads are being served on the intended site without the ancestor list?

michael-oneill · 2019-05-22T16:19:33Z

In this proposal the browser determines whether they are being shown on the intended site; the ad tech only gets metrics from the Metrics Server, e.g. Nielsen or similar. Anything invalid gets ignored.

SamB · 2019-07-28T00:40:26Z

> Browsers can determine if the user is a bot or not

... but wouldn't bots just use lying browsers?

dliebner · 2019-07-28T05:06:03Z

It's not so much about detecting bots as it is about preventing malicious publishers from sending spoofed data via real users.

jeffreytgilbert · 2019-11-18T18:34:23Z

Problem statement:
A full URL and the chain of domains can be read from within an iframe in a cross domain context via javascript. When used in conjunction with long-lived stable identifiers, behavioral information can be inferred and associated with the user identifier and deep behavioral profiles can be stored and resold over private data marketplaces unbeknownst to the user.

This is the default case. The top level site does not have the ability to control this behavior.

Proposed solution:
Allow control of ancestorOrigins and referrer data by applying the Referrer Policy header/attribute to the ancestorOrigins API. The default behavior, if no referrer policy is specified, is the same as the historical ancestorOrigins behavior, which exposes an ordered list of domains.

Regarding privacy, here’s my best semi-complete list for @dliebner and @opyh.

A user should be able to opt into advertising and tracking for ad supported publisher content. For example:

Opt-Out (Truste, NAI, etc)
Opt-In (GDPR, CCPA, etc)
Apple's ITP
Do Not Track - DNT flag
Limit Ad Tracking - LMT flag
Content Blockers / Ad Blockers
In Private modes
Cross Site Tracking prevention
Block All Cookies settings

A site (aka publisher) should expect to be able to restrict page content, including cross domain content such as ads, to appropriate usage. For example:

Content Security Policy headers (CSP)
IFrame sandbox attributes (akin to CSP)
Cross domain iframe sources
Referrer policy (via header or iframe attribute)
Feature policy (via header or iframe attribute)
SameSite cookie classification/restriction
Secure cookies
Ads.txt
Ads.cert

It's probable that I missed some things here.

Good news, bad news is… OpenRTB 3.0 has a possible solution using blockchain-like signed ledgers to show the chain of changes to a bid request. The problem is that the adoption rate for OpenRTB is not fast. It's a big change, and it makes some big assumptions about publishers', exchanges', and networks' willingness to adopt the new complexity and cost associated with implementing it. The biggest benefits are for adopters of header bidding. The biggest losers in this are probably ad networks, which is likely why there is a real reluctance to adopt this version. They married a good-tasting thing with a bad-tasting thing.

You can read more on the certificate chain here: What is ads.cert?

@opyh has some valid points related to not leaking the full browsing history of the user to advertisers. @dliebner also has valid points related to a trustworthy supply chain free of fraudulent publisher and exchange practices. My earlier comment is probably closer to an additional feature request for user level controls since this ticket addresses publisher level controls.

hillbrad mentioned this issue Oct 17, 2016

redact location.ancestorOrigins according to parents referrer policies #1917

Closed

domenic added the normative change, needs implementer interest, and security/privacy labels Oct 17, 2016

hillbrad mentioned this issue Oct 18, 2016

Redact origin according to policy w3c/webappsec-referrer-policy#77

Closed

annevk added a commit that referenced this issue Jan 10, 2017

Meta: add ancestorOrigins warning due to continued disagreement

769e782

See #1918 for context.

annevk mentioned this issue Jan 10, 2017

Meta: add ancestorOrigins warning due to continued disagreement #2251

Merged

annevk mentioned this issue Jan 24, 2017

redact location.ancestorOrigins according to Referrer Policy w3c/webappsec-referrer-policy#71

Closed

wanderview mentioned this issue Feb 15, 2017

Remove frameType, maybe add ancestorOrigins w3c/ServiceWorker#732

Closed

jungkees mentioned this issue Feb 16, 2017

Track ancestorOrigins privacy issues w3c/ServiceWorker#1075

Open

annevk added a commit that referenced this issue Mar 29, 2017

Meta: add ancestorOrigins warning due to continued disagreement

3c9c428

See #1918 for context.

domenic pushed a commit that referenced this issue Mar 29, 2017

Meta: add ancestorOrigins warning due to continued disagreement

01d3caf

See #1918 for context.

annevk added a commit to web-platform-tests/wpt that referenced this issue Apr 21, 2017

Basic ancestorOrigins test and Location IDL update

5653bfc

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

annevk added a commit to web-platform-tests/wpt that referenced this issue May 16, 2017

ancestorOrigins: add tests and update Location IDL

f0d3e4c

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

annevk added a commit that referenced this issue Feb 4, 2018

Redact ancestorOrigins using document's referrer

ee7a6fb

Also rewrite the algorithm to avoid loops and use variables correctly. Tests: web-platform-tests/wpt#5402. Fixes #1918.

annevk added a commit that referenced this issue Feb 5, 2018

Redact ancestorOrigins using document's referrer

a08939f

Also rewrite the algorithm to avoid loops and use variables correctly. Tests: web-platform-tests/wpt#5402. Fixes #1918.

annevk added a commit to web-platform-tests/wpt that referenced this issue Feb 5, 2018

ancestorOrigins: add tests and update Location IDL

fb94c7f

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

arturjanc mentioned this issue Apr 17, 2018

Cross-Origin-Resource-Policy (was: From-Origin) whatwg/fetch#687

Closed

TimothyGu added the topic: location label Aug 23, 2018

alice pushed a commit to alice/html that referenced this issue Jan 8, 2019

Meta: add ancestorOrigins warning due to continued disagreement

53c9326

See whatwg#1918 for context.

fmarier mentioned this issue Oct 16, 2023

Prune location.ancestorOrigins entries brave/brave-browser#33671

Open

fmarier mentioned this issue Nov 7, 2023

[hackerone] Remove onion services from window.location.ancestorOrigins brave/brave-browser#32421

Closed

This was referenced Jul 15, 2024

LibWeb: Implement Location.ancestorOrigins LadybirdBrowser/ladybird#623

Open

Upcoming WHATNOT meeting on 2024-07-18 #10471

Closed

keithamus added the agenda+ label Jul 15, 2024

past removed the agenda+ label Jul 18, 2024

cwilso mentioned this issue Jul 31, 2024

Upcoming WHATNOT meeting on 2024-07-25 #10496

Closed
