redact location.ancestorOrigins according to Referrer Policy #1918

Open
hillbrad opened this issue Oct 17, 2016 · 46 comments · May be fixed by #2480
Labels: needs implementer interest, normative change, security/privacy, topic: location

Comments

@hillbrad

@bzbarsky @dakami and I had a hallway discussion at the end of TPAC about the possibility of adding location.ancestorOrigins to Firefox. bz has had longstanding concerns about the information this leaks to child frames. We arrived at a local consensus that any leakage is roughly equivalent to what happens already with referrer, so it would make sense to redact ancestorOrigins according to referrer policy. (and this could resolve that objection to a Mozilla implementation of ancestorOrigins)

/cc @smaug---- @annevk

@domenic
Member

domenic commented Oct 17, 2016

One big question, which I asked in the PR, is what does "redact" mean. Since it's an origin instead of a URL, several of the referrer policies don't really apply (e.g. maybe they're no-ops). If it gets censored completely (e.g. if the referrer policy is "no-referrer"), then does the resulting array contain null? The empty string? Or is that entry just missing, so that the number of entries in the array is less than the number of ancestor browsing contexts? We'll need a comprehensive spec for (origin, referrer policy) -> censored origin.

Otherwise, I think we'd need to get a sense of what other user agents besides Firefox would be interested in this spec change. I guess only Chrome implements both referrer policy and ancestorOrigins, so... @mikewest, perhaps?

As for WebKit and Edge, which don't implement referrer policy but do implement ancestorOrigins: does this sound reasonable to you, as something you would do if/when you eventually implemented referrer policy? Leaving aside any commitments to implementing referrer policy. Tagging the usual suspects... @cdumez @travisleithead. Please route to more appropriate people as necessary.

@domenic added the normative change, needs implementer interest, and security/privacy labels on Oct 17, 2016
@bzbarsky
Contributor

The idea is that if the referrer policy allows the origin to leak out via the referrer (which I believe all policies except "no-referrer" do) then we should just go ahead and return the origin in ancestorOrigins. So this is really about the "no-referrer" case, plus any browser configuration that has equivalent effects.

As for what value should be used in the "no-referrer" case, I don't have a strong opinion. Obvious options are "", null, "null" (this last as if the actual origin were a unique origin). Using "null" feels somewhat nice to me in that it's a situation that could arise even without the referrer policy business, so pages should be ready for it anyway. Using null would worry me in terms of pages getting exceptions when trying to string-manipulate the array entries.
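The "(origin, referrer policy) -> censored origin" mapping discussed above could be sketched roughly as follows. This is only an illustration under the assumption stated in this comment (every policy except "no-referrer" lets the origin through) and using the "null" string option; the names `PolicyToken` and `censorOrigin` are hypothetical and do not come from any spec.

```typescript
// Hypothetical sketch: policy tokens as listed in the Referrer Policy spec.
type PolicyToken =
  | "no-referrer"
  | "no-referrer-when-downgrade"
  | "same-origin"
  | "origin"
  | "strict-origin"
  | "origin-when-cross-origin"
  | "strict-origin-when-cross-origin"
  | "unsafe-url";

// Maps a serialized ancestor origin plus that ancestor's policy to the
// string a child frame would see in ancestorOrigins.
// Assumption (from the comment above): only "no-referrer" hides the origin.
function censorOrigin(serializedOrigin: string, policy: PolicyToken): string {
  switch (policy) {
    case "no-referrer":
      // The string "null", not the value null, so string handling keeps working.
      return "null";
    default:
      return serializedOrigin;
  }
}

// A page that string-manipulates entries still works on the censored value.
const censoredEntry = censorOrigin("https://a.example", "no-referrer");
const censoredEntryUpper = censoredEntry.toUpperCase();
```

Returning the string "null" rather than the value null means existing code that calls string methods on array entries does not throw, which is the concern raised above.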

@hillbrad
Author

I should write some test cases, but isn't the null case already possible today with GUID URL schemes? (data:, file:, etc.) And implicitly handled, as with CORS, by serializing to the string literal "null" according to RFC6454?

@domenic
Member

domenic commented Oct 17, 2016

"null" sounds pretty good. (And it's according to the Unicode serialization of an origin, not some RFC ;).) But yeah, the PR as written just asks for the origin of the URL no-referrer, so we gotta straighten that out.
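As a rough illustration of why "null" is already a value pages can encounter, here is a simplified model of origin serialization: an opaque (unique) origin serializes to the string "null", while a tuple origin serializes to scheme, host, and optional port. The type `OriginModel` and the function `serializeOrigin` are hypothetical names for this sketch, and details such as default-port handling are assumed to have been resolved before the model is built.

```typescript
// Simplified, hypothetical model of an origin.
type OriginModel =
  | { kind: "opaque" }
  | { kind: "tuple"; scheme: string; host: string; port: number | null };

// Serializes an origin: opaque origins become "null", tuple origins
// become scheme://host with an optional :port suffix.
function serializeOrigin(o: OriginModel): string {
  if (o.kind === "opaque") {
    return "null";
  }
  const portPart = o.port === null ? "" : `:${o.port}`;
  return `${o.scheme}://${o.host}${portPart}`;
}
```

Under this model a data: document, which gets an opaque origin, would contribute "null" to a descendant's ancestorOrigins list.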

@hillbrad
Author

Well, this could be defined as basically a switch on the referrer policy states (which might be the most logical internal implementation choice), but I thought that calling out to the algorithm to produce a referrer and then extracting the origin via URL parsing would be more future compatible with new policy states that might be defined. I can revisit if that seems preferable.

@domenic
Member

domenic commented Oct 17, 2016

IMO a switch makes the most sense, but adding it to the Referrer Policy spec would be best, since that ensures that whenever they add new policies they'll see that they need to update that algorithm as well.

@bzbarsky
Contributor

The referrer may or may not be related to the origin in general (e.g. for a sandboxed iframe the referrer is based on its URL, but the origin is a unique origin). So going via some sort of "extract the referrer" algorithm to get a value to use in ancestorOrigins, as is done in this PR, isn't right.
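The mismatch described here can be sketched with a toy model. Everything below is hypothetical: `FrameModel`, `originFromUrl`, `actualOrigin`, and `originViaReferrer` are illustrative names, the URL parsing is deliberately naive (no default-port or userinfo handling), and "sandboxed" is assumed to mean sandboxed without allow-same-origin.

```typescript
// Hypothetical, simplified model of a framed document.
interface FrameModel {
  url: string;        // document URL, e.g. "https://a.example/page"
  sandboxed: boolean; // assumed: sandboxed without allow-same-origin
}

// Naive scheme://host extraction; returns "null" when there is no "://".
function originFromUrl(url: string): string {
  const schemeEnd = url.indexOf("://");
  if (schemeEnd < 0) {
    return "null";
  }
  const rest = url.slice(schemeEnd + 3);
  const hostEnd = rest.search(/[\/?#]/);
  const host = hostEnd < 0 ? rest : rest.slice(0, hostEnd);
  return `${url.slice(0, schemeEnd)}://${host}`.toLowerCase();
}

// The origin the document actually has: opaque when sandboxed.
function actualOrigin(f: FrameModel): string {
  return f.sandboxed ? "null" : originFromUrl(f.url);
}

// What you would get by deriving an origin from the referrer,
// which is based on the URL regardless of sandboxing.
function originViaReferrer(f: FrameModel): string {
  return originFromUrl(f.url);
}
```

For a sandboxed frame at https://a.example/page, `originViaReferrer` yields "https://a.example" while `actualOrigin` yields "null", which is the discrepancy the comment points out.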

@hillbrad
Author

Take a look at: w3c/webappsec-referrer-policy#77 ?

@bzbarsky
Contributor

One thing that I'd like to check on, actually. What should happen if a page at origin A loads a subframe from origin A which then loads a page from origin B, if the original page is sending full referrers but the subframe is using the no-referrer policy?

@hillbrad
Author

I haven't spec'd it as a barrier or ratchet, but as an individual query from a Location to each ancestor, independent of any intermediate contexts and their policy states.

@bzbarsky
Contributor

OK, but that will leak the origin of the topmost page in this case, when it should be able to have a reasonable expectation of no such leakage occurring, right?

@hillbrad
Author

hillbrad commented Oct 19, 2016

Is that a reasonable expectation? Or should it set its own policy if it is concerned? Specifying a ratchet is much more difficult, btw, as the referrer policy options don't have a strict ordering.

@bzbarsky
Contributor

> Is that a reasonable expectation?

As long as it's only loading things it controls, I think it is, yes. This way the decision as to whether to allow the origin to escape only has to be made in the page that actually loads cross-site things.

> Specifying a ratchet is much more difficult, btw

I'm not sure what you mean by "ratchet" here, but two simple things to specify would be that once you hit no-referrer you either insert a single "null" and terminate or insert "null" for everything else up the frame chain. This isn't as nice as doing more complicated checks about same-originness, I agree.
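To make the options in this exchange concrete, here is a small sketch comparing the per-ancestor independent query with the two simpler rules mentioned above (insert a single "null" and terminate, or insert "null" for everything further up the chain). The names `AncestorModel`, `independentQuery`, `nullAndTerminate`, and `nullForRest` are hypothetical, and each ancestor's policy is reduced to a single boolean for illustration.

```typescript
// Hypothetical model: one entry per ancestor, nearest parent first,
// top-level document last (the order used by ancestorOrigins).
interface AncestorModel {
  serializedOrigin: string;
  noReferrer: boolean; // true if this ancestor's policy is "no-referrer"
}

// Each ancestor is censored only according to its own policy.
function independentQuery(chain: AncestorModel[]): string[] {
  return chain.map(a => (a.noReferrer ? "null" : a.serializedOrigin));
}

// On the first no-referrer ancestor, emit one "null" and stop.
function nullAndTerminate(chain: AncestorModel[]): string[] {
  const out: string[] = [];
  for (const a of chain) {
    if (a.noReferrer) {
      out.push("null");
      break;
    }
    out.push(a.serializedOrigin);
  }
  return out;
}

// From the first no-referrer ancestor upward, every entry is "null".
function nullForRest(chain: AncestorModel[]): string[] {
  let hidden = false;
  return chain.map(a => {
    hidden = hidden || a.noReferrer;
    return hidden ? "null" : a.serializedOrigin;
  });
}

// The scenario from the thread, as seen from the origin B frame:
// parent is the origin A subframe (no-referrer), then the origin A top page.
const chainSeenByB: AncestorModel[] = [
  { serializedOrigin: "https://a.example", noReferrer: true },
  { serializedOrigin: "https://a.example", noReferrer: false },
];
```

With this chain, the independent query still exposes the top page's origin in the second entry, while both simpler rules hide it; they differ only in whether the list length reveals the depth of the frame chain.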

@bzbarsky
Contributor

bzbarsky commented Jan 10, 2017

Note the more clearly articulated proposal I made for this in w3c/webappsec-referrer-policy#77 (comment). I thought @hillbrad was going to convert that to an HTML spec issue, but that didn't seem to happen...

Anyway, I would love feedback from Blink and WebKit on whether the change I propose is something they would implement, and feedback from Edge on whether they're interested in implementing this at all, and if so under what conditions.

@annevk
Member

annevk commented Feb 20, 2017

Copying @RByers, @cdumez, @travisleithead to get input from Blink, WebKit, and Edge. Would be nice to make some progress here.

@foolip
Member

foolip commented Mar 7, 2017

For Blink, perhaps @dominiccooney or @mikewest could comment?

@mikewest
Member

mikewest commented Mar 7, 2017

@jeisinger and @estark37 are Blink's referrer policy folks, and will likely have opinions.

@jeisinger
Member

What I like about @bzbarsky's proposal is that it only indirectly uses referrer policy - referrer policy ideally should only affect the referrer. Of course using the referrer afterwards for whatever is fine.

I think we'd implement this if that means that Firefox will ship ancestorOrigins, and the API is still good enough to achieve the kind of protection @hillbrad et al. need.

annevk added a commit that referenced this issue Mar 29, 2017
domenic pushed a commit that referenced this issue Mar 29, 2017
annevk added a commit to web-platform-tests/wpt that referenced this issue Apr 21, 2017
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.
annevk added a commit to web-platform-tests/wpt that referenced this issue May 16, 2017
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.
annevk added a commit that referenced this issue Feb 4, 2018
Also rewrite the algorithm to avoid loops and use variables correctly.

Tests: web-platform-tests/wpt#5402.

Fixes #1918.
annevk added a commit that referenced this issue Feb 5, 2018
Also rewrite the algorithm to avoid loops and use variables correctly.

Tests: web-platform-tests/wpt#5402.

Fixes #1918.
annevk added a commit to web-platform-tests/wpt that referenced this issue Feb 5, 2018
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.
@jeffreytgilbert

jeffreytgilbert commented Aug 29, 2018

For what it's worth, the idea to respect a referrer policy set by the domains in the ancestry chain is great, but neither ancestorOrigins nor the requested change goes far enough in either direction. A full URL should be available in ancestorOrigins: a domain on its own is no more or less secure than a full URL, because information about a person can be grokked from the domain plus some number of other data points, so truncating to the origin doesn't make much sense for user privacy if we're being strict here. Conversely, a domain (cnn.com) may be considered OK, but a page on that domain (cnn.com/vegas-shooting-kills-dozens-etc) may be considered not OK in a specific context.

On the other hand, the user has not indicated, and cannot indicate, via a referrer policy set by the middlemen that they don't want to leak information about the ancestor chain. That raises the question: should there be user-level controls for turning this information flow on or off?

In my opinion, this requires a multi-part solution where the user has the ability to turn off a behavior, as do sites (content providers) who manage relationships between one another, but the location.href chain should be opened up fully where no restrictions are explicitly called for. The primary case FOR doing this from a supply chain perspective is being assured that the message and markup you're delivering are not being framed in an inappropriate context. Advertisers, for instance, may have strict policies against placing their brand next to content related to pornography or extreme violence. This information, when locked away behind cross-origin chains of iframes, becomes unknowable.

On the other hand, if a user jumps into "in private" mode and disables this information from leaking to chains of iframes, a disabled chain of unknowable origins should be enough information for an advertiser to use as an indicator that maybe the risk isn't worth the buy opportunity, and the end user's experience and privacy are preserved.

alice pushed a commit to alice/html that referenced this issue Jan 8, 2019
@dliebner

The current WebKit implementation is helpful to ad tech because it helps determine the validity of the embed. An advertisement can be chained from the original site through multiple intermediary iframes before the bottom-level ad content finally renders. This is normal when an ad request goes through multiple ad networks before arriving at a served ad. What ad tech wants to detect is when an ad is being served on an unwanted domain, or if something else is generally amiss in the chain of ancestors. Failure to make this information available makes it easier for bad actors to commit ad fraud.

@bzbarsky
Contributor

Sure, and ad tech could just treat "no available ancestorOrigins" as "bad actor" for its purposes. Then sites can decide whether they want to leak their origin to their subframes (and allow ad tech in there) or not, right?
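A minimal sketch of that suggestion, from the ad's point of view: treat any redacted ("null") entry, or an empty list, as untrusted, and otherwise require every ancestor to be on an allowlist. The function name `placementLooksTrusted` and the allowlist parameter are hypothetical, and treating an empty list as untrusted is an assumption made for this example (an empty list would mean the ad document is not framed at all).

```typescript
// Hypothetical check an ad script might run on a copy of
// location.ancestorOrigins (passed in as a plain string array).
function placementLooksTrusted(
  ancestorOrigins: string[],
  allowedOrigins: string[],
): boolean {
  // Assumption: no ancestors at all is treated as "not a valid placement".
  if (ancestorOrigins.length === 0) {
    return false;
  }
  // Any redacted entry or unknown origin makes the whole chain untrusted.
  return ancestorOrigins.every(
    o => o !== "null" && allowedOrigins.indexOf(o) !== -1,
  );
}
```

A site that redacts its origin would then simply fail this check, which leaves the choice between privacy and ad eligibility with the embedding site.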

@dliebner

I'm a little confused by the attitude that a parent frame should remain anonymous to its subframes. If a site is being embedded by another site, don't they deserve to know by who? In what legitimate scenario does a site embed an iframe (or a chain of iframes) and need to be anonymous?

@annevk
Member

annevk commented May 14, 2019

As a reminder, there's a HTML PR for this at #2480 and a WPT PR at web-platform-tests/wpt#5402.

@othermaciej @johnwilander I suspect Safari picking this up would make it more likely for Firefox to ship this too (it currently does not expose this attribute at all).

@bzbarsky
Contributor

> If a site is being embedded by another site, don't they deserve to know by who?

Imo, no. If it doesn't want to be framed, it has ways to avoid being framed, yes?

My usual go-to example here is that imo a site should be able to embed a video from a video hosting site without exposing information about itself to a video hosting site. Under the assumption that the video hosting site allows such framing, of course.

@opyh

opyh commented May 21, 2019

> What ad tech wants to detect is when an ad is being served on an unwanted domain, or if something else is generally amiss in the chain of ancestors. Failure to make this information available makes it easier for bad actors to commit ad fraud.

A person who visits a political or health blog doesn't want these URLs to be shared with giphy, facebook, and every adtech company on the planet.

While it's understandable that adtech companies want to know my political views and if I have cancer or not (and as a side effect, can prevent ad fraud more easily), as a user I'd like to have a choice if my browser sends this very personal information. Embed providers are not entitled to it. They should be able to choose who can embed them (possible with frame-ancestors), and users should be able to choose whom they want to share information with.
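For reference, the frame-ancestors mechanism mentioned here is a Content Security Policy directive that an embed provider sends with its own responses. The helper below is only a sketch of how such a header line might be assembled; `frameAncestorsHeader` is a hypothetical name and the origins used with it are placeholders.

```typescript
// Hypothetical helper: builds a CSP header line restricting who may
// frame the responding document. An empty allowlist forbids all framing.
function frameAncestorsHeader(allowedEmbedders: string[]): string {
  const sources =
    allowedEmbedders.length === 0 ? "'none'" : allowedEmbedders.join(" ");
  return `Content-Security-Policy: frame-ancestors ${sources}`;
}
```

This lets the embedded party restrict framing on its side without the browser having to reveal the ancestors' origins to script in the frame.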

@dliebner

My counterpoint is that blocking by default will effectively block the majority of ancestor data to ad tech, because you can't expect developers to go out of their way to add or enable allow-policies. From the ad tech point of view, if you can't reliably see the ancestors, you can't reliably detect fraud.

With regard to your privacy concerns, 1) Not all ad tech companies are interested in invading your privacy (although sure probably most are) and 2) If that's something you're worried about, ad block is fairly effective and 3) If the sites you're visiting are of a sensitive nature and are embedding advertisements and you're concerned about your privacy, perhaps you should be evaluating those sites and their choice of ad partners.

I am someone who is building an ad tech company who is not interested in tracking individual users, and I need tools to detect, prevent and deter ad fraud.

@opyh

opyh commented May 22, 2019

I have worked in adtech myself, on several sides of the ecosystem – adtech developers are used to much more painful things than adding allow policies to websites ;) So you can expect developers to do this.

You can’t demand from a normal person using a browser to know what's going on behind the scenes. If I, as a software developer, have no means to see which health site tracks me and which doesn’t, how is a non-IT person supposed to understand this?

It's the standard’s job to help create browsers that protect me from bad actors, no matter whether I have an ad blocker or not.

If ad fraud can't be detected without complete surveillance, so be it? The ad industry is free to adapt business models that don’t simplify privacy fraud. If a user explicitly wants to be tracked in exchange for freebies, they'd still be free to configure their browser accordingly.

Thanks for your counter arguments – I'm out of this discussion, and I hope that this issue can be solved in a way that doesn't hand my browser history over to random companies as a default.

@michael-oneill

Browsers can determine if the user is a bot or not, at least as well as any external service. If this is communicated in a privacy-preserving way then fraud could be detected more effectively without having to rely on surveillance.
https://github.com/w3c/web-advertising/blob/master/admetrics.md

@dliebner

> Browsers can determine if the user is a bot or not, at least as well as any external service. If this is communicated in a privacy-preserving way then fraud could be detected more effectively without having to rely on surveillance.
> https://github.com/w3c/web-advertising/blob/master/admetrics.md

That is useful, but the issue I'm talking about is running ads that are supposed to only be served on one site and running them on another site. The people seeing the ads will be legitimate users, but how will the ad tech know if the ads are being served on the intended site without the ancestor list?

@michael-oneill

In this proposal the browser will determine if they are being shown on the intended site; the ad tech only gets metrics from the Metrics Server, e.g. Nielsen or similar. Anything invalid gets ignored.

@SamB

SamB commented Jul 28, 2019

> Browsers can determine if the user is a bot or not

... but wouldn't bots just use lying browsers?

@dliebner

It's not so much about detecting bots as it is about preventing malicious publishers from sending spoofed data via real users.

@jeffreytgilbert

Problem statement:
A full URL and the chain of domains can be read from within an iframe in a cross domain context via javascript. When used in conjunction with long-lived stable identifiers, behavioral information can be inferred and associated with the user identifier and deep behavioral profiles can be stored and resold over private data marketplaces unbeknownst to the user.

This is the default case. The top level site does not have the ability to control this behavior.

Proposed solution:
Allow control of ancestorOrigins and referrer data by applying the Referrer Policy header/attribute to the ancestorOrigins API. The default behavior, if no referrer policy is specified, is the same as the historical ancestorOrigins behavior, which exposes an ordered list of origins.

Regarding privacy, here’s my best semi-complete list for @dliebner and @opyh.

A user should be able to opt into advertising and tracking for ad supported publisher content. For example:

  • Opt-Out (Truste, NAI, etc)
  • Opt-In (GDPR, CCPA, etc)
  • Apple's ITP
  • Do Not Track - DNT flag
  • Limit Ad Tracking - LMT flag
  • Content Blockers / Ad Blockers
  • In Private modes
  • Cross Site Tracking prevention
  • Block All Cookies settings

A site (aka publisher) should expect to be able to restrict page content, including cross domain content such as ads, to appropriate usage. For example:

  • Content Security Policy headers (CSP)
  • IFrame sandbox attributes (akin to CSP)
  • Cross domain iframe sources
  • Referrer policy (via header or iframe attribute)
  • Feature policy (via header or iframe attribute)
  • SameSite cookie classification/restriction
  • Secure cookies
  • Ads.txt
  • Ads.cert

It's probable that I missed some things here.

Good news, bad news is… OpenRTB 3.0 has a possible solution using blockchain-like signed ledgers to show the chain of changes to a bid request. The problem is that the adoption rate for OpenRTB is not fast. It's a big change, and it makes some big assumptions about publishers', exchanges', and networks' willingness to adopt the new complexity and cost associated with implementing it. The biggest benefits are for adopters of header bidding. The biggest losers in this are probably ad networks, which is likely why there is a real reluctance to adopt this version. They married a good-tasting thing with a bad-tasting thing.

You can read more on the certificate chain here: What is ads.cert?

@opyh has some valid points related to not leaking the full browsing history of the user to advertisers. @dliebner also has valid points related to a trustworthy supply chain free of fraudulent publisher and exchange practices. My earlier comment is probably closer to an additional feature request for user level controls since this ticket addresses publisher level controls.
