Anti-fraud, ads.txt, and domain blindness #24

Closed
samtingleff opened this issue Apr 14, 2020 · 4 comments

Comments

@samtingleff

samtingleff commented Apr 14, 2020

This is touched upon in #12, #19, and #20; however, anti-fraud in itself seems important enough to call out very specifically.

The ads.txt standard was designed to solve for a specific form of fraud called "domain spoofing". In this fraud, an individual (somehow) sends bid requests through programmatic (RTB) channels which claim to originate from example.com (a presumably high value domain), yet (a) the traffic is not "legitimate" in that it does not originate from actual human beings browsing the real web site; and (b) the downstream payee of the fraudulent ad impression is unassociated with the business owner of example.com.

Ads.txt solves for this by forcing the publisher to publicly declare their authorized ad platforms. The path example.com/ads.txt is expected to enumerate ALL of the authorized channels for that domain along with the associated publisher ID on each channel. With this information, a buying platform can validate any particular RTB request against this data, rejecting non-matching platforms or publisher ID values.
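
As a rough illustration of that buy-side check, here is a minimal Python sketch. It assumes the usual comma-separated ads.txt record layout (ad system domain, publisher account ID, relationship, optional certification authority ID); the function names and the sample file contents are hypothetical and not taken from any particular platform.

```python
from typing import NamedTuple, Set


class AdsTxtRecord(NamedTuple):
    ad_system: str     # domain of the authorized ad platform
    publisher_id: str  # publisher's account ID on that platform
    relationship: str  # "DIRECT" or "RESELLER"


def parse_ads_txt(text: str) -> Set[AdsTxtRecord]:
    """Parse the data records of an ads.txt file, skipping comments and variables."""
    records = set()
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # variable lines such as "contact=..." or malformed records
        records.add(AdsTxtRecord(fields[0].lower(), fields[1], fields[2].upper()))
    return records


def is_authorized(records: Set[AdsTxtRecord], ad_system: str, publisher_id: str) -> bool:
    """True if the (platform, publisher ID) pair in a bid request is declared."""
    return any(
        r.ad_system == ad_system.lower() and r.publisher_id == publisher_id
        for r in records
    )


# Hypothetical contents of example.com/ads.txt
SAMPLE_ADS_TXT = """\
# ads.txt for example.com
ssp-ad-network.com, 12345, DIRECT
other-exchange.example, pub-987, RESELLER, abc123  # resold inventory
contact=adops@example.com
"""

RECORDS = parse_ads_txt(SAMPLE_ADS_TXT)
```

A bid request claiming example.com via `ssp-ad-network.com` with publisher ID `12345` would pass `is_authorized`, while the same domain with any other publisher ID, or any undeclared platform, would be rejected.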

While there are other forms of ad fraud, this solution has dramatically reduced the volume of domain spoofing. A comparable solution seems necessary for TURTLEDOVE.

@michaelkleber
Collaborator

Great idea, let's walk through how this works in TURTLEDOVE, and see if there's anything additional we need to support. I'm assuming for now that RTB works as fleshed out in #20 (comment). In particular, at pageload time, the DSP will receive an RTB call-out containing contextual and first-party targeting information, but nothing about what interest groups the user is in. The DSP then gets an opportunity to return some signals to the browser, and those signals end up part of the input to that DSP's on-device bidding JS.

There's discussion later in that thread about the extent to which a DSP's in-browser logic consumes its own signals vs signals from another SSP or ad network that it chooses to cooperate with, and that implies trust issues which seem relevant to this discussion too. But let's ignore that bit at least until we understand the DSP's-own-signals version.

> (a) the traffic is not "legitimate" in that it does not originate from actual human beings browsing the real web site

This is the subject of Trust Tokens, specifically the "Extension: Trust-Bound Keypair and Request Signing". In particular, when a DSP receives a signed contextual-targeting-information RTB call-out claiming the targeting URL is on publisher example.com, this will let the DSP cryptographically verify that (1) some particular Trust Token Issuer issuer.com decided this was a trustworthy browser, (2) the browser redeemed a trust token while visiting publisher example.com, and (3) the browser sent the signed ad request while browsing an example.com page.

Crucially, the DSP can be sure of these things even if it doesn't trust the intermediaries who handled the ad request after it left the browser.

Of course this is only valuable if the DSP believes the Trust Token Issuer has good judgement.

> (b) the downstream payee of the fraudulent ad impression is unassociated with the business owner of example.com.

When the DSP receives the contextual-targeting-information RTB call-out, it knows the publisher domain and it knows what account it's being asked to pay. Those are the ingredients for running an ads.txt check, right? Even though at this point it doesn't know who the advertiser is?

So the signals that the DSP can send back to the browser could indicate that its server-side ads.txt check was successful. Of course the DSP's in-browser JS needs to know that this signal is trustworthy, even though it was transmitted through an untrusted pipe. But we can secure that communications channel — either with regular encryption, or even with a new extension to Trust Tokens ensuring that the DSP's in-browser JS can only decrypt the signals if this is the very same browser holding the private key that signed the outbound request.

Finally, every party's in-browser JS needs a way to report out what happened. The TURTLEDOVE explainer proposes doing this via the aggregation capabilities that we've mostly been talking about in the conversion measurement context. Again we need to prevent fraud in the reporting phase, and there are ongoing conversations about crypto solutions there as well, on the repos for both the Chrome and Safari proposals.

@samtingleff
Author

Note that there are many different use cases contained within "legitimate traffic", only one of which is bots. Another interesting example of that is a (I would call it malicious) browser extension which modifies publisher ad code to financially benefit the operator of the extension (by rewriting ad request seller IDs, for example).

It sounds like the suggestion is for the response to the contextual request to include one bit of information ("this request passed an ads.txt check"), which could be provided on the audience-based request as well. This data would have to somehow guarantee that the domain and the seller ID are the same.

And remember that in our simplest model, the most immediate recipient of an ad request is an SSP, which sends RTB requests to multiple DSPs. Today these parties expect something of a "trust but verify" process and most DSPs would not wish to trust an SSP's declaration that "this request is totally good on ads.txt".

@michaelkleber
Collaborator

> Note that there are many different use cases contained within "legitimate traffic", only one of which is bots. Another interesting example of that is a (I would call it malicious) browser extension which modifies publisher ad code to financially benefit the operator of the extension (by rewriting ad request seller IDs, for example).

We'll make Trust Tokens robust enough that they can't be stolen by malicious extensions. But if the seller ID were rewritten then the request would fail the ads.txt check, wouldn't it?

> It sounds like the suggestion is for the response to the contextual request to include one bit of information ("this request passed an ads.txt check"), which could be provided on the audience-based request as well. This data would have to somehow guarantee that the domain and the seller ID are the same.

Right. The response would indicate what domain and seller ID went through ads.txt validation. And the DSP's on-device bidding JS could make sure that the response really originated from its (the DSP's) server, that it was meant for this specific browser, and that the intended domain was that of the current page.
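
To make those three checks concrete, here is a rough Python sketch; it is an illustration only, not part of any proposal. An HMAC stands in for whatever authentication mechanism is eventually chosen (a real design would not place a shared secret in in-browser code and would more likely use a signature or the Trust Token key pair mentioned above). The field names, `browser_id`, and both function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_signal(key: bytes, domain: str, seller_id: str, browser_id: str) -> dict:
    """DSP server side: record which pair passed ads.txt, bound to one browser."""
    payload = {"domain": domain, "seller_id": seller_id, "browser_id": browser_id}
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()  # stand-in authenticator
    return {"payload": payload, "tag": tag}


def check_signal(key: bytes, signal: dict, current_domain: str,
                 this_browser_id: str) -> Optional[str]:
    """Bidding side: return the validated seller ID, or None if any check fails."""
    payload = signal.get("payload", {})
    body = json.dumps(payload, sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signal.get("tag", "")):
        return None  # not from the DSP's server, or altered in transit
    if payload.get("browser_id") != this_browser_id:
        return None  # signal was meant for a different browser
    if payload.get("domain") != current_domain:
        return None  # signal was issued for a different publisher domain
    return payload.get("seller_id")


KEY = b"hypothetical-demo-key"
SIGNAL = sign_signal(KEY, "example.com", "12345", "browser-A")
```

With these definitions, `check_signal(KEY, SIGNAL, "example.com", "browser-A")` returns the seller ID, while a replay on another domain, in another browser, or with a rewritten seller ID returns `None`.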

> And remember that in our simplest model, the most immediate recipient of an ad request is an SSP, which sends RTB requests to multiple DSPs. Today these parties expect something of a "trust but verify" process and most DSPs would not wish to trust an SSP's declaration that "this request is totally good on ads.txt".

Hmm, interesting. That lack of trust is why there needs to be an RTB request from the SSP to the DSP, and why the DSP needs to be the one to send the "ads.txt-verified bit" back to the browser. If the DSP were willing to trust the SSP's ads.txt verification and use the SSP's contextual signals in its on-device bidding JS, then we wouldn't need the RTB call-out at contextual-request time.

Maybe we could also engineer a trust-but-verify model here? In that case the SSP would do ads.txt verification, and would report the domain and seller ID back to the browser. The DSP's on-device JS would learn that ssp-ad-network.com claims publisher example.com gets paid through ssp-ad-network account ID 12345. Then the DSP can trust (bid as if the SSP's claims are true) and also verify (use aggregated reporting so that its server finds out the domain name and ssp-ad-network account ID).

There is some subtlety here, though, because aggregate reporting involves some noise — which could look like fraud, or could be a place where a small amount of fraud could hide. I don't know whether it's well-suited to trust-but-verify accounting fraud prevention.
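
The "verify" half, and the way noise complicates it, can be sketched roughly as follows. This assumes the DSP's server ends up with noisy per-claim counts from aggregate reporting and its own set of ads.txt-authorized claims; the `noise_floor` threshold, the sample numbers, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

# (publisher domain, SSP domain, SSP account ID) as claimed by the SSP
Claim = Tuple[str, str, str]


def suspicious_claims(reported: Dict[Claim, float],
                      authorized: Set[Claim],
                      noise_floor: float) -> Dict[Claim, float]:
    """Return unauthorized claims whose aggregate count exceeds the noise floor.

    Counts at or below noise_floor cannot be told apart from reporting noise,
    so they are not flagged.
    """
    return {
        claim: count
        for claim, count in reported.items()
        if claim not in authorized and count > noise_floor
    }


GOOD = ("example.com", "ssp-ad-network.com", "12345")
BIG_FRAUD = ("example.com", "ssp-ad-network.com", "66666")
SMALL_FRAUD = ("example.com", "ssp-ad-network.com", "77777")

# Hypothetical noisy counts from aggregate reporting
REPORTED = {GOOD: 10000.0, BIG_FRAUD: 250.0, SMALL_FRAUD: 3.0}

FLAGGED = suspicious_claims(REPORTED, authorized={GOOD}, noise_floor=10.0)
```

Here only `BIG_FRAUD` is flagged. `SMALL_FRAUD` stays under the noise floor and goes unnoticed, which is the kind of hiding place for small-volume fraud described above.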

@WICG WICG deleted a comment from asdfzxh8 Oct 19, 2021
@JensenPaul
Collaborator

Closing this issue as it represents past design discussion that predates more recent proposals. I believe some of this feedback was incorporated into the Protected Audience (formerly known as FLEDGE) proposal. If you feel further discussion is needed, please feel free to reopen this issue or file a new issue.


3 participants