☂️ Select a fraud prevention mechanism #27
Ping @michaelkleber @csharrison.
Thanks @johnwilander for filing this issue! I think your suggested countermeasure (blinded cryptographic proof of trustworthiness) is a very promising one. I too think it makes sense to just use blinded signatures (as opposed to somehow trying to utilize the full Trust Token API). Here is one idea for how the browser could initiate the process:

Step 1: The publisher website displays an ad, with an adCampaignID and adDestination attribute.

Step 2: The person clicks on the ad.

Step 3: The browser generates a secret, random "nonce" that is kept in a secret location, inaccessible via any JavaScript API, and not accessible by any browser extension.

Step 4: The browser "blinds" this "nonce" with a blinding function. It generates a random "blinding factor" to do so. I would recommend using a different "blinding factor" each time. As before, this random "blinding factor" should be kept in a secret location, inaccessible via any JavaScript API, and not accessible by any browser extension.
Step 5: The browser sends this "blinded_nonce" to the publisher website, in a request that is tied to the specific click that just happened an instant ago, asking for a signature.
There are two approaches that come to mind here. Either the ad click could be a redirect that first bounces the user through a "blind signature" step before sending them onwards to the actual adDestination, or the browser could issue an out-of-band request to the blind signature endpoint.

Step 6: The publisher website would receive the request to provide a signature for an ad click that just happened moments ago. This request must contain sufficient information such that the publisher knows which ad click this request refers to. This is so they:

- can validate that this ad click actually happened, and
- can validate that a signature has not already been given out for this click.
If no signature has already been given out for this click, the publisher should provide a cryptographic signature corresponding to the adCampaignID and adDestination of that click. There are probably more elegant / advanced ways of doing this (would love help from you cryptographers out there!), but just for simplicity let's assume the publisher generates a unique key pair for each combination of adCampaignID and adDestination. The publisher signs the blinded_nonce with the appropriate private key.
Step 7: When the browser receives this blind_signature from the publisher, it should validate that this value is actually a legitimate signature, and not somehow being used as a tracking vector. To do this, the browser should download the public key that corresponds to the given adCampaignID, adDestination pair. The publisher should make these public keys available somehow (perhaps another endpoint?). To ensure the publisher isn't rotating them super quickly (as part of some kind of attempt at tracking), requests for these keys should not have cookies (i.e. anonymous requests), and perhaps they can be cached somewhere outside of the publisher's control (there isn't a good reason for them to be changing often). Our friends at Google have thought more about this problem than I have and I'm sure the folks who worked on Key Transparency have more thoughts.

Step 8: With the correct public key available, the browser can quickly check that the signature is valid:

If the signature is not valid, the browser should drop data about this ad click. It should not be eligible for generating a conversion report. If the signature is valid, the browser can "unblind" the signature using the random blinding factor from before.
At this point, the browser can dispose of:

- the blinding factor
- the blinded_nonce
- the blind_signature
None of these will be needed again. The publisher_signature and the nonce should be kept together with the other information about this click requesting attribution (namely the adCampaignID and the adDestination).

Step 9: At a much later time, when the anonymous conversion report is generated, the report can contain 2 additional fields: the nonce and the publisher_signature. Neither of these adds any additional tracking information. The publisher website has never seen either, and they are un-linkable to the blinded_nonce and the blind_signature the publisher website saw earlier (this "un-linkability" is the key property of blind signatures). The publisher can validate that the signature is valid, and could only have been generated by them.
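To make the blinding math concrete, here is a toy sketch of the flow above using textbook RSA blind signatures (just one way to instantiate blind signatures; the numbers are tiny and purely illustrative, not production crypto):

```python
# Toy illustration of the blind/sign/unblind/verify flow (NOT secure crypto).
import secrets
from math import gcd

# The publisher's key pair for one (adCampaignID, adDestination) combination.
p, q = 61, 53                      # toy primes; real keys use >= 2048-bit moduli
n, phi = p * q, (p - 1) * (q - 1)
e = 17                             # public exponent
d = pow(e, -1, phi)                # private exponent (Python 3.8+ modular inverse)

# Steps 3-4 (browser): secret nonce and a fresh blinding factor, both kept private.
nonce = secrets.randbelow(n - 2) + 2
while True:
    r = secrets.randbelow(n - 2) + 2
    if gcd(r, n) == 1:
        break

# Step 5 (browser -> publisher): only the blinded nonce is sent.
blinded_nonce = (nonce * pow(r, e, n)) % n

# Step 6 (publisher): sign the blinded value without ever learning the nonce.
blind_signature = pow(blinded_nonce, d, n)

# Steps 7-8 (browser): unblind, then verify against the public key (e, n).
publisher_signature = (blind_signature * pow(r, -1, n)) % n
assert pow(publisher_signature, e, n) == nonce

# Step 9: the later report carries (nonce, publisher_signature), which the
# publisher can verify but cannot link to the blinded_nonce it signed earlier.
print(nonce, publisher_signature)
```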
If the signature is invalid, this report is likely fraudulent and should be ignored. If the signature is valid, the publisher knows a few things:
Fraud Prevention guarantees

I think this would make life significantly more difficult for fraudsters. In order to generate fake conversions they would need to do a few things:
The third one is a really nice guarantee (that we got by virtue of using a different key pair per adCampaignID, adDestination pair). This makes it MUCH harder to mess around with a competitor (you might never get served an ad for a competitor).

The publisher can also choose to limit the types of ads it shows to possibly compromised / sketchy looking browsers (e.g. those that click on a LOT of ads...). They could serve a lot of "honey-pot ads" (ads no real person would ever click on, for example ugly ads advertising expensive, low-quality goods, in a language the end user doesn't understand). If the publisher starts seeing conversions on these "honey-pot ads", that's a good sign that at least some of those clients it deems "possibly compromised / sketchy" are indeed engaging in conversion fraud. This is a nice trick because it's very hard for automated scripts to differentiate between real ads and "honey-pot ads". It's hard to write a script that can say: "This ad doesn't seem appealing".

All of these ideas are drawn from our "Private Fraud Prevention" repo: https://github.com/siyengar/private-fraud-prevention
Hi, sorry for the delay here. I think integrating blind signatures makes sense for this API. One concern I brought up in WICG/attribution-reporting-api#13 is that some of these techniques require additional thought if we want to add noise to the values we report.
Given @csharrison's positive comment, I think this looks like a viable way forward.
While "publisher" works in concrete examples, we try to avoid it in general descriptions. The term muddies the waters since many don't think of publishers as engaged in cross-site tracking, and the whole thing gets a false aura of benign, happy-path behavior. The truth is this web technology will be available to all websites, including social networks, search engines, clickbait sites, shopping sites, bank sites, splash screens from service providers, etc.
This sounds like defense against some threat. Could you elaborate, please?
We don't want to add extra navigations or redirects if we can avoid them so out-of-band sounds best.
If these ad clicks were restricted to first-party links, we'd already be in a good place since the out-of-band request could carry cookies. But we've received the request for third-party serving of ad links, in which case cookies are unlikely in browsers with anti-tracking measures.
We suggest that the browser requests the ad click source's public key at this point (without cookies), checks the signature, and only sends the report if the signature is valid. This makes it much harder for the ad click source site to personalize the signature for tracking purposes. And if the ad click source site is able to personalize the signature and serve up the personalized public key at this later point, it already has the means of cross-site tracking.
We are considering blinded signatures at both ends — "create blinded cryptographic proof of trustworthiness at the time of the ad click, the conversion, or both" — i.e. both at the time of the ad click and at the time of conversion. That way, the resulting report can be validated for both those events. This becomes extra valuable if we do double reporting or optional reporting as explored in #31.
cc @dvorak42 for visibility into ideas for integrating blind signatures with conversion measurement. One additional point I'd like to make is that many publishers / advertisers may not be sophisticated enough to run their own signature endpoints, so from our perspective it would be important to have the capability to delegate the signing operation to a third party.
I'm happy to use whatever term you prefer. What do you suggest?
Certainly. Putting myself in the shoes of the attacker, if I am going to successfully forge conversion events, and I need a valid signature to use in each fake conversion report, I will want to extract these signatures as efficiently as possible. If the only way to get a valid signature from the website that displayed the ad is to generate a click on an ad shown there, I will need to scale that operation up somehow. The simplest way is to just click one time and try to generate a large number of signatures out of that one click. If we can cut off this path to scale, that will force the attacker to click on many ads (one click per signature). That will make their fake accounts used to harvest signatures stand out, making them easier to take down. Perhaps a simpler way to think about it is this: The maximum number of fake conversions that can be generated is bounded by the number of signatures given out. For this reason, it is very important to control the rate at which signatures are given out.
There are certainly publishers (like Facebook) who serve their own ads. We could certainly use first-party cookies to validate that this browser did actually click the ad in question, and to validate that we have not already given out a signature for this click. But you are absolutely correct that this is a rather unusual case. The vast majority of publishers rely on ad networks to serve ads for them. The Chrome team is proposing a partitioning of cookies in the future, once third-party cookies are removed; I wonder if we can use these partitioned cookies. If the ad shown on news.example was served by ad-tech.example, it seems that ad-tech.example could simply drop a first-party cookie, scoped to news.example and log that along with any ad-clicks. We just need to find some way of asking ad-tech.example to generate a signature related to this scoped cookie. I'm confident we can find some technical solution that isn't useful for cross-site tracking.
I think we are saying the same thing - which is great! Only one minor difference: I am suggesting that the signature which is generated is somehow bound to the adDestination and adCampaignID (which are already present in the eventual conversion report, so no new information is disclosed). I think we can do this by requesting a public key just as you said, without cookies, and that we can check the signature is valid afterwards. I just want to fetch a signature for that specific combination of adDestination and adCampaignID. This is all about making life harder for fraudsters. As explained later, if we pursue this route, clicking on an ad and obtaining a signature will only allow you to generate a fake conversion for the associated ad. Since the fraudster does not control which ad is served, this provides an extremely effective deterrent. If the signature can be used to fake a conversion for any ad campaign, life is now much easier for fraudsters. They can click on any ad and use it to generate a fake conversion for whichever advertiser they please. So just to recap, the signature should not be personalized, it should be the same public key for all people who click on ads directing to the same destination, with the same campaignID.
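A rough sketch of the bookkeeping I have in mind (names and values are illustrative, and the random integers are just stand-ins for real blind-signature key pairs):

```python
# Illustrative only: one signing key pair per (adCampaignID, adDestination)
# combination, so a signature harvested from one ad cannot be replayed to fake
# a conversion for a different campaign or destination.
import secrets
from typing import Dict, Tuple

KeyPair = Tuple[int, int]  # toy stand-in for (public key, private key)

signing_keys: Dict[Tuple[str, str], KeyPair] = {}

def key_pair_for(ad_campaign_id: str, ad_destination: str) -> KeyPair:
    combo = (ad_campaign_id, ad_destination)
    if combo not in signing_keys:
        signing_keys[combo] = (secrets.randbits(64), secrets.randbits(64))
    return signing_keys[combo]

# The cookie-less public-key endpoint would serve only the public half:
print(key_pair_for("55", "https://shop.example")[0])
```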
I'm glad you're exploring this. I think it would be incredibly valuable if we could find a way to pull this off. Unfortunately, like @csharrison said:
I totally agree with him. I'm not exactly sure what you have in mind here in terms of "delegation", but if we are talking about something along the lines of: "Here is some open-source code. Please deploy it on your web-server, set up to respond to requests to this endpoint. Please integrate it with your conversion firing logic via a local database"... that would be far too heavy of a lift. We have found it very difficult to convince advertisers to prioritize the engineering work required to do much simpler things! I don't have any solutions for this, but I would be happy to brainstorm together on how we might solve this challenge.
Click source or ad click source is what we use in the proposal. I think that term is free of bias and to the point. I.e., any click source will be able to use this technology.
Aha! You're viewing fraudsters as the attackers while I mostly view trackers as the attackers. Both are useful. Let's make sure to be specific on what attack perspective we're using.
Got it. Make it hard to hoard valid signatures.
It's unlikely that Safari will go back to partitioned cookies since we already shipped and later removed them. I didn't know Chrome's plan of record was to allow third-party cookies in the form of partitioned cookies, and I don't know what the other browsers' plans are. I don't think PCM should mandate other means of storage to be available or work in a specific way. Whatever is needed should be specced here. Perhaps we could use a nonce that is provided in the link metadata and submitted as part of the out-of-band signature request. All we're trying to achieve is tying a click and a request together. No storage should be needed for that.
I don't think we should rely on third parties storing things in the first-party space. Regardless, as discussed in #7, supporting metadata in links served by third parties does not imply that the eventual conversion report will go to the third party. Reports will go to first parties. It's their users.
I don't lack confidence in what site owners can do, and I don't have a "this can only work through third parties" mentality. Things can change, and empowering first-party websites is a net benefit for both the web platform and users. Therefore, I'm mostly interested in finding ways for this to work with the first parties in control of data and data use.
I think those are two different conversations. One is about a major website serving ads and the other is about browsers and the web platform. We're building for the future and the future will be different. The ad click source is the first party site where the click happens, not the potential third party serving the ad and the link. We need to think about what this means for these signatures if we allow third parties to serve the links.
OK, I'll refer to it as the "ad click source" then =)
Good call out! I agree both perspectives are useful. I'll make sure to clarify which one I'm using.
Exactly!
I like this suggestion. It's definitely a lot simpler and cleaner, and it achieves the same goal of ensuring the ad click source can validate that:

- the ad click in question actually happened, and
- a signature has not already been given out for that click.
I guess we are talking about something like this pseudocode:
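(A sketch of what I mean; the `adclicknonce` attribute name below is just a placeholder I'm making up, while `adcampaignid` and `addestination` are the attributes already discussed above.)

```html
<!-- Sketch only: an ad link carrying a unique, per-impression nonce. -->
<a href="https://shop.example/product"
   adcampaignid="55"
   addestination="https://shop.example"
   adclicknonce="a6b54c1e9f02d317">
  <img src="https://ad-tech.example/creative.png" alt="Ad">
</a>
```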
Where the ad server should generate a unique nonce per ad tag, and maintain a record of the nonces of ads which were clicked. The browser can just immediately send this "nonce" back to the ad server to request a signature in the "out of band" request, and then throw it away as it is no longer necessary and will certainly not feature in the anonymous conversion report. Is this what you had in mind? I think this would work just fine, it's "a lot simpler and cleaner" and as you say, it doesn't require any storage.
Yes. We just need to come up with a name for the attribute that's precise, short, and not perceived as a tracking ID of sorts. 🙂 Then we'd have to think through the privacy and fraud implications of it. Would it withstand a fraudster's automation? Do you consider malicious browsers in your threat model? A fraudster can download an open source engine, change it, and run from there.
Sounds good. I don't see any tracking risk here, so long as the browser disposes of this nonce immediately after sending the "out of band" request. It's just being sent back to the same website that generated it immediately after a first-party interaction. I'm not picky about the name =).
Excellent question. Yes, I absolutely consider malicious browsers in my threat model. Even more concerning (and common) are malicious browser extensions. We see a lot of fraudulent activity coming from malicious Chrome extensions in particular.

Thinking through the fraudster's likely attack scenario, I assume they will first try to distribute malicious browser extensions that scrape the page content to extract legitimate values for "nonce". They could attempt to forge "out of band" requests to try to hoard signatures. Fortunately, if those ads had not been clicked we could refuse to return a signature. That way they would also have to simulate an ad click. This would at least limit the rate at which signatures could be generated. We could try to detect compromised accounts by looking for people who seem to click ads way more frequently than the norm, but this is just a heuristic. We could serve honeypot ads, and use this to figure out which users have compromised browsers, but this would not catch all of the abuse. Ideally I would like to make it impossible for a browser extension to perform this type of an attack. Do you think we could hide this "nonce" from browser extensions somehow?

If we could make it impossible to harvest "nonce" values with browser extensions, the next thing the fraudsters would try would be to just randomly generate values for "nonce" and hope to occasionally get one right. We can mitigate this attack by using really large values for "nonce", say 64-bit values (or more). If the ad click source generates values for "nonce" randomly, this would make it incredibly unlikely that the fraudster would ever generate a valid value, much less one that had actually been clicked, and for which a signature had not yet been generated. If the "out of band" request were to come within a few seconds of a click, that would further shrink the window of opportunity.

Assuming we successfully mitigate the "random guess" attack, the fraudsters would move on to malicious browsers. The most difficult thing about such an attack is scale. How can you deploy a malicious browser across the phones or computers of thousands of people? Android applications tend to be the main vector of attack we see for distributing malware these days. We should assume fraudsters will distribute malicious Android applications under the guise of games or utility apps. These apps will have forked browser functionality within them. They will have to load web pages (likely in an invisible web view), extract values for "nonce", occasionally simulate a click (but not too often!), then issue "out of band" requests and hoard the signatures (likely sending them back home to a command and control center). This is a challenging threat to combat. It's easier for websites with login (like Facebook) to prevent this, but it could be a significant threat to news sites that do not require login.

Here are a few ideas that come to mind regarding means of mitigating this threat:
I love your optimism! OK, let's leave behind the question of "can we convince very large numbers of website owners to make changes" for a moment, and just discuss how we would theoretically want this to function. I think it's very similar to the click signing. The only difference is that the conversion-source is signing a "conversion" instead of a "click". Walking through my "fraudster threat model" again, here's how I expect this to be attacked:
There is a similar discussion going on between @csharrison and me on the Chrome proposal: WICG/attribution-reporting-api#13. The WebKit proposal has the benefit of not adding random noise to the "Ad Attribution Data". This means that we can sign over the conversion value. The downside of the Chrome proposal is that a conversion whose value has been randomly scrambled with noise is essentially indistinguishable from conversion fraud.
If it helps, both Privacy Pass and the underlying interactive cryptographic protocol, VOPRFs (verifiable oblivious pseudo-random functions), are undergoing IETF standardization:
@johnwilander perhaps I missed it, but is there a reason why blind signatures are needed here instead of a VOPRF (Trust Token or Privacy Pass)? The sketch above does not seem to rely on signatures being publicly verifiable.
We'll share more about our proposal when we have things ready.
See #41 (comment).
We need to determine what fraud prevention mechanism to use. Each possible mechanism should be labeled fraud prevention.
Labeled layering because the mechanism should work for the Event-level Conversion Measurement folks too.
Old summary below.
Fraud prevention has been discussed in #6 and at W3C TPAC in September this year.
The Attack
The secure HTTP POST request to `https://<ad click source eTLD+1>/.well-known/ad-click-attribution/<6-bit ad attribution data>/<6-bit ad campaign id>` can be spoofed to report fraudulent attribution data.
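For instance, nothing in the current design stops an arbitrary script from issuing such a request itself. A minimal sketch, assuming a hypothetical click source `clicksource.example` and made-up 6-bit values:

```python
# Hypothetical spoofed attribution report: the host and the two 6-bit path
# values are invented, and nothing authenticates that a real click happened.
import urllib.request

report = urllib.request.Request(
    "https://clicksource.example/.well-known/ad-click-attribution/42/13",
    method="POST",
)
urllib.request.urlopen(report)  # would need a real endpoint to succeed
```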
The Countermeasure
The suggested way to prevent this kind of fraudulent attribution reporting is to create blinded cryptographic proof of trustworthiness at the time of the ad click, the conversion, or both, and provide that proof as part of the attribution report.
The existing proposal for such proof is Trust Tokens, as outlined in the Trust Token API proposal.
To discuss: