
Should be Opt-In: Creates privacy harm with little to no upside for Web users #1

Open · pes10k opened this issue Dec 25, 2019 · 23 comments
Labels: privacy-tracker (Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response.)

pes10k commented Dec 25, 2019

Moving comments over from w3c/reporting#169

This API should be opt-in

  1. This introduces privacy harm, in that it opens up new channels for communication between the site and potentially 3rd parties, covering types of information not currently easy to capture.
  2. This should be opt-in, since the primary beneficiary is the website, not the web user. This API is a way for websites to replace their own testing and monitoring by offloading the responsibility onto site visitors. It's possible that some users would like to participate in this effort, but it's presumptuous (in the extreme) to assume users want to provide information to participate in this effort (and in what is primarily the site's responsibility).
  3. What information would be added to the WebExtension layer to allow extensions or similar to block deprecation reports (independent of other POSTs / reports)?

vdjeric commented Dec 26, 2019

To summarize my own comments in the previous thread:

  • We know first-hand that crashes on pages are fairly common, mostly caused by memory leaks
  • Such crashes & memory leaks are basically impossible to detect via automation for complex sites
  • We optimize what we can measure. Being able to monitor crash rates incentivizes sites to invest in fixing memory leaks. Third-party analytics providers can easily integrate this info
  • Detecting increased crashes and fixing memory leaks is bringing real value to users
  • The 2 bits of information gained (crashed: yes/no, oom: yes/no) are not a privacy risk to users. If the concern is about communication with 3rd-party sites, the reporting URI can be limited to the same origin as the page
  • Requiring user opt-in for simple crash signals seriously decreases the usefulness of the data
  • It's already possible to indirectly detect crashes (by having the page set a flag in local storage, then clear it in the onunload event; see the sketch after this comment), but we know firsthand that this signal is MUCH less reliable than browser reporting

This is my personal POV from having worked on fixing memory leaks on Facebook.com and trying to track crash rates by other means; I would love to hear the perspectives of others.
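
For concreteness, here is a minimal sketch of the indirect heuristic mentioned above, assuming a hypothetical same-origin /crash-beacon endpoint (the endpoint and flag names are illustrative, not from any spec):

    // Sketch of the indirect crash-detection heuristic described above.
    const FLAG = 'page-alive';

    // A flag left over from a previous visit means that page never unloaded
    // cleanly. It *may* have crashed, but the tab could also have been killed
    // by the OS or the browser force-quit -- hence the unreliability.
    if (localStorage.getItem(FLAG) === '1') {
      navigator.sendBeacon('/crash-beacon', JSON.stringify({ suspectedCrash: true }));
    }

    // Mark this page as alive; a clean unload clears the mark below.
    localStorage.setItem(FLAG, '1');

    window.addEventListener('unload', () => {
      localStorage.removeItem(FLAG);
    });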

pes10k (author) commented Dec 26, 2019 via email

vdjeric commented Dec 26, 2019

I think that's the wrong framing to use; by this standard, every browser feature would require user opt-in. Why not also ask the user for permission to share User-Agent headers? Or use IndexedDB? Or send an XHR? They would all require an opt-in prompt by this standard, and most users are spooked by permission prompts.

Today, a permission dialog sends a strong message to the user that they need to make a risky privacy or security decision (allow camera recording? allow microphone recording? share location? allow USB access?). I don't think a message notifying of a crash event is anywhere on par with any of the other capabilities currently requiring user opt-in.

Additionally, the suggested term "crash report" would be misleading, since a crash report in any other context would include low-level information and likely private user data.

I think it's ok for us to disagree since we're coming at this from different perspectives, but we should encourage others to chime in too.

pes10k (author) commented Dec 26, 2019 via email

vdjeric commented Dec 26, 2019

Separate point: You mentioned before that out-of-memory crashes are common from a site's perspective. Can you give numbers on the number of out-of-memory crashes the average user experiences over X period of time, that could not have been caught by site owners measuring and monitoring and taking responsibility for their own site?

I think I need to get permission to share numbers, so realistically that will have to wait until January. But these OOM crashes are frequent and severe enough that it is not uncommon for browser vendors to reach out about spikes in OOM rates on our websites that they detect via browser telemetry.

The leaks are not readily reproducible first-hand nor in automation. Both Mozilla and Facebook tried to reproduce a recent OOM spike in Firefox on Facebook.com and we both failed, despite having a lot more invested in automation systems than the majority of web properties.

@michaelkleber

Happy New Year, folks.

Since you're looking for additional opinions: I disagree with the basic premise of this Issue.

In the cases under consideration here, the user just tried to do something and failed, and the API enables fixing that failure, i.e. making the web site succeed at what the user wants to do. The fact that the user just tried to do the thing means our presumption must be that they want the thing to work. So we should start in favor of making the debugging information available by default, reflecting our best judgement about the user's interest.

Opt-In here is exactly wrong. It would only make sense in a world where particularly technically sophisticated users are the ones who want sites to work correctly. But such users may well already have the skills to diagnose and report problems; it is rather all other users whom this API benefits.

Moreover, the WIP Privacy Threat Model outlines the high-level privacy threats that we ought to be paying attention to, and none of the threats listed there is relevant to this API. The only one that comes close is the least threatening:

  • Benign information disclosure (connected hardware [game controller or assistive device], system preferences [like dark mode]…)

But the bit of information being revealed here ("This site experienced an OOM") is much more about the page being viewed than it is about the person viewing it — which decreases the privacy threat even more, while increasing the value to the user of it being reported, as above.

Note that w3cping/privacy-threat-model#9 tracks adding something about this use case to the Threat Model; I agree with doing so, and hope we can make it clear that this is not a privacy threat.

pes10k (author) commented Jan 6, 2020

  1. The privacy threat model is extremely under development and does not reflect any PING opinion or consensus.
  2. "[Opt-in] would only make sense in a world where particularly technically sophisticated users are the ones who want sites to work correctly" misstates the argument. The point is that "make sure my website works correctly despite OOM errors" is currently (i) a problem relatively few sites have, and (ii) a problem that (logically, correctly!) is seen as a site owner's responsibility to fix, using the site owner's resources.

The idea behind this proposal is a bad and categorical change from current web functionality. That "it would be helpful for site owners to use the resources of non-consenting users to fulfill site owner responsibilities" is not controversial. The idea that browsers should volunteer users' resources and implement functionality that (in its first order effect) is only site-benefiting, and at the margins user-harmful (i.e. privacy harm in the form of new categories of information sent to 3rd parties), is (to put it mildly) not respectful of users and a very bad direction.

@NalaGinrut

I agree with @snyderp that "the privacy threat model is extremely under development and does not reflect any PING opinion or consensus." I understand it's harder to discuss without an established convention, but I think we can still talk about privacy threats before there's a draft with PING consensus.

Whether or not the privacy threat model is ready, there is a basic convention in any privacy discussion: try to protect the user's privacy wherever possible. So the experiences around OOM debugging do not seem convincing enough to outweigh the concerns about privacy leaks. On the contrary, development experience usually tells us that debugging is more convenient when users tolerate some information leakage. So I'm afraid it is the wrong direction to evaluate the opt-in proposal by development experience.

Just passing by to chime in; comments are welcome.

@michaelkleber

  1. Yup, of course I quite agree that the Threat Model is still a work in progress; that's what I meant by "WIP Privacy Threat Model" in my comment. Not considering it authoritative, just suggestive.

  1½. To be clear, this issue likewise does not reflect any PING opinion or consensus.

  2. Without this API, there is no way to report OOM errors — that's the point. So how can you say relatively few sites have this problem while literally arguing against making it possible to measure?

  3. RUM (real user monitoring) is a part of a site's resources. No successful site pretends that the only things that happen in the real world are ones they can anticipate in testing.

The idea behind your objection is a bad and categorical change from best practices on the web today. By your logic we should take try..catch out of JS as well, since it offers a way to observe when code fails and then return a stack trace to a server.

Rather than relying on philosophical debates about whether "browsers should volunteer users' resources" in ways that improve that user's experience, I would prefer to get back to a discussion of privacy considerations: what information that this API makes available leaks something private about the user?

pes10k (author) commented Jan 7, 2020

Without this API, there is no way to report OOM errors — that's the point. So how can you say relatively few sites have this problem while literally arguing against making it possible to measure?

#1 (comment) suggests the opposite, that this API is motivated by X number of cases for Y number of users on Z number of sites. So at least some numbers seem to exist somewhere.

Another way to measure: ask people if it's okay to use their browsers to measure!

The idea behind your objection is a bad and categorical change from best practices on the web today. By your logic we should take try..catch out of JS as well, since it offers a way to observe when code fails and then return a stack trace to a server.

I honestly don't understand the point you're trying to make here. But, if nothing else, stack traces exist today and OOM reporting does not; the argument is that this is a new category of information, and browser vendors should not be blasé about helping themselves to it, nor assume users want to share even more data.

Data minimization (i.e. a user should share only as much information as is needed to achieve the user's goal) is a basic privacy principle (https://tools.ietf.org/html/rfc6973#section-6.1). I don't consider RFC 6973 to be the final word, but it's a good floor. This proposal is (plainly) counter to that.

@michaelkleber

Sorry, let me clarify. Your vision for PING includes going back through old specs and fixing privacy problems there, not merely preventing new ones. Which is great! But it seems to me that the same reasoning that leads you to object to this API would lead you to object to try..catch in your retrospective review, where "this is a new category of information" is not a relevant distinction.

If I'm wrong, and you think this API has problems while try..catch does not, please help me understand where the differences are?

(I'll keep the RFC 6973 discussion on w3cping/privacy-threat-model#9.)

pes10k (author) commented Jan 8, 2020

Honest question, since maybe there are capabilities here I don't know about: how would you capture the reports described in the proposal (i.e. OOM, unresponsive tab) using try…catch?
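
For illustration, a sketch of the asymmetry in question (reportToServer is a hypothetical helper, and the oversized allocation is just one way to trigger a catchable error):

    // A single oversized allocation throws a catchable exception in most
    // engines, so existing try...catch telemetry can already observe it:
    try {
      const buf = new ArrayBuffer(Number.MAX_SAFE_INTEGER); // throws RangeError
    } catch (e) {
      reportToServer(e); // hypothetical same-origin error-reporting helper
    }

    // A gradual leak that exhausts the renderer process is different: the
    // browser kills the tab outright, no exception is thrown, and no catch
    // block or unload handler runs -- nothing in-page can observe the crash.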

jyasskin (member) commented Jan 8, 2020

The shape of this API seems like it does enable especially-privacy-sensitive browsers like Brave to add a prompt in the place @snyderp wants, in the "your tab just crashed" UI. Have I missed something preventing that? Other types of reports might not be so lucky and might want some API change to allow Brave to provide the UI it wants.

As is usual for permission prompts, we expect different browsers to design different permission UI, including some (Chrome, at least, in this case) that might omit certain prompts entirely, either by auto-granting or auto-denying them.

clelland (collaborator) commented Jan 8, 2020

It seems like there shouldn't be any interoperability issue with a user prompt in that case -- there is no further interaction possible with the page anyway; nothing else can possibly depend on the report being sent silently (or being sent at all, really).

If this is a reasonable point for UAs to differ on, I'm happy to accept any spec text that says that the UA may present information to the user and ask for explicit permission before sending crash reports, consistent with the browser vendor's privacy stance.

pes10k (author) commented Jan 9, 2020

@jyasskin @clelland

The goal of standards work is not to make sure the spec is broad enough that both privacy-respecting and non-privacy-respecting vendors can be stamped "standards compliant"; it's to ensure that the standards that comprise the web platform require privacy (i.e. privacy by default). So simply saying "the spec allows Brave to do what Brave wants" does not address the concern; the concern is that the spec, as authored, allows non-privacy-respecting implementations.

I'm still totally baffled why you all are so resistant to just asking users if they want to send a crash report…

vdjeric commented Jan 9, 2020

I'm still totally baffled why you all are so resistant to just asking users if they want to send a crash report…

We are looking at it from the perspective of weighing the potential benefits against the risks and costs. Helping sites fix leaks and memory problems is valuable to both users and sites, the cost is minimal (an extra network request), and the privacy risk is non-existent. This does not warrant an intrusive user permission dialog, which would also significantly hurt the quality of the data.

As I understand it, your opposition is based on a philosophical/principled stand, that user resources (extra network request) should not be used by the site (which likely makes 100s of other network requests, including for analytics purposes) when it doesn't directly/immediately serve the user's intent, unless the user gives explicit consent.

pes10k (author) commented Jan 9, 2020

There is a plain tension between the two arguments advancing the proposal:

  1. this represents users’ desires, since they want to help the site work better
  2. we can’t ask the user, because not enough people will say yes if they’re asked

(I am very skeptical of the idea that "users won't understand the question", since many browsers and devices ask identical questions, and "It looks like something went wrong. :( Would you like to notify the site about what happened?" seems pretty easy to grok.)

As I understand it, your opposition is based on a philosophical/principled stand, that user resources (extra network request) should not be used by the site (which likely makes 100s of other network requests) when it doesn't directly/immediately serve the user's intent, unless the user gives explicit consent.

This is not correct, though I apologize that this part of the conversation has fragmented across the other, parallel thread. This is not primarily about user resources; it's about privacy / user information. This states it more clearly:
w3cping/privacy-threat-model#9 (comment)

I do not accept the framing that this is an abstract, principle-only issue, if the implication is that the decision here won't actually affect users.

vdjeric commented Jan 9, 2020

What mechanism does WICG usually use to resolve these conflicts of perspectives?

jyasskin (member) commented Jan 9, 2020

I'm not one of the WICG chairs, but https://wicg.github.io/admin/charter.html#decision describes how the WICG resolves conflicts. Ideally, either Pete convinces the editors, or the editors convince Pete. If after the discussion, both sides still think they're right, the charter encourages both groups to produce implementations that behave differently, and use the resulting implementation and use experience to create better consensus for the next stage of standardization.

In this case, although it's been very difficult to figure out what concrete privacy harm @snyderp is pointing to, I think it comes down to:

  1. A crash report is sent from an IP address that identifies a user or small group of users.
  2. A crash report may include implicit information about the user's operating environment that isn't included in their User Agent string. For example, the site might only crash with Foo Antivirus 3.5 installed, or with NviTI Graphics Card 1.4. If the site does enough testing, they can discover that fact, at which point they've learned something about the user.

The question then is whether the value to the user is enough to justify the site learning that information.

@shwetank

If the concern is about communication with 3rd-party sites, the reporting URI can be limited to the same origin as the page

At the very least, this should be something worth adding in.
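
For reference, a sketch of what that restriction could look like with the Report-To header from the Reporting API draft of this era, pointing only at a same-origin endpoint (the URL and max_age values are illustrative):

    Report-To: { "group": "default", "max_age": 86400, "endpoints": [{ "url": "https://example.com/crash-reports" }] }

A delivered crash report would then carry roughly the following (the "reason" member is the one this proposal defines; the other values are illustrative):

    [{
      "type": "crash",
      "age": 42,
      "url": "https://example.com/app",
      "body": { "reason": "oom" }
    }]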

@michaelkleber

If a 3rd-party iframe causes an OOM, then it seems natural that they ought to be able to get the report. Maybe allow the origin of a document, whether or not it's the top? (Maybe this only makes sense in a browser with out-of-process iframes.)

pes10k (author) commented Jan 14, 2020

I disagree with the above framing that this is a UX / interop issue; it's instead a privacy/security (and consent) issue.

So maybe the way to cut the knot and make progress here is to say that sending a crash report requires a permission from the user. Since the Permissions API allows for NOOPs (i.e. inferred permission), Chrome folks can just assume their users always want to grant permission for this. But mentioning it as a permissioned activity in the spec is important, since it gives other implementors hooks to differentiate-but-still-interop on, and a consistent and concise way to reason about the different privacy trade-offs made by vendors.
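
A hedged sketch of how that could surface through the Permissions API, assuming a registered "crash-reporting" permission name (no such name exists today; the query below would currently reject with a TypeError):

    // Hypothetical: "crash-reporting" is an assumed permission name, used
    // here only to illustrate the NOOP / inferred-permission idea.
    async function crashReportingAllowed() {
      const status = await navigator.permissions.query({ name: 'crash-reporting' });
      // A vendor like Chrome could auto-grant with no prompt shown, while a
      // privacy-focused vendor could prompt and reflect the user's answer here.
      return status.state === 'granted';
    }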

plehegar added the privacy-tracker label on Feb 10, 2020

annevk commented Mar 6, 2020

  1. The crash report can uniquely identify the user if it happened on a URL that uniquely identifies the user.
  2. The fact that a crash happened might also hint at an activity of the user that would not otherwise be known (or distinguishable from closing the tab).
