This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Security consideration: Multi-threading helps cache-based side channel attacks #1

Closed
mseaborn opened this issue Jul 24, 2015 · 41 comments

Comments

@mseaborn

Shared memory allows the construction of a high-resolution timer (nanosecond resolution), which enables cache-based side channel attacks, such as those described in the paper "The Spy in the Sandbox -- Practical Cache Attacks in Javascript" (http://arxiv.org/abs/1502.07373).

For example, key press handling code produces a detectable signature of accesses to L3 cache sets. An attacker can use timings to deduce when the user is pressing keys.

Should the SharedArrayBuffer spec discuss this as a security consideration? This might help us come up with mitigations.

For comparison, browsers have tried to mitigate cache side channels by reducing the resolution of performance.now(). (e.g. See https://crbug.com/506723 for Chrome.) With multi-threading, though, it's easy to build your own high-res timer by creating a thread that increments a memory location in a tight loop.

I'm not terribly optimistic about mitigating this. Finding a solution is a research problem and may well be infeasible. I doubt this should block Shared Array Buffers, but if we're adding multi-threading to the Web platform we should at least be aware of what we're letting ourselves in for.

See also the Chromium issue tracking this: https://crbug.com/508166

[Aside: is there a preferred forum for discussing the SharedArrayBuffer spec? Brad Nelson and Ben Smith -- who are working on Chromium's support for SharedArrayBuffers -- told me there isn't one yet and recommended I file an issue in this issue tracker.]

@Yossioren

Here's a CodePen that provides a basic high-resolution timer polyfill based on SharedArrayBuffers. Its resolution is on the order of 0.3ns. Tested on Firefox Nightly (built from https://hg.mozilla.org/mozilla-central/rev/72835344333f )

http://codepen.io/yossioren/pen/LVBNMR

@lars-t-hansen
Collaborator

@mseaborn, this is the best place to discuss the spec. Thank you for highlighting this issue, it needs to be brought into the spec somehow.

@Yossioren, noted :)

@lars-t-hansen
Collaborator

At the ECMA TC39 meeting in Portland in September, 2015, the committee decided that resolving the present issue (in the sense of working out the consequences of it and figuring out mitigations, if the issue is deemed to be a real concern) is a blocker for acceptance to Stage 2.

@Yossioren

Thanks for letting me know, Lars. Am I allowed to discuss this publicly?


@lars-t-hansen
Collaborator

@Yossioren, it should be OK to discuss this publicly.

@lars-t-hansen lars-t-hansen modified the milestone: Stage 2 Sep 24, 2015
@juj

juj commented Sep 25, 2015

"For example, key press handling code produces a detectable signature of accesses to L3 cache sets. An attacker can use timings to deduce when the user is pressing keys."

This sounds like a very concrete claim, and the paper was written without depending on the SAB proposal, so the claimed vulnerability apparently already existed independent of SAB. Is there a JavaScript demo out somewhere that would demonstrate this attack (or any other attack based on the paper)? The case here seems to be that SAB makes an already-existing vulnerability statistically worse, since better timer precision might improve the signal-to-noise ratio of such a statistical attack, but vulnerabilities are essentially binary: they either exist or they don't.

Since the discussion is not about SAB opening up a new vulnerability, but about it possibly improving the statistical success rate of an existing claimed attack, the discussion should cover how many orders of magnitude we are talking about.

Chrome reduced their high-resolution timer resolution to 5 microseconds (see http://src.chromium.org/viewvc/blink/trunk/Source/core/timing/PerformanceBase.cpp?r1=198348&r2=198347&pathrev=198348 ). Why was 5 microseconds chosen? How much did that help against the success rate of a claimed attack?

@Yossioren

Hi Juj,

There's a proof-of-concept for the attack that works pretty well on Firefox 40 Ubuntu/Mac, where the hi-res timer has nanosecond resolution. I'll send it to you privately.

As you said, the attack isn't due to something that's wrong with SAB. The attack stems from the way CPU caches are laid out ever since Sandy Bridge, and is made possible by the availability of high-resolution time measurements. The attack works best if I can measure the difference between an L3 cache miss and an L3 cache hit, which is on the order of tens of nanoseconds, and there are ways of making it work as long as I can string 64 of these misses in a row, which gets me somewhere around 2 microseconds. Now that the timer resolution is 5 us, the attack as described cannot be carried out.

The latest versions of the big browsers don't have any timer that goes below 5 us. If you use SAB to emulate a timer, you can get timing down to the single nanosecond. I'll also send you a port of the Firefox attack that uses SAB and works on Nightly for Ubuntu/Mac/Windows with SAB enabled.

Coming back to the top of my comments - the attack isn't due to something that's wrong with SAB, it's just a "hot potato" that got handed to us from the CPU's designers. It's going to require a lot of creativity to "fix" SAB somehow in order to mitigate it.

Best regards,
Yossi.


@lukewagner

When the timer's precision is limited to 5us resolution in the manner shown above, couldn't the attacker recover the higher precision by observing the precise "edge" of the clock (when it jumps forward 5us)?

@lars-t-hansen
Collaborator

Copying over a comment from Waldemar Horwat on es-discuss. The paper he references has not been mentioned earlier on this thread.

I was asked to share my concerns about how bad this can be. Here's a paper demonstrating how one AWS virtual machine has been able to practically break 2048-bit RSA by snooping into a different virtual machine using the same kind of shared cache timing attack. These were both running on unmodified public AWS, and much of the challenge was figuring out when the attacker was co-located with the victim since AWS runs a lot of other users' stuff. This attack would be far easier in shared-memory ECMAScript, where you have a much better idea of what else is running on the browser and the machine (at least in part because you can trigger it via other APIs).

https://eprint.iacr.org/2015/898.pdf

Chrome currently mitigates this by limiting the resolution of timers to 1µs. With any kind of shared memory multicore you can run busy-loops to increase the attack timing surface by 3½ orders of magnitude to about 0.3ns, making these attacks eminently practical.

Waldemar

@mseaborn
Author

lukewagner wrote:

When the timer's precision is limited to 5us resolution in the manner shown above, couldn't the attacker recover the higher precision by observing the precise "edge" of the clock (when it jumps forward 5us)?

I suppose you mean something like the following? We can time operations by counting the number of loop operations between jumps in the clock's value ("ticks").

// get_time() is shorthand for a coarse clock such as performance.now().
function get_time() { return performance.now(); }

// Returns number of loop iterations before the clock ticks.
function wait_for_tick() {
  var t0 = get_time();
  var count = 0;
  for (;;) {
    if (get_time() != t0)
      return count;
    count++;
  }
}

// Returns how long op_func() takes, in fractional clock ticks.
function time_op(op_func) {
  // Calibrate.
  wait_for_tick();
  var iters_per_tick = wait_for_tick();

  // Run op_func() at the start of a clock tick.
  op_func();

  // Measure time taken for the remaining clock tick.
  // (Assumes that op_func() took <1 clock tick, but this assumption
  // could be fixed.)
  var iter_count = wait_for_tick();
  return (iters_per_tick - iter_count) / iters_per_tick;
}

Yossef, has this approach been considered in the literature on cache attacks? It seems semi-obvious, but it didn't seem to come up when the lower-clock-resolution mitigations were added to browsers.

A mitigation for this is presumably to add some jitter to the clock. How effective would that be?

@lukewagner

@mseaborn Exactly. Adding jitter seems potentially quite frustrating to users.

@Yossioren

Hi guys,

That's a nice idea Mark, which I haven't seen discussed. It lets you set up a busy-wait timer in a single-threaded application. This trick will certainly bring up the effective timer resolution, but it's not clear by how much, at least in the JS environment. You see, performance.now() (which you abbreviated as get_time()) takes around 50ns to execute on its own. You will definitely not be able to tell apart cache hits and misses (which are about 20ns apart), but the millisecond-resolution attacks might work again. I should try implementing this in JS to see what sort of resolution it gets in practice.

As for countermeasures, as long as the clock stays monotonically increasing and the jitter isn't too crazy (1us to each side?) users won't be seriously impacted. The question is how easy would it be to circumvent.


@lars-t-hansen
Collaborator

For the record (courtesy Brendan), an older timing attack via SVG and requestAnimationFrame:
http://www.theregister.co.uk/2013/08/05/html5_timing_attacks/.

@Yossioren

OK Mark, I implemented your idea in JS, here's the codepen:
http://codepen.io/yossioren/pen/XmpZMZ?editors=001

(just the output: http://s.codepen.io/yossioren/debug/XmpZMZ? )

The output of this page is a histogram -- the X axis is ticks, and the Y axis is how often iters_per_tick is equal to this specific value.

On my test machine you can fit only around 10 busy-wait ticks into one 5us system tick, and the jitter is terrible. See for yourself:
[image: Inline image 1]


@jfbastien
Contributor

This type of infoleak has existed for a while in Chrome using PNaCl's atomics, is available in all browsers through Flash's shareable byte array, and is possible in JavaScript on x86 using denormals.

These are just a few examples: it seems like very high precision timing information leaks are an inevitability, and SAB adds a redundant attack surface.

It would be worth exploring the following in separate issues:

  • How can one write proper timing-independent code (e.g. to avoid leaking keys)?
  • How does SAB affect the web API surface area? Most APIs weren't designed to be called from multiple threads, and have led to exploits in the past e.g. pwn2own 2015.

@mseaborn
Author

mseaborn commented Oct 8, 2015

One of the concerns about allowing high-res timers via shared memory is that browsers wouldn't be able to disable this functionality if the resulting info leaks turn out to be more serious than we'd originally thought. Once web apps start using shared memory, disabling it would break the web. Browsers can easily change the timing resolution of performance.now(), but they can't do the same with the timing resolution of shared memory because it's a more fundamental property of how sharing memory between CPUs works.

Here's a possible answer to that concern:

A simple mitigation is to pin all the relevant threads to the same CPU. (i.e. All threads that can concurrently access a Shared Array Buffer.) This should prevent the SAB from being used to construct a high-res timer, because there would be no fine-grained interleaving of the threads' instructions.

This has the benefit that it's very simple to implement, e.g. using pthread_setaffinity_np() on Linux.

Obviously this takes away the performance benefit of using multiple cores/hyperthreads. But it does not have a performance impact beyond that: It does not slow down individual threads. It does not require interposing on memory accesses or inserting any delays/synchronisations (unlike Dthreads, for example). It does not require doing a CPS-style transform (as Emscripten's Emterpreter-Async functionality does today), which would increase power usage.

Browser vendors or users who are particularly privacy conscious might choose to enable this pinning scheme. Otherwise, this pinning scheme gives us an "escape hatch" -- browsers could enable it if timing side channel attacks start to become more widely used.

@lars-t-hansen
Collaborator

@mseaborn, thanks for the suggestion. While I think removing the performance benefit of multiple cores is going to sink the feature - without that benefit, the feature probably does not pay for itself in practice - it is helpful to remember that we have this escape if the timing attack becomes truly problematic and shared memory is the only remaining attack vector.

@jfbastien, I'll open a separate issue for the API surface, it's a big topic in itself.

@littledan
Member

To @jfbastien 's point about this not being a new attack surface: There is actually a way to share an ArrayBuffer between two workers already, and that's through WebAudio. A website can open an AudioWorker which gets an ArrayBuffer sent to it. The ArrayBuffer is not actually transferred to and from it; instead, both JS contexts run at the same time with access to it. I'm not sure if it's possible to get the ArrayBuffer to be visible to more than two threads. The only communication mechanisms available in this context are postMessage and reads/writes to the ArrayBuffer; no atomic or futex instructions are available. AudioWorkers are new, but they just replace ScriptProcessorNode, which I believe has the same issue.

If two threads would be enough to construct this high-precision timer, then SharedArrayBuffer doesn't worsen the attack surface vs what's already there in the web platform.

@Yossioren

Thanks for pointing out this API @littledan. I'll take a look at it and see if it's enough to polyfill a high-resolution timer with. If you want, you can take a look at the CodePen and give your own insights about this possibility:

http://codepen.io/yossioren/pen/LVBNMR


@lars-t-hansen
Collaborator

A couple of links pertaining to related discussions in the TOR community:
https://trac.torproject.org/projects/tor/ticket/1517
https://bugzilla.mozilla.org/show_bug.cgi?id=1217238

@lars-t-hansen
Collaborator

@Yossioren, in reference to your earlier experiment with the idea from @lukewagner and @mseaborn, you write:

"The output of this page is a histogram -- the X axis is ticks, and the Y axis is how often iters_per_tick is equal to this specific value. On my test machine you can fit only around 10 busy-wait ticks into one 5us system tick, and the jitter is terrible."

I think it is possible to do better here. Specifically, I think it should be possible to use a loop that does not have to call performance.now(), thus yielding a loop count with better precision. The way this would work is that we would first search for iteration counts that trigger changes to the value read from performance.now() after performing a known-fast or known-slow operation. To use this, we would then perform the operation to be measured, iterate a set number of times without reading the clock, and then read performance.now() to determine whether it has changed. On that basis we can conclude whether the operation was the fast operation (the reading should not have changed) or the slow operation (it should have changed).

I have a proof of concept of this in the repository https://github.com/lars-t-hansen/ticktock:

The program in fib.html demonstrates the likely running time of doubly-recursive fibonacci(10) on the system under test; I see times of 500ns on an i7 MacBook Pro ("late 2013") and 1000ns on an older AMD FX4100 system in current versions of Firefox Developer Edition, which appears to use a 5us resolution for performance.now().

The program in granularity.html implements the algorithm above and is able to distinguish between fib(1) and fib(10) with what appears to me to be high reliability. The code is a little elaborate because it attempts to warm up the JIT properly and to avoid environmental effects such as loop warmup. But I do think it demonstrates at least the plausibility of the approach.

On my systems, there are a few wrong guesses, usually less than 5%; tweaking the cutoff has helped here but the tweaking carries over to the other system.

If, as you wrote somewhere (probably in your paper), you can amplify the LLC miss cost up to about 1us, then this type of clock may be used to implement the attack in your paper.

@lars-t-hansen
Collaborator

This is the Tor project thread that tracks the same issue as the present thread:
https://trac.torproject.org/projects/tor/ticket/17412

@jfbastien
Contributor

Two new papers on the topic:
Flush+Flush: A Stealthier Last-Level Cache Attack

Research on cache attacks has shown that CPU caches leak significant information. Recent attacks either use the Flush+Reload technique on read-only shared memory or the Prime+Probe technique without shared memory, to derive encryption keys or eavesdrop on user input. Efficient countermeasures against these powerful attacks that do not cause a loss of performance are a challenge. In this paper, we use hardware performance counters as a means to detect access-based cache attacks. Indeed, existing attacks cause numerous cache references and cache misses and can subsequently be detected. We propose a new criteria that uses these events for ad-hoc detection.
These findings motivate the development of a novel attack technique: the Flush+Flush attack. The Flush+Flush attack only relies on the execution time of the flush instruction, that depends on whether the data is cached or not. Like Flush+Reload, it monitors when a process loads read-only shared memory into the CPU cache. However, Flush+Flush does not have a reload step, thus causing no cache misses compared to typical Flush+Reload and Prime+Probe attacks. We show that the significantly lower impact on the hardware performance counters therefore evades detection mechanisms. The Flush+Flush attack has a performance close to state-of-the-art side channels in existing cache attack scenarios, while reducing cache misses significantly below the border of detectability. Our Flush+Flush covert channel achieves a transmission rate of 496KB/s which is 6.7 times faster than any previously published cache covert channel. To the best of our knowledge, this is the first work discussing the stealthiness of cache attacks both from the attacker and the defender perspective.

ARMageddon: Last-Level Cache Attacks on Mobile Devices

In the last 10 years cache attacks on Intel CPUs have gained increasing attention among the scientific community. More specifically, powerful techniques to exploit the cache side channel have been developed. However, so far only a few investigations have been performed on modern smartphones and mobile devices in general. In this work, we describe Evict+Reload, the first access-based cross-core cache attack on modern ARM Cortex-A architectures as used in most of today's mobile devices. Our attack approach overcomes several limitations of existing cache attacks on ARM-based devices, for instance, the requirement of a rooted device or specific permissions. Thereby, we broaden the scope of cache attacks in two dimensions. First, we show that all existing attacks on the x86 architecture can also be applied to mobile devices. Second, despite the general belief these attacks can also be launched on non-rooted devices and, thus, on millions of off-the-shelf devices.
Similarly to the well-known Flush+Reload attack for the x86 architecture, Evict+Reload allows to launch generic cache attacks on mobile devices. Based on cache template attacks we identify information leaking through the last-level cache that can be exploited, for instance, to infer tap and swipe events, inter-keystroke timings as well as the length of words entered on the touchscreen, and even cryptographic primitives implemented in Java. Furthermore, we demonstrate the applicability of Prime+Probe attacks on ARM Cortex-A CPUs. The performed example attacks demonstrate the immense potential of our proposed attack techniques.

@lars-t-hansen
Collaborator

@littledan, re AudioWorker, I talked to an engineer here who worked on that and he told me that that hole has been closed because it could be used to crash the browser. The scenario he outlined was this: You have your main thread, your AudioWorker, and another Web Worker. You share the buffer between the main thread and the AudioWorker. Then you neuter the buffer by transferring it to the Web Worker, leaving the AudioWorker pointing to garbage. My understanding is that the AudioWorker spec has been updated. Looking at http://www.w3.org/TR/webaudio/, and searching for "acquire the content operation" [sic], seems to back this up.

(Even without that fix I was told that the shared buffer could not be used for this attack, given the limited nature of the computation performed in the AudioWorker.)

@Yossioren

Lars, that's fascinating. Do you think your colleague would be able to provide a proof of concept, even in sketch form, of this crash scenario?

Kol tuv,
Yossi.


@taisel

taisel commented Dec 11, 2015

I was also under the impression that AudioWorker use cases were why allowing futex blocking on the main thread was necessary. To make audio workers (more) useful, critical sections may be required for access shared between the main thread and a worker, namely for something like streaming PCM audio.

@lars-t-hansen
Collaborator

A writeup summarizing both what's in the discussion above and what's happened in discussions elsewhere (subject to updating but fairly stable):
https://github.com/tc39/ecmascript_sharedmem/blob/master/issues/TimingAttack.md

Edit 2016/3/1: corrected the link.

@lars-t-hansen
Collaborator

Just recording a finding.

Earlier @mseaborn made this suggestion: "A simple mitigation is to pin all the relevant threads to the same CPU. (i.e. All threads that can concurrently access a Shared Array Buffer.) This should prevent the SAB from being used to construct a high-res timer, because there would be no fine-grained interleaving of the threads' instructions."

This seems like a fine mitigation to me but it's hampered (as far as the standardization work is concerned) by there not being a reliable thread pinning API on all major platforms. Notably, Mac OS X has only an advisory API, and does not provide pthread_setaffinity_np(). Presumably other platforms could be affected too (unknown).

@lars-t-hansen
Collaborator

This is considered largely resolved (January 2016 TC39 meeting). Waldemar is still concerned - considers this a blocker - but much of the rest of the committee seems convinced by the argument that (a) this is not a new capability on the web (Flash, Java, PNaCl, native extensions) and (b) wasm will open the problem regardless and (c) this type of info-leak should be closed by addressing the info-leak, not closing down the timer. Additionally, Google's security team does not believe the bug is exploitable in any significant way (I'm paraphrasing a statement read at the meeting, please treat as such).

@lars-t-hansen lars-t-hansen modified the milestones: Stage 3, Stage 2 Jan 27, 2016
@lars-t-hansen
Collaborator

Argh, did not mean to close this, but to move to Stage 3.

@lars-t-hansen lars-t-hansen reopened this Jan 27, 2016
@ekr

ekr commented Mar 1, 2016

@lars-t-hansen
Collaborator

@ekr, I will fix the link. Thanks.

@lars-t-hansen
Collaborator

@taisel

taisel commented Mar 29, 2016

@lars-t-hansen
Would it be possible for the main and/or a worker thread to request entering and leaving a heightened-security mode, to avoid leaks when you want to? From what I'm seeing it'd enable/disable the affinity locking. I'm presuming this from the context that a web app that wants to actually be secure would opt into this performance degradation.

navigator.threadSecurity = true/false

Edit: This would be advisory and platform dependent. It would only take one context with the value true to enable the request, with the "false" valued ones being ignored.

@lars-t-hansen
Collaborator

@taisel,

Would it be possible for main and/or worker thread to request the entrance and leaving of heightened security ...

In principle I think so, if the platform cooperates.

... to avoid leaks when you want to avoid such?

No, see below.

This would be advisory and platform dependent. It would only take one context with the value true to enable the request, with the "false" valued ones being ignored.

The API is not addressing the problem. The problem is that any process on the system at all, including the browser itself, is potentially vulnerable to cache sniffing from a loaded web page. The attacker would use two workers, one to carry out the attack and one to provide a clock signal. The attacker has no incentive to disable true parallelism, it needs it. The victim might not be a web page at all, it could be the SSL implementation in the browser or it could be outside the browser, and it can't just disable parallelism for the web pages either.

@taisel

taisel commented Mar 29, 2016

Wouldn't that be an OS API issue then, and not the responsibility of the VM? As long as the VM does its "part" on security I don't see an issue. If the OS vendors lack an ability to mitigate this then relevant bugs should be filed into their bug reporting systems.

I'm against capping parallelism. When I say "part" I mean having the advisory API be a NOP unless the OS provides actual mitigations.

@lars-t-hansen
Collaborator

@taisel, The argument is that if you have an installed hardware base (measured in the hundreds of millions of units) that is vulnerable to cache sniffing attacks, and you are the provider of a software platform that allows essentially arbitrary code to be run without user intervention and hence an attack to be run everywhere (through an ad, say), then you can't just wash your hands by saying it's the hardware's fault (or the operating system's, though in this case I don't think it is).

@lars-t-hansen
Collaborator

@lukewagner dug up a new paper on a possible mitigation based on Intel's cache allocation technology (CAT): https://ssrg.nicta.com.au/publications/nictaabstracts/8984.pdf:

"This paper shows how such LLC side channel attacks can be defeated using a performance optimization feature recently introduced in commodity processors. Since most cloud servers use Intel processors, we show how the Intel Cache Allocation Technology (CAT) can be used to provide a system-level protection mechanism to defend from side channel attacks on the shared LLC. CAT is a way-based hardware cache-partitioning mechanism for enforcing quality-of-service with respect to LLC occupancy. However, it cannot be directly used to defeat cache side channel attacks due to the very limited number of partitions it provides. We present CATalyst, a pseudo-locking mechanism which uses CAT to partition the LLC into a hybrid hardware-software managed cache. We implement a proof-of-concept system using Xen and Linux running on a server with Intel processors, and show that LLC side channel attacks can be defeated. Furthermore, CATalyst only causes very small performance overhead when used for security, and has negligible impact on legacy applications."

Not a panacea: CAT is Xeon-only and this needs OS support.

@lars-t-hansen
Collaborator

Since this is really a fact of life and browsers have started to ship this (https://groups.google.com/a/chromium.org/forum/#!msg/blink-dev/dnzvgTswfbc/AFIUge2oDQAJ) I will close the bug.

@taisel

taisel commented Feb 1, 2018

@lars-t-hansen Interesting note: Linus Torvalds agrees with you. IIRC Intel recently tried to roll some kind of mitigation in as a feature - https://www.theregister.co.uk/2018/01/22/intel_spectre_fix_linux/

Seems this exact conversation is happening years later.

I have no words now...

@denji

denji commented Feb 4, 2018

I have no words now...

AMD Zen 2!
