Commit 00a5ae2

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 page table isolation fixes from Thomas Gleixner:
 "A couple of urgent fixes for PTI:

   - Fix a PTE mismatch between the user and kernel visible mappings of
     the cpu entry area (they differ in the GLB bit), which causes a TLB
     mismatch MCE on older AMD K8 machines

   - Fix the misplaced CR3 switch in the SYSCALL compat entry code which
     causes access to unmapped kernel memory resulting in double faults.

   - Fix the section mismatch of the cpu_tss_rw percpu storage caused by
     using a different mechanism for declaration and definition.

   - Two fixes for dumpstack which help to decode entry stack issues
     better

   - Enable PTI by default in Kconfig. We should have done that earlier,
     but it slipped through the cracks.

   - Exclude AMD from the PTI enforcement. Not necessarily a fix, but if
     AMD is so confident that they are not affected, then we should not
     burden users with the overhead"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/process: Define cpu_tss_rw in same section as declaration
  x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()
  x86/dumpstack: Print registers for first stack frame
  x86/dumpstack: Fix partial register dumps
  x86/pti: Make sure the user/kernel PTEs match
  x86/cpu, x86/pti: Do not enable PTI on AMD processors
  x86/pti: Enable PTI by default
torvalds committed Jan 4, 2018
2 parents d6bbd51 + 2fd9c41 commit 00a5ae2
Showing 8 changed files with 48 additions and 25 deletions.
13 changes: 6 additions & 7 deletions arch/x86/entry/entry_64_compat.S
@@ -190,8 +190,13 @@ ENTRY(entry_SYSCALL_compat)
 	/* Interrupts are off on entry. */
 	swapgs
 
-	/* Stash user ESP and switch to the kernel stack. */
+	/* Stash user ESP */
 	movl	%esp, %r8d
+
+	/* Use %rsp as scratch reg. User ESP is stashed in r8 */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
+	/* Switch to the kernel stack */
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 	/* Construct struct pt_regs on stack */
@@ -219,12 +224,6 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
 	pushq	$0		/* pt_regs->r14 = 0 */
 	pushq	$0		/* pt_regs->r15 = 0 */
 
-	/*
-	 * We just saved %rdi so it is safe to clobber. It is not
-	 * preserved during the C calls inside TRACE_IRQS_OFF anyway.
-	 */
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
-
 	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
 	 * turned them off.
17 changes: 13 additions & 4 deletions arch/x86/include/asm/unwind.h
@@ -56,18 +56,27 @@ void unwind_start(struct unwind_state *state, struct task_struct *task,

 #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
 /*
- * WARNING: The entire pt_regs may not be safe to dereference. In some cases,
- * only the iret frame registers are accessible. Use with caution!
+ * If 'partial' returns true, only the iret frame registers are valid.
  */
-static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state,
+						    bool *partial)
 {
 	if (unwind_done(state))
 		return NULL;
 
+	if (partial) {
+#ifdef CONFIG_UNWINDER_ORC
+		*partial = !state->full_regs;
+#else
+		*partial = false;
+#endif
+	}
+
 	return state->regs;
 }
 #else
-static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state,
+						    bool *partial)
 {
 	return NULL;
 }
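For readers following the unwind_get_entry_regs() change above, here is a minimal standalone sketch of how a caller might consume the new 'partial' out-parameter. The struct layouts and values below are stand-ins invented for this illustration, not the kernel's definitions.

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel types, only for this sketch. */
struct pt_regs { unsigned long ip, sp; };
struct unwind_state { struct pt_regs *regs; bool full_regs; };

/* Same shape as the new API: also report whether only the iret frame is valid. */
static struct pt_regs *get_entry_regs(struct unwind_state *state, bool *partial)
{
	if (!state->regs)
		return NULL;
	if (partial)
		*partial = !state->full_regs;
	return state->regs;
}

int main(void)
{
	struct pt_regs r = { .ip = 0x1000, .sp = 0x2000 };
	struct unwind_state st = { .regs = &r, .full_regs = false };
	bool partial;
	struct pt_regs *regs = get_entry_regs(&st, &partial);

	if (regs && partial)
		printf("only the iret frame is valid: ip=%#lx sp=%#lx\n", regs->ip, regs->sp);
	else if (regs)
		printf("full pt_regs are valid\n");
	return 0;
}

Callers that do not care about partial frames can pass NULL, as the stacktrace.c hunk further down does.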
4 changes: 2 additions & 2 deletions arch/x86/kernel/cpu/common.c
@@ -923,8 +923,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
 
-	/* Assume for now that ALL x86 CPUs are insecure */
-	setup_force_cpu_bug(X86_BUG_CPU_INSECURE);
+	if (c->x86_vendor != X86_VENDOR_AMD)
+		setup_force_cpu_bug(X86_BUG_CPU_INSECURE);
 
 	fpu__init_system(c);
 

Comments on the added "if (c->x86_vendor != X86_VENDOR_AMD)" line:

@JorgenPhi (Jan 4, 2018): Nice job
@luxin88 (Jan 4, 2018): Nicely done
@mrgaolei (Jan 4, 2018): Leaving my name here before this blows up
@ry4ng (Jan 4, 2018): Great stuff
@duanqn (Jan 4, 2018): No speed loss on AMD CPUs?
@DesmondFox (Jan 4, 2018): @duanqn Oh! PTI was already disabled for AMD CPUs in 694d99d40972f12e59a3696effee8a376b79d7c8. I did not notice :)
@jiaolu (Jan 4, 2018): First! First!
@linuxwind (Jan 4, 2018): good job!
@gnmmarechal (Jan 4, 2018): Neat
@shminer (Jan 4, 2018): Leaving my name here; now waiting for the performance drop
@KonnoYuuki (Jan 4, 2018): Leaving my name here before this blows up
@nicolasFlinois (Jan 4, 2018): good job
@firebroo (Jan 5, 2018): WINNER WINNER, CHICKEN DINNER!
@nx4dm1n (Jan 5, 2018): WINNER WINNER, CHICKEN DINNER!
@kswapd (Jan 5, 2018): good!
@Arvin-chen (Jan 5, 2018): nice
@jbelke (Jan 8, 2018): Thanks!

Comments on the setup_force_cpu_bug(X86_BUG_CPU_INSECURE) line:

@zhaoming-mike (Jan 4, 2018): good!
@dev-0x7C6 (Jan 4, 2018): X86_"BUG"_OR_"FLAW"_CPU_INSECURE would sound better
31 changes: 22 additions & 9 deletions arch/x86/kernel/dumpstack.c
@@ -76,12 +76,23 @@ void show_iret_regs(struct pt_regs *regs)
 	       regs->sp, regs->flags);
 }
 
-static void show_regs_safe(struct stack_info *info, struct pt_regs *regs)
+static void show_regs_if_on_stack(struct stack_info *info, struct pt_regs *regs,
+				  bool partial)
 {
-	if (on_stack(info, regs, sizeof(*regs)))
+	/*
+	 * These on_stack() checks aren't strictly necessary: the unwind code
+	 * has already validated the 'regs' pointer. The checks are done for
+	 * ordering reasons: if the registers are on the next stack, we don't
+	 * want to print them out yet. Otherwise they'll be shown as part of
+	 * the wrong stack. Later, when show_trace_log_lvl() switches to the
+	 * next stack, this function will be called again with the same regs so
+	 * they can be printed in the right context.
+	 */
+	if (!partial && on_stack(info, regs, sizeof(*regs))) {
 		__show_regs(regs, 0);
-	else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET,
-			  IRET_FRAME_SIZE)) {
+
+	} else if (partial && on_stack(info, (void *)regs + IRET_FRAME_OFFSET,
+				       IRET_FRAME_SIZE)) {
 		/*
 		 * When an interrupt or exception occurs in entry code, the
 		 * full pt_regs might not have been saved yet. In that case
@@ -98,11 +109,13 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	struct stack_info stack_info = {0};
 	unsigned long visit_mask = 0;
 	int graph_idx = 0;
+	bool partial;
 
 	printk("%sCall Trace:\n", log_lvl);
 
 	unwind_start(&state, task, regs, stack);
 	stack = stack ? : get_stack_pointer(task, regs);
+	regs = unwind_get_entry_regs(&state, &partial);
 
 	/*
 	 * Iterate through the stacks, starting with the current stack pointer.
@@ -120,7 +133,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	 * - hardirq stack
 	 * - entry stack
 	 */
-	for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+	for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
 		const char *stack_name;
 
 		if (get_stack_info(stack, task, &stack_info, &visit_mask)) {
@@ -140,7 +153,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
printk("%s <%s>\n", log_lvl, stack_name);

if (regs)
show_regs_safe(&stack_info, regs);
show_regs_if_on_stack(&stack_info, regs, partial);

/*
* Scan the stack, printing any text addresses we find. At the
@@ -164,7 +177,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,

 			/*
 			 * Don't print regs->ip again if it was already printed
-			 * by show_regs_safe() below.
+			 * by show_regs_if_on_stack().
 			 */
 			if (regs && stack == &regs->ip)
 				goto next;
@@ -199,9 +212,9 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unwind_next_frame(&state);
 
 			/* if the frame has entry regs, print them */
-			regs = unwind_get_entry_regs(&state);
+			regs = unwind_get_entry_regs(&state, &partial);
 			if (regs)
-				show_regs_safe(&stack_info, regs);
+				show_regs_if_on_stack(&stack_info, regs, partial);
 		}
 
 		if (stack_name)
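The comment added in show_regs_if_on_stack() leans on on_stack() deciding whether a saved register block lies within the stack currently being dumped. Below is a rough standalone sketch of that kind of bounds check; the struct layout is an assumption for illustration, not the kernel's stack_info.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's stack_info: just the stack bounds. */
struct stack_info {
	void *begin;
	void *end;
};

/* Does the object [addr, addr + len) lie entirely within this stack? */
static bool on_stack(struct stack_info *info, void *addr, size_t len)
{
	char *p = addr;

	return (char *)info->begin <= p && p + len <= (char *)info->end;
}

int main(void)
{
	char stack[256];
	struct stack_info info = { .begin = stack, .end = stack + sizeof(stack) };

	printf("%d\n", on_stack(&info, stack + 200, 40));   /* 1: fits */
	printf("%d\n", on_stack(&info, stack + 240, 40));   /* 0: runs past the end */
	return 0;
}

The real helper also rejects unknown stack types; this sketch only shows the range comparison that the ordering argument in the new comment relies on.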
2 changes: 1 addition & 1 deletion arch/x86/kernel/process.c
@@ -47,7 +47,7 @@
  * section. Since TSS's are completely CPU-local, we want them
  * on exact cacheline boundaries, to eliminate cacheline ping-pong.
  */
-__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss_rw) = {
+__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
 	.x86_tss = {
 		/*
 		 * .sp0 is only used when entering ring 0 from a lower
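This one-liner pairs the cpu_tss_rw definition with the same placement as its declaration, which fixes the section mismatch mentioned in the pull message. A generic, non-kernel illustration of the idea follows; the section name and alignment are placeholders for this sketch (not the kernel's percpu sections), and the point is only that a declaration and its definition must agree on placement attributes.

#include <stdio.h>

/* "Header": the declaration promises a page-aligned object in a named section. */
extern char shared_buf[4096]
	__attribute__((section(".data.page_aligned"), aligned(4096)));

/* "C file": the definition repeats the same attributes. Defining it with a
 * different section or alignment is the same kind of declaration/definition
 * mismatch the commit above resolves for cpu_tss_rw. */
char shared_buf[4096]
	__attribute__((section(".data.page_aligned"), aligned(4096)));

int main(void)
{
	printf("shared_buf at %p, page offset %lu\n",
	       (void *)shared_buf, (unsigned long)shared_buf % 4096);
	return 0;
}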
2 changes: 1 addition & 1 deletion arch/x86/kernel/stacktrace.c
@@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace,
 	for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
 	     unwind_next_frame(&state)) {
 
-		regs = unwind_get_entry_regs(&state);
+		regs = unwind_get_entry_regs(&state, NULL);
 		if (regs) {
 			/*
 			 * Kernel mode registers on the stack indicate an
3 changes: 2 additions & 1 deletion arch/x86/mm/pti.c
@@ -367,7 +367,8 @@ static void __init pti_setup_espfix64(void)
 static void __init pti_clone_entry_text(void)
 {
 	pti_clone_pmds((unsigned long) __entry_text_start,
-		       (unsigned long) __irqentry_text_end, _PAGE_RW);
+		       (unsigned long) __irqentry_text_end,
+		       _PAGE_RW | _PAGE_GLOBAL);
 }
 
 /*
1 change: 1 addition & 0 deletions security/Kconfig
@@ -56,6 +56,7 @@ config SECURITY_NETWORK

 config PAGE_TABLE_ISOLATION
 	bool "Remove the kernel mapping in user mode"
+	default y
 	depends on X86_64 && !UML
 	help
 	  This feature reduces the number of hardware side channels by

Comments on the added "default y" line:

@chenyu139 (Jan 4, 2018): This is the one line I actually understood!
@43385607 (Jan 4, 2018): I reviewed it. Nicely written, young man.
@shminer (Jan 4, 2018): Everyone who understood this line, please queue up below.

1 comment on commit 00a5ae2

@parrotgeek1 commented on 00a5ae2 Jan 4, 2018

The following CPUs also aren't affected because they don't do ANY speculation:

"Vortex86 SoC"
"SiS SiS SiS " (also the first Vortex86s)
Pre-2013 Atoms (i.e. up to the N2800 but not the N2807, etc.)
Probably weird old ones like the Rise mP6, etc.
