GIT dda3e15231b35840fe6f0973f803cc70ddb86281

commit b0f2853b56a2acaff19cca2c6a608f8ec268d21a
Author: Christoph Hellwig <[email protected]>
Date:   Wed Jan 17 22:04:38 2018 +0100

    nvme-pci: take sglist coalescing in dma_map_sg into account
    
    Some IOMMU implementations can merge physically and/or virtually
    contiguous segments inside dma_map_sg.  The NVMe SGL support does not take
    this into account and will warn because of falling off a loop.  Pass the
    number of mapped segments to nvme_pci_setup_sgls so that the SGL setup
    can take the number of mapped segments into account.
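
    As a rough illustration of why the mapped count matters, the sketch
    below merges adjacent segments and then walks only the entries that
    remain.  The struct and both helpers are hypothetical stand-ins, not
    the DMA API or the driver's code.

```c
/* Hypothetical segment descriptor; not the kernel's struct scatterlist. */
struct seg {
	unsigned long addr;
	unsigned int len;
};

/*
 * Stand-in for a mapping step that merges contiguous segments in place.
 * Returns the number of entries left after merging, which may be smaller
 * than nents.
 */
static int map_and_coalesce(struct seg *sg, int nents)
{
	int out = 0;

	for (int i = 1; i < nents; i++) {
		if (sg[out].addr + sg[out].len == sg[i].addr)
			sg[out].len += sg[i].len;	/* merge into previous entry */
		else
			sg[++out] = sg[i];		/* start a new entry */
	}
	return nents ? out + 1 : 0;
}

/* Walk the mapped entries only; using the original nents would overrun. */
static unsigned int total_mapped_len(const struct seg *sg, int mapped)
{
	unsigned int total = 0;

	for (int i = 0; i < mapped; i++)
		total += sg[i].len;
	return total;
}
```

    A consumer that loops over the original segment count would read
    stale entries past the merged ones, so the returned count is the one
    to pass along.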
    
    Reported-by: Fangjian (Turing) <[email protected]>
    Fixes: a7a7cbe3 ("nvme-pci: add SGL support")
    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Keith Busch <[email protected]>
    Reviewed-by: Sagi Grimberg <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>

commit 20469a37aed12a886d0deda5a07c04037923144a
Author: Keith Busch <[email protected]>
Date:   Wed Jan 17 22:04:37 2018 +0100

    nvme-pci: check segment valid for SGL use
    
    The driver needs to verify there is a payload with a command before
    seeing if it should use SGLs to map it.
    
    Fixes: 955b1b5a00ba ("nvme-pci: move use_sgl initialization to nvme_init_iod()")
    Reported-by: Paul Menzel <[email protected]>
    Reviewed-by: Paul Menzel <[email protected]>
    Signed-off-by: Keith Busch <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>

commit 091f02483df7b56615b524491f404e574c5e0668
Author: Russell King <[email protected]>
Date:   Sat Jan 13 12:11:26 2018 +0000

    ARM: net: bpf: clarify tail_call index
    
    As per 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT"), the index used
    for array lookup is defined to be 32-bit wide. Update a misleading
    comment that suggests it is 64-bit wide.
    
    Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
    Signed-off-by: Russell King <[email protected]>

commit ec19e02b343db991d2d1610c409efefebf4e2ca9
Author: Russell King <[email protected]>
Date:   Sat Jan 13 21:06:16 2018 +0000

    ARM: net: bpf: fix LDX instructions
    
    When the source and destination register are identical, our JIT does not
    generate correct code, which leads to kernel oopses.
    
    Fix this by (a) generating more efficient code, and (b) making use of
    the temporary earlier if we will overwrite the address register.
    
    Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
    Signed-off-by: Russell King <[email protected]>

commit 02088d9b392f605c892894b46aa8c83e3abd0115
Author: Russell King <[email protected]>
Date:   Sat Jan 13 22:38:18 2018 +0000

    ARM: net: bpf: fix register saving
    
    When an eBPF program tail-calls another eBPF program, it enters it after
    the prologue to avoid having complex stack manipulations.  This can lead
    to kernel oopses, and similar.
    
    Resolve this by always using a fixed stack layout, a CPU register frame
    pointer, and using this when reloading registers before returning.
    
    Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
    Signed-off-by: Russell King <[email protected]>

commit 0005e55a79cfda88199e41a406a829c88d708c67
Author: Russell King <[email protected]>
Date:   Sat Jan 13 22:51:27 2018 +0000

    ARM: net: bpf: correct stack layout documentation
    
    The stack layout documentation incorrectly suggests that the BPF JIT
    scratch space starts immediately below BPF_FP. This is not correct,
    so let's fix the documentation to reflect reality.
    
    Signed-off-by: Russell King <[email protected]>

commit 70ec3a6c2c11e4b0e107a65de943a082f9aff351
Author: Russell King <[email protected]>
Date:   Sat Jan 13 21:26:14 2018 +0000

    ARM: net: bpf: move stack documentation
    
    Move the stack documentation towards the top of the file, where it's
    relevant for things like the register layout.
    
    Signed-off-by: Russell King <[email protected]>

commit d1220efd23484c72c82d5471f05daeb35b5d1916
Author: Russell King <[email protected]>
Date:   Sat Jan 13 16:10:07 2018 +0000

    ARM: net: bpf: fix stack alignment
    
    As per 2dede2d8e925 ("ARM EABI: stack pointer must be 64-bit aligned
    after a CPU exception") the stack should be aligned to a 64-bit boundary
    on EABI systems.  Ensure that the eBPF JIT appropriately aligns the
    stack.
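
    For illustration, rounding a frame size up to a 64-bit boundary can
    be sketched as follows.  The macro and function names are made up
    for this example and are not taken from the JIT.

```c
#include <stdint.h>

#define STACK_ALIGNMENT 8u	/* 64-bit boundary, in bytes */

/* Round size up to the next multiple of STACK_ALIGNMENT. */
static uint32_t align_stack_size(uint32_t size)
{
	return (size + STACK_ALIGNMENT - 1u) & ~(STACK_ALIGNMENT - 1u);
}
```

    Sizes that are already a multiple of 8 are returned unchanged.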
    
    Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
    Signed-off-by: Russell King <[email protected]>

commit f4483f2cc1fdc03488c8a1452e545545ae5bda93
Author: Russell King <[email protected]>
Date:   Sat Jan 13 11:39:54 2018 +0000

    ARM: net: bpf: fix tail call jumps
    
    When a tail call fails, it is documented that the tail call should
    continue execution at the following instruction.  An example tail call
    sequence is:
    
      12: (85) call bpf_tail_call#12
      13: (b7) r0 = 0
      14: (95) exit
    
    The ARM assembler for the tail call in this case ends up branching to
    instruction 14 instead of instruction 13, resulting in the BPF filter
    returning a non-zero value:
    
      178:  ldr     r8, [sp, #588]  ; insn 12
      17c:  ldr     r6, [r8, r6]
      180:  ldr     r8, [sp, #580]
      184:  cmp     r8, r6
      188:  bcs     0x1e8
      18c:  ldr     r6, [sp, #524]
      190:  ldr     r7, [sp, #528]
      194:  cmp     r7, #0
      198:  cmpeq   r6, #32
      19c:  bhi     0x1e8
      1a0:  adds    r6, r6, #1
      1a4:  adc     r7, r7, #0
      1a8:  str     r6, [sp, #524]
      1ac:  str     r7, [sp, #528]
      1b0:  mov     r6, #104
      1b4:  ldr     r8, [sp, #588]
      1b8:  add     r6, r8, r6
      1bc:  ldr     r8, [sp, #580]
      1c0:  lsl     r7, r8, #2
      1c4:  ldr     r6, [r6, r7]
      1c8:  cmp     r6, #0
      1cc:  beq     0x1e8
      1d0:  mov     r8, #32
      1d4:  ldr     r6, [r6, r8]
      1d8:  add     r6, r6, #44
      1dc:  bx      r6
      1e0:  mov     r0, #0          ; insn 13
      1e4:  mov     r1, #0
      1e8:  add     sp, sp, #596    ; insn 14
      1ec:  pop     {r4, r5, r6, r7, r8, sl, pc}
    
    For other sequences, the tail call could end up branching midway through
    the following BPF instructions, or maybe off the end of the function,
    leading to unknown behaviours.
    
    Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
    Signed-off-by: Russell King <[email protected]>

commit e9062481824384f00299971f923fecf6b3668001
Author: Russell King <[email protected]>
Date:   Sat Jan 13 11:35:15 2018 +0000

    ARM: net: bpf: avoid 'bx' instruction on non-Thumb capable CPUs
    
    Avoid the 'bx' instruction on CPUs that have no support for Thumb and
    thus do not implement this instruction by moving the generation of this
    opcode to a separate function that selects between:
    
            bx      reg
    
    and
    
            mov     pc, reg
    
    according to the capabilities of the CPU.
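
    A minimal sketch of such a selection helper is shown below.  The
    function name and the cpu_has_thumb parameter are assumptions made
    for this example; only the two A32 encodings (with the "always"
    condition) are standard.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARM_INST_BX	0xe12fff10u	/* bx  Rm     */
#define ARM_INST_MOV_PC	0xe1a0f000u	/* mov pc, Rm */

/*
 * Hypothetical helper: return the opcode for an indirect branch through
 * register reg (0..15).  cpu_has_thumb stands in for whatever capability
 * check the real JIT performs.
 */
static uint32_t branch_reg_opcode(unsigned int reg, bool cpu_has_thumb)
{
	uint32_t base = cpu_has_thumb ? ARM_INST_BX : ARM_INST_MOV_PC;

	return base | (reg & 0xfu);	/* Rm lives in the low four bits */
}
```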
    
    Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
    Signed-off-by: Russell King <[email protected]>

commit 45d55e7bac4028af93f5fa324e69958a0b868e96
Author: Thomas Gleixner <[email protected]>
Date:   Tue Jan 16 12:20:18 2018 +0100

    x86/apic/vector: Fix off by one in error path
    
    Keith reported the following warning:
    
    WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
      x86_vector_free_irqs+0xa1/0x180
      x86_vector_alloc_irqs+0x1e4/0x3a0
      msi_domain_alloc+0x62/0x130
    
    The reason for this is that if the vector allocation fails the error
    handling code tries to free the failed vector as well, which causes the
    above imbalance warning to trigger.
    
    Adjust the error path to handle this correctly.
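
    The general shape of the problem can be sketched as below.  This is
    not the x86 vector code; the slot array and helpers are hypothetical
    and only show an unwind that skips the entry whose allocation
    failed.

```c
#include <stdbool.h>

#define NR_SLOTS 8

static bool slot_used[NR_SLOTS];
static int bogus_frees;		/* counts frees of never-allocated slots */

/* Allocate slot i, or fail if i == fail_at (to simulate an error). */
static int alloc_slot(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	slot_used[i] = true;
	return 0;
}

static void free_slot(int i)
{
	if (!slot_used[i])
		bogus_frees++;	/* this is the imbalance to avoid */
	slot_used[i] = false;
}

static int alloc_slots(int nr, int fail_at)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (alloc_slot(i, fail_at) < 0)
			goto error;
	}
	return 0;

error:
	/* Slot i was never allocated: unwind [0, i) only. */
	while (--i >= 0)
		free_slot(i);
	return -1;
}
```

    An unwind loop that also covered index i would free an entry that
    was never set up, which is the kind of imbalance the warning
    reports.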
    
    Fixes: b5dc8e6c21e7 ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
    Reported-by: Keith Busch <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Tested-by: Keith Busch <[email protected]>
    Cc: [email protected]
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos

commit d47924417319e3b6a728c0b690f183e75bc2a702
Author: Thomas Gleixner <[email protected]>
Date:   Tue Jan 16 19:59:59 2018 +0100

    x86/intel_rdt/cqm: Prevent use after free
    
    intel_rdt_offline_cpu() -> domain_remove_cpu() frees memory first and then
    proceeds accessing it.
    
     BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
     Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
     find_first_bit+0x1f/0x80
     has_busy_rmid+0x47/0x70
     intel_rdt_offline_cpu+0x4b4/0x510
    
     Freed by task 195:
     kfree+0x94/0x1a0
     intel_rdt_offline_cpu+0x17d/0x510
    
    Do the teardown first and then free memory.
    
    Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
    Reported-by: Joseph Salisbury <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: Ravi Shankar <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Stephane Eranian <[email protected]>
    Cc: Vikas Shivappa <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: "Roderick W. Smith" <[email protected]>
    Cc: [email protected]
    Cc: Fenghua Yu <[email protected]>
    Cc: Tony Luck <[email protected]>
    Cc: [email protected]
    Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos

commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12
Author: Andi Kleen <[email protected]>
Date:   Tue Jan 16 12:52:28 2018 -0800

    module: Add retpoline tag to VERMAGIC
    
    Add a marker for retpoline to the module VERMAGIC. This catches the case
    when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
    making it insecure.
    
    It doesn't handle the case when retpoline has been runtime disabled.  Even
    in this case the match of the retpoline compile status will be enforced.  This
    implies that even with retpoline run time disabled all modules loaded need
    to be recompiled.
    
    Signed-off-by: Andi Kleen <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Greg Kroah-Hartman <[email protected]>
    Acked-by: David Woodhouse <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Cc: [email protected]
    Cc: [email protected]
    Link: https://lkml.kernel.org/r/[email protected]

commit 4fdec2034b7540dda461c6ba33325dfcff345c64
Author: Paolo Bonzini <[email protected]>
Date:   Tue Jan 16 16:42:25 2018 +0100

    x86/cpufeature: Move processor tracing out of scattered features
    
    Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
    so do not duplicate it in the scattered features word.
    
    Besides being more tidy, this will be useful for KVM when it presents
    processor tracing to the guests.  KVM selects host features that are
    supported by both the host kernel (depending on command line options,
    CPU errata, or whatever) and KVM.  Whenever a full feature word exists,
    KVM's code is written in the expectation that the CPUID bit number
    matches the X86_FEATURE_* bit number, but this is not the case for
    X86_FEATURE_INTEL_PT.
    
    Signed-off-by: Paolo Bonzini <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Luwei Kang <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Radim Krčmář <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Cc: [email protected]
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ingo Molnar <[email protected]>

commit 07c7b6a52503ac13ae357a8b3ef3456590a64b65
Author: Linus Walleij <[email protected]>
Date:   Tue Jan 16 09:51:51 2018 +0100

    gpio: mmio: Also read bits that are zero
    
    The code for .get_multiple() has bugs:
    
    1. The simple .get_multiple() just reads a register, masks it
    and sets the return value. This is not correct: we only want to
    assign values (whether 0 or 1) to the bits that are set in the
    mask. Fix this by using &= ~mask to clear all bits in the mask
    and then |= val & mask to set the corresponding bits from the
    read.
    
    2. The bgpio_get_multiple_be() call has a similar problem: it
    uses the |= operator to set the bits, so only the bits in the
    mask are affected, but it fails to clear the bits in the mask
    first, so some bits will be returned erroneously set to 1.
    
    3. The bgpio_get_set_multiple() again fails to clear the bits
    from the mask.
    
    4. find_next_bit() wasn't handled correctly, use a totally
    different approach for one function and change the other
    function to follow the design pattern of assigning the first
    bit to -1, then use bit + 1 in the for loop and < num_iterations
    as break condition.
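
    The clear-then-set pattern from item 1 can be sketched in isolation
    as follows.  The function below is a simplified stand-in that takes
    the register value as a parameter; it is not the driver's callback.

```c
/*
 * Update only the bits selected by mask: bits outside the mask keep
 * their previous value, bits inside the mask take the value read from
 * the register, whether that is 0 or 1.
 */
static void get_multiple(unsigned long reg_val, unsigned long mask,
			 unsigned long *bits)
{
	*bits &= ~mask;			/* clear every requested bit first */
	*bits |= reg_val & mask;	/* then copy in the bits that were read */
}
```

    Without the clearing step, a requested bit that was already 1 in
    *bits would stay 1 even when the register reads 0.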
    
    Fixes: 80057cb417b2 ("gpio-mmio: Use the new .get_multiple() callback")
    Cc: Bartosz Golaszewski <[email protected]>
    Reported-by: Clemens Gruber <[email protected]>
    Tested-by: Clemens Gruber <[email protected]>
    Reported-by: Lukas Wunner <[email protected]>
    Signed-off-by: Linus Walleij <[email protected]>

commit 81d947e2b8dd2394586c3eaffdd2357797d3bf59
Author: Daniel Borkmann <[email protected]>
Date:   Mon Jan 15 23:12:09 2018 +0100

    net, sched: fix panic when updating miniq {b,q}stats
    
    While working on fixing another bug, I ran into the following panic
    on arm64 by simply attaching clsact qdisc, adding a filter and running
    traffic on ingress to it:
    
      [...]
      [  178.188591] Unable to handle kernel read from unreadable memory at virtual address 810fb501f000
      [  178.197314] Mem abort info:
      [  178.200121]   ESR = 0x96000004
      [  178.203168]   Exception class = DABT (current EL), IL = 32 bits
      [  178.209095]   SET = 0, FnV = 0
      [  178.212157]   EA = 0, S1PTW = 0
      [  178.215288] Data abort info:
      [  178.218175]   ISV = 0, ISS = 0x00000004
      [  178.222019]   CM = 0, WnR = 0
      [  178.224997] user pgtable: 4k pages, 48-bit VAs, pgd = 0000000023cb3f33
      [  178.231531] [0000810fb501f000] *pgd=0000000000000000
      [  178.236508] Internal error: Oops: 96000004 [#1] SMP
      [...]
      [  178.311855] CPU: 73 PID: 2497 Comm: ping Tainted: G        W        4.15.0-rc7+ #5
      [  178.319413] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
      [  178.326887] pstate: 60400005 (nZCv daif +PAN -UAO)
      [  178.331685] pc : __netif_receive_skb_core+0x49c/0xac8
      [  178.336728] lr : __netif_receive_skb+0x28/0x78
      [  178.341161] sp : ffff00002344b750
      [  178.344465] x29: ffff00002344b750 x28: ffff810fbdfd0580
      [  178.349769] x27: 0000000000000000 x26: ffff000009378000
      [...]
      [  178.418715] x1 : 0000000000000054 x0 : 0000000000000000
      [  178.424020] Process ping (pid: 2497, stack limit = 0x000000009f0a3ff4)
      [  178.430537] Call trace:
      [  178.432976]  __netif_receive_skb_core+0x49c/0xac8
      [  178.437670]  __netif_receive_skb+0x28/0x78
      [  178.441757]  process_backlog+0x9c/0x160
      [  178.445584]  net_rx_action+0x2f8/0x3f0
      [...]
    
    Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init()
    which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc.
    Problem is that this cannot work since they are actually set up right after
    the qdisc ->init() callback in qdisc_create(), so first packet going into
    sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we
    therefore panic.
    
    In order to fix this, allocation of {b,q}stats needs to happen before we
    call into ->init(). In net-next, there's already such option through commit
    d59f5ffa59d8 ("net: sched: a dflt qdisc may be used with per cpu stats").
    However, the bug needs to be fixed in net still for 4.15. Thus, include
    these bits to reduce any merge churn and reuse the static_flags field to
    set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since
    there is no other user left. Prashant Bhole ran into the same issue but
    for net-next, thus adding him below as well as co-author. Same issue was
    also reported by Sandipan Das when using bcc.
    
    Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath")
    Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.html
    Reported-by: Sandipan Das <[email protected]>
    Co-authored-by: Prashant Bhole <[email protected]>
    Co-authored-by: John Fastabend <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Cc: Jiri Pirko <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 70eeff66c4696cee4076d6388b6bede5bd7ff71c
Author: Roland Dreier <[email protected]>
Date:   Mon Jan 15 12:24:49 2018 -0800

    qed: Fix potential use-after-free in qed_spq_post()
    
    We need to check if p_ent->comp_mode is QED_SPQ_MODE_EBLOCK before
    calling qed_spq_add_entry().  The test is fine if the mode is EBLOCK,
    but if it isn't then qed_spq_add_entry() might kfree(p_ent).
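
    The pattern of reading a field before a call that may free the
    object can be sketched as below.  The types and functions are
    hypothetical stand-ins, not the qed driver's definitions.

```c
#include <stdbool.h>
#include <stdlib.h>

enum comp_mode { MODE_EBLOCK, MODE_CB };

struct entry {
	enum comp_mode comp_mode;
};

/* Stand-in for a queueing helper that may free the entry it is given. */
static int add_entry(struct entry *ent)
{
	if (ent->comp_mode != MODE_EBLOCK)
		free(ent);
	return 0;
}

/* Returns true if the caller still owns ent and must wait on it. */
static bool post(struct entry *ent)
{
	/* Read the field while ent is still guaranteed to be valid. */
	bool eblock = (ent->comp_mode == MODE_EBLOCK);

	add_entry(ent);

	/* From here on, ent may only be touched if eblock is true. */
	return eblock;
}
```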
    
    Signed-off-by: Roland Dreier <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 0d9c9f0f40ca262b67fc06a702b85f3976f5e1a1
Author: Jakub Kicinski <[email protected]>
Date:   Mon Jan 15 11:47:53 2018 -0800

    nfp: use the correct index for link speed table
    
    The sts variable holds the link speed as well as the state.  We should
    be using ls to index into ls_to_ethtool.
    
    Fixes: 265aeb511bd5 ("nfp: add support for .get_link_ksettings()")
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit a5b1379afbfabf91e3a689e82ac619a7157336b3
Author: Yuiko Oshino <[email protected]>
Date:   Mon Jan 15 13:24:28 2018 -0500

    lan78xx: Fix failure in USB Full Speed
    
    Initialize the uninitialized tx_qlen to an appropriate value when USB
    Full Speed is used.
    
    Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
    Signed-off-by: Yuiko Oshino <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit c5006b8aa74599ce19104b31d322d2ea9ff887cc
Author: Xin Long <[email protected]>
Date:   Mon Jan 15 17:02:00 2018 +0800

    sctp: do not allow the v4 socket to bind a v4mapped v6 address
    
    The check in sctp_sockaddr_af is not robust enough to forbid binding a
    v4mapped v6 addr on a v4 socket.
    
    Worse, the v4 socket's bind_verify would not convert this v4mapped
    v6 addr to a v4 addr. syzbot even reported a crash as the v4
    socket bound a v6 addr.
    
    This patch is to fix it by doing the common sa.sa_family check first,
    then AF_INET check for v4mapped v6 addrs.
    
    Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
    Reported-by: [email protected]
    Acked-by: Neil Horman <[email protected]>
    Signed-off-by: Xin Long <[email protected]>
    Acked-by: Marcelo Ricardo Leitner <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit a0ff660058b88d12625a783ce9e5c1371c87951f
Author: Xin Long <[email protected]>
Date:   Mon Jan 15 17:01:36 2018 +0800

    sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
    
    After commit cea0cc80a677 ("sctp: use the right sk after waking up from
    wait_buf sleep"), it may change to lock another sk if the asoc has been
    peeled off in sctp_wait_for_sndbuf.
    
    However, the asoc's new sk could be already closed elsewhere, as it's in
    the sendmsg context of the old sk that can't avoid the new sk's closing.
    If the sk's last refcnt is held by this asoc, then later, after putting
    this asoc, the new sk will be freed while under its own lock.
    
    This patch is to revert that commit, but fix the old issue by returning
    error under the old sk's lock.
    
    Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
    Reported-by: [email protected]
    Signed-off-by: Xin Long <[email protected]>
    Acked-by: Neil Horman <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 625637bf4afa45204bd87e4218645182a919485a
Author: Xin Long <[email protected]>
Date:   Mon Jan 15 17:01:19 2018 +0800

    sctp: reinit stream if stream outcnt has been changed by sinit in sendmsg
    
    After introducing sctp_stream structure, sctp uses stream->outcnt as the
    out stream nums instead of c.sinit_num_ostreams.
    
    However, when users use sinit in cmsg, it only updates c.sinit_num_ostreams
    in sctp_sendmsg. At that moment, stream->outcnt is still using the previous
    value. If its value is not updated, the sinit_num_ostreams of sinit does
    not take effect.
    
    This patch fixes it by updating stream->outcnt and reiniting the stream
    if stream outcnt has been changed by sinit in sendmsg.
    
    Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf")
    Signed-off-by: Xin Long <[email protected]>
    Acked-by: Neil Horman <[email protected]>
    Acked-by: Marcelo Ricardo Leitner <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 3d1661304f0b2b51a8a43785b764822611dbdd53
Author: Thomas Falcon <[email protected]>
Date:   Wed Jan 10 19:39:52 2018 -0600

    ibmvnic: Fix pending MAC address changes
    
    Due to architecture limitations, the IBM VNIC client driver is unable
    to perform MAC address changes unless the device has "logged in" to
    its backing device. Currently, pending MAC changes are handled before
    login, resulting in an error and failure to change the MAC address.
    Moving that chunk to the end of the ibmvnic_login function, when we are
    sure that it was successful, fixes that.
    
    The MAC address can be changed when the device is up or down, so
    only check if the device is in a "PROBED" state before setting the
    MAC address.
    
    Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
    Signed-off-by: Thomas Falcon <[email protected]>
    Reviewed-by: John Allen <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit c96f5471ce7d2aefd0dda560cc23f08ab00bc65d
Author: Josh Snyder <[email protected]>
Date:   Mon Dec 18 16:15:10 2017 +0000

    delayacct: Account blkio completion on the correct task
    
    Before commit:
    
      e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
    
    delayacct_blkio_end() was called after context-switching into the task which
    completed I/O.
    
    This resulted in double counting: the task would account a delay both waiting
    for I/O and for time spent in the runqueue.
    
    With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
    In ttwu, we have not yet context-switched. This is more correct, in that
    the delay accounting ends when the I/O is complete.
    
    But delayacct_blkio_end() relies on 'get_current()', and we have not yet
    context-switched into the task whose I/O completed. This results in the
    wrong task having its delay accounting statistics updated.
    
    Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
    so that it can update the statistics of the correct task.
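
    In simplified form, the change amounts to taking the task as an
    explicit argument instead of relying on whichever task is running.
    The struct and function below are hypothetical and only illustrate
    that idea; they are not the delayacct implementation.

```c
/* Hypothetical per-task accounting state. */
struct task {
	unsigned long blkio_start;	/* time the task began waiting for I/O */
	unsigned long blkio_delay;	/* accumulated I/O wait time */
};

/*
 * Charge the I/O wait to the task passed in by the waker, not to the
 * task that happens to be executing the wakeup.
 */
static void blkio_end(struct task *p, unsigned long now)
{
	p->blkio_delay += now - p->blkio_start;
}
```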
    
    Signed-off-by: Josh Snyder <[email protected]>
    Acked-by: Tejun Heo <[email protected]>
    Acked-by: Balbir Singh <[email protected]>
    Cc: <[email protected]>
    Cc: Brendan Gregg <[email protected]>
    Cc: Jens Axboe <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Cc: [email protected]
    Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ingo Molnar <[email protected]>

commit 107cd2532181b96c549e8f224cdcca8631c3076b
Author: Tom Lendacky <[email protected]>
Date:   Wed Jan 10 13:26:34 2018 -0600

    x86/mm: Encrypt the initrd earlier for BSP microcode update
    
    Currently the BSP microcode update code examines the initrd very early
    in the boot process.  If SME is active, the initrd is treated as being
    encrypted but it has not been encrypted (in place) yet.  Update the
    early boot code that encrypts the kernel to also encrypt the initrd so
    that early BSP microcode updates work.
    
    Tested-by: Gabriel Craciunescu <[email protected]>
    Signed-off-by: Tom Lendacky <[email protected]>
    Reviewed-by: Borislav Petkov <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Brijesh Singh <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ingo Molnar <[email protected]>

commit cc5f01e28d6c60f274fd1e33b245f679f79f543c
Author: Tom Lendacky <[email protected]>
Date:   Wed Jan 10 13:26:26 2018 -0600

    x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
    
    In preparation for encrypting more than just the kernel, the encryption
    support in sme_encrypt_kernel() needs to support 4KB page aligned
    encryption instead of just 2MB large page aligned encryption.
    
    Update the routines that populate the PGD to support non-2MB aligned
    addresses.  This is done by creating PTE page tables for the start
    and end portion of the address range that fall outside of the 2MB
    alignment.  This results in, at most, two extra pages to hold the
    PTE entries for each mapping of a range.
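
    The split described above can be sketched as plain address
    arithmetic.  The struct, function, and macro definitions below are
    written for this example and assume 4KB-aligned inputs; they are not
    the kernel's code.

```c
#include <stdint.h>

#define PMD_SIZE	(2ull * 1024 * 1024)	/* 2MB large page */
#define PMD_MASK	(~(PMD_SIZE - 1))

/*
 * [start, head_end) and [tail_start, end) need 4KB PTE mappings;
 * [head_end, tail_start) can be mapped with 2MB entries.
 */
struct split {
	uint64_t head_end;
	uint64_t tail_start;
};

static struct split split_range(uint64_t start, uint64_t end)
{
	struct split s;

	s.head_end = (start + PMD_SIZE - 1) & PMD_MASK;	/* round start up */
	s.tail_start = end & PMD_MASK;			/* round end down */

	/* Range lies inside one 2MB region: map all of it with PTEs. */
	if (s.head_end > s.tail_start) {
		s.head_end = end;
		s.tail_start = end;
	}
	return s;
}
```

    Each partial 2MB region at the start or end needs one PTE page
    table, which matches the bound of two extra pages per mapped range.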
    
    Tested-by: Gabriel Craciunescu <[email protected]>
    Signed-off-by: Tom Lendacky <[email protected]>
    Reviewed-by: Borislav Petkov <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Brijesh Singh <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ingo Molnar <[email protected]>

commit 2b5d00b6c2cdd94f6d6a494a6f6c0c0fc7b8e711
Author: Tom Lendacky <[email protected]>
Date:   Wed Jan 10 13:26:16 2018 -0600

    x86/mm: Centralize PMD flags in sme_encrypt_kernel()
    
    In preparation for encrypting more than just the kernel during early
    boot processing, centralize the use of the PMD flag settings based
    on the type of mapping desired.  When 4KB aligned encryption is added,
    this will allow either PTE flags or large page PMD flags to be used
    without requiring the caller to adjust.
    
    Tested-by: Gabriel Craciunescu <[email protected]>
    Signed-off-by: Tom Lendacky <[email protected]>
    Reviewed-by: Borislav Petkov <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Brijesh Singh <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ingo Molnar <[email protected]>

commit bacf6b499e11760aef73a3bb5ce4e5eea74a3fd4
Author: Tom Lendacky <[email protected]>
Date:   Wed Jan 10 13:26:05 2018 -0600

    x86/mm: Use a struct to reduce parameters for SME PGD mapping
    
    In preparation for follow-on patches, combine the PGD mapping parameters
    into a struct to reduce the number of function arguments and allow for
    direct updating of the next pagetable mapping area pointer.
    
    Tested-by: Gabriel Craciunescu <[email protected]>
    Signed-off-by: Tom Lendacky <[email protected]>
    Reviewed-by: Borislav Petkov <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Brijesh Singh <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ingo Molnar <[email protected]>

commit 1303880179e67c59e801429b7e5d0f6b21137d99
Author: Tom Lendacky <[email protected]>
Date:   Wed Jan 10 13:25:56 2018 -0600

    x86/mm: Clean up register saving in the __enc_copy() assembly code
    
    Clean up the use of PUSH and POP and when registers are saved in the
    __enc_copy() assembly function in order to improve the readability of the code.
    
    Move parameter register saving into general purpose registers earlier
    in the code and move all the pushes to the beginning of the function
    with corresponding pops at the end.
    
    We do this to prepare fixes.
    
    Tested-by: Gabriel Craciunescu <[email protected]>
    Signed-off-by: Tom Lendacky <[email protected]>
    Reviewed-by: Borislav Petkov <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Brijesh Singh <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Link: http://lkml.kernel.org/r/[email protected]
    Signed-off-by: Ingo Molnar <[email protected]>

commit 385d11b152c4eb638eeb769edcb3249533bb9a00
Author: Josh Poimboeuf <[email protected]>
Date:   Mon Jan 15 08:17:08 2018 -0600

    objtool: Improve error message for bad file argument
    
    If a nonexistent file is supplied to objtool, it complains with a
    non-helpful error:
    
      open: No such file or directory
    
    Improve it to:
    
      objtool: Can't open 'foo': No such file or directory
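
    A message in that shape can be produced along these lines.  The
    helper name and its signature are assumptions made for this sketch
    and do not reflect objtool's actual warning macros.

```c
#include <errno.h>	/* error codes such as ENOENT for callers */
#include <stdio.h>
#include <string.h>

/*
 * Format an open failure so that it names both the tool and the file.
 * err is an errno value, for example the one left behind by open().
 */
static int format_open_error(char *buf, size_t len, const char *path, int err)
{
	return snprintf(buf, len, "objtool: Can't open '%s': %s",
			path, strerror(err));
}
```

    A caller would typically pass the errno value saved right after the
    failed open() and print the buffer to stderr.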
    
    Reported-by: Markus <[email protected]>
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar <[email protected]>

commit 2a0098d70640dda192a79966c14d449e7a34d675
Author: Josh Poimboeuf <[email protected]>
Date:   Mon Jan 15 08:17:07 2018 -0600

    objtool: Fix seg fault with gold linker
    
    Objtool segfaults when the gold linker is used with
    CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y.
    
    With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before
    being passed to objtool.  The gold linker seems to strip unused ELF
    symbols by default, which confuses objtool and causes the seg fault when
    it's trying to generate ORC metadata.
    
    Objtool should really be running immediately after GCC anyway, without a
    linker call in between.  Change the makefile ordering so that objtool is
    called before the linker.
    
    Reported-and-tested-by: Markus <[email protected]>
    Signed-off-by: Josh Poimboeuf <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
    Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar <[email protected]>

commit ae59c3f0b6cfd472fed96e50548a799b8971d876
Author: Leon Romanovsky <[email protected]>
Date:   Fri Jan 12 07:58:39 2018 +0200

    RDMA/mlx5: Fix out-of-bound access while querying AH
    
    The rdma_ah_find_type() accesses the port array based on an index
    controlled by userspace. The existing bounds check is after the first use
    of the index, so userspace can generate an out of bounds access, as shown
    by the KASAN report below.
    
    ==================================================================
    BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
    Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409
    
    CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
    Call Trace:
     dump_stack+0xe9/0x18f
     print_address_description+0xa2/0x350
     kasan_report+0x3a5/0x400
     to_rdma_ah_attr+0xa8/0x3b0
     mlx5_ib_query_qp+0xd35/0x1330
     ib_query_qp+0x8a/0xb0
     ib_uverbs_query_qp+0x237/0x7f0
     ib_uverbs_write+0x617/0xd80
     __vfs_write+0xf7/0x500
     vfs_write+0x149/0x310
     SyS_write+0xca/0x190
     entry_SYSCALL_64_fastpath+0x18/0x85
    RIP: 0033:0x7fe9c7a275a0
    RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
    RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
    RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
    R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
    R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560
    
    Allocated by task 1:
     __kmalloc+0x3f9/0x430
     alloc_mad_private+0x25/0x50
     ib_mad_post_receive_mads+0x204/0xa60
     ib_mad_init_device+0xa59/0x1020
     ib_register_device+0x83a/0xbc0
     mlx5_ib_add+0x50e/0x5c0
     mlx5_add_device+0x142/0x410
     mlx5_register_interface+0x18f/0x210
     mlx5_ib_init+0x56/0x63
     do_one_initcall+0x15b/0x270
     kernel_init_freeable+0x2d8/0x3d0
     kernel_init+0x14/0x190
     ret_from_fork+0x24/0x30
    
    Freed by task 0:
    (stack is not available)
    
    The buggy address belongs to the object at ffff880019ae2000
     which belongs to the cache kmalloc-512 of size 512
    The buggy address is located 104 bytes to the right of
     512-byte region [ffff880019ae2000, ffff880019ae2200)
    The buggy address belongs to the page:
    page:000000005d674e18 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
    flags: 0x4000000000008100(slab|head)
    raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
    raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
    page dumped because: kasan: bad access detected
    
    Memory state around the buggy address:
     ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
    >ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                              ^
     ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ==================================================================
    Disabling lock debugging due to kernel taint
    
    Cc: <[email protected]>
    Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
    Signed-off-by: Leon Romanovsky <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>
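
The ordering problem described in this commit message can be illustrated with a minimal userspace sketch. The names below (`port_table`, `lookup_port_type`, `NUM_PORTS`) are hypothetical and are not taken from the mlx5 driver; the only point is that the range check has to run before the first use of the caller-controlled index.

```c
#define NUM_PORTS 2	/* hypothetical table size */

/* Hypothetical per-port attribute table, indexed by (port - 1). */
static const int port_table[NUM_PORTS] = { 10, 20 };

/*
 * Validate the caller-controlled port number before it is used as an
 * index. Returns 0 and stores the type on success, -1 if the port is
 * out of range.
 */
int lookup_port_type(unsigned int port, int *type)
{
	if (port < 1 || port > NUM_PORTS)
		return -1;	/* reject before touching port_table[] */

	*type = port_table[port - 1];
	return 0;
}
```

With the check placed first, an out-of-range port never reaches the array access, which is the property the KASAN report shows was missing.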

commit 6311b7ce42e0c1d6d944bc099dc47e936c20cf11
Author: Johannes Berg <[email protected]>
Date:   Mon Jan 15 12:42:25 2018 +0100

    netlink: extack: avoid parenthesized string constant warning
    
    NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
    in newer versions of gcc:
      warning: array initialized from parenthesized string constant
    
    Just remove the parentheses; they're not needed in this context
    anyway, since there can be no operator precedence issues or similar.
    
    Signed-off-by: Johannes Berg <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
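
As a rough illustration of the pattern behind this warning, here is a small sketch of a macro that stores a string literal in a static array. `struct demo_ack` and `DEMO_SET_ERR_MSG` are hypothetical stand-ins, not the kernel's definitions; the sketch only assumes that the macro initializes a `char` array from its string argument.

```c
/* Hypothetical stand-in for an extended-ack structure. */
struct demo_ack {
	const char *msg;
};

/*
 * The string literal initializes the array directly. Wrapping the
 * macro argument in parentheses in the initializer is what newer gcc
 * versions warn about: an array initialized from a parenthesized
 * string constant.
 */
#define DEMO_SET_ERR_MSG(ack, text) do {		\
	static const char msg_[] = text;		\
	struct demo_ack *ack_ = (ack);			\
	if (ack_)					\
		ack_->msg = msg_;			\
} while (0)
```

Because the argument is only ever a string literal used as an initializer, dropping the parentheses cannot change how the expression is parsed.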

commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1
Author: Jim Westfall <[email protected]>
Date:   Sun Jan 14 04:18:51 2018 -0800

    ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
    
    Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
    to avoid making an entry for every remote ip the device needs to talk to.
    
    This used to be the old behavior but became broken in a263b3093641f
    (ipv4: Make neigh lookups directly in output packet path) and later removed
    in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
    devices) because it was broken.
    
    Signed-off-by: Jim Westfall <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 096b9854c04df86f03b38a97d40b6506e5730919
Author: Jim Westfall <[email protected]>
Date:   Sun Jan 14 04:18:50 2018 -0800

    net: Allow neigh constructor functions ability to modify the primary_key
    
    Use n->primary_key instead of pkey to account for the possibility that a neigh
    constructor function may have modified the primary_key value.
    
    Signed-off-by: Jim Westfall <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 17d0fb0caa68f2bfd8aaa8125ff15abebfbfa1d7
Author: Sergei Shtylyov <[email protected]>
Date:   Sat Jan 13 20:22:01 2018 +0300

    sh_eth: fix dumping ARSTR
    
    ARSTR is always located at the start of the TSU register region, thus
    using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
    causes EDMR or EDSR (depending on the register layout) to be dumped instead
    of ARSTR.  Use the correct condition/macro there...
    
    Fixes: 6b4b4fead342 ("sh_eth: Implement ethtool register dump operations")
    Signed-off-by: Sergei Shtylyov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 95a332088ecb113c2e8753fa3f1df9b0dda9beec
Author: William Tu <[email protected]>
Date:   Fri Jan 12 12:29:22 2018 -0800

    Revert "openvswitch: Add erspan tunnel support."
    
    This reverts commit ceaa001a170e43608854d5290a48064f57b565ed.
    
    The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
    as a nested attribute to support all ERSPAN v1 and v2's fields.
    The current attr is a be32 supporting only one field.  Thus, this
    patch reverts it and later patch will redo it using nested attr.
    
    Signed-off-by: William Tu <[email protected]>
    Cc: Jiri Benc <[email protected]>
    Cc: Pravin Shelar <[email protected]>
    Acked-by: Jiri Benc <[email protected]>
    Acked-by: Pravin B Shelar <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 30be8f8dba1bd2aff73e8447d59228471233a3d4
Author: [email protected] <[email protected]>
Date:   Fri Jan 12 15:42:06 2018 +0100

    net/tls: Fix inverted error codes to avoid endless loop
    
    sendfile() calls can hang endlessly when using Kernel TLS if a socket error occurs.
    Socket error codes must be inverted by Kernel TLS before returning because
    they are stored with positive sign. If returned non-inverted they are
    interpreted as number of bytes sent, causing endless looping of the
    splice mechanic behind sendfile().
    
    Signed-off-by: Robert Hering <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
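
The sign convention at issue can be sketched in plain C. Everything here (`struct fake_sock`, `fake_send`, `send_all`) is hypothetical and only models the idea from the message above: the error is stored as a positive errno, and the caller treats any non-negative return value as a byte count.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical socket state: a pending error is stored as a positive errno. */
struct fake_sock {
	int err;	/* 0 if no error, otherwise e.g. EPIPE (positive) */
};

/* Returns bytes sent (>= 0) or a negative errno, as the caller expects. */
long fake_send(const struct fake_sock *sk, size_t len)
{
	if (sk->err)
		return -sk->err;	/* stored positive, returned negative */
	return (long)len;
}

/*
 * Hypothetical caller loop in the style of a splice/sendfile helper.
 * It gives up after max_iters so the sketch always terminates.
 */
long send_all(const struct fake_sock *sk, size_t total, int max_iters)
{
	size_t done = 0;

	while (done < total && max_iters-- > 0) {
		long ret = fake_send(sk, total - done);

		if (ret < 0)
			return ret;	/* error is propagated, loop ends */
		done += (size_t)ret;
	}
	return (long)done;
}
```

If `fake_send()` returned `sk->err` without negating it, the loop would count the errno value as progress instead of stopping, which is the kind of misinterpretation the commit describes.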

commit 95ef498d977bf44ac094778fd448b98af158a3e6
Author: Eric Dumazet <[email protected]>
Date:   Thu Jan 11 22:31:18 2018 -0800

    ipv6: ip6_make_skb() needs to clear cork.base.dst
    
    In my last patch, I missed the fact that cork.base.dst was not initialized
    in ip6_make_skb() :
    
    If ip6_setup_cork() returns an error, we might attempt a dst_release()
    on some random pointer.
    
    Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
    Signed-off-by: Eric Dumazet <[email protected]>
    Reported-by: syzbot <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 68e76e034b6b1c1ce2eece1ab8ae4008e14be470
Author: Randy Dunlap <[email protected]>
Date:   Mon Jan 15 11:07:27 2018 -0800

    tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
    
    I regularly get 50 MB - 60 MB files during kernel randconfig builds.
    These large files mostly contain (many repeats of; e.g., 124,594):
    
    In file included from ../include/linux/string.h:6:0,
                     from ../include/linux/uuid.h:20,
                     from ../include/linux/mod_devicetable.h:13,
                     from ../scripts/mod/devicetable-offsets.c:3:
    ../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
        ______f = {     \
        ^
    ../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
                           ^
    ../include/linux/string.h:425:2: note: in expansion of macro 'if'
      if (p_size == (size_t)-1 && q_size == (size_t)-1)
      ^
    
    This only happens when CONFIG_FORTIFY_SOURCE=y and
    CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
    FORTIFY_SOURCE=y.
    
    Link: http://lkml.kernel.org/r/[email protected]
    
    Signed-off-by: Randy Dunlap <[email protected]>
    Signed-off-by: Steven Rostedt (VMware) <[email protected]>

commit 37f47bc90c7481e7959703ad1defc4fc9f5d85e3
Author: Marcelo Ricardo Leitner <[email protected]>
Date:   Thu Jan 11 14:22:06 2018 -0200

    sctp: avoid compiler warning on implicit fallthru
    
    These fall-throughs are expected.
    
    Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
    Acked-by: Neil Horman <[email protected]>
    Reviewed-by: Xin Long <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 6503a30440962f1e1ccb8868816b4e18201218d4
Author: Lorenzo Colitti <[email protected]>
Date:   Thu Jan 11 18:36:26 2018 +0900

    net: ipv4: Make "ip route get" match iif lo rules again.
    
    Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
    versions of route lookup") broke "ip route get" in the presence
    of rules that specify iif lo.
    
    Host-originated traffic always has iif lo, because
    ip_route_output_key_hash and ip6_route_output_flags set the flow
    iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
    convenient way to select only originated traffic and not forwarded
    traffic.
    
    inet_rtm_getroute used to match these rules correctly because
    even though it sets the flow iif to 0, it called
    ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
    But now that it calls ip_route_output_key_hash_rcu, the ifindex
    will remain 0 and not match the iif lo in the rule. As a result,
    "ip route get" will return ENETUNREACH.
    
    Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
    Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again
    Signed-off-by: Lorenzo Colitti <[email protected]>
    Acked-by: David Ahern <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit cbbdf8433a5f117b1a2119ea30fc651b61ef7570
Author: David Ahern <[email protected]>
Date:   Wed Jan 10 13:00:39 2018 -0800

    netlink: extack needs to be reset each time through loop
    
    syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
    The problem is that netlink_rcv_skb loops over the skb, repeatedly invoking
    the callback without resetting the extack, leaving potentially stale
    data. Initializing each time through avoids the WARN_ON.
    
    Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting")
    Reported-by: [email protected]
    Signed-off-by: David Ahern <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 59b36613e85fb16ebf9feaf914570879cd5c2a21
Author: Cong Wang <[email protected]>
Date:   Wed Jan 10 12:50:25 2018 -0800

    tipc: fix a memory leak in tipc_nl_node_get_link()
    
    When tipc_node_find_by_name() fails, the nlmsg is not
    freed.
    
    While on it, switch to a goto label to properly
    free it.
    
    Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
    Reported-by: Dmitry Vyukov <[email protected]>
    Cc: Jon Maloy <[email protected]>
    Cc: Ying Xue <[email protected]>
    Signed-off-by: Cong Wang <[email protected]>
    Acked-by: Ying Xue <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
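
The "goto label" cleanup style mentioned in this message can be sketched as follows. The helpers (`buf_alloc`, `buf_free`, `find_by_name`, `get_link`) and the `live_bufs` counter are hypothetical and exist only so the sketch can show that the failure path releases the buffer; this is not the TIPC code.

```c
#include <stdlib.h>
#include <string.h>

/* Counts buffers that are currently allocated, to make leaks visible. */
int live_bufs;

static char *buf_alloc(size_t n)
{
	char *p = malloc(n);

	if (p)
		live_bufs++;
	return p;
}

static void buf_free(char *p)
{
	if (p) {
		free(p);
		live_bufs--;
	}
}

/* Hypothetical lookup that fails for every name except "known". */
static int find_by_name(const char *name)
{
	return strcmp(name, "known") == 0 ? 0 : -1;
}

int get_link(const char *name)
{
	char *msg;
	int err;

	msg = buf_alloc(64);
	if (!msg)
		return -1;

	err = find_by_name(name);
	if (err)
		goto err_free;	/* failures after the allocation share one exit */

	/* A real caller would fill in and hand off the message here. */
	buf_free(msg);
	return 0;

err_free:
	buf_free(msg);
	return err;
}
```

Routing every post-allocation failure through one label makes it harder to add a new early return that forgets the free.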

commit 749439bfac6e1a2932c582e2699f91d329658196
Author: Mike Maloney <[email protected]>
Date:   Wed Jan 10 12:45:10 2018 -0500

    ipv6: fix udpv6 sendmsg crash caused by too small MTU
    
    The logic in __ip6_append_data() assumes that the MTU is at least large
    enough for the headers.  A device's MTU may be adjusted after being
    added while sendmsg() is processing data, resulting in
    __ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
    the fragmentation header, the math results in a negative 'maxfraglen',
    which causes problems when refragmenting any previous skb in the
    skb_write_queue, leaving it possibly malformed.
    
    Instead sendmsg returns EINVAL when the mtu is calculated to be less
    than IPV6_MIN_MTU.
    
    Found by syzkaller:
    kernel BUG at ./include/linux/skbuff.h:2064!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
       (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
    RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
    RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
    RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
    RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
    RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
    RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
    R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
    R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
    FS:  00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     ip6_finish_skb include/net/ipv6.h:911 [inline]
     udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
     udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
     inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
     sock_sendmsg_nosec net/socket.c:633 [inline]
     sock_sendmsg+0xca/0x110 net/socket.c:643
     SYSC_sendto+0x352/0x5a0 net/socket.c:1750
     SyS_sendto+0x40/0x50 net/socket.c:1718
     entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x4512e9
    RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
    RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
    RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
    R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
    R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
    Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
    RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
    RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570
    
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Mike Maloney <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
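
The arithmetic problem can be illustrated with a simplified model. This is a sketch under stated assumptions, not the code in `__ip6_append_data()`: `max_frag_len` and `FRAG_HDR_LEN` are hypothetical names, and the formula only mirrors the general shape of the calculation (payload rounded down to a multiple of 8 bytes). `IPV6_MIN_MTU` is 1280, the minimum link MTU required for IPv6.

```c
#include <errno.h>

#define IPV6_MIN_MTU 1280	/* minimum IPv6 link MTU */
#define FRAG_HDR_LEN 8		/* size of the IPv6 fragment header */

/*
 * Simplified model of the fragment-length arithmetic. An MTU below
 * IPV6_MIN_MTU is rejected with -EINVAL instead of being fed into the
 * calculation, where it could produce a negative length.
 */
int max_frag_len(int mtu, int fragheaderlen)
{
	if (mtu < IPV6_MIN_MTU)
		return -EINVAL;

	return ((mtu - fragheaderlen) & ~7) + fragheaderlen - FRAG_HDR_LEN;
}
```

Without the guard, an MTU smaller than `fragheaderlen` would make `mtu - fragheaderlen` negative, and the result of the expression would be negative as well.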

commit 6200b430220f3b9207861b16f57916950f4ecd8e
Author: Arnd Bergmann <[email protected]>
Date:   Wed Jan 10 17:30:22 2018 +0100

    net: cs89x0: add MODULE_LICENSE
    
    This driver lacks a MODULE_LICENSE tag, leading to a Kbuild warning:
    
    WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/cirrus/cs89x0.o
    
    This adds license, author, and description according to the
    comment block at the start of the file.
    
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 0171c41835591e9aa2e384b703ef9a6ae367c610
Author: Guillaume Nault <[email protected]>
Date:   Wed Jan 10 16:24:45 2018 +0100

    ppp: unlock all_ppp_mutex before registering device
    
    ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
    needs to lock pn->all_ppp_mutex. Therefore we mustn't call
    register_netdevice() with pn->all_ppp_mutex already locked, or we'd
    deadlock in case register_netdevice() fails and calls .ndo_uninit().
    
    Fortunately, we can unlock pn->all_ppp_mutex before calling
    register_netdevice(). This lock protects pn->units_idr, which isn't
    used in the device registration process.
    
    However, keeping pn->all_ppp_mutex locked during device registration
    did ensure that no device in transient state would be published in
    pn->units_idr. In practice, unlocking it before calling
    register_netdevice() doesn't change this property: ppp_unit_register()
    is called with 'ppp_mutex' locked and all searches done in
    pn->units_idr hold this lock too.
    
    Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
    Reported-and-tested-by: [email protected]
    Signed-off-by: Guillaume Nault <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 66940f35d5a81d5969bb5543171c70a434fc5110
Author: Michael S. Tsirkin <[email protected]>
Date:   Wed Jan 10 16:03:05 2018 +0200

    ptr_ring: document usage around __ptr_ring_peek
    
    This explains why the net usage of __ptr_ring_peek
    is actually ok without locks.
    
    Signed-off-by: Michael S. Tsirkin <[email protected]>
    Acked-by: John Fastabend <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit d542296a4d0d9f41d0186edcac2baba1b674d02f
Author: Stephen Hemminger <[email protected]>
Date:   Mon Jan 8 08:23:18 2018 -0800

    9p: add missing module license for xen transport
    
    The 9P over Xen module is missing the required license and module information.
    See https://bugzilla.kernel.org/show_bug.cgi?id=198109
    
    Reported-by: Alan Bartlett <[email protected]>
    Fixes: 868eb122739a ("xen/9pfs: introduce Xen 9pfs transport driver")
    Signed-off-by: Stephen Hemminger <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit a0e3a18f4baf8e3754ac1e56f0ade924d0c0c721
Author: Steven Rostedt (VMware) <[email protected]>
Date:   Mon Jan 15 10:47:09 2018 -0500

    ring-buffer: Bring back context level recursive checks
    
    Commit 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be
    simpler") replaced the context level recursion checks with a simple counter.
    This would prevent the ring buffer code from recursively calling itself more
    than the max number of contexts that exist (Normal, softirq, irq, nmi). But
    this change caused a lockup in a specific case, which was during suspend and
    resume using a global clock. Adding a stack dump to see where this occurred,
    the issue was in the trace global clock itself:
    
      trace_buffer_lock_reserve+0x1c/0x50
      __trace_graph_entry+0x2d/0x90
      trace_graph_entry+0xe8/0x200
      prepare_ftrace_return+0x69/0xc0
      ftrace_graph_caller+0x78/0xa8
      queued_spin_lock_slowpath+0x5/0x1d0
      trace_clock_global+0xb0/0xc0
      ring_buffer_lock_reserve+0xf9/0x390
    
    The function graph tracer traced queued_spin_lock_slowpath that was called
    by trace_clock_global. This pointed out that the trace_clock_global() is not
    reentrant, as it takes a spin lock. It depended on the ring buffer recursive
    lock to prevent that from happening.
    
    By removing the context detection and adding just a max number of allowable
    recursions, it allowed the trace_clock_global() to be entered again and try
    to retake the spinlock it already held, causing a deadlock.
    
    Fixes: 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
    Reported-by: David Weinehall <[email protected]>
    Signed-off-by: Steven Rostedt (VMware) <[email protected]>
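
The difference between a plain recursion counter and a context-level check can be sketched with one bit per context. The names (`enum ctx`, `struct rb_state`, `recursive_lock`, `recursive_unlock`) are hypothetical, and this is a simplified model rather than the kernel's trace_recursive_lock() implementation.

```c
/* One bit per execution context, matching the four contexts named above. */
enum ctx { CTX_NMI, CTX_IRQ, CTX_SOFTIRQ, CTX_NORMAL };

struct rb_state {
	unsigned int current_context;	/* bitmask of contexts inside the buffer code */
};

/* Returns 0 on success, 1 if this context is already inside (recursion). */
int recursive_lock(struct rb_state *s, enum ctx c)
{
	unsigned int bit = 1U << c;

	if (s->current_context & bit)
		return 1;	/* same context re-entered: refuse */

	s->current_context |= bit;
	return 0;
}

void recursive_unlock(struct rb_state *s, enum ctx c)
{
	s->current_context &= ~(1U << c);
}
```

In this model a second entry from the same context is refused immediately, whereas a counter that merely allows up to four nested entries would let it through. An interrupt arriving on top of normal context still succeeds because it uses a different bit.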

commit 499ed50f603b4c9834197b2411ba3bd9aaa624d4
Author: Benoît Thébaudeau <[email protected]>
Date:   Sun Jan 14 19:43:05 2018 +0100

    mmc: sdhci-esdhc-imx: Fix i.MX53 eSDHCv3 clock
    
    Commit 5143c953a786 ("mmc: sdhci-esdhc-imx: Allow all supported
    prescaler values") made it possible to set SYSCTL.SDCLKFS to 0 in SDR
    mode, thus bypassing the SD clock frequency prescaler, in order to be
    able to get higher SD clock frequencies in some contexts. However, that
    commit missed the fact that this value is illegal on the eSDHCv3
    instance of the i.MX53. This seems to be the only exception on i.MX,
    this value being legal even for the eSDHCv2 instances of the i.MX53.
    
    Fix this issue by changing the minimum prescaler value if the i.MX53
    eSDHCv3 is detected. According to the i.MX53 reference manual, if
    DLLCTRL[10] can be set, then the controller is eSDHCv3, else it is
    eSDHCv2.
    
    This commit fixes the following issue, which was preventing the i.MX53
    Loco (IMX53QSB) board from booting Linux 4.15.0-rc5:
    [    1.882668] mmcblk1: error -84 transferring data, sector 2048, nr 8, cmd response 0x900, card status 0xc00
    [    2.002255] mmcblk1: error -84 transferring data, sector 2050, nr 6, cmd response 0x900, card status 0xc00
    [   12.645056] mmc1: Timeout waiting for hardware interrupt.
    [   12.650473] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
    [   12.656921] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001201
    [   12.663366] mmc1: sdhci: Blk size:  0x00000004 | Blk cnt:  0x00000000
    [   12.669813] mmc1: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
    [   12.676258] mmc1: sdhci: Present:   0x01f8028f | Host ctl: 0x00000013
    [   12.682703] mmc1: sdhci: Power:     0x00000002 | Blk gap:  0x00000000
    [   12.689148] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000003f
    [   12.695594] mmc1: sdhci: Timeout:   0x0000008e | Int stat: 0x00000000
    [   12.702039] mmc1: sdhci: Int enab:  0x107f004b | Sig enab: 0x107f004b
    [   12.708485] mmc1: sdhci: AC12 err:  0x00000000 | Slot int: 0x00001201
    [   12.714930] mmc1: sdhci: Caps:      0x07eb0000 | Caps_1:   0x08100810
    [   12.721375] mmc1: sdhci: Cmd:       0x0000163a | Max curr: 0x00000000
    [   12.727821] mmc1: sdhci: Resp[0]:   0x00000920 | Resp[1]:  0x00000000
    [   12.734265] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
    [   12.740709] mmc1: sdhci: Host ctl2: 0x00000000
    [   12.745157] mmc1: sdhci: ADMA Err:  0x00000001 | ADMA Ptr: 0xc8049200
    [   12.751601] mmc1: sdhci: ============================================
    [   12.758110] print_req_error: I/O error, dev mmcblk1, sector 2050
    [   12.764135] Buffer I/O error on dev mmcblk1p1, logical block 0, lost sync page write
    [   12.775163] EXT4-fs (mmcblk1p1): mounted filesystem without journal. Opts: (null)
    [   12.782746] VFS: Mounted root (ext4 filesystem) on device 179:9.
    [   12.789151] mmcblk1: response CRC error sending SET_BLOCK_COUNT command, card status 0x900
    
    Signed-off-by: Benoît Thébaudeau <[email protected]>
    Reported-by: Wladimir J. van der Laan <[email protected]>
    Tested-by: Wladimir J. van der Laan <[email protected]>
    Fixes: 5143c953a786 ("mmc: sdhci-esdhc-imx: Allow all supported prescaler values")
    Cc: <[email protected]> # v4.13+
    Signed-off-by: Ulf Hansson <[email protected]>

commit 59b179b48ce2a6076448a44531242ac2b3f6cef2
Author: Johannes Berg <[email protected]>
Date:   Mon Jan 15 09:58:27 2018 +0100

    cfg80211: check dev_set_name() return value
    
    syzbot reported a warning from rfkill_alloc(), and after a while
    I think that the reason is that it was doing fault injection and
    the dev_set_name() failed, leaving the name NULL, and we didn't
    check the return value and got to rfkill_alloc() with a NULL name.
    Since we really don't want a NULL name, we ought to check the
    return value.
    
    Fixes: fb28ad35906a ("net: struct device - replace bus_id with dev_name(), dev_set_name()")
    Reported-by: [email protected]
    Signed-off-by: Johannes Berg <[email protected]>

commit 51a1aaa631c90223888d8beac4d649dc11d2ca55
Author: Johannes Berg <[email protected]>
Date:   Mon Jan 15 09:32:36 2018 +0100

    mac80211_hwsim: validate number of different channels
    
    When creating a new radio on the fly, hwsim allows this
    to be done with an arbitrary number of channels, but
    cfg80211 only supports a limited number of simultaneous
    channels, leading to a warning.
    
    Fix this by validating the number - this requires moving
    the define for the maximum out to a visible header file.
    
    Reported-by: [email protected]
    Fixes: b59ec8dd4394 ("mac80211_hwsim: fix number of channels in interface combinations")
    Signed-off-by: Johannes Berg <[email protected]>

commit b71d856ab536f25eb97c011a351ecddf5518de41
Author: Benjamin Beichler <[email protected]>
Date:   Wed Jan 10 17:42:51 2018 +0100

    mac80211_hwsim: add workqueue to wait for deferred radio deletion on mod unload
    
    When closing multiple wmediumd instances with many radios and then trying
    to unload the mac80211_hwsim module, the work items may outlive the
    module. Add a dedicated workqueue to wait specifically for these deletion
    work items; otherwise flush_scheduled_work() would be necessary.
    
    Signed-off-by: Benjamin Beichler <[email protected]>
    Signed-off-by: Johannes Berg <[email protected]>

commit 7a94b8c2eee7083ddccd0515830f8c81a8e44b1a
Author: Dominik Brodowski <[email protected]>
Date:   Mon Jan 15 08:12:15 2018 +0100

    nl80211: take RCU read lock when calling ieee80211_bss_get_ie()
    
    As ieee80211_bss_get_ie() dereferences an RCU pointer to return ssid_ie, both
    the call to this function and any operation on this variable need
    protection by the RCU read lock.
    
    Fixes: 44905265bc15 ("nl80211: don't expose wdev->ssid for most interfaces")
    Signed-off-by: Dominik Brodowski <[email protected]>
    Signed-off-by: Johannes Berg <[email protected]>

commit a48a52b7bea81c046fe1c1288f84d0eba214cba0
Author: Johannes Berg <[email protected]>
Date:   Mon Jan 15 09:12:05 2018 +0100

    cfg80211: fully initialize old channel for event
    
    Paul reported that he got a report about undefined behaviour
    that seems to me to originate in using uninitialized memory
    when the channel structure here is used in the event code in
    nl80211 later.
    
    He never reported whether this fixed it, and I wasn't able
    to trigger this so far, but we should do the right thing and
    fully initialize the on-stack structure anyway.
    
    Reported-by: Paul Menzel <[email protected]>
    Signed-off-by: Johannes Berg <[email protected]>

commit 28d437d550e1e39f805d99f9f8ac399c778827b7
Author: Tom Lendacky <[email protected]>
Date:   Sat Jan 13 17:27:30 2018 -0600

    x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
    
    The PAUSE instruction is currently used in the retpoline and RSB filling
    macros as a speculation trap.  The use of PAUSE was originally suggested
    because it showed a very, very small difference in the amount of
    cycles/time used to execute the retpoline as compared to LFENCE.  On AMD,
    the PAUSE instruction is not a serializing instruction, so the pause/jmp
    loop will use excess power as it is speculated over waiting for return
    to mispredict to the correct target.
    
    The RSB filling macro is applicable to AMD, and, if software is unable to
    verify that LFENCE is serializing on AMD (possible when running under a
    hypervisor), the generic retpoline support will be used and, so, is also
    applicable to AMD.  Keep the current usage of PAUSE for Intel, but add an
    LFENCE instruction to the speculation trap for AMD.
    
    The same sequence has been adopted by GCC for the GCC generated retpolines.
    
    Signed-off-by: Tom Lendacky <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Borislav Petkov <[email protected]>
    Acked-by: David Woodhouse <[email protected]>
    Acked-by: Arjan van de Ven <[email protected]>
    Cc: Rik van Riel <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Paul Turner <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Tim Chen <[email protected]>
    Cc: Jiri Kosina <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Josh Poimboeuf <[email protected]>
    Cc: Dan Williams <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Greg Kroah-Hartman <[email protected]>
    Cc: Kees Cook <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit c995efd5a740d9cbafbf58bde4973e8b50b4d761
Author: David Woodhouse <[email protected]>
Date:   Fri Jan 12 17:49:25 2018 +0000

    x86/retpoline: Fill RSB on context switch for affected CPUs
    
    On context switch from a shallow call stack to a deeper one, as the CPU
    does 'ret' up the deeper side it may encounter RSB entries (predictions for
    where the 'ret' goes to) which were populated in userspace.
    
    This is problematic if neither SMEP nor KPTI (the latter of which marks
    userspace pages as NX for the kernel) are active, as malicious code in
    userspace may then be executed speculatively.
    
    Overwrite the CPU's return prediction stack with calls which are predicted
    to return to an infinite loop, to "capture" speculation if this
    happens. This is required both for retpoline, and also in conjunction with
    IBRS for !SMEP && !KPTI.
    
    On Skylake+ the problem is slightly different, and an *underflow* of the
    RSB may cause errant branch predictions to occur. So there it's not so much
    overwrite, as *filling* the RSB to attempt to prevent it getting
    empty. This is only a partial solution for Skylake+ since there are many
    other conditions which may result in the RSB becoming empty. The full
    solution on Skylake+ is to use IBRS, which will prevent the problem even
    when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
    required on context switch.
    
    [ tglx: Added missing vendor check and slightly massaged comments and
            changelog ]
    
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Acked-by: Arjan van de Ven <[email protected]>
    Cc: [email protected]
    Cc: Rik van Riel <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Josh Poimboeuf <[email protected]>
    Cc: [email protected]
    Cc: Peter Zijlstra <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Jiri Kosina <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: Tim Chen <[email protected]>
    Cc: Greg Kroah-Hartman <[email protected]>
    Cc: Paul Turner <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit 0d39e…
akpm00 authored and hnaz committed Jan 19, 2018
1 parent a8750dd commit 9a3a44c
Showing 104 changed files with 1,216 additions and 636 deletions.
225 changes: 111 additions & 114 deletions arch/arm/net/bpf_jit_32.c

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions arch/x86/entry/entry_32.S
@@ -244,6 +244,17 @@ ENTRY(__switch_to_asm)
 	movl %ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
 #endif

+#ifdef CONFIG_RETPOLINE
+	/*
+	 * When switching from a shallower to a deeper call stack
+	 * the RSB may either underflow or use entries populated
+	 * with userspace addresses. On CPUs where those concerns
+	 * exist, overwrite the RSB with entries which capture
+	 * speculative execution to prevent attack.
+	 */
+	FILL_RETURN_BUFFER %ebx, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
+#endif
+
 	/* restore callee-saved registers */
 	popl %esi
 	popl %edi
11 changes: 11 additions & 0 deletions arch/x86/entry/entry_64.S
@@ -491,6 +491,17 @@ ENTRY(__switch_to_asm)
 	movq %rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
 #endif

+#ifdef CONFIG_RETPOLINE
+	/*
+	 * When switching from a shallower to a deeper call stack
+	 * the RSB may either underflow or use entries populated
+	 * with userspace addresses. On CPUs where those concerns
+	 * exist, overwrite the RSB with entries which capture
+	 * speculative execution to prevent attack.
+	 */
+	FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
+#endif
+
 	/* restore callee-saved registers */
 	popq %r15
 	popq %r14
4 changes: 2 additions & 2 deletions arch/x86/events/intel/rapl.c
@@ -755,14 +755,14 @@ static const struct x86_cpu_id rapl_cpu_match[] __initconst = {
 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_IVYBRIDGE_X, snbep_rapl_init),

 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_HASWELL_CORE, hsw_rapl_init),
-	X86_RAPL_MODEL_MATCH(INTEL_FAM6_HASWELL_X, hsw_rapl_init),
+	X86_RAPL_MODEL_MATCH(INTEL_FAM6_HASWELL_X, hsx_rapl_init),
 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_HASWELL_ULT, hsw_rapl_init),
 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_HASWELL_GT3E, hsw_rapl_init),

 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_CORE, hsw_rapl_init),
 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_GT3E, hsw_rapl_init),
 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_X, hsx_rapl_init),
-	X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_XEON_D, hsw_rapl_init),
+	X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_XEON_D, hsx_rapl_init),

 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_XEON_PHI_KNL, knl_rapl_init),
 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_XEON_PHI_KNM, knl_rapl_init),
1 change: 1 addition & 0 deletions arch/x86/include/asm/apic.h
@@ -136,6 +136,7 @@ extern void disconnect_bsp_APIC(int virt_wire_setup);
 extern void disable_local_APIC(void);
 extern void lapic_shutdown(void);
 extern void sync_Arb_IDs(void);
+extern void init_bsp_APIC(void);
 extern void apic_intr_mode_init(void);
 extern void setup_local_APIC(void);
 extern void init_apic_mappings(void);
3 changes: 2 additions & 1 deletion arch/x86/include/asm/cpufeatures.h
@@ -206,11 +206,11 @@
 #define X86_FEATURE_RETPOLINE ( 7*32+12) /* Generic Retpoline mitigation for Spectre variant 2 */
 #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */
 #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
-#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
 #define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */

 #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
+#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */

 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
@@ -245,6 +245,7 @@
 #define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */
 #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
 #define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
+#define X86_FEATURE_INTEL_PT ( 9*32+25) /* Intel Processor Trace */
 #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
 #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
 #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
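The feature constants in this header are written as `word*32+bit`. As a minimal sketch of how such a value splits back into its two parts, the helpers below (`feature_word`, `feature_bit`, and `feature_is_set` are hypothetical names for illustration, not kernel API) decode the two constants touched by the hunks above. The capability array layout is likewise an assumption.

```c
#include <assert.h>

/* Encodings copied from the cpufeatures.h hunks above: word * 32 + bit. */
#define X86_FEATURE_RSB_CTXSW	( 7*32+19)
#define X86_FEATURE_INTEL_PT	( 9*32+25)

/* Hypothetical helper: index of the 32-bit capability word. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Hypothetical helper: bit position inside that word. */
static unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/*
 * Hypothetical helper: test a feature in a caller-supplied array of
 * 32-bit capability words (the array layout is an assumption).
 */
static int feature_is_set(const unsigned int *caps, unsigned int feature)
{
	return (caps[feature_word(feature)] >> feature_bit(feature)) & 1u;
}
```

With these definitions, `X86_FEATURE_INTEL_PT` decodes to word 9, bit 25. The bit index matches the `CPUID_EBX, 25, 0x00000007` entry that the same commit deletes from `arch/x86/kernel/cpu/scattered.c`.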
4 changes: 2 additions & 2 deletions arch/x86/include/asm/mem_encrypt.h
@@ -39,7 +39,7 @@ void __init sme_unmap_bootdata(char *real_mode_data);

 void __init sme_early_init(void);

-void __init sme_encrypt_kernel(void);
+void __init sme_encrypt_kernel(struct boot_params *bp);
 void __init sme_enable(struct boot_params *bp);

 int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
@@ -67,7 +67,7 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }

 static inline void __init sme_early_init(void) { }

-static inline void __init sme_encrypt_kernel(void) { }
+static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }

 static inline bool sme_active(void) { return false; }
6 changes: 5 additions & 1 deletion arch/x86/include/asm/nospec-branch.h
@@ -11,7 +11,7 @@
  * Fill the CPU return stack buffer.
  *
  * Each entry in the RSB, if used for a speculative 'ret', contains an
- * infinite 'pause; jmp' loop to capture speculative execution.
+ * infinite 'pause; lfence; jmp' loop to capture speculative execution.
  *
  * This is required in various cases for retpoline and IBRS-based
  * mitigations for the Spectre variant 2 vulnerability. Sometimes to
@@ -38,11 +38,13 @@
 	call 772f; \
 773: /* speculation trap */ \
 	pause; \
+	lfence; \
 	jmp 773b; \
 772: \
 	call 774f; \
 775: /* speculation trap */ \
 	pause; \
+	lfence; \
 	jmp 775b; \
 774: \
 	dec reg; \
@@ -73,6 +75,7 @@
 	call .Ldo_rop_\@
 .Lspec_trap_\@:
 	pause
+	lfence
 	jmp .Lspec_trap_\@
 .Ldo_rop_\@:
 	mov \reg, (%_ASM_SP)
@@ -165,6 +168,7 @@
 	" .align 16\n" \
 	"901: call 903f;\n" \
 	"902: pause;\n" \
+	" lfence;\n" \
 	" jmp 902b;\n" \
 	" .align 16\n" \
 	"903: addl $4, %%esp;\n" \
49 changes: 49 additions & 0 deletions arch/x86/kernel/apic/apic.c
@@ -1286,6 +1286,55 @@ static int __init apic_intr_mode_select(void)
 	return APIC_SYMMETRIC_IO;
 }

+/*
+ * An initial setup of the virtual wire mode.
+ */
+void __init init_bsp_APIC(void)
+{
+	unsigned int value;
+
+	/*
+	 * Don't do the setup now if we have a SMP BIOS as the
+	 * through-I/O-APIC virtual wire mode might be active.
+	 */
+	if (smp_found_config || !boot_cpu_has(X86_FEATURE_APIC))
+		return;
+
+	/*
+	 * Do not trust the local APIC being empty at bootup.
+	 */
+	clear_local_APIC();
+
+	/*
+	 * Enable APIC.
+	 */
+	value = apic_read(APIC_SPIV);
+	value &= ~APIC_VECTOR_MASK;
+	value |= APIC_SPIV_APIC_ENABLED;
+
+#ifdef CONFIG_X86_32
+	/* This bit is reserved on P4/Xeon and should be cleared */
+	if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
+	    (boot_cpu_data.x86 == 15))
+		value &= ~APIC_SPIV_FOCUS_DISABLED;
+	else
+#endif
+		value |= APIC_SPIV_FOCUS_DISABLED;
+	value |= SPURIOUS_APIC_VECTOR;
+	apic_write(APIC_SPIV, value);
+
+	/*
+	 * Set up the virtual wire mode.
+	 */
+	apic_write(APIC_LVT0, APIC_DM_EXTINT);
+	value = APIC_DM_NMI;
+	if (!lapic_is_integrated()) /* 82489DX */
+		value |= APIC_LVT_LEVEL_TRIGGER;
+	if (apic_extnmi == APIC_EXTNMI_NONE)
+		value |= APIC_LVT_MASKED;
+	apic_write(APIC_LVT1, value);
+}
+
 /* Init the interrupt delivery mode for the BSP */
 void __init apic_intr_mode_init(void)
 {
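The "Enable APIC" step in `init_bsp_APIC()` is a read-modify-write of the spurious interrupt vector register. A minimal userspace sketch of that bit arithmetic follows. The numeric constants are assumptions that mirror my recollection of the kernel's `apicdef.h` and should be checked against the tree in use, and `bsp_spiv_value` is a hypothetical pure function written only for illustration.

```c
#include <stdint.h>

/* Assumed values; verify against arch/x86/include/asm/apicdef.h. */
#define APIC_VECTOR_MASK		0x000FFu
#define APIC_SPIV_APIC_ENABLED		(1u << 8)
#define APIC_SPIV_FOCUS_DISABLED	(1u << 9)
#define SPURIOUS_APIC_VECTOR		0xffu

/*
 * Hypothetical helper: compute the value written back to APIC_SPIV.
 * focus_bit_reserved models the 32-bit P4/Xeon case from the hunk,
 * where the focus-disabled bit must be cleared instead of set.
 */
static uint32_t bsp_spiv_value(uint32_t old, int focus_bit_reserved)
{
	uint32_t value = old;

	value &= ~APIC_VECTOR_MASK;		/* drop the old vector */
	value |= APIC_SPIV_APIC_ENABLED;	/* software-enable the APIC */

	if (focus_bit_reserved)
		value &= ~APIC_SPIV_FOCUS_DISABLED;
	else
		value |= APIC_SPIV_FOCUS_DISABLED;

	value |= SPURIOUS_APIC_VECTOR;		/* install the spurious vector */
	return value;
}
```

Under these assumed constants, a register that starts at `0x0f` ends up as `0x3ff` in the common case and a register that starts at `0x20f` ends up as `0x1ff` when the focus bit is treated as reserved.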
7 changes: 5 additions & 2 deletions arch/x86/kernel/apic/vector.c
@@ -542,14 +542,17 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,

 		err = assign_irq_vector_policy(irqd, info);
 		trace_vector_setup(virq + i, false, err);
-		if (err)
+		if (err) {
+			irqd->chip_data = NULL;
+			free_apic_chip_data(apicd);
 			goto error;
+		}
 	}

 	return 0;

 error:
-	x86_vector_free_irqs(domain, virq, i + 1);
+	x86_vector_free_irqs(domain, virq, i);
 	return err;
 }

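One reading of the `vector.c` hunk is that the entry which fails is now cleaned up at the failure site, so the shared error path only unwinds the `i` entries that were fully set up, not `i + 1`. The sketch below illustrates that general rollback pattern in plain C. `slots`, `live_allocs`, `alloc_slots`, and `free_slots` are hypothetical stand-ins and do not model the real `x86_vector_free_irqs()`.

```c
#include <stdlib.h>

#define MAX_SLOTS 8

static void *slots[MAX_SLOTS];	/* stand-in for per-IRQ chip data */
static int live_allocs;		/* outstanding allocations, for checking */

/* Release the first 'count' slots; already-empty slots are skipped. */
static void free_slots(int count)
{
	for (int i = 0; i < count; i++) {
		if (slots[i]) {
			free(slots[i]);
			slots[i] = NULL;
			live_allocs--;
		}
	}
}

/*
 * Set up 'count' slots. 'fail_at' is the index whose setup step is
 * made to fail, or -1 for no injected failure.
 */
static int alloc_slots(int count, int fail_at)
{
	int i = 0;

	if (count > MAX_SLOTS)
		return -1;

	for (i = 0; i < count; i++) {
		void *data = malloc(16);

		if (!data)
			goto error;
		slots[i] = data;
		live_allocs++;

		if (i == fail_at) {
			/* Undo the half-initialised entry right here. */
			slots[i] = NULL;
			free(data);
			live_allocs--;
			goto error;
		}
	}
	return 0;

error:
	/* Only entries 0..i-1 are fully set up, so unwind i of them. */
	free_slots(i);
	return -1;
}
```

Calling `alloc_slots(4, 2)` fails and leaves `live_allocs` at zero, which is the property the rollback is meant to preserve.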
36 changes: 36 additions & 0 deletions arch/x86/kernel/cpu/bugs.c
@@ -23,6 +23,7 @@
 #include <asm/alternative.h>
 #include <asm/pgtable.h>
 #include <asm/set_memory.h>
+#include <asm/intel-family.h>

 static void __init spectre_v2_select_mitigation(void);

@@ -155,6 +156,23 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	return SPECTRE_V2_CMD_NONE;
 }

+/* Check for Skylake-like CPUs (for RSB handling) */
+static bool __init is_skylake_era(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86 == 6) {
+		switch (boot_cpu_data.x86_model) {
+		case INTEL_FAM6_SKYLAKE_MOBILE:
+		case INTEL_FAM6_SKYLAKE_DESKTOP:
+		case INTEL_FAM6_SKYLAKE_X:
+		case INTEL_FAM6_KABYLAKE_MOBILE:
+		case INTEL_FAM6_KABYLAKE_DESKTOP:
+			return true;
+		}
+	}
+	return false;
+}
+
 static void __init spectre_v2_select_mitigation(void)
 {
 	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -213,6 +231,24 @@ static void __init spectre_v2_select_mitigation(void)

 	spectre_v2_enabled = mode;
 	pr_info("%s\n", spectre_v2_strings[mode]);
+
+	/*
+	 * If neither SMEP or KPTI are available, there is a risk of
+	 * hitting userspace addresses in the RSB after a context switch
+	 * from a shallow call stack to a deeper one. To prevent this fill
+	 * the entire RSB, even when using IBRS.
+	 *
+	 * Skylake era CPUs have a separate issue with *underflow* of the
+	 * RSB, when they will predict 'ret' targets from the generic BTB.
+	 * The proper mitigation for this is IBRS. If IBRS is not supported
+	 * or deactivated in favour of retpolines the RSB fill on context
+	 * switch is required.
+	 */
+	if ((!boot_cpu_has(X86_FEATURE_PTI) &&
+	     !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
+		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+		pr_info("Filling RSB on context switch\n");
+	}
 }

 #undef pr_fmt
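The condition that forces `X86_FEATURE_RSB_CTXSW` in the hunk above can be restated as a small predicate. This is a sketch under stated assumptions: `struct cpu_caps` and `needs_rsb_fill_on_ctxsw` are hypothetical names, and the three booleans stand in for `boot_cpu_has(X86_FEATURE_PTI)`, `boot_cpu_has(X86_FEATURE_SMEP)`, and `is_skylake_era()`.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the three inputs the hunk tests. */
struct cpu_caps {
	bool pti;		/* kernel page-table isolation active */
	bool smep;		/* supervisor-mode execution prevention */
	bool skylake_era;	/* model matched by is_skylake_era() */
};

/*
 * Mirrors the condition in the hunk: fill the RSB on context switch
 * when neither PTI nor SMEP is present, or on Skylake-era CPUs.
 */
static bool needs_rsb_fill_on_ctxsw(const struct cpu_caps *c)
{
	return (!c->pti && !c->smep) || c->skylake_era;
}
```

Note that the condition as shown does not test for IBRS, even though the accompanying comment discusses it. A Skylake-era model sets the flag whatever the PTI and SMEP state.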
8 changes: 4 additions & 4 deletions arch/x86/kernel/cpu/intel_rdt.c
@@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		 */
 		if (static_branch_unlikely(&rdt_mon_enable_key))
 			rmdir_mondata_subdir_allrdtgrp(r, d->id);
-		kfree(d->ctrl_val);
-		kfree(d->rmid_busy_llc);
-		kfree(d->mbm_total);
-		kfree(d->mbm_local);
 		list_del(&d->list);
 		if (is_mbm_enabled())
 			cancel_delayed_work(&d->mbm_over);
@@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			cancel_delayed_work(&d->cqm_limbo);
 		}

+		kfree(d->ctrl_val);
+		kfree(d->rmid_busy_llc);
+		kfree(d->mbm_total);
+		kfree(d->mbm_local);
 		kfree(d);
 		return;
 	}
1 change: 0 additions & 1 deletion arch/x86/kernel/cpu/scattered.c
@@ -21,7 +21,6 @@ struct cpuid_bit {
 static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
 	{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
-	{ X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x00000007, 0 },
 	{ X86_FEATURE_AVX512_4VNNIW, CPUID_EDX, 2, 0x00000007, 0 },
 	{ X86_FEATURE_AVX512_4FMAPS, CPUID_EDX, 3, 0x00000007, 0 },
 	{ X86_FEATURE_CAT_L3, CPUID_EBX, 1, 0x00000010, 0 },
4 changes: 2 additions & 2 deletions arch/x86/kernel/head64.c
@@ -157,8 +157,8 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	p = fixup_pointer(&phys_base, physaddr);
 	*p += load_delta - sme_get_me_mask();

-	/* Encrypt the kernel (if SME is active) */
-	sme_encrypt_kernel();
+	/* Encrypt the kernel and related (if SME is active) */
+	sme_encrypt_kernel(bp);

 	/*
 	 * Return the SME encryption mask (if SME is active) to be used as a
12 changes: 6 additions & 6 deletions arch/x86/kernel/idt.c
@@ -56,7 +56,7 @@ struct idt_data {
  * Early traps running on the DEFAULT_STACK because the other interrupt
  * stacks work only after cpu_init().
  */
-static const __initdata struct idt_data early_idts[] = {
+static const __initconst struct idt_data early_idts[] = {
 	INTG(X86_TRAP_DB, debug),
 	SYSG(X86_TRAP_BP, int3),
 #ifdef CONFIG_X86_32
@@ -70,7 +70,7 @@ static const __initdata struct idt_data early_idts[] = {
  * the traps which use them are reinitialized with IST after cpu_init() has
  * set up TSS.
  */
-static const __initdata struct idt_data def_idts[] = {
+static const __initconst struct idt_data def_idts[] = {
 	INTG(X86_TRAP_DE, divide_error),
 	INTG(X86_TRAP_NMI, nmi),
 	INTG(X86_TRAP_BR, bounds),
@@ -108,7 +108,7 @@ static const __initdata struct idt_data def_idts[] = {
 /*
  * The APIC and SMP idt entries
  */
-static const __initdata struct idt_data apic_idts[] = {
+static const __initconst struct idt_data apic_idts[] = {
 #ifdef CONFIG_SMP
 	INTG(RESCHEDULE_VECTOR, reschedule_interrupt),
 	INTG(CALL_FUNCTION_VECTOR, call_function_interrupt),
@@ -150,15 +150,15 @@ static const __initdata struct idt_data apic_idts[] = {
  * Early traps running on the DEFAULT_STACK because the other interrupt
  * stacks work only after cpu_init().
  */
-static const __initdata struct idt_data early_pf_idts[] = {
+static const __initconst struct idt_data early_pf_idts[] = {
 	INTG(X86_TRAP_PF, page_fault),
 };

 /*
  * Override for the debug_idt. Same as the default, but with interrupt
  * stack set to DEFAULT_STACK (0). Required for NMI trap handling.
  */
-static const __initdata struct idt_data dbg_idts[] = {
+static const __initconst struct idt_data dbg_idts[] = {
 	INTG(X86_TRAP_DB, debug),
 	INTG(X86_TRAP_BP, int3),
 };
@@ -180,7 +180,7 @@ gate_desc debug_idt_table[IDT_ENTRIES] __page_aligned_bss;
  * The exceptions which use Interrupt stacks. They are setup after
  * cpu_init() when the TSS has been initialized.
  */
-static const __initdata struct idt_data ist_idts[] = {
+static const __initconst struct idt_data ist_idts[] = {
 	ISTG(X86_TRAP_DB, debug, DEBUG_STACK),
 	ISTG(X86_TRAP_NMI, nmi, NMI_STACK),
 	SISTG(X86_TRAP_BP, int3, DEBUG_STACK),
3 changes: 3 additions & 0 deletions arch/x86/kernel/irqinit.c
@@ -61,6 +61,9 @@ void __init init_ISA_irqs(void)
 	struct irq_chip *chip = legacy_pic->chip;
 	int i;

+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_LOCAL_APIC)
+	init_bsp_APIC();
+#endif
 	legacy_pic->init(0);

 	for (i = 0; i < nr_legacy_irqs(); i++)
10 changes: 0 additions & 10 deletions arch/x86/kernel/setup.c
@@ -364,16 +364,6 @@ static void __init reserve_initrd(void)
 	    !ramdisk_image || !ramdisk_size)
 		return; /* No initrd provided by bootloader */

-	/*
-	 * If SME is active, this memory will be marked encrypted by the
-	 * kernel when it is accessed (including relocation). However, the
-	 * ramdisk image was loaded decrypted by the bootloader, so make
-	 * sure that it is encrypted before accessing it. For SEV the
-	 * ramdisk will already be encrypted, so only do this for SME.
-	 */
-	if (sme_active())
-		sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
-
 	initrd_start = 0;

 	mapped_size = memblock_mem_size(max_pfn_mapped);