Add quirks for Sabrent Rocket Q on darp7, galp5, lemp9, and lemp10 #35

jacobgkau · 2021-02-01T04:57:36Z

This gets the Sabrent Rocket Q 8TB drive working in the products where it was not previously working (which would quadruple the maximum storage we're able to offer in these products.) I think it would be a good idea to confirm if the 4TB drive also needs the same fixes, so it can be added to the same section.

NVME_QUIRK_NO_DEEPEST_PS is needed to prevent random failures while using the drive (as it enters/exits PS4). NVME_QUIRK_SIMPLE_SUSPEND is needed to prevent failure when resuming from suspend.

If we merge this here, then I'll also try to submit it upstream.

BugLink: https://bugs.launchpad.net/bugs/1912027 commit b5e438e upstream. Not resetting the SMT siblings might leave them in unpredictable state. One of the observed problems was that the CPU timer wasn't reset and therefore large system time values where accounted during CPU bringup. Cc: <[email protected]> # 4.0 Fixes: 10ad34b ("s390: add SMT support") Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 613775d upstream. diag308 subcode 0 performes a clear reset which inlcudes the reset of all registers in the system. While this is the preferred behavior when loading a normal kernel via kexec it prevents the crash kernel to store the register values in the dump. To prevent this use subcode 1 when loading a crash kernel instead. Fixes: ee337f5 ("s390/kexec_file: Add crash support to image loader") Cc: <[email protected]> # 4.17 Signed-off-by: Philipp Rudo <[email protected]> Reported-by: Xiaoying Yan <[email protected]> Tested-by: Lianbo Jiang <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit e259b3f upstream. During removal of the critical section cleanup the calculation of mt_cycles during idle was removed. This causes invalid accounting on systems with SMT enabled. Fixes: 0b0ed65 ("s390: remove critical section cleanup from entry.S") Cc: <[email protected]> # 5.8 Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 454efcf upstream. When a machine check interrupt is triggered during idle, the code is using the async timer/clock for idle time calculation. It should use the machine check enter timer/clock which is passed to the macro. Fixes: 0b0ed65 ("s390: remove critical section cleanup from entry.S") Cc: <[email protected]> # 5.8 Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 658a337 upstream. For an LCU update a read unit address configuration IO is required. This is started using sleep_on(), which has early exit paths in case the device is not usable for IO. For example when it is in offline processing. In those cases the LCU update should fail and not be retried. Therefore lcu_update_work checks if EOPNOTSUPP is returned or not. Commit 4199534 ("s390/dasd: fix endless loop after read unit address configuration") accidentally removed the EOPNOTSUPP return code from read_unit_address_configuration(), which in turn might lead to an endless loop of the LCU update in offline processing. Fix by returning EOPNOTSUPP again if the device is not able to perform the request. Fixes: 4199534 ("s390/dasd: fix endless loop after read unit address configuration") Cc: [email protected] #5.3 Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit a29ea01 upstream. Prevent _lcu_update from adding a device to a pavgroup if the LCU still requires an update. The data is not reliable any longer and in parallel devices might have been moved on the lists already. This might lead to list corruptions or invalid PAV grouping. Only add devices to a pavgroup if the LCU is up to date. Additional steps are taken by the scheduled lcu update. Fixes: 8e09f21 ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: [email protected] Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 0ede91f upstream. dasd_alias_add_device() moves devices to the active_devices list in case of a scheduled LCU update regardless if they have previously been in a pavgroup or not. Example: device A and B are in the same pavgroup. Device A has already been in a pavgroup and the private->pavgroup pointer is set and points to a valid pavgroup. While going through dasd_add_device it is moved from the pavgroup to the active_devices list. In parallel device B might be removed from the same pavgroup in remove_device_from_lcu() which in turn checks if the group is empty and deletes it accordingly because device A has already been removed from there. When now device A enters remove_device_from_lcu() it is tried to remove it from the pavgroup again because the pavgroup pointer is still set and again the empty group will be cleaned up which leads to a list corruption. Fix by setting private->pavgroup to NULL in dasd_add_device. If the device has been the last device on the pavgroup an empty pavgroup remains but this will be cleaned up by the scheduled lcu_update which iterates over all existing pavgroups. Fixes: 8e09f21 ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: [email protected] Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 53a7f65 upstream. In dasd_alias_disconnect_device_from_lcu the device is removed from any list on the LCU. Afterwards the LCU is removed from the lcu list if it does not contain devices any longer. The lcu->lock protects the lcu from parallel updates. But to cancel all workers and wait for completion the lcu->lock has to be unlocked. If two devices are removed in parallel and both are removed from the LCU the first device that takes the lcu->lock again will delete the LCU because it is already empty but the second device also tries to free the LCU which leads to a list corruption of the lcu list. Fix by removing the device right before the lcu is checked without unlocking the lcu->lock in between. Fixes: 8e09f21 ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: [email protected] Signed-off-by: Stefan Haberland <[email protected]> Reviewed-by: Jan Hoeppner <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 0f966cb upstream. Add a per-transaction flag to indicate that the buffer must be cleared when the transaction is complete to prevent copies of sensitive data from being preserved in memory. Signed-off-by: Todd Kjos <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 0d024a8 upstream. The cx2072x codec driver defines multiple DAIs with the same stream name "Playback" and "Capture". Although the current code works more or less as is as the secondary streams are never used, it still leads the error message like: debugfs: File 'Playback' in directory 'dapm' already present! debugfs: File 'Capture' in directory 'dapm' already present! Fix it by renaming the secondary streams to unique names. Fixes: a497a43 ("ASoC: Add support for Conexant CX2072X CODEC") Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

…IOS) BugLink: https://bugs.launchpad.net/bugs/1912027 commit 718c406 upstream. Users reported that some Lenovo AMD platforms do not have ACP microphone, but the BIOS advertises it via ACPI. This patch create a simple DMI table, where those machines with the broken BIOS can be added. The DMI description for Lenovo IdeaPad 5 and IdeaPad Flex 5 devices are added there. Also describe the dmic_acpi_check kernel module parameter in a more understandable way. Cc: <[email protected]> Cc: Vijendar Mukunda <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Jaroslav Kysela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 55d8e6a upstream. The Raven and Renoir ACP can be distinguished by the PCI revision. Let's do the check very early, otherwise the wrong probe code can be run. Link: https://lore.kernel.org/alsa-devel/[email protected]/ Cc: <[email protected]> Cc: Vijendar Mukunda <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Jaroslav Kysela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 56c9045 upstream. I have had reports from two different people that attempts to read the analog input channels of the MF624 board fail with an `ETIMEDOUT` error. After triggering the conversion, the code calls `comedi_timeout()` with `mf6x4_ai_eoc()` as the callback function to check if the conversion is complete. The callback returns 0 if complete or `-EBUSY` if not yet complete. `comedi_timeout()` returns `-ETIMEDOUT` if it has not completed within a timeout period which is propagated as an error to the user application. The existing code considers the conversion to be complete when the EOLC bit is high. However, according to the user manuals for the MF624 and MF634 boards, this test is incorrect because EOLC is an active low signal that goes high when the conversion is triggered, and goes low when the conversion is complete. Fix the problem by inverting the test of the EOLC bit state. Fixes: 04b5650 ("comedi: Humusoft MF634 and MF624 DAQ cards driver") Cc: <[email protected]> # v4.4+ Cc: Rostislav Lisovy <[email protected]> Signed-off-by: Ian Abbott <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit fc54886 upstream. Patch series "z3fold: stability / rt fixes". Address z3fold stability issues under stress load, primarily in the reclaim and free aspects. Besides, it fixes the locking problems that were only seen in real-time kernel configuration. This patch (of 3): There used to be two places in the code where slots could be freed, namely when freeing the last allocated handle from the slots and when releasing the z3fold header these slots aree linked to. The logic to decide on whether to free certain slots was complicated and error prone in both functions and it led to failures in RT case. To fix that, make free_handle() the single point of freeing slots. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Tested-by: Mike Galbraith <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit dcf5aed upstream. Use temporary slots in reclaim function to avoid possible race when freeing those. While at it, make sure we check CLAIMED flag under page lock in the reclaim function to make sure we are not racing with z3fold_alloc(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Cc: <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 306e3e9 upstream. The event CYCLE_ACTIVITY.STALLS_MEM_ANY (0x14a3) should be available on all 8 GP counters on ICL, but it's only scheduled on the first four counters due to the current ICL constraint table. Add a line for the CYCLE_ACTIVITY.STALLS_MEM_ANY event in the ICL constraint table. Correct the comments for the CYCLE_ACTIVITY.CYCLES_MEM_ANY event. Fixes: 6017608 ("perf/x86/intel: Add Icelake support") Reported-by: Andi Kleen <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 46b72e1 upstream. According to the event list from icelake_core_v1.09.json, the encoding of the RTM_RETIRED.ABORTED event on Ice Lake should be, "EventCode": "0xc9", "UMask": "0x04", "EventName": "RTM_RETIRED.ABORTED", Correct the wrong encoding. Fixes: 6017608 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

…ace. BugLink: https://bugs.launchpad.net/bugs/1912027 commit aa8e21c upstream. Perf event attritube supports exclude_kernel flag to avoid sampling/profiling in supervisor state (kernel). Based on this event attr flag, Monitor Mode Control Register bit is set to freeze on supervisor state. But sometimes (due to hardware limitation), Sampled Instruction Address Register (SIAR) locks on to kernel address even when freeze on supervisor is set. Patch here adds a check to drop those samples. Cc: [email protected] Signed-off-by: Athira Rajeev <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit e40ad84 upstream. When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is changed later, user space may want to take advantage of this increased guaranteed performance. HWP_CAP.GUARANTEED is not a static value. It can be adjusted by an out-of-band agent or during an Intel Speed Select performance level change. The HWP_CAP.MAX is still the maximum achievable performance with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still change as long as it remains less than or equal to HWP_CAP.MAX. When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency attribute shows the most recent guaranteed frequency value. This attribute can be used by user space software to update the scaling min/max limits of the CPU. Currently, the ->setpolicy() callback already uses the latest HWP_CAP values when setting HWP_REQ, but the ->verify() callback will restrict the user settings to the to old guaranteed performance value which prevents user space from making use of the extra CPU capacity theoretically available to it after increasing HWP_CAP.GUARANTEED. To address this, read HWP_CAP in intel_pstate_verify_cpu_policy() to obtain the maximum P-state that can be used and use that to confine the policy max limit instead of using the cached and possibly stale pstate.max_freq value for this purpose. For consistency, update intel_pstate_update_perf_limits() to use the maximum available P-state returned by intel_pstate_get_hwp_max() to compute the maximum frequency instead of using the return value of intel_pstate_get_max_freq() which, again, may be stale. This issue is a side-effect of fixing the scaling frequency limits in commit eacc9c5 ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled") which corrected the setting of the reduced scaling frequency values, but caused stale HWP_CAP.GUARANTEED to be used in the case at hand. Fixes: eacc9c5 ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled") Reported-by: Srinivas Pandruvada <[email protected]> Tested-by: Srinivas Pandruvada <[email protected]> Cc: 5.8+ <[email protected]> # 5.8+ Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 17858b1 upstream. ecdh_set_secret() casts a void* pointer to a const u64* in order to feed it into ecc_is_key_valid(). This is not generally permitted by the C standard, and leads to actual misalignment faults on ARMv6 cores. In some cases, these are fixed up in software, but this still leads to performance hits that are entirely avoidable. So let's copy the key into the ctx buffer first, which we will do anyway in the common case, and which guarantees correct alignment. Cc: <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit f3456b9 upstream. ARM Cortex-A57 and Cortex-A72 cores running in 32-bit mode are affected by silicon errata #1742098 and #1655431, respectively, where the second instruction of a AES instruction pair may execute twice if an interrupt is taken right after the first instruction consumes an input register of which a single 32-bit lane has been updated the last time it was modified. This is not such a rare occurrence as it may seem: in counter mode, only the least significant 32-bit word is incremented in the absence of a carry, which makes our counter mode implementation susceptible to these errata. So let's shuffle the counter assignments around a bit so that the most recent updates when the AES instruction pair executes are 128-bit wide. [0] ARM-EPM-049219 v23 Cortex-A57 MPCore Software Developers Errata Notice [1] ARM-EPM-012079 v11.0 Cortex-A72 MPCore Software Developers Errata Notice Cc: <[email protected]> # v5.4+ Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit a7b5458 upstream. Don't add platform resources that won't be used. This avoids a recently-added warning from the driver core, that can show up on a multi-platform kernel when !MACH_IS_MAC. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at drivers/base/platform.c:224 platform_get_irq_optional+0x8e/0xce 0 is an invalid IRQ number Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-multi #1 Stack from 004b3f04: 004b3f04 00462c2f 00462c2f 004b3f2 0002e128 004754db 004b6ad4 004b3f4c 0002e19c 004754f7 000000e0 00285ba0 00000009 00000000 004b3f44 ffffffff 004754db 004b3f64 004b3f74 00285ba0 004754f7 000000e0 00000009 004754db 004fdf0c 005269e2 004fdf0c 00000000 004b3f88 00285cae 004b6964 00000000 004fdf0c 004b3fac 0051cc68 004b6964 00000000 004b6964 00000200 00000000 0051cc3e 0023c18a 004b3fc0 0051cd8a 004fdf0c 00000002 0052b43c 004b3fc8 Call Trace: [<0002e128>] __warn+0xa6/0xd6 [<0002e19c>] warn_slowpath_fmt+0x44/0x76 [<00285ba0>] platform_get_irq_optional+0x8e/0xce [<00285ba0>] platform_get_irq_optional+0x8e/0xce [<00285cae>] platform_get_irq+0x12/0x4c [<0051cc68>] pmz_init_port+0x2a/0xa6 [<0051cc3e>] pmz_init_port+0x0/0xa6 [<0023c18a>] strlen+0x0/0x22 [<0051cd8a>] pmz_probe+0x34/0x88 [<0051cde6>] pmz_console_init+0x8/0x28 [<00511776>] console_init+0x1e/0x28 [<0005a3bc>] printk+0x0/0x16 [<0050a8a6>] start_kernel+0x368/0x4ce [<005094f8>] _sinittext+0x4f8/0xc48 random: get_random_bytes called from print_oops_end_marker+0x56/0x80 with crng_init=0 ---[ end trace 392d8e82eed68d6c ]--- Commit a85a6c8 ("driver core: platform: Clarify that IRQ 0 is invalid"), which introduced the WARNING, suggests that testing for irq == 0 is undesirable. Instead of that comparison, just test for resource existence. Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Joshua Thompson <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: [email protected] # v5.8+ Reported-by: Laurent Vivier <[email protected]> Signed-off-by: Finn Thain <[email protected]> Link: https://lore.kernel.org/r/0c0fe1e4f11ccec202d4df09ea7d9d98155d101a.1606001297.git.fthain@telegraphics.com.au Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 83ff51c upstream. Instead of raw access, use readl() to access MMIO registers of memory controller to avoid possible compiler re-ordering. Fixes: d4dc89d ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Cc: <[email protected]> Signed-off-by: Qiuxu Zhuo <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 706657b upstream. In order to setup its PCI component, the driver needs any node private instance in order to get a reference to the PCI device and hand that into edac_pci_create_generic_ctl(). For convenience, it uses the 0th memory controller descriptor under the assumption that if any, the 0th will be always present. However, this assumption goes wrong when the 0th node doesn't have memory and the driver doesn't initialize an instance for it: EDAC amd64: F17h detected (node 0). ... EDAC amd64: Node 0: No DIMMs detected. But looking up node instances is not really needed - all one needs is the pointer to the proper device which gets discovered during instance init. So stash that pointer into a variable and use it when setting up the EDAC PCI component. Clear that variable when the driver needs to unwind due to some instances failing init to avoid any registration imbalance. Cc: <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 406100f upstream. One of our machines keeled over trying to rebuild the scheduler domains. Mainline produces the same splat: BUG: unable to handle page fault for address: 0000607f820054db CPU: 2 PID: 149 Comm: kworker/1:1 Not tainted 5.10.0-rc1-master+ #6 Workqueue: events cpuset_hotplug_workfn RIP: build_sched_domains Call Trace: partition_sched_domains_locked rebuild_sched_domains_locked cpuset_hotplug_workfn It happens with cgroup2 and exclusive cpusets only. This reproducer triggers it on an 8-cpu vm and works most effectively with no preexisting child cgroups: cd $UNIFIED_ROOT mkdir cg1 echo 4-7 > cg1/cpuset.cpus echo root > cg1/cpuset.cpus.partition # with smt/control reading 'on', echo off > /sys/devices/system/cpu/smt/control RIP maps to sd->shared = *per_cpu_ptr(sdd->sds, sd_id); from sd_init(). sd_id is calculated earlier in the same function: cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu)); sd_id = cpumask_first(sched_domain_span(sd)); tl->mask(cpu), which reads cpu_sibling_map on x86, returns an empty mask and so cpumask_first() returns >= nr_cpu_ids, which leads to the bogus value from per_cpu_ptr() above. The problem is a race between cpuset_hotplug_workfn() and a later offline of CPU N. cpuset_hotplug_workfn() updates the effective masks when N is still online, the offline clears N from cpu_sibling_map, and then the worker uses the stale effective masks that still have N to generate the scheduling domains, leading the worker to read N's empty cpu_sibling_map in sd_init(). rebuild_sched_domains_locked() prevented the race during the cgroup2 cpuset series up until the Fixes commit changed its check. Make the check more robust so that it can detect an offline CPU in any exclusive cpuset's effective mask, not just the top one. Fixes: 0ccea8f ("cpuset: Make generate_sched_domains() work with partition") Signed-off-by: Daniel Jordan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 975323a upstream. The parallel-port restore operations is called when a driver claims the port and is supposed to restore the provided state (e.g. saved when releasing the port). Fixes: b69578d ("USB: usbserial: mos7720: add support for parallel port on moschip 7715") Cc: stable <[email protected]> # 2.6.35 Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 5098e77 upstream. The driver must not call tty_wakeup() while holding its private lock as line disciplines are allowed to call back into write() from write_wakeup(), leading to a deadlock. Also remove the unneeded work struct that was used to defer wakeup in order to work around a possible race in ancient times (see comment about n_tty write_chan() in commit 14b54e3 ("USB: serial: remove changelogs and old todo entries")). Fixes: 1da177e ("Linux-2.6.12-rc2") Cc: [email protected] Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 696c541 upstream. Commit c528fcb ("USB: serial: keyspan_pda: fix receive sanity checks") broke write-unthrottle handling by dropping well-formed unthrottle-interrupt packets which are precisely two bytes long. This could lead to blocked writers not being woken up when buffer space again becomes available. Instead, stop unconditionally printing the third byte which is (presumably) only valid on modem-line changes. Fixes: c528fcb ("USB: serial: keyspan_pda: fix receive sanity checks") Cc: stable <[email protected]> # 4.11 Acked-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit 7353cad upstream. The write() callback can be called in interrupt context (e.g. when used as a console) so interrupts must be disabled while holding the port lock to prevent a possible deadlock. Fixes: e81ee63 ("usb-serial: possible irq lock inversion (PPP vs. usb/serial)") Fixes: 507ca9b ("[PATCH] USB: add ability for usb-serial drivers to determine if their write urb is currently being used.") Cc: stable <[email protected]> # 2.6.19 Acked-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1912027 commit c01d2c5 upstream. Make sure to clear the write-busy flag also in case no new data was submitted due to lack of device buffer space so that writing is resumed once space again becomes available. Fixes: 507ca9b ("[PATCH] USB: add ability for usb-serial drivers to determine if their write urb is currently being used.") Cc: stable <[email protected]> # 2.6.13 Acked-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Kelsey Skunberg <[email protected]>

techaround · 2021-02-12T17:01:38Z

Interesting. I wouldn't necessarily have expected this to improve stability over the boot option (other than suspend), but glad to >>hear it's helping.
There is randomness -- also I woke up to the system in read only mode - so it's not fixed (note that suspend is completely disabled)

I will be getting a 2TB nvme on monday and I would be happy to run your test then.

The 8TB is in Slot 2 right now. I plan to switch that when I get the 2TB since it can use the PCIe 4.0.

(Sabrent 2TB Rocket Q4 NVMe PCIe 4.0 )

techaround · 2021-02-16T01:21:56Z

I have not run your tests yet.

I got the new drive and installed it. I think it is ok, but acts weird with the 8TB is installed.
I had to do this (nvme_core.default_ps_max_latency_us=2000) to even see the 8TB drive.

If I don't, after the disk unencrypts, there is a 30 second delay before it boots and gparted only sees the boot drive.

Do you want me to run your test with nvme_core.default_ps_max_latency_us=2000?

Will running "nvme_core.default_ps_max_latency_us=2000" in the TB2 drive affect the performance? If so, I'll probably end up living without the 8TB drive until this get sorted out.

Though happy to test

jacobgkau · 2021-02-16T03:43:31Z

I had to do this (nvme_core.default_ps_max_latency_us=2000) to even see the 8TB drive.

Are you booting from a new OS installation on the new drive, or the same OS installation on the old drive? If it's a new installation, please add the sabrent-rocket-q kernel version using the same commands as before in my earlier message. Does that get rid of the weirdness without using nvme_core.default_ps_max_latency_us=2000?

Do you want me to run your test with nvme_core.default_ps_max_latency_us=2000?

Yes, while the custom kernel is preferred, I'm also able to reproduce I/O errors with the kernel option, so knowing whether or not you're able to would tell me if we're seeing the same behavior or if my drive is potentially bad (it sounds like we're seeing the same behavior, since you were still seeing some instability with the custom kernel and with the option before.)

Will running "nvme_core.default_ps_max_latency_us=2000" in the TB2 drive affect the performance?

Not performance, just power efficiency. An NVMe drive has various APST states, and each state takes a certain amount of time to enter/exit. What this option does is says: "If it would take more than 2000 microseconds to wake up from this level of power saving, then don't use it." With the Sabrent 4TB and 8TB drives in particular, this disables PS4 (the deepest power-saving state) and leaves only PS1 through PS3 available. (If anything, it's improving performance by not having to spend time entering/exiting PS4, but the ~2ms difference won't be noticeable.)

You can view the list of states enabled with your kernel/boot options using sudo nvme get-feature /dev/nvme# --feature-id=12 -H, and you can see the current power state with sudo nvme get-feature /dev/nvme# --feature-id=2 -H.

If so, I'll probably end up living without the 8TB drive until this get sorted out.

Just to be clear, there is no guarantee this will be sorted out. This PR was an attempt to get the drives working (by disabling standard features that the drive was not handling properly), and while it does improve behavior, it doesn't seem to be enough. This seems like an issue with the drive that may not be fixable without a firmware update for the drive from Sabrent.

techaround · 2021-02-16T18:52:28Z

I had to do this (nvme_core.default_ps_max_latency_us=2000) to even see the 8TB drive.

Are you booting from a new OS installation on the new drive, or the same OS installation on the old drive? If it's a new installation, please add the sabrent-rocket-q kernel version using the same commands as before in my earlier message. Does that get rid of the weirdness without using nvme_core.default_ps_max_latency_us=2000?

I am booting off the 2TB drive.

Do you want me to run your test with nvme_core.default_ps_max_latency_us=2000?

Yes, while the custom kernel is preferred, I'm also able to reproduce I/O errors with the kernel option, so knowing whether or not you're able to would tell me if we're seeing the same behavior or if my drive is potentially bad (it sounds like we're seeing the same behavior, since you were still seeing some instability with the custom kernel and with the option before.)

I will run several tests with different configurations and get back to you.

Will running "nvme_core.default_ps_max_latency_us=2000" in the TB2 drive affect the performance?

Not performance, just power efficiency. An NVMe drive has various APST states, and each state takes a certain amount of time to enter/exit. What this option does is says: "If it would take more than 2000 microseconds to wake up from this level of power saving, then don't use it." With the Sabrent 4TB and 8TB drives in particular, this disables PS4 (the deepest power-saving state) and leaves only PS1 through PS3 available. (If anything, it's improving performance by not having to spend time entering/exiting PS4, but the ~2ms difference won't be noticeable.)

You can view the list of states enabled with your kernel/boot options using sudo nvme get-feature /dev/nvme# --feature-id=12 -H, and you can see the current power state with sudo nvme get-feature /dev/nvme# --feature-id=2 -H.

If so, I'll probably end up living without the 8TB drive until this get sorted out.
Just to be clear, there is no guarantee this will be sorted out. This PR was an attempt to get the drives working (by disabling standard features that the drive was not handling properly), and while it does improve behavior, it doesn't seem to be enough. This seems like an issue with the drive that may not be fixable without a firmware update for the drive from Sabrent.

Too bad I like this computer so much. I can sell the drive I guess -- seem very unfortunate.

techaround · 2021-02-17T00:28:02Z

testing
2tb in slot 2 (boot)
8tb in slot 2

Using the default 20.04 kernel with nvme_core.default_ps_max_latency_us=2000

I was able to boot and see both drives.
I ran your test as indicated -- very quickly the 8tb disk went read only.

Installed the custom kernel and rebooted.
could not see the 8TB at all.
tried booting the custom kernel with nvme_core.default_ps_max_latency_us=2000
still cannot see the 8TB drive

reinstalled the default kernel and rebooted
still cannot see the 8TB drive.

dphurst · 2021-02-17T00:48:41Z

Would Sabrent engineers talk to System76 and work with y'all to get the drives in a robust state? I'd sure like to have 4 or 8 TB in my galp5.
Thanks,
Dow Hurst

jacobgkau · 2021-02-17T16:04:59Z

reinstalled the default kernel and rebooted
still cannot see the 8TB drive.

It doesn't sound like the modified kernel was necessarily responsible for the lack of detection there, if it still wasn't detected after going back to the normal kernel. I've had cases where I need to reseat the SSD after a failure before it will detect again, so you might want to install the modified kernel again, power off, reseat it, and power on again.

techaround · 2021-02-17T19:50:08Z

I was running out of time to return the drive so I went ahead and did that -- good luck though

techaround · 2021-08-26T23:42:10Z

Has there been any progress find any working 4 TB or greater drives? What about non-Sabrent drives? This is very frustrating.

dphurst · 2021-08-26T23:43:50Z

I'll chime in here that I sure would like this supported as well. I have a galp5 that is crying out for more than 2TB. Dow

…

On Thu, Aug 26, 2021 at 7:42 PM Stephane Deuvaert ***@***.***> wrote: Has there been any progress find any working 4 TB or greater drives? What about non-Sabrent drives? This is very frustrating. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#35 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AHES3MGONJ7R7V534NQOCLLT63GN3ANCNFSM4W4DQHFA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

@devid

commit e4571b8 upstream. [BUG] It's easy to trigger NULL pointer dereference, just by removing a non-existing device id: # mkfs.btrfs -f -m single -d single /dev/test/scratch1 \ /dev/test/scratch2 # mount /dev/test/scratch1 /mnt/btrfs # btrfs device remove 3 /mnt/btrfs Then we have the following kernel NULL pointer dereference: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 PID: 649 Comm: btrfs Not tainted 5.14.0-rc3-custom+ #35 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:btrfs_rm_device+0x4de/0x6b0 [btrfs] btrfs_ioctl+0x18bb/0x3190 [btrfs] ? lock_is_held_type+0xa5/0x120 ? find_held_lock.constprop.0+0x2b/0x80 ? do_user_addr_fault+0x201/0x6a0 ? lock_release+0xd2/0x2d0 ? __x64_sys_ioctl+0x83/0xb0 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae [CAUSE] Commit a27a94c ("btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly") moves the "missing" device path check into btrfs_rm_device(). But btrfs_rm_device() itself can have case where it only receives @devid, with NULL as @device_path. In that case, calling strcmp() on NULL will trigger the NULL pointer dereference. Before that commit, we handle the "missing" case inside btrfs_find_device_by_devspec(), which will not check @device_path at all if @devid is provided, thus no way to trigger the bug. [FIX] Before calling strcmp(), also make sure @device_path is not NULL. Fixes: a27a94c ("btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly") CC: [email protected] # 5.4+ Reported-by: butt3rflyh4ck <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

jackpot51 · 2021-09-28T21:26:53Z

This is pretty out of date now

jackpot51 · 2021-09-28T21:28:52Z

We can rebase this, and target it against the master_impish kernel, after arm64 builds are finished

…ncurrent video sessions [ Upstream commit 91f2b7d ] In existing implementation, core_clk_setrate() is getting called concurrently in concurrent video sessions. Before the previous call to core_clk_setrate returns, new call to core_clk_setrate is invoked from another video session running concurrently. This results in latest calculated frequency being set (higher/lower) instead of actual frequency required for that video session. It also results in stability crashes mention below. These resources are specific to video core, hence keeping under core lock would ensure that they are estimated for all running video sessions and called once for the video core. Crash logs: [ 1.900089] WARNING: CPU: 4 PID: 1 at drivers/opp/debugfs.c:33 opp_debug_remove_one+0x2c/0x48 [ 1.908493] Modules linked in: [ 1.911524] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.10.67 #35 f8edb8c30cf2dd6838495dd9ef9be47af7f5f60c [ 1.921036] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP SKU2 platform (DT) [ 1.928673] pstate: 60800009 (nZCv daif -PAN +UAO -TCO BTYPE=--) [ 1.934608] pc : opp_debug_remove_one+0x2c/0x48 [ 1.939080] lr : opp_debug_remove_one+0x2c/0x48 [ 1.943560] sp : ffffffc011d7b7f0 [ 1.946836] pmr_save: 000000e0 [ 1.949854] x29: ffffffc011d7b7f0 x28: ffffffc010733bbc [ 1.955104] x27: ffffffc010733ba8 x26: ffffff8083cedd00 [ 1.960355] x25: 0000000000000001 x24: 0000000000000000 [ 1.965603] x23: ffffff8083cc2878 x22: ffffff8083ceb900 [ 1.970852] x21: ffffff8083ceb910 x20: ffffff8083cc2800 [ 1.976101] x19: ffffff8083ceb900 x18: 00000000ffff0a10 [ 1.981352] x17: ffffff80837a5620 x16: 00000000000000ec [ 1.986601] x15: ffffffc010519ad4 x14: 0000000000000003 [ 1.991849] x13: 0000000000000004 x12: 0000000000000001 [ 1.997100] x11: c0000000ffffdfff x10: 00000000ffffffff [ 2.002348] x9 : d2627c580300dc00 x8 : d2627c580300dc00 [ 2.007596] x7 : 0720072007200720 x6 : ffffff80802ecf00 [ 2.012845] x5 : 0000000000190004 x4 : 0000000000000000 [ 2.018094] x3 : ffffffc011d7b478 x2 : ffffffc011d7b480 [ 2.023343] x1 : 00000000ffffdfff x0 : 0000000000000017 [ 2.028594] Call trace: [ 2.031022] opp_debug_remove_one+0x2c/0x48 [ 2.035160] dev_pm_opp_put+0x94/0xb0 [ 2.038780] _opp_remove_all+0x7c/0xc8 [ 2.042486] _opp_remove_all_static+0x54/0x7c [ 2.046796] dev_pm_opp_remove_table+0x74/0x98 [ 2.051183] devm_pm_opp_of_table_release+0x18/0x24 [ 2.056001] devm_action_release+0x1c/0x28 [ 2.060053] release_nodes+0x23c/0x2b8 [ 2.063760] devres_release_group+0xcc/0xd0 [ 2.067900] component_bind+0xac/0x168 [ 2.071608] component_bind_all+0x98/0x124 [ 2.075664] msm_drm_bind+0x1e8/0x678 [ 2.079287] try_to_bring_up_master+0x60/0x134 [ 2.083674] component_master_add_with_match+0xd8/0x120 [ 2.088834] msm_pdev_probe+0x20c/0x2a0 [ 2.092629] platform_drv_probe+0x9c/0xbc [ 2.096598] really_probe+0x11c/0x46c [ 2.100217] driver_probe_device+0x8c/0xf0 [ 2.104270] device_driver_attach+0x54/0x78 [ 2.108407] __driver_attach+0x48/0x148 [ 2.112201] bus_for_each_dev+0x88/0xd4 [ 2.115998] driver_attach+0x2c/0x38 [ 2.119534] bus_add_driver+0x10c/0x200 [ 2.123330] driver_register+0x6c/0x104 [ 2.127122] __platform_driver_register+0x4c/0x58 [ 2.131767] msm_drm_register+0x6c/0x70 [ 2.135560] do_one_initcall+0x64/0x23c [ 2.139357] do_initcall_level+0xac/0x15c [ 2.143321] do_initcalls+0x5c/0x9c [ 2.146778] do_basic_setup+0x2c/0x38 [ 2.150401] kernel_init_freeable+0xf8/0x15c [ 2.154622] kernel_init+0x1c/0x11c [ 2.158079] ret_from_fork+0x10/0x30 [ 2.161615] ---[ end trace a2cc45a0f784b212 ]--- [ 2.166272] Removing OPP: 300000000 Signed-off-by: Mansur Alisha Shaik <[email protected]> Signed-off-by: Stanimir Varbanov <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit f2703de ] After enabling CONFIG_SCHED_CORE (landed during 5.14 cycle), 2-core 2-thread-per-core interAptiv (CPS-driven) started emitting the following: [ 0.025698] CPU1 revision is: 0001a120 (MIPS interAptiv (multi)) [ 0.048183] ------------[ cut here ]------------ [ 0.048187] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:6025 sched_core_cpu_starting+0x198/0x240 [ 0.048220] Modules linked in: [ 0.048233] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc3+ #35 b7b319f24073fd9a3c2aa7ad15fb7993eec0b26f [ 0.048247] Stack : 817f0000 00000004 327804c8 810eb050 00000000 00000004 00000000 c314fdd1 [ 0.048278] 830cbd64 819c0000 81800000 817f0000 83070bf4 00000001 830cbd08 00000000 [ 0.048307] 00000000 00000000 815fcbc4 00000000 00000000 00000000 00000000 00000000 [ 0.048334] 00000000 00000000 00000000 00000000 817f0000 00000000 00000000 817f6f34 [ 0.048361] 817f0000 818a3c00 817f0000 00000004 00000000 00000000 4dc33260 0018c933 [ 0.048389] ... [ 0.048396] Call Trace: [ 0.048399] [<8105a7bc>] show_stack+0x3c/0x140 [ 0.048424] [<8131c2a0>] dump_stack_lvl+0x60/0x80 [ 0.048440] [<8108b5c0>] __warn+0xc0/0xf4 [ 0.048454] [<8108b658>] warn_slowpath_fmt+0x64/0x10c [ 0.048467] [<810bd418>] sched_core_cpu_starting+0x198/0x240 [ 0.048483] [<810c6514>] sched_cpu_starting+0x14/0x80 [ 0.048497] [<8108c0f8>] cpuhp_invoke_callback_range+0x78/0x140 [ 0.048510] [<8108d914>] notify_cpu_starting+0x94/0x140 [ 0.048523] [<8106593c>] start_secondary+0xbc/0x280 [ 0.048539] [ 0.048543] ---[ end trace 0000000000000000 ]--- [ 0.048636] Synchronize counters for CPU 1: done. ...for each but CPU 0/boot. Basic debug printks right before the mentioned line say: [ 0.048170] CPU: 1, smt_mask: So smt_mask, which is sibling mask obviously, is empty when entering the function. This is critical, as sched_core_cpu_starting() calculates core-scheduling parameters only once per CPU start, and it's crucial to have all the parameters filled in at that moment (at least it uses cpu_smt_mask() which in fact is `&cpu_sibling_map[cpu]` on MIPS). A bit of debugging led me to that set_cpu_sibling_map() performing the actual map calculation, was being invocated after notify_cpu_start(), and exactly the latter function starts CPU HP callback round (sched_core_cpu_starting() is basically a CPU HP callback). While the flow is same on ARM64 (maps after the notifier, although before calling set_cpu_online()), x86 started calculating sibling maps earlier than starting the CPU HP callbacks in Linux 4.14 (see [0] for the reference). Neither me nor my brief tests couldn't find any potential caveats in calculating the maps right after performing delay calibration, but the WARN splat is now gone. The very same debug prints now yield exactly what I expected from them: [ 0.048433] CPU: 1, smt_mask: 0-1 [0] https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git/commit/?id=76ce7cfe35ef Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 2aaa8a1 ] With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that has the link-local address as src and dst IP and will be forwarded to an external IP in the IPv6 Ext Hdr. For example, the script below generates a packet whose src IP is the link-local address and dst is updated to 11::. # for f in $(find /proc/sys/net/ -name *seg6_enabled*); do echo 1 > $f; done # python3 >>> from socket import * >>> from scapy.all import * >>> >>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456" >>> >>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR) >>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1) >>> >>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW) >>> sk.sendto(bytes(pkt), (DST_ADDR, 0)) For such a packet, we call ip6_route_input() to look up a route for the next destination in these three functions depending on the header type. * ipv6_rthdr_rcv() * ipv6_rpl_srh_rcv() * ipv6_srh_rcv() If no route is found, ip6_null_entry is set to skb, and the following dst_input(skb) calls ip6_pkt_drop(). Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev as the input device is the loopback interface. Then, we have to check if skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref for ip6_null_entry. BUG: kernel NULL pointer dereference, address: 0000000000000000 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c #35 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503) Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01 RSP: 0018:ffffc90000003c70 EFLAGS: 00000286 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18 RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10 R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0 FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> ip6_pkt_drop (net/ipv6/route.c:4513) ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686) ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5)) ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483) __netif_receive_skb_one_core (net/core/dev.c:5455) process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895) __napi_poll (net/core/dev.c:6460) net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) do_softirq (kernel/softirq.c:454 kernel/softirq.c:441) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:381) __dev_queue_xmit (net/core/dev.c:4231) ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135) rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914) sock_sendmsg (net/socket.c:725 net/socket.c:748) __sys_sendto (net/socket.c:2134) __x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7f9dc751baea Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003 RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b </TASK> Modules linked in: CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503) Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01 RSP: 0018:ffffc90000003c70 EFLAGS: 00000286 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18 RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10 R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0 FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled Fixes: 4832c30 ("net: ipv6: put host and anycast routes on device with address") Reported-by: Wang Yufen <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 1834d62 ] We should pass a pointer to global_hook to the get_proto_defrag_hook() instead of its value, since the passed value won't be updated even if the request module was loaded successfully. Log: [ 54.915713] nf_defrag_ipv4 has bad registration [ 54.915779] WARNING: CPU: 3 PID: 6323 at net/netfilter/nf_bpf_link.c:62 get_proto_defrag_hook+0x137/0x160 [ 54.915835] CPU: 3 PID: 6323 Comm: fentry Kdump: loaded Tainted: G E 6.7.0-rc2+ #35 [ 54.915839] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 54.915841] RIP: 0010:get_proto_defrag_hook+0x137/0x160 [ 54.915844] Code: 4f 8c e8 2c cf 68 ff 80 3d db 83 9a 01 00 0f 85 74 ff ff ff 48 89 ee 48 c7 c7 8f 12 4f 8c c6 05 c4 83 9a 01 01 e8 09 ee 5f ff <0f> 0b e9 57 ff ff ff 49 8b 3c 24 4c 63 e5 e8 36 28 6c ff 4c 89 e0 [ 54.915849] RSP: 0018:ffffb676003fbdb0 EFLAGS: 00010286 [ 54.915852] RAX: 0000000000000023 RBX: ffff9596503d5600 RCX: ffff95996fce08c8 [ 54.915854] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff95996fce08c0 [ 54.915855] RBP: ffffffff8c4f12de R08: 0000000000000000 R09: 00000000fffeffff [ 54.915859] R10: ffffb676003fbc70 R11: ffffffff8d363ae8 R12: 0000000000000000 [ 54.915861] R13: ffffffff8e1f75c0 R14: ffffb676003c9000 R15: 00007ffd15e78ef0 [ 54.915864] FS: 00007fb6e9cab740(0000) GS:ffff95996fcc0000(0000) knlGS:0000000000000000 [ 54.915867] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.915868] CR2: 00007ffd15e75c40 CR3: 0000000101e62006 CR4: 0000000000360ef0 [ 54.915870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 54.915871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 54.915873] Call Trace: [ 54.915891] <TASK> [ 54.915894] ? __warn+0x84/0x140 [ 54.915905] ? get_proto_defrag_hook+0x137/0x160 [ 54.915908] ? __report_bug+0xea/0x100 [ 54.915925] ? report_bug+0x2b/0x80 [ 54.915928] ? handle_bug+0x3c/0x70 [ 54.915939] ? exc_invalid_op+0x18/0x70 [ 54.915942] ? asm_exc_invalid_op+0x1a/0x20 [ 54.915948] ? get_proto_defrag_hook+0x137/0x160 [ 54.915950] bpf_nf_link_attach+0x1eb/0x240 [ 54.915953] link_create+0x173/0x290 [ 54.915969] __sys_bpf+0x588/0x8f0 [ 54.915974] __x64_sys_bpf+0x20/0x30 [ 54.915977] do_syscall_64+0x45/0xf0 [ 54.915989] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 54.915998] RIP: 0033:0x7fb6e9daa51d [ 54.916001] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 89 0c 00 f7 d8 64 89 01 48 [ 54.916003] RSP: 002b:00007ffd15e78ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 54.916006] RAX: ffffffffffffffda RBX: 00007ffd15e78fc0 RCX: 00007fb6e9daa51d [ 54.916007] RDX: 0000000000000040 RSI: 00007ffd15e78ef0 RDI: 000000000000001c [ 54.916009] RBP: 000000000000002d R08: 00007fb6e9e73a60 R09: 0000000000000001 [ 54.916010] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006 [ 54.916012] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000 [ 54.916014] </TASK> [ 54.916015] ---[ end trace 0000000000000000 ]--- Fixes: 91721c2 ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link") Signed-off-by: D. Wythe <[email protected]> Acked-by: Daniel Xu <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit fe57575 ] The `cgrp_local_storage` test triggers a kernel panic like: # ./test_progs -t cgrp_local_storage Can't find bpf_testmod.ko kernel module: -2 WARNING! Selftests relying on bpf_testmod.ko will be skipped. [ 550.930632] CPU 1 Unable to handle kernel paging request at virtual address 0000000000000080, era == ffff80000200be34, ra == ffff80000200be00 [ 550.931781] Oops[#1]: [ 550.931966] CPU: 1 PID: 1303 Comm: test_progs Not tainted 6.7.0-rc2-loong-devel-g2f56bb0d2327 #35 a896aca3f4164f09cc346f89f2e09832e07be5f6 [ 550.932215] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 [ 550.932403] pc ffff80000200be34 ra ffff80000200be00 tp 9000000108350000 sp 9000000108353dc0 [ 550.932545] a0 0000000000000000 a1 0000000000000517 a2 0000000000000118 a3 00007ffffbb15558 [ 550.932682] a4 00007ffffbb15620 a5 90000001004e7700 a6 0000000000000021 a7 0000000000000118 [ 550.932824] t0 ffff80000200bdc0 t1 0000000000000517 t2 0000000000000517 t3 00007ffff1c06ee0 [ 550.932961] t4 0000555578ae04d0 t5 fffffffffffffff8 t6 0000000000000004 t7 0000000000000020 [ 550.933097] t8 0000000000000040 u0 00000000000007b8 s9 9000000108353e00 s0 90000001004e7700 [ 550.933241] s1 9000000004005000 s2 0000000000000001 s3 0000000000000000 s4 0000555555eb2ec8 [ 550.933379] s5 00007ffffbb15bb8 s6 00007ffff1dafd60 s7 000055555663f610 s8 00007ffff1db0050 [ 550.933520] ra: ffff80000200be00 bpf_prog_98f1b9e767be2a84_on_enter+0x40/0x200 [ 550.933911] ERA: ffff80000200be34 bpf_prog_98f1b9e767be2a84_on_enter+0x74/0x200 [ 550.934105] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 550.934596] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 550.934712] EUEN: 00000003 (+FPE +SXE -ASXE -BTE) [ 550.934836] ECFG: 00071c1c (LIE=2-4,10-12 VS=7) [ 550.934976] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) [ 550.935097] BADV: 0000000000000080 [ 550.935181] PRID: 0014c010 (Loongson-64bit, Loongson-3A5000) [ 550.935291] Modules linked in: [ 550.935391] Process test_progs (pid: 1303, threadinfo=000000006c3b1c41, task=0000000061f84a55) [ 550.935643] Stack : 00007ffffbb15bb8 0000555555eb2ec8 0000000000000000 0000000000000001 [ 550.935844] 9000000004005000 ffff80001b864000 00007ffffbb15450 90000000029aa034 [ 550.935990] 0000000000000000 9000000108353ec0 0000000000000118 d07d9dfb09721a09 [ 550.936175] 0000000000000001 0000000000000000 9000000108353ec0 0000000000000118 [ 550.936314] 9000000101d46ad0 900000000290abf0 000055555663f610 0000000000000000 [ 550.936479] 0000000000000003 9000000108353ec0 00007ffffbb15450 90000000029d7288 [ 550.936635] 00007ffff1dafd60 000055555663f610 0000000000000000 0000000000000003 [ 550.936779] 9000000108353ec0 90000000035dd1f0 00007ffff1dafd58 9000000002841c5c [ 550.936939] 0000000000000119 0000555555eea5a8 00007ffff1d78780 00007ffffbb153e0 [ 550.937083] ffffffffffffffda 00007ffffbb15518 0000000000000040 00007ffffbb15558 [ 550.937224] ... [ 550.937299] Call Trace: [ 550.937521] [<ffff80000200be34>] bpf_prog_98f1b9e767be2a84_on_enter+0x74/0x200 [ 550.937910] [<90000000029aa034>] bpf_trace_run2+0x90/0x154 [ 550.938105] [<900000000290abf0>] syscall_trace_enter.isra.0+0x1cc/0x200 [ 550.938224] [<90000000035dd1f0>] do_syscall+0x48/0x94 [ 550.938319] [<9000000002841c5c>] handle_syscall+0xbc/0x158 [ 550.938477] [ 550.938607] Code: 580009ae 50016000 262402e4 <28c20085> 14092084 03a00084 16000024 03240084 00150006 [ 550.938851] [ 550.939021] ---[ end trace 0000000000000000 ]--- Further investigation shows that this panic is triggered by memory load operations: ptr = bpf_cgrp_storage_get(&map_a, task->cgroups->dfl_cgrp, 0, BPF_LOCAL_STORAGE_GET_F_CREATE); The expression `task->cgroups->dfl_cgrp` involves two memory load. Since the field offset fits in imm12 or imm14, we use ldd or ldptrd instructions. But both instructions have the side effect that it will signed-extended the imm operand. Finally, we got the wrong addresses and panics is inevitable. Use a generic ldxd instruction to avoid this kind of issues. With this change, we have: # ./test_progs -t cgrp_local_storage Can't find bpf_testmod.ko kernel module: -2 WARNING! Selftests relying on bpf_testmod.ko will be skipped. test_cgrp_local_storage:PASS:join_cgroup /cgrp_local_storage 0 nsec #48/1 cgrp_local_storage/tp_btf:OK test_attach_cgroup:PASS:skel_open 0 nsec test_attach_cgroup:PASS:prog_attach 0 nsec test_attach_cgroup:PASS:prog_attach 0 nsec libbpf: prog 'update_cookie_tracing': failed to attach: ERROR: strerror_r(-524)=22 test_attach_cgroup:FAIL:prog_attach unexpected error: -524 #48/2 cgrp_local_storage/attach_cgroup:FAIL test_recursion:PASS:skel_open_and_load 0 nsec libbpf: prog 'on_lookup': failed to attach: ERROR: strerror_r(-524)=22 libbpf: prog 'on_lookup': failed to auto-attach: -524 test_recursion:FAIL:skel_attach unexpected error: -524 (errno 524) #48/3 cgrp_local_storage/recursion:FAIL #48/4 cgrp_local_storage/negative:OK #48/5 cgrp_local_storage/cgroup_iter_sleepable:OK test_yes_rcu_lock:PASS:skel_open 0 nsec test_yes_rcu_lock:PASS:skel_load 0 nsec libbpf: prog 'yes_rcu_lock': failed to attach: ERROR: strerror_r(-524)=22 libbpf: prog 'yes_rcu_lock': failed to auto-attach: -524 test_yes_rcu_lock:FAIL:skel_attach unexpected error: -524 (errno 524) #48/6 cgrp_local_storage/yes_rcu_lock:FAIL #48/7 cgrp_local_storage/no_rcu_lock:OK #48 cgrp_local_storage:FAIL All error logs: test_cgrp_local_storage:PASS:join_cgroup /cgrp_local_storage 0 nsec test_attach_cgroup:PASS:skel_open 0 nsec test_attach_cgroup:PASS:prog_attach 0 nsec test_attach_cgroup:PASS:prog_attach 0 nsec libbpf: prog 'update_cookie_tracing': failed to attach: ERROR: strerror_r(-524)=22 test_attach_cgroup:FAIL:prog_attach unexpected error: -524 #48/2 cgrp_local_storage/attach_cgroup:FAIL test_recursion:PASS:skel_open_and_load 0 nsec libbpf: prog 'on_lookup': failed to attach: ERROR: strerror_r(-524)=22 libbpf: prog 'on_lookup': failed to auto-attach: -524 test_recursion:FAIL:skel_attach unexpected error: -524 (errno 524) #48/3 cgrp_local_storage/recursion:FAIL test_yes_rcu_lock:PASS:skel_open 0 nsec test_yes_rcu_lock:PASS:skel_load 0 nsec libbpf: prog 'yes_rcu_lock': failed to attach: ERROR: strerror_r(-524)=22 libbpf: prog 'yes_rcu_lock': failed to auto-attach: -524 test_yes_rcu_lock:FAIL:skel_attach unexpected error: -524 (errno 524) #48/6 cgrp_local_storage/yes_rcu_lock:FAIL #48 cgrp_local_storage:FAIL Summary: 0/4 PASSED, 0 SKIPPED, 1 FAILED No panics any more (The test still failed because lack of BPF trampoline which I am actively working on). Fixes: 5dc6155 ("LoongArch: Add BPF JIT support") Signed-off-by: Hengqi Chen <[email protected]> Signed-off-by: Huacai Chen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 5d47ec2 ] The `cls_redirect` test triggers a kernel panic like: # ./test_progs -t cls_redirect Can't find bpf_testmod.ko kernel module: -2 WARNING! Selftests relying on bpf_testmod.ko will be skipped. [ 30.938489] CPU 3 Unable to handle kernel paging request at virtual address fffffffffd814de0, era == ffff800002009fb8, ra == ffff800002009f9c [ 30.939331] Oops[#1]: [ 30.939513] CPU: 3 PID: 1260 Comm: test_progs Not tainted 6.7.0-rc2-loong-devel-g2f56bb0d2327 #35 a896aca3f4164f09cc346f89f2e09832e07be5f6 [ 30.939732] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 [ 30.939901] pc ffff800002009fb8 ra ffff800002009f9c tp 9000000104da4000 sp 9000000104da7ab0 [ 30.940038] a0 fffffffffd814de0 a1 9000000104da7a68 a2 0000000000000000 a3 9000000104da7c10 [ 30.940183] a4 9000000104da7c14 a5 0000000000000002 a6 0000000000000021 a7 00005555904d7f90 [ 30.940321] t0 0000000000000110 t1 0000000000000000 t2 fffffffffd814de0 t3 0004c4b400000000 [ 30.940456] t4 ffffffffffffffff t5 00000000c3f63600 t6 0000000000000000 t7 0000000000000000 [ 30.940590] t8 000000000006d803 u0 0000000000000020 s9 9000000104da7b10 s0 900000010504c200 [ 30.940727] s1 fffffffffd814de0 s2 900000010504c200 s3 9000000104da7c10 s4 9000000104da7ad0 [ 30.940866] s5 0000000000000000 s6 90000000030e65bc s7 9000000104da7b44 s8 90000000044f6fc0 [ 30.941015] ra: ffff800002009f9c bpf_prog_846803e5ae81417f_cls_redirect+0xa0/0x590 [ 30.941535] ERA: ffff800002009fb8 bpf_prog_846803e5ae81417f_cls_redirect+0xbc/0x590 [ 30.941696] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 30.942224] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 30.942330] EUEN: 00000003 (+FPE +SXE -ASXE -BTE) [ 30.942453] ECFG: 00071c1c (LIE=2-4,10-12 VS=7) [ 30.942612] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) [ 30.942764] BADV: fffffffffd814de0 [ 30.942854] PRID: 0014c010 (Loongson-64bit, Loongson-3A5000) [ 30.942974] Modules linked in: [ 30.943078] Process test_progs (pid: 1260, threadinfo=00000000ce303226, task=000000007d10bb76) [ 30.943306] Stack : 900000010a064000 90000000044f6fc0 9000000104da7b48 0000000000000000 [ 30.943495] 0000000000000000 9000000104da7c14 9000000104da7c10 900000010504c200 [ 30.943626] 0000000000000001 ffff80001b88c000 9000000104da7b70 90000000030e6668 [ 30.943785] 0000000000000000 9000000104da7b58 ffff80001b88c048 9000000003d05000 [ 30.943936] 900000000303ac88 0000000000000000 0000000000000000 9000000104da7b70 [ 30.944091] 0000000000000000 0000000000000001 0000000731eeab00 0000000000000000 [ 30.944245] ffff80001b88c000 0000000000000000 0000000000000000 54b99959429f83b8 [ 30.944402] ffff80001b88c000 90000000044f6fc0 9000000101d70000 ffff80001b88c000 [ 30.944538] 000000000000005a 900000010504c200 900000010a064000 900000010a067000 [ 30.944697] 9000000104da7d88 0000000000000000 9000000003d05000 90000000030e794c [ 30.944852] ... [ 30.944924] Call Trace: [ 30.945120] [<ffff800002009fb8>] bpf_prog_846803e5ae81417f_cls_redirect+0xbc/0x590 [ 30.945650] [<90000000030e6668>] bpf_test_run+0x1ec/0x2f8 [ 30.945958] [<90000000030e794c>] bpf_prog_test_run_skb+0x31c/0x684 [ 30.946065] [<90000000026d4f68>] __sys_bpf+0x678/0x2724 [ 30.946159] [<90000000026d7288>] sys_bpf+0x20/0x2c [ 30.946253] [<90000000032dd224>] do_syscall+0x7c/0x94 [ 30.946343] [<9000000002541c5c>] handle_syscall+0xbc/0x158 [ 30.946492] [ 30.946549] Code: 0015030e 5c0009c0 5001d000 <28c00304> 02c00484 29c00304 00150009 2a42d2e4 0280200d [ 30.946793] [ 30.946971] ---[ end trace 0000000000000000 ]--- [ 32.093225] Kernel panic - not syncing: Fatal exception in interrupt [ 32.093526] Kernel relocated by 0x2320000 [ 32.093630] .text @ 0x9000000002520000 [ 32.093725] .data @ 0x9000000003400000 [ 32.093792] .bss @ 0x9000000004413200 [ 34.971998] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- This is because we signed-extend function return values. When subprog mode is enabled, we have: cls_redirect() -> get_global_metrics() returns pcpu ptr 0xfffffefffc00b480 The pointer returned is later signed-extended to 0xfffffffffc00b480 at `BPF_JMP | BPF_EXIT`. During BPF prog run, this triggers unhandled page fault and a kernel panic. Drop the unnecessary signed-extension on return values like other architectures do. With this change, we have: # ./test_progs -t cls_redirect Can't find bpf_testmod.ko kernel module: -2 WARNING! Selftests relying on bpf_testmod.ko will be skipped. #51/1 cls_redirect/cls_redirect_inlined:OK #51/2 cls_redirect/IPv4 TCP accept unknown (no hops, flags: SYN):OK #51/3 cls_redirect/IPv6 TCP accept unknown (no hops, flags: SYN):OK #51/4 cls_redirect/IPv4 TCP accept unknown (no hops, flags: ACK):OK #51/5 cls_redirect/IPv6 TCP accept unknown (no hops, flags: ACK):OK #51/6 cls_redirect/IPv4 TCP forward unknown (one hop, flags: ACK):OK #51/7 cls_redirect/IPv6 TCP forward unknown (one hop, flags: ACK):OK #51/8 cls_redirect/IPv4 TCP accept known (one hop, flags: ACK):OK #51/9 cls_redirect/IPv6 TCP accept known (one hop, flags: ACK):OK #51/10 cls_redirect/IPv4 UDP accept unknown (no hops, flags: none):OK #51/11 cls_redirect/IPv6 UDP accept unknown (no hops, flags: none):OK #51/12 cls_redirect/IPv4 UDP forward unknown (one hop, flags: none):OK #51/13 cls_redirect/IPv6 UDP forward unknown (one hop, flags: none):OK #51/14 cls_redirect/IPv4 UDP accept known (one hop, flags: none):OK #51/15 cls_redirect/IPv6 UDP accept known (one hop, flags: none):OK #51/16 cls_redirect/cls_redirect_subprogs:OK #51/17 cls_redirect/IPv4 TCP accept unknown (no hops, flags: SYN):OK #51/18 cls_redirect/IPv6 TCP accept unknown (no hops, flags: SYN):OK #51/19 cls_redirect/IPv4 TCP accept unknown (no hops, flags: ACK):OK #51/20 cls_redirect/IPv6 TCP accept unknown (no hops, flags: ACK):OK #51/21 cls_redirect/IPv4 TCP forward unknown (one hop, flags: ACK):OK #51/22 cls_redirect/IPv6 TCP forward unknown (one hop, flags: ACK):OK #51/23 cls_redirect/IPv4 TCP accept known (one hop, flags: ACK):OK #51/24 cls_redirect/IPv6 TCP accept known (one hop, flags: ACK):OK #51/25 cls_redirect/IPv4 UDP accept unknown (no hops, flags: none):OK #51/26 cls_redirect/IPv6 UDP accept unknown (no hops, flags: none):OK #51/27 cls_redirect/IPv4 UDP forward unknown (one hop, flags: none):OK #51/28 cls_redirect/IPv6 UDP forward unknown (one hop, flags: none):OK #51/29 cls_redirect/IPv4 UDP accept known (one hop, flags: none):OK #51/30 cls_redirect/IPv6 UDP accept known (one hop, flags: none):OK #51/31 cls_redirect/cls_redirect_dynptr:OK #51/32 cls_redirect/IPv4 TCP accept unknown (no hops, flags: SYN):OK #51/33 cls_redirect/IPv6 TCP accept unknown (no hops, flags: SYN):OK #51/34 cls_redirect/IPv4 TCP accept unknown (no hops, flags: ACK):OK #51/35 cls_redirect/IPv6 TCP accept unknown (no hops, flags: ACK):OK #51/36 cls_redirect/IPv4 TCP forward unknown (one hop, flags: ACK):OK #51/37 cls_redirect/IPv6 TCP forward unknown (one hop, flags: ACK):OK #51/38 cls_redirect/IPv4 TCP accept known (one hop, flags: ACK):OK #51/39 cls_redirect/IPv6 TCP accept known (one hop, flags: ACK):OK #51/40 cls_redirect/IPv4 UDP accept unknown (no hops, flags: none):OK #51/41 cls_redirect/IPv6 UDP accept unknown (no hops, flags: none):OK #51/42 cls_redirect/IPv4 UDP forward unknown (one hop, flags: none):OK #51/43 cls_redirect/IPv6 UDP forward unknown (one hop, flags: none):OK #51/44 cls_redirect/IPv4 UDP accept known (one hop, flags: none):OK #51/45 cls_redirect/IPv6 UDP accept known (one hop, flags: none):OK #51 cls_redirect:OK Summary: 1/45 PASSED, 0 SKIPPED, 0 FAILED Fixes: 5dc6155 ("LoongArch: Add BPF JIT support") Signed-off-by: Hengqi Chen <[email protected]> Signed-off-by: Huacai Chen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 04a342c upstream. When we are slave role and receives l2cap conn req when encryption has started, we should check the enc key size to avoid KNOB attack or BLUFFS attack. From SIG recommendation, implementations are advised to reject service-level connections on an encrypted baseband link with key strengths below 7 octets. A simple and clear way to achieve this is to place the enc key size check in hci_cc_read_enc_key_size() The btmon log below shows the case that lacks enc key size check. > HCI Event: Connect Request (0x04) plen 10 Address: BB:22:33:44:55:99 (OUI BB-22-33) Class: 0x480104 Major class: Computer (desktop, notebook, PDA, organizers) Minor class: Desktop workstation Capturing (Scanner, Microphone) Telephony (Cordless telephony, Modem, Headset) Link type: ACL (0x01) < HCI Command: Accept Connection Request (0x01|0x0009) plen 7 Address: BB:22:33:44:55:99 (OUI BB-22-33) Role: Peripheral (0x01) > HCI Event: Command Status (0x0f) plen 4 Accept Connection Request (0x01|0x0009) ncmd 2 Status: Success (0x00) > HCI Event: Connect Complete (0x03) plen 11 Status: Success (0x00) Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33) Link type: ACL (0x01) Encryption: Disabled (0x00) ... > HCI Event: Encryption Change (0x08) plen 4 Status: Success (0x00) Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33) Encryption: Enabled with E0 (0x01) < HCI Command: Read Encryption Key Size (0x05|0x0008) plen 2 Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33) > HCI Event: Command Complete (0x0e) plen 7 Read Encryption Key Size (0x05|0x0008) ncmd 2 Status: Success (0x00) Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33) Key size: 6 // We should check the enc key size ... > ACL Data RX: Handle 1 flags 0x02 dlen 12 L2CAP: Connection Request (0x02) ident 3 len 4 PSM: 25 (0x0019) Source CID: 64 < ACL Data TX: Handle 1 flags 0x00 dlen 16 L2CAP: Connection Response (0x03) ident 3 len 8 Destination CID: 64 Source CID: 64 Result: Connection pending (0x0001) Status: Authorization pending (0x0002) > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 1 Address: BB:22:33:44:55:99 (OUI BB-22-33) Count: 1 #35: len 16 (25 Kb/s) Latency: 5 msec (2-7 msec ~4 msec) < ACL Data TX: Handle 1 flags 0x00 dlen 16 L2CAP: Connection Response (0x03) ident 3 len 8 Destination CID: 64 Source CID: 64 Result: Connection successful (0x0000) Status: No further information available (0x0000) Cc: [email protected] Signed-off-by: Alex Lu <[email protected]> Signed-off-by: Max Chou <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

svens-s390 and others added 30 commits January 21, 2021 13:21

jackpot51 force-pushed the master branch from 82fb226 to cf74746 Compare March 29, 2021 16:38

jackpot51 force-pushed the master branch from 97a8d1a to ecb25cd Compare April 22, 2021 15:19

jackpot51 force-pushed the master branch from 383c0a9 to 3abeff8 Compare June 22, 2021 16:36

jackpot51 force-pushed the master branch 2 times, most recently from a887ad7 to e862cf5 Compare August 26, 2021 15:31

jackpot51 force-pushed the master branch from ae2753e to cefa1b4 Compare September 14, 2021 18:18

jackpot51 closed this Sep 28, 2021

jackpot51 deleted the sabrent-rocket-q branch September 28, 2021 21:28

jacobgkau mentioned this pull request Jan 28, 2022

Boot and/or suspend issues with ADATA SSDs system76/firmware-open#278

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add quirks for Sabrent Rocket Q on darp7, galp5, lemp9, and lemp10 #35

Add quirks for Sabrent Rocket Q on darp7, galp5, lemp9, and lemp10 #35

jacobgkau commented Feb 1, 2021

techaround commented Feb 12, 2021

techaround commented Feb 16, 2021

jacobgkau commented Feb 16, 2021 •

edited

Loading

techaround commented Feb 16, 2021

techaround commented Feb 17, 2021

dphurst commented Feb 17, 2021

jacobgkau commented Feb 17, 2021

techaround commented Feb 17, 2021

techaround commented Aug 26, 2021

dphurst commented Aug 26, 2021 via email

jackpot51 commented Sep 28, 2021

jackpot51 commented Sep 28, 2021

Add quirks for Sabrent Rocket Q on darp7, galp5, lemp9, and lemp10 #35

Add quirks for Sabrent Rocket Q on darp7, galp5, lemp9, and lemp10 #35

Conversation

jacobgkau commented Feb 1, 2021

techaround commented Feb 12, 2021

techaround commented Feb 16, 2021

jacobgkau commented Feb 16, 2021 • edited Loading

techaround commented Feb 16, 2021

techaround commented Feb 17, 2021

dphurst commented Feb 17, 2021

jacobgkau commented Feb 17, 2021

techaround commented Feb 17, 2021

techaround commented Aug 26, 2021

dphurst commented Aug 26, 2021 via email

jackpot51 commented Sep 28, 2021

jackpot51 commented Sep 28, 2021

jacobgkau commented Feb 16, 2021 •

edited

Loading