-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Xu4-4.14 kernel memory leak (FCoE VN2VN) #360
Comments
So there seems to be two leak points:
And:
They might be related. The Xu4 has a load of macvlans on vlans and an FCoE on vlan. Since the FCoE is not used on this odroid, I can block the FCoE vlan, and see if the lack of FCoE announcements will stop the vn2vn leak, and the r8152_poll leak. |
And a site note: mdrjr: I do not assume you are going to fix it ;-). I just need a place to document the bug. |
Shutting down the FCoE vlan (so the XU4 FCoE setup is unchanged. The switch just doesn't pass the vn2vn multicasts) for now reveals no new and important leaks. |
I tried it with a PC and it also got a memory leak. That's good, because it means we can fix something. |
Kmemleak after patches from Johannes: |
Leaving a PC with 4.14 (patched) and a steam machine with 4.16 (not patched), results in kmemleak on setup chatter. |
Now I turned off the steam machine, turned on the PC, doing scan almost every minute.
Also the kmemleaks are back |
Resume of the past logs: EDIT: Pasted wrong kernlog and kmemleak, see 2 comments further |
The linux-scsi threads: |
Wrong files, but:
And now the real kern log end memleak |
commit 6d1e6bc upstream. _get_table_maxdiv() tries to access "clk_div_table" array out of bound defined in phy-j721e-wiz.c. Add a sentinel entry to prevent the following global-out-of-bounds error reported by enabling KASAN. [ 9.552392] BUG: KASAN: global-out-of-bounds in _get_maxdiv+0xc0/0x148 [ 9.558948] Read of size 4 at addr ffff8000095b25a4 by task kworker/u4:1/38 [ 9.565926] [ 9.567441] CPU: 1 PID: 38 Comm: kworker/u4:1 Not tainted 5.16.0-116492-gdaadb3bd0e8d-dirty hardkernel#360 [ 9.576242] Hardware name: Texas Instruments J721e EVM (DT) [ 9.581832] Workqueue: events_unbound deferred_probe_work_func [ 9.587708] Call trace: [ 9.590174] dump_backtrace+0x20c/0x218 [ 9.594038] show_stack+0x18/0x68 [ 9.597375] dump_stack_lvl+0x9c/0xd8 [ 9.601062] print_address_description.constprop.0+0x78/0x334 [ 9.606830] kasan_report+0x1f0/0x260 [ 9.610517] __asan_load4+0x9c/0xd8 [ 9.614030] _get_maxdiv+0xc0/0x148 [ 9.617540] divider_determine_rate+0x88/0x488 [ 9.622005] divider_round_rate_parent+0xc8/0x124 [ 9.626729] wiz_clk_div_round_rate+0x54/0x68 [ 9.631113] clk_core_determine_round_nolock+0x124/0x158 [ 9.636448] clk_core_round_rate_nolock+0x68/0x138 [ 9.641260] clk_core_set_rate_nolock+0x268/0x3a8 [ 9.645987] clk_set_rate+0x50/0xa8 [ 9.649499] cdns_sierra_phy_init+0x88/0x248 [ 9.653794] phy_init+0x98/0x108 [ 9.657046] cdns_pcie_enable_phy+0xa0/0x170 [ 9.661340] cdns_pcie_init_phy+0x250/0x2b0 [ 9.665546] j721e_pcie_probe+0x4b8/0x798 [ 9.669579] platform_probe+0x8c/0x108 [ 9.673350] really_probe+0x114/0x630 [ 9.677037] __driver_probe_device+0x18c/0x220 [ 9.681505] driver_probe_device+0xac/0x150 [ 9.685712] __device_attach_driver+0xec/0x170 [ 9.690178] bus_for_each_drv+0xf0/0x158 [ 9.694124] __device_attach+0x184/0x210 [ 9.698070] device_initial_probe+0x14/0x20 [ 9.702277] bus_probe_device+0xec/0x100 [ 9.706223] deferred_probe_work_func+0x124/0x180 [ 9.710951] process_one_work+0x4b0/0xbc0 [ 9.714983] worker_thread+0x74/0x5d0 [ 9.718668] kthread+0x214/0x230 [ 9.721919] ret_from_fork+0x10/0x20 [ 9.725520] [ 9.727032] The buggy address belongs to the variable: [ 9.732183] clk_div_table+0x24/0x440 Fixes: 091876c ("phy: ti: j721e-wiz: Add support for WIZ module present in TI J721E SoC") Cc: [email protected] # v5.10+ Signed-off-by: Kishon Vijay Abraham I <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…ode. [ Upstream commit d5c367e ] creating a large files during checkpoint disable until it runs out of space and then delete it, then remount to enable checkpoint again, and then unmount the filesystem triggers the f2fs_bug_on as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:896! CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360 Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:f2fs_evict_inode+0x58c/0x610 Call Trace: __die_body+0x15/0x60 die+0x33/0x50 do_trap+0x10a/0x120 f2fs_evict_inode+0x58c/0x610 do_error_trap+0x60/0x80 f2fs_evict_inode+0x58c/0x610 exc_invalid_op+0x53/0x60 f2fs_evict_inode+0x58c/0x610 asm_exc_invalid_op+0x16/0x20 f2fs_evict_inode+0x58c/0x610 evict+0x101/0x260 dispose_list+0x30/0x50 evict_inodes+0x140/0x190 generic_shutdown_super+0x2f/0x150 kill_block_super+0x11/0x40 kill_f2fs_super+0x7d/0x140 deactivate_locked_super+0x2a/0x70 cleanup_mnt+0xb3/0x140 task_work_run+0x61/0x90 The root cause is: creating large files during disable checkpoint period results in not enough free segments, so when writing back root inode will failed in f2fs_enable_checkpoint. When umount the file system after enabling checkpoint, the root inode is dirty in f2fs_evict_inode function, which triggers BUG_ON. The steps to reproduce are as follows: dd if=/dev/zero of=f2fs.img bs=1M count=55 mount f2fs.img f2fs_dir -o checkpoint=disable:10% dd if=/dev/zero of=big bs=1M count=50 sync rm big mount -o remount,checkpoint=enable f2fs_dir umount f2fs_dir Let's redirty inode when there is not free segments during checkpoint is disable. Signed-off-by: Qi Han <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Hi mdrjr,
Just a heads up, I've had this leak since 4.14.30 something. I think 4.14.22 was stable, not sure though.
So the problem: kernel was leaking memory:
https://plus.google.com/u/0/+ArdvanBreemen/posts/cpLBpnizyLv
even the last kernel, 4.14.55 seems to be leaking.
So I decided to build with DEBUG_KMEMLEAK, and here is my "surprise"
kmemleak.txt
The text was updated successfully, but these errors were encountered: