[RFC] Suspend/Resume flow for APL #32

ranj063 · 2018-07-13T07:41:00Z

This patchset aims to implement the preliminary flow for suspend/resume for APL.

V2: The pipeline state is now restored at resume by sending the IPCs that create all the components and set up the pipeline. With this change, the DSP is powered off at suspend, and audio playback works normally after it resumes.

However, if suspend is invoked while audio playback is in progress, playback does not resume and aplay exits with the error "pcm_write:2011: write error: Input/output error".

V3: Addressed the previous review comments.

Audio playback resumes normally after suspend/resume.

V4: Added support for runtime PM and made changes based on previous feedback.

Now, both runtime PM and suspend/resume work well.

Two tasks are still left to do:

store/restore kcontrol values
free sroute/connect during route_unload.

ranj063 · 2018-07-13T07:49:27Z

Here's the dmesg log from the point when suspend occurs until after the firmware boots up.
[ 1271.092324] PM: hibernation entry
[ 1271.092743] PM: Syncing filesystems ...
[ 1271.111902] PM: done.
[ 1271.111907] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 1271.113553] OOM killer disabled.
[ 1271.113686] PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
[ 1271.113688] PM: Marking nosave pages: [mem 0x0003f000-0x0003ffff]
[ 1271.113689] PM: Marking nosave pages: [mem 0x0009e000-0x000fffff]
[ 1271.113692] PM: Marking nosave pages: [mem 0x10000000-0x12150fff]
[ 1271.113822] PM: Marking nosave pages: [mem 0x778a0000-0x778a0fff]
[ 1271.113823] PM: Marking nosave pages: [mem 0x77b12000-0x7a08ffff]
[ 1271.113969] PM: Marking nosave pages: [mem 0x7a3fe000-0x7a428fff]
[ 1271.113970] PM: Marking nosave pages: [mem 0x7a965000-0x7a966fff]
[ 1271.113972] PM: Marking nosave pages: [mem 0x7b000000-0xffffffff]
[ 1271.116164] PM: Basic memory bitmaps created
[ 1271.116239] PM: Preallocating image memory... done (allocated 174133 pages)
[ 1271.259697] PM: Allocated 696532 kbytes in 0.14 seconds (4975.22 MB/s)
[ 1271.259698] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 1271.339787] Suspending console(s) (use no_console_suspend to debug)
[ 1271.371521] sof-audio sof-audio: DSP core(s) enabled? 0 : core_mask 3
[ 1271.492289] PM: hibernation debug: Waiting for 5 seconds.
[ 1276.493320] sof-audio sof-audio: loading firmware
[ 1276.493406] sof-audio sof-audio: booting DSP firmware
[ 1276.493555] usb usb1: root hub lost power or was reset
[ 1276.493559] usb usb2: root hub lost power or was reset
[ 1276.493674] sof-audio sof-audio: unstall/run core: core_mask = 1
[ 1276.493678] sof-audio sof-audio: DSP core(s) enabled? 1 : core_mask 1
[ 1276.550947] sof-audio sof-audio: pstream 0 status 0x4
[ 1276.572308] sof-audio sof-audio: ipc: DSP is ready 0x70000000 offset 0x81000
[ 1276.572344] sof-audio sof-audio: Firmware info: version 1.1-bf14b build 32 on Jul 5 2018:22:52:51
[ 1276.572428] sof-audio sof-audio: found ext header type 1 size 0x9c
[ 1276.572451] sof-audio sof-audio: cannot create debugfs entry.
[ 1276.572455] sof-audio sof-audio: cannot create debugfs entry.
[ 1276.572458] sof-audio sof-audio: cannot create debugfs entry.
[ 1276.572462] sof-audio sof-audio: cannot create debugfs entry.
[ 1276.572465] sof-audio sof-audio: cannot create debugfs entry.
[ 1276.572469] sof-audio sof-audio: cannot create debugfs entry.
[ 1276.572472] sof-audio sof-audio: cannot create debugfs entry.
[ 1276.572475] sof-audio sof-audio: mailbox upstream 0x81000 - size 0x1000
[ 1276.572478] sof-audio sof-audio: mailbox downstream 0xa0000 - size 0x2000
[ 1276.572480] sof-audio sof-audio: stream region 0xc1000 - size 0x1000
[ 1276.572482] sof-audio sof-audio: booting DSP firmware completed
[ 1276.572486] sof-audio sof-audio: ipc rx: 0x70000000 done
[ 1276.575709] sof-audio sof-audio: Firmware download successful, booting...
[ 1276.575714] sof-audio sof-audio: firmware boot complete
[ 1276.575926] rtc_cmos 00:02: Alarms can be up to one month in the future
[ 1276.602266] r8169 0000:03:00.0 enp3s0: link down
[ 1276.604724] r8169 0000:02:00.0 enp2s0: link down
[ 1276.814309] ata2: SATA link down (SStatus 4 SControl 300)
[ 1276.818299] ata1: SATA link down (SStatus 4 SControl 300)
[ 1276.848119] usb 1-2: reset high-speed USB device number 2 using xhci_hcd
[ 1277.312140] usb 1-2.3: reset full-speed USB device number 3 using xhci_hcd
[ 1277.562286] PM: Basic memory bitmaps freed
[ 1277.562289] OOM killer enabled.
[ 1277.562290] Restarting tasks ... done.
[ 1277.564141] video LNXVIDEO:00: Restoring backlight state
[ 1277.564144] PM: hibernation exit
[ 1277.658120] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[ 1278.173462] r8169 0000:02:00.0 enp2s0: link up
[ 1278.173500] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[ 1280.138162] pcm512x i2c-104C5122:00: No SCLK, using BCLK: -2
[ 1280.138181] sof-audio sof-audio: pcm: open stream 0 dir 0
[ 1280.138185] sof-audio sof-audio: period min 192 max 16384 bytes
[ 1280.138188] sof-audio sof-audio: period count 2 max 16
[ 1280.138190] sof-audio sof-audio: buffer max 65536 bytes
[ 1280.138540] sof-audio sof-audio: rate_min: 48000 rate_max: 48000
[ 1280.138544] sof-audio sof-audio: channels_min: 2 channels_max: 2
[ 1280.138547] sof-audio sof-audio: rate_min: 48000 rate_max: 48000
[ 1280.138550] sof-audio sof-audio: channels_min: 2 channels_max: 2
[ 1280.138555] sof-audio sof-audio: rate_min: 48000 rate_max: 48000
[ 1280.138557] sof-audio sof-audio: channels_min: 2 channels_max: 2
[ 1280.138562] sof-audio sof-audio: pcm: hw params stream 0 dir 0
[ 1280.138566] sof-audio sof-audio: generating page table for 00000000575ba841 size 0xff00 pages 16
[ 1280.138570] sof-audio sof-audio: pfn i 0 idx 0 pfn 1795a0
[ 1280.138573] sof-audio sof-audio: pfn i 1 idx 2 pfn 1795a1
[ 1280.138575] sof-audio sof-audio: pfn i 2 idx 5 pfn 1795a2
[ 1280.138578] sof-audio sof-audio: pfn i 3 idx 7 pfn 1795a3
[ 1280.138581] sof-audio sof-audio: pfn i 4 idx 10 pfn 1795a4
[ 1280.138583] sof-audio sof-audio: pfn i 5 idx 12 pfn 1795a5
[ 1280.138586] sof-audio sof-audio: pfn i 6 idx 15 pfn 1795a6
[ 1280.138588] sof-audio sof-audio: pfn i 7 idx 17 pfn 1795a7
[ 1280.138591] sof-audio sof-audio: pfn i 8 idx 20 pfn 1795a8
[ 1280.138629] sof-audio sof-audio: pfn i 9 idx 22 pfn 1795a9
[ 1280.138632] sof-audio sof-audio: pfn i 10 idx 25 pfn 1795aa
[ 1280.138635] sof-audio sof-audio: pfn i 11 idx 27 pfn 1795ab
[ 1280.138637] sof-audio sof-audio: pfn i 12 idx 30 pfn 1795ac
[ 1280.138661] sof-audio sof-audio: pfn i 13 idx 32 pfn 1795ad
[ 1280.138664] sof-audio sof-audio: pfn i 14 idx 35 pfn 1795ae
[ 1280.138666] sof-audio sof-audio: pfn i 15 idx 37 pfn 1795af
[ 1280.138686] sof-audio sof-audio: period_bytes:0x3f00
[ 1280.138694] sof-audio sof-audio: stream_tag 1
[ 1280.138725] sof-audio sof-audio: ipc: send 0x60010000
[ 1280.138824] sof-audio sof-audio: error: ipc error for 0x60010000 size 0x14
[ 1280.146584] sof-audio sof-audio: ASoC: sof-audio hw params failed: -19
[ 1280.153897] Passthrough: ASoC: hw_params FE failed -19
[ 1280.153935] sof-audio sof-audio: pcm: free stream 0 dir 0
[ 1280.159819] sof-audio sof-audio: ipc: send 0x60030000
[ 1280.159873] sof-audio sof-audio: error: ipc error for 0x60030000 size 0xc
[ 1280.167679] sof-audio sof-audio: pcm: close stream 0 dir 0

RanderWang · 2018-07-13T09:45:00Z

From the kernel log, it looks like rmbox is not working, since there are no rmbox messages?
I think the 0x60010000 error is caused by the missing topology, so enabling rmbox is important for debugging.

And I think each internal buffer pointer or component status should be kept? Or would the OS send hw_params and start again?

RanderWang · 2018-07-13T09:55:33Z

What happens to all the components created when the topology is loaded before suspend?
[Rander] Each component initializes its status.

Should the topology be reloaded after resume?
[Rander] Each component should restore its status. Loading the topology would reset each component to its initial status; maybe an extra restore step is needed?

RanderWang · 2018-07-13T09:59:53Z

I am not sure, but some memory in the DSP may still be valid, so it could be used to keep some state.

plbossart · 2018-07-13T17:52:51Z

@ranj063 : I don't think the topology should be parsed again on resume, but anything that results in an IPC should certainly be invoked on resume, e.g. SSP settings and pipeline commands (likely why the hw_params fail). I am afraid that today we combine token parsing and IPC, which was maybe not such a good idea in hindsight.

lgirdwood · 2018-07-16T15:30:14Z

@ranj063 we probably don't want to parse the topology again, but sof/topology.c does track a lot of the topology structures locally when probing, e.g. we have lists of widgets, controls, etc. We should store a pointer to the raw topology structure for each topology object, and on resume iterate through all the topology object lists and send the IPCs.
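
A minimal sketch of that approach, assuming each stored topology object keeps a pointer to the IPC struct built at load time and that every IPC struct begins with a common header carrying `size` and `cmd` (the header-cast idea lgirdwood raises further down). All type and function names here are hypothetical, and `stub_send()` stands in for the real IPC transport.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical common header that starts every IPC message. */
struct ipc_hdr {
	uint32_t size;	/* total message size in bytes */
	uint32_t cmd;	/* command type */
};

/* Two hypothetical IPC payloads; the header must be the first member. */
struct ipc_buffer {
	struct ipc_hdr hdr;
	uint32_t comp_id;
	uint32_t bytes;
};

struct ipc_volume {
	struct ipc_hdr hdr;
	uint32_t comp_id;
	uint32_t channels;
};

/* Host-side record kept after topology load (hypothetical). */
struct widget {
	int type;		/* widget type, e.g. buffer or PGA */
	uint32_t comp_id;
	const void *private;	/* IPC struct built at load time */
};

/* Stub transport: counts messages and fails on one chosen cmd. */
static unsigned int sent;
static uint32_t fail_cmd = UINT32_MAX;

static int stub_send(const struct ipc_hdr *hdr)
{
	if (hdr->size < sizeof(*hdr) || hdr->cmd == fail_cmd)
		return -EIO;
	sent++;
	return 0;
}

/*
 * Replay the stored IPC of every widget in load order. No per-type
 * switch is needed: a pointer to a struct may be converted to a
 * pointer to its first member, i.e. the common header.
 */
static int restore_widgets(const struct widget *w, size_t n,
			   int (*send)(const struct ipc_hdr *))
{
	assert(send != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct ipc_hdr *hdr = w[i].private;
		int ret = send(hdr);

		if (ret < 0) {
			/* report both the type and the ID of the failing widget */
			fprintf(stderr, "error: restore failed: type %d id %u\n",
				w[i].type, (unsigned int)w[i].comp_id);
			return ret;
		}
	}
	return 0;
}
```

Keeping the widgets in load order matters because, on the firmware side, components generally have to exist before anything that refers to them is created.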

ranj063 · 2018-07-16T15:36:44Z

@plbossart @lgirdwood Thanks! Let me re-send the IPCs to restore the pipeline from the topology objects stored in the driver.

ranj063 · 2018-07-17T06:31:31Z

@plbossart @lgirdwood, I have updated the commits in this pull request to restore the pipeline at resume.
With this change, I am able to start playback after resume. However, suspending while audio playback is in progress still fails with the error: "pcm_write:2011: write error: Input/output error".

lgirdwood · 2018-07-17T07:39:06Z

@ranj063 ok, halfway there now. Have a look at the legacy baytrail driver (sound/soc/intel/baytrail/*), as it stores the PCM playback state prior to S3 and restores it at S0. You probably need to duplicate this flow.

lgirdwood · 2018-07-17T07:41:42Z

sound/soc/sof/pm.c

+	}
+
+	/* restore pipeline */
+	ret = sof_restore_pipeline(sdev);


What if we have two or more pipelines? If this restores all pipelines, then maybe rename the function.

lgirdwood · 2018-07-17T07:44:56Z

sound/soc/sof/pm.c

+		switch (swidget->id) {
+		case snd_soc_dapm_dai_in:
+		case snd_soc_dapm_dai_out:
+			dai = (struct snd_sof_dai *)swidget->private;


Just wondering if we need all these case statements, given that all the IPC data uses a standard header (which includes cmd and size), i.e. we cast the private data to the header type to get the size and cmd.

@lgirdwood I started it that way, but I kept running into errors. The only way I could send the IPC without errors was with all these case statements.

OK, interesting, what errors? The only thing the case statements do is cast types.

@lgirdwood I've fixed it now. The errors I was seeing were caused by not casting the comp objects to (void *) when storing them.

lgirdwood · 2018-07-17T07:50:04Z

sound/soc/sof/intel/hda-dsp.c

+	/* TODO: uncomment send ipc once FW D0->D3 is complete */
+
+#if 0
+	/* send ipc message */


Best to try to send this now, as it allows the DSP to enter the D3-ready state. Any context can be saved later.

lgirdwood · 2018-07-17T07:53:50Z

sound/soc/sof/intel/hda-dsp.c

@@ -239,3 +239,53 @@ int hda_dsp_core_reset_power_down(struct snd_sof_dev *sdev,
 	return ret;
 }

+int hda_dsp_suspend(struct snd_sof_dev *sdev, int state)
+{


I'd expect most of this function to be in pm.c as generic code. The HW-specific part, hda_dsp_core_reset_power_down(), can be a HW-abstracted operation (add ops to the operations callback structure), e.g. we could have snd_sof_dsp_suspend(sdev) and snd_sof_dsp_resume(sdev).

Agree with Liam, this feels too APL-specific.

@plbossart @lgirdwood I have updated this to use the chip cores_mask to make it generic.
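
A sketch of the suggested split, with hypothetical names (`dsp_pm_ops`, `dsp_suspend()`, `dsp_resume()`) standing in for the proposed `snd_sof_dsp_suspend()`/`snd_sof_dsp_resume()`: generic PM code calls the wrappers, and each platform supplies only its HW-specific callbacks. Treating a missing callback as a no-op is a design choice assumed here, not something stated in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sof_dev;

/* Hypothetical per-platform PM operations. */
struct dsp_pm_ops {
	int (*suspend)(struct sof_dev *sdev, int state);
	int (*resume)(struct sof_dev *sdev);
};

struct sof_dev {
	const struct dsp_pm_ops *ops;
	bool powered;
};

/* Generic wrappers used by common PM code; a missing op is a no-op. */
static int dsp_suspend(struct sof_dev *sdev, int state)
{
	assert(sdev != NULL && sdev->ops != NULL);
	return sdev->ops->suspend ? sdev->ops->suspend(sdev, state) : 0;
}

static int dsp_resume(struct sof_dev *sdev)
{
	assert(sdev != NULL && sdev->ops != NULL);
	return sdev->ops->resume ? sdev->ops->resume(sdev) : 0;
}

/* Example platform callbacks: only this part is HW specific. */
static int demo_suspend(struct sof_dev *sdev, int state)
{
	(void)state;	/* a real platform would power down its cores here */
	sdev->powered = false;
	return 0;
}

static int demo_resume(struct sof_dev *sdev)
{
	sdev->powered = true;
	return 0;
}

static const struct dsp_pm_ops demo_ops = { demo_suspend, demo_resume };
static const struct dsp_pm_ops empty_ops = { NULL, NULL };
```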

ranj063 · 2018-07-18T08:32:49Z

@lgirdwood I've just pushed the v3 changes. Now I can resume audio playback normally after suspend/resume.

There's an occasional xrun that happens at resume, but it's been hard to reproduce.

lgirdwood

Mostly minor things, glad it's working now :)

lgirdwood · 2018-07-18T11:13:58Z

sound/soc/sof/intel/hda-dsp.c

+{
+	/* power up the DSP */
+	return hda_dsp_core_power_up(sdev, HDA_DSP_CORE_MASK(0) |
+				     HDA_DSP_CORE_MASK(1));


Btw, we need to be able to handle different core masks, e.g. APL has 2 cores, CNL has 4.

@lgirdwood sure. So far I have only tested this on APL. I will make the changes for CNL next.
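
One way to avoid hard-coding `HDA_DSP_CORE_MASK(0) | HDA_DSP_CORE_MASK(1)` is to derive the mask from a per-chip core count. This sketch assumes a hypothetical `chip_info` with a `cores_num` field; the macro only mirrors the one-bit-per-core convention used in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per DSP core, in the spirit of HDA_DSP_CORE_MASK(x). */
#define DSP_CORE_MASK(x) (1u << (x))

/* Hypothetical per-chip description. */
struct chip_info {
	const char *name;
	unsigned int cores_num;
};

/* Mask covering every core on the chip, e.g. 2 cores -> 0x3. */
static uint32_t chip_cores_mask(const struct chip_info *chip)
{
	assert(chip->cores_num >= 1 && chip->cores_num < 32);
	return DSP_CORE_MASK(chip->cores_num) - 1;
}
```

For APL (2 cores) this gives 0x3, which matches the `core_mask 3` printed in the dmesg log above; for CNL (4 cores) it gives 0xf.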

lgirdwood · 2018-07-18T11:16:22Z

sound/soc/sof/pm.c

+		switch (swidget->id) {
+		case snd_soc_dapm_dai_in:
+		case snd_soc_dapm_dai_out:
+			dai = (struct snd_sof_dai *)swidget->private;


OK, interesting, what errors? The only thing the case statements do is cast types.

lgirdwood · 2018-07-18T11:17:22Z

sound/soc/sof/pm.c

+	 * Suspend running pcm streams.
+	 * They will be restarted by ALSA resume trigger call.
+	 */
+	list_for_each_entry(spcm, &sdev->pcm_list, list) {


I'd make the stream suspend a separate function

lgirdwood · 2018-07-18T11:20:17Z

sound/soc/sof/pcm.c

+		}
+
+		/* trigger stream */
+		stream.hdr.cmd |= SOF_IPC_STREAM_TRIG_START;


Where do we send the IPC start message if we set cmd here?

@lgirdwood It is sent right outside the switch/case block where we set the cmd.
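
A sketch of that structure, with hypothetical trigger IDs and command values (not the real `SOF_IPC_STREAM_*` constants): each case only selects the command bits, the IPC is sent once after the switch, and its return value is checked in a single place. Here SUSPEND sends no stream IPC, on the assumption that context saving is handled by the PM callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical trigger IDs and IPC command bits, for illustration only. */
enum trigger { TRIG_START, TRIG_STOP, TRIG_PAUSE_PUSH, TRIG_PAUSE_RELEASE, TRIG_SUSPEND };

#define IPC_STREAM       0x1000u
#define IPC_TRIG_START   0x1u
#define IPC_TRIG_STOP    0x2u
#define IPC_TRIG_PAUSE   0x3u
#define IPC_TRIG_RELEASE 0x4u

static uint32_t last_sent;	/* records what the stub "sent" */

static int stub_send(uint32_t cmd)
{
	last_sent = cmd;
	return 0;
}

/* Each case only selects the command; the IPC is sent once below. */
static int pcm_trigger(enum trigger t)
{
	uint32_t cmd = IPC_STREAM;

	switch (t) {
	case TRIG_START:
		cmd |= IPC_TRIG_START;
		break;
	case TRIG_STOP:
		cmd |= IPC_TRIG_STOP;
		break;
	case TRIG_PAUSE_PUSH:
		cmd |= IPC_TRIG_PAUSE;
		break;
	case TRIG_PAUSE_RELEASE:
		cmd |= IPC_TRIG_RELEASE;
		break;
	case TRIG_SUSPEND:
		/* no stream IPC; context save is done by the PM callback */
		return 0;
	default:
		return -EINVAL;
	}

	/* single send point, single place to check the result */
	return stub_send(cmd);
}
```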

plbossart

Looks good overall, still some style issues and my usual quixotic chase of memory leaks. One functional part is also needed: aligning time with the firmware so that the trace logs aren't completely misleading after resume.

plbossart · 2018-07-18T17:02:37Z

sound/soc/sof/topology.c

@@ -1365,8 +1400,11 @@ static int sof_widget_unload(struct snd_soc_component *scomp,

 		/* free volume table */
 		kfree(scontrol->volume_table);


For the volume case, don't you need to free swidget->private?

@plbossart good catch. I'll fix it.

plbossart · 2018-07-18T17:03:37Z

sound/soc/sof/topology.c

+
+	connect = kzalloc(sizeof(*connect), GFP_KERNEL);
+	if (!connect)
+		return -ENOMEM;


Need to free sroute first? This doesn't look like consistent error management, or do you assume that all frees are done in sof_route_unload()?

You need more consistent memory allocation/freeing. Here you free on one error, but not in any of the other cases in this function below (look for all the return -EINVAL cases). You should have a set of gotos and deal with errors in a more organized way, or deal with all the frees in another cleanup routine called on failure.

Oh yes, let me address this. It seems I missed the route_load case when fixing the memory allocation/freeing.
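
A sketch of the goto-based unwinding suggested above. It uses `calloc()`/`free()` in place of `kzalloc()`/`kfree()`, and the structs, `find_widget()` and the simplified `route_load()`/`route_unload()` signatures are hypothetical, not the real `sof_route_load()` API. Every failure after the first allocation jumps to a label that frees in reverse allocation order, and on success ownership passes to the caller, who frees both objects on unload.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the objects allocated in route_load. */
struct sroute { int sink_id; int source_id; };
struct connect { int sink_id; int source_id; };

struct loaded_route {
	struct sroute *sroute;
	struct connect *connect;
};

/* Pretend widget lookup that fails for negative IDs. */
static int find_widget(int id)
{
	return id >= 0 ? 0 : -EINVAL;
}

static int route_load(int sink, int source, struct loaded_route *out)
{
	struct sroute *sroute;
	struct connect *connect;
	int ret;

	assert(out != NULL);

	sroute = calloc(1, sizeof(*sroute));
	if (!sroute)
		return -ENOMEM;

	connect = calloc(1, sizeof(*connect));
	if (!connect) {
		ret = -ENOMEM;
		goto err_sroute;
	}

	ret = find_widget(sink);
	if (ret < 0)
		goto err_connect;
	ret = find_widget(source);
	if (ret < 0)
		goto err_connect;

	sroute->sink_id = connect->sink_id = sink;
	sroute->source_id = connect->source_id = source;
	out->sroute = sroute;	/* ownership passes to the caller */
	out->connect = connect;
	return 0;

err_connect:
	free(connect);
err_sroute:
	free(sroute);
	return ret;
}

/* Counterpart that frees what route_load handed over. */
static void route_unload(struct loaded_route *r)
{
	free(r->connect);
	free(r->sroute);
	r->connect = NULL;
	r->sroute = NULL;
}
```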

plbossart · 2018-07-18T17:05:36Z

sound/soc/sof/topology.c

@@ -1892,6 +1903,13 @@ static int sof_route_load(struct snd_soc_component *scomp, int index,
 		}
 	}

+	/* TODO: free sroute/connect when unloading route */


well, duh. Can you please update this one? It will create issues with module load/unload and break CI.

@plbossart sure, I am working on it. It isn't very straightforward to tell from the arguments which route is being unloaded. I need to study this a bit more to implement it correctly.

@plbossart @lgirdwood it doesn't look like route_unload ever gets called.
I see a comment in soc-topology.c, in the remove_widget() function, suggesting that routes must be removed before the widget itself is removed. But I can't find any references to route_unload.

Also, the arguments to the route_unload() method seem ambiguous. There's no dobj member in snd_soc_dapm_route structure. So I'm not sure how to get a handle to it in order to be able to remove it. Can you please help?

You will need to dig into this more, as I'm short of time. We may also need to modify the API to pass in the extra info.

Looks like we still have a dependency on a core cleanup?

plbossart · 2018-07-18T17:12:10Z

sound/soc/sof/pm.c

+						 sizeof(r));
+			break;
+		case snd_soc_dapm_aif_out:
+		case snd_soc_dapm_aif_in:


Need an explicit /* fallthrough */ so that tools know this is not a programming error.

Thanks for the reminder. I'll add this where appropriate.
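
For reference, GCC's `-Wimplicit-fallthrough` accepts a `/* fallthrough */` comment placed right before the next case label. The annotation matters where a case that executes statements deliberately continues into the next label; consecutive labels with nothing between them are not flagged. The widget kinds and step counts below are hypothetical, chosen only to show both patterns.

```c
#include <assert.h>

/* Hypothetical widget kinds. */
enum wtype { W_AIF_IN, W_AIF_OUT, W_DAI_IN, W_BUFFER };

static int setup_steps(enum wtype t)
{
	int steps = 0;

	switch (t) {
	case W_AIF_IN:		/* stacked labels: no annotation needed */
	case W_AIF_OUT:
		steps++;	/* e.g. host-side setup */
		/* fallthrough */
	case W_DAI_IN:
		steps++;	/* common component setup */
		break;
	case W_BUFFER:
		steps = 1;
		break;
	default:
		assert(0 && "unknown widget type");
	}
	return steps;
}
```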

plbossart · 2018-07-18T17:12:31Z

sound/soc/sof/pm.c

+			break;
+		case snd_soc_dapm_buffer:
+			buffer = (struct sof_ipc_buffer *)swidget->private;
+			ret = sof_ipc_tx_message(sdev->ipc,


ret is not tested in most cases?

@plbossart ret is tested right outside the switch/case block to make sure none of the ipc's failed.

plbossart · 2018-07-18T17:16:27Z

sound/soc/sof/pcm.c

@@ -159,6 +159,9 @@ static int sof_pcm_hw_params(struct snd_pcm_substream *substream,
 	spcm->posn_offset[substream->stream] =
 		sdev->stream_box.offset + posn_offset;

+	/* save pcm hw_params */


should this be squashed?

plbossart · 2018-07-18T17:19:12Z

sound/soc/sof/pcm.c

+		/* trigger stream */
+		stream.hdr.cmd |= SOF_IPC_STREAM_TRIG_START;
+
+		break;
 	case SNDRV_PCM_TRIGGER_SUSPEND:
 		break;


shouldn't there be some IPC sent on suspend, so that e.g. if there is any sort of context saving at the firmware level they'd know about it?

@plbossart I do send the CTX_SAVE IPC in the suspend callback, but not here, as it is not a stream IPC message.

plbossart · 2018-07-18T17:20:30Z

sound/soc/sof/pcm.c

+		}
+
+		/* trigger stream */
+		stream.hdr.cmd |= SOF_IPC_STREAM_TRIG_START;


BTW on resume we'd need to pass the new wall clock time to the firmware so that any trace logs are timestamped with the right values.

@plbossart good point. Let me add it if we're not doing that already.
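
A minimal sketch of building such a message on resume. The payload layout, the `IPC_HOST_TIME_CMD` value and the use of C11 `timespec_get()` as the wall-clock source are all assumptions for illustration; the thread does not say how the firmware would consume the time.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical IPC payload carrying the host time to the firmware. */
struct ipc_host_time {
	uint32_t size;
	uint32_t cmd;
	uint64_t host_time_ns;	/* wall-clock time when the message is built */
};

#define IPC_HOST_TIME_CMD 0x7000u	/* made-up command value */

/* Build the message on resume so trace timestamps can be realigned. */
static int build_host_time_msg(struct ipc_host_time *msg)
{
	struct timespec ts;

	assert(msg != NULL);
	if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
		return -1;

	msg->size = sizeof(*msg);
	msg->cmd = IPC_HOST_TIME_CMD;
	msg->host_time_ns = (uint64_t)ts.tv_sec * 1000000000u +
			    (uint64_t)ts.tv_nsec;
	return 0;
}
```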

ranj063 · 2018-07-20T08:16:32Z

@plbossart @lgirdwood I've made some more changes to the flow,
and the good news is that both suspend/resume and runtime PM work well. Could you please review?
This needs some more extensive testing, and there are two opens left to address, which I listed in the first comment that shows the progression of the patches.

lgirdwood

Sorry, I could not see the opens you mentioned?

lgirdwood · 2018-07-20T08:45:54Z

sound/soc/sof/pm.c

+		if (ret < 0) {
+			dev_err(sdev->dev,
+				"error: failed to load widget with ID: %d\n",
+				swidget->comp_id);


It would also be good to print the type of the failing widget here (swidget->id); we would then have both the type and the ID.

ranj063 · 2018-07-20T14:38:13Z

@lgirdwood the two opens left are:

store/restore kcontrol values
free sroute/connect during route_unload

plbossart · 2018-07-20T14:53:16Z

@ranj063 do you want us to review/test now, or will the fixes for the two opens come soon enough that we should wait for the update?

ranj063 · 2018-07-20T15:02:53Z

@plbossart I should be done with the two opens today, and I think it will be better to test after those.

But I've made a change regarding which device we register the PM callbacks for. I could use some feedback to make sure I've done the right thing.

plbossart · 2018-07-20T15:09:38Z

@ranj063 can you describe the device change so that the rest of us don't have to reverse-engineer the deltas between patches?

ranj063 · 2018-07-20T17:15:06Z

@plbossart this is the commit id for the device change in this patchset:
0e6a2a584473a05efbb34185272e1c34cf436fcc

With the callbacks set for the PCI device, they never get called, because pm_runtime_enable() is called for the platform device and not the PCI device.

So setting the callbacks for the platform device fixes the issue.

lgirdwood · 2018-07-20T21:26:46Z

@ranj063 for the kcontrol restore, iirc baytrail/haswell drivers did this by caching the values in the host.

ranj063 · 2018-07-21T01:12:38Z

@lgirdwood, sure. I think I have an idea of how to do this. I will update the PR after I fix it.
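
A sketch of that host-side caching idea, assuming a per-control array holding the last values user space wrote: the put path updates the cache, and resume replays every cached value to the freshly booted firmware. The struct, `vol_put()`, `restore_controls()` and the stub transport are hypothetical; this does not reproduce the baytrail/haswell code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 8

/* Hypothetical host-side cache for one volume kcontrol. */
struct vol_control {
	uint32_t comp_id;
	unsigned int channels;
	uint32_t values[MAX_CHANNELS];	/* last values written by user space */
};

/* put(): update the cache; a driver would also send the IPC here. */
static void vol_put(struct vol_control *c, unsigned int ch, uint32_t value)
{
	assert(c->channels <= MAX_CHANNELS && ch < c->channels);
	c->values[ch] = value;
}

/* resume: replay every cached value, stopping at the first failure. */
static int restore_controls(const struct vol_control *ctl, size_t n,
			    int (*send)(uint32_t comp_id, unsigned int ch,
					uint32_t value))
{
	for (size_t i = 0; i < n; i++) {
		for (unsigned int ch = 0; ch < ctl[i].channels; ch++) {
			int ret = send(ctl[i].comp_id, ch, ctl[i].values[ch]);

			if (ret < 0)
				return ret;
		}
	}
	return 0;
}

/* Stub transport that records what would reach the DSP. */
static uint32_t dsp_values[MAX_CHANNELS];
static unsigned int dsp_writes;

static int stub_send(uint32_t comp_id, unsigned int ch, uint32_t value)
{
	(void)comp_id;
	dsp_values[ch] = value;
	dsp_writes++;
	return 0;
}
```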

ranj063 · 2018-07-21T05:29:42Z

@plbossart @lgirdwood I've addressed all opens now. Could you please review the latest version?

Move the code that sends the IPC for initializing the trace into a separate function that can be called during suspend/resume.
Signed-off-by: Ranjani Sridharan <[email protected]>

This will be called during resume to send the IPC for pipeline completion.
Signed-off-by: Ranjani Sridharan <[email protected]>

This is the initial implementation of the PM and runtime PM callbacks in the SOF driver. The suspend callback suspends all running PCM streams, sends the CTX_SAVE IPC, drops all pending IPCs, releases the trace DMA and then powers off the DSP. The resume callback loads the firmware, runs it, re-initializes the trace, restores the pipeline, restores the kcontrol values and finally sends the context restore IPC to the DSP. Streams that were suspended are resumed by the ALSA resume trigger. Streams that were paused during system suspend are marked explicitly so that they can be restored on PAUSE_RELEASE.
Signed-off-by: Ranjani Sridharan <[email protected]>
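
The ordering described in this commit message can be expressed as two step tables that run in sequence and stop at the first failure. This is a sketch: the step names paraphrase the message, and the stubs `log_step()`/`fail_step()` stand in for the real driver calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One named step of a PM flow; returns 0 or a negative error. */
struct pm_step {
	const char *name;
	int (*run)(const char *name);
};

/* Stub that appends each executed step name to a log. */
static char trace_log[512];

static int log_step(const char *name)
{
	strncat(trace_log, name, sizeof(trace_log) - strlen(trace_log) - 1);
	strncat(trace_log, ";", sizeof(trace_log) - strlen(trace_log) - 1);
	return 0;
}

static int fail_step(const char *name)
{
	(void)name;
	return -EIO;
}

/* Run steps in order and stop at the first failure. */
static int run_steps(const struct pm_step *steps, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = steps[i].run(steps[i].name);

		if (ret < 0) {
			fprintf(stderr, "PM step '%s' failed: %d\n",
				steps[i].name, ret);
			return ret;
		}
	}
	return 0;
}

static const struct pm_step suspend_flow[] = {
	{ "suspend running PCM streams", log_step },
	{ "send CTX_SAVE IPC", log_step },
	{ "drop pending IPCs", log_step },
	{ "release trace DMA", log_step },
	{ "power off DSP", log_step },
};

static const struct pm_step resume_flow[] = {
	{ "load firmware", log_step },
	{ "run firmware", log_step },
	{ "re-initialize trace", log_step },
	{ "restore pipelines", log_step },
	{ "restore kcontrol values", log_step },
	{ "send context restore IPC", log_step },
};
```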

This patch adds the changes required to save the PCM hw_params that will be used to restart streams during PM resume. It also implements the flow for the PCM resume trigger and handles the STOP/PAUSE_RELEASE triggers after resume. There are three possible situations when the system resumes from sleep:
1. If the stream was running at suspend, the hw_params are restored and the stream is started from the last known host DMA position.
2. If the stream was paused at suspend and the user releases the pause after resume, SNDRV_PCM_TRIGGER_RESUME is not invoked for that stream. Such streams therefore need to be marked so that their hw_params are restored at resume and they are started from the paused host DMA position.
3. If the stream was paused at suspend and the user stops playback after resume, the trigger callback should return without any further action, because the stream has not been set up after resume anyway.
Signed-off-by: Ranjani Sridharan <[email protected]>

Without this change, the runtime PM usage count never reaches 0, and the device never suspends, even when it is idle.
Signed-off-by: Ranjani Sridharan <[email protected]>
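
The symptom described here is the usual consequence of an unbalanced get/put pair. A toy usage counter (not the kernel's `pm_runtime_get_sync()`/`pm_runtime_put()` implementation, which also handles autosuspend delays and asynchronous work) shows why a single missing put keeps the device awake. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a runtime-PM usage counter. */
struct rpm_dev {
	int usage;
	bool suspended;
};

static void rpm_get(struct rpm_dev *d)
{
	d->usage++;
	d->suspended = false;	/* any user keeps the device active */
}

static void rpm_put(struct rpm_dev *d)
{
	assert(d->usage > 0);	/* an unbalanced put is a bug */
	if (--d->usage == 0)
		d->suspended = true;	/* idle: allowed to suspend */
}
```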

Set the kcontrol cmd, which will be used to send the correct IPC command to restore the volume control value during resume.
Signed-off-by: Ranjani Sridharan <[email protected]>

Set the PM callbacks for the SOF device instead of the ACPI/PCI/SPI device: pm_runtime_enable() is called in the pcm_probe() method, so runtime PM is enabled for the platform device.
Signed-off-by: Ranjani Sridharan <[email protected]>

plbossart · 2018-07-24T14:43:09Z

@ranj063 I am no longer able to track what's going on with this series. I see something is new in GitHub but I see no new comments, no indication that there's new code to review and no answer to some of the old comments.

ranj063 · 2018-07-24T17:15:14Z

@plbossart @lgirdwood I haven't made any new changes since I added support for paused streams.
I am working on it today. I will close this pull request and open a new one once I finish.

ranj063 · 2018-07-25T05:02:28Z

@lgirdwood @plbossart closing this one. Will open a new one so that it is easier to track the changes.

When enable SMMU, remove HNS driver will cause a WARNING: [ 141.924177] WARNING: CPU: 36 PID: 2708 at drivers/iommu/dma-iommu.c:443 __iommu_dma_unmap+0xc0/0xc8 [ 141.954673] Modules linked in: hns_enet_drv(-) [ 141.963615] CPU: 36 PID: 2708 Comm: rmmod Tainted: G W 5.0.0-rc1-28723-gb729c57de95c-dirty #32 [ 141.983593] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI Nemo 1.8 RC0 08/31/2017 [ 142.000244] pstate: 60000005 (nZCv daif -PAN -UAO) [ 142.009886] pc : __iommu_dma_unmap+0xc0/0xc8 [ 142.018476] lr : __iommu_dma_unmap+0xc0/0xc8 [ 142.027066] sp : ffff000013533b90 [ 142.033728] x29: ffff000013533b90 x28: ffff8013e6983600 [ 142.044420] x27: 0000000000000000 x26: 0000000000000000 [ 142.055113] x25: 0000000056000000 x24: 0000000000000015 [ 142.065806] x23: 0000000000000028 x22: ffff8013e66eee68 [ 142.076499] x21: ffff8013db919800 x20: 0000ffffefbff000 [ 142.087192] x19: 0000000000001000 x18: 0000000000000007 [ 142.097885] x17: 000000000000000e x16: 0000000000000001 [ 142.108578] x15: 0000000000000019 x14: 363139343a70616d [ 142.119270] x13: 6e75656761705f67 x12: 0000000000000000 [ 142.129963] x11: 00000000ffffffff x10: 0000000000000006 [ 142.140656] x9 : 1346c1aa88093500 x8 : ffff0000114de4e0 [ 142.151349] x7 : 6662666578303d72 x6 : ffff0000105ffec8 [ 142.162042] x5 : 0000000000000000 x4 : 0000000000000000 [ 142.172734] x3 : 00000000ffffffff x2 : ffff0000114de500 [ 142.183427] x1 : 0000000000000000 x0 : 0000000000000035 [ 142.194120] Call trace: [ 142.199030] __iommu_dma_unmap+0xc0/0xc8 [ 142.206920] iommu_dma_unmap_page+0x20/0x28 [ 142.215335] __iommu_unmap_page+0x40/0x60 [ 142.223399] hnae_unmap_buffer+0x110/0x134 [ 142.231639] hnae_free_desc+0x6c/0x10c [ 142.239177] hnae_fini_ring+0x14/0x34 [ 142.246540] hnae_fini_queue+0x2c/0x40 [ 142.254080] hnae_put_handle+0x38/0xcc [ 142.261619] hns_nic_dev_remove+0x54/0xfc [hns_enet_drv] [ 142.272312] platform_drv_remove+0x24/0x64 [ 142.280552] device_release_driver_internal+0x17c/0x20c [ 142.291070] 
driver_detach+0x4c/0x90 [ 142.298259] bus_remove_driver+0x5c/0xd8 [ 142.306148] driver_unregister+0x2c/0x54 [ 142.314037] platform_driver_unregister+0x10/0x18 [ 142.323505] hns_nic_dev_driver_exit+0x14/0xf0c [hns_enet_drv] [ 142.335248] __arm64_sys_delete_module+0x214/0x25c [ 142.344891] el0_svc_common+0xb0/0x10c [ 142.352430] el0_svc_handler+0x24/0x80 [ 142.359968] el0_svc+0x8/0x7c0 [ 142.366104] ---[ end trace 60ad1cd58e63c407 ]--- The tx ring buffer map when xmit and unmap when xmit done. So in hnae_init_ring() did not map tx ring buffer, but in hnae_fini_ring() have a unmap operation for tx ring buffer, which is already unmapped when xmit done, than cause this WARNING. The hnae_alloc_buffers() is called in hnae_init_ring(), so the hnae_free_buffers() should be in hnae_fini_ring(), not in hnae_free_desc(). In hnae_fini_ring(), adds a check is_rx_ring() as in hnae_init_ring(). When the ring buffer is tx ring, adds a piece of code to ensure that the tx ring is unmap. Signed-off-by: Yonglong Liu <[email protected]> Signed-off-by: Peng Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>

If a network driver provides to napi_gro_frags() an skb with a page fragment of exactly 14 bytes, the call to gro_pull_from_frag0() will 'consume' the fragment by calling skb_frag_unref(skb, 0), and the page might be freed and reused. Reading eth->h_proto at the end of napi_frags_skb() might read mangled data, or crash under specific debugging features. BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline] BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841 Read of size 2 at addr ffff88809366840c by task syz-executor599/8957 CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ thesofproject#32 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142 napi_frags_skb net/core/dev.c:5833 [inline] napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841 tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037 call_write_iter include/linux/fs.h:1872 [inline] do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693 do_iter_write fs/read_write.c:970 [inline] do_iter_write+0x184/0x610 fs/read_write.c:951 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015 do_writev+0x15b/0x330 fs/read_write.c:1058 Fixes: a50e233 ("net-gro: restore frag0 optimization") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Nine years ago, I added RCU handling to neighbours, not pneighbours. (pneigh are not commonly used) Unfortunately I missed that /proc dump operations would use a common entry and exit point : neigh_seq_start() and neigh_seq_stop() We need to read_lock(tbl->lock) or risk use-after-free while iterating the pneigh structures. We might later convert pneigh to RCU and revert this patch. sysbot reported : BUG: KASAN: use-after-free in pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158 Read of size 8 at addr ffff888097f2a700 by task syz-executor.0/9825 CPU: 1 PID: 9825 Comm: syz-executor.0 Not tainted 5.2.0-rc4+ thesofproject#32 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158 neigh_seq_next+0xdb/0x210 net/core/neighbour.c:3240 seq_read+0x9cf/0x1110 fs/seq_file.c:258 proc_reg_read+0x1fc/0x2c0 fs/proc/inode.c:221 do_loop_readv_writev fs/read_write.c:714 [inline] do_loop_readv_writev fs/read_write.c:701 [inline] do_iter_read+0x4a4/0x660 fs/read_write.c:935 vfs_readv+0xf0/0x160 fs/read_write.c:997 kernel_readv fs/splice.c:359 [inline] default_file_splice_read+0x475/0x890 fs/splice.c:414 do_splice_to+0x127/0x180 fs/splice.c:877 splice_direct_to_actor+0x2d2/0x970 fs/splice.c:954 do_splice_direct+0x1da/0x2a0 fs/splice.c:1063 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4592c9 Code: fd b7 fb 
ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f4aab51dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00000000004592c9
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000246 R12: 00007f4aab51e6d4
R13: 00000000004c689d R14: 00000000004db828 R15: 00000000ffffffff

Allocated by task 9827:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 __do_kmalloc mm/slab.c:3660 [inline]
 __kmalloc+0x15c/0x740 mm/slab.c:3669
 kmalloc include/linux/slab.h:552 [inline]
 pneigh_lookup+0x19c/0x4a0 net/core/neighbour.c:731
 arp_req_set_public net/ipv4/arp.c:1010 [inline]
 arp_req_set+0x613/0x720 net/ipv4/arp.c:1026
 arp_ioctl+0x652/0x7f0 net/ipv4/arp.c:1226
 inet_ioctl+0x2a0/0x340 net/ipv4/af_inet.c:926
 sock_do_ioctl+0xd8/0x2f0 net/socket.c:1043
 sock_ioctl+0x3ed/0x780 net/socket.c:1194
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 9824:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kfree+0xcf/0x220 mm/slab.c:3755
 pneigh_ifdown_and_unlock net/core/neighbour.c:812 [inline]
 __neigh_ifdown+0x236/0x2f0 net/core/neighbour.c:356
 neigh_ifdown+0x20/0x30 net/core/neighbour.c:372
 arp_ifdown+0x1d/0x21 net/ipv4/arp.c:1274
 inetdev_destroy net/ipv4/devinet.c:319 [inline]
 inetdev_event+0xa14/0x11f0 net/ipv4/devinet.c:1544
 notifier_call_chain+0xc2/0x230 kernel/notifier.c:95
 __raw_notifier_call_chain kernel/notifier.c:396 [inline]
 raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:403
 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1749
 call_netdevice_notifiers_extack net/core/dev.c:1761 [inline]
 call_netdevice_notifiers net/core/dev.c:1775 [inline]
 rollback_registered_many+0x9b9/0xfc0 net/core/dev.c:8178
 rollback_registered+0x109/0x1d0 net/core/dev.c:8220
 unregister_netdevice_queue net/core/dev.c:9267 [inline]
 unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9260
 unregister_netdevice include/linux/netdevice.h:2631 [inline]
 __tun_detach+0xd8a/0x1040 drivers/net/tun.c:724
 tun_detach drivers/net/tun.c:741 [inline]
 tun_chr_close+0xe0/0x180 drivers/net/tun.c:3451
 __fput+0x2ff/0x890 fs/file_table.c:280
 ____fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:185 [inline]
 exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168
 prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
 do_syscall_64+0x58e/0x680 arch/x86/entry/common.c:304
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888097f2a700
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
 64-byte region [ffff888097f2a700, ffff888097f2a740)
The buggy address belongs to the page:
page:ffffea00025fca80 refcount:1 mapcount:0 mapping:ffff8880aa400340 index:0x0
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea000250d548 ffffea00025726c8 ffff8880aa400340
raw: 0000000000000000 ffff888097f2a000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888097f2a600: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
 ffff888097f2a680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888097f2a700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff888097f2a780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888097f2a800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc

Fixes: 767e97e ("neigh: RCU conversion of struct neighbour")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Before a thread in process context uses bh_lock_sock(), we must disable BH.

syzbot reported:

WARNING: inconsistent lock state
5.2.0-rc3+ #32 Not tainted
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
blkid/26581 [HC0[0]:SC1[1]:HE1:SE0] takes:
00000000e0da85ee (slock-AF_AX25){+.?.}, at: spin_lock include/linux/spinlock.h:338 [inline]
00000000e0da85ee (slock-AF_AX25){+.?.}, at: ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275
{SOFTIRQ-ON-W} state was registered at:
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:338 [inline]
 ax25_rt_autobind+0x3ca/0x720 net/ax25/ax25_route.c:429
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1221
 __sys_connect+0x264/0x330 net/socket.c:1834
 __do_sys_connect net/socket.c:1845 [inline]
 __se_sys_connect net/socket.c:1842 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1842
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
irq event stamp: 2272
hardirqs last enabled at (2272): [<ffffffff810065f3>] trace_hardirqs_on_thunk+0x1a/0x1c
hardirqs last disabled at (2271): [<ffffffff8100660f>] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last enabled at (1522): [<ffffffff87400654>] __do_softirq+0x654/0x94c kernel/softirq.c:320
softirqs last disabled at (2267): [<ffffffff81449010>] invoke_softirq kernel/softirq.c:374 [inline]
softirqs last disabled at (2267): [<ffffffff81449010>] irq_exit+0x180/0x1d0 kernel/softirq.c:414

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(slock-AF_AX25);
  <Interrupt>
    lock(slock-AF_AX25);

 *** DEADLOCK ***

1 lock held by blkid/26581:
 #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:175 [inline]
 #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: call_timer_fn+0xe0/0x720 kernel/time/timer.c:1312

stack backtrace:
CPU: 1 PID: 26581 Comm: blkid Not tainted 5.2.0-rc3+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_usage_bug.cold+0x393/0x4a2 kernel/locking/lockdep.c:2935
 valid_state kernel/locking/lockdep.c:2948 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3138 [inline]
 mark_lock+0xd46/0x1370 kernel/locking/lockdep.c:3513
 mark_irqflags kernel/locking/lockdep.c:3391 [inline]
 __lock_acquire+0x159f/0x5490 kernel/locking/lockdep.c:3745
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:338 [inline]
 ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275
 call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
 expire_timers kernel/time/timer.c:1366 [inline]
 __run_timers kernel/time/timer.c:1685 [inline]
 __run_timers kernel/time/timer.c:1653 [inline]
 run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
 __do_softirq+0x25c/0x94c kernel/softirq.c:293
 invoke_softirq kernel/softirq.c:374 [inline]
 irq_exit+0x180/0x1d0 kernel/softirq.c:414
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
 </IRQ>
RIP: 0033:0x7f858d5c3232
Code: 8b 61 08 48 8b 84 24 d8 00 00 00 4c 89 44 24 28 48 8b ac 24 d0 00 00 00 4c 8b b4 24 e8 00 00 00 48 89 7c 24 68 48 89 4c 24 78 <48> 89 44 24 58 8b 84 24 e0 00 00 00 89 84 24 84 00 00 00 8b 84 24
RSP: 002b:00007ffcaf0cf5c0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
RAX: 00007f858d7d27a8 RBX: 00007f858d7d8820 RCX: 00007f858d3940d8
RDX: 00007ffcaf0cf798 RSI: 00000000f5e616f3 RDI: 00007f858d394fee
RBP: 0000000000000000 R08: 00007ffcaf0cf780 R09: 00007f858d7db480
R10: 0000000000000000 R11: 0000000009691a75 R12: 0000000000000005
R13: 00000000f5e616f3 R14: 0000000000000000 R15: 00007ffcaf0cf798

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
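The lockdep report above reduces to one rule. A lock that is ever taken from softirq context (here the ax25 destroy timer) must never be taken in process context while softirqs are enabled. The sketch below is a hypothetical, greatly simplified userspace model of only that check. The struct and function names are invented for illustration; this is not lockdep's implementation.

```c
#include <stdbool.h>

/* Hypothetical model of two lockdep usage bits; not the kernel's lockdep. */
struct lock_usage {
	bool softirq_on_w; /* taken in process context with softirqs (BH) enabled */
	bool in_softirq_w; /* taken from softirq context, e.g. a timer callback */
};

/*
 * Record one acquisition of the lock.
 * in_softirq:  caller runs in softirq context (like ax25_destroy_timer()).
 * bh_disabled: a process-context caller disabled BH first (what the fix asks for).
 * Returns true once the recorded usage is inconsistent, i.e. the
 * {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} pattern from the report.
 */
static bool record_acquire(struct lock_usage *u, bool in_softirq, bool bh_disabled)
{
	if (in_softirq)
		u->in_softirq_w = true;
	else if (!bh_disabled)
		u->softirq_on_w = true;

	return u->softirq_on_w && u->in_softirq_w;
}
```

In kernel code, the process-context side typically satisfies this by bracketing bh_lock_sock()/bh_unlock_sock() with local_bh_disable()/local_bh_enable(). The model simply treats that as `bh_disabled = true`.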

[ Upstream commit a4270d6 ]

If a network driver passes napi_gro_frags() an skb whose page fragment is exactly 14 bytes, the call to gro_pull_from_frag0() will 'consume' the fragment by calling skb_frag_unref(skb, 0), and the page might be freed and reused.

Reading eth->h_proto at the end of napi_frags_skb() might then read mangled data, or crash under specific debugging features.

BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
Read of size 2 at addr ffff88809366840c by task syz-executor599/8957

CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
 napi_frags_skb net/core/dev.c:5833 [inline]
 napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
 tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
 call_write_iter include/linux/fs.h:1872 [inline]
 do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
 do_iter_write fs/read_write.c:970 [inline]
 do_iter_write+0x184/0x610 fs/read_write.c:951
 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
 do_writev+0x15b/0x330 fs/read_write.c:1058

Fixes: a50e233 ("net-gro: restore frag0 optimization")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
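The hazard here is a read after release: a header field is read after a call that may drop the last reference to the memory holding it. Below is a minimal userspace sketch of the safe ordering. It assumes a hypothetical `struct frag` standing in for the skb page fragment, and uses `free()` to stand in for skb_frag_unref() releasing the page. It illustrates the ordering only and is not the actual napi_frags_skb() change.

```c
#include <stdint.h>
#include <stdlib.h>

#define ETH_HLEN_SKETCH 14 /* Ethernet header length; h_proto is bytes 12-13 */

/* Hypothetical stand-in for an skb page fragment. */
struct frag {
	unsigned char *page; /* heap buffer standing in for the page */
	size_t off;          /* current offset into the page */
	size_t len;          /* bytes remaining in the fragment */
};

/* Pull hlen bytes (caller guarantees hlen <= f->len). If that consumes the
 * whole fragment, release the page, modelling the frag0 pull that ends in
 * skb_frag_unref(). */
static void pull_header(struct frag *f, size_t hlen)
{
	f->off += hlen;
	f->len -= hlen;
	if (f->len == 0) {
		free(f->page);
		f->page = NULL;
	}
}

/* Safe ordering: copy h_proto (big-endian on the wire) out of the header
 * before the pull that may free the page, and never touch the header after. */
static uint16_t take_proto_then_pull(struct frag *f)
{
	const unsigned char *eth = f->page + f->off;
	uint16_t proto = (uint16_t)((eth[12] << 8) | eth[13]);

	pull_header(f, ETH_HLEN_SKETCH);
	return proto;
}
```

With a 14-byte fragment, the pull frees the page. Reading `eth[12]` after `pull_header()` would therefore be exactly the use-after-free that KASAN flagged.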

[ Upstream commit d4d5d8e ]

Before a thread in process context uses bh_lock_sock(), we must disable BH.

syzbot reported:

WARNING: inconsistent lock state
5.2.0-rc3+ #32 Not tainted
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
blkid/26581 [HC0[0]:SC1[1]:HE1:SE0] takes:
00000000e0da85ee (slock-AF_AX25){+.?.}, at: spin_lock include/linux/spinlock.h:338 [inline]
00000000e0da85ee (slock-AF_AX25){+.?.}, at: ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275
{SOFTIRQ-ON-W} state was registered at:
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:338 [inline]
 ax25_rt_autobind+0x3ca/0x720 net/ax25/ax25_route.c:429
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1221
 __sys_connect+0x264/0x330 net/socket.c:1834
 __do_sys_connect net/socket.c:1845 [inline]
 __se_sys_connect net/socket.c:1842 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1842
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
irq event stamp: 2272
hardirqs last enabled at (2272): [<ffffffff810065f3>] trace_hardirqs_on_thunk+0x1a/0x1c
hardirqs last disabled at (2271): [<ffffffff8100660f>] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last enabled at (1522): [<ffffffff87400654>] __do_softirq+0x654/0x94c kernel/softirq.c:320
softirqs last disabled at (2267): [<ffffffff81449010>] invoke_softirq kernel/softirq.c:374 [inline]
softirqs last disabled at (2267): [<ffffffff81449010>] irq_exit+0x180/0x1d0 kernel/softirq.c:414

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(slock-AF_AX25);
  <Interrupt>
    lock(slock-AF_AX25);

 *** DEADLOCK ***

1 lock held by blkid/26581:
 #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:175 [inline]
 #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: call_timer_fn+0xe0/0x720 kernel/time/timer.c:1312

stack backtrace:
CPU: 1 PID: 26581 Comm: blkid Not tainted 5.2.0-rc3+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_usage_bug.cold+0x393/0x4a2 kernel/locking/lockdep.c:2935
 valid_state kernel/locking/lockdep.c:2948 [inline]
 mark_lock_irq kernel/locking/lockdep.c:3138 [inline]
 mark_lock+0xd46/0x1370 kernel/locking/lockdep.c:3513
 mark_irqflags kernel/locking/lockdep.c:3391 [inline]
 __lock_acquire+0x159f/0x5490 kernel/locking/lockdep.c:3745
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:338 [inline]
 ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275
 call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
 expire_timers kernel/time/timer.c:1366 [inline]
 __run_timers kernel/time/timer.c:1685 [inline]
 __run_timers kernel/time/timer.c:1653 [inline]
 run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
 __do_softirq+0x25c/0x94c kernel/softirq.c:293
 invoke_softirq kernel/softirq.c:374 [inline]
 irq_exit+0x180/0x1d0 kernel/softirq.c:414
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
 </IRQ>
RIP: 0033:0x7f858d5c3232
Code: 8b 61 08 48 8b 84 24 d8 00 00 00 4c 89 44 24 28 48 8b ac 24 d0 00 00 00 4c 8b b4 24 e8 00 00 00 48 89 7c 24 68 48 89 4c 24 78 <48> 89 44 24 58 8b 84 24 e0 00 00 00 89 84 24 84 00 00 00 8b 84 24
RSP: 002b:00007ffcaf0cf5c0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
RAX: 00007f858d7d27a8 RBX: 00007f858d7d8820 RCX: 00007f858d3940d8
RDX: 00007ffcaf0cf798 RSI: 00000000f5e616f3 RDI: 00007f858d394fee
RBP: 0000000000000000 R08: 00007ffcaf0cf780 R09: 00007f858d7db480
R10: 0000000000000000 R11: 0000000009691a75 R12: 0000000000000005
R13: 00000000f5e616f3 R14: 0000000000000000 R15: 00007ffcaf0cf798

Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit f3e92cb ]

Nine years ago, I added RCU handling to neighbours, not pneighbours (pneigh are not commonly used).

Unfortunately, I missed that /proc dump operations use a common entry and exit point: neigh_seq_start() and neigh_seq_stop().

We need to read_lock(tbl->lock) or risk use-after-free while iterating the pneigh structures.

We might later convert pneigh to RCU and revert this patch.

syzbot reported:

BUG: KASAN: use-after-free in pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158
Read of size 8 at addr ffff888097f2a700 by task syz-executor.0/9825

CPU: 1 PID: 9825 Comm: syz-executor.0 Not tainted 5.2.0-rc4+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158
 neigh_seq_next+0xdb/0x210 net/core/neighbour.c:3240
 seq_read+0x9cf/0x1110 fs/seq_file.c:258
 proc_reg_read+0x1fc/0x2c0 fs/proc/inode.c:221
 do_loop_readv_writev fs/read_write.c:714 [inline]
 do_loop_readv_writev fs/read_write.c:701 [inline]
 do_iter_read+0x4a4/0x660 fs/read_write.c:935
 vfs_readv+0xf0/0x160 fs/read_write.c:997
 kernel_readv fs/splice.c:359 [inline]
 default_file_splice_read+0x475/0x890 fs/splice.c:414
 do_splice_to+0x127/0x180 fs/splice.c:877
 splice_direct_to_actor+0x2d2/0x970 fs/splice.c:954
 do_splice_direct+0x1da/0x2a0 fs/splice.c:1063
 do_sendfile+0x597/0xd00 fs/read_write.c:1464
 __do_sys_sendfile64 fs/read_write.c:1525 [inline]
 __se_sys_sendfile64 fs/read_write.c:1511 [inline]
 __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f4aab51dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00000000004592c9
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000246 R12: 00007f4aab51e6d4
R13: 00000000004c689d R14: 00000000004db828 R15: 00000000ffffffff

Allocated by task 9827:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 __do_kmalloc mm/slab.c:3660 [inline]
 __kmalloc+0x15c/0x740 mm/slab.c:3669
 kmalloc include/linux/slab.h:552 [inline]
 pneigh_lookup+0x19c/0x4a0 net/core/neighbour.c:731
 arp_req_set_public net/ipv4/arp.c:1010 [inline]
 arp_req_set+0x613/0x720 net/ipv4/arp.c:1026
 arp_ioctl+0x652/0x7f0 net/ipv4/arp.c:1226
 inet_ioctl+0x2a0/0x340 net/ipv4/af_inet.c:926
 sock_do_ioctl+0xd8/0x2f0 net/socket.c:1043
 sock_ioctl+0x3ed/0x780 net/socket.c:1194
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 9824:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kfree+0xcf/0x220 mm/slab.c:3755
 pneigh_ifdown_and_unlock net/core/neighbour.c:812 [inline]
 __neigh_ifdown+0x236/0x2f0 net/core/neighbour.c:356
 neigh_ifdown+0x20/0x30 net/core/neighbour.c:372
 arp_ifdown+0x1d/0x21 net/ipv4/arp.c:1274
 inetdev_destroy net/ipv4/devinet.c:319 [inline]
 inetdev_event+0xa14/0x11f0 net/ipv4/devinet.c:1544
 notifier_call_chain+0xc2/0x230 kernel/notifier.c:95
 __raw_notifier_call_chain kernel/notifier.c:396 [inline]
 raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:403
 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1749
 call_netdevice_notifiers_extack net/core/dev.c:1761 [inline]
 call_netdevice_notifiers net/core/dev.c:1775 [inline]
 rollback_registered_many+0x9b9/0xfc0 net/core/dev.c:8178
 rollback_registered+0x109/0x1d0 net/core/dev.c:8220
 unregister_netdevice_queue net/core/dev.c:9267 [inline]
 unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9260
 unregister_netdevice include/linux/netdevice.h:2631 [inline]
 __tun_detach+0xd8a/0x1040 drivers/net/tun.c:724
 tun_detach drivers/net/tun.c:741 [inline]
 tun_chr_close+0xe0/0x180 drivers/net/tun.c:3451
 __fput+0x2ff/0x890 fs/file_table.c:280
 ____fput+0x16/0x20 fs/file_table.c:313
 task_work_run+0x145/0x1c0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:185 [inline]
 exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168
 prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
 do_syscall_64+0x58e/0x680 arch/x86/entry/common.c:304
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888097f2a700
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
 64-byte region [ffff888097f2a700, ffff888097f2a740)
The buggy address belongs to the page:
page:ffffea00025fca80 refcount:1 mapcount:0 mapping:ffff8880aa400340 index:0x0
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea000250d548 ffffea00025726c8 ffff8880aa400340
raw: 0000000000000000 ffff888097f2a000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888097f2a600: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
 ffff888097f2a680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888097f2a700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff888097f2a780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888097f2a800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc

Fixes: 767e97e ("neigh: RCU conversion of struct neighbour")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

… delivery

After a treclaim, we expect to be in non-transactional state. If we don't clear the current thread's MSR[TS] before we get preempted, then tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in suspended transaction state.

When handling a signal caught in transactional state, handle_rt_signal64() calls get_tm_stackpointer() that treclaims the transaction using tm_reclaim_current() but without clearing the thread's MSR[TS]. This can cause the TM Bad Thing exception below if later we pagefault and get preempted trying to access the user's sigframe, using __put_user(). Afterwards, when we are rescheduled back into do_page_fault() (but now in suspended state since the thread's MSR[TS] was not cleared), upon executing 'rfid' after completion of the page fault handling, the exception is raised because a transition from suspended to non-transactional state is invalid.

Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
NIP: c00000000000de44 LR: c000000000034728 CTR: 0000000000000000
REGS: c00000003fe7bd70 TRAP: 0700 Not tainted (5.4.0-rc2)
MSR: 8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]> CR: 44000884 XER: 00000000
CFAR: c00000000000dda4 IRQMASK: 0
PACATMSCRATCH: 800000010280b033
GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78
GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0
GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000
GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140
GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800
GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000
NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0
LR [c000000000034728] handle_rt_signal64+0x78/0xc50
Call Trace:
[c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable)
[c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460
[c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
Instruction dump:
7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
---[ end trace 93094aa44b442f87 ]---

The simplified sequence of events that triggers the above exception is:

  ...                       # userspace in NON-TRANSACTIONAL state
  tbegin                    # userspace in TRANSACTIONAL state
  signal delivery           # kernelspace in SUSPENDED state
  handle_rt_signal64()
    get_tm_stackpointer()
      treclaim              # kernelspace in NON-TRANSACTIONAL state
    __put_user()
      page fault happens. We will never get back here because of the TM Bad Thing exception.

  page fault handling kicks in and we voluntarily preempt ourselves
  do_page_fault()
    __schedule()
      __switch_to(other_task)

  our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared
  __switch_to(our_task)
    switch_to_tm()
      tm_recheckpoint_new_task()
        trechkpt            # kernelspace in SUSPENDED state

  The page fault handling resumes, but now we are in suspended transaction state
  do_page_fault() completes
  rfid     <----- trying to get back where the page fault happened (we were non-transactional back then)
  TM Bad Thing              # illegal transition from suspended to non-transactional

This patch fixes that issue by clearing the current thread's MSR[TS] just after treclaim in get_tm_stackpointer() so that we stay in non-transactional state in case we are preempted. In order to make treclaim and clearing the thread's MSR[TS] atomic from a preemption perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is used. It's also necessary to save the previous value of the thread's MSR before get_tm_stackpointer() is called so that it can be exposed to the signal handler later in setup_tm_sigcontexts() to inform the userspace MSR at the moment of the signal delivery.

Found with tm-signal-context-force-tm kernel selftest.

Fixes: 2b0a576 ("powerpc: Add new transactional memory state to the signal context")
Cc: [email protected] # v3.9
Signed-off-by: Gustavo Luiz Duarte <[email protected]>
Acked-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
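The failure depends only on which TM state the CPU is in when rfid returns to the faulting code. Below is a hypothetical state model of the simplified sequence of events above. The enum, struct and helpers are invented for illustration, and the model ignores everything except the thread's saved MSR[TS]. It shows why clearing MSR[TS] right after treclaim avoids the illegal suspended-to-non-transactional transition.

```c
#include <stdbool.h>

enum tm_state { TM_NON_TRANSACTIONAL, TM_TRANSACTIONAL, TM_SUSPENDED };

/* Hypothetical: only the thread's saved MSR[TS] is modelled. */
struct thread_model {
	enum tm_state msr_ts;
};

/* treclaim leaves the CPU non-transactional; with clear_ts the thread's
 * saved MSR[TS] is cleared as well (the behaviour the patch adds). */
static enum tm_state treclaim(struct thread_model *t, bool clear_ts)
{
	if (clear_ts)
		t->msr_ts = TM_NON_TRANSACTIONAL;
	return TM_NON_TRANSACTIONAL;
}

/* On being switched back in, a recheckpoint happens if MSR[TS] still says a
 * transaction is active, which leaves the kernel in suspended state. */
static enum tm_state switch_back_in(const struct thread_model *t, enum tm_state cur)
{
	return t->msr_ts != TM_NON_TRANSACTIONAL ? TM_SUSPENDED : cur;
}

/* rfid back to code that was non-transactional: suspended -> non-transactional
 * is the illegal transition that raises "TM Bad Thing". */
static bool rfid_is_legal(enum tm_state cur, enum tm_state target)
{
	return !(cur == TM_SUSPENDED && target == TM_NON_TRANSACTIONAL);
}

/* Signal caught inside a transaction, treclaim, page fault, preemption,
 * reschedule, then rfid. Returns true if the final rfid is legal. */
static bool run_sequence(bool clear_ts)
{
	struct thread_model t = { TM_TRANSACTIONAL };
	enum tm_state cpu = treclaim(&t, clear_ts);

	cpu = switch_back_in(&t, cpu);
	return rfid_is_legal(cpu, TM_NON_TRANSACTIONAL);
}
```

`run_sequence(false)` reproduces the reported failure path, and `run_sequence(true)` follows the patched path. The model does not capture the preempt_disable/enable() window, which in the real fix keeps treclaim and the MSR[TS] clear atomic with respect to preemption.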

[ Upstream commit 96298f6 ]

According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state.

The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change.

=== Before ===
...
> ACL Data RX: Handle 256 flags 0x02 dlen 12            #22
      L2CAP: Connection Request (0x02) ident 2 len 4
        PSM: 1 (0x0001)
        Source CID: 65
< ACL Data TX: Handle 256 flags 0x00 dlen 16            #23
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 12            #24
      L2CAP: Configure Request (0x04) ident 2 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5  #25
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5  #26
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 16            #27
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00                                            ..
< ACL Data TX: Handle 256 flags 0x00 dlen 18            #28
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
> HCI Event: Number of Completed Packets (0x13) plen 5  #29
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 14            #30
      L2CAP: Configure Response (0x05) ident 2 len 6
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 20            #31
      L2CAP: Configure Request (0x04) ident 3 len 12
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00 91 02 11 11                                ......
< ACL Data TX: Handle 256 flags 0x00 dlen 14            #32
      L2CAP: Command Reject (0x01) ident 3 len 6
        Reason: Invalid CID in request (0x0002)
        Destination CID: 64
        Source CID: 65
> HCI Event: Number of Completed Packets (0x13) plen 5  #33
        Num handles: 1
        Handle: 256
        Count: 1
...

=== After ===
...
> ACL Data RX: Handle 256 flags 0x02 dlen 12            #22
      L2CAP: Connection Request (0x02) ident 2 len 4
        PSM: 1 (0x0001)
        Source CID: 65
< ACL Data TX: Handle 256 flags 0x00 dlen 16            #23
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 12            #24
      L2CAP: Configure Request (0x04) ident 2 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5  #25
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5  #26
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 16            #27
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00                                            ..
< ACL Data TX: Handle 256 flags 0x00 dlen 18            #28
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
> HCI Event: Number of Completed Packets (0x13) plen 5  #29
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 14            #30
      L2CAP: Configure Response (0x05) ident 2 len 6
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 20            #31
      L2CAP: Configure Request (0x04) ident 3 len 12
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00 91 02 11 11                                .....
< ACL Data TX: Handle 256 flags 0x00 dlen 18            #32
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
< ACL Data TX: Handle 256 flags 0x00 dlen 12            #33
      L2CAP: Configure Request (0x04) ident 3 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5  #34
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5  #35
        Num handles: 1
        Handle: 256
        Count: 1
...

Signed-off-by: Howard Chung <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
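The before/after traces differ only in how a Configure Request on an already-open channel is answered. Before the change, frame #31 draws a Command Reject with "Invalid CID in request". After it, the request gets a Configure Response. Below is a hypothetical sketch of that state gate. The enum names are loosely modelled on the kernel's BT_* channel states, but the enum, the `allow_open` switch and the function are invented for illustration. This is not the l2cap_core.c code.

```c
#include <stdbool.h>

/* Hypothetical subset of L2CAP channel states. */
enum chan_state { CHAN_CONNECT2, CHAN_CONFIG, CHAN_CONNECTED, CHAN_DISCONN };

enum cfg_action { ACCEPT_CONFIG_REQ, REJECT_INVALID_CID };

/*
 * Decide how to answer an incoming Configure Request.
 * allow_open = false models the old behaviour (accepted only during setup);
 * allow_open = true also accepts it on an open (connected) channel, which is
 * what the Core Spec section cited above requires.
 */
static enum cfg_action handle_config_req(enum chan_state st, bool allow_open)
{
	if (st == CHAN_CONFIG || st == CHAN_CONNECT2)
		return ACCEPT_CONFIG_REQ;
	if (allow_open && st == CHAN_CONNECTED)
		return ACCEPT_CONFIG_REQ;
	return REJECT_INVALID_CID; /* "Command Reject ... Invalid CID in request" */
}
```

In the "After" trace, accepting the request also leads the local side to send its own Configure Request (#33). This sketch only models the accept/reject decision.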

Use @file to pass the --cfg flags to rustc (fixes #22)

When the dwc2 platform device is removed, it unregisters the generic phy. usb_remove_phy() is called and the dwc2 usb_phy is removed from the "phy_list", but the uevent may still attempt to get the usb_phy from the list, resulting in a page fault bug. Currently, the usb_phy cannot be accessed from the "phy_list" after the device is removed. As a fix, check that the usb_phy can be obtained before proceeding with the uevent.

[ 84.949345] BUG: unable to handle page fault for address: 00000007935688d8
[ 84.949349] #PF: supervisor read access in kernel mode
[ 84.949351] #PF: error_code(0x0000) - not-present page
[ 84.949353] PGD 0 P4D 0
[ 84.949356] Oops: 0000 [#1] SMP PTI
[ 84.949360] CPU: 2 PID: 2081 Comm: rmmod Not tainted 5.13.0-rc4-snps-16547-ga8534cb092d7-dirty #32
[ 84.949363] Hardware name: Hewlett-Packard HP Z400 Workstation/0B4Ch, BIOS 786G3 v03.54 11/02/2011
[ 84.949365] RIP: 0010:usb_phy_uevent+0x99/0x121
[ 84.949372] Code: 8d 83 f8 00 00 00 48 3d b0 12 22 94 74 05 4c 3b 23 75 5b 8b 83 9c 00 00 00 be 32 00 00 00 48 8d 7c 24 04 48 c7 c2 d4 5d 7b 93 <48> 8b 0c c5 e0 88 56 93 e8 0f 63 8a ff 8b 83 98 00 00 00 be 32 00
[ 84.949375] RSP: 0018:ffffa46bc0f2fc70 EFLAGS: 00010246
[ 84.949378] RAX: 00000000ffffffff RBX: ffffffff942211b8 RCX: 0000000000000027
[ 84.949380] RDX: ffffffff937b5dd4 RSI: 0000000000000032 RDI: ffffa46bc0f2fc74
[ 84.949383] RBP: ffff94a306613000 R08: 0000000000000000 R09: 00000000fffeffff
[ 84.949385] R10: ffffa46bc0f2faa8 R11: ffffa46bc0f2faa0 R12: ffff94a30186d410
[ 84.949387] R13: ffff94a32d188a80 R14: ffff94a30029f960 R15: ffffffff93522dd0
[ 84.949389] FS: 00007efdbd417540(0000) GS:ffff94a513a80000(0000) knlGS:0000000000000000
[ 84.949392] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 84.949394] CR2: 00000007935688d8 CR3: 0000000165606000 CR4: 00000000000006e0
[ 84.949396] Call Trace:
[ 84.949401] dev_uevent+0x190/0x1ad
[ 84.949408] kobject_uevent_env+0x18e/0x46c
[ 84.949414] device_release_driver_internal+0x17f/0x18e
[ 84.949418] bus_remove_device+0xd3/0xe5
[ 84.949421] device_del+0x1c3/0x31d
[ 84.949425] ? kobject_put+0x97/0xa8
[ 84.949428] platform_device_del+0x1c/0x63
[ 84.949432] platform_device_unregister+0xa/0x11
[ 84.949436] dwc2_pci_remove+0x1e/0x2c [dwc2_pci]
[ 84.949440] pci_device_remove+0x31/0x81
[ 84.949445] device_release_driver_internal+0xea/0x18e
[ 84.949448] driver_detach+0x68/0x72
[ 84.949450] bus_remove_driver+0x63/0x82
[ 84.949453] pci_unregister_driver+0x1a/0x75
[ 84.949457] __do_sys_delete_module+0x149/0x1e9
[ 84.949462] ? task_work_run+0x64/0x6e
[ 84.949465] ? exit_to_user_mode_prepare+0xd4/0x10d
[ 84.949471] do_syscall_64+0x5d/0x70
[ 84.949475] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 84.949480] RIP: 0033:0x7efdbd563bcb
[ 84.949482] Code: 73 01 c3 48 8b 0d c5 82 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 82 0c 00 f7 d8 64 89 01 48
[ 84.949485] RSP: 002b:00007ffe944d7d98 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 84.949489] RAX: ffffffffffffffda RBX: 00005651072eb700 RCX: 00007efdbd563bcb
[ 84.949491] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005651072eb768
[ 84.949493] RBP: 00007ffe944d7df8 R08: 0000000000000000 R09: 0000000000000000
[ 84.949495] R10: 00007efdbd5dfac0 R11: 0000000000000206 R12: 00007ffe944d7fd0
[ 84.949497] R13: 00007ffe944d8610 R14: 00005651072eb2a0 R15: 00005651072eb700
[ 84.949500] Modules linked in: uas configfs dwc2_pci(-) phy_generic fuse crc32c_intel [last unloaded: udc_core]
[ 84.949508] CR2: 00000007935688d8
[ 84.949510] ---[ end trace e40c871ca3e4dc9e ]---
[ 84.949512] RIP: 0010:usb_phy_uevent+0x99/0x121

Fixes: a8534cb ("usb: phy: introduce usb_phy device type with its own uevent handler")
Reviewed-by: Peter Chen <[email protected]>
Signed-off-by: Artur Petrosyan <[email protected]>
Signed-off-by: Thinh Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
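The fix described above is a guard: the uevent handler must tolerate the usb_phy having already been removed from "phy_list". Below is a minimal userspace sketch of that guard. It assumes a hypothetical `struct phy_entry` list in place of the kernel's usb_phy list and lookup helper. Returning -ENODEV on a failed lookup is a choice made for this sketch; the commit text does not say which error the real handler returns.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct usb_phy linked on "phy_list". */
struct phy_entry {
	struct phy_entry *next;
	const void *dev; /* device this usb_phy belongs to */
	int type;
};

static struct phy_entry *phy_list;

/* Look up the usb_phy for a device; NULL once it has been removed. */
static struct phy_entry *lookup_phy(const void *dev)
{
	for (struct phy_entry *p = phy_list; p != NULL; p = p->next)
		if (p->dev == dev)
			return p;
	return NULL;
}

/* Models usb_remove_phy(): unlink the entry from the list. */
static void remove_phy(struct phy_entry *victim)
{
	for (struct phy_entry **pp = &phy_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			return;
		}
	}
}

/* Uevent handler with the guard: bail out if the lookup fails instead of
 * dereferencing the result. Stores the PHY type in *type_out on success. */
static int phy_uevent(const void *dev, int *type_out)
{
	struct phy_entry *phy = lookup_phy(dev);

	if (phy == NULL)
		return -ENODEV;
	*type_out = phy->type;
	return 0;
}
```

This reproduces the ordering in the oops: remove_phy() runs first (device removal), and the uevent fires afterwards during driver release. Without the NULL check, `phy->type` would dereference an invalid pointer.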

This commit adds a Python script that parses CoreSight tracing events and prints out source lines and disassembly. It generates a readable program execution flow that is easier for humans to inspect.

The script receives CoreSight tracing packets in the format below:

                +------------+------------+------------+
  packet(n):    |    addr    |    ip      |    cpu     |
                +------------+------------+------------+
  packet(n+1):  |    addr    |    ip      |    cpu     |
                +------------+------------+------------+

packet::addr is the start address of the coming branch sample, and packet::ip is the last address of the branch sample. Therefore, a code section between branches starts at packet(n)::addr and stops at packet(n+1)::ip. As a result, the script combines two consecutive packets to generate the address range for instructions:

  [ sample(n)::addr .. sample(n+1)::ip ]

The script supports either objdump or llvm-objdump for disassembly, selected with the '-d' option. If '-d' is not specified, the script simply outputs source lines and symbols.

Below are usage examples that output disassembly with llvm-objdump and with objdump:

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d llvm-objdump-11 -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
      ffff800008eb3198 <etm4_enable_hw>:
      ffff800008eb3310:   c0 38 00 35   cbnz  w0, 0xffff800008eb3a28 <etm4_enable_hw+0x890>
      ffff800008eb3314:   9f 3f 03 d5   dsb   sy
      ffff800008eb3318:   df 3f 03 d5   isb
      ffff800008eb331c:   f5 5b 42 a9   ldp   x21, x22, [sp, #32]
      ffff800008eb3320:   fb 73 45 a9   ldp   x27, x28, [sp, #80]
      ffff800008eb3324:   e0 82 40 39   ldrb  w0, [x23, #32]
      ffff800008eb3328:   60 00 00 34   cbz   w0, 0xffff800008eb3334 <etm4_enable_hw+0x19c>
      ffff800008eb332c:   e0 03 19 aa   mov   x0, x25
      ffff800008eb3330:   8c fe ff 97   bl    0xffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
          main  6728/6728  [0004]  0.000000000  etm4_enable_hw+0x198  [kernel.kallsyms]
      ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
      ffff800008eb2d60:   1f 20 03 d5   nop
      ffff800008eb2d64:   1f 20 03 d5   nop
      ffff800008eb2d68:   3f 23 03 d5   hint  #25
      ffff800008eb2d6c:   00 00 40 f9   ldr   x0, [x0]
      ffff800008eb2d70:   9f 3f 03 d5   dsb   sy
      ffff800008eb2d74:   00 c0 3e 91   add   x0, x0, #4016
      ffff800008eb2d78:   1f 00 00 b9   str   wzr, [x0]
      ffff800008eb2d7c:   bf 23 03 d5   hint  #29
      ffff800008eb2d80:   c0 03 5f d6   ret
          main  6728/6728  [0004]  0.000000000  etm4_cs_lock.isra.0.part.0+0x20

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
      ffff800008eb3310 <etm4_enable_hw+0x178>:
      ffff800008eb3310:   350038c0   cbnz  w0, ffff800008eb3a28 <etm4_enable_hw+0x890>
      ffff800008eb3314:   d5033f9f   dsb   sy
      ffff800008eb3318:   d5033fdf   isb
      ffff800008eb331c:   a9425bf5   ldp   x21, x22, [sp, #32]
      ffff800008eb3320:   a94573fb   ldp   x27, x28, [sp, #80]
      ffff800008eb3324:   394082e0   ldrb  w0, [x23, #32]
      ffff800008eb3328:   34000060   cbz   w0, ffff800008eb3334 <etm4_enable_hw+0x19c>
      ffff800008eb332c:   aa1903e0   mov   x0, x25
      ffff800008eb3330:   97fffe8c   bl    ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
          main  6728/6728  [0004]  0.000000000  etm4_enable_hw+0x198  [kernel.kallsyms]
      ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
      ffff800008eb2d60:   d503201f   nop
      ffff800008eb2d64:   d503201f   nop
      ffff800008eb2d68:   d503233f   paciasp
      ffff800008eb2d6c:   f9400000   ldr   x0, [x0]
      ffff800008eb2d70:   d5033f9f   dsb   sy
      ffff800008eb2d74:   913ec000   add   x0, x0, #0xfb0
      ffff800008eb2d78:   b900001f   str   wzr, [x0]
      ffff800008eb2d7c:   d50323bf   autiasp
      ffff800008eb2d80:   d65f03c0   ret
          main  6728/6728  [0004]  0.000000000  etm4_cs_lock.isra.0.part.0+0x20

Signed-off-by: Leo Yan <[email protected]>
Co-authored-by: Al Grant <[email protected]>
Co-authored-by: Mathieu Poirier <[email protected]>
Co-authored-by: Tor Jeremiassen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eelco Chaudron <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephen Brennan <[email protected]>
Cc: Tanmay Jagdale <[email protected]>
Cc: [email protected]
Cc: zengshun . wu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
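The packet-pairing rule in the commit message above pairs each packet(n)::addr with packet(n+1)::ip to form one instruction range. It can be sketched in C as follows. This is a minimal illustration and not the perf script itself, which is written in Python. The names `cs_packet`, `cs_range` and `cs_build_ranges()` are hypothetical, and the sketch assumes all packets come from a single CPU, in trace order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet layout mirroring the addr/ip/cpu fields above. */
struct cs_packet {
	uint64_t addr;	/* start address of the coming branch sample */
	uint64_t ip;	/* last address of the branch sample */
	int cpu;
};

/* Address range of instructions executed between two branches. */
struct cs_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Combine each pair of consecutive packets into the range
 * [pkt[n].addr .. pkt[n+1].ip]. Assumes all packets come from the
 * same CPU, in trace order. Returns the number of ranges written.
 */
static size_t cs_build_ranges(const struct cs_packet *pkt, size_t npkt,
			      struct cs_range *out, size_t max_out)
{
	size_t n, count = 0;

	assert(pkt != NULL || npkt == 0);
	for (n = 0; n + 1 < npkt && count < max_out; n++) {
		out[count].start = pkt[n].addr;
		out[count].end = pkt[n + 1].ip;
		count++;
	}
	return count;
}
```

With N packets the sketch yields N - 1 ranges, because the last packet has no successor to close its range.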

[ Upstream commit bcd7026 ]

By repeatedly sending L2CAP_CONF_REQ packets, a remote device can make chan->num_conf_rsp increase many times until it eventually wraps around its maximum value (i.e., 255). This patch prevents this by adding a boundary check with L2CAP_MAX_CONF_RSP.

Btmon log:
Bluetooth monitor ver 5.64
= Note: Linux version 6.1.0-rc2 (x86_64)                                 0.264594
= Note: Bluetooth subsystem version 2.22                                 0.264636
@ MGMT Open: btmon (privileged) version 1.22                    {0x0001} 0.272191
= New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0)            [hci0] 13.877604
@ RAW Open: 9496 (privileged) version 2.22                      {0x0002} 13.890741
= Open Index: 00:00:00:00:00:00                                  [hci0] 13.900426
(...)
> ACL Data RX: Handle 200 flags 0x00 dlen 1033    thesofproject#32 [hci0] 14.273106
        invalid packet size (12 != 1033)
        08 00 01 00 02 01 04 00 01 10 ff ff              ............
> ACL Data RX: Handle 200 flags 0x00 dlen 1547    thesofproject#33 [hci0] 14.273561
        invalid packet size (14 != 1547)
        0a 00 01 00 04 01 06 00 40 00 00 00 00 00        ........@.....
> ACL Data RX: Handle 200 flags 0x00 dlen 2061    thesofproject#34 [hci0] 14.274390
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04  ........@.......
> ACL Data RX: Handle 200 flags 0x00 dlen 2061    thesofproject#35 [hci0] 14.274932
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00  ........@.......
= bluetoothd: Bluetooth daemon 5.43                                     14.401828
> ACL Data RX: Handle 200 flags 0x00 dlen 1033    thesofproject#36 [hci0] 14.275753
        invalid packet size (12 != 1033)
        08 00 01 00 04 01 04 00 40 00 00 00              ........@...

Signed-off-by: Sungwoo Kim <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
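The sketch below is a minimal C illustration of the overflow described above and of how a boundary check prevents it. The names `sketch_chan`, `conf_rsp_unchecked()`, `conf_rsp_checked()` and the limit `SKETCH_MAX_CONF_RSP` are hypothetical stand-ins, not the kernel's code. The limit value of 2 is illustrative only and is not claimed to match L2CAP_MAX_CONF_RSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit; stands in for the kernel's boundary constant. */
#define SKETCH_MAX_CONF_RSP 2

/* Hypothetical channel state: only the counter relevant here. */
struct sketch_chan {
	uint8_t num_conf_rsp;	/* u8: wraps from 255 to 0 if unchecked */
};

/* Unchecked increment: models the behaviour before the fix. */
static void conf_rsp_unchecked(struct sketch_chan *chan)
{
	chan->num_conf_rsp++;
}

/*
 * Checked increment: refuses to increment once the limit is reached,
 * so the counter can never wrap around. Returns false when the caller
 * should reject the request instead.
 */
static bool conf_rsp_checked(struct sketch_chan *chan)
{
	if (chan->num_conf_rsp >= SKETCH_MAX_CONF_RSP)
		return false;
	chan->num_conf_rsp++;
	return true;
}
```

However many requests arrive, the checked variant keeps the counter pinned at the limit. The unchecked variant silently returns to 0 after 255.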

The conclusion "j1939_session_deactivate() should be called with a session ref-count of at least 2" is incorrect. In some concurrent scenarios, j1939_session_deactivate() can be called with a session ref-count of less than 2. This is not a problem, however, because j1939_session_deactivate_locked() checks the session's active state before putting the session.

Here is the concurrent scenario of the problem reported by syzbot, followed by my reproduction log.

        cpu0                                    cpu1
                                        j1939_xtp_rx_eoma
j1939_xtp_rx_abort_one
                                        j1939_session_get_by_addr [kref == 2]
j1939_session_get_by_addr [kref == 3]
j1939_session_deactivate [kref == 2]
j1939_session_put [kref == 1]
                                        j1939_session_completed
                                        j1939_session_deactivate
                                        WARN_ON_ONCE(kref < 2)

=====================================================
WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70
CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ thesofproject#32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:j1939_session_deactivate+0x5f/0x70
Call Trace:
 j1939_session_deactivate_activate_next+0x11/0x28
 j1939_xtp_rx_eoma+0x12a/0x180
 j1939_tp_recv+0x4a2/0x510
 j1939_can_recv+0x226/0x380
 can_rcv_filter+0xf8/0x220
 can_receive+0x102/0x220
 ? process_backlog+0xf0/0x2c0
 can_rcv+0x53/0xf0
 __netif_receive_skb_one_core+0x67/0x90
 ? process_backlog+0x97/0x2c0
 __netif_receive_skb+0x22/0x80

Fixes: 0c71437 ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object")
Reported-by: [email protected]
Signed-off-by: Ziyang Xuan <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
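The argument above says a deactivate call with a ref-count below 2 is harmless, because the active state is checked before any reference is dropped. A simplified, single-threaded C model can illustrate it. Everything in the sketch is hypothetical: `sketch_session`, `sketch_get()`, `sketch_put()` and `sketch_deactivate()`, plus the assumption that the active list holds exactly one reference. It also omits the locking the real j1939 code relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified session: no locking, single-threaded. */
struct sketch_session {
	int kref;	/* reference count */
	bool active;	/* true while on the active-session list */
};

static void sketch_get(struct sketch_session *s)
{
	s->kref++;
}

static void sketch_put(struct sketch_session *s)
{
	assert(s->kref > 0);
	s->kref--;
}

/*
 * Drop the active list's reference only if the session is still
 * active. A second call is a no-op, so it is safe even when the
 * caller's view of kref is below 2.
 */
static bool sketch_deactivate(struct sketch_session *s)
{
	if (!s->active)
		return false;
	s->active = false;
	sketch_put(s);
	return true;
}
```

Replaying the cpu0/cpu1 interleaving from the diagram against this model shows that the second deactivation finds the session inactive. It drops no reference, so the count never goes negative.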

IRZTAR

si

ranj063 force-pushed the topic/suspend branch from 1baaa2f to a1a2397 July 13, 2018 07:45

ranj063 force-pushed the topic/suspend branch from a1a2397 to 78e6b11 July 17, 2018 06:25

lgirdwood requested changes Jul 17, 2018

ranj063 force-pushed the topic/suspend branch from 78e6b11 to 59d7272 July 18, 2018 08:29

lgirdwood requested changes Jul 18, 2018

plbossart reviewed Jul 18, 2018

ranj063 force-pushed the topic/suspend branch from 59d7272 to 0ce69cd July 20, 2018 08:11

lgirdwood reviewed Jul 20, 2018

ranj063 force-pushed the topic/suspend branch from 0ce69cd to b7b3384 July 21, 2018 05:28

ranj063 force-pushed the topic/suspend branch from b7b3384 to 8723f3a July 21, 2018 05:43

ranj063 added 3 commits July 21, 2018 23:46

ASoC: SOF: move ipc for initializing trace into a separate function

9bc5251

Move the code that sends the IPC for initializing trace into a separate function that can be called during suspend/resume.

Signed-off-by: Ranjani Sridharan <[email protected]>

ASoC: SOF: make sof_complete_pipeline non static

47d23a6

This function will be called during resume to send the IPC for pipeline completion.

Signed-off-by: Ranjani Sridharan <[email protected]>
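The two commits above expose helpers so that the DSP state can be rebuilt on resume, after the DSP was powered off at suspend. Below is a rough C model of such a resume sequence. It assumes the order trace init, then widget creation, then route connection, then pipeline completion; that ordering is an assumption, not taken from the patches. The enum values, `sketch_ipc_log`, `sketch_send()` and `sketch_resume()` are hypothetical, and they only record which IPCs would be sent rather than talking to real firmware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IPC kinds sent while restoring DSP state on resume. */
enum sketch_ipc {
	SKETCH_IPC_TRACE_INIT,
	SKETCH_IPC_WIDGET_NEW,
	SKETCH_IPC_ROUTE_CONNECT,
	SKETCH_IPC_PIPE_COMPLETE,
};

/* Hypothetical recorder standing in for the real IPC transport. */
struct sketch_ipc_log {
	enum sketch_ipc msg[32];
	size_t count;
};

static void sketch_send(struct sketch_ipc_log *log, enum sketch_ipc m)
{
	assert(log->count < sizeof(log->msg) / sizeof(log->msg[0]));
	log->msg[log->count++] = m;
}

/*
 * Replay the topology after the DSP was powered off: re-init trace,
 * recreate every widget, reconnect routes, then complete each pipeline.
 */
static void sketch_resume(struct sketch_ipc_log *log, size_t nwidgets,
			  size_t nroutes, size_t npipes)
{
	size_t i;

	sketch_send(log, SKETCH_IPC_TRACE_INIT);
	for (i = 0; i < nwidgets; i++)
		sketch_send(log, SKETCH_IPC_WIDGET_NEW);
	for (i = 0; i < nroutes; i++)
		sketch_send(log, SKETCH_IPC_ROUTE_CONNECT);
	for (i = 0; i < npipes; i++)
		sketch_send(log, SKETCH_IPC_PIPE_COMPLETE);
}
```

Keeping pipeline completion last reflects the idea that a pipeline can only be completed once its components and connections exist again.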

ranj063 force-pushed the topic/suspend branch from 2eae944 to d284a2d July 22, 2018 06:48

ranj063 added 4 commits July 22, 2018 21:44

ASoC: SOF: invoke runtime_put after booting firmware

2824587

Without this change, the runtime PM usage count never reaches 0, so the device never suspends even when it is idle.

Signed-off-by: Ranjani Sridharan <[email protected]>
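The reasoning in this commit message is that a device can only runtime-suspend once its usage count drops to 0. A toy counter can model this. The sketch is not the kernel's runtime PM API: `sketch_dev`, `sketch_runtime_get()` and `sketch_runtime_put()` are hypothetical. For simplicity the model also suspends immediately at 0 instead of honouring any autosuspend delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a runtime-PM usage counter (not the kernel API). */
struct sketch_dev {
	int usage_count;
	bool suspended;
};

/* Take a reference: the device must be (or become) active. */
static void sketch_runtime_get(struct sketch_dev *dev)
{
	dev->usage_count++;
	dev->suspended = false;
}

/* Drop a reference: the device may suspend only when the count hits 0. */
static void sketch_runtime_put(struct sketch_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->suspended = true;	/* idle: allowed to suspend */
}
```

If a reference taken while booting the firmware is never matched by a put, the count stays at 1 or more and the model never reaches the suspended state. That is the symptom the commit describes.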

ASoC: SOF: set kcontrol cmd for pga widget

a1e21c6

Set the kcontrol cmd, which will be used to send the correct IPC command to restore the volume control value during resume.

Signed-off-by: Ranjani Sridharan <[email protected]>
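The sketch below illustrates why each control needs its IPC command recorded: on resume, cached values can only be replayed for controls whose command is known. The types and functions (`sketch_control`, `sketch_ctrl_cmd`, `sketch_restore_controls()`, `sketch_record_send()`) are hypothetical. The send callback stands in for whatever IPC transport the driver actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPC commands a control may need on resume. */
enum sketch_ctrl_cmd {
	SKETCH_CTRL_CMD_NONE = 0,	/* cmd never set: cannot restore */
	SKETCH_CTRL_CMD_VOLUME,
	SKETCH_CTRL_CMD_SWITCH,
};

/* Hypothetical cached control: last value written by user space. */
struct sketch_control {
	enum sketch_ctrl_cmd cmd;
	uint32_t value;
};

/* Hypothetical transport callback: returns 0 on success. */
typedef int (*sketch_send_fn)(enum sketch_ctrl_cmd cmd, uint32_t value,
			      void *ctx);

/*
 * Replay cached values to the DSP after resume. Controls whose cmd was
 * never set are skipped, because the right IPC command is unknown.
 * Returns the number of controls restored, or -1 on a send failure.
 */
static int sketch_restore_controls(const struct sketch_control *ctl,
				   size_t n, sketch_send_fn send, void *ctx)
{
	int restored = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ctl[i].cmd == SKETCH_CTRL_CMD_NONE)
			continue;
		if (send(ctl[i].cmd, ctl[i].value, ctx) != 0)
			return -1;
		restored++;
	}
	return restored;
}

/* Example transport: records what would have been sent. */
struct sketch_record {
	uint32_t last_value;
	int calls;
};

static int sketch_record_send(enum sketch_ctrl_cmd cmd, uint32_t value,
			      void *ctx)
{
	struct sketch_record *rec = ctx;

	(void)cmd;
	rec->last_value = value;
	rec->calls++;
	return 0;
}
```

In this model, a PGA control left at `SKETCH_CTRL_CMD_NONE` would be skipped on resume. Its volume would then silently revert to whatever the firmware defaults to.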

ASoC: SOF: move PM callbacks to the SOF platform device

aa8cb23

Set the PM callbacks for the SOF platform device instead of the ACPI/PCI/SPI device. pm_runtime_enable() is called in the pcm_probe() method, so runtime PM is enabled for the platform device.

Signed-off-by: Ranjani Sridharan <[email protected]>

ranj063 force-pushed the topic/suspend branch from d284a2d to aa8cb23 July 23, 2018 04:49

ranj063 closed this Jul 25, 2018

ranj063 deleted the topic/suspend branch March 22, 2019 17:08

ClarexZhou mentioned this pull request Apr 2, 2020

[SDW][BUG]Output source in sound setting is wrong(shows:multichannel output ). #1918

Closed

aiChaoSONG pushed a commit to aiChaoSONG/linux that referenced this pull request May 6, 2021

Merge pull request thesofproject#32 from Rust-for-Linux/rust-cfg-file

c28ec6f

Use @file to pass the --cfg flags to rustc (fixes thesofproject#22)

IRZTAR approved these changes Feb 23, 2023

		@@ -1365,8 +1400,11 @@ static int sof_widget_unload(struct snd_soc_component *scomp,

		/* free volume table */
		kfree(scontrol->volume_table);

ranj063 commented Jul 13, 2018 (edited)

ranj063 commented Jul 13, 2018

RanderWang commented Jul 13, 2018 (edited)

RanderWang commented Jul 13, 2018

RanderWang commented Jul 13, 2018

plbossart commented Jul 13, 2018 (edited)

lgirdwood commented Jul 16, 2018

ranj063 commented Jul 16, 2018

ranj063 commented Jul 17, 2018

lgirdwood commented Jul 17, 2018

ranj063 commented Jul 18, 2018

lgirdwood left a comment

plbossart left a comment

ranj063 commented Jul 20, 2018

lgirdwood left a comment

ranj063 commented Jul 20, 2018

plbossart commented Jul 20, 2018

ranj063 commented Jul 20, 2018 (edited)

plbossart commented Jul 20, 2018

ranj063 commented Jul 20, 2018 (edited)

lgirdwood commented Jul 20, 2018

ranj063 commented Jul 21, 2018

ranj063 commented Jul 21, 2018

plbossart commented Jul 24, 2018

ranj063 commented Jul 24, 2018

ranj063 commented Jul 25, 2018

IRZTAR left a comment