Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

virtio_video patches #2

Closed
wants to merge 84 commits into from

Conversation

dwlsalmeida
Copy link

@dwlsalmeida dwlsalmeida commented Nov 12, 2022

Keeping the patches I've applied locally here. Let me know what I have to do to get these through Gerrit as well.

hikalium and others added 30 commits October 3, 2022 15:34
This adds a Virtio based video driver for video streaming device that
operates input and output data buffers to share video devices with
several guests. The current implementation consist of V4L2 based video
driver supporting video functions of decoder and encoder. The device
uses command structures to advertise and negotiate stream formats and
controls. This allows the driver to modify the processing logic of the
device on a per stream basis.

Signed-off-by: Dmitry Sepp <[email protected]>
Signed-off-by: Kiran Pawar <[email protected]>
Signed-off-by: Nikolay Martyanov <[email protected]>
Signed-off-by: Samiullah Khawaja <[email protected]>

(am from https://patchwork.linuxtv.org/patch/61717/)

BUG=b:120456557
TEST=compile with VIRTIO_VIDEO enabled

Fixes:
* Fixed a typo in the commit message: "video_video" -> "virtio_video"
* Fixed SPDX-License-Identifier in /include/uapi/linux/virtio_video.h
* Removed vidioc_enum_fmt_vid_{cap, out}_mplane callbacks that were removed by
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7e98b7b542a456582ea3029be857cc99a3b19bd5

Change-Id: I037b20a9faa1b31bb260cc2a0d57932e1979395e
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2066459
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge, resolved conflicts on:
            include/uapi/linux/virtio_ids.h]
Signed-off-by: Hikaru Nishida <[email protected]>
Replace pix_mp->width with pix_mp->height.

BUG=b:151703605
TEST=compile

Change-Id: I75baead1b9679b54990fd1e13267f634078c2df1
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2107041
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
Add missing NULL check in init_ctrls callbacks for decoder and encoder.

BUG=b:151703605
TEST=compile

Change-Id: I2789ef2a4eae1c140f7d28a0228dd1c29ad36eee
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2107042
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
…ontaining frame sizes in stream

Some coded format streams like VP8 and H.264 contain metadata.
For such formats, try_fmt is called with width=0 and height=0.
This CL makes try_fmt callback support this case.

BUG=b:151703605
TEST=compile

Change-Id: I7a950f52c0ffdba0aba9febdc10f11e640d347ed
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2107043
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
Support V4L2_SEL_TGT_COMPOSE as a selection target in the g_selection
callback.

BUG=b:151703605
TEST=compile

Change-Id: Ibc156fc2c8eeda4ddb1c45d8bd65b03a0d160df8
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2107044
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
…r with EOS flag

The host can return an empty CAPTURE buffer to notify EOS or an error.
In such case, the driver must be mark the buffer as "done" with byteused=0.

BUG=b:151703605
TEST=compile

Change-Id: I78700fb33a8797b42a7c522f368c41b693fb81f9
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2107045
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
…deo driver

This patch makes the virtio-video driver use virtio objects as DMA buffers.
So, users will beable to import resources exported by other virtio devices
such as virtio-gpu.

Currently, we assumes that only one virtio object for each v4l2_buffer
format even if it's for a multiplanar format.

Signed-off-by: Keiichi Watanabe <[email protected]>
(am from https://patchwork.kernel.org/patch/11457439/)

BUG=b:120456557
TEST=run simple decoding test

Change-Id: I6cd47add6e760fa3b3b8bf08e3a3d93cc97243a8
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2060511
Reviewed-by: Keiichi Watanabe <[email protected]>
Reviewed-by: Sean Paul <[email protected]>
Tested-by: Keiichi Watanabe <[email protected]>
Commit-Queue: Keiichi Watanabe <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]

Change-Id: Ie92e7cedca5da5553162cfd7fc27947fb7ae8fb5
…iver

0-day reports:

drivers/media/virtio/virtio_video_driver.c:103:5-11: ERROR:
	allocation function on line 102 returns NULL not ERR_PTR on failure

The various basic memory allocation functions don't return ERR_PTR.

BUG=b:120456557, b:151703605
TEST=compile with VIRTIO_VIDEO enabled

Change-Id: Ia556120dd001a4f395bcfb66a29f5f3c81ee8dd0
Fixes: bbca485 ("BACKPORT: FROMLIST: virtio_video: Add the Virtio Video V4L2 driver")
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2138634
Reviewed-by: Alexandre Courbot <[email protected]>
Reviewed-by: Keiichi Watanabe <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
This invalidates a uuid associated with a vb2_buffer object when the
uuid is used for another vb2_buffer.

This patch will fix a problem that happens in scenarios like this:

(1) QBUF(index=0, DMABUF uuid=1)
(2) DQBUF(index=0)
(3) QBUF(index=1, DMABUF uuid=1)
(4) DQBUF(index=1)
(5) QBUF(index=0, DMABUF uuid=1)

Note that index is associated with a vb2_buffer and a resource_id here.

In this scenario, resource_create needs to be sent to register of a new
combination of resource_id and a uuid for (1), (3) and (5).

However, in the previous implementation, resource_create wasn't sent at (5)
because the combination (index=0, uuid=1) had been registered for (1) and
not been invalidated.

With this patch, a uuid in a vbb2_buffer object will be invalidated when
the uuid is tied with another vb2_buffer.
In this scenario, the combination registered at (1) will be invalidated
at (3).

Change-Id: Ie2eb7f50980cc2258540f692331a16aa11b5c334

---

This patch fixes a bug introduced by CL:2060511, which was submitted to
LKML but not approved. So, this change should be included in the next
revision of CL:2060511.

BUG=b:151703605
TEST=compile

Change-Id: I441bb15d4c5da834a189f8709a2b95246cea36fb
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2120830
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
When the virtio-video driver sends RESOURCE_DESTROY_ALL command, it must
wait for the host returning a response for the command.

BUG=b:151703605
TEST=compile

Change-Id: I7eeea5d3ad6beab172f9590611fe558b2e5035fb
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2060510
[5.4-arcvm: picked from 5.4 for initial merge]
…iver

0-day reports:

drivers/media/virtio/virtio_video_dec.c:156:5: warning:
	no previous prototype for 'virtio_video_dec_init_ctrls'
drivers/media/virtio/virtio_video_dec.c:177:5: warning:
	no previous prototype for 'virtio_video_dec_init_queues'
drivers/media/virtio/virtio_video_dec.c:335:5: warning:
	no previous prototype for 'virtio_video_dec_enum_fmt_vid_out'
drivers/media/virtio/virtio_video_dec.c:417:5: warning:
	no previous prototype for 'virtio_video_dec_init'
drivers/media/virtio/virtio_video_dec.c: In function 'virtio_video_dec_init':
drivers/media/virtio/virtio_video_dec.c:419:10: warning:
	variable 'num' set but not used

drivers/media/virtio/virtio_video_enc.c:162:5: warning:
	no previous prototype for 'virtio_video_enc_init_ctrls'
drivers/media/virtio/virtio_video_enc.c:223:5: warning:
	no previous prototype for 'virtio_video_enc_init_queues'
drivers/media/virtio/virtio_video_enc.c:559:5: warning:
	no previous prototype for 'virtio_video_enc_init'
drivers/media/virtio/virtio_video_enc.c: In function 'virtio_video_enc_init':
drivers/media/virtio/virtio_video_enc.c:561:10: warning:
	variable 'num' set but not used

BUG=b:120456557, b:151703605
TEST=compile with VIRTIO_VIDEO enabled

Change-Id: Ic0e9f9b5efc54b627cd9024e90b3559a731ab1cc
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2214405
Reviewed-by: Keiichi Watanabe <[email protected]>
Tested-by: Keiichi Watanabe <[email protected]>
Commit-Queue: Keiichi Watanabe <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
…iver

0-day reports:

drivers/media/virtio/virtio_video_device.c:1188:6: error:
	no previous prototype for 'virtio_video_device_destroy'
drivers/media/virtio/virtio_video_helpers.c:188:10: error:
	no previous prototype for 'virtio_video_get_format_from_virtio_profile'

Somewhat guessing here, virtio_video_device_destroy() is only called
locally and thus marked static.
virtio_video_get_format_from_virtio_profile() is not called at all and
presumably intended to be used as helper function. Removed this function,
as the conversion from a profile to a format doesn't make much sense.
If we need a V4L2 format, we should virtio_video_format_to_v4l2 instead.

BUG=b:120456557, b:151703605
TEST=compile with VIRTIO_VIDEO enabled

Change-Id: I02d577f8e9b1a65cbbe9877391f8f755a474fc78
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Keiichi Watanabe <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2226628
[5.4-arcvm: picked from 5.4 for initial merge]
BUG=b:151703605
TEST=tast run ARCVM-DUT arc.VideoDecodeAccel*vm

Change-Id: I8c699abfef01c5063494546031496afa806274ea
Signed-off-by: David Stevens <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2253500
Reviewed-by: Tomasz Figa <[email protected]>
Reviewed-by: Keiichi Watanabe <[email protected]>
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
After calling V4L2_DEC_CMD_START, the stream state should be updated
from STOPPED to RUNNING.

BUG=b:168557465
BUG=b:151703605
TEST=pass android.media.cts.AdaptivePlaybackTest#testVP8_eosFlushSeek

Change-Id: I849a5d92c692bf675c54d443c7157da86877db50
Signed-off-by: Chih-Yu Huang <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2410146
Reviewed-by: Keiichi Watanabe <[email protected]>
Reviewed-by: Alexandre Courbot <[email protected]>
Commit-Queue: Chih-Yu Huang <[email protected]>
Tested-by: Chih-Yu Huang <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
0-day reports:

virtio_video_device.c:(.text+0xc00): undefined reference to `virtio_dma_buf_get_uuid'

VIRTIO_VIDEO now unconditionally requires VIRTIO_DMA_SHARED_BUFFER.
Since VIRTIO_DMA_SHARED_BUFFER depends on VIRTIO_MENU, which is
independent of VIRTIO, VIRTIO_VIDEO now unconditionally depends on
VIRTIO_MENU as well.

This still leaves

WARNING: unmet direct dependencies detected for VIRTIO_DMA_SHARED_BUFFER
  Depends on [n]: VIRTIO_MENU [=n] && DMA_SHARED_BUFFER [=y]
  Selected by [m]:
  - DRM_VIRTIO_GPU [=m] && HAS_IOMEM [=y] && DRM [=m] && VIRTIO [=y] && MMU [=y] && PCI [=y]

which is seen if VIRTIO_MENU is disabled. This problem was introduced by
commit 7c5e019 ("BACKPORT: FROMGIT: virtio: fix build for configs
without dma-bufs"). Since that commit, DRM_VIRTIO_GPU depends on
VIRTIO_MENU as well. Add that dependency.

BUG=b:142423916
TEST=Test builds

Change-Id: Ifa87ff51b39bf39feffc28e65c3c346a324415c9
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2410825
Reviewed-by: David Stevens <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
BUG=b:151703605
TEST=manual - Run ARCVM

Signed-off-by: Lepton Wu <[email protected]>
Change-Id: Ife9d5a8c35e6496a523e15ae95c585cce4949c84
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2477075
Reviewed-by: Keiichi Watanabe <[email protected]>
Reviewed-by: Alexandre Courbot <[email protected]>
Tested-by: Keiichi Watanabe <[email protected]>
Commit-Queue: Chih-Yu Huang <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
When possible, avoid calling dma_buf_get. If dma_buf_get is called, make
sure dma_buf_put is also called.

BUG=b:151703605, b:170702290, b:167992701
TEST=No leaks in /sys/kernel/debug/dma_buf/bufinfo after decoding

Change-Id: I1b39f8dd2599c625cb6396bd6b5914147976f564
Signed-off-by: David Stevens <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2483785
Reviewed-by: Keiichi Watanabe <[email protected]>
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
BUG=b:151703605
TEST=manual - Run ARCVM

Signed-off-by: Lepton Wu <[email protected]>
Change-Id: I6621bd04d76efc920348eeda0393e1efd0b79c7f
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2491443
Commit-Queue: Keiichi Watanabe <[email protected]>
Commit-Queue: Alexandre Courbot <[email protected]>
Reviewed-by: Keiichi Watanabe <[email protected]>
Reviewed-by: Alexandre Courbot <[email protected]>
[5.4-arcvm: picked from 5.4 for initial merge]
This CL introduces the VIRTIO_VIDEO_CONTROL_FORCE_KEYFRAME control to
the virtio encoder, to support the V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME
control.

Signed-off-by: David Staessens <[email protected]>

BUG=b:161498590,b:174444769,b:175270403
TEST=tast run DUT arc.VideoEncodeAccel.h264_192p_i420_vm

[5.4-arcvm: picked from 5.4]

Change-Id: I4b7057dcf4595c91b26bb2c70b4b3a584f9baa13
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2594594
Reviewed-by: Alexandre Courbot <[email protected]>
Tested-by: David Staessens <[email protected]>
Commit-Queue: David Staessens <[email protected]>
When stopping a stream, set its state to drain before sending the drain
command to prevent a race where the drain command callback is processed
before the state gets set to drain.

BUG=b:151703605
TEST=No flakes in AdaptivePlaybackTest#testVP9_adaptiveEosFlushSeek

[5.4-arcvm: picked from 5.4]
Change-Id: I680b60d59ac0a7619e7acb2066419d1a2128efd4
Signed-off-by: David Stevens <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2592625
Reviewed-by: Hikaru Nishida <[email protected]>
Treat VIRTIO_VIDEO_EVENT_ERROR as an unrecoverable errors for the
respective stream.

The spec isn't explicit as to whether or not the error event is fatal.
However, since the error event is generic, the cause is opaque to the
driver, so the driver can't meaningfully do anything to recover in the
case of a non-fatal error. Therefore it doesn't make sense for the
device to send non-fatal error events in the first place.

BUG=b:151703605, b:174445948
TEST=android.security.cts.StagefrightTest#testStagefright_bug_33818508

[5.4-arcvm: picked from 5.4]
Change-Id: I4d9eee47ff3e42058768c0e6e70c69518e8abe20
Signed-off-by: David Stevens <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2594587
Reviewed-by: Hikaru Nishida <[email protected]>
…O_BITRATE.

Upon initialization the encode device requests the bitrate from the
underlying encoder and uses that as both the default and maximum value.
This prevents us from configuring bitrates higher than the initial
value, so this CL changes the maximum value to 1GBs instead.

BUG=b:162799179,b:177213709
TEST=tast run DUT arc.VideoEncodeAccel.h264_192p_i420_vm

[5.4-arcvm: picked from 5.4]
(cherry picked from commit 675b572)

Signed-off-by: David Staessens <[email protected]>
Change-Id: Ic32f5de210f93d2d2789be24602aee2c0a736a16
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2627240
Reviewed-by: Hikaru Nishida <[email protected]>
Tested-by: David Staessens <[email protected]>
Commit-Queue: David Staessens <[email protected]>
Remove streams from the id map and wait for any ongoing events to
complete before cleanup begins.

BUG=b:177697115, b:151703605
TEST=android.security.cts.StagefrightTest

Change-Id: I141b3844ffb507bfda0ea0d94c9eacb8173781a9
Signed-off-by: David Stevens <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2684016
Reviewed-by: Keiichi Watanabe <[email protected]>
Reviewed-by: Alexandre Courbot <[email protected]>
BUG=b:177697115, b:151703605
TEST=android.security.cts.StagefrightTest

Change-Id: Ib83251555bc0af40cb015fd08c6a9335c3e3f4a5
Signed-off-by: David Stevens <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2683951
Reviewed-by: Alexandre Courbot <[email protected]>
Reviewed-by: Keiichi Watanabe <[email protected]>
If a buffer's data offset changes, create a new object for the buffer.
In particular, this can happen when data offset is used to skip headers
at the start of input buffers.

BUG=b:174531173,b:151703605
TEST=android.media.cts.MediaDrmClearkeyTest#testClearKeyPlaybackMpeg2ts

Change-Id: I538a0bd1667a9e4b17d77f8e4707dda9700c824b
Signed-off-by: David Stevens <[email protected]>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2696229
Reviewed-by: Chih-Yu Huang <[email protected]>
Reviewed-by: Keiichi Watanabe <[email protected]>
This CL removes the unused virtio_video_mark_drain_complete function
from the virtio video device. All steps required to finish draining are
already done in the virtio_video_buf_done function.

Signed-off-by: David Staessens <[email protected]>

BUG=b:151703605
TEST=emerge-$BOARD sys-kernel/arcvm-kernel-ack-5_4

Change-Id: Ibcb59624b4d5e4d016a7a3627db755c92a5a2d67
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2879433
Reviewed-by: Keiichi Watanabe <[email protected]>
Tested-by: David Staessens <[email protected]>
Commit-Queue: David Staessens <[email protected]>
When draining the V4L2 encode device or handling a drain done callback,
the stream is not properly locked. In some cases this causes the
virtio_video_buf_done (EOS done) and virtio_video_encoder_cmd
(restart encoder) functions to be executed before the original
virtio_video_encoder_cmd call to drain the encoder is even finished
executing. This leads to the stream ending up in the draining state
after it has already finished draining.

Extra mutex lock and unlock operations have been introduced here to
serialize access to the virtio_video_encoder_cmd and
virtio_video_buf_done functions to avoid the above issue.

Signed-off-by: David Staessens <[email protected]>

BUG=b:186588566,b:151703605,b:187470003
TEST=android.mediav2.cts.EncoderProfileLevelTest

Change-Id: I35d36b31549a9883eaff1362d8b47653222f5613
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2859568
Reviewed-by: Keiichi Watanabe <[email protected]>
Reviewed-by: David Stevens <[email protected]>
Reviewed-by: Alexandre Courbot <[email protected]>
Commit-Queue: David Staessens <[email protected]>
Tested-by: David Staessens <[email protected]>
This reverts commit 3f5e7b5cba1430eef2a7d22236de4df678afb804.

Reason for revert: It caused GTS regression: b/188009828 

Original change's description:
> CHROMIUM: virtio_video_enc: Lock stream mutex when draining.
>
> When draining the V4L2 encode device or handling a drain done callback,
> the stream is not properly locked. In some cases this causes the
> virtio_video_buf_done (EOS done) and virtio_video_encoder_cmd
> (restart encoder) functions to be executed before the original
> virtio_video_encoder_cmd call to drain the encoder is even finished
> executing. This leads to the stream ending up in the draining state
> after it has already finished draining.
>
> Extra mutex lock and unlock operations have been introduced here to
> serialize access to the virtio_video_encoder_cmd and
> virtio_video_buf_done functions to avoid the above issue.
>
> Signed-off-by: David Staessens <[email protected]>
>
> BUG=b:186588566,b:151703605,b:187470003
> TEST=android.mediav2.cts.EncoderProfileLevelTest
>
> Change-Id: I35d36b31549a9883eaff1362d8b47653222f5613
> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2859568
> Reviewed-by: Keiichi Watanabe <[email protected]>
> Reviewed-by: David Stevens <[email protected]>
> Reviewed-by: Alexandre Courbot <[email protected]>
> Commit-Queue: David Staessens <[email protected]>
> Tested-by: David Staessens <[email protected]>

Bug: b:186588566
Bug: b:151703605
Bug: b:187470003
Bug: b:188009828
Change-Id: Id1053f1ea2f19ae44ad9d78855687c12e7fa8707
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2896200
Reviewed-by: Chih-Yu Huang <[email protected]>
Tested-by: Chih-Yu Huang <[email protected]>
Owners-Override: Chih-Yu Huang <[email protected]>
Auto-Submit: Chih-Yu Huang <[email protected]>
Bot-Commit: Rubber Stamper <[email protected]>
Commit-Queue: Chih-Yu Huang <[email protected]>
This CL relands crrev.com/c/2859568 which was reverted in
crrev.com/c/2896200 because of b/188009828.

When draining the V4L2 encode device or handling a drain done callback,
the stream was not properly locked leading to b/186588566. In some
cases this caused virtio_video_buf_done() to be executed before the
V4L2_ENC_CMD_STOP command was fully executed.

Adding locks unfortunately seems to have caused another issue. This CL
reduces the scope of the lock so draining is properly synced, while at
the same time avoiding holding the lock during
virtio_video_queue_eos_event() and v4l2_m2m_buf_done() which might
cause issues.

Original change's description:
> CHROMIUM: virtio_video_enc: Lock stream mutex when draining.
>
> When draining the V4L2 encode device or handling a drain done callback,
> the stream is not properly locked. In some cases this causes the
> virtio_video_buf_done (EOS done) and virtio_video_encoder_cmd
> (restart encoder) functions to be executed before the original
> virtio_video_encoder_cmd call to drain the encoder is even finished
> executing. This leads to the stream ending up in the draining state
> after it has already finished draining.
>
> Extra mutex lock and unlock operations have been introduced here to
> serialize access to the virtio_video_encoder_cmd and
> virtio_video_buf_done functions to avoid the above issue.
>
> Signed-off-by: David Staessens <[email protected]>
>
> BUG=b:186588566,b:151703605,b:187470003
> TEST=android.mediav2.cts.EncoderProfileLevelTest
>
> Change-Id: I35d36b31549a9883eaff1362d8b47653222f5613
> Reviewed-on: http://crrev.com/c/2859568
> Reviewed-by: Keiichi Watanabe <[email protected]>
> Reviewed-by: David Stevens <[email protected]>
> Reviewed-by: Alexandre Courbot <[email protected]>
> Commit-Queue: David Staessens <[email protected]>
> Tested-by: David Staessens <[email protected]>

Signed-off-by: David Staessens <[email protected]>

BUG=b:188009828,b:186588566,b:151703605,b:187470003
TEST=android.mediav2.cts.EncoderProfileLevelTest
     com.google.android.exoplayer.gts.DashTest#testH264Adaptive

Change-Id: I4b7948f24e0634b857405bf84ee08cabbdbcf3cd
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2900138
Reviewed-by: Alexandre Courbot <[email protected]>
Commit-Queue: David Staessens <[email protected]>
Tested-by: David Staessens <[email protected]>
Gnurou and others added 15 commits October 12, 2022 14:14
pr_debug will only print if DEBUG is defined, which does not
correspond to what we want to do here.
This is useful so the guest can make decisions based on the capabilities
of a given host backend, i.e. using modifiers on frame buffers when it
knows the host can support them.

BUG=b:169295147
TEST=v4l2-ctl reports host device name as card type.

Change-Id: I84813dd48a0a0b1795616faf2edfbd6066e45d60
The virtio-video device has a "no format currently set" state. Translate
this into the first supported format as V4L2 devices must always have an
active format.

Signed-off-by: Alexandre Courbot <[email protected]>

BUG=b:161774071
This is needed for the VVU vsock device, since in VVU the communication
is between the device and sibling VMs (and not the host)
This is needed for testing virtio-vhost-user devices.
Do not call set_params during REQBUFS(0). If the count is zero, the
queue is freed and we should not do any further processing. Calling
set_params in this case is correctly identified by CrosVM as a mistake
and the following error is emitted:

[ERROR devices::virtio::video::decoder] parameter for input cannot be
changed once decoding started

Signed-off-by: Daniel Almeida <[email protected]>
The V4L2 m2m stateful specification says the following w.r.t VIDIOC_REQBUFS:

'The actual number of allocated buffers may differ from the count given. The
client must check the updated value of count after the call returns.'

The VIDIOC_REQBUFS documentation explicitly mentions that implementers are
free to allocate more buffers if needed, i.e.:

'A larger number is also possible when the driver requires more buffers to
function correctly.'

Thus, enforce a minimum number of capture buffers as per
V4L2_CID_MIN_BUFFERS_FOR_CAPTURE. This is very important in a zero-copy
workflow where the host is importing dma-bufs to serve as backing memory for
video data.

If the number of capture buffers allocated is not enough, it can stall or
corrupt the actual video decoder in the host, as codecs need a minimum number
of buffers to function. This is related to the DPB size, among other
codec-specific considerations.

Signed-off-by: Daniel Almeida <[email protected]>
@Gnurou
Copy link
Owner

Gnurou commented Nov 16, 2022

Thanks! I have cherry-picked and signed these commits, so closing this PR. 👍

@Gnurou Gnurou closed this Nov 16, 2022
Gnurou pushed a commit that referenced this pull request Nov 30, 2022
Syzbot reported the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0 Not tainted
------------------------------------------------------
syz-executor307/3029 is trying to acquire lock:
ffff0000c02525d8 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x54/0xb4 mm/memory.c:5576

but task is already holding lock:
ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (btrfs-root-00){++++}-{3:3}:
       down_read_nested+0x64/0x84 kernel/locking/rwsem.c:1624
       __btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
       btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
       btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279
       btrfs_search_slot_get_root+0x74/0x338 fs/btrfs/ctree.c:1637
       btrfs_search_slot+0x1b0/0xfd8 fs/btrfs/ctree.c:1944
       btrfs_update_root+0x6c/0x5a0 fs/btrfs/root-tree.c:132
       commit_fs_roots+0x1f0/0x33c fs/btrfs/transaction.c:1459
       btrfs_commit_transaction+0x89c/0x12d8 fs/btrfs/transaction.c:2343
       flush_space+0x66c/0x738 fs/btrfs/space-info.c:786
       btrfs_async_reclaim_metadata_space+0x43c/0x4e0 fs/btrfs/space-info.c:1059
       process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
       worker_thread+0x340/0x610 kernel/workqueue.c:2436
       kthread+0x12c/0x158 kernel/kthread.c:376
       ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

-> #2 (&fs_info->reloc_mutex){+.+.}-{3:3}:
       __mutex_lock_common+0xd4/0xca8 kernel/locking/mutex.c:603
       __mutex_lock kernel/locking/mutex.c:747 [inline]
       mutex_lock_nested+0x38/0x44 kernel/locking/mutex.c:799
       btrfs_record_root_in_trans fs/btrfs/transaction.c:516 [inline]
       start_transaction+0x248/0x944 fs/btrfs/transaction.c:752
       btrfs_start_transaction+0x34/0x44 fs/btrfs/transaction.c:781
       btrfs_create_common+0xf0/0x1b4 fs/btrfs/inode.c:6651
       btrfs_create+0x8c/0xb0 fs/btrfs/inode.c:6697
       lookup_open fs/namei.c:3413 [inline]
       open_last_lookups fs/namei.c:3481 [inline]
       path_openat+0x804/0x11c4 fs/namei.c:3688
       do_filp_open+0xdc/0x1b8 fs/namei.c:3718
       do_sys_openat2+0xb8/0x22c fs/open.c:1313
       do_sys_open fs/open.c:1329 [inline]
       __do_sys_openat fs/open.c:1345 [inline]
       __se_sys_openat fs/open.c:1340 [inline]
       __arm64_sys_openat+0xb0/0xe0 fs/open.c:1340
       __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
       el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
       do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
       el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
       el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
       el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581

-> #1 (sb_internal#2){.+.+}-{0:0}:
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write include/linux/fs.h:1826 [inline]
       sb_start_intwrite include/linux/fs.h:1948 [inline]
       start_transaction+0x360/0x944 fs/btrfs/transaction.c:683
       btrfs_join_transaction+0x30/0x40 fs/btrfs/transaction.c:795
       btrfs_dirty_inode+0x50/0x140 fs/btrfs/inode.c:6103
       btrfs_update_time+0x1c0/0x1e8 fs/btrfs/inode.c:6145
       inode_update_time fs/inode.c:1872 [inline]
       touch_atime+0x1f0/0x4a8 fs/inode.c:1945
       file_accessed include/linux/fs.h:2516 [inline]
       btrfs_file_mmap+0x50/0x88 fs/btrfs/file.c:2407
       call_mmap include/linux/fs.h:2192 [inline]
       mmap_region+0x7fc/0xc14 mm/mmap.c:1752
       do_mmap+0x644/0x97c mm/mmap.c:1540
       vm_mmap_pgoff+0xe8/0x1d0 mm/util.c:552
       ksys_mmap_pgoff+0x1cc/0x278 mm/mmap.c:1586
       __do_sys_mmap arch/arm64/kernel/sys.c:28 [inline]
       __se_sys_mmap arch/arm64/kernel/sys.c:21 [inline]
       __arm64_sys_mmap+0x58/0x6c arch/arm64/kernel/sys.c:21
       __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
       el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
       do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
       el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
       el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
       el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581

-> #0 (&mm->mmap_lock){++++}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3095 [inline]
       check_prevs_add kernel/locking/lockdep.c:3214 [inline]
       validate_chain kernel/locking/lockdep.c:3829 [inline]
       __lock_acquire+0x1530/0x30a4 kernel/locking/lockdep.c:5053
       lock_acquire+0x100/0x1f8 kernel/locking/lockdep.c:5666
       __might_fault+0x7c/0xb4 mm/memory.c:5577
       _copy_to_user include/linux/uaccess.h:134 [inline]
       copy_to_user include/linux/uaccess.h:160 [inline]
       btrfs_ioctl_get_subvol_rootref+0x3a8/0x4bc fs/btrfs/ioctl.c:3203
       btrfs_ioctl+0xa08/0xa64 fs/btrfs/ioctl.c:5556
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __arm64_sys_ioctl+0xd0/0x140 fs/ioctl.c:856
       __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
       el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
       do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
       el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
       el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
       el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581

other info that might help us debug this:

Chain exists of:
  &mm->mmap_lock --> &fs_info->reloc_mutex --> btrfs-root-00

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-root-00);
                               lock(&fs_info->reloc_mutex);
                               lock(btrfs-root-00);
  lock(&mm->mmap_lock);

 *** DEADLOCK ***

1 lock held by syz-executor307/3029:
 #0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
 #0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
 #0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279

stack backtrace:
CPU: 0 PID: 3029 Comm: syz-executor307 Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
Call trace:
 dump_backtrace+0x1c4/0x1f0 arch/arm64/kernel/stacktrace.c:156
 show_stack+0x2c/0x54 arch/arm64/kernel/stacktrace.c:163
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x104/0x16c lib/dump_stack.c:106
 dump_stack+0x1c/0x58 lib/dump_stack.c:113
 print_circular_bug+0x2c4/0x2c8 kernel/locking/lockdep.c:2053
 check_noncircular+0x14c/0x154 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3095 [inline]
 check_prevs_add kernel/locking/lockdep.c:3214 [inline]
 validate_chain kernel/locking/lockdep.c:3829 [inline]
 __lock_acquire+0x1530/0x30a4 kernel/locking/lockdep.c:5053
 lock_acquire+0x100/0x1f8 kernel/locking/lockdep.c:5666
 __might_fault+0x7c/0xb4 mm/memory.c:5577
 _copy_to_user include/linux/uaccess.h:134 [inline]
 copy_to_user include/linux/uaccess.h:160 [inline]
 btrfs_ioctl_get_subvol_rootref+0x3a8/0x4bc fs/btrfs/ioctl.c:3203
 btrfs_ioctl+0xa08/0xa64 fs/btrfs/ioctl.c:5556
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl fs/ioctl.c:856 [inline]
 __arm64_sys_ioctl+0xd0/0x140 fs/ioctl.c:856
 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
 el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
 do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
 el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
 el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581

We do generally the right thing here, copying the references into a
temporary buffer, however we are still holding the path when we do
copy_to_user from the temporary buffer.  Fix this by freeing the path
before we copy to user space.

Reported-by: [email protected]
CC: [email protected] # 4.19+
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Gnurou pushed a commit that referenced this pull request Nov 30, 2022
Recent commit aa626da ("iavf: Detach device during reset task")
removed netif_tx_stop_all_queues() with an assumption that Tx queues
are already stopped by netif_device_detach() in the beginning of
reset task. This assumption is incorrect because during reset
task a potential link event can start Tx queues again.
Revert this change to fix this issue.

Reproducer:
1. Run some Tx traffic (e.g. iperf3) over iavf interface
2. Switch MTU of this interface in a loop

[root@host ~]# cat repro.sh

IF=enp2s0f0v0

iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
sleep 2

while :; do
        for i in 1280 1500 2000 900 ; do
                ip link set $IF mtu $i
                sleep 2
        done
done
[root@host ~]# ./repro.sh

Result:
[  306.199917] iavf 0000:02:02.0 enp2s0f0v0: NIC Link is Up Speed is 40 Gbps Full Duplex
[  308.205944] iavf 0000:02:02.0 enp2s0f0v0: NIC Link is Up Speed is 40 Gbps Full Duplex
[  310.103223] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  310.110179] #PF: supervisor write access in kernel mode
[  310.115396] #PF: error_code(0x0002) - not-present page
[  310.120526] PGD 0 P4D 0
[  310.123057] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  310.127408] CPU: 24 PID: 183 Comm: kworker/u64:9 Kdump: loaded Not tainted 6.1.0-rc3+ #2
[  310.135485] Hardware name: Abacus electric, s.r.o. - [email protected] Super Server/H12SSW-iN, BIOS 2.4 04/13/2022
[  310.145728] Workqueue: iavf iavf_reset_task [iavf]
[  310.150520] RIP: 0010:iavf_xmit_frame_ring+0xd1/0xf70 [iavf]
[  310.156180] Code: d0 0f 86 da 00 00 00 83 e8 01 0f b7 fa 29 f8 01 c8 39 c6 0f 8f a0 08 00 00 48 8b 45 20 48 8d 14 92 bf 01 00 00 00 4c 8d 3c d0 <49> 89 5f 08 8b 43 70 66 41 89 7f 14 41 89 47 10 f6 83 82 00 00 00
[  310.174918] RSP: 0018:ffffbb5f0082caa0 EFLAGS: 00010293
[  310.180137] RAX: 0000000000000000 RBX: ffff92345471a6e8 RCX: 0000000000000200
[  310.187259] RDX: 0000000000000000 RSI: 000000000000000d RDI: 0000000000000001
[  310.194385] RBP: ffff92341d249000 R08: ffff92434987fcac R09: 0000000000000001
[  310.201509] R10: 0000000011f683b9 R11: 0000000011f50641 R12: 0000000000000008
[  310.208631] R13: ffff923447500000 R14: 0000000000000000 R15: 0000000000000000
[  310.215756] FS:  0000000000000000(0000) GS:ffff92434ee00000(0000) knlGS:0000000000000000
[  310.223835] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  310.229572] CR2: 0000000000000008 CR3: 0000000fbc210004 CR4: 0000000000770ee0
[  310.236696] PKRU: 55555554
[  310.239399] Call Trace:
[  310.241844]  <IRQ>
[  310.243855]  ? dst_alloc+0x5b/0xb0
[  310.247260]  dev_hard_start_xmit+0x9e/0x1f0
[  310.251439]  sch_direct_xmit+0xa0/0x370
[  310.255276]  __qdisc_run+0x13e/0x580
[  310.258848]  __dev_queue_xmit+0x431/0xd00
[  310.262851]  ? selinux_ip_postroute+0x147/0x3f0
[  310.267377]  ip_finish_output2+0x26c/0x540

Fixes: aa626da ("iavf: Detach device during reset task")
Cc: Jacob Keller <[email protected]>
Cc: Patryk Piotrowski <[email protected]>
Cc: SlawomirX Laba <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Gnurou pushed a commit that referenced this pull request Nov 30, 2022
After commit aa626da ("iavf: Detach device during reset task")
the device is detached during reset task and re-attached at its end.
The problem occurs when reset task fails because Tx queues are
restarted during device re-attach and this leads later to a crash.

To resolve this issue properly close the net device in cause of
failure in reset task to avoid restarting of tx queues at the end.
Also replace the hacky manipulation with IFF_UP flag by device close
that clears properly both IFF_UP and __LINK_STATE_START flags.
In these case iavf_close() does not do anything because the adapter
state is already __IAVF_DOWN.

Reproducer:
1) Run some Tx traffic (e.g. iperf3) over iavf interface
2) Set VF trusted / untrusted in loop

[root@host ~]# cat repro.sh

PF=enp65s0f0
IF=${PF}v0

ip link set up $IF
ip addr add 192.168.0.2/24 dev $IF
sleep 1

iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
sleep 2

while :; do
        ip link set $PF vf 0 trust on
        ip link set $PF vf 0 trust off
done
[root@host ~]# ./repro.sh

Result:
[ 2006.650969] iavf 0000:41:01.0: Failed to init adminq: -53
[ 2006.675662] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.689997] iavf 0000:41:01.0: Reset task did not complete, VF disabled
[ 2006.696611] iavf 0000:41:01.0: failed to allocate resources during reinit
[ 2006.703209] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.737011] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.764536] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.768919] BUG: kernel NULL pointer dereference, address: 0000000000000b4a
[ 2006.776358] #PF: supervisor read access in kernel mode
[ 2006.781488] #PF: error_code(0x0000) - not-present page
[ 2006.786620] PGD 0 P4D 0
[ 2006.789152] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 2006.792903] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.793501] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 6.1.0-rc3+ #2
[ 2006.805668] Hardware name: Abacus electric, s.r.o. - [email protected] Super Server/H12SSW-iN, BIOS 2.4 04/13/2022
[ 2006.815915] RIP: 0010:iavf_xmit_frame_ring+0x96/0xf70 [iavf]
[ 2006.821028] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.821572] Code: 48 83 c1 04 48 c1 e1 04 48 01 f9 48 83 c0 10 6b 50 f8 55 c1 ea 14 45 8d 64 14 01 48 39 c8 75 eb 41 83 fc 07 0f 8f e9 08 00 00 <0f> b7 45 4a 0f b7 55 48 41 8d 74 24 05 31 c9 66 39 d0 0f 86 da 00
[ 2006.845181] RSP: 0018:ffffb253004bc9e8 EFLAGS: 00010293
[ 2006.850397] RAX: ffff9d154de45b00 RBX: ffff9d15497d52e8 RCX: ffff9d154de45b00
[ 2006.856327] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.857523] RDX: 0000000000000000 RSI: 00000000000005a8 RDI: ffff9d154de45ac0
[ 2006.857525] RBP: 0000000000000b00 R08: ffff9d159cb010ac R09: 0000000000000001
[ 2006.857526] R10: ffff9d154de45940 R11: 0000000000000000 R12: 0000000000000002
[ 2006.883600] R13: ffff9d1770838dc0 R14: 0000000000000000 R15: ffffffffc07b8380
[ 2006.885840] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.890725] FS:  0000000000000000(0000) GS:ffff9d248e900000(0000) knlGS:0000000000000000
[ 2006.890727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2006.909419] CR2: 0000000000000b4a CR3: 0000000c39c10002 CR4: 0000000000770ee0
[ 2006.916543] PKRU: 55555554
[ 2006.918254] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.919248] Call Trace:
[ 2006.919250]  <IRQ>
[ 2006.919252]  dev_hard_start_xmit+0x9e/0x1f0
[ 2006.932587]  sch_direct_xmit+0xa0/0x370
[ 2006.936424]  __dev_queue_xmit+0x7af/0xd00
[ 2006.940429]  ip_finish_output2+0x26c/0x540
[ 2006.944519]  ip_output+0x71/0x110
[ 2006.947831]  ? __ip_finish_output+0x2b0/0x2b0
[ 2006.952180]  __ip_queue_xmit+0x16d/0x400
[ 2006.952721] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.956098]  __tcp_transmit_skb+0xa96/0xbf0
[ 2006.965148]  __tcp_retransmit_skb+0x174/0x860
[ 2006.969499]  ? cubictcp_cwnd_event+0x40/0x40
[ 2006.973769]  tcp_retransmit_skb+0x14/0xb0
...

Fixes: aa626da ("iavf: Detach device during reset task")
Cc: Jacob Keller <[email protected]>
Cc: Patryk Piotrowski <[email protected]>
Cc: SlawomirX Laba <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Gnurou pushed a commit that referenced this pull request Nov 30, 2022
…kernel/git/at91/linux into arm/fixes

AT91 fixes for 6.1 #2

It contains:
- fix UDC on at91sam9g20ek boards by adding vbus pin

* tag 'at91-fixes-6.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: dts: at91: sam9g20ek: enable udc vbus gpio pinctrl

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Gnurou pushed a commit that referenced this pull request Nov 30, 2022
ldev->lock is used to serialize lag change operations. Since multiport
eswtich functionality was added, we now change the mode dynamically.
However, acquiring ldev->lock is not allowed as it could possibly lead
to a deadlock as reported by the lockdep mechanism.

[  836.154963] WARNING: possible circular locking dependency detected
[  836.155850] 5.19.0-rc5_net_56b7df2 #1 Not tainted
[  836.156549] ------------------------------------------------------
[  836.157418] handler1/12198 is trying to acquire lock:
[  836.158178] ffff888187d52b58 (&ldev->lock){+.+.}-{3:3}, at: mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
[  836.159575]
[  836.159575] but task is already holding lock:
[  836.160474] ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
[  836.161669] which lock already depends on the new lock.
[  836.162905]
[  836.162905] the existing dependency chain (in reverse order) is:
[  836.164008] -> #3 (&block->cb_lock){++++}-{3:3}:
[  836.164946]        down_write+0x25/0x60
[  836.165548]        tcf_block_get_ext+0x1c6/0x5d0
[  836.166253]        ingress_init+0x74/0xa0 [sch_ingress]
[  836.167028]        qdisc_create.constprop.0+0x130/0x5e0
[  836.167805]        tc_modify_qdisc+0x481/0x9f0
[  836.168490]        rtnetlink_rcv_msg+0x16e/0x5a0
[  836.169189]        netlink_rcv_skb+0x4e/0xf0
[  836.169861]        netlink_unicast+0x190/0x250
[  836.170543]        netlink_sendmsg+0x243/0x4b0
[  836.171226]        sock_sendmsg+0x33/0x40
[  836.171860]        ____sys_sendmsg+0x1d1/0x1f0
[  836.172535]        ___sys_sendmsg+0xab/0xf0
[  836.173183]        __sys_sendmsg+0x51/0x90
[  836.173836]        do_syscall_64+0x3d/0x90
[  836.174471]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  836.175282]

[  836.175282] -> #2 (rtnl_mutex){+.+.}-{3:3}:
[  836.176190]        __mutex_lock+0x6b/0xf80
[  836.176830]        register_netdevice_notifier+0x21/0x120
[  836.177631]        rtnetlink_init+0x2d/0x1e9
[  836.178289]        netlink_proto_init+0x163/0x179
[  836.178994]        do_one_initcall+0x63/0x300
[  836.179672]        kernel_init_freeable+0x2cb/0x31b
[  836.180403]        kernel_init+0x17/0x140
[  836.181035]        ret_from_fork+0x1f/0x30

 [  836.181687] -> #1 (pernet_ops_rwsem){+.+.}-{3:3}:
[  836.182628]        down_write+0x25/0x60
[  836.183235]        unregister_netdevice_notifier+0x1c/0xb0
[  836.184029]        mlx5_ib_roce_cleanup+0x94/0x120 [mlx5_ib]
[  836.184855]        __mlx5_ib_remove+0x35/0x60 [mlx5_ib]
[  836.185637]        mlx5_eswitch_unregister_vport_reps+0x22f/0x440 [mlx5_core]
[  836.186698]        auxiliary_bus_remove+0x18/0x30
[  836.187409]        device_release_driver_internal+0x1f6/0x270
[  836.188253]        bus_remove_device+0xef/0x160
[  836.188939]        device_del+0x18b/0x3f0
[  836.189562]        mlx5_rescan_drivers_locked+0xd6/0x2d0 [mlx5_core]
[  836.190516]        mlx5_lag_remove_devices+0x69/0xe0 [mlx5_core]
[  836.191414]        mlx5_do_bond_work+0x441/0x620 [mlx5_core]
[  836.192278]        process_one_work+0x25c/0x590
[  836.192963]        worker_thread+0x4f/0x3d0
[  836.193609]        kthread+0xcb/0xf0
[  836.194189]        ret_from_fork+0x1f/0x30

[  836.194826] -> #0 (&ldev->lock){+.+.}-{3:3}:
[  836.195734]        __lock_acquire+0x15b8/0x2a10
[  836.196426]        lock_acquire+0xce/0x2d0
[  836.197057]        __mutex_lock+0x6b/0xf80
[  836.197708]        mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
[  836.198575]        tc_act_parse_mirred+0x25b/0x800 [mlx5_core]
[  836.199467]        parse_tc_actions+0x168/0x5a0 [mlx5_core]
[  836.200340]        __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core]
[  836.201241]        mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core]
[  836.202187]        tc_setup_cb_add+0xd7/0x200
[  836.202856]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[  836.203739]        fl_change+0xbbe/0x1730 [cls_flower]
[  836.204501]        tc_new_tfilter+0x407/0xd90
[  836.205168]        rtnetlink_rcv_msg+0x406/0x5a0
[  836.205877]        netlink_rcv_skb+0x4e/0xf0
[  836.206535]        netlink_unicast+0x190/0x250
[  836.207217]        netlink_sendmsg+0x243/0x4b0
[  836.207915]        sock_sendmsg+0x33/0x40
[  836.208538]        ____sys_sendmsg+0x1d1/0x1f0
[  836.209219]        ___sys_sendmsg+0xab/0xf0
[  836.209878]        __sys_sendmsg+0x51/0x90
[  836.210510]        do_syscall_64+0x3d/0x90
[  836.211137]        entry_SYSCALL_64_after_hwframe+0x46/0xb0

[  836.211954] other info that might help us debug this:
[  836.213174] Chain exists of:
[  836.213174]   &ldev->lock --> rtnl_mutex --> &block->cb_lock
   836.214650]  Possible unsafe locking scenario:
[  836.214650]
[  836.215574]        CPU0                    CPU1
[  836.216255]        ----                    ----
[  836.216943]   lock(&block->cb_lock);
[  836.217518]                                lock(rtnl_mutex);
[  836.218348]                                lock(&block->cb_lock);
[  836.219212]   lock(&ldev->lock);
[  836.219758]
[  836.219758]  *** DEADLOCK ***
[  836.219758]
 [  836.220747] 2 locks held by handler1/12198:
[  836.221390]  #0: ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
[  836.222646]  #1: ffff88810c9a92c0 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]

[  836.224063] stack backtrace:
[  836.224799] CPU: 6 PID: 12198 Comm: handler1 Not tainted 5.19.0-rc5_net_56b7df2 #1
[  836.225923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  836.227476] Call Trace:
[  836.227929]  <TASK>
[  836.228332]  dump_stack_lvl+0x57/0x7d
[  836.228924]  check_noncircular+0x104/0x120
[  836.229562]  __lock_acquire+0x15b8/0x2a10
[  836.230201]  lock_acquire+0xce/0x2d0
[  836.230776]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
[  836.231614]  ? find_held_lock+0x2b/0x80
[  836.232221]  __mutex_lock+0x6b/0xf80
[  836.232799]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
[  836.233636]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
[  836.234451]  ? xa_load+0xc3/0x190
[  836.234995]  mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
[  836.235803]  tc_act_parse_mirred+0x25b/0x800 [mlx5_core]
[  836.236636]  ? tc_act_can_offload_mirred+0x135/0x210 [mlx5_core]
[  836.237550]  parse_tc_actions+0x168/0x5a0 [mlx5_core]
[  836.238364]  __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core]
[  836.239202]  mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core]
[  836.240076]  ? lock_acquire+0xce/0x2d0
[  836.240668]  ? tc_setup_cb_add+0x5b/0x200
[  836.241294]  tc_setup_cb_add+0xd7/0x200
[  836.241917]  fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[  836.242709]  fl_change+0xbbe/0x1730 [cls_flower]
[  836.243408]  tc_new_tfilter+0x407/0xd90
[  836.244043]  ? tc_del_tfilter+0x880/0x880
[  836.244672]  rtnetlink_rcv_msg+0x406/0x5a0
[  836.245310]  ? netlink_deliver_tap+0x7a/0x4b0
[  836.245991]  ? if_nlmsg_stats_size+0x2b0/0x2b0
[  836.246675]  netlink_rcv_skb+0x4e/0xf0
[  836.258046]  netlink_unicast+0x190/0x250
[  836.258669]  netlink_sendmsg+0x243/0x4b0
[  836.259288]  sock_sendmsg+0x33/0x40
[  836.259857]  ____sys_sendmsg+0x1d1/0x1f0
[  836.260473]  ___sys_sendmsg+0xab/0xf0
[  836.261064]  ? lock_acquire+0xce/0x2d0
[  836.261669]  ? find_held_lock+0x2b/0x80
[  836.262272]  ? __fget_files+0xb9/0x190
[  836.262871]  ? __fget_files+0xd3/0x190
[  836.263462]  __sys_sendmsg+0x51/0x90
[  836.264064]  do_syscall_64+0x3d/0x90
[  836.264652]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  836.265425] RIP: 0033:0x7fdbe5e2677d

[  836.266012] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ba ee
ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 ee ee ff ff 48
[  836.268485] RSP: 002b:00007fdbe48a75a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[  836.269598] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fdbe5e2677d
[  836.270576] RDX: 0000000000000000 RSI: 00007fdbe48a7640 RDI: 000000000000003c
[  836.271565] RBP: 00007fdbe48a8368 R08: 0000000000000000 R09: 0000000000000000
[  836.272546] R10: 00007fdbe48a84b0 R11: 0000000000000293 R12: 0000557bd17dc860
[  836.273527] R13: 0000000000000000 R14: 0000557bd17dc860 R15: 00007fdbe48a7640

[  836.274521]  </TASK>

To avoid using mode holding ldev->lock in the configure flow, we queue a
work to the lag workqueue and cease wait on a completion object.

In addition, we remove the lock from mlx5_lag_do_mirred() since it is
not really protecting anything.

It should be noted that an actual deadlock has not been observed.

Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Gnurou pushed a commit that referenced this pull request Nov 30, 2022
When logging an inode in full mode, or when logging xattrs or when logging
the dir index items of a directory, we are modifying the log tree while
holding a read lock on a leaf from the fs/subvolume tree. This can lead to
a deadlock in rare circumstances, but it is a real possibility, and it was
recently reported by syzbot with the following trace from lockdep:

   WARNING: possible circular locking dependency detected
   6.1.0-rc5-next-20221116-syzkaller #0 Not tainted
   ------------------------------------------------------
   syz-executor.1/16154 is trying to acquire lock:
   ffff88807e3084a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0xa1/0xf30 fs/btrfs/delayed-inode.c:256

   but task is already holding lock:
   ffff88807df33078 (btrfs-log-00){++++}-{3:3}, at: __btrfs_tree_lock+0x32/0x3d0 fs/btrfs/locking.c:197

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #2 (btrfs-log-00){++++}-{3:3}:
          down_read_nested+0x9e/0x450 kernel/locking/rwsem.c:1634
          __btrfs_tree_read_lock+0x32/0x350 fs/btrfs/locking.c:135
          btrfs_tree_read_lock fs/btrfs/locking.c:141 [inline]
          btrfs_read_lock_root_node+0x82/0x3a0 fs/btrfs/locking.c:280
          btrfs_search_slot_get_root fs/btrfs/ctree.c:1678 [inline]
          btrfs_search_slot+0x3ca/0x2c70 fs/btrfs/ctree.c:1998
          btrfs_lookup_csum+0x116/0x3f0 fs/btrfs/file-item.c:209
          btrfs_csum_file_blocks+0x40e/0x1370 fs/btrfs/file-item.c:1021
          log_csums.isra.0+0x244/0x2d0 fs/btrfs/tree-log.c:4258
          copy_items.isra.0+0xbfb/0xed0 fs/btrfs/tree-log.c:4403
          copy_inode_items_to_log+0x13d6/0x1d90 fs/btrfs/tree-log.c:5873
          btrfs_log_inode+0xb19/0x4680 fs/btrfs/tree-log.c:6495
          btrfs_log_inode_parent+0x890/0x2a20 fs/btrfs/tree-log.c:6982
          btrfs_log_dentry_safe+0x59/0x80 fs/btrfs/tree-log.c:7083
          btrfs_sync_file+0xa41/0x13c0 fs/btrfs/file.c:1921
          vfs_fsync_range+0x13e/0x230 fs/sync.c:188
          generic_write_sync include/linux/fs.h:2856 [inline]
          iomap_dio_complete+0x73a/0x920 fs/iomap/direct-io.c:128
          btrfs_direct_write fs/btrfs/file.c:1536 [inline]
          btrfs_do_write_iter+0xba2/0x1470 fs/btrfs/file.c:1668
          call_write_iter include/linux/fs.h:2160 [inline]
          do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
          do_iter_write+0x182/0x700 fs/read_write.c:861
          vfs_iter_write+0x74/0xa0 fs/read_write.c:902
          iter_file_splice_write+0x745/0xc90 fs/splice.c:686
          do_splice_from fs/splice.c:764 [inline]
          direct_splice_actor+0x114/0x180 fs/splice.c:931
          splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886
          do_splice_direct+0x1ab/0x280 fs/splice.c:974
          do_sendfile+0xb19/0x1270 fs/read_write.c:1255
          __do_sys_sendfile64 fs/read_write.c:1323 [inline]
          __se_sys_sendfile64 fs/read_write.c:1309 [inline]
          __x64_sys_sendfile64+0x259/0x2c0 fs/read_write.c:1309
          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
          entry_SYSCALL_64_after_hwframe+0x63/0xcd

   -> #1 (btrfs-tree-00){++++}-{3:3}:
          __lock_release kernel/locking/lockdep.c:5382 [inline]
          lock_release+0x371/0x810 kernel/locking/lockdep.c:5688
          up_write+0x2a/0x520 kernel/locking/rwsem.c:1614
          btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline]
          btrfs_unlock_up_safe+0x1e3/0x290 fs/btrfs/locking.c:238
          search_leaf fs/btrfs/ctree.c:1832 [inline]
          btrfs_search_slot+0x265e/0x2c70 fs/btrfs/ctree.c:2074
          btrfs_insert_empty_items+0xbd/0x1c0 fs/btrfs/ctree.c:4133
          btrfs_insert_delayed_item+0x826/0xfa0 fs/btrfs/delayed-inode.c:746
          btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline]
          __btrfs_commit_inode_delayed_items fs/btrfs/delayed-inode.c:1111 [inline]
          __btrfs_run_delayed_items+0x280/0x590 fs/btrfs/delayed-inode.c:1153
          flush_space+0x147/0xe90 fs/btrfs/space-info.c:728
          btrfs_async_reclaim_metadata_space+0x541/0xc10 fs/btrfs/space-info.c:1086
          process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
          worker_thread+0x669/0x1090 kernel/workqueue.c:2436
          kthread+0x2e8/0x3a0 kernel/kthread.c:376
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

   -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
          check_prev_add kernel/locking/lockdep.c:3097 [inline]
          check_prevs_add kernel/locking/lockdep.c:3216 [inline]
          validate_chain kernel/locking/lockdep.c:3831 [inline]
          __lock_acquire+0x2a43/0x56d0 kernel/locking/lockdep.c:5055
          lock_acquire kernel/locking/lockdep.c:5668 [inline]
          lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
          __mutex_lock_common kernel/locking/mutex.c:603 [inline]
          __mutex_lock+0x12f/0x1360 kernel/locking/mutex.c:747
          __btrfs_release_delayed_node.part.0+0xa1/0xf30 fs/btrfs/delayed-inode.c:256
          __btrfs_release_delayed_node fs/btrfs/delayed-inode.c:251 [inline]
          btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline]
          btrfs_remove_delayed_node+0x52/0x60 fs/btrfs/delayed-inode.c:1285
          btrfs_evict_inode+0x511/0xf30 fs/btrfs/inode.c:5554
          evict+0x2ed/0x6b0 fs/inode.c:664
          dispose_list+0x117/0x1e0 fs/inode.c:697
          prune_icache_sb+0xeb/0x150 fs/inode.c:896
          super_cache_scan+0x391/0x590 fs/super.c:106
          do_shrink_slab+0x464/0xce0 mm/vmscan.c:843
          shrink_slab_memcg mm/vmscan.c:912 [inline]
          shrink_slab+0x388/0x660 mm/vmscan.c:991
          shrink_node_memcgs mm/vmscan.c:6088 [inline]
          shrink_node+0x93d/0x1f30 mm/vmscan.c:6117
          shrink_zones mm/vmscan.c:6355 [inline]
          do_try_to_free_pages+0x3b4/0x17a0 mm/vmscan.c:6417
          try_to_free_mem_cgroup_pages+0x3a4/0xa70 mm/vmscan.c:6732
          reclaim_high.constprop.0+0x182/0x230 mm/memcontrol.c:2393
          mem_cgroup_handle_over_high+0x190/0x520 mm/memcontrol.c:2578
          try_charge_memcg+0xe0c/0x12f0 mm/memcontrol.c:2816
          try_charge mm/memcontrol.c:2827 [inline]
          charge_memcg+0x90/0x3b0 mm/memcontrol.c:6889
          __mem_cgroup_charge+0x2b/0x90 mm/memcontrol.c:6910
          mem_cgroup_charge include/linux/memcontrol.h:667 [inline]
          __filemap_add_folio+0x615/0xf80 mm/filemap.c:852
          filemap_add_folio+0xaf/0x1e0 mm/filemap.c:934
          __filemap_get_folio+0x389/0xd80 mm/filemap.c:1976
          pagecache_get_page+0x2e/0x280 mm/folio-compat.c:104
          find_or_create_page include/linux/pagemap.h:612 [inline]
          alloc_extent_buffer+0x2b9/0x1580 fs/btrfs/extent_io.c:4588
          btrfs_init_new_buffer fs/btrfs/extent-tree.c:4869 [inline]
          btrfs_alloc_tree_block+0x2e1/0x1320 fs/btrfs/extent-tree.c:4988
          __btrfs_cow_block+0x3b2/0x1420 fs/btrfs/ctree.c:440
          btrfs_cow_block+0x2fa/0x950 fs/btrfs/ctree.c:595
          btrfs_search_slot+0x11b0/0x2c70 fs/btrfs/ctree.c:2038
          btrfs_update_root+0xdb/0x630 fs/btrfs/root-tree.c:137
          update_log_root fs/btrfs/tree-log.c:2841 [inline]
          btrfs_sync_log+0xbfb/0x2870 fs/btrfs/tree-log.c:3064
          btrfs_sync_file+0xdb9/0x13c0 fs/btrfs/file.c:1947
          vfs_fsync_range+0x13e/0x230 fs/sync.c:188
          generic_write_sync include/linux/fs.h:2856 [inline]
          iomap_dio_complete+0x73a/0x920 fs/iomap/direct-io.c:128
          btrfs_direct_write fs/btrfs/file.c:1536 [inline]
          btrfs_do_write_iter+0xba2/0x1470 fs/btrfs/file.c:1668
          call_write_iter include/linux/fs.h:2160 [inline]
          do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
          do_iter_write+0x182/0x700 fs/read_write.c:861
          vfs_iter_write+0x74/0xa0 fs/read_write.c:902
          iter_file_splice_write+0x745/0xc90 fs/splice.c:686
          do_splice_from fs/splice.c:764 [inline]
          direct_splice_actor+0x114/0x180 fs/splice.c:931
          splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886
          do_splice_direct+0x1ab/0x280 fs/splice.c:974
          do_sendfile+0xb19/0x1270 fs/read_write.c:1255
          __do_sys_sendfile64 fs/read_write.c:1323 [inline]
          __se_sys_sendfile64 fs/read_write.c:1309 [inline]
          __x64_sys_sendfile64+0x259/0x2c0 fs/read_write.c:1309
          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
          entry_SYSCALL_64_after_hwframe+0x63/0xcd

   other info that might help us debug this:

   Chain exists of:
     &delayed_node->mutex --> btrfs-tree-00 --> btrfs-log-00

   Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(btrfs-log-00);
                                  lock(btrfs-tree-00);
                                  lock(btrfs-log-00);
     lock(&delayed_node->mutex);

Holding a read lock on a leaf from a fs/subvolume tree creates a nasty
lock dependency when we are COWing extent buffers for the log tree and we
have two tasks modifying the log tree, with each one in one of the
following 2 scenarios:

1) Modifying the log tree triggers an extent buffer allocation while
   holding a write lock on a parent extent buffer from the log tree.
   Allocating the pages for an extent buffer, or the extent buffer
   struct, can trigger inode eviction and finally the inode eviction
   will trigger a release/remove of a delayed node, which requires
   taking the delayed node's mutex;

2) Allocating a metadata extent for a log tree can trigger the async
   reclaim thread and make us wait for it to release enough space and
   unblock our reservation ticket. The reclaim thread can start flushing
   delayed items, and that in turn results in the need to lock delayed
   node mutexes and in the need to write lock extent buffers of a
   subvolume tree - all this while holding a write lock on the parent
   extent buffer in the log tree.

So one task in scenario 1) running in parallel with another task in
scenario 2) could lead to a deadlock, one wanting to lock a delayed node
mutex while having a read lock on a leaf from the subvolume, while the
other is holding the delayed node's mutex and wants to write lock the same
subvolume leaf for flushing delayed items.

Fix this by cloning the leaf of the fs/subvolume tree, release/unlock the
fs/subvolume leaf and use the clone leaf instead.

Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
CC: [email protected] # 6.0+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Gnurou pushed a commit that referenced this pull request Nov 30, 2022
Matt reported a splat at msk close time:

    BUG: sleeping function called from invalid context at net/mptcp/protocol.c:2877
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 155, name: packetdrill
    preempt_count: 201, expected: 0
    RCU nest depth: 0, expected: 0
    4 locks held by packetdrill/155:
    #0: ffff888001536990 (&sb->s_type->i_mutex_key#6){+.+.}-{3:3}, at: __sock_release (net/socket.c:650)
    #1: ffff88800b498130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973)
    #2: ffff88800b49a130 (sk_lock-AF_INET/1){+.+.}-{0:0}, at: __mptcp_close_ssk (net/mptcp/protocol.c:2363)
    #3: ffff88800b49a0b0 (slock-AF_INET){+...}-{2:2}, at: __lock_sock_fast (include/net/sock.h:1820)
    Preemption disabled at:
    0x0
    CPU: 1 PID: 155 Comm: packetdrill Not tainted 6.1.0-rc5 torvalds#365
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
    Call Trace:
    <TASK>
    dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
    __might_resched.cold (kernel/sched/core.c:9891)
    __mptcp_destroy_sock (include/linux/kernel.h:110)
    __mptcp_close (net/mptcp/protocol.c:2959)
    mptcp_subflow_queue_clean (include/net/sock.h:1777)
    __mptcp_close_ssk (net/mptcp/protocol.c:2363)
    mptcp_destroy_common (net/mptcp/protocol.c:3170)
    mptcp_destroy (include/net/sock.h:1495)
    __mptcp_destroy_sock (net/mptcp/protocol.c:2886)
    __mptcp_close (net/mptcp/protocol.c:2959)
    mptcp_close (net/mptcp/protocol.c:2974)
    inet_release (net/ipv4/af_inet.c:432)
    __sock_release (net/socket.c:651)
    sock_close (net/socket.c:1367)
    __fput (fs/file_table.c:320)
    task_work_run (kernel/task_work.c:181 (discriminator 1))
    exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49)
    syscall_exit_to_user_mode (kernel/entry/common.c:130)
    do_syscall_64 (arch/x86/entry/common.c:87)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

We can't call mptcp_close under the 'fast' socket lock variant, replace
it with a sock_lock_nested() as the relevant code is already under the
listening msk socket lock protection.

Reported-by: Matthieu Baerts <[email protected]>
Closes: multipath-tcp/mptcp_net-next#316
Fixes: 30e51b9 ("mptcp: fix unreleased socket in accept queue")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Gnurou pushed a commit that referenced this pull request Dec 16, 2022
We shouldn't be calling runtime PM APIs from within the genpd
enable/disable path for a couple reasons.

First, this causes an AA lockdep splat[1] because genpd can call into
genpd code again while holding the genpd lock.

WARNING: possible recursive locking detected
5.19.0-rc2-lockdep+ torvalds#7 Not tainted
--------------------------------------------
kworker/2:1/49 is trying to acquire lock:
ffffffeea0370788 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

but task is already holding lock:
ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&genpd->mlock);
  lock(&genpd->mlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/2:1/49:
 #0: 74ffff80811a5748 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x320/0x5fc
 #1: ffffffc008537cf8 ((work_completion)(&genpd->power_off_work)){+.+.}-{0:0}, at: process_one_work+0x354/0x5fc
 #2: ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

stack backtrace:
CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 5.19.0-rc2-lockdep+ torvalds#7
Hardware name: Google Lazor (rev3 - 8) with KB Backlight (DT)
Workqueue: pm genpd_power_off_work_fn
Call trace:
 dump_backtrace+0x1a0/0x200
 show_stack+0x24/0x30
 dump_stack_lvl+0x7c/0xa0
 dump_stack+0x18/0x44
 __lock_acquire+0xb38/0x3634
 lock_acquire+0x180/0x2d4
 __mutex_lock_common+0x118/0xe30
 mutex_lock_nested+0x70/0x7c
 genpd_lock_mtx+0x24/0x30
 genpd_runtime_suspend+0x2f0/0x414
 __rpm_callback+0xdc/0x1b8
 rpm_callback+0x4c/0xcc
 rpm_suspend+0x21c/0x5f0
 rpm_idle+0x17c/0x1e0
 __pm_runtime_idle+0x78/0xcc
 gdsc_disable+0x24c/0x26c
 _genpd_power_off+0xd4/0x1c4
 genpd_power_off+0x2d8/0x41c
 genpd_power_off_work_fn+0x60/0x94
 process_one_work+0x398/0x5fc
 worker_thread+0x42c/0x6c4
 kthread+0x194/0x1b4
 ret_from_fork+0x10/0x20

Second, this confuses runtime PM on CoachZ for the camera devices by
causing the camera clock controller's runtime PM usage_count to go
negative after resuming from suspend. This is because runtime PM is
being used on the clock controller while runtime PM is disabled for the
device.

The reason for the negative count is because a GDSC is represented as a
genpd and each genpd that is attached to a device is resumed during the
noirq phase of system wide suspend/resume (see the noirq suspend ops
assignment in pm_genpd_init() for more details). The camera GDSCs are
attached to camera devices with the 'power-domains' property in DT.
Every device has runtime PM disabled in the late system suspend phase
via __device_suspend_late(). Runtime PM is not usable until runtime PM
is enabled in device_resume_early(). The noirq phases run after the
'late' and before the 'early' phase of suspend/resume. When the genpds
are resumed in genpd_resume_noirq(), we call down into gdsc_enable()
that calls pm_runtime_resume_and_get() and that returns -EACCES to
indicate failure to resume because runtime PM is disabled for all
devices.

Upon closer inspection, calling runtime PM APIs like this in the GDSC
driver doesn't make sense. It was intended to make sure the GDSC for the
clock controller providing other GDSCs was enabled, specifically the
MMCX GDSC for the display clk controller on SM8250 (sm8250-dispcc), so
that GDSC register accesses succeeded. That will already happen because
we make the 'dev->pm_domain' a parent domain of each GDSC we register in
gdsc_register() via pm_genpd_add_subdomain(). When any of these GDSCs
are accessed, we'll enable the parent domain (in this specific case
MMCX).

We also remove any getting of runtime PM during registration, because
when a genpd is registered it increments the count on the parent if the
genpd itself is already enabled.

Cc: Dmitry Baryshkov <[email protected]>
Cc: Johan Hovold <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Taniya Das <[email protected]>
Cc: Satya Priya <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Tested-by: Douglas Anderson <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Reported-by: Stephen Boyd <[email protected]>
Link: https://lore.kernel.org/r/CAE-0n52xbZeJ66RaKwggeRB57fUAwjvxGxfFMKOKJMKVyFTe+w@mail.gmail.com [1]
Fixes: 1b77183 ("clk: qcom: gdsc: enable optional power domain support")
Signed-off-by: Stephen Boyd <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Johan Hovold <[email protected]>
Reviewed-by: Johan Hovold <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
Gnurou pushed a commit that referenced this pull request Dec 16, 2022
test_bpf tail call tests end up as:

  test_bpf: #0 Tail call leaf jited:1 85 PASS
  test_bpf: #1 Tail call 2 jited:1 111 PASS
  test_bpf: #2 Tail call 3 jited:1 145 PASS
  test_bpf: #3 Tail call 4 jited:1 170 PASS
  test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
  test_bpf: #5 Tail call load/store jited:1
  BUG: Unable to handle kernel data access on write at 0xf1b4e000
  Faulting instruction address: 0xbe86b710
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=4K MMU=Hash PowerMac
  Modules linked in: test_bpf(+)
  CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ torvalds#195
  Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
  NIP:  be86b710 LR: be857e88 CTR: be86b704
  REGS: f1b4df20 TRAP: 0300   Not tainted  (6.1.0-rc4+)
  MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 28008242  XER: 00000000
  DAR: f1b4e000 DSISR: 42000000
  GPR00: 00000001 f1b4dfe c11d2280 00000000 00000000 00000000 00000002 00000000
  GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8
  GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000
  GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00
  NIP [be86b710] 0xbe86b710
  LR [be857e88] __run_one+0xec/0x264 [test_bpf]
  Call Trace:
  [f1b4dfe] [00000002] 0x2 (unreliable)
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  ---[ end trace 0000000000000000 ]---

This is a tentative to write above the stack. The problem is encoutered
with tests added by commit 38608ee ("bpf, tests: Add load store
test case for tail call")

This happens because tail call is done to a BPF prog with a different
stack_depth. At the time being, the stack is kept as is when the caller
tail calls its callee. But at exit, the callee restores the stack based
on its own properties. Therefore here, at each run, r1 is erroneously
increased by 32 - 16 = 16 bytes.

This was done that way in order to pass the tail call count from caller
to callee through the stack. As powerpc32 doesn't have a red zone in
the stack, it was necessary the maintain the stack as is for the tail
call. But it was not anticipated that the BPF frame size could be
different.

Let's take a new approach. Use register r4 to carry the tail call count
during the tail call, and save it into the stack at function entry if
required. This means the input parameter must be in r3, which is more
correct as it is a 32 bits parameter, then tail call better match with
normal BPF function entry, the down side being that we move that input
parameter back and forth between r3 and r4. That can be optimised later.

Doing that also has the advantage of maximising the common parts between
tail calls and a normal function exit.

With the fix, tail call tests are now successfull:

  test_bpf: #0 Tail call leaf jited:1 53 PASS
  test_bpf: #1 Tail call 2 jited:1 115 PASS
  test_bpf: #2 Tail call 3 jited:1 154 PASS
  test_bpf: #3 Tail call 4 jited:1 165 PASS
  test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
  test_bpf: #5 Tail call load/store jited:1 141 PASS
  test_bpf: torvalds#6 Tail call error path, max count reached jited:1 994 PASS
  test_bpf: torvalds#7 Tail call count preserved across function calls jited:1 140975 PASS
  test_bpf: torvalds#8 Tail call error path, NULL target jited:1 110 PASS
  test_bpf: torvalds#9 Tail call error path, index out of range jited:1 69 PASS
  test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

Suggested-by: Naveen N. Rao <[email protected]>
Fixes: 51c66ad ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: [email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Tested-by: Naveen N. Rao <[email protected]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/757acccb7fbfc78efa42dcf3c974b46678198905.1669278887.git.christophe.leroy@csgroup.eu
Gnurou pushed a commit that referenced this pull request Dec 16, 2022
Wei Chen reported a NULL deref in sk_user_ns() [0][1], and Paolo diagnosed
the root cause: in unix_diag_get_exact(), the newly allocated skb does not
have sk. [2]

We must get the user_ns from the NETLINK_CB(in_skb).sk and pass it to
sk_diag_fill().

[0]:
BUG: kernel NULL pointer dereference, address: 0000000000000270
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 12bbce067 P4D 12bbce067 PUD 12bc40067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 27942 Comm: syz-executor.0 Not tainted 6.1.0-rc5-next-20221118 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
RIP: 0010:sk_user_ns include/net/sock.h:920 [inline]
RIP: 0010:sk_diag_dump_uid net/unix/diag.c:119 [inline]
RIP: 0010:sk_diag_fill+0x77d/0x890 net/unix/diag.c:170
Code: 89 ef e8 66 d4 2d fd c7 44 24 40 00 00 00 00 49 8d 7c 24 18 e8
54 d7 2d fd 49 8b 5c 24 18 48 8d bb 70 02 00 00 e8 43 d7 2d fd <48> 8b
9b 70 02 00 00 48 8d 7b 10 e8 33 d7 2d fd 48 8b 5b 10 48 8d
RSP: 0018:ffffc90000d67968 EFLAGS: 00010246
RAX: ffff88812badaa48 RBX: 0000000000000000 RCX: ffffffff840d481d
RDX: 0000000000000465 RSI: 0000000000000000 RDI: 0000000000000270
RBP: ffffc90000d679a8 R08: 0000000000000277 R09: 0000000000000000
R10: 0001ffffffffffff R11: 0001c90000d679a8 R12: ffff88812ac03800
R13: ffff88812c87c400 R14: ffff88812ae42210 R15: ffff888103026940
FS:  00007f08b4e6f700(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000270 CR3: 000000012c58b000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 unix_diag_get_exact net/unix/diag.c:285 [inline]
 unix_diag_handler_dump+0x3f9/0x500 net/unix/diag.c:317
 __sock_diag_cmd net/core/sock_diag.c:235 [inline]
 sock_diag_rcv_msg+0x237/0x250 net/core/sock_diag.c:266
 netlink_rcv_skb+0x13e/0x250 net/netlink/af_netlink.c:2564
 sock_diag_rcv+0x24/0x40 net/core/sock_diag.c:277
 netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
 netlink_unicast+0x5e9/0x6b0 net/netlink/af_netlink.c:1356
 netlink_sendmsg+0x739/0x860 net/netlink/af_netlink.c:1932
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg net/socket.c:734 [inline]
 ____sys_sendmsg+0x38f/0x500 net/socket.c:2476
 ___sys_sendmsg net/socket.c:2530 [inline]
 __sys_sendmsg+0x197/0x230 net/socket.c:2559
 __do_sys_sendmsg net/socket.c:2568 [inline]
 __se_sys_sendmsg net/socket.c:2566 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2566
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x4697f9
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f08b4e6ec48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000077bf80 RCX: 00000000004697f9
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 00000000004d29e9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000077bf80
R13: 0000000000000000 R14: 000000000077bf80 R15: 00007ffdb36bc6c0
 </TASK>
Modules linked in:
CR2: 0000000000000270

[1]: https://lore.kernel.org/netdev/CAO4mrfdvyjFpokhNsiwZiP-wpdSD0AStcJwfKcKQdAALQ9_2Qw@mail.gmail.com/
[2]: https://lore.kernel.org/netdev/[email protected]/

Fixes: cae9910 ("net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics.")
Reported-by: syzbot <[email protected]>
Reported-by: Wei Chen <[email protected]>
Diagnosed-by: Paolo Abeni <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Gnurou pushed a commit that referenced this pull request Dec 16, 2022
QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in
address translation service (ATS). These devices may inadvertently issue
ATS invalidation completion before posted writes initiated with
translated address that utilized translations matching the invalidation
address range, violating the invalidation completion ordering.

This patch adds an extra device TLB invalidation for the affected devices,
it is needed to ensure no more posted writes with translated address
following the invalidation completion. Therefore, the ordering is
preserved and data-corruption is prevented.

Device TLBs are invalidated under the following six conditions:
1. Device driver does DMA API unmap IOVA
2. Device driver unbind a PASID from a process, sva_unbind_device()
3. PASID is torn down, after PASID cache is flushed. e.g. process
exit_mmap() due to crash
4. Under SVA usage, called by mmu_notifier.invalidate_range() where
VM has to free pages that were unmapped
5. userspace driver unmaps a DMA buffer
6. Cache invalidation in vSVA usage (upcoming)

For #1 and #2, device drivers are responsible for stopping DMA traffic
before unmap/unbind. For #3, iommu driver gets mmu_notifier to
invalidate TLB the same way as normal user unmap which will do an extra
invalidation. The dTLB invalidation after PASID cache flush does not
need an extra invalidation.

Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
covered by this patch due to common code path with #5.

Tested-by: Yuzhang Luo <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Gnurou pushed a commit that referenced this pull request Dec 16, 2022
When sending packets between nodes in netns, it calls tipc_lxc_xmit() for
peer node to receive the packets where tipc_sk_mcast_rcv()/tipc_sk_rcv()
might be called, and it's pretty much like in tipc_rcv().

Currently the local 'node rw lock' is held during calling tipc_lxc_xmit()
to protect the peer_net not being freed by another thread. However, when
receiving these packets, tipc_node_add_conn() might be called where the
peer 'node rw lock' is acquired. Then a dead lock warning is triggered by
lockdep detector, although it is not a real dead lock:

    WARNING: possible recursive locking detected
    --------------------------------------------
    conn_server/1086 is trying to acquire lock:
    ffff8880065cb020 (&n->lock#2){++--}-{2:2}, \
                     at: tipc_node_add_conn.cold.76+0xaa/0x211 [tipc]

    but task is already holding lock:
    ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \
                     at: tipc_node_xmit+0x285/0xb30 [tipc]

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&n->lock#2);
      lock(&n->lock#2);

     *** DEADLOCK ***

     May be due to missing lock nesting notation

    4 locks held by conn_server/1086:
     #0: ffff8880036d1e40 (sk_lock-AF_TIPC){+.+.}-{0:0}, \
                          at: tipc_accept+0x9c0/0x10b0 [tipc]
     #1: ffff8880036d5f80 (sk_lock-AF_TIPC/1){+.+.}-{0:0}, \
                          at: tipc_accept+0x363/0x10b0 [tipc]
     #2: ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \
                          at: tipc_node_xmit+0x285/0xb30 [tipc]
     #3: ffff888012e13370 (slock-AF_TIPC){+...}-{2:2}, \
                          at: tipc_sk_rcv+0x2da/0x1b40 [tipc]

    Call Trace:
     <TASK>
     dump_stack_lvl+0x44/0x5b
     __lock_acquire.cold.77+0x1f2/0x3d7
     lock_acquire+0x1d2/0x610
     _raw_write_lock_bh+0x38/0x80
     tipc_node_add_conn.cold.76+0xaa/0x211 [tipc]
     tipc_sk_finish_conn+0x21e/0x640 [tipc]
     tipc_sk_filter_rcv+0x147b/0x3030 [tipc]
     tipc_sk_rcv+0xbb4/0x1b40 [tipc]
     tipc_lxc_xmit+0x225/0x26b [tipc]
     tipc_node_xmit.cold.82+0x4a/0x102 [tipc]
     __tipc_sendstream+0x879/0xff0 [tipc]
     tipc_accept+0x966/0x10b0 [tipc]
     do_accept+0x37d/0x590

This patch avoids this warning by not holding the 'node rw lock' before
calling tipc_lxc_xmit(). As to protect the 'peer_net', rcu_read_lock()
should be enough, as in cleanup_net() when freeing the netns, it calls
synchronize_rcu() before the free is continued.

Also since tipc_lxc_xmit() is like the RX path in tipc_rcv(), it makes
sense to call it under rcu_read_lock(). Note that the right lock order
must be:

   rcu_read_lock();
   tipc_node_read_lock(n);
   tipc_node_read_unlock(n);
   tipc_lxc_xmit();
   rcu_read_unlock();

instead of:

   tipc_node_read_lock(n);
   rcu_read_lock();
   tipc_node_read_unlock(n);
   tipc_lxc_xmit();
   rcu_read_unlock();

and we have to call tipc_node_read_lock/unlock() twice in
tipc_node_xmit().

Fixes: f73b128 ("tipc: improve throughput between nodes in netns")
Reported-by: Shuang Li <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Link: https://lore.kernel.org/r/5bdd1f8fee9db695cfff4528a48c9b9d0523fb00.1670110641.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants