USB SSD disconnects randomly: xHCI host not responding to stop endpoint command #5060
Comments
I've made another try with quirks to disable UAS for the case. The kernel log is a bit different this time; I can't see the message about the unresponsive xHCI host:
What's strange is that the last two crashes occurred after around 77,000 seconds of runtime. This is the log for the previous crash (before disabling UAS):
Probably not a coincidence and I'd assume it depends on the amount of data written during that time. |
Another test, another crash. I've tried a different case, an Axagon EE25-XA3, which has an ASMedia ASM1153 (174c:1153). It doesn't support UASP, so I've used BOT without enabling quirks. The result is the same. I'd like to help with more than just looking at dmesg output; how should I progress further? I'm running another test at the moment and have replaced the SSD with an HDD in the XA6 case to see if it changes anything. I've also requested a firmware update from the case vendor, but since another version of the case has the same issue, I don't think it will behave differently. |
What happens if you use a powered USB3 hub between the Pi and the SSD? |
I haven't tried it yet; this will be the next step after the HDD test. After that, I could try disabling ASPM for the USB host controller. Another possibility could be disabling LPM.
Edit: the HDD test is still running; it's still stable after 1 day and 12 hours. I'm going to let it run for a while to make sure it stays stable.
Edit 2: HDD test finished, it ran stable for 4 days. I'm running the external hub test with the SSD now. The hub in question is a RaidSonic Icy Box IB-AC6113. Output of
@P33M It died after 20 hours. The external hub doesn't seem to make a difference. Kernel log:
I'm now trying with LPM disabled. Edit 3: it died with LPM disabled. Right now the only working alternative seems to be using an HDD in the case instead of an SSD. @P33M, what should I look at next? |
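For reference, ASPM and LPM can typically be disabled from the kernel command line; a minimal sketch (these are not necessarily the exact parameters used in the tests above, and the VID:PID is the ASMedia bridge from this thread):

```bash
# Both options go on the single line in /boot/cmdline.txt:
#
#   pcie_aspm=off                 # disable PCIe Active State Power Management
#   usbcore.quirks=174c:55aa:k    # 'k' = NO_LPM quirk for the given VID:PID
```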
It's a really strange issue. One would think that loading both the USB-SATA bridge and the USB controller would lead to a controller crash, but unfortunately it doesn't. I've tried
It doesn't just run fine; it easily produces 34k IOPS, which seems reasonable for this configuration. It's a significantly higher load than what the Docker containers put on the system, yet it doesn't hang the xHCI controller. The system also seems to be stable during a random 4K read-write test, but it has only been running for a few minutes at the time I'm writing this. I use the following command:
Edit: the R/W test didn't crash after 1.5 hours. Edit 2: I'm running an xHCI trace overnight; it might help if the system crashes. |
I've finished the xHCI tracing; it crashed after 2 days. When the file system first detected the error, these events were logged:
After that, just before the controller died, I saw these:
I've uploaded the full log file (1.3 GiB) zipped: https://www.icloud.com/iclouddrive/0fePUI0rqF_J_XGW_2BgkwORg#putty |
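For anyone wanting to capture a similar trace, a rough sketch of enabling the xhci tracepoints via tracefs (this assumes the default tracefs mount under /sys/kernel/debug; the exact commands used for the trace above were not recorded here):

```bash
# enable all xhci-hcd tracepoints and start tracing
echo 1 | sudo tee /sys/kernel/debug/tracing/events/xhci-hcd/enable
echo 1 | sudo tee /sys/kernel/debug/tracing/tracing_on
# ...reproduce the problem, then dump the ring buffer to a file
sudo cp /sys/kernel/debug/tracing/trace xhci-trace.txt
```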
It's very likely that this issue has something to do with the one that was reported more than 2 years ago and is still not resolved: #3404 |
Just wanted to chime in and say I'm having the exact same issue (so I won't post lengthy details). It seems we're using the exact same ASMedia chip as well. Raspberry Pi 4B 8GB with an external disk case connected to the USB 3.0 port, using a Kingston SV300S37A240G SSD.
After the error happens, unplugging/plugging the disk makes no difference (the disk is gone and nothing is detected or shows up in dmesg). I have to reboot the device to get the disk detected again. |
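An untested sketch of re-probing the controller over PCIe instead of rebooting; 0000:01:00.0 is the VL805 address seen in the logs in this thread, and this may well not help if the controller firmware itself has crashed:

```bash
# remove the xHCI controller from the PCI bus, then rescan so it gets re-probed
echo 1 | sudo tee /sys/bus/pci/devices/0000:01:00.0/remove
sleep 1
echo 1 | sudo tee /sys/bus/pci/rescan
```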
@gtirloni Can you try it with a powered USB 3.0 hub just to be sure it's not power related? |
@tomikaa87 sure, just connected them with a powered USB 3.0 hub. It seems to be working and I'm sync'ing about 100GB to it. I'll report back soon. |
@gtirloni thanks. If it's really the same issue, it should not crash when you copy a large data set. It seems to be sensitive to short I/O operations issued during a longer period of time (1-2 days in my case). I couldn't make it crash by copying 200+ GBytes of data. |
I did a fresh install of the previous LTS version of Ubuntu Server (20.04), which has a
I can't see the drive even if I unplug and re-plug it. |
I'm seeing a similar error on my Raspberry Pi 4B 8GB, connected to a Seagate external HDD via an Orico USB hub (externally powered, uses a VL817 chip).
[ 1.422691] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
brian@parmesan:/ $ lsusb -tv
p.s.
@you-are-invited thanks for the info. I tried to find images of the controller board your HDD has and probably found one that might match, and I see an ASMedia controller, but it's possible that your drive has a totally different type (JMicron for instance). Can you somehow check which one you have? Unfortunately the only way to be sure is to look at the chip on the controller board. If you can and want to do this, you should look at the biggest IC with the most legs/solder pads and write down the model number as well (a string that looks like ASM225CM or JMS567 etc.) The strange thing is that I didn't have any issues when I put an HDD in the case, only with SSDs, but I can imagine that another controller might have problems with all kinds of disks. |
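Short of opening the enclosure, the USB vendor ID at least hints at which bridge family is inside; a quick sketch (174c:55aa is the ID from the original report, so substitute your own):

```bash
# common bridge vendors: 174c = ASMedia, 152d = JMicron, 0bda = Realtek
lsusb
# dump the descriptor of a specific bridge to see its product string
sudo lsusb -vd 174c:55aa | grep -iE 'idVendor|idProduct|iProduct'
```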
Anyway, I followed this sticky: https://forums.raspberrypi.com/viewtopic.php?f=28&t=245931 and also an in-depth post I found related to the same problem:
[ 1.453858] usbhid: USB HID core driver
Update 1:
brian@parmesan:/media/jamun $ ls
Doesn't look good!
[ 1009.106954] usb 2-1.3: USB disconnect, device number 3
@you-are-invited UAS quirks (essentially falling back to the BOT protocol) don't seem to help. I've tried them multiple times in various configurations, but in the end the xHCI controller died anyway. |
One possibility is that a malformed packet over PCIe is causing target or completer aborts (which the xHCI controller generally can't recover from). One way in which this can happen is marginal silicon that needs more volts to be stable. What happens if you specify |
I've tried that (just forgot to mention); it didn't change anything other than the base temperature of the SoC. I also tried forced turbo mode to see if CPU clock switching causes it, but nothing. @P33M is there a possibility that this is a bug in the VL805 firmware? If there is, do you have a way to debug it and report it to VIA Labs so they can release an updated version? |
I suppose the acid test would be to swap the Pi4 board for another, keeping all other components exactly the same (and the same VLI firmware version), and seeing if the crash still happens. |
That's doable; I have another Pi 4 (rev 1.5, 8GB), but @you-are-invited ran into this problem with that exact same board version. |
Just wanted to report that my RPI 4B 8GB has been running stable for days after connecting the SSD to a powered USB 3.0 hub. |
@P33M tried the 8GB Pi4, but no difference. Same VL805 firmware. |
Well at least that should eliminate a hardware fault within the Pi as a cause. "Crashes randomly after 1-2 days" isn't going to be reproducible here. One thing I note is that in the trace, both reads and writes are simultaneously queued after a period of infrequent activity. Are there FIO settings that cause large UAS queue depths for both read and write (32 tags for each), and does this make things crash faster? |
I'm trying with this, but nothing yet:
I/O depth is 1024. @P33M unfortunately I didn't find a way to crash the system with FIO. I tried various I/O depths ranging from 32 to 1024 and also experimented with numjobs, but the system stayed stable even with 128 jobs. I was absolutely hammering the disk, but not a single UAS error appeared in dmesg. The Docker containers I use put a different load on it. The heaviest one is Grafana with Prometheus; it writes at 400 KB/s on average because Prometheus collects data samples every 15 seconds. So it's not a constant load, but small bursts. I also tested the system by moving Prometheus's data to a network share, and the USB controller still crashed. |
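For what it's worth, a sketch of an fio invocation along the lines @P33M asked about, with read and write jobs queued simultaneously at 32 tags each; the file path, size, and runtime are assumptions, and it should point at a scratch file rather than the raw device:

```bash
sudo fio --filename=/mnt/ssd/fio-test --size=8G --direct=1 --ioengine=libaio --bs=4k \
    --time_based --runtime=3600 --group_reporting \
    --name=rand-read  --rw=randread  --iodepth=32 \
    --name=rand-write --rw=randwrite --iodepth=32
```

Options before the first --name are global in fio, so both jobs share the same file and block size while each keeps its own 32-deep queue.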
I have met the same problem. I connected a USB 3.0 hub with two USB 3.0 flash drives to my board and ran "dd if=/run/media/sdb1/b.log of=/run/media/sda1/11.log bs=2MB count=1024", and then the problem happened. But when I replaced the 3.0 hub with a 2.0 hub, it worked well. |
here is the log:
|
@yangxiao989 can you show us the output of |
Sorry, I don't have a Raspberry Pi. I just googled for a similar error and found this page. But when I tried 'lsusb' after xHCI died, only the USB controller could be seen, while the hub and flash drives had disappeared. |
This repo and tracker are specifically for Raspberry Pi problems only. |
I'm also suffering from this.
Also switched over to USB 2, which isn't a huge loss in my case, since it's pretty much only used for a Nextcloud instance. Would still prefer a proper solution, mind you :). Sadly the log didn't get captured, since it affects my root file system (ZFS), so the best I can do is a screenshot:
|
I'm having exactly the same problem:
I found that I have the same ASM1153, but in my case I had this enclosure working on my Raspberry Pi 4 for more than a year without any problem. Let me explain: first I had a Raspberry Pi 4 Model B Rev 1.1 4GB using this enclosure alone for more than 4 years without any issues; then, about 3 years ago, I attached a 10TB Seagate external drive, and the two lived together without any problem during all these years. 3 months ago I switched my Raspberry to this one: a Raspberry Pi 4 Model B Rev 1.5 with 8GB. I just moved everything over and everything worked as before. Last week I needed to add a new SSD, so in order to use all 3 disks with the USB 3 ports of the Raspberry Pi I bought this hub: https://www.waveshare.com/wiki/USB3.2-Gen1-HUB-2IN-4OUT and as soon as I attached this hub with the 3rd disk I started experiencing the problem described in this issue. My tests:
I'm also using the official Pi 4 power supply. Here are my command outputs: lsusb -tv:
sudo rpi-eeprom-update:
uname -a:
sudo lshw:
sudo usb-devices:
|
Good day! It seems I solved the problem after replacing a short USB cable with a long USB cable. I have an SSD USB drive. UPDATE: with the short USB cable I get a read speed of ~270 MB/s (I have an Odroid C4) and a write speed of ~45 MB/s; with the long USB cable I get 32 MB/s read and write speed. It seems there is a bug at high data transfer rates... |
I am getting the same on a Pi 5 on Bookworm. I had similar issues back in 2020 when I migrated my NAS to a Pi 4, but after upgrading from kernel 4.x to 5.x I've enjoyed issue-free usage of my Pi 4 as a NAS with 3 external drives permanently plugged into it, and others added at times for ad-hoc backups: one directly on the Pi 4's USB 3.0 port, and the rest via a TP-Link UH700 with external power that I bought for this, plugged into the other Pi 4 USB 3.0 port, with no unplanned downtime throughout this period. I am currently in the process of rebuilding my NAS on a Pi 5, but after having the new OS and apps installed, configured, and running, my first interactions have been unsuccessful. As I was backing up one of the drives to another one, both connected via the hub, I consistently got the hub drives to fully disconnect. The main error seems to be:
and then a set of scary-looking errors that, to the best of my understanding, are just a consequence of the drives from the hub no longer being accessible by the system. SmartCTL tests seem to suggest they're all still ok, even after a handful of failed attempts at this. After this error, the remaining mountpoints from the hub drives remain inaccessible (though some of them still show as mounted). The drive that gets disconnected first is the one that isn't being used at the time of the tests - it's not one I'm backing up. The entire behavior, and the troubleshooting steps, are similar to the others' here. This seems to be a potential regression between kernel 5.x and 6.x. What have I done so far:
Additional context: Imaged from the RPiOS Lite 64-bit image from the site. Fully updated and upgraded, including EEPROM. uname:
Drive labels and mountpoints:
/dev/sda - connected to the Pi directly, HDD, via the USB 3.0 port
I was copying HDDCOPY to HDDCOPY-EXT4 via rsync. The Pi runs perfectly fine if I'm copying between two drives on different USB 3.0 ports, or between 3.0 and 2.0 via the hub. If I'm copying between 2 drives connected to the hub, which then connects to the USB 3.0 port, this happens. To be clear, I'm making full backups of 4TB drives, which does take a while, so the issue can also be related to data throughput, data volume, or simply time. The rsync operation doesn't fail during a dry-run, only in a proper sync. It also always seems to fail at a similar point in time. I have not checked CPU usage or memory, but I'll test that shortly, just to make sure there's nothing else at play. All drives are EXT4. lsusb:
sample logs:
As mentioned, I had no issues for years on my Pi4, and the drives are completely fine - and run well via USB2.0. I've exhausted my troubleshooting options. I am tempted to go back to my Pi4 for my NAS for the time being, though I'd love to start running it on a Pi5. Thank you in advance for your time, and happy to provide more logs or test things out. |
-- removed for noise reduction, updating #5753 |
The same thing is happening to my Pi 5 with an M.2 NVMe USB enclosure as the boot drive. I can consistently reproduce it by running a docker image
|
@zhuoyang Can you move this to #5753 where the Pi5 specific issues are being discussed? Also, looking at the logs, can you share the entire dmesg log? I'm seeing references to the UAS driver being used - if so, have you tried adding quirks to your HDDs first per this post? |
I am not sure if I want to move to #5753, as that issue seems to be related to a USB hub, whereas I am not using any USB hub.
EDIT: adding the quirk seems to work, but I will continue to monitor and see if it happens again. I am actually seeing it happen even when using USB 2, but didn't manage to capture the dmesg |
Got it - apologies for the suggestion to move it to the other issue; it seemed that they wanted to keep this Pi 4 related. That being said, USB 3.0 issues are fairly well known with the Pi 4 bus, and adding quirks is normally the recommendation. See if it works. The way to see if it's properly enabled is to check in lsusb -tv whether the driver is uas (no quirks) or usb-storage (quirks, more compatible). Best of luck. |
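A quick sketch of that check: "Driver=uas" means the UAS driver is still in use, while "Driver=usb-storage" means the quirk took effect.

```bash
lsusb -t
# or just the storage-related lines:
lsusb -t | grep -iE 'uas|usb-storage'
```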
I have this very same problem after swapping an RPi3 for an RPi4 2GB. I have 3 custom boxes with I-Tec 16-port USB hubs, all ports filled with 4-5TB USB disks (Seagate, WD, Intenso, Toshiba...). With RPi3s, all 3 boxes had been working fine for ~2 years 24/7. After swapping to an RPi4 on one of them, errors like 'xhci_hcd 0000:01:00.0: HC died; cleaning up' show up either during boot or some time after.
|
Please check #5753 and see if it's the same. |
The same issue seems to happen here; I couldn't figure out why an SSD connected via a SATA-to-USB adapter was constantly disconnecting. One thing that might be of note is that it only started happening when I upgraded to Debian 12. lsusb -tv
sudo rpi-eeprom-update
uname -a
|
Thanks for the quick reply! An update: I get the following error shown in dmesg:
I will now be installing the following firmware. So far, after a reboot, all the containers come up while connected to a USB 3 port. uname -a
Unfortunately it didn't help; it crashed again only about half an hour later. dmesg:
|
I'm facing the same issue and have tried a lot of setups by now, including:
The SSD is a Kingston A400 240GB. USB 2.0 and quirks for USB 3 are working just fine; no disconnects experienced. My errors and logs are basically the same as above - happy to post any further info as needed |
I can confirm this. On Debian 11 I don't get this error and everything seems to work. This statement saved my life :) I had been digging into this issue for hours. |
Has anybody tried the workaround @Nullvoid3771 mentioned? I don't want to touch my setup since it's been stable for a while now. 😅 |
Polling and interrupts are distinct mechanisms in computer systems for managing communication and synchronization between devices. Polling requires continuously monitoring a device's status, whereas interrupts enable devices to signal the system when immediate attention is required. IRQ polling is more demanding, but it keeps your devices' status constantly checked, so there is no accidental disconnection should a device time out due to sleep states or bugs that force a disconnection. The other settings, usbcore.autosuspend=-1 on the kernel command line (GRUB) and systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target, are power-management settings to prevent your USB devices from sleeping. This is again more power demanding, but should keep them from timing out due to sleep states. I can't guarantee these will fix your issue, but as troubleshooting you could try them and just as easily undo them. Unfortunately, if the issue is within the BIOS, drivers, or the device itself, it might be hard to fix if you've already tried different cables and made sure the device is getting the right power. |
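A sketch of applying those power-management settings on a Raspberry Pi, where the kernel arguments live in cmdline.txt rather than GRUB; paths vary by OS release, so treat this as an outline rather than exact steps:

```bash
# keep USB devices from autosuspending: append to the single line in
# /boot/cmdline.txt (/boot/firmware/cmdline.txt on newer Raspberry Pi OS):
#   usbcore.autosuspend=-1
# prevent the system from ever entering a sleep state:
sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
```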
I was experiencing the same issue with a generic USB 3.0 to SATA enclosure and a Raspberry Pi 4 8GB. Now it works like a dream. |
Just note that using usb-storage.quirks= binds the usb-storage driver, which may have slower speeds.
:u = IGNORE_UAS (don't bind to the uas driver). The uas driver is the high-speed driver for storage; ignoring uas I believe drops speeds to USB 2.0 levels.
The different USB IDs are listed here: http://www.linux-usb.org/usb.ids
dwc_otg.lpm_enable=0 turns off Link Power Management (LPM).
dwc_otg.speed=1 will lock USB speeds to USB 1.0 speeds. |
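For completeness, this is roughly how that quirk is applied on Raspberry Pi OS; 174c:55aa is the bridge from the original report, so substitute your own VID:PID from lsusb:

```bash
# append to the single line in /boot/cmdline.txt (or /boot/firmware/cmdline.txt),
# then reboot; multiple devices can be listed comma-separated:
#   usb-storage.quirks=174c:55aa:u
# afterwards, `lsusb -t` should show Driver=usb-storage for the bridge
```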
Thank you for the feedback, and I agree. It was late and I was lazy, so I just put in everything there was without testing what exactly makes it work. I just ran a couple of tests, and it seems that if I just leave:
I can still boot the devices - before, it just got stuck during boot - but as you mentioned, speeds seem to be downgraded (write: ~130 MB/s, read: ~186 MB/s). I am guessing that it is still better than not being able to boot from the SSD at all. Is there a way to use
|
@RoswellCityUK So from what I can see you're using a Sabrent external dock to attach your SSD/HDD to your Pi. Sabrent uses its own chipset, which could be the problem:
usb-SABRENT_SABRENT_DB9876543214E-0:0
With multiple 4-bay drives they would usually appear like this:
usb-SABRENT_SABRENT_DB9876543214E-0:0
I'm a bit surprised it's telling us the disk model, because usually that's obscured. I believe the Sabrent external docks use SCSI LUNs to identify the drives. If you only have one drive, perhaps try a different external dock that is self-powered (not powered by the Pi's USB). I have a Sabrent dock, and I can't say that it's the issue per se; its stability with UAS on the system's PCIe chipset is fickle. I think the USB chipsets just suck in the Pi tbh (and in my server). A quick Google for "raspberry pi uas driver" shows a bunch of topics with the same issue. So in that respect, it working at all with lower speeds is a plus. But I'm not sure how to fully fix your issue. At most, for me, I've managed to make it happen less often by the steps I used and by limiting the two USB 3.0 ports to only using one port with the dock. For me, the USB 3.0 PCIe card would completely disconnect, requiring a reboot, so my issue probably isn't quite the same. |
udev handles hotplug events, so setting the uas driver and then retriggering udev might work; I don't know, I can't test it at the moment. However, if you're looking to manually trigger udev processing or reapply rules for already-connected devices after boot, you can (see the sketch below):
This should give you the same behavior as udev during the boot process but on demand after the system is already running. |
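The exact commands referred to above weren't captured in this thread; a minimal sketch of re-running udev processing on demand with udevadm:

```bash
sudo udevadm control --reload                             # re-read rules from disk
sudo udevadm trigger --subsystem-match=usb --action=add   # replay "add" events for USB devices
sudo udevadm settle                                       # wait for the event queue to drain
```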
Thank you for your response. I will test it later as I am a bit busy with work atm, but I have managed to test a slightly different USB-to-SATA adapter, and I get the same issue on a clean installation; setting quirks makes it bootable again, though.
The difference is that this time I was able to boot once, but the whole RasPi was behaving like it was throttling a lot. Normally booting takes less than a minute when using any SD card or USB flash drive, but with this it took around 15 minutes. It worked for a bit, I managed to connect over SSH, and then it just suddenly crashed and never booted fully again.
I am inclined to think that this is mainly caused by the power requirements, as indicated in the above logs. I will test further and let you know what the result was.
Also, if you notice it stops working, check your dmesg logs to see if it spits out an error, as it might provide some information as to what caused the issue. |
I also get errors in both UAS and usb-storage mode, using both the USB 2.0 port and the USB 3.0 port on the Raspberry Pi 4B. It keeps resetting and then disappears. Also, one time my NVMe drive got really hot; I have now read that this error behavior can even toast some drives when it fails really badly. I have an AXAGON USB-C 3.2 Gen 2 adapter with a Samsung 950 PRO 500GB in it, and the AXAGON appeared as some "JMicron":
Most likely this adapter is not good for Raspberry Pi 4. |
Just sharing my experience, maybe someone finds it useful: I also had these issues, and the solution was to use a different USB-to-SATA adapter/enclosure for my HDDs. Originally I bought some 4TB Seagate ("Maxtor M3 Portable") external HDDs and they came with their own USB-to-SATA adapters within the case, but these seemed very unreliable. USB device ID:
So I bought different USB-to-SATA adapters and found 2 that seem to be working.
^ these are also
I also tried a third type of USB enclosure, again from AXAGON, USB device ID:
So yeah, this is my experience; unfortunately these things seem to be very picky...
Edit: one more thing to add: Raspberry Pis may not be able to supply enough power to external HDDs via their USB ports, so I'm also using a USB Y cable (image link), which has 2 USB A ends, one plugged into the RPi and the other one into a dedicated power source (a travel power adapter with 3A current), so that the HDD can actually get enough power for its operation. |
Describe the bug
I have intermittent system crashes with the following setup:
The system boots fine, but crashes randomly after 6-24 hours.
dmesg (logs captured via the serial port) shows that the USB disk disconnects. I've run SSD speed tests with dd, successfully read the whole SSD at 300+ MB/s, and written 20+ GB into a test file without problems.
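Roughly the kind of dd test described (a sketch; the exact commands and the mount point are assumptions):

```bash
# sequential read of the whole device
sudo dd if=/dev/sda of=/dev/null bs=4M status=progress
# ~20 GB write into a file on the mounted filesystem
dd if=/dev/zero of=/mnt/ssd/ddtest bs=4M count=5120 conv=fsync status=progress
```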
I'm running a few services in Docker, like PiHole, UniFi Controller, Prometheus + Grafana, Blynk local server, Mosquitto MQTT server.
The SSD has an extra partition for ZFS in addition to the standard boot and rootfs partitions. That ZFS volume stores the Docker containers and their data.
My other Pi 4 configuration (8 GB, rev 1.5) uses the same USB 3.0 case, but it runs stably and has never had a crash like that. However, it writes less frequently to the SSD since it doesn't run Docker containers.
Steps to reproduce the behaviour
Unfortunately, there are no fixed reproduction steps. I've tried disabling USB autosuspend via cmdline.txt, but it didn't help. I've also tried a different power supply, without luck.
However, the system doesn't crash if I plug the USB disk into a USB 2.0 port.
Device(s)
Raspberry Pi 4 Mod. B
System
/etc/rpi-issue
vcgencmd version
uname -a
sudo rpi-eeprom-update
lsusb -tv
sudo lsusb -vd 174c:55aa
sudo lshw
sudo gdisk -l /dev/sda
Logs
Additional context
Based on the posts under #4930, the two issues may be connected.