Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate HCI Reset When Tearing Down and Re-Creating Transport #476

Closed
hkpeprah opened this issue Apr 26, 2024 · 13 comments
Closed

Duplicate HCI Reset When Tearing Down and Re-Creating Transport #476

hkpeprah opened this issue Apr 26, 2024 · 13 comments

Comments

@hkpeprah
Copy link
Contributor

Summary

I am seeing a weird issue where if I tear down a transport, then re-create it, that the device fails to power on. The following example reproduces it. When the second power_on() is invoked, an error is raised (given below), and the attempt fails.

import asyncio
import logging

import bumble.device
import bumble.transport


async def open_transport(transport_name):
    link = await bumble.transport.open_transport_or_link(transport_name)
    dev = bumble.device.Device.with_hci("bumble", "F0:F1:F2:F3:F4:F5", link.source, link.sink)
    await dev.power_on()
    return link, dev


async def close_transport(link, dev):
    await dev.power_off()
    await dev.host.reset()
    await link.close()


async def main():
    logging.basicConfig(level=logging.DEBUG)
    for _ in range(2):
        link, dev = await open_transport("usb:0a12:0001")
        asyncio.sleep(1)
        await close_transport(link, dev)
    print("Done")


if __name__ == "__main__":
    asyncio.run(main())

Error

DEBUG:bumble.transport.usb:USB Device: Bus 001 Device 048: ID 0a12:0001
DEBUG:bumble.transport.usb:selected endpoints: configuration=1, interface=0, setting=0, acl_in=0x82, acl_out=0x02, events_in=0x81,
DEBUG:bumble.transport.usb:auto-detaching kernel driver
DEBUG:bumble.transport.usb:current configuration = 1
DEBUG:bumble.transport.usb:starting USB event loop
DEBUG:bumble.drivers:Probing driver class: rtk
DEBUG:bumble.drivers.rtk:USB device (0A12, 0001) not in known list
DEBUG:bumble.drivers:Probing driver class: intel
DEBUG:bumble.drivers.intel:USB device (0A12, 0001) not in known list
DEBUG:bumble.host:### HOST -> CONTROLLER: HCI_RESET_COMMAND
DEBUG:bumble.host:### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 1
  command_opcode:          HCI_RESET_COMMAND
  return_parameters:       HCI_SUCCESS
DEBUG:bumble.host:### HOST -> CONTROLLER: HCI_READ_LOCAL_SUPPORTED_COMMANDS_COMMAND
DEBUG:bumble.host:### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 1
  command_opcode:          HCI_RESET_COMMAND
  return_parameters:       HCI_SUCCESS
WARNING:bumble.host:!!! command result mismatch, expected 0x1002 but got 0xC03
Traceback (most recent call last):
  File "simple.py", line 33, in <module>
    asyncio.run(main())
  File "/opt/pyenv/versions/3.8.13/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/pyenv/versions/3.8.13/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "simple.py", line 26, in main
    link, dev = await open_transport("usb:0a12:0001")
  File "simple.py", line 13, in open_transport
    await dev.power_on()
  File "/home/build/.venvs/lib/python3.8/site-packages/bumble/device.py", line 1799, in power_on
    await self.host.reset()
  File "/home/build/.venvs/lib/python3.8/site-packages/bumble/host.py", line 254, in reset
    response.return_parameters.supported_commands, 'little'
AttributeError: 'int' object has no attribute 'supported_commands'

Workaround

I found that adding a second HCI reset command here led to the failure going away: https://github.com/google/bumble/blob/main/bumble/host.py#L239

        # Send a reset command unless a driver has already done so.
        if reset_needed:
            await self.send_command(hci.HCI_Reset_Command(), check_result=True)
	    await self.send_command(hci.HCI_Reset_Command(), check_result=True)
            self.ready = True

Without the second invocation, it always fails. I also found that adding a sleep seems to workaround the issue as well:

        # Send a reset command unless a driver has already done so.
        if reset_needed:
            await self.send_command(hci.HCI_Reset_Command(), check_result=True)
            await asyncio.sleep(0.5)
            self.ready = True

Environment

  • Operating System: Ubuntu 20.04.6 LTS (Focal Fossa)
  • Bluetooth Dongle: Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode) (0a12:0001)
@hkpeprah
Copy link
Contributor Author

This also works:

	response = await self.send_command(
            hci.HCI_Read_Local_Supported_Commands_Command(), check_result=True
        )

        if isinstance(response.return_parameters, int):
            response = await self.send_command(
                hci.HCI_Read_Local_Supported_Commands_Command(), check_result=True
            )

        self.local_supported_commands = int.from_bytes(
            response.return_parameters.supported_commands, 'little'
        )
DEBUG:bumble.host:### HOST -> CONTROLLER: HCI_RESET_COMMAND
DEBUG:bumble.host:### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 1
  command_opcode:          HCI_RESET_COMMAND
  return_parameters:       HCI_SUCCESS
DEBUG:bumble.host:### HOST -> CONTROLLER: HCI_READ_LOCAL_SUPPORTED_COMMANDS_COMMAND
DEBUG:bumble.host:### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 1
  command_opcode:          HCI_RESET_COMMAND
  return_parameters:       HCI_SUCCESS
WARNING:bumble.host:!!! command result mismatch, expected 0x1002 but got 0xC03
DEBUG:bumble.host:### HOST -> CONTROLLER: HCI_READ_LOCAL_SUPPORTED_COMMANDS_COMMAND
DEBUG:bumble.host:### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 1
  command_opcode:          HCI_READ_LOCAL_SUPPORTED_COMMANDS_COMMAND
  return_parameters:
    status:             HCI_SUCCESS

@hkpeprah
Copy link
Contributor Author

Changing open_transport() to the following seems to solve my issue:

async def open_transport(transport_name):
    link = await bumble.transport.open_transport_or_link(transport_name)
    dev = bumble.device.Device.with_hci("bumble", "F0:F1:F2:F3:F4:F5", link.source, link.sink)
    await dev.host.send_command(bumble.hci.HCI_Reset_Command(), check_result=True)
    await dev.power_on()
    return link, dev

I can workaround with this for the time being.

@barbibulle
Copy link
Collaborator

This is quite odd. I tried to replicate this on a USB controller, without success (i.e this exact code sequence works fine for me). What's peculiar in your case is that the controller seems to be sending a stale or extra response, so the HCI_READ_LOCAL_SUPPORTED_COMMANDS_COMMAND command gets the a response for HCI_RESET_COMMAND, which of course won't work.
It's also puzzling that if you reset twice (as in your workaround above, where you have an explicit reset before power_on, which has its own reset), you don't get this behavior anymore.
I'll try with different USB controllers to see if I can replicate this problem on one of them. If not, I'll try to find the same controller as the one you're using.

@barbibulle
Copy link
Collaborator

Followup: I did find a USB dongle that identifies with the same USB identifiers (0a12:0001). It may be a from a different manufacturer (mine reports the product name to be USB2.0-BT, the HCI-level info says Barrot Technology Co.,Ltd, and the dongle has nothing written on it, so it's hard to tell exactly who made it).
With that dongle, I observe a different issue: from a cold start, it seems to work fine. On a warm start (for example, launching an application a second time after the first one ran and closed the device), it hangs at the USB level (the RESET command is sent, but nothing is every received back from the dongle). Exiting out of that state comes back to a working state, and so on. So it works every other time.
I've love to try with more devices, to see if this is just a quirk of my specific no-name-brand device, or a pattern that exists across multiple devices. Do you know which manufacturer/model your USB dongle is?
(I also have the suspicion that the USB identifier 0a12:0001 may be used incorrectly by low-cost USB dongle makers. 0a12 is supposed to be the identifier for Cambridge Silicon Radio Ltd, but I'm pretty sure that's not who made the dongle I have)

@hkpeprah
Copy link
Contributor Author

hkpeprah commented Apr 27, 2024

I also have the suspicion that the USB identifier 0a12:0001 may be used incorrectly by low-cost USB dongle makers. 0a12 is supposed to be the identifier for Cambridge Silicon Radio Ltd, but I'm pretty sure that's not who made the dongle I have.

I've seen this with a lot of dongles. I use a TP-Link UB400 on my Mac machine, which also reports its Vendor ID as 0a12 and Product ID as 0001. Oddly enough, the TP-Link UB500 reports its Vendor ID as 2357 (the TP-Link vendor ID).

I think the CSR we have hooked up to the Linux machine is this one from Adafruit, but I'm honestly not sure. It has no markings whatsoever besides CSR 4.0, which a lot of the random ones from Amazon have as well.

Edit: I will double-check to see what the product name is that it reports.

@hkpeprah
Copy link
Contributor Author

The name it reports is CSR BLE.

@hkpeprah
Copy link
Contributor Author

Found another one that is in use: https://www.amazon.com/dp/B084H8R9PB
This should be the one that results in the double HCI reset being necessary.

@hkpeprah
Copy link
Contributor Author

hkpeprah commented May 23, 2024

I'm seeing a different issue with this BLE 5 dongle on Linux: Edimax (Amazon Link)

[D 2024-05-23 21:06:52.045 __init__:get_driver_for_host:59] Probing driver class: rtk
[D 2024-05-23 21:06:52.045 host:send_hci_packet:511] �[34m### HOST -> CONTROLLER�[0m: �[32mHCI_RESET_COMMAND�[0m
[D 2024-05-23 21:06:52.055 host:on_packet:633] reset not done, ignoring packet from controller
[D 2024-05-23 21:07:02.057 usb:terminate:302] waiting for IN[4] transfer cancellation to be done...
[D 2024-05-23 21:07:02.057 usb:terminate:307] IN[4] transfer cancellation done
[D 2024-05-23 21:07:02.060 usb:terminate:302] waiting for IN[2] transfer cancellation to be done...
[D 2024-05-23 21:07:02.060 usb:run:346] USB event loop done
[D 2024-05-23 21:07:02.060 usb:terminate:307] IN[2] transfer cancellation done

This is after tearing down and re-creating the transport, similar to above. Seems to stall in power_on() (I put a limit of 10s for waiting for the call to return):

    await asyncio.wait_for(dev.power_on(), timeout=self.DEFAULT_POWER_ON_TIMEOUT)
  File "/opt/pyenv/versions/3.8.13/lib/python3.8/asyncio/tasks.py", line 501, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

@barbibulle
Copy link
Collaborator

I think this controller has a Realtek chip. It may work without firmware loading, but I'd recommend trying to download the Realtek firmware for it, and putting it in a place where the Bumble library can find it and load it (see https://google.github.io/bumble/drivers/realtek.html ). One thing that I've noticed with Realtek-based dongles (as well as others), is that they're not happy when you close the USB connection to them without a reset. But it seems that in your case you are performing a reset before closing, can you confirm? I will get one of those dongles from Amazon and try on my side.

@hkpeprah
Copy link
Contributor Author

hkpeprah commented May 24, 2024

I downloaded the drivers from here: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/rtl_bt/, and pointed by BUMBLE_RTK_FIRMWARE_DIR environment variable to the directory. I also tried the drivers from the Edimax website: https://www.edimax.com/edimax/download/download/data/edimax/global/download/for_home/wireless_adapters/bluetooth/bt-8500. I modified the script slightly:

import asyncio
import logging
import sys

import bumble.device
import bumble.hci
import bumble.transport


async def open_transport(transport_name):
    link = await bumble.transport.open_transport_or_link(transport_name)
    dev = bumble.device.Device.with_hci("bumble", "F0:F1:F2:F3:F4:F5", link.source, link.sink)
    await dev.power_on()
    return link, dev


async def scan(dev):
    await dev.start_scanning(legacy=True, filter_duplicates=True)
    await asyncio.sleep(2)
    await dev.stop_scanning(legacy=True)


async def close_transport(link, dev):
    await dev.host.send_command(bumble.hci.HCI_Reset_Command(), check_result=True)
    await dev.power_off()
    await link.close()


async def main(args=None):
    if args is None:
        args = sys.argv[1:]

    transport_name = args[0] if len(args) else "usb:0a12:0001"
    logging.basicConfig(
        level=logging.DEBUG,
        format="[%(asctime)s] %(levelname)s %(name)s: %(message)s",
        datefmt="%H:%M:%S"
    )
    for _ in range(10):
        link, dev = await open_transport(transport_name)
        await scan(dev)
        await close_transport(link, dev)
    print("Done")


if __name__ == "__main__":
    asyncio.run(main())

But still fails in a similar fashion (using the Edimax):

[19:08:13] DEBUG bumble.host: ### HOST -> CONTROLLER: HCI_RESET_COMMAND
[19:08:13] DEBUG bumble.host: ### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 3
  command_opcode:          HCI_RESET_COMMAND
  return_parameters:       HCI_SUCCESS
[19:08:13] DEBUG bumble.host: ### HOST -> CONTROLLER: HCI_READ_LOCAL_VERSION_INFORMATION_COMMAND
[19:08:13] DEBUG bumble.host: ### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 3
  command_opcode:          HCI_RESET_COMMAND
  return_parameters:       HCI_SUCCESS
[19:08:13] WARNING bumble.host: !!! command result mismatch, expected 0x1001 but got 0xC03

@hkpeprah
Copy link
Contributor Author

I did some more debugging, and found that sometimes the result of the HCI reset is something like this:

[D 2024-05-25 01:52:44.621 __init__:get_driver_for_host:59] Probing driver class: rtk
[D 2024-05-25 01:52:44.621 rtk:driver_info_for_host:495] getting RTK driver info for host
[D 2024-05-25 01:52:44.621 host:send_hci_packet:511] ### HOST -> CONTROLLER: HCI_RESET_COMMAND
[D 2024-05-25 01:52:44.621 usb:on_packet:138] packet queue length: 1
[D 2024-05-25 01:52:44.621 usb:process_queue:174] ACL transfer Control
[D 2024-05-25 01:52:44.630 usb:transfer_callback:261] transfer callback: status=0
[D 2024-05-25 01:52:44.630 usb:transfer_callback:265] packet_type=4
[D 2024-05-25 01:52:44.630 usb:transfer_callback:266] length=158
[D 2024-05-25 01:52:44.631 usb:transfer_callback:267] buffer=bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\

Where the transfer.getBuffer() is 158 instances of 0x00. The packet data is passed up to on_packet, which calls hci.HCI_Packet.from_bytes() and throws a ValueError:

    File "/home/ford/github/bumble/bumble/hci.py", line 2022, in from_bytes
      return HCI_Event.from_bytes(packet)
    File "/home/ford/github/bumble/bumble/hci.py", line 4732, in from_bytes
      raise ValueError('invalid packet length')
  ValueError: invalid packet length

@hkpeprah
Copy link
Contributor Author

I managed to capture two of the packets I saw after an HCI reset command:

+# -----------------------------------------------------------------------------
+def test_parser_multiple_packets():
+    sink = Sink()
+    parser = PacketParser(sink)
+
+    data = [
+        0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0E, 0x04, 0x03,
+        0x03, 0x0C, 0x00
+    ]
+    parser.feed_data(data)
+    assert len(sink.packets) == 1
+    assert sink.packets[0] == b'\x04\x0e\x04\x03\x03\x0c\x00'
+
+    data = [
+        0x04, 0x0E, 0xFC, 0x02, 0x14, 0x0C, 0x00, 0x52, 0x54, 0x4B, 0x5F, 0x42,
+        0x54, 0x5F, 0x35, 0x2E, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+        0x00, 0x00, 0x00, 0x00, 0x00, 0x0E, 0x04, 0x03, 0x03, 0x0C, 0x00
+    ]
+    sink.packets = []
+    parser.feed_data(data)
+    assert len(sink.packets) == 1
+    assert sink.packets[0] == b'\x04\x0e\x04\x03\x03\x0c\x00'

@hkpeprah
Copy link
Contributor Author

Closing as this is resolved with 0.0.199. Thanks again @barbibulle .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants