Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird crash (Unknown term type: 100) #1346

Open
aiotter opened this issue Oct 18, 2024 · 21 comments
Open

Weird crash (Unknown term type: 100) #1346

aiotter opened this issue Oct 18, 2024 · 21 comments

Comments

@aiotter
Copy link
Contributor

aiotter commented Oct 18, 2024

I found that the following code often (or more than often, as it seems 100%) crashes with unknown term type error.
Seems really weird because only latest is in abnormal state, but state, in which latest exist, looks like a normal map.

defmodule Modem.PortManager do
  defmodule State do
    defstruct buffer: [""], lock: nil
  end

  @impl :gen_server
  def init(_config) do
    pid = self()
    port = Utils.get_modem_port()
    spawn_link(fn -> loop(pid, port) end)
    {:ok, %State{}}
  end

  defp loop(pid, port) do
    {:ok, read} = :uart.read(port)
    :gen_server.cast(pid, {:append, read})
    loop(pid, port)
  end

  @impl :gen_server
  def handle_cast({:append, new}, %State{buffer: [latest | rest]} = state) when is_binary(new) do
    {current, next} =
      case :binary.split(new, "\r\n") do
        [unfinished_line] -> {unfinished_line, nil}
        [line, ""] -> {line <> "\r\n", nil}
        [line, rest] -> {line <> "\r\n", rest}
      end

    # THIS LINE CRASHES!
    if(!is_binary(latest), do: :erlang.error({:latest, latest, state}))

    buffer =
      case latest do
        <<_::binary-size(byte_size(latest) - 2), "\r\n">> -> [current, latest | rest]
        _ -> [latest <> current | rest]
      end

    if next == nil do
      {:noreply, %State{state | buffer: buffer}}
    else
      handle_cast({:append, next}, %State{state | buffer: buffer})
    end
  end

...

end
CRASH
======
pid: <0.4.0>

Stacktrace:
[{Elixir.Modem.PortManager,handle_cast,2,[]},{esp32init,start,0,[{file,"/atomvm/libs/esp32boot/esp32init.erl"},{line,26}]},{esp32init,undefined,0,[{file,"/atomvm/libs/esp32boot/esp32init.erl"},{line,undefined}]},{gen_server,loop,2,[{file,"/atomvm/libs/estdlib/src/gen_server.erl"},{line,539}]}]

cp: #CP<module: 6, label: 110, offset: 24>

x[0]: error
x[1]: {latest,Unknown term type: 100,#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<13,10>>],lock=><0.11.0>}}
x[2]: {4,3,75,2,[{6,3153},{0,0},{0,25},{9,1756}],error}

Stack
------

#CP<module: 0, label: 2, offset: 15>
#CP<module: 0, label: 0, offset: 1008206404>
[<<13,10>>]
#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<13,10>>],lock=><0.11.0>}
#CP<module: 6, label: 110, offset: 24>
[]
[]
[]
#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.11.0>}
[]
{state,modem_manager,Elixir.Modem.PortManager,#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.11.0>}}
#CP<module: 6, label: 138, offset: 0>


Mailbox
--------
{$call,{<0.11.0>,#Ref<0.0.0.427>},read_line}


Monitors
--------
<0.6.0> <---> <0.4.0>
<0.11.0> ---> <0.4.0>


**End Of Crash Report**
bug: found unknown term type: 0x64
Abort in file /atomvm/src/libAtomVM/memory.c at line 560

abort() was called at PC 0x4205ca8b on core 0


Backtrace: 0x40375f7a:0x3fcab8a0 0x403805b5:0x3fcab8c0 0x4038976d:0x3fcab8e0 0x4205ca8b:0x3fcab950 0x4205bf35:0x3fcab9a0 0x4205bfbd:0x3fcab9c0 0x4205a576:0x3fcab9e0 0x4205a67d:0x3fcaba10 0x4208d505:0x3fcaba60 0x4205a263:0x3fcaba80 0x4205a30d:0x3fcabb30 0x420105e1:0x3fcabb50 0x4213d60f:0x3fcabb90 0x403811d9:0x3fcabbc0

When it crashes, the term type number is always the same (100).
I tried to move if(!is_binary(latest), do: :erlang.error({:latest, latest, state})) from current position to the top of the function, then the current error disappears. Instead, I got badarg error.

CRASH
======
pid: <0.4.0>

Stacktrace:
[{Elixir.Modem.PortManager,handle_cast,2,[]},{esp32init,start,0,[{file,"/atomvm/libs/esp32boot/esp32init.erl"},{line,26}]},{esp32init,undefined,0,[{file,"/atomvm/libs/esp32boot/esp32init.erl"},{line,undefined}]},{gen_server,loop,2,[{file,"/atomvm/libs/estdlib/src/gen_server.erl"},{line,539}]}]

cp: #CP<module: 6, label: 110, offset: 24>

x[0]: error
x[1]: badarg
x[2]: {4,3,75,2,[{6,3153},{0,0},{0,25},{9,1506}],error}

Stack
------

#CP<module: 0, label: 2, offset: 15>
#CP<module: 0, label: 0, offset: 1008206404>
[<<13,10>>]
#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<13,10>>],lock=><0.9.0>}
#CP<module: 6, label: 110, offset: 24>
[]
[]
[]
#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.9.0>}
[]
{state,modem_manager,Elixir.Modem.PortManager,#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.9.0>}}
#CP<module: 6, label: 138, offset: 0>


Mailbox
--------
{$call,{<0.9.0>,#Ref<0.0.0.498>},read_line}


Monitors
--------
<0.6.0> <---> <0.4.0>
<0.9.0> ---> <0.4.0>


**End Of Crash Report**

After adding rescue clause, I cannot reproduce the error any more. I don't know why at all.

  @impl :gen_server
  def handle_cast({:append, new}, %State{buffer: [latest | rest]} = state) when is_binary(new) do
...
  rescue
    ArgumentError ->
      :erlang.error(:badarg)
  end

Actually, I found the badarg error at first. Trying hard to find out what causes the error, I found that the rescue clause clears the error. I tried more to find out why it removes the error, I happened to find bug: found unknown term type: 0x64.

Sorry that I cannot show all the project because the code includes some confidential information.
The handle_cast function above receives every message from modem via UART. Sometimes it is called extremely frequently.

@petermm
Copy link
Contributor

petermm commented Oct 18, 2024

this code is weird:

    # THIS LINE CRASHES!
    if(!is_binary(latest), do: :erlang.error({:latest, latest, state}))

    buffer =
      case latest do
        <<_::binary-size(byte_size(latest) - 2), "\r\n">> -> [current, latest | rest]
        _ -> [latest <> current | rest]
      end

    if next == nil do
      {:noreply, %State{state | buffer: buffer}}
    else
      handle_cast({:append, next}, %State{state | buffer: buffer})
    end

if(!is_binary(latest), do: :erlang.error({:latest, latest, state})) isn't a return and your code continues, so if latest is not a binary, not only is erlang:error called, but you continue just below with binary stuff..

I suggest:

  if is_binary(latest) do
    buffer =
      case latest do
        <<_::binary-size(byte_size(latest) - 2), "\r\n">> -> [current, latest | rest]
        _ -> [latest <> current | rest]
      end

    if next == nil do
      {:noreply, %State{state | buffer: buffer}}
    else
      handle_cast({:append, next}, %State{state | buffer: buffer})
    end
  else
    :erlang.error({:latest, latest, state})
  end

underlying error I assume to be that you are receiving a bitstring, and not a binary, which is why I assume :erlang.error is crashing.. (I believe bitstring support is not complete #154)

@aiotter
Copy link
Contributor Author

aiotter commented Oct 19, 2024

if(!is_binary(latest), do: :erlang.error({:latest, latest, state})) isn't a return and your code continues, so if latest is not a binary, not only is erlang:error called, but you continue just below with binary stuff..

Is it? In my understanding, :erlang.error/1 raises error type, so evaluating it raises the exception immediately and the following code will not even be evaluated, no matter it crashes or not.

I don't think latest is non-binary bitstring because the buffer is only added by handle_cast({:append, new}, _). As you can see in my code, new is a binary, so latest must be a binary too.

I tried the is_binary(latest) guard to make it sure:

@impl :gen_server
def handle_cast({:append, new}, %State{buffer: [latest | rest]} = state) when is_binary(new) and is_binary(latest) do
  ...

If is_binary(latest) is false, then calling handle_cast above must raise function_clause, however, it actually raises badarg.

@petermm
Copy link
Contributor

petermm commented Oct 19, 2024

You are probably right, not familiar with erlang:error, and never seen this coding style in elixir land.

But think it's needs investigation if a codepath that is never called is crashing..

I'm thinking OOM then maybe? but could be anything, there seems to a very tight loop reading uart..

@aiotter
Copy link
Contributor Author

aiotter commented Oct 19, 2024

Me either. I seldom use raise/throw system on Elixir. The code above is just for debugging.

The reason why I use loop function for reading from UART is because when the reading process dies, we lost control for the port. After this happens, the first try to call :uart.read returns an error :ealready.

:uart.read blocks forever when there is no incoming message. So I don't think you have to worry about the loop being too tight. Actually on one loop, we see a chunk of data like 100 bytes is read from UART, instead of being read only one character at a time.

@UncleGrumpy
Copy link
Collaborator

UncleGrumpy commented Oct 19, 2024

[line, ""] -> {line <> "\r\n", nil}

I am still somewhat a novice to the BEAM, and especially Elixir, but I believe this line is incorrect. Shouldn't you be matching against [line, <<"">>]? I honestly can't tell you why I think this would lead to the error you are seeing, but I do notice that #{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.11.0>} is sitting is the stack below what looks like current term [<<13,10>>].

Please feel free to take this with a grain of salt, I may be entirely incorrect here. I frequently get confused by subtle differences between Erlang and Elixir ;-)

@pguyot
Copy link
Collaborator

pguyot commented Oct 19, 2024

@UncleGrumpy strings in Elixir (as in literals between ") are Erlang binaries. Erlang strings (as in lists of integers) are represented with single quotes or with a sigil.

@UncleGrumpy
Copy link
Collaborator

@pguyot thanks. I need a poster sized cheat sheet of all those common differences to hang on the wall beside me 😂

@aiotter
Copy link
Contributor Author

aiotter commented Oct 20, 2024

I got some additional results. Let me describe the problem again.

This code crashes with abort and the message bug: found unknown term type: 0x0 (yes, now 0x0 instead of 0x64):

  def handle_cast({:append, new}, state) when is_binary(new) do
    %State{buffer: [latest | rest]} = state

    {current, next} =
      case :binary.split(new, "\r\n") do
        [unfinished_line] -> {unfinished_line, nil}
        [line, ""] -> {line <> "\r\n", nil}
        [line, rest] -> {line <> "\r\n", rest}
      end

    if(!is_binary(latest), do: :erlang.error({:latest, latest, state}))

    ...

  end
crash report
CRASH
======
pid: <0.4.0>

Stacktrace:
[{Elixir.Modem.PortManager,handle_cast,2,[]},{esp32init,start,0,[{file,"/atomvm/libs/esp32boot/esp32init.erl"},{line,26}]},{esp32init,undefined,0,[{file,"/atomvm/libs/esp32boot/esp32init.erl"},{line,undefined}]},{gen_server,loop,2,[{file,"/atomvm/libs/estdlib/src/gen_server.erl"},{line,539}]}]

cp: #CP<module: 6, label: 110, offset: 24>

x[0]: error
x[1]: {latest,Unknown term type: 0,#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<13,10>>],lock=><0.9.0>}}
x[2]: {4,3,75,2,[{6,3153},{0,0},{0,25},{9,1750}],error}

Stack
------

#CP<module: 0, label: 2, offset: 15>
#CP<module: 0, label: 0, offset: 1008206404>
[<<13,10>>]
#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<13,10>>],lock=><0.9.0>}
#CP<module: 6, label: 110, offset: 24>
[]
[]
[]
#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.9.0>}
[]
{state,modem_manager,Elixir.Modem.PortManager,#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.9.0>}}
#CP<module: 6, label: 138, offset: 0>


Mailbox
--------
{$call,{<0.9.0>,#Ref<0.0.0.416>},read_line}


Monitors
--------
<0.6.0> <---> <0.4.0>
<0.9.0> ---> <0.4.0>


**End Of Crash Report**
bug: found unknown term type: 0x0
Abort in file /atomvm/src/libAtomVM/memory.c at line 560

abort() was called at PC 0x4205ca8b on core 1


Backtrace: 0x40375f7a:0x3fcc0650 0x403805b5:0x3fcc0670 0x4038976d:0x3fcc0690 0x4205ca8b:0x3fcc0700 0x4205bf35:0x3fcc0750 0x4205bfbd:0x3fcc0770 0x4205a576:0x3fcc0790 0x4205a67d:0x3fcc07c0 0x4208d505:0x3fcc0810 0x4205a263:0x3fcc0830 0x4200fe47:0x3fcc08e0 0x42008078:0x3fcc0900 0x403811d9:0x3fcc0920




ELF file SHA256: 07cef789e

Besides this code just raises badarg:

  def handle_cast({:append, new}, state) when is_binary(new) do
    %State{buffer: [latest | rest]} = state
    if(!is_binary(latest), do: :erlang.error({:latest, latest, state}))

    {current, next} =
      case :binary.split(new, "\r\n") do
        [unfinished_line] -> {unfinished_line, nil}
        [line, ""] -> {line <> "\r\n", nil}
        [line, rest] -> {line <> "\r\n", rest}
      end

    # commented out
    # if(!is_binary(latest), do: :erlang.error({:latest, latest, state}))

    ...

  end
crash report
CRASH
======
pid: <0.4.0>

Stacktrace:
[{Elixir.Modem.PortManager,handle_cast,2,[]},{esp32init,start,0,[{file,"/atomvm/libs/esp32boot/esp32init.erl"},{line,26}]},{esp32init,undefined,0,[{file,"/atomvm/libs/esp32boot/esp32init.erl"},{line,undefined}]},{gen_server,loop,2,[{file,"/atomvm/libs/estdlib/src/gen_server.erl"},{line,539}]}]

cp: #CP<module: 6, label: 110, offset: 24>

x[0]: error
x[1]: badarg
x[2]: {4,3,75,2,[{6,3153},{0,0},{0,25},{9,1502}],error}

Stack
------

#CP<module: 0, label: 2, offset: 15>
#CP<module: 0, label: 0, offset: 1008206404>
[<<13,10>>]
#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<13,10>>],lock=><0.9.0>}
#CP<module: 6, label: 110, offset: 24>
[]
[]
[]
#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.9.0>}
[]
{state,modem_manager,Elixir.Modem.PortManager,#{__struct__=>Elixir.Modem.PortManager.State,buffer=>[<<"">>],lock=><0.9.0>}}
#CP<module: 6, label: 138, offset: 0>


Mailbox
--------
{$call,{<0.9.0>,#Ref<0.0.0.501>},read_line}


Monitors
--------
<0.6.0> <---> <0.4.0>
<0.9.0> ---> <0.4.0>


**End Of Crash Report**

So it's really weird but that case clause is affecting. Whether you parse the map in the function clause or at the top of the function also matters (it decides the term type to be 0x64 or 0x0).

Btw, %State{buffer: buffer} = state; [latest | rest] = buffer also crashes with Unknown term type: 0.
You can get the same output with [latest | rest] = Map.get(state, :buffer) and [latest | rest] = :maps.get(:buffer, state).

There seems no chance that :uart.read(port) returns non-binary bitstring because the following code never raise an error.

  defp loop(pid, port) do
    {:ok, read} = :uart.read(port)
    if(!is_binary(read), do: :erlang.error({:read, read}))
    :gen_server.cast(pid, {:append, read})
    loop(pid, port)
  end

@pguyot
Copy link
Collaborator

pguyot commented Oct 20, 2024

Could you please confirm which version of AtomVM you are using?

This is a bug in the VM indeed, some lack of support of encoded terms that are generated by Elixir. I did fix a similar issue recently.

@aiotter
Copy link
Contributor Author

aiotter commented Oct 20, 2024

I'm using forked version of AtomVM (https://github.com/realglobe-Inc/AtomVM/ at f796f0d), which is based on release-0.6 branch.
I'll update the VM to v0.6.5 and see what happens.

@aiotter
Copy link
Contributor Author

aiotter commented Oct 20, 2024

Rebuilt the VM v0.6.5 and tried again now. The issue with bug: found unknown term type: 0x64 still exist.

Boot up log, which includes the VM version
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0xc (RTC_SW_CPU_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x4037d476
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce2810,len:0x178c
load:0x403c8700,len:0x4
load:0x403c8704,len:0xcb8
load:0x403cb700,len:0x2db0
entry 0x403c8914
I (26) boot: ESP-IDF v5.3.1 2nd stage bootloader
I (26) boot: compile time Oct 20 2024 09:27:22
I (26) boot: Multicore bootloader
I (29) boot: chip revision: v0.2
I (33) boot.esp32s3: Boot SPI Speed : 80MHz
I (38) boot.esp32s3: SPI Mode       : DIO
I (43) boot.esp32s3: SPI Flash Size : 4MB
I (47) boot: Enabling RNG early entropy source...
I (53) boot: Partition Table:
I (56) boot: ## Label            Usage          Type ST Offset   Length
I (64) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (71) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (78) boot:  2 factory          factory app      00 00 00010000 001c0000
I (86) boot:  3 boot.avm         RF data          01 01 001d0000 00080000
I (93) boot:  4 main.avm         RF data          01 01 00250000 00100000
I (101) boot: End of partition table
I (105) esp_image: segment 0: paddr=00010020 vaddr=3c140020 size=3664ch (222796) map
I (154) esp_image: segment 1: paddr=00046674 vaddr=3fc9e500 size=04e44h ( 20036) load
I (158) esp_image: segment 2: paddr=0004b4c0 vaddr=40374000 size=04b58h ( 19288) load
I (164) esp_image: segment 3: paddr=00050020 vaddr=42000020 size=13e740h (1304384) map
I (403) esp_image: segment 4: paddr=0018e768 vaddr=40378b58 size=158e0h ( 88288) load
I (423) esp_image: segment 5: paddr=001a4050 vaddr=600fe000 size=00100h (   256) load
I (424) esp_image: segment 6: paddr=001a4158 vaddr=600fe100 size=00008h (     8) load
I (439) boot: Loaded app from partition at offset 0x10000
I (440) boot: Disabling RNG early entropy source...
I (452) cpu_start: Multicore app
I (461) cpu_start: Pro cpu start user code
I (461) cpu_start: cpu freq: 160000000 Hz
I (461) app_init: Application information:
I (464) app_init: Project name:     atomvm-esp32
I (469) app_init: App version:      1
I (474) app_init: Compile time:     Oct 20 2024 09:32:57
I (480) app_init: ELF file SHA256:  6310d2b3e...
I (485) app_init: ESP-IDF:          v5.3.1
I (490) efuse_init: Min chip rev:     v0.0
I (495) efuse_init: Max chip rev:     v0.99
I (500) efuse_init: Chip rev:         v0.2
I (504) heap_init: Initializing. RAM available for dynamic allocation:
I (512) heap_init: At 3FCA7E90 len 00041880 (262 KiB): RAM
I (518) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (524) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (530) heap_init: At 600FE108 len 00001EE0 (7 KiB): RTCRAM
I (537) spi_flash: detected chip: gd
I (540) spi_flash: flash io: dio
W (544) spi_flash: Detected size(16384k) larger than the size in the binary image header(4096k). Using the size in the binary image header.
W (558) i2c: This driver is an old driver, please migrate your application code to adapt `driver/i2c_master.h`
I (569) sleep: Configure to isolate all GPIO pins in sleep state
I (575) sleep: Enable automatic switching of GPIO sleep configuration
I (583) main_task: Started on CPU0
I (593) main_task: Calling app_main()

    ###########################################################

       ###    ########  #######  ##     ## ##     ## ##     ##
      ## ##      ##    ##     ## ###   ### ##     ## ###   ###
     ##   ##     ##    ##     ## #### #### ##     ## #### ####
    ##     ##    ##    ##     ## ## ### ## ##     ## ## ### ##
    #########    ##    ##     ## ##     ##  ##   ##  ##     ##
    ##     ##    ##    ##     ## ##     ##   ## ##   ##     ##
    ##     ##    ##     #######  ##     ##    ###    ##     ##

    ###########################################################

I (643) AtomVM: Starting AtomVM revision 0.6.5
I (653) sys: Loaded BEAM partition boot.avm at address 0x1d0000 (size=524288 bytes)
I (673) network_driver: Initialized network interface
I (673) network_driver: Created default event loop
I (683) pp: pp rom version: e7ae62f
I (683) net80211: net80211 rom version: e7ae62f
I (693) wifi:wifi driver task: 3fcb4948, prio:23, stack:6656, core=0
I (693) wifi:wifi firmware version: ccaebfa
I (693) wifi:wifi certification version: v7.0
I (693) wifi:config NVS flash: enabled
I (703) wifi:config nano formating: disabled
I (703) wifi:Init data frame dynamic rx buffer num: 32
I (713) wifi:Init static rx mgmt buffer num: 5
I (713) wifi:Init management short buffer num: 32
I (713) wifi:Init dynamic tx buffer num: 32
I (723) wifi:Init static tx FG buffer num: 2
I (723) wifi:Init static rx buffer size: 1600
I (733) wifi:Init static rx buffer num: 10
I (733) wifi:Init dynamic rx buffer num: 32
I (733) wifi_init: rx ba win: 6
I (743) wifi_init: accept mbox: 6
I (743) wifi_init: tcpip mbox: 32
I (753) wifi_init: udp mbox: 6
I (753) wifi_init: tcp mbox: 6
I (753) wifi_init: tcp tx win: 5760
I (763) wifi_init: tcp rx win: 5760
I (763) wifi_init: tcp mss: 1440
I (773) wifi_init: WiFi IRAM OP enabled
I (773) wifi_init: WiFi RX IRAM OP enabled
I (783) phy_init: phy_version 680,a6008b2,Jun  4 2024,16:41:10
I (823) wifi:ic_enable_sniffer
I (823) AtomVM: Found startup beam esp32init.beam
I (823) AtomVM: Starting esp32init.beam...
---
AtomVM init.

@pguyot
Copy link
Collaborator

pguyot commented Oct 20, 2024

Thank you. Would you mind sharing a minimal example or maybe your .avm?
I tried from your source code to reproduce the crash but failed, even with an ESP32-S3.

@aiotter
Copy link
Contributor Author

aiotter commented Oct 20, 2024

I'm sorry that I cannot provide the whole avm file because part of the logic is related to our business.
I may be able to remove all the bussiness-related code and build a minimal avm file later if you need.

I'm trying to make a minimal reproductive code for the issue, but I'm keep failing.
It's because usually the code works without error, except for a very particular case. I can reproduce the issue 100% when I try to query the result of HTTP GET action from the modem, and the result is fed to UART.
I tried to dump the binary from :uart.read and feed it to handle_cast. But it succeeds all the time without error.

Is there a good way to dump the term in problem so that I can make a simple code to reproduce the issue? I tried :erlang.term_to_binary(latest), but evaluating it crashes immediately with Unknown external term type: 100. I also tried :erlang.term_to_binary(state) and got Unknown external term type: 147.

@pguyot
Copy link
Collaborator

pguyot commented Oct 20, 2024

What you describe looks more and more like a memory corruption issue, which may or may not be related to uart.
You can try to enable DEBUG_HEAP_ALLOC in memory.h in your VM build to see if something pops in.

@aiotter
Copy link
Contributor Author

aiotter commented Oct 20, 2024

Hmm I tried #define DEBUG_HEAP_ALLOC however nothing appeared 😢

@pguyot
Copy link
Collaborator

pguyot commented Oct 20, 2024

esp32 uart_driver seems a little bit broken indeed, especially with multicore (such as S3). Data may arrive through interrupts as well as through calls to read.

Do you want to confirm the bug goes away if you set the schedulers count to 1?

Erlang:

erlang:system_flag(schedulers_online, 1)

Elixir:

:erlang.system_flag(:schedulers_online, 1)

@aiotter
Copy link
Contributor Author

aiotter commented Oct 20, 2024

I tried :erlang.system_flag(:schedulers_online, 1), however the issue still exist.

I noticed that the following code invokes the issue as soon as they call handle_cast at the first time:

  def handle_cast({:append, new}, %State{buffer: [latest | rest]} = state) when is_binary(new) do
    IO.inspect(:erlang.term_to_binary(state))
    ...
  end

I may be able to create the minimal reproductive code using this.

@aiotter
Copy link
Contributor Author

aiotter commented Oct 20, 2024

I found that the following code always crashes, but I'm not sure that is related to the issue or not. It seems another unrelated issue.

defmodule Entrypoint do
  defmodule State do
    defstruct pid: nil
  end

  def start() do
    state = %State{pid: self()}
    IO.inspect(:erlang.term_to_binary(state))
  end
end
Unknown external term type: 19

It seems that AtomVM crashes when you apply term_to_binary/1 to pid or collection in which pid locates.

@aiotter
Copy link
Contributor Author

aiotter commented Oct 21, 2024

I tried to re-implement binary:split/2 by myself and the issue is now gone.
Is there a possibility of a bug in the Nif of binary:split/2, which accidentally rewrite the region of memory outside of the term? I was unable to find such awful bug though...

  def handle_cast({:append, new}, %State{buffer: [latest | rest]} = state) when is_binary(new) do
    {current, next} =
      case split_by_crlf(new) do
        [unfinished_line] -> {unfinished_line, nil}
        [first_line, rest] -> {first_line, rest}
      end

    buffer =
      case latest do
        <<_::binary-size(byte_size(latest) - 2), "\r\n">> -> [current, latest | rest]
        _ -> [latest <> current | rest]
      end

    if next == nil do
      {:noreply, %State{state | buffer: buffer}}
    else
      handle_cast({:append, next}, %State{state | buffer: buffer})
    end
  end

  defp find_line_end(bin, index \\ 0) do
    if index + 1 < byte_size(bin) do
      case [:binary.at(bin, index), :binary.at(bin, index + 1)] do
        ~c"\r\n" -> index + 1
        _ -> find_line_end(bin, index + 1)
      end
    else
      nil
    end
  end

  defp split_by_crlf(bin) do
    case find_line_end(bin) do
      nil ->
        [bin]

      line_end ->
        [
          :binary.part(bin, 0, line_end + 1),
          :binary.part(bin, line_end + 1, byte_size(bin) - line_end - 1)
        ]
    end
  end

  ...

@pguyot
Copy link
Collaborator

pguyot commented Oct 22, 2024

It seems that AtomVM crashes when you apply term_to_binary/1 to pid or collection in which pid locates.

True. I created #1350 for this. It is unrelated to the error you initally reported here.

@pguyot
Copy link
Collaborator

pguyot commented Oct 22, 2024

Is there a possibility of a bug in the Nif of binary:split/2, which accidentally rewrite the region of memory outside of the term? I was unable to find such awful bug though...

This is possible. I have been a little bit confused by the maths of binary allocation when I last touched this bit, but I didn't think it affected binary:split/2. I'll try to get a deeper look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants