SpiManager breaks SPI communication on SPI3 of ESP32-S2 #2343

Open
4 tasks done
LennartF22 opened this issue Oct 9, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@LennartF22
Contributor

LennartF22 commented Oct 9, 2024

What happened?

On ESP32-S2, the CMT2300A and W5500 are not working when they run on SPI3.

To Reproduce Bug

Configure a CMT and/or W5500 on an ESP32-S2 running OpenDTU version 24.9.27 or later, but do not configure an nRF (this will cause SPI3 to be used as a shared SPI bus).

Install Method

Self-Compiled

What git-hash/version of OpenDTU?

v24.10.6

What firmware variant (PIO Environment) are you using?

lolin_s2_mini

Anything else?

This only affects the ESP32-S2. What sets it apart from the ESP32, ESP32-S3 and ESP32-C3 is that the DMA channel for SPI3 is shared with the ADC/DAC. On the ESP32, there are dedicated DMA channels for SPI, and on the ESP32-S3 and ESP32-C3, there are a handful of general-purpose DMA channels, which can be assigned to arbitrary peripherals.

Currently, DMA is automatically used when allocating shared buses in the SpiManager, because DMA is required for the usage of the W5500. Simultaneously, ADC/DAC DMA is probably used somewhere (maybe in the Arduino core?), causing the issue.

As a first hotfix, we should use SPI_DMA_DISABLED here for SPI3 on the ESP32-S2 platform:

ESP_ERROR_CHECK(spi_bus_initialize(host_device, &bus_config, SPI_DMA_CH_AUTO));
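A minimal off-target sketch of the decision the hotfix has to make. The `Chip`, `Host` and `Dma` types and the `select_dma` helper are hypothetical stand-ins so the snippet compiles without ESP-IDF; they are not part of the SpiManager. In firmware the result would correspond to passing `SPI_DMA_DISABLED` or `SPI_DMA_CH_AUTO` as the third argument of `spi_bus_initialize()`.

```cpp
// Off-target sketch of the hotfix decision. Chip, Host and Dma are hypothetical
// stand-ins; real firmware would use ESP-IDF's spi_host_device_t values and pass
// SPI_DMA_DISABLED or SPI_DMA_CH_AUTO to spi_bus_initialize().
enum class Chip { Esp32, Esp32S2, Esp32S3, Esp32C3 };
enum class Host { Spi2, Spi3 };
enum class Dma { Disabled, Auto };

// Disable DMA only for SPI3 on the ESP32-S2, where the SPI3 DMA channel is
// shared with the ADC/DAC; keep automatic channel selection everywhere else.
constexpr Dma select_dma(Chip chip, Host host)
{
    return (chip == Chip::Esp32S2 && host == Host::Spi3) ? Dma::Disabled : Dma::Auto;
}

// Compile-time checks of the intended behaviour.
static_assert(select_dma(Chip::Esp32S2, Host::Spi3) == Dma::Disabled, "S2/SPI3 must not use DMA");
static_assert(select_dma(Chip::Esp32S2, Host::Spi2) == Dma::Auto, "S2/SPI2 keeps DMA");
static_assert(select_dma(Chip::Esp32S3, Host::Spi3) == Dma::Auto, "other chips keep DMA");
```

On the real target, the chip would likely not be a runtime value; the check could instead be guarded by the `CONFIG_IDF_TARGET_ESP32S2` macro and compare `host_device` against `SPI3_HOST`.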

This means that the W5500 would still not work on the ESP32-S2 if it ends up on SPI3 (this depends on initialization order and, for example, whether an nRF, which currently still claims a dedicated bus, is configured), but at least the CMT will work again.

Subsequently, we should add more logic so that for dedicated and shared SPI buses managed via the SpiManager

  1. DMA is only used when it is required for one of the devices on the shared bus (DMA is less performant for small transactions anyway),
  2. SPI3 is preferred over SPI2 for dedicated/shared SPI buses that do not need DMA on the ESP32-S2 and
  3. only SPI2 is used for dedicated/shared SPI buses that need DMA on the ESP32-S2.
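The three rules can be sketched as a small selection function. This is only an illustration under assumptions: `choose_bus`, `FreeHosts` and `BusChoice` are hypothetical names, the behaviour for chips other than the ESP32-S2 (take the first free host) is an assumption not stated in this issue, and the sketch ignores that some chips expose fewer SPI hosts.

```cpp
#include <optional>

// Hypothetical stand-ins so the sketch compiles without ESP-IDF.
enum class Host { Spi2, Spi3 };
enum class Dma { Disabled, Auto };

struct FreeHosts { bool spi2; bool spi3; };  // which hosts are still unclaimed
struct BusChoice { Host host; Dma dma; };

// needs_dma is true if at least one device on the bus requires DMA (e.g. the W5500).
inline std::optional<BusChoice> choose_bus(bool is_esp32s2, bool needs_dma, FreeHosts free)
{
    // Rule 1: enable DMA only when a device on the bus requires it.
    const Dma dma = needs_dma ? Dma::Auto : Dma::Disabled;

    if (is_esp32s2) {
        if (needs_dma) {
            // Rule 3: on the ESP32-S2, only SPI2 may be used with DMA.
            if (free.spi2) return BusChoice{Host::Spi2, dma};
            return std::nullopt;
        }
        // Rule 2: without DMA, prefer SPI3 so that SPI2 stays free for DMA users.
        if (free.spi3) return BusChoice{Host::Spi3, dma};
        if (free.spi2) return BusChoice{Host::Spi2, dma};
        return std::nullopt;
    }

    // Assumption: on other chips, simply take the first free host.
    if (free.spi2) return BusChoice{Host::Spi2, dma};
    if (free.spi3) return BusChoice{Host::Spi3, dma};
    return std::nullopt;
}
```

Returning `std::nullopt` when no suitable host is left makes the failure explicit, for example when a DMA-requiring device on the ESP32-S2 arrives after SPI2 has already been claimed.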

Please confirm the following

  • I believe this issue is a bug that affects all users of OpenDTU, not something specific to my installation.
  • I have already searched for relevant existing issues and discussions before opening this report.
  • I have updated the title field above with a concise description.
  • I have double checked that my inverter does not contain a W in the model name (like HMS-xxxW) as they are not supported.
@LennartF22 LennartF22 added the bug Something isn't working label Oct 9, 2024
LennartF22 added a commit to LennartF22/OpenDTU that referenced this issue Oct 10, 2024
@LennartF22
Contributor Author

@tbnobody I already created a PR with a hotfix. I will also work on a better long-term solution, as mentioned above.

@tbnobody
Owner

The new crash might be somehow related to this

@stefan123t

stefan123t commented Oct 17, 2024

@tbnobody thanks for the reference, this is a long and rewarding read. My TL;DR summary is that the temp sensor may run into an endless loop due to some negligence to turn it back on in case it was on in the WiFi code. There is a fix in the WiFi code since v5.1.2+.

I think we introduced MCU temperature reading some time ago and therefore may trigger the loop / watchdog timeout inadvertently.

If we are on pre v5.1.2 there is a temp workaround which never got merged in the esp-idf.
Are we using that v5.1.2+ or a newer version now?

@tbnobody
Owner

Are we using that v5.1.2+ or a newer version now?

Arduino core with PlatformIO uses 4.x as default... when using version 5.x it requires ~10% additional flash (currently the image requires 86% flash). It will not make sense to upgrade 4MB devices to core 5. Therefore we have to find the reason for the problem and why it occurs right now.

@stefan123t

Can’t we take the same approach as NoNullPtr in the esp-idf thread, i.e. espressif/esp-idf#8088 (comment),
either for older ESP32s with 4MB or until we switch to ESP-IDF 5.x?

As NoNullptr, softhack007 and MartinPatarinski describe it, the register needs to be set back to prevent staying in the loop and ultimately triggering the watchdog.

We do not call this so often to determine the ESP temperature, do we?

@LennartF22
Contributor Author

@stefan123t No, not easily at least, because it requires a change to ESP-IDF, and an already built version of that is used with the Arduino core. Presumably we could patch the binary, but that's probably too much of a hack... Alternatively, one might pause the WiFi task while reading the temperature (which takes 100-200us), but even if that works, it's likely not a good idea either.

The rate at which it is called is the MQTT publish interval, and it was added in v24.9.22. Calling it at a lower rate would not fix the issue either, unfortunately.

I can't speak for @tbnobody of course, but I think the only thing we could do right now is to disable the temperature sensor completely (in device info and over MQTT) for the ESP32-S2. It does not pay off to put a considerable amount of work into fixing a feature that presumably is not needed by many users and is more like a "nice to have", on a platform (ESP32-S2) that is not used by many users either.

@stefan123t

@LennartF22 thanks for sharing your thoughts.

I read that they were waiting for both ESP-IDF changes and the changes to the WiFi binary which is included as a part of the ESP-IDF core hardware in binary only. Apparently they had to wait for Espressif to update the latter binaries before the change to ESP-IDF would be possible.

Yes, disabling the WiFi for some time is usually a bad idea, because it may switch context into the WiFi core, which would probably trigger the watchdog.

But disabling the ESP temperature altogether would be another fix, besides reducing the interval with which we call the temperature sensor and/or fencing the temperature sensor reading with some register (p)reset code.

I understood, however, that this also affects the ESP32-S3, maybe to a lesser extent / frequency?

Benichou34 added a commit to Benichou34/OpenDTU that referenced this issue Dec 14, 2024
* massage file handling

* fix comment

* Update patch_apply.py

* webapp: Update dependencies

* Add serial prefix 1410 to HMS_2CH inverters

This is related to tbnobody#2235 and fixes tbnobody#2230

* Output WiFi disconnect reason in console

* Upgrade ESPAsyncWebServer from 3.1.2 to 3.2.0

* Upgrade olikraus/U8g2 from 2.35.19 to 2.35.21

* Upgrade arkhipenko/TaskScheduler from git #testing to 3.8.5

* webapp: update dependencies

* Feature: Add support for HERF 1 channel inverters

* Upgrade ESPAsyncWebServer from 3.2.0 to 3.3.1

* Upgrade olikraus/U8g2 from 2.35.21 to 2.35.27

* webapp: Update dependencies

* webapp: Upgrade tsconfig node18 to node22

* webapp: Parse version string even if update search is not allowed

* issue template: asks for firmware variant

* actions: use setup-node@v4 as v3 causes warning

the "Yarn Linting" action causes a warning to appear about a deprecated
Node version. switch to actions/setup-node@v4, which is already in use
by the action building the web app for the firmware, to avoid this
warning.

* actions: switch to node version 20 for linting

use version consistent with the version used when building the web
application.

* actions: run yarn prettier to check web app formatting

* actions: fix a typo

* changelogs: group webapp-related changes

* Doc: Remove inverter list and add a link to the documentation

This reduces redundant effort when an inverter is added.

* Upgrade olikraus/U8g2 from 2.35.27 to 2.35.30

* webapp: Update dependencies

* webapp: add app.js.gz

* actions: enable corepack to use fixed version of yarn

this allows us to fix the version of yarn, the Node.js package manager,
to a particular version. using corepack is the recommended way to use
yarn these days.

* webapp: Fix html error in eventlog

* Fix: WebApp was not reloaded after firmware update

With the upgrade from ESPAsyncWebServer to 3.3.1 it seems that something has changed. Have to trigger the reboot from the main context.

* Update bblanchon/ArduinoJson from 7.1.0 to 7.2.0

* webapp: add app.js.gz

* Github Action: Update node version from 20 to 22

* Publish ESP heap and temperature details on MQTT

I noticed that some useful ESP stats are missing on the MQTT feed, so this adds:

- ESP temperature
- ESP heap stats (size, free, minFree, maxAlloc)

* Fix: Wrong topic in home assistant auto discovery for maxalloc and minfree

* Fix: Saving DTU config values just returned "Values are missing"

* Publish temperature only if its not NAN

* Feature: Inverter radio statistics (rx/tx statistics)

The statistics are shown in the WebApp and published via MQTT.
Statistics are reset at midnight.

* Added icon to radio statistics

* webapp: add app.js.gz

* Fix: Unable to CMT transmit power in WebApp

The pa_level was sent as string instead of a number.

fixes tbnobody#2299

* Fix: Restart was triggered before all website data was sent

This led to the effect that e.g. the confirmation messages were not shown.

It is somehow related to ESPAsyncWebServer 3.3.0

* webapp: Update dependencies

* webapp: Fix data type for all range inputs

* webapp: add app.js.gz

* Decrease restart delay to 1 second

This prevents a reload of the webapp (during firmware update) before the esp is online again

* Optimize MQTT subscription handling

* Move inverter housekeeping tasks inside the InverterAbstract class

* Feature: Allow reset of radio statistics via mqtt

* Feature: Publish Radio statistics to home assistant

* MQTT Hass: Change char* to String&

* MQTT Hass: Rename caption parameter to name

* MQTT Hass: Change parameter order for publishInverterSensor

* MQTT Hass: Change parameter order for publishDtuSensor

* MQTT Hass: Make publish methods static

* MQTT Hass: Change parameter order for publishDtuBinarySensor

* MQTT Hass: Change parameter order for publishInverterButton

* MQTT Hass: Change parameter order for publishInverterNumber

* MQTT Hass: Harmonise parameter names

* MQTT Hass: Remove no more required checks

* MQTT Hass: Move publishBinarySensor logic into separate method

* MQTT Hass: Reorder binary  sensor methods

* MQTT Hass: Move publishSensor logic into separate method

* MQTT Hass: Move yield into the publish method

* MQTT Hass: Add device_type and category to publishInverterBinarySensor

* MQTT Hass: Reorder defines

* MQTT Hass: Move serialization and allocation check into own method

* MQTT Hass: Append dtu prefix topic for each single sensor

* Feature: Publish YieldTotal, YieldDay and Power of all inverters to Home Assistant

* MQTT Hass: Implement category as enum instead of String

* MQTT Hass: Implement device class as enum instead of String

* MQTT Hass: Implement method to add common metadata to json output

* Remove unnecessary CMT SPI inversions

* Fix cs_ena_posttrans calculation

* Remove unnecessary delays

* Implement W5500 support

* Add SpiManager library

* Optimize CMT FIFO access

* Change cmt_spi3 implementation from C to C++

* Add Arduino SPI translation

* Use SpiManager for nRF, CMT and W5500

* Use shared SPI bus for CMT and W5500

* Only use a single SPI device for CMT

* Feature: Allow reset of radio statistics via WebApp

* webapp: Update dependencies

* Embed current branch into building process

* Slight adjustments to github bug_report template

* Upgrade github actions/checkout to v4

* GitHub Build Action: Automatically generate littlefs image

If a data directory exists, the content of this directory will be placed in the littlefs image and embedded into the factory.bin file

* Fix: Only count RF RX packets when packets were sent

This mainly occurs after a reset of the statistics, when the receive count is higher than the transmit count

* webapp: Apply auto format

* webapp: Update dependencies

* Simplify network callback handling

* Simplify inverter handling

* webapp: add app.js.gz

* Apply license headers and automatic code formatting to SpiManager

* Apply automatic code formatting

* Added device profile for OpenDTU Fusion v2 PoE

* increase chunkSizeWarningLimit for webapp build (tbnobody#1287)

increase from 500k (default) to 1024k in order to get rid of the warning messages.

* Rename NetworkEventCb to DtuNetworkEventCb to prevent further upgrade issues

* Add default values for ethernet pins in case they are not defined for a specific board

* Take care of different signature of ETH.begin method in Arduino Core 3.x

* Added required include to work with IDF 5

* Update espressif32 from 6.8.1 to 6.9.0

* webapp: Update dependencies

* webapp: add app.js.gz

* issue template: fix typo

* Add connection check for W5500 before full initialization

* Prevent warning on GPIO ISR service registration

* Adjust name of OpenDTU Fusion v2 PoE build environment

* Add device profiles for OpenDTU Fusion v2 PoE with displays

* Fix: avoid deprecated setAuthentication() to fix memory exhaustion

with ESPAsyncWebServer 3.3.0, the setAuthentication() method became
deprecated and a replacement method was provided which acts as a shim
and uses the new middleware-based approach to setup authentication. in
order to eventually apply a changed "read-only access allowed" setting,
the setAuthentication() method was called periodically. the shim
implementation each time allocates a new AuthenticationMiddleware and
adds it to the chain of middlewares, eventually exhausting the memory.

we now use the new middleware-based approach ourselves and only add the
respective AuthenticationMiddleware instance once to the respective
websocket server instance.

a regression where enabling unauthenticated read-only access is not
applied until reboot is also fixed. all the AuthenticationMiddleware
instances were never removed from the chain of middlewares when calling
setAuthentication("", "").

* Fix: force websocket clients to authenticate

when changing the security settings (disabling read-only access or
changing the password), existing websocket connections are now closed,
forcing the respective clients to authenticate (with the new password).
otherwise, existing websocket clients keep connected even though the
security settings now expect authentication with a (changed) password.

* webapp: Update dependencies

* Upgrade ESPAsyncWebServer from 3.3.1 to 3.3.7

* Remove icon because device_class is set

* Remove unused DEVICE_CLASS_TEMP

* Fix: Add state_class to several Home Assistant sensors

state_class was added to yieldtotal, yieldday ac power and temperature for the whole dtu

closes: tbnobody#2324

* Update UpgradePartition.md

Fixed typo

* Feature: Show RSSI of last received packet  in radio stats

The value is also published via MQTT

* Upgrade ESPAsyncWebServer from 3.3.7 to 3.3.11

* webapp: Update dependencies

* Rename NetworkEventCbList_t to DtuNetworkEventCbList_t for further upgrades

* Replace format strings by platform independent macros

* webapp: Update dependencies

* webapp: add app.js.gz

* webapp: Fix eslint issues

* Remove EMAC related code for devices that don't have one

* Initialize the last rssi value with -127 instead of 0 to indicate a non-existing connection if no data was received yet

* Fix: "Equal brightness" in LED settings does not work correctly

fixes: tbnobody#2332

* Upgrade ESPAsyncWebServer from 3.3.11 to 3.3.12

* webapp: add app.js.gz

* webapp: pin assignment: hide unsupported pins

if the pin_mapping.json includes unsupported pins, e.g., `eth` pins on
an ESP32-S3, the whole category should still be hidden in the device
manager.

* webapp: Update dependencies

* Don't set TX timeout to 0 anymore for HW/USB CDC

Due to a change in the Espressif Arduino core, the TX timeout for the HW CDC
(used in the ESP32-S3, for example) must not be set to 0, as otherwise, an
integer underflow occurs.

Removing the TX timeout is not necessary anymore anyways, because it is now
detected when CDC is not active, and attempts to write will return immediately
until the host read something again. Only when the transmit buffer becomes
full initially, the default timeout of just 100ms takes effect once.

For USB CDC (used with the ESP32-S2, for example), the timeout is not relevant
either.

* Feature: show task details in system info view

shows whether or not known tasks are alive, and in particular shows how
much of the respective stack is still available.

* Hotfix to not use DMA on SPI3 of ESP32-S2

See issue tbnobody#2343.

* Upgrade ESPAsyncWebServer from 3.3.12 to 3.3.13

* Fix: Correct output of wifi disconnect reason code

* webapp: Update dependencies

* Upgrade ESPAsyncWebServer from 3.3.13 to 3.3.14

* Upgrade ESPAsyncWebServer from 3.3.14 to 3.3.15

* Upgrade olikraus/U8g2 from 2.35.30 to 2.36.2

* webapp: Update dependencies

* Upgrade ESPAsyncWebServer from 3.3.15 to 3.3.16

* webapp: add app.js.gz

* Fix: cpplint errors

* Update nrf24/RF24 from 1.4.9 to 1.4.10

* Upgrade ESPAsyncWebServer from 3.3.16 to 3.3.17

* Rename config API to file API

* Refactor file handling API and add endpoint to delete files

* Feature: Refactor config management interface

* webapp: Use global AlertResponse interface

* Add API endpoint to retrieve custom languages and complete language pack

* webapp: Allow upload of language packs

* Feature: Allow custom language pack for webapp

* Feature: Added spanish language pack

* Feature: Added italian language pack

* Move lookup for translation path to separate method

* Check if language pack metadata are valid

* webapp: Added global reboot wait screen

* webapp: Rename interface to prevent lint errors

* add and use configuration write guard

the configuration write guard is now required when the configuration
struct shall be mutated. the write guard locks multiple writers against
each other and also, more importantly, makes the writes synchronous to
the main loop. all code running in the main loop can now be sure that
(1) reads from the configuration struct are non-preemptive and (2) the
configuration struct as a whole is in a consistent state when reading
from it.

NOTE that acquiring a write guard from within the main loop's task will
immediately cause a deadlock and the watchdog will trigger a reset. if
writing from inside the main loop should ever become necessary, the
write guard must be updated to only lock the mutex but not wait for a
signal.

* webapp: Fix: WaitRestartView showed basic auth dialog

* Rewrite display language handling to work with locale strings instead of magic numbers.

This is required to implement further i18n functions using the language packs

* Feature: Implement language pack support for display texts

* Feature: Added spanish display translation

* Feature: Added italian display translation

* Added README.md to lang folder

* Feature: Added device info for HMS-700

* Fix: Take DST into account when recalculating the sunrise sunset time

If it is not considered the correct sunset / sunrise time is only calculated at the next day

Fixes: tbnobody#2377

* Feature: Validate JSON before uploading

* webapp: Update dependencies

* Upgrade ESPAsyncWebServer from 3.3.17 to 3.3.21

* Fix: Lint Error

* Fix: skip BOM in JSON files (pin_mapping and config)

based on tbnobody#2387

* webapp: right-align labels for inputs on non-sm viewports

this change tries to achieve a pleasing look of input forms by
right-aligning the texts of labels. the input form now looks similar
to a table, achieving a cleaner look, especially for forms where the
labels have varying text lengths.

* webapp: last table row shall have no bottom border

similar to the first row which has no border at the top.

* webapp: remove table's bottom margin

we don't need a margin at the bottom of tables in general. not sure why
this is even a thing in bootstrap. this change, in particular, makes the
space between a table and a parent card symmetric on all sides.

* webapp: add gap between inverter selectors

* webapp: avoid inline style in inverter channel info card

* webapp: equalize style of cards with tables in live view

this change adjusts the style of cards showing tables such that they
look the same as inverter channel info tables.

* webapp: use reasonable name for radio stats accordion

* webapp: align table headers with card headers

set the left margin of table header cells to the same margin the card
headers use, such that the texts align on the same axis.

* webapp: apply card-table class to info view cards

the cards in all information views still used a div.card-body around the
table, which added a margin on all sides of the table. to achieve a
unified look, these cards and tables now look the same as the inverter
channel cards.

* webapp: adjust look of tables in accordions to live view cards

this is relevant for the radio statistics table, as well as the tables
in the grid profile modal.

* webapp: beautify radio statistics reset button

it would be nice to have this in the header of the accordion, which is
hard, but doable. however, clicking the button then also toggles the
accordion, which is unacceptable. preventing that seems non-trivial, as
the @click.stop() is not enough. also, nesting interactive elements is
simply bad practice. the button can also go to the right of header, with
reasonable effort, but the corner radii are then messed up and would
need to react interactively (accordion collapsed or not), which is also
a pain.

we now "float" the reset button to the right, add a nice icon, and give
the button some space so it at least looks like it belongs there.

* webapp: fix inverter "add" and "save order" button positions

the source tells us that the buttons are supposed to be on the right of
the card, but the CSS broke at some point.

* webapp: optimize spacing on bottom of cards

if the last child in a card (div.card > div.card-body) adds bottom
margin, we don't want the card to add more space through its
padding-bottom. most cards have children that add sufficient space
at the bottom anyways.

* webapp: MQTT: use v-if in favor of v-show

if we hide elements (which is done using style="display:none;"), they
are still part of the DOM and mess with CSS rules that shall apply to
the last element of a card or the last row of a table.

* webapp: MQTT: no login with cert if TLS disabled

in the settings view we hide the "login with cert" setting while TLS is
disabled, so we should also hide that info in the info view when TLS is
disabled.

* webapp: avoid inline style for inverter channel info value

* webapp: properly space alert with hint for hostname

* webapp: optimize look of firmware update cards

* webapp: inverter advanced tab needs space at the top

this avoids the input text box from colliding with the tab navigation
bottom border.

* webapp: optimize look of login page

improve spacing and align login button to the right, where all our
buttons are.

* webapp: optimize body bottom padding and length

long forms, when scrolled to the bottom, would leave no space between
the bottom of the viewport and the buttons, which is unpleasant.

short views would still create a large (high) body, for apparently no
reason.

* webapp: consistently use no colon in form labels

there are no colons for table headers as well. some form labels had no
colon already, so this change uses a unified look among form labels.

* webapp: optimize placement of device profile doc buttons

* remove empty container for device profile links. if a device profile
  has no links, no buttons are generated, but a row was still part of
  the DOM, adding spurious space between the select and the alert with
  the hint.

* webapp: show pin mapping categories as cards

on a desktop browser, this approach allows to display all categories at
once. we also increase readability as the values are much closer to
their label. previously, the values were far to the right of the screen
and it was unpleasant to read which value belonged to which setting. the
grouping of values per category was also not very well conceived.

by using cards, we also avoid some styling issues, namely the use of
rowspan, which caused a spurious table cell border at the end of the old
table layout.

* webapp: device manager: optimize cards for tab nav

the top border of the card was breaking the design of the tabs, where
the active tab would be "visually connected" to the content. also, the
rounded border at the top did not blend in with the navbar's bottom
border.

* webapp: always scroll up when navigating to another view

* webapp: fix inverter selection button breaking

on small viewports, the icon and the inverter label would be displayed
in two lines. this change keeps the icon and the label tied together in
any case, and the icon is centered vertically around the label.

* keep console.log() when serving webapp

the removal of console and debugger statements by esbuild even when not
building for production seems to be a regression, as these were
definitely working in the past.

this change uses the command parameter to configure esbuild to either
keep or indeed remove the respective statements. they are only kept if
command is "serve".

to avoid having to indent everything in defineConfig() by one block, the
return statement and closing curly brace were added "inline".

* Remove not required include

* Replace multiline print by printf

* Remove not required include

* webapp: declare emitted event in FormFooter component

fixes an annoying warning (visible in the browser console):

[Vue warn]: Extraneous non-emits event listeners (reload) were passed to
component but could not be automatically inherited because component
renders fragment or text root nodes. If the listener is intended to be a
component custom event listener only, declare it using the "emits"
option.

* webapp: Update dependencies

* Upgrade ESPAsyncWebServer from 3.3.21 to 3.3.22

* Fix lint errors

* Build factory.bin in every compile attempt

This is required to apply changes which are maybe only related to the data directory.

* webapp: add app.js.gz

* Merge with v24.11.7

* Fix cpplint

---------

Co-authored-by: Marc-Philip <[email protected]>
Co-authored-by: Thomas Basler <[email protected]>
Co-authored-by: Bernhard Kirchen <[email protected]>
Co-authored-by: Tobias Diedrich <[email protected]>
Co-authored-by: LennartF22 <[email protected]>
Co-authored-by: vaterlangen <[email protected]>
Co-authored-by: mbo18 <[email protected]>
Co-authored-by: janrombold <[email protected]>
Co-authored-by: CommanderRedYT <[email protected]>
3 participants