Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NmtBase.state should always return a str #500

Closed
erlend-aasland opened this issue Jul 8, 2024 · 11 comments · Fixed by #506
Closed

NmtBase.state should always return a str #500

erlend-aasland opened this issue Jul 8, 2024 · 11 comments · Fixed by #506

Comments

@erlend-aasland
Copy link
Contributor

Problem

NmtBase.state is documented and annotated as always returning a string, however there is a code path that may return an int:

canopen/canopen/nmt.py

Lines 74 to 92 in 62e9c1f

@property
def state(self) -> str:
"""Attribute to get or set node's state as a string.
Can be one of:
- 'INITIALISING'
- 'PRE-OPERATIONAL'
- 'STOPPED'
- 'OPERATIONAL'
- 'SLEEP'
- 'STANDBY'
- 'RESET'
- 'RESET COMMUNICATION'
"""
if self._state in NMT_STATES:
return NMT_STATES[self._state]
else:
return self._state

The code path in question is highly unlikely; the NmtBase._state member is set in the following places:

  • NmtBase.__init__: explicitly set to 0
  • NmtBase.on_command: set iff the new state is a valid/known state
  • NmtBase.send_command: set iff the new state is a valid/known state
  • NmtMaster.on_heartbeat: set to whatever was decoded from the heartbeat message, which should be fine for any compliant device

Possible solutions

In NmtMaster.on_heartbeat, only set _state if it is a valid state; if it's not, log.error(...) or raise an exception.

In NmtBase.state, either raise an exception if ._state is not a valid state, or return something a la f"UNKNOWN STATE {self._state}".

Draft diff
diff --git a/canopen/nmt.py b/canopen/nmt.py
index 401ad15..1316a5c 100644
--- a/canopen/nmt.py
+++ b/canopen/nmt.py
@@ -86,10 +86,8 @@ class NmtBase:
         - 'RESET'
         - 'RESET COMMUNICATION'
         """
-        if self._state in NMT_STATES:
-            return NMT_STATES[self._state]
-        else:
-            return self._state
+        assert self._state in NMT_STATES:
+        return NMT_STATES[self._state]
 
     @state.setter
     def state(self, new_state: str):
@@ -122,6 +120,12 @@ class NmtMaster(NmtBase):
             logger.debug("Received heartbeat can-id %d, state is %d", can_id, new_state)
             for callback in self._callbacks:
                 callback(new_state)
+            if new_state not in NMT_STATES:
+                log.error(
+                    "Received heartbeat can-id %d with invalid state %d",
+                    can_id, new_state
+                )
+                return
             if new_state == 0:
                 # Boot-up, will go to PRE-OPERATIONAL automatically
                 self._state = 127
@sveinse
Copy link
Collaborator

sveinse commented Jul 8, 2024

Awesome. I've found the same thing while type annotating, so this discussion is on my list 👍

Now, the core question is: Does the canopen standard allow for other/custom states than what's defined in the standard? I can't remember on the top of my mind if it does.

@erlend-aasland
Copy link
Contributor Author

Now, the core question is: Does the canopen standard allow for other/custom states than what's defined in the standard? I can't remember on the top of my mind if it does.

Standard compliant devices are probably ok; but non-standard compliant devices do exist. I've got a few of them in the project I've inherited. The device in question does not heed NMT commands (apparently they dropped NMT support from the firmware, because the could not get it to work), it does not support SDO (including configuring PDOs), nor does it support the SYNC protocol; all it does is to continuously stream hard-coded PDOs from a hard-coded OPERATIONAL state.

Hence, I think the core question is: what to do with non-compliant and buggy devices?

@erlend-aasland
Copy link
Contributor Author

To answer myself: I believe the most sane solution is to discard unknown states, similar to my PoC patch in #500 (comment)

@sveinse
Copy link
Collaborator

sveinse commented Jul 8, 2024

To answer myself: I believe the most sane solution is to discard unknown states, similar to my PoC patch in #500 (comment)

I'm conflicted about it. By discarding any unknown states, it can never be used to communicate with nodes that have some non-standard behavior (for whatever reason).

@erlend-aasland
Copy link
Contributor Author

That's a valid concern. Draft for an alternative patch:

diff --git a/canopen/nmt.py b/canopen/nmt.py
index 401ad15..d8951a7 100644
--- a/canopen/nmt.py
+++ b/canopen/nmt.py
@@ -89,7 +89,7 @@ class NmtBase:
         if self._state in NMT_STATES:
             return NMT_STATES[self._state]
         else:
-            return self._state
+            return str(self._state)
 
     @state.setter
     def state(self, new_state: str):
@@ -122,6 +122,11 @@ class NmtMaster(NmtBase):
             logger.debug("Received heartbeat can-id %d, state is %d", can_id, new_state)
             for callback in self._callbacks:
                 callback(new_state)
+            if new_state not in NMT_STATES:
+                log.warning(
+                    "Received heartbeat can-id %d with invalid state %d",
+                    can_id, new_state
+                )
             if new_state == 0:
                 # Boot-up, will go to PRE-OPERATIONAL automatically
                 self._state = 127

@erlend-aasland
Copy link
Contributor Author

Related: the conditional return in NmtBase.state could probably be sped up by using a try...except clause, but I'm not sure it's worth it, unless .state is expected to be a part of a hot loop.

@acolomb
Copy link
Collaborator

acolomb commented Jul 8, 2024

The reason for passing out unkown values as integers is to maximize forward compatibility. The CANopen standard may not define more NMT states now, but what happens when another state is added in a later revision? Or by a buggy / noncompliant device.

I think it's perfectly fine to return something like "UNKNOWN STATE 123" in these cases. The log warning is too verbose, as that would lead to the logs being flooded with every heartbeat transmission. We already have the numeric values in a debug message, that's enough to troubleshoot.

As we don't really handle any behavioral changes based on NMT state, but just convert it to a string for the application to consume, I think we're fine returning these UNKNOWN strings.

Feel free to propose a patch including the try... except change. It is an exceptional condition, and we can avoid the duplicate effort of key in m and m[key]. The alternative m.get(k, default=...) is not any better because it would always calculate the default value expression. I don't expect any of this to be used in hot code-paths, but still more pythonic to handle an exception here IMHO.

@erlend-aasland
Copy link
Contributor Author

I agree regarding the log warning. I'll create a PR with my draft diff from #500 (comment), with the optimisation applied.

@erlend-aasland
Copy link
Contributor Author

Well, first, let me shave some yak; I'd prefer if NMT stuff was covered well by the test suite before changing the behaviour. Not to mention that some of the existing tests depend on hard-coded timings (I've observed flakiness while running the test suite locally).

@acolomb
Copy link
Collaborator

acolomb commented Jul 8, 2024

Hm, since this is only affecting devices that return a non-standardized state number, I think we're free to change whatever. But alright, let's look at tests first. I'm very grateful that you put such an emphasis on that part, as I personally could never really warm up with the whole unit testing topic.

@erlend-aasland
Copy link
Contributor Author

FTR; I'll split the test improvements into several PRs, for the reviewers convenience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants