Revert "dtc: Consider one-character strings as strings" #121

flto · 2023-10-22T21:44:14Z

This reverts commit 9d7888c.

This commit broke decompilation in certain cases. For example:
regulator-min-microvolt = <0x00324b00>;
is now decompiled (incorrectly) as:
regulator-min-microvolt = "\02K";

This reverts commit 9d7888c. This commit broke decompilation in certain cases. For example: regulator-min-microvolt = <0x00324b00>; is now decompiled (incorrectly) as: regulator-min-microvolt = "\02K"; Signed-off-by: Jonathan Marek <[email protected]>

dgibson · 2023-10-23T00:19:38Z

Decompilation of dtbs, like most decompilations, is an approximate process. Although we can guarantee we get something that will recompile into the same dtb, we can't guarantee we'll have the same dts as the original: the type information that's implied in the dts is simply lost along the way. In the dts format "\02K" and <0x00324b00> are different ways of expressing exactly the same bytes (so are [00324b00], /bits/ 16 <0x0032 0x4b00> or "", "2K", or even /bits/ 16 <0x0032>, "K").

The function affected here, guess_value_type() is exactly that - a guess at a likely original type. There's no way for it to be correct 100% of the time.

So in order to make a change here, it's not sufficient to say it's "wrong" for some examples, you need to make the case that the cases it now gets "right" (i.e. more readable / less surprising) are more common or more important than the cases it now gets "wrong" (less readable / more surprising).

flto · 2023-10-23T01:41:32Z

the problem is it doesn't recompile to the same bytes - compiling the "\02K" becomes [02 4b 00], not <0x00324b00>

dgibson · 2023-10-23T02:05:58Z

the problem is it doesn't recompile to the same bytes - compiling the "\02K" becomes [02 4b 00], not <0x00324b00>

Oh, right, good point.

But, reverting this patch is only papering over the problem. I'm pretty sure there exist other examples that will trigger the same problem even with the revert in place. The real bug here is in write_propval_string. It's meaning for its output to be interpreted as '\0' then '2' then 'K' then '\0', which would come out to the correct bytes. However, because there happens to be a digit after the \0 it's instead interpreted as '\02'.

I guess we need to have write_propval_string look ahead to the next byte, and if it's a valid octal digit, we need to emit \x00 (or \000) instead of just \0. The lookahead will be kind of a pain, but it needs to be done. I supposed we could just emit \x00 all the time, but I think that ugliness in the output is worse than the internal ugliness of the lookahead.

flto · 2023-10-23T12:03:35Z

if you don't want the output to be ugly, you should split the string for \0 characters (this will improve readability and should be easier to implement than look ahead - but I did not look at the code), and for any other characters that would need an escape sequence, always use 2 digit hex (more readable than octal)

dgibson · 2023-10-23T14:33:59Z

if you don't want the output to be ugly, you should split the string for \0 characters (this will improve readability and should be easier to implement than look ahead - but I did not look at the code),

We still need to fix write_propval_string for the case where we have type information telling us to output a particular section as a string. Though, yes, usually breaking strings on \0 is a good idea.

and for any other characters that would need an escape sequence, always use 2 digit hex (more readable than octal)

Yes, we already use hex escapes for most characters.

dgibson · 2023-10-24T02:55:06Z

Oh dear. I had a closer look into this and it's a real can of worms.

We've been handling \xXX escapes wrong (as in, not like most C compilers) forever.
The compiler and printf(1) allow any number of hex digits after, not just a fixed 2
Which in turn means our escaped output is also wrong where we have something needing a \x escape followed by a hex digit
There's no direct way to force the termination of an escape, so ugly as they are octal escapes are actually safer than hex escapes, since they will always terminate after three characters.

dgibson force-pushed the main branch from 83d011c to ae97d97 Compare March 6, 2024 04:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert "dtc: Consider one-character strings as strings" #121

Revert "dtc: Consider one-character strings as strings" #121

flto commented Oct 22, 2023

dgibson commented Oct 23, 2023

flto commented Oct 23, 2023

dgibson commented Oct 23, 2023 •

edited

Loading

flto commented Oct 23, 2023 •

edited

Loading

dgibson commented Oct 23, 2023

dgibson commented Oct 24, 2023

Revert "dtc: Consider one-character strings as strings" #121

Are you sure you want to change the base?

Revert "dtc: Consider one-character strings as strings" #121

Conversation

flto commented Oct 22, 2023

dgibson commented Oct 23, 2023

flto commented Oct 23, 2023

dgibson commented Oct 23, 2023 • edited Loading

flto commented Oct 23, 2023 • edited Loading

dgibson commented Oct 23, 2023

dgibson commented Oct 24, 2023

dgibson commented Oct 23, 2023 •

edited

Loading

flto commented Oct 23, 2023 •

edited

Loading