Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "dtc: Consider one-character strings as strings" #121

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

flto
Copy link

@flto flto commented Oct 22, 2023

This reverts commit 9d7888c.

This commit broke decompilation in certain cases. For example:
regulator-min-microvolt = <0x00324b00>;
is now decompiled (incorrectly) as:
regulator-min-microvolt = "\02K";

This reverts commit 9d7888c.

This commit broke decompilation in certain cases. For example:
   regulator-min-microvolt = <0x00324b00>;
is now decompiled (incorrectly) as:
   regulator-min-microvolt = "\02K";

Signed-off-by: Jonathan Marek <[email protected]>
@dgibson
Copy link
Owner

dgibson commented Oct 23, 2023

Decompilation of dtbs, like most decompilations, is an approximate process. Although we can guarantee we get something that will recompile into the same dtb, we can't guarantee we'll have the same dts as the original: the type information that's implied in the dts is simply lost along the way. In the dts format "\02K" and <0x00324b00> are different ways of expressing exactly the same bytes (so are [00324b00], /bits/ 16 <0x0032 0x4b00> or "", "2K", or even /bits/ 16 <0x0032>, "K").

The function affected here, guess_value_type() is exactly that - a guess at a likely original type. There's no way for it to be correct 100% of the time.

So in order to make a change here, it's not sufficient to say it's "wrong" for some examples, you need to make the case that the cases it now gets "right" (i.e. more readable / less surprising) are more common or more important than the cases it now gets "wrong" (less readable / more surprising).

@flto
Copy link
Author

flto commented Oct 23, 2023

the problem is it doesn't recompile to the same bytes - compiling the "\02K" becomes [02 4b 00], not <0x00324b00>

@dgibson
Copy link
Owner

dgibson commented Oct 23, 2023

the problem is it doesn't recompile to the same bytes - compiling the "\02K" becomes [02 4b 00], not <0x00324b00>

Oh, right, good point.

But, reverting this patch is only papering over the problem. I'm pretty sure there exist other examples that will trigger the same problem even with the revert in place. The real bug here is in write_propval_string. It's meaning for its output to be interpreted as '\0' then '2' then 'K' then '\0', which would come out to the correct bytes. However, because there happens to be a digit after the \0 it's instead interpreted as '\02'.

I guess we need to have write_propval_string look ahead to the next byte, and if it's a valid octal digit, we need to emit \x00 (or \000) instead of just \0. The lookahead will be kind of a pain, but it needs to be done. I supposed we could just emit \x00 all the time, but I think that ugliness in the output is worse than the internal ugliness of the lookahead.

@flto
Copy link
Author

flto commented Oct 23, 2023

if you don't want the output to be ugly, you should split the string for \0 characters (this will improve readability and should be easier to implement than look ahead - but I did not look at the code), and for any other characters that would need an escape sequence, always use 2 digit hex (more readable than octal)

@dgibson
Copy link
Owner

dgibson commented Oct 23, 2023

if you don't want the output to be ugly, you should split the string for \0 characters (this will improve readability and should be easier to implement than look ahead - but I did not look at the code),

We still need to fix write_propval_string for the case where we have type information telling us to output a particular section as a string. Though, yes, usually breaking strings on \0 is a good idea.

and for any other characters that would need an escape sequence, always use 2 digit hex (more readable than octal)

Yes, we already use hex escapes for most characters.

@dgibson
Copy link
Owner

dgibson commented Oct 24, 2023

Oh dear. I had a closer look into this and it's a real can of worms.

  • We've been handling \xXX escapes wrong (as in, not like most C compilers) forever.
  • The compiler and printf(1) allow any number of hex digits after, not just a fixed 2
  • Which in turn means our escaped output is also wrong where we have something needing a \x escape followed by a hex digit
  • There's no direct way to force the termination of an escape, so ugly as they are octal escapes are actually safer than hex escapes, since they will always terminate after three characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants