Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing size of nvflag in struct Namval breaks things #1038

Open
krader1961 opened this issue Dec 1, 2018 · 12 comments
Open

Changing size of nvflag in struct Namval breaks things #1038

krader1961 opened this issue Dec 1, 2018 · 12 comments
Assignees
Labels
blocker This issue is a blocker for another issue. bug

Comments

@krader1961
Copy link
Contributor

The discussion about support for JSON in issue #820 notes that symbols NV_JSON and NV_JSON_LAST are aliases for other symbols. Which is dangerous and shouldn't be necessary. So I opened PR #1037 to start work on eliminating such aliases. The next step in that cleanup is to change the definition of struct Namval member nvflag to be wider than an unsigned short. However turning it into a 32 bit unsigned int breaks unit tests. Even with no changes to the bit field symbols.

That should be impossible. That is, the size of that bit field should have zero effect on the behavior of the code as long as the size is wide enough to accommodate all bit values stored in that structure member. And converting from unsigned short (16 bits) to uint32_t (32 bits) is guaranteed to satisfy that constraint. This diff breaks 18 unit tests on macOS that pass without this change:

diff --git a/src/cmd/ksh93/include/nval.h b/src/cmd/ksh93/include/nval.h
index 66462dd9..ee293a5b 100644
--- a/src/cmd/ksh93/include/nval.h
+++ b/src/cmd/ksh93/include/nval.h
@@ -104,7 +104,7 @@ struct Namdecl {
 struct Namval {
     Dtlink_t nvlink;        // space for cdt links
     char *nvname;           // pointer to name of the node
-    unsigned short nvflag;  // attributes
+    unsigned int nvflag;  // attributes
 #if _ast_sizeof_pointer >= 8
     uint32_t nvsize;  // size or base
 #else

Something is seriously wrong with the code that allocates or uses a struct Namval. The aforementioned change should have zero effect on the behavior of the code other than the amount of memory used.

@krader1961 krader1961 added the bug label Dec 1, 2018
@krader1961 krader1961 self-assigned this Dec 1, 2018
@krader1961
Copy link
Contributor Author

krader1961 commented Dec 1, 2018

Running this on macOS results in the "double-free" ASAN failure shown below:

meson -DASAN=true
ninja >x 2>&1
meson test --setup=asan -t4 exit

The ast_realloc() call and subsequent attempt to free the same storage a second time is highly suggestive of a miscalculation of the size of a namval structure. Which is a problem we've seen in other contexts. And is why I opened issue #1002.

==26382==ERROR: AddressSanitizer: attempting double-free on 0x607000003350 in thread T0:
    #0 0x10451c1bd in wrap_free (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x551bd)
    #1 0x1042bd32b in ast_free vmbusy.c:50
    #2 0x103e5adbf in b_cd cd_pwd.c:298
    #3 0x1040f80fb in sh_exec xec.c:1102
    #4 0x104103965 in sh_exec xec.c:1792
    #5 0x10400bf0b in exfile main.c:515
    #6 0x104010082 in sh_main main.c:313
    #7 0x103e57525 in main pmain.c:41
    #8 0x7fff79e9508c in start (libdyld.dylib:x86_64+0x1708c)

0x607000003350 is located 0 bytes inside of 74-byte region [0x607000003350,0x60700000339a)
freed by thread T0 here:
    #0 0x10451c387 in wrap_realloc (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x55387)
    #1 0x1042bd233 in ast_realloc vmbusy.c:40
    #2 0x104031c63 in nv_putval name.c:1692
    #3 0x103e5ab65 in b_cd cd_pwd.c:292
    #4 0x1040f80fb in sh_exec xec.c:1102
    #5 0x104103965 in sh_exec xec.c:1792
    #6 0x10400bf0b in exfile main.c:515
    #7 0x104010082 in sh_main main.c:313
    #8 0x103e57525 in main pmain.c:41
    #9 0x7fff79e9508c in start (libdyld.dylib:x86_64+0x1708c)

previously allocated by thread T0 here:
    #0 0x10451609c in wrap_strdup (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x4f09c)
    #1 0x1040db1a5 in sh_subshell subshell.c:526
    #2 0x103fedbff in comsubst macro.c:2005
    #3 0x103ff04fa in varsub macro.c:1126
    #4 0x103fe1536 in copyto macro.c:599
    #5 0x103fdde44 in sh_mactrim macro.c:186
    #6 0x104014dcb in sh_setlist name.c:284
    #7 0x1040f4117 in sh_exec xec.c:931
    #8 0x10400bf0b in exfile main.c:515
    #9 0x104010082 in sh_main main.c:313
    #10 0x103e57525 in main pmain.c:41
    #11 0x7fff79e9508c in start (libdyld.dylib:x86_64+0x1708c)

@krader1961
Copy link
Contributor Author

I verified that changing the size of the struct Namval member nvflag breaks the code even without the changes in PR #1037. Something is seriously FUBAR involving how this structure is used.

@krader1961 krader1961 added the blocker This issue is a blocker for another issue. label Dec 1, 2018
@krader1961
Copy link
Contributor Author

Very curious. Adding padding to either side of the nvflag member does not cause any problems. So it isn't the size of the struct or the offset of the nvflag member that is the issue. Looking at all the places nvflag is used or modified I don't see any obvious reason changing its size (i.e., width) would cause these failures. Especially since none of the values assigned to it have been changed.

@krader1961
Copy link
Contributor Author

krader1961 commented Dec 2, 2018

Bingo! I added a strategic printf to line 292 in cd_pwd.c to display the pwdnod->nvflag value. When that struct member is an unsigned short it is either 0x3200 or 0x2200. When it is an unsigned int it is 0x103200 or 0x102200. WTF! The 0x100000 is this symbol:

#define NV_NOFAIL 0x100000   // return 0 on failure, no msg

Which is only meant to be passed as a flag to nv_open(). It is not meant to get merged into the nvflag member of a Namval_t. I wonder how many bugs of this nature exist in the code. I'll be surprised if this is the only example. Also, there has to be a related bug because the presence of that unexpected bit in the pwdnod->nvflag value should not have affected whether it was freed. That is, it appears some other portion of the code is incorrectly determining if the NV_NOFREE bit is set. Equivalently, since an otherwise undefined nvflag bit was set it should not have affected the behavior.

@krader1961
Copy link
Contributor Author

OMFG! See the setall() function in src/cmd/ksh93/bltins/typeset.c. The code deliberately conflates symbols that can be safely stored in a nvflag member of a Namval_t with symbols that cannot. It simply assumes the latter values won't be stored because they cannot be represented in the nvflag type (i.e., an unsigned short) and will thus be silently dropped by the compiler when it does the assignment. That is, truncating an unsigned int to an unsigned short. This is far too clever.

Worse is there are two non-nvflag symbols that are valid in an unsigned short:

#define NV_ADD 8             // add node if not found
#define NV_IDENT 0x80        // name must be identifier

The former corresponds to NV_UTOL while the latter is currently not used as a nvflag bit value.

There are likely other places in the code which assumes such value truncations on assignment will silently strip unwanted bits.

@krader1961
Copy link
Contributor Author

The former corresponds to NV_UTOL while the latter is currently not used as a nvflag bit value.

That turns out not to be true. There are at least three places, and probably more, where NV_IDENT is explicitly used as a bit in the nvflag member of Namval_t. See the src/cmd/ksh93/sh/subshell.c module. Those uses have nothing to do with the original intent of this symbol to require that the nv_open() function validate the name is a valid identifier. So once again we see a deliberate overloading of a symbol solely because it was convenient to do so despite it making the meaning of the code less clear.

@krader1961
Copy link
Contributor Author

krader1961 commented Dec 3, 2018

This code is a white hot mess. Consider this definition src/cmd/ksh93/include/name.h:

#define NV_BLTIN (NV_NOPRINT | NV_EXPORT)

Yet its related symbol is in src/cmd/ksh93/include/nval.h:

#define NV_BLTINOPT NV_ZFILL  // mark builtins in libcmd

Those two symbols should be in the same header (nval.h). And that is true for all other NV_ symbols in name.h such as NV_NOPRINT. And none of them should be defined in terms of unrelated bit values.

@krader1961
Copy link
Contributor Author

Another OMFG bogosity. This symbol is meant to only be used as a flag to nv_open():

#define NV_NOREF NV_REF       // don't follow reference

See man src/cmd/ksh93/nval.3. It is not meant to be a valid nvflag value. But simply redefining it to #define NV_NOREF (1 << 24), so it is no longer a valid nvflag bit, breaks the code. I can't even begin to wrap my mind around why the person who wrote this code thought overloading NV_REF this way was a good idea. This is also true of #define NV_ASSIGN NV_NOFREE and probably other symbols.

This explains why so many bugs are being found by ASAN, Coverity Scan, and debug malloc implementations. The writers of this code were being far too clever. The situation wouldn't be nearly as bad had they used proper interfaces that validated their arguments rather than preprocessor macros.

@krader1961
Copy link
Contributor Author

FWIW, My intent is to widen nvflag from an unsigned short to an unsigned 32-bit int. And make every symbolic value valid for that struct member to have bit 31 set. Which will make it possible to distinguish between values valid as a Namval_t attribute from those meant to be used in calls to nv_open(), etc. but not stored directly in the nvflag struct member. This is far from perfect and will allow many aliasing bugs to slip through. But it would be a significant step towards properly separating these name spaces.

@krader1961
Copy link
Contributor Author

krader1961 commented Apr 28, 2019

I confirmed that the only places that modify (struct Namval).nvflag are these three functions: nv_setattr(), nv_offattr(), nv_onattr(). So I instrumented those functions. The first two never attempt to merge illegal bits into the bit field. The latter, however, always does so. And when it attempts to turn on specific attrs the only one ever present is NV_EXPORT; and even that is often not present meaning the call is a no-op. The problematic calls are all from here:

nv_onattr(np, flags & NV_ATTRIBUTES);

Note the masking with NV_ATTRIBUTES defined here:

#define NV_ATTRIBUTES \
(~(NV_NOSCOPE | NV_ARRAY | NV_NOARRAY | NV_IDENT | NV_ASSIGN | NV_REF | NV_VARNAME | NV_STATIC))

That is the sole use of the symbol. The intent appears to be to ensure those specific attrs are not turned when the aforementioned nv_onattr() call occurs. Why? I have no idea and I'll bet if you asked the old AST team they probably couldn't give the correct answer. Also, note that NV_NOSCOPE, NV_NOARRAY, NV_ASSIGN, NV_VARNAME, and NV_STATIC all refer to bits that cannot be represented in an unsigned short. The only symbols in that list which can be present in the nvflag structure member is NV_ARRAY, NV_IDENT (an alias for NV_MISC) and NV_REF. So the correct definition is

#define NV_ATTRIBUTES ~(NV_ARRAY | NV_IDENT | NV_REF)

That only some of the NV_* symbols that are not valid in an nvflag are in that definition means it is broken in two ways: 1) those invalid symbols aren't necessary due to the implicit truncation to 16 bits, and 2) if those invalid symbols are necessary then quite a few more need to be included.

@krader1961
Copy link
Contributor Author

I pondered the code I mentioned in my previous comment some more. It is clear the only reason it also masks out the symbols that are otherwise legal in a nvflag is because they are aliased to symbols that only have meaning when calling nv_open(). For example, #define NV_NOREF NV_REF. This is why APIs like nv_open() need to be changed so as not to conflate the two sets of symbols.

@krader1961
Copy link
Contributor Author

krader1961 commented Apr 28, 2019

Also, NV_ARRAY is never present in the flags value when nv_onattr(np, flags & NV_ATTRIBUTES) is executed. So it is almost a certainty that at some point in the past (but not now) NV_ARRAY was aliased to a symbol that was only meaningful for the behavior of nv_open() rather than being intended to be stored in nvflag. Yet another example of bit-rot in the code and why mixing (or conflating) namespaces in this fashion is a really bad idea.

JohnoKing added a commit to JohnoKing/ksh that referenced this issue Dec 26, 2024
This commit refactors the local builtin to take advantage of some
changes I intend to submit later as separate pull requests. These are:

- np->nvflags has been expanded from 16-bit to 32-bit. This fixes
  the longstanding limitation detailed in a previous ksh2020 bug:
  att#1038.
  The fix consists of masking out nv_open-centric flags with
  explicit uses of the new NV_OPENMASK, rather than implicitly and
  obtusely abusing the limitations of unsigned short types.
  Here, I take advantage of it for storing NV_DYNAMIC in np->nvflags
  and using it from there, which is a safer and more simple choice
  long term.

- A bugfix from ksh93v- 2012-08-24 has been backported to fix
  a potential segfault that is triggered under ASan when changing
  the sizes of np->nvsize and np->nvflags to uint32_t. This cannot
  be triggered on the dev branch, but I will submit this individually
  soon enough. (I would have already submitted it had I not been
  forced to deal with a broken PSU shortly after submitting ksh93#805;
  0 out of 10 experience don't recommend).

- Updated comment regarding pitfalls in the ksh93v- and current
  implementation of local.

In any case I intend to separate out various fixes from this branch and
submit them as separate pull requests, if only to eventually reduce the
scope of this one to just the local builtin and nothing else (the
sh_funscope refactor for instance fixes an out of bounds bug that has
been on the dev branch for far too long). Unfortunately my life has
been quite hellish lately, so there is no ETA yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This issue is a blocker for another issue. bug
Projects
None yet
Development

No branches or pull requests

1 participant