debug safety feature: runtime undefined value detection #211

Open
andrewrk opened this issue Nov 8, 2016 · 6 comments
Labels: accepted, frontend, proposal

@andrewrk
Member

andrewrk commented Nov 8, 2016

If the programmer initializes a variable to undefined or otherwise sets the value to undefined anywhere, we can secretly make the type of the variable a maybe type and have a bit to keep track of whether it is undefined at any given point in time. Then if the programmer tries to use a value which is undefined, we detect it with a runtime check, and crash with a stack trace.
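The idea can be modeled outside the compiler. The following is a minimal sketch in C (standing in for whatever the compiler would actually emit), assuming a `u32` variable; the struct, the function names, and the `0xAA` filler are all hypothetical and only illustrate the "hidden maybe type" described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lowering of `var x: u32 = undefined;` in a safety-checked
 * build: the value is paired with a hidden "defined" bit, much like an
 * optional. All names here are illustrative. */
typedef struct {
    uint32_t value;
    bool defined;
} tracked_u32;

/* `x = undefined;` clears the bit. The 0xAA filler is an arbitrary choice. */
tracked_u32 tracked_undefined(void) {
    tracked_u32 t = { .value = 0xAAAAAAAAu, .defined = false };
    return t;
}

/* `x = v;` stores the value and sets the bit. */
void tracked_store(tracked_u32 *t, uint32_t v) {
    t->value = v;
    t->defined = true;
}

/* Every use of `x` first checks the bit; assert() aborts on failure,
 * standing in for "crash with a stack trace". */
uint32_t tracked_load(const tracked_u32 *t) {
    assert(t->defined && "use of undefined value");
    return t->value;
}
```

Calling `tracked_load` on the result of `tracked_undefined()` before any `tracked_store` trips the assertion. Compiling with `NDEBUG` removes the check, which loosely mirrors a build mode without safety checks.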

@andrewrk andrewrk added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Nov 8, 2016
@andrewrk andrewrk added this to the 0.3.0 milestone May 7, 2017
@andrewrk andrewrk modified the milestones: 0.3.0, 0.2.0 Sep 17, 2017
@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Oct 20, 2017
@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Feb 28, 2018
@andrewrk andrewrk added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. accepted This proposal is planned. and removed enhancement Solving this issue will likely involve adding new logic or components to the codebase. labels Feb 3, 2019
@andrewrk andrewrk modified the milestones: 0.4.0, 0.5.0 Feb 3, 2019
@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Aug 28, 2019
@andrewrk andrewrk modified the milestones: 0.6.0, 0.7.0 Feb 10, 2020
@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 9, 2020
@andrewrk andrewrk added the frontend Tokenization, parsing, AstGen, Sema, and Liveness. label Oct 9, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@tbodt

tbodt commented Nov 19, 2021

I wonder if you could make use of "undefined" not be UB. In what I'm imagining, the bit pattern would be unspecified, i.e. whatever happens to be on the stack or in the register, but LLVM would not optimize out code that uses the undefined data, and would not have permission to reformat your hard drive.

@matu3ba
Contributor

matu3ba commented Jan 3, 2023

> I wonder if you could make use of "undefined" not be UB.

This does not work in general unless you opt out of performance; see https://www.ralfj.de/blog/2021/11/18/ub-good-idea.html and https://www.ralfj.de/blog/2021/11/24/ub-necessary.html

Zig's compilation modes for safety checks are exactly this, with varying degrees of performance (Debug vs. ReleaseSafe).

@Jarred-Sumner
Contributor

I think this or something like it would've saved me 5 hrs yesterday

https://twitter.com/jarredsumner/status/1618174627888664576

@mlugg
Member

mlugg commented May 5, 2023

Just opened a duplicate with some concrete representations, so I'll post those here:


Our goal here is to identify any type with an unused value which we can reserve to mean undefined.

bool can use a padding bit.

A type uX or iX where X is not a power of two, or is under 8 (i.e. X is not one of 8, 16, 32, ...), can use a padding bit.

An exhaustive enum with unused tags can have a dummy tag added to represent undefined. A non-exhaustive enum defers to the above rule for its tag type.

A struct should not itself be marked as undefined, but rather all of its fields (where possible) should be recursively marked as such. A union or union(enum) can have its tag set to undefined where possible, or have an extra bit otherwise. There's nothing we can (consistently) do for a packed struct, packed union, extern struct, or extern union.

An undefined array is equivalent to an array full of undefined values.

We can't do anything about "standard" (ABI-allowed) nullable pointers, but slices could be made larger if necessary. For non-nullable pointers and slices, we could use the null pointer value.

Other optionals can use a padding bit.

Error sets can use a special tag (maybe maxInt(u16) or similar) to represent undefined. Error unions could do the same.

Vectors, like arrays, can set their elements to undefined, but of course this will only work for base types which are an int type with a non-power-of-two number of bits (at least 8).

I don't know how async frames are represented, but we can surely just add an extra bit if necessary.

That leaves the following non-zero-bit runtime types which we can't represent undefined for:

  • uX/iX for X = 8, 16, 32, 64, ...
  • f32/f64/f80
  • [*c]T, ?*T, ?[*]T
  • packed struct, packed union
  • extern struct, extern union
  • non-exhaustive (or exhaustive-with-all-tags-used) enum(T) where T is one of the int types above
  • any array or vector of the above

That's actually not bad! Most "interesting" types can represent undefined. Even if we don't want to increase the size of anything compared to today, that only excludes a few more types: some unions, nullable slices, and some non-pointer optionals. (Related to the last case: #104).
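Two of the representations listed in this comment can be sketched concretely. The following models them in C rather than Zig; the specific encodings (padding bit set means undefined, one extra enum tag) and all names are assumptions for illustration, not what the compiler does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A u7 stored in one byte: bits 0..6 hold the value, bit 7 is padding.
 * Hypothetical encoding: padding bit set means "undefined". */
#define U7_UNDEF_BIT 0x80u

typedef uint8_t dbg_u7;

dbg_u7 dbg_u7_undefined(void) { return U7_UNDEF_BIT; }
dbg_u7 dbg_u7_from(uint8_t v) { return (dbg_u7)(v & 0x7Fu); }
bool dbg_u7_is_undefined(dbg_u7 x) { return (x & U7_UNDEF_BIT) != 0; }

/* A use of the value checks the padding bit first. */
uint8_t dbg_u7_load(dbg_u7 x) {
    assert(!dbg_u7_is_undefined(x) && "use of undefined u7");
    return x;
}

/* An exhaustive enum with spare tag values: one extra tag is reserved. */
typedef enum {
    COLOR_RED,
    COLOR_GREEN,
    COLOR_BLUE,
    COLOR_UNDEFINED /* dummy tag, never visible to user code */
} color;

bool color_is_undefined(color c) { return c == COLOR_UNDEFINED; }
```

In this sketch the tracked `u7` still occupies a single byte, which is the point of reusing padding: the undefined state costs no extra storage. A full-width `uint8_t` has no spare bit pattern, which is why the 8-, 16-, 32-bit integer types end up on the "can't represent" list.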

@tecanec
Contributor

tecanec commented Jun 22, 2023

Some processes do not need their input to be defined.

For example, let's say we're memcopying a bunch of stuff. Some of that stuff may be undefined, but we know it won't be read at the new location, so it does not matter. Throwing an error here would only serve to obstruct the programmer by preventing them from using undefined when appropriate.
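One conceivable way to accommodate this case, sketched in C under the assumption of a hidden per-value "defined" bit (this is an illustration, not something proposed in the thread; all names are hypothetical): a copy moves the bit along with the bytes and never checks it, so only an eventual use of the value can trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t value;
    bool defined; /* hypothetical hidden "defined" bit */
} tracked_u32;

/* Copying moves the bytes, hidden bit included, without inspecting them,
 * so an undefined source is not an error at the copy. */
void tracked_copy(tracked_u32 *dst, const tracked_u32 *src, size_t count) {
    memcpy(dst, src, count * sizeof *src);
}

/* Only an actual use of the value consults the bit. */
uint32_t tracked_load(const tracked_u32 *t) {
    assert(t->defined && "use of undefined value");
    return t->value;
}
```

Under this model, copying an array that contains undefined entries succeeds, and the destination entries stay marked as undefined until something is stored to them.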

More generally, let's say we have a large array of stuff that needs processing. Let's assume that this array may contain undefined data, but that we're able to confirm that there's no danger in processing these undefined entries, and that it is more efficient to process these entries regardless than it is to skip or filter them out. If we care significantly about the performance of such a loop, then we don't want an error whenever the data we process happens to be undefined.

The core of the problem is that the very purpose of undefined is to say "I don't care", and it is difficult for the compiler to know when the programmer starts caring, if ever. There are very few cases where the programmer must care, and those are limited to cases where undefined data could directly cause a crash (such as pointer dereferencing and while-loop conditions). Otherwise, the programmer may well have figured out exactly how undefined inputs would affect a process and manually deemed it safe. One also has to consider cases where the data is only partially undefined, and where the non-undefined part makes the data safe. (For example, it should be perfectly safe to read from a 256-entry lookup table when the index is an undefined u8.)
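The lookup-table example can be made concrete. This is a small sketch in C with a hypothetical bit-count table; it only illustrates the argument that every possible 8-bit pattern is a valid index into a 256-entry table, so the read stays in bounds no matter which value the index happens to hold.

```c
#include <stdint.h>

/* 256 entries: one for every bit pattern a u8 can hold. */
static uint8_t popcount_table[256];

/* Fill the table: entry i is the number of set bits in i. */
void popcount_table_init(void) {
    for (int i = 0; i < 256; i++) {
        uint8_t bits = 0;
        for (int v = i; v != 0; v >>= 1) bits += (uint8_t)(v & 1);
        popcount_table[i] = bits;
    }
}

/* Whatever value `index` holds, the read stays within the table. */
uint8_t popcount_lookup(uint8_t index) {
    return popcount_table[index];
}
```

The result of such a lookup is still an arbitrary table entry, so it is only meaningful if the caller later discards it or does not care which entry was read.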

In addition to all of this, we've also got cases where we may want to read the data regardless of it being undefined, such as print-debugging.

I am not totally against some form of run-time checks for undefined values, as long as they do not hinder such use cases. However, I do believe that such checks would require more changes to the language itself than simply adding them, perhaps something similar to the likes of @volatileCast and @constCast, which remove safety checks while being a no-op at run-time even in debug builds. Alternatively, we may start distinguishing between undefined as temporarily uninitialized data and undefined as safe-to-use placeholder data.

@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jun 29, 2023
@ifreund
Member

ifreund commented Jul 5, 2023

This is closely related to #63, but that issue doesn't seem to have been linked here yet so I thought I should drop a comment doing so.

#63 (comment) brings up an alternative and more general approach than the one proposed here, though that approach also has its own drawbacks.
