This is an area the newer languages get right - I don’t think Rust or Go has any undefined behavior? I wish they would have some kind of super strict mode for C or C++ where compilation fails unless you somehow fix things at the call sites to tell the compiler the behavior you want explicitly.
I think data races can cause undefined behavior in Go, which can cause memory safety to break down. See https://research.swtch.com/gorace for details.
> I wish they would have some kind of super strict mode for C or C++ where compilation fails unless you somehow fix things at the call sites to tell the compiler the behavior you want explicitly.
The C++ language committee _does not_ want to add more annotations to increase memory safety.
Not even annotations. The committee standardized this: https://en.cppreference.com/w/cpp/container/span/operator_at
So they clearly don't care, so there's no point in trying to convince them.
The committee tends to also provide bounds-checked interfaces ( https://en.cppreference.com/w/cpp/container/span/at ). But that requires people to read the documentation, and based on the number of people I see write "std::endl" when they really want "'\n'", I don't have much hope for that ("std::endl" both sends '\n' to the stream and flushes it; people are often surprised by the stream getting flushed).
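For anyone skimming, a minimal sketch of the difference (my example, not from the comment above; note that std::span::at itself only landed in C++26), including the std::endl flush surprise:

    #include <iostream>
    #include <span>
    #include <stdexcept>

    void demo(std::span<int> s) {
        // Unchecked: an out-of-range index here would be undefined behavior.
        // int x = s[100];

        // Checked: an out-of-range index throws std::out_of_range instead.
        try {
            int y = s.at(100);               // requires C++26 for std::span::at
            std::cout << y << '\n';          // '\n' just writes a newline
        } catch (const std::out_of_range&) {
            std::cout << "out of range" << std::endl;  // std::endl also flushes
        }
    }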
We've known since the 80s that programmers almost always choose the more ergonomic interface and telling people they're holding it wrong doesn't scale.
Besides, throwing an exception is a terrible way to do range checking. There's a huge number of projects out there banning exceptions that would benefit from safe interfaces.
> Besides, throwing an exception is a terrible way to do range checking. There's a huge number of projects out there banning exceptions that would benefit from safe interfaces.
I thought "banning exceptions" would mean -fno-exceptions, turning the throw into an abort - that's pretty good for a systems programming language, no?
What other way would you propose?
Make it explicit that the API can fail. Panicking is not acceptable in systems programming. Force me to handle all the cases.
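As a rough illustration (a hypothetical helper, not anything from the standard), making the failure explicit without exceptions or panics could look like this:

    #include <cstddef>
    #include <span>

    // Hypothetical checked accessor: failure is reported in the return type
    // (nullptr) rather than by throwing or by invoking undefined behavior,
    // so the caller has to decide what an out-of-range index means.
    template <typename T>
    T* checked_get(std::span<T> s, std::size_t i) {
        return i < s.size() ? &s[i] : nullptr;
    }

    // Usage sketch:
    //   if (int* p = checked_get(values, idx)) { use(*p); } else { handle_error(); }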
I'm not trying to be provocative, but I genuinely am not seeing what the benefit of this "doctrine" is if the end result is a bunch of if-else statements that do nothing but bubble the error up and then exit the process anyway.
Panicking is in no way different from, say, undefined behavior, with the exception that panicking tends to be "loud" and therefore fixed promptly.
Several programming languages can testify to the fact that a Benevolent Dictator For Life is not a panacea. Several more can testify that having a Committee design the language is likewise not a panacea. Perhaps uniquely, C++ can clarify for us that having both is in fact worse than either alone.
Same applies to C.
Do you see Brian and Dennis dominating WG14 meetings? Nope. They moved on; Bjarne Stroustrup never did. After his initial prototypes and his first book about C++, he's written lots more books and papers, he's lectured classes, he's given huge numbers of talks, all of them about his baby, C++. If you ask WG21 people directly they'll insist he's just one vote (ah yes, JTC1 consensus "voting"), but, for example, WG21 says it will heed the "advice" of its Direction Group, a self-selecting handful of people dedicated to following advice from a book written by Bjarne and weirdly always giving exactly the same advice as Bjarne. Which makes sense, because its most notable member is Bjarne, but this advice is signed "The Direction Group" ...
It's like being surprised that the UN Security Council keeps making decisions which favour Russia.
It doesn't change the fact that C is equally a design-by-committee language, with all the negativity that entails.
In fact, WG14 has very clearly acted against Dennis when he submitted papers that could have made C safer.
Maybe his fat-pointers proposal was not good enough, but apparently it wasn't something worth improving upon either.
C's authors indeed moved on, first to Alef (which, granted, had a few design issues), then Limbo, and finally Go, as C as driven by WG14 was no longer their thing; C on Plan 9 isn't even C89 compliant.
Safe Rust has no undefined behavior. Unsafe Rust does.
cough std::env::set_var cough :D.
std::env::set_var [1] has already been changed to unsafe in the 2024 edition of the compiler [2].
So yeah, such things exist, but what's important is what the compiler devs choose to do once such issues are found. The C++ compiler devs say "That's an unfortunate case that cannot be fixed." The Rust devs say "That's a bug, here's the issue link."
[1] https://doc.rust-lang.org/std/env/fn.set_var.html
[2] https://doc.rust-lang.org/edition-guide/rust-2024/newly-unsa...
To be fair, the UB caused by this function comes from the underlying C implementation, and the function is already marked as unsafe in the 2024 edition.
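For context, a minimal sketch of that underlying C hazard (my example, assuming a POSIX system where setenv is available): POSIX provides no synchronization between setenv and getenv, so calling them concurrently from different threads is a data race, i.e. undefined behavior, which is why mutating the environment in a multithreaded process is dangerous in the first place.

    #include <cstdlib>
    #include <stdlib.h>   // setenv (POSIX, not ISO C++)
    #include <thread>

    int main() {
        std::thread reader([] {
            for (int i = 0; i < 1000; ++i)
                (void)std::getenv("PATH");   // may observe a half-updated environment
        });
        for (int i = 0; i < 1000; ++i)
            setenv("GREETING", "hello", 1);  // may reallocate the environment block
        reader.join();
    }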
Older languages as well, those that weren't a copy-paste from C with extras.
Modula-2, Ada, Object Pascal, Eiffel, Delphi,...
you realize UB is basically an escape hatch from the standard for compilers right? it's not like a flaw in the language, it's gaps negotiated by the standards committee (well for the most part i guess). so the reason new languages don't have UB is because new languages don't have multiple implementations (go definitely doesn't, does anyone use rust-gcc?).
You only need implementation-defined behavior to grease the wheels of multiple implementations. You don't need the gaping void of undefined for that use.
There's a big difference between "it'll be some number, not promising which one" and "the program loses definition and anything can break, often even retroactively".
Potato potato. My point is that UB isn't an accident, it's intentional. Mind you, I'm not saying it's great, just that it's not some kind of slip-up.
"Implementation defined" and "undefined" are different things.
On my laptop, sizeof(long) is 8; that's implementation defined. It could be different on my phone, or my desktop, or my work laptop.
Undefined means, roughly, "doing X is considered nonsensical, and the compiler does not have to do anything reasonable with code that does X." In "Design and Evolution of C++," Stroustrup says that undefined behavior applies to things that should be errors, but that for some reason the committee doesn't think the compiler will necessarily be able to catch. When he came up with new ideas for the language, he would often have to choose between making a convoluted rule that his compiler could reliably enforce, or a simple rule that his compiler couldn't always give a sensible error message for.
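To make the contrast concrete, a small sketch (mine, not from the standard text):

    #include <climits>
    #include <cstdio>

    int main() {
        // Implementation-defined: the result varies between platforms, but each
        // implementation must document its choice and the program stays meaningful.
        std::printf("long is %zu bytes\n", sizeof(long));

        // Undefined: the standard places no requirements at all on what happens.
        int x = INT_MAX;
        // x + 1;   // signed overflow: the whole program loses meaning, not just this line
        (void)x;
    }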
For instance, the original compilers relied on the system's linker. If the compiler could interact with the linker, it could perhaps detect violations of the One Definition Rule ( https://en.cppreference.com/w/cpp/language/definition ). But the linker might have been written by a completely different company; it's acceptable for different source files to be compiled by different compilers (and even be written in other languages -- https://en.cppreference.com/w/cpp/language/language_linkage ) and put together by the linker; and it's common for binary libraries to be sold without source. So there's no guarantee that the compiler will ever have the information necessary to detect a violation of the One Definition Rule, and the committee says that violations create a nonsense program, which isn't required to behave in any particular way.
Do you think my last sentence is describing things incorrectly? I don't really understand how you could take that depiction and call it "potato potato".
"go definitely doesn't"
What? gccgo, TinyGo, and GopherJS.
The most horrifying aspect of UB is that it can affect your program without the instructions triggering it ever being executed. And many greenhorns don't know that or even believe it to be false. So the effects of Dunning-Kruger may be more severe in C(++) than in other languages.
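A classic illustration of that point, sketched here from memory rather than quoted from anywhere: the out-of-bounds read below never needs to execute for the function to be miscompiled.

    static int table[4];

    // The iteration with i == 4 would read past the end of `table`, which is UB.
    // A compiler may assume UB never happens, conclude the loop must exit via
    // `return true` before reaching i == 4, and thus compile the whole function
    // to an unconditional `return true`, even for values not in the table, and
    // even though the out-of-bounds read itself never actually runs.
    bool contains(int v) {
        for (int i = 0; i <= 4; ++i) {
            if (table[i] == v) return true;
        }
        return false;
    }

    int main() { return contains(42) ? 1 : 0; }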
The first example (signed integer overflow) is no longer valid in newer standards of C. Now it should use two's-complement semantics with no UB.
I believe they only standardized the two's-complement representation (so casts to unsigned have a more specific behavior, for example) but they did not make overflow defined.
Yeah, signed integer overflow is as UB as ever. I've heard the primary reason for it is to avoid the possibility of wraparound on 'for (int i = 0; i < length; i++)' loops where the 'length' is bigger than an int. (Of course, the more straightforward option would be to use proper types like size_t for all your indices, but it's a classic tradition to use nothing but char and int, and people judge compilers based on existing code.)
> I've heard the primary reason for it is to avoid the possibility of wraparound on
Making it UB doesn’t fix that in any way that I can think of.
What it means is that since i as the variable is monotonically increasing, an array indexing operation that is in the loop body can be replaced with an incrementing pointer instead, which eliminates quite a lot of code. An example here: https://pvs-studio.com/en/blog/posts/cpp/0374/
ptrdiff_t is also useful in this case if signed semantics are desired.
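A sketch of the kind of transformation being described (hypothetical functions, loosely following the linked article's reasoning): because signed overflow is UB, the compiler may assume `i` never wraps, so the indexed loop can be lowered to a simple pointer walk.

    // Because `i` is a signed int, overflow would be UB, so the compiler may
    // assume it increases monotonically until it reaches `n`...
    void scale(float* data, long long n) {
        for (int i = 0; i < n; ++i)
            data[i] *= 2.0f;
    }

    // ...and can lower the loop to something like this pointer walk, without
    // re-checking for int wraparound on every iteration.
    void scale_lowered(float* data, long long n) {
        for (float* p = data, * end = data + n; p != end; ++p)
            *p *= 2.0f;
    }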
Despite the high frequency with which the alarmist "formatting your HDD" example is cited in discussions of UB, I've never seen it happen. Surely there exist real examples of catastrophic failures which could actually teach us something, beyond making a hyperbolic point.
It was intentionally hyperbolic and tongue-in-cheek, and understood as such. The reason was to push back against the assumption people had (at least at the time) that UB was just a segfault or something. Here's a kernel exploit that was a result of UB [1]. It's not hard to imagine that, hypothetically, UB in the kernel could result in "just so" corruption that would call the "format your HDD" routine, even if in practice it's extremely unlikely (and forensically it would basically be impossible to prove that it was UB that caused it).
One example is control flow bending [0], which uses carefully crafted undefined behavior to "bend" the CFG into arbitrary, turing complete shapes despite CFI protections. The author abused this to implement tic tac toe in a single call to printf for a prior obfuscated C contest [1].
Of course, that misses the real point that "formatting your HDD" is simply an allowed possibility rather than a factual statement on the consequences.
[0] https://www.usenix.org/conference/usenixsecurity15/technical...
my opinion as a very experienced C system programmer:
there must be better sources to guide people than a poorly written and infantilizing article from 15 years ago.
My experience is that self-described "very experienced C system programmers" are simultaneously the people who are most in need of a good explainer on undefined behavior and the most likely ones to throw a conniption fit halfway through and stop reading, for the hallmark of a good explainer on UB is that it will explain that a) it exists for a reason; b) no, just doing a "little" UB isn't safe; and c) it's not the compiler's fault that things go awry when you do UB, it's the programmer's fault.
One of the blog posts I've long had queued up for writing is "In defense of undefined behavior." It's only half-written, but the gist is justifying UB by pointing out that you can't optimize C code without it (via an example using pointer provenance), then pointing out why uninitialized values look weirder than you think by reference to the effects of system libraries, and then I would actually walk through why specification authors should reach for undefined behavior in various places.
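For what it's worth, a rough sketch of the provenance-style argument (my own example, not anything from that draft): if writes through out-of-range pointers were defined, the compiler could not keep `b` in a register below, because the write through `p` might legally hit it.

    #include <cstdio>

    int main() {
        int a = 1, b = 2;
        int* p = &a + 1;         // one past the end of `a`: valid to form, not to dereference
        if (p == &b) {           // may happen to compare equal if `b` is laid out right after `a`
            *p = 10;             // UB: p's provenance is `a`, so it may not be used to write to `b`
        }
        std::printf("%d\n", b);  // a compiler is entitled to print 2, never reloading `b`
        return 0;
    }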
Oh hey, I also have "in defense of undefined behavior" in the queue of blog posts I'd like to write some time, with that exact title. What a coincidence. That said, it's unlikely to get written as I have things that are more specific to my actual research ahead of it.
One of the things I'd want to say is that UB is a useful and accurate way to model what happens when, say, a program writes over memory used by the allocator. Languages like Odin might try to pretend they don't have UB, but in my opinion it's impossible to get there just by disabling certain compiler optimizations (see https://news.ycombinator.com/item?id=32800814 for an argument about this).
I see UB as essentially a proof obligation, to be discharged in some other way. A really good way is to have UB in the intermediate representation, and compile a safe language into it (with unsafe escape hatches when needed). But there are other ways, including formal methods, rigorous testing, or just being a really smart solo programmer who's learned how to avoid UB and doesn't have to work in a team.
Feel free to send me your draft.
Using protective gear, or making cars safer in crashes, also slows things down compared to not using them at all, yet lives are saved every year where people would otherwise die or be crippled.
As someone who rather prefers the Wirth culture of programming languages, I don't think UB at the expense of safety is a clear win; that is why we end up with security exploits, or hardware mitigations for UB-based optimizations gone too far.
Raph, I think you may be using a different definition of UB than what compiler authors are using? As I understand it, in the language sense of the word, UB technically allows the compiler to interpret the code however it wants. To me, finding utility in UB means relying on some kind of well-defined behavior to result, which would imply that you are either just relying on today's behavior OR you are doing something that's non-deterministic but not violating language rules? Or some intermediate definition where it's both violating language rules and no future version of the compiler is likely to be able to detect the UB and change behavior?
UB is very useful for compiler authors because they can apply very useful optimizations by treating "illegal" code as impossible, and emit whatever they like for code constructs where they want those optimizations to apply. I have a hard time understanding how that's useful to language users, though.
Being a very experienced programmer, I'm sure you know many such sources. Can you share any?
The C standard Annex J has a list of undefined behavior: https://port70.net/~nsz/c/c99/n1256.pre.html#J
Agreed. Running with a basketball is very much possible; I'm unsure as to why John thinks otherwise.
Perhaps you could draw on your wealth of experience to write one. I’d love to read it!
If only folks would write code in a way that made that infantilizing article from 15 years ago less relevant; instead, it's as relevant as ever.
In my opinion it's not infantilizing enough. If you are a C developer and have never heard of model checking, then you are grossly incompetent and should never be allowed near a computer.