tsunami

Thoughts on Assumptions and Programming

Luke Breuer
2008-12-26 05:24 UTC

Essay
A few years ago, I was thinking about how to explain debugging to those just starting to program. It's very interesting to observe people trying to debug with no prior experience; many will step through code until sunrise instead of setting breakpoints, few will examine the call stack, etc. Debugging was so second-nature to me that explaining it off the cuff was a bit difficult. Then I [think I] figured it out: debugging is figuring out how you screwed up and, in particular, which of the assumptions you made are invalid.

Code is littered with assumptions, ranging from assuming malloc returns a valid pointer to assuming a parameter is non-null to assuming that two nodes in a browser DOM are connected in a particular way. We assume the network is up and that its latency is below a certain threshold, we assume that files exist and that we have permission to write to them, and so forth. We assume a ton, and many assumptions are implicit. I postulate that these implicit assumptions make debugging hard and make understanding code hard.

What I'm talking about isn't entirely new: we have stuff like auto_ptr, array bounds checking, and design by contract programming. The Ada language was developed in part to verify quite a few assumptions. Unit testing exists to verify assumptions.

I like C#. Two of the things I really like are structured error handling and the ease with which I can
throw new InvalidOperationException("You can't do that because of this, you buffoon!");
Yes, I know other languages can do this; the thing I like about C# (actually, .NET in general) is that exceptions are used religiously. Unlike C-style return codes, which are ignored unless you check them, exceptions abort program execution unless you explicitly swallow them. No more checking the return value of malloc. Maybe the ASP.NET exception page isn't the prettiest, but it's really, really valuable if you're trying to figure out why the page errored out. Now, things can still be confusing if a null value was able to propagate through objects undetected or a case-sensitive comparison failed and caused a [seemingly] unrelated problem, but that's not a problem of unstructured error handling; it's a problem of implicit assumptions.
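To make the return-code versus exception contrast concrete, here is a minimal sketch. It is in Python rather than C# so that it runs as-is, and both function names are hypothetical, made up purely for illustration. The first version signals failure with a sentinel value the caller is free to ignore; the second raises, so the failure has to be handled or it stops the program.

```python
def parse_port_with_code(text):
    """C-style: return -1 on failure; the caller must remember to check."""
    if not text.isdecimal():
        return -1
    return int(text)


def parse_port(text):
    """Exception style: a bad input cannot be ignored by accident."""
    if not text.isdecimal():
        raise ValueError(f"port must be a non-negative integer, got {text!r}")
    return int(text)


# Nothing forces a check here, so the sentinel flows onward as if it
# were a real port number.
silent = parse_port_with_code("http")

# The same mistake with exceptions has to be swallowed explicitly.
try:
    parse_port("http")
    loud = None
except ValueError as exc:
    loud = str(exc)
```

The exception version does not remove the assumption that the input is a number; it only makes violating that assumption impossible to miss.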

I'm curious to know what your practices are for making assumptions explicit. Some methods I can think of off the top of my head are throwing exceptions, assertions, unit tests, and comments. I'm not too big on comments, since they can go out of date and they aren't logically binding (the compiler doesn't know anything about them). I do like the idea of contract-based programming, but I haven't checked it out that much (see Spec#). I do need to get into unit testing, although I'm not completely sure how to do it with JavaScript on one end and a database on the other; I've read about mock objects and whatnot, I just haven't done anything solid with them. What would a language look like if one were to elegantly and explicitly state many assumptions?
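As a rough sketch of how three of those methods might sit together, the Python below uses an exception for an assumption about the caller, an assertion for an assumption about a collaborator, and unit tests that stand a mock object in for the database. Everything named here (`average_latency_ms`, the `store` object and its `fetch_latencies` method) is hypothetical; only `unittest` and `unittest.mock` are real standard-library modules.

```python
import unittest
from unittest import mock


def average_latency_ms(store, host):
    # Assumption about the caller: enforced with an exception.
    if not host:
        raise ValueError("host must be a non-empty string")
    samples = store.fetch_latencies(host)
    # Assumption about the collaborator: stated as an assertion.
    assert all(s >= 0 for s in samples), "store returned a negative latency"
    # Assumption about the data: there is something to average.
    if not samples:
        raise LookupError(f"no latency samples recorded for {host!r}")
    return sum(samples) / len(samples)


class AverageLatencyTests(unittest.TestCase):
    def test_average_of_recorded_samples(self):
        store = mock.Mock()  # stands in for the real database layer
        store.fetch_latencies.return_value = [10, 20, 30]
        self.assertEqual(average_latency_ms(store, "db1"), 20)
        store.fetch_latencies.assert_called_once_with("db1")

    def test_missing_samples_are_reported(self):
        store = mock.Mock()
        store.fetch_latencies.return_value = []
        with self.assertRaises(LookupError):
            average_latency_ms(store, "db1")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageLatencyTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The mock only verifies my assumptions about how the code talks to the database; whether the real database behaves the way the mock does is yet another assumption, and one these tests do not check.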

Assumptions can't always be expressed as compilable code. A good example would be the assumption that a database is structured in the particular way a program depends on. Actually, that could be compiled into the program or put in a configuration file, but one also has assumptions on valid value ranges, assumptions that only exist under certain circumstances, and whatnot. Sometimes this information can be thought of as going in a data dictionary, but it seems that what I'm describing might be a superset. I think it would be really, really cool to be able to ask, "what objects depend on this object?" This can be done to some extent (foreign key constraints, searching through trigger/stored procedure code), but that's limited.
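For the part that can be checked mechanically, here is a small sketch using Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. The table names and the two helper functions are invented for the example, and other databases expose this information differently. One helper checks the assumption that a table has the columns the program expects; the other answers a narrow version of "what depends on this?" by following foreign keys.

```python
import sqlite3


def missing_columns(conn, table, expected):
    """Return the expected column names that `table` does not have."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    present = {row[1] for row in rows}  # row[1] is the column name
    return sorted(set(expected) - present)


def tables_referencing(conn, target):
    """Return names of tables with a foreign key pointing at `target`."""
    dependents = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (name,) in tables:
        # row[2] of foreign_key_list is the referenced (parent) table.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            if fk[2] == target:
                dependents.append(name)
                break
    return dependents


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    CREATE TABLE audit_log (id INTEGER PRIMARY KEY, message TEXT);
""")

missing = missing_columns(conn, "customers", ["id", "name", "email"])
dependents = tables_referencing(conn, "customers")
```

This only sees dependencies someone bothered to declare as constraints. Anything buried in application code, triggers, or stored procedures stays invisible, which is exactly the limitation described above.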

I've made a few points but mostly the above have been musings; I hope they'll stir some invigorating discussion.
Links
what can go wrong and how to prevent it.
Criticism of the essay
  • too many programming languages mentioned
  • no audience
  • sounds like an appeal to lack of experience
Notes
  • know how software can fail ("bug/error classes")