Bad error messages

Luke Breuer
2009-02-10 22:28 UTC

Bad error messages cost software users countless hours of lost productivity; the time developers save by not dealing thoroughly with failure states is minimal in comparison. I would consider this a form of market failure, though there may be a more precise term that I don't know.
Case study
There are many different reasons that computer A cannot communicate with computer B.
  • some link in the chain of computers between A and B is failing.
    • firewall issues (or any filtering)
    • a plug physically disconnected
  • DNS lookup fails (DNS server inaccessible, name not found, etc.)
  • data are cached somewhere between A and B (or at A and B), resulting in dated data
    • HTTP debugging proxies can be invaluable for spotting this when developing websites
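
As a rough illustration of how software could report which of these causes it actually hit, here is a minimal Python sketch that checks a connection in stages and names the first stage that fails. The function name diagnose and its resolve/connect parameters are my own inventions for the example (the parameters exist only so the stages can be swapped out or tested); the socket calls are from the standard library. It does not cover caching or every firewall behaviour.

```python
import socket


def diagnose(host, port, timeout=3.0,
             resolve=socket.getaddrinfo,
             connect=socket.create_connection):
    """Return a message naming the first stage at which reaching host:port fails."""
    # Stage 1: name resolution. A failure here points at DNS, not at the target.
    try:
        resolve(host, port)
    except socket.gaierror as exc:
        return f"DNS lookup for {host!r} failed: {exc}"

    # Stage 2: TCP connection. Each exception type hints at a different cause.
    try:
        conn = connect((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return (f"connection to {host}:{port} was refused: something answered, "
                f"but nothing accepted connections on that port")
    except (socket.timeout, TimeoutError):
        return (f"no reply from {host}:{port} within {timeout}s "
                f"(possibly a firewall or a broken link)")
    except OSError as exc:
        # Anything else (no route, network down, ...): pass the OS message on.
        return f"could not connect to {host}:{port}: {exc}"

    conn.close()
    return f"{host}:{port} accepted a TCP connection"
```

Calling diagnose("example.org", 80) on a working connection should report an accepted connection; the point is that every other outcome says which step failed instead of a generic "cannot connect".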

Software is typically atrocious at communicating everything it knows about a failure condition. Due in part to the limits of what a thirty-two-bit error code can carry, error information is often lost as it propagates up the call stack.
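
To make the loss concrete, here is a small Python sketch contrasting the two styles. The names read_config_code, read_config, ConfigError and describe are hypothetical, made up for this example. The first function collapses everything it knows into one integer; the second wraps the low-level error in a higher-level one and keeps the original attached via "raise ... from", so nothing is lost on the way up the call stack.

```python
def read_config_code(path):
    """Error-code style: all knowledge of the failure collapses into one integer."""
    try:
        with open(path) as f:
            f.read()
        return 0
    except OSError as exc:
        # The path, the operation and the OS message are all discarded here.
        return exc.errno or -1


class ConfigError(Exception):
    """Hypothetical application-level error."""


def read_config(path):
    """Exception style: add context and keep the original error as the cause."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        raise ConfigError(f"cannot load configuration from {path!r}") from exc


def describe(exc):
    """Walk the chain of causes and render every level of it."""
    parts = []
    while exc is not None:
        parts.append(f"{type(exc).__name__}: {exc}")
        exc = exc.__cause__
    return " <- caused by: ".join(parts)
```

For a missing file, the first function returns only the number for "no such file", while describe() applied to the ConfigError raised by the second yields both the application-level message and the underlying FileNotFoundError with its path.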
Second case study: wireless networks
Diagnosing a wireless network connection with the built-in Windows XP tools is an abominable experience. There is no graph of signal strength versus time that can be watched while walking around a building with a laptop in hand. Nor is there a built-in utility that will report that the device has an IP address but cannot actually communicate with the gateway, or cannot communicate with the DNS servers, etc.
Software systems
I find that Microsoft's .NET framework, in general, has much better error messages than any other software system/framework/whatever that I have used. Beyond what the CIL metadata structure allows, I think the patterns it sets forth for useful structured error handling are the most valuable part of .NET.

However, I'm confused as to why Microsoft hasn't fully described what should be done with exceptions. Oren Eini, for example, documents that ex.ToString() should reveal all exception information; I don't recall seeing this in Microsoft's best-practice documentation.
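
The same idea can be sketched outside .NET. What follows is a Python analogue, not .NET code and not taken from Oren Eini's post: a hypothetical TransferError whose string form includes every structured field it carries, plus its cause, so that whoever logs str(err) gets the whole picture without knowing the exception's type.

```python
class TransferError(Exception):
    """Hypothetical exception that carries structured context about a failure."""

    def __init__(self, message, **context):
        super().__init__(message)
        self.context = context

    def __str__(self):
        text = super().__str__()
        if self.context:
            # Render every field, sorted so the output is stable.
            details = ", ".join(
                f"{key}={value!r}" for key, value in sorted(self.context.items())
            )
            text += f" [{details}]"
        if self.__cause__ is not None:
            # Include the underlying error rather than hiding it.
            cause = self.__cause__
            text += f"; caused by {type(cause).__name__}: {cause}"
        return text
```

Raising TransferError("upload failed", host="example.org", attempt=3) from a caught TimeoutError and then printing it shows the message, both fields and the timeout together in one line.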

An example of how much progress Microsoft still has to make: error handling in SSIS, among other things, is opaque and lossy when it comes to giving the developer relevant debugging information (source).

some hope?