tsunami

C# 4.0 proposal: non-nullable reference types

Luke Breuer
2008-01-06 11:10 UTC
tags: c#-4.0

Intro
Non-nullable reference types make code cleaner, clearer, and less error-prone. They would be an extremely valuable addition to the C# language. Their absence has forced developers to write a great deal of boilerplate that a crystal-clear syntax could elegantly eliminate. Finding potential null dereferences has long been a feature of static-analysis bug checkers. Supporting non-nullable types directly in C# is a logical next step in its evolution as a language that leads developers into the pit of success.
Example 1: method arguments
Originally created to demonstrate a point, ZeroWidthSplit grew to be more than that.
static IEnumerable<string> ZeroWidthSplit(string splitThis, Regex atStartOfMatch)
{
    if (splitThis == null)
        throw new ArgumentNullException("splitThis");
    if (atStartOfMatch == null)
        throw new ArgumentNullException("atStartOfMatch");

    return new[] { 0 }
        .Concat(atStartOfMatch
            .Matches(splitThis)
            .Cast<Match>()
            .Select(m => m.Index)
        )
        .Concat(new[] { splitThis.Length })
        .SelfJoinByOffset(1)
        .Where(t => t.Second > t.First)
        .Select(t => splitThis.Substring(t.First, t.Second - t.First));
}
With non-nullable reference types, the null checks at the top of the method go away:
IEnumerable<string> ZeroWidthSplit(string! splitThis, Regex! atStartOfMatch)
{
    return new[] { 0 }
        .Concat(atStartOfMatch
            .Matches(splitThis)
            .Cast<Match>()
            .Select(m => m.Index)
        )
        .Concat(new[] { splitThis.Length })
        .SelfJoinByOffset(1)
        .Where(t => t.Second > t.First)
        .Select(t => splitThis.Substring(t.First, t.Second - t.First));
}
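SelfJoinByOffset is not a standard LINQ operator, and its definition is not shown here. The following is only a minimal sketch of what it might look like, assuming it pairs each element with the element offset positions later and that the result exposes First and Second, as the lambdas above require. The Pair<T> type and the SequenceExtensions class name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pair type; the example only shows that results expose First and Second.
public sealed class Pair<T>
{
    public T First { get; private set; }
    public T Second { get; private set; }

    public Pair(T first, T second)
    {
        First = first;
        Second = second;
    }
}

public static class SequenceExtensions
{
    // Assumed behavior: pair each element with the element `offset` positions after it.
    public static IEnumerable<Pair<T>> SelfJoinByOffset<T>(this IEnumerable<T> source, int offset)
    {
        // Validate eagerly; the iterator below would otherwise defer these checks.
        if (source == null)
            throw new ArgumentNullException("source");
        if (offset < 1)
            throw new ArgumentOutOfRangeException("offset");

        return SelfJoinIterator(source, offset);
    }

    private static IEnumerable<Pair<T>> SelfJoinIterator<T>(IEnumerable<T> source, int offset)
    {
        // Holds the most recent `offset` elements that are still waiting for a partner.
        var window = new Queue<T>();
        foreach (T item in source)
        {
            if (window.Count == offset)
                yield return new Pair<T>(window.Dequeue(), item);
            window.Enqueue(item);
        }
    }
}
```

Under these assumptions, the boundary indexes 0, 3, 7 with an offset of 1 become the pairs (0, 3) and (3, 7), which is the shape ZeroWidthSplit needs to compute each substring. Note that this helper needs its own null check on source, which is exactly the kind of code the proposal aims to remove.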
Example 2: IEnumerable
IEnumerable<string> EvilMethod()
{
    return null;
}

void NaiveMethod()
{
    foreach (string s in EvilMethod())
    { }
} 

void DefensivelyCodedMethod()
{
    // YUCK: ?? Enumerable.Empty<string>()
    foreach (string s in EvilMethod() ?? Enumerable.Empty<string>())
    { }
}
Some prose
This proposal does not advocate completely banning nulls from C# or any such nonsense. (The language Nice does ban nulls.) The premise is that in many circumstances, null is semantically equivalent to an empty sequence (Count == 0) or, for strings, to "". By semantically equivalent, I mean that a method returns the same value or has the same effect for both values (e.g. "" and null).
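As a small illustration of this kind of equivalence, here is a sketch with two hypothetical methods (Greet and Total are made up for this example) that behave identically whether they receive null or the empty value:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Equivalence
{
    // Hypothetical method: null and "" both mean "no name supplied".
    public static string Greet(string name)
    {
        return string.IsNullOrEmpty(name) ? "Hello, stranger" : "Hello, " + name;
    }

    // Hypothetical method: a null sequence and an empty one both total zero.
    public static int Total(IEnumerable<int> values)
    {
        return (values ?? Enumerable.Empty<int>()).Sum();
    }
}
```

Callers of either method cannot tell null apart from the empty value, so allowing both adds nothing except the extra check.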

In other words, methods often don't care whether they get a zero-length string or a null string; the meaning to the method is the same. This is why string.IsNullOrEmpty was introduced. I advocate making null illegal when, and only when, a single other value is semantically equivalent to null. I claim that the null-check code in Example 1 is unnecessary cruft, and that the version using ! is better, both because it requires fewer characters and because it better communicates intent.
Design by Contract
Design by contract (DbC) involves proving and/or ensuring properties of code. Spec# is a Microsoft Research project that provides DbC for C# code (at least for C# 2.0, via Visual Studio 2005). Non-nullable reference types are one example of a contract. DbC goes beyond the simple code generation exemplified above: the compiler would actually verify that the arguments passed to ZeroWidthSplit cannot be null.

Verification can get tricky: depending on how powerful one's static analysis is, significant cruft (in the form of explicit null checks) might be required to actually realize the desired compiler errors. I do not know how good a job the Microsoft C# team could do in this respect. There are certainly problems with inter-language operation, and with the desire to avoid null checks of any sort for performance reasons. I would like to see this issue discussed in more detail.
My proposal
I propose that ! cause the compiler to generate the kind of null checks shown in Example 1. In addition, the compiler should perform static analysis of code involving types (at least arguments and local variables) decorated with !: unless significant cruft results, it should verify that decorated variables and arguments are never null by the time they are first accessed, if the access does not assume non-nullness*. This is analogous to definite assignment, which has been in the C# spec from the beginning: if a local variable is accessed before it is assigned to, an error results.

* highlit lines would generate error messages due to the possibility of nullness
void Examples()
{
    string! s;

    InMethod(s);        // (error due to definite assignment requirement)

    s = null;

    InMethod(s);
    InMethod(s.Length);

    int len1 = s != null ? s.Length : -1;
    int len2 = (s ?? "").Length;
    int len3 = s.Length;

    if (s != null)
        InMethod(s);

    OutMethod(out s);

    string! s1, s2;

    int len4 = (s1 ?? s2).Length;
    int len5 = (s1 ?? s2 ?? "").Length;
}

void InMethod(string! s)
{ }

void OutMethod(out string! s)
{
    s = "yay"; 
}