tsunami

tsunami item names

Luke Breuer
2008-12-26 04:40 UTC

Introduction
Items in tsunami all have names. However, since names change, the URL will almost always have a number near/at the end. This is because it is completely unacceptable for URLs to break. Completely. Unacceptable.

If a better solution can be found, do note it. Otherwise, the current scheme will be adhered to.
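
As a rough illustration of why the number keeps URLs from breaking, the sketch below resolves an item purely by the trailing number and ignores the name part. The URL layout shown (name, underscore, number, optional .aspx) and the function name `item_id_from_url` are assumptions made for this example only; the text above says no more than that the number sits near or at the end.

```python
import re
from typing import Optional

# Assumed layout: ".../<name>_<number>.aspx" (hypothetical, for illustration).
# Only the trailing number is used for lookup, so the name part may change.
_TRAILING_ID = re.compile(r"(\d+)(?:\.aspx)?$")


def item_id_from_url(path: str) -> Optional[int]:
    """Return the trailing item number from a URL path, or None if absent."""
    match = _TRAILING_ID.search(path)
    return int(match.group(1)) if match else None


# The same item before and after a rename resolves to the same number.
print(item_id_from_url("/tsunami_item_names_42.aspx"))
print(item_id_from_url("/renamed_item_42.aspx"))
```

Under this assumption, a renamed item simply gets a new name portion in its URL while old links continue to resolve through the unchanged number.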
Ugliness: .aspx
The .aspx extension will go away when I move to ASP.NET MVC.
Technical

Item names in tsunami have a few restrictions, which stem from the characters that ASP.NET and UrlScan allow in URLs. These restrictions can be enforced in two ways: limit the characters allowed in item names, or remove/replace the restricted characters so that they do not show up in item URLs.

(UrlScan: %systemroot%\system32\inetsrv\urlscan\urlscan.ini)

The lists below cover all character codes from 0 through 255.
Invalid characters
  • character codes < 32
  • character codes >= 127
  • &?|\
    • banned by UrlScan (periods are too by default, but that was changed for tsunami)
  • # (when escaped as %23)
  • %*:<>
    • ASP.NET responds with HTTP 400 Bad Request
  • "
    • ASP.NET responds with ArgumentException "Illegal characters in path."
Reserved characters
  • #/
    • both are used in paths
  • _
    • spaces are converted to underscores, so underscores shouldn't be used
Valid characters
  • [space]
    • converted to underscore for URLs
  • a-z
  • A-Z
  • 0-9
  • !$'()+,-.;=@[]^_`{}~
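
The two enforcement options can be sketched as follows. This is a minimal Python illustration, not tsunami's actual code: the names `is_valid_name` and `name_to_url_segment` are hypothetical, and rejecting underscores in item names (because they are listed as reserved) is an assumption.

```python
import string

# Punctuation from the "Valid characters" list, plus letters and digits.
VALID_PUNCTUATION = "!$'()+,-.;=@[]^_`{}~"
VALID_CHARS = set(string.ascii_letters + string.digits + VALID_PUNCTUATION)


def is_valid_name(name: str) -> bool:
    """Option 1: limit the characters allowed in item names.

    Spaces are allowed (they become underscores in URLs). Underscores are
    rejected here because they are reserved (assumption of this sketch).
    """
    return all(ch == " " or (ch in VALID_CHARS and ch != "_") for ch in name)


def name_to_url_segment(name: str, replacement: str = "") -> str:
    """Option 2: remove/replace restricted characters when building the URL."""
    out = []
    for ch in name:
        if ch == " ":
            out.append("_")          # spaces are converted to underscores
        elif ch in VALID_CHARS:
            out.append(ch)           # valid characters pass through unchanged
        else:
            out.append(replacement)  # invalid characters are dropped by default
    return "".join(out)


print(is_valid_name("tsunami item names"))
print(name_to_url_segment("tsunami item names"))
```

Option 2 loses information (two different names can map to the same segment), which is harmless only as long as the number in the URL, not the name, identifies the item.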
Wikipedia Percent-encoding valid characters
  • a-z
  • A-Z
  • 0-9
  • -_.~
  • !#$%&'()*+,/:;=?@[]
    • reserved, as in they have special meanings in certain contexts
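
The split between the two groups can be checked with Python's standard `urllib.parse.quote`, used here only as a convenient reference implementation of percent-encoding (it is not part of tsunami). With `safe=""`, letters, digits and `-_.~` are left alone, while every reserved character is escaped.

```python
from urllib.parse import quote

UNRESERVED_PUNCTUATION = "-_.~"
RESERVED = "!#$%&'()*+,/:;=?@[]"

# Unreserved punctuation passes through unchanged.
print(quote(UNRESERVED_PUNCTUATION, safe=""))

# Each reserved character is replaced by "%" plus its two-digit hex code.
for ch in RESERVED:
    print(ch, quote(ch, safe=""))
```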
Summary
Row labels used in the chart below:
  • wiki
    • characters were categorized according to the Wikipedia Percent-encoding article
  • questionable
    • characters were tested to be OK, but may not be good to use, as they apparently don't follow the URI spec
  • ASP.NET rest
    • characters are restricted due to ASP.NET/UrlScan even though the spec seems to allow them
             0000000000000000-000000000-00000000-111111
             3333333444444444-555666666-99999999-222222
             3456789012345678-789012345-01234567-234567
             !"#$%&'()*+,-./0-9:;<=>?@A-Z[\]^_`a-z{|}~¦
wiki legal               xx x-x       x-x    x x-x   x
wiki reserve x xxxxxxxxxx  x - xx x xx - x x    -
exp. legal   x  x  xxx xxxx x-x x x  xx-xx xxxxx-xx xx
exp. illegal  xx xx   x    x - x x xx  -  x     -  x
questionable                 -         -    x x - x x
ASP.NET rest   x xx   x    x - x    x  -        -
Created from URL Grammar code.
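
The derived rows of the chart follow from the character lists earlier on this page by plain set arithmetic. The sketch below is a hypothetical re-derivation in Python, not the URL Grammar code mentioned above; the variable names are made up, and it covers character codes 33 through 126 only.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)
PRINTABLE = {chr(code) for code in range(33, 127)}

# Lists taken from the sections above.
WIKI_LEGAL = ALNUM | set("-_.~")
WIKI_RESERVED = set("!#$%&'()*+,/:;=?@[]")
EXP_LEGAL = ALNUM | set("!$'()+,-.;=@[]^_`{}~")

# Derived rows of the chart.
EXP_ILLEGAL = PRINTABLE - EXP_LEGAL                       # everything not tested OK
QUESTIONABLE = EXP_LEGAL - WIKI_LEGAL - WIKI_RESERVED     # works, but outside the spec
ASPNET_RESTRICTED = (WIKI_LEGAL | WIKI_RESERVED) - EXP_LEGAL  # in the spec, but blocked


def show(label: str, chars: set) -> None:
    """Print a row label followed by its non-alphanumeric characters."""
    print(f"{label:13}{''.join(sorted(chars - ALNUM))}")


show("exp. illegal", EXP_ILLEGAL)
show("questionable", QUESTIONABLE)
show("ASP.NET rest", ASPNET_RESTRICTED)
```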