Thoughts on Information Classification

Luke Breuer
2009-01-15 19:10 UTC

Memories of library card catalogs are fading these days, but those who remember them will recall a strict hierarchy of topics and subtopics. The taxonomy is tightly controlled and applied with care. Unfortunately, hierarchical classification breaks down in any significantly complex system. Sooner or later one wants to place a given node under multiple parent nodes, and the choice of a single parent is sometimes quite arbitrary. My personal case in point is webmark, the bookmark manager I wrote. It is organized into strictly hierarchical categories. Let's say I have the categories "learning" and "programming". What happens when I find a website which discusses how programmers tend to learn? The system breaks down. Over time, I have stopped using explicit categories in webmark and have fallen back on tags; everything now goes into the "random" category.

One of the contemporary solutions is to use keywords; the term du jour is tag. Google uses the term label. The idea is that instead of using categories nested in what might be a deep hierarchy, one simply applies one or more terms to each item, such as "programming" and "learning". Then one can easily look for all items on programming, all items on learning, or all items involving both. The system works fairly well.
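The lookup described above can be sketched in a few lines of Python. This is only an illustration, not how webmark or any particular tagging system stores its data; the bookmark names and the items_with_tags helper are made up for the example.

```python
# Hypothetical bookmarks: each item name maps to the set of tags applied to it.
bookmarks = {
    "how-programmers-learn": {"programming", "learning"},
    "python-tutorial": {"programming"},
    "memory-techniques": {"learning"},
}


def items_with_tags(items, *tags):
    """Return the sorted names of items that carry every requested tag."""
    wanted = set(tags)
    # An item matches when the wanted tags are a subset of its own tags.
    return sorted(name for name, item_tags in items.items() if wanted <= item_tags)


on_programming = items_with_tags(bookmarks, "programming")
on_learning = items_with_tags(bookmarks, "learning")
on_both = items_with_tags(bookmarks, "programming", "learning")
```

The site about how programmers learn shows up in all three results, which is exactly what a strict hierarchy cannot offer without picking one parent category.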

One major problem with tagging is that it also breaks down: what exactly is the definition of each term used? It turns out that people use the same words with different meanings, which makes sharing tags a bit tricky. My solution is tsunami. It still allows for tags, as they are quite useful. However, it also allows one to create a more powerful version of a tag: a full-blown item. One can then define that item and link it to other items. It is an attempt to organically grow a flexible taxonomy, combining the benefits of well-defined terms with freedom from hierarchical constraints.
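To make the idea of a tag promoted to a full item more concrete, here is a minimal sketch in Python. It is not tsunami's actual data model, which is not described here; the Item class, its fields, and the link helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A hypothetical item: a name, an optional definition, and links to other items."""
    name: str
    definition: str = ""
    links: set = field(default_factory=set)  # names of linked items


def link(items, a, b):
    """Link two items by name in both directions, creating either one if missing."""
    items.setdefault(a, Item(a)).links.add(b)
    items.setdefault(b, Item(b)).links.add(a)


items = {}
# Terms that would otherwise be bare tags get explicit definitions.
items["learning"] = Item("learning", "How people acquire skills and knowledge.")
items["programming"] = Item("programming", "Writing software.")

# One bookmark is linked to both terms; no single parent has to be chosen.
link(items, "how-programmers-learn", "learning")
link(items, "how-programmers-learn", "programming")

related = sorted(items["how-programmers-learn"].links)
```

Because a term is itself an item, its definition travels with it when it is shared, and the links form a graph rather than a tree.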