Data Playground

Luke Breuer
2008-12-25 21:18 UTC

Content below has been taken from tef's work.
specialized data stores
Consider two specialized data stores:
  • MediaWiki
    • contains articles, images, links, and wikilinks
    • every page has a unique title
  • del.icio.us
    • links only
    • has tags
storing atoms
Let us define an atom as the unit of data that is stored. In our specialized data stores, the atoms are entire articles for MediaWiki and links for del.icio.us. What we really want is to store absolutely anything. We do not want to be forced to name things, as most wikis require, and we do not want to be limited to certain data types.
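
As a minimal sketch of that idea, the store below accepts any value and hands back an identifier, so nothing has to be named by the person adding it. The class name AtomStore, its methods, and the use of sequential integer ids are assumptions made for illustration only.

```python
import itertools
from typing import Any, Dict


class AtomStore:
    """Hypothetical store: holds arbitrary values; callers never supply a name."""

    def __init__(self) -> None:
        self._atoms: Dict[int, Any] = {}
        self._next_id = itertools.count(1)  # assumed id scheme: 1, 2, 3, ...

    def add(self, value: Any) -> int:
        """Store any value and return the id the store assigned to it."""
        atom_id = next(self._next_id)
        self._atoms[atom_id] = value
        return atom_id

    def get(self, atom_id: int) -> Any:
        """Return the value stored under atom_id."""
        return self._atoms[atom_id]


store = AtomStore()
article_id = store.add("Scotland is a country in Europe.")  # text
link_id = store.add("http://del.icio.us/")                  # a link
blob_id = store.add(b"\x89PNG")                             # raw bytes
```

The same add call takes text, a link, or raw bytes, which is the point: the store places no restriction on the kind of data an atom holds.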

Atoms by themselves are of limited use, especially as their number grows. Therefore, we need metadata, of which there are three types:
  1. intrinsic
    • dimensions of an image
    • length of text
    • domain of a URL
  2. labels
    • titles
    • tags
    • notes
  3. smart labels
    • labels that depend on others (calculated metadata)
    • inferred from existing metadata instead of applied directly

Note that metadata usually carry a value: width is a property of an image that holds a single value, and a document's title holds a string.
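
The three kinds of metadata can be sketched roughly as follows. Everything here is an assumption for illustration: the Atom class, the intrinsic and metadata functions, the SMART_LABELS table, and the particular smart labels "short" and "has_title" are hypothetical names, not part of any existing system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict
from urllib.parse import urlparse


@dataclass
class Atom:
    value: Any
    # Labels are applied directly by a person: titles, tags, notes.
    labels: Dict[str, Any] = field(default_factory=dict)


def intrinsic(atom: Atom) -> Dict[str, Any]:
    """Metadata derived from the atom's value alone."""
    meta: Dict[str, Any] = {}
    if isinstance(atom.value, str):
        meta["length"] = len(atom.value)       # length of text
        domain = urlparse(atom.value).netloc   # empty unless the text is a URL
        if domain:
            meta["domain"] = domain            # domain of a URL
    return meta


# Smart labels (hypothetical examples): calculated from existing metadata.
SMART_LABELS: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "short": lambda meta: meta.get("length", 0) < 40,
    "has_title": lambda meta: "title" in meta,
}


def metadata(atom: Atom) -> Dict[str, Any]:
    """Combine intrinsic metadata, applied labels, and smart labels."""
    meta = {**intrinsic(atom), **atom.labels}
    for name, rule in SMART_LABELS.items():
        meta[name] = rule(meta)
    return meta


link = Atom("http://del.icio.us/popular", labels={"tags": ["bookmarks"]})
meta = metadata(link)
```

Each entry in the resulting dictionary pairs a metadata name with a value, matching the observation that width holds a number and a title holds a string.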

We need to define a query language over these labels that can be used to define categories. Here are two examples of categories (intentionally in Prolog-ish syntax):
  • europe :- scotland
  • large :- width(X), height(Y), X > 1000, Y > 1000
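
To make the two rules concrete, here is one way they could be evaluated over an atom's metadata. This is a sketch under assumptions, not the intended query language: it treats scotland as a tag, represents metadata as a plain dictionary, and the names CATEGORIES and categories are hypothetical.

```python
from typing import Any, Callable, Dict, Set

Metadata = Dict[str, Any]

# Each category is a predicate over an atom's metadata.
CATEGORIES: Dict[str, Callable[[Metadata], bool]] = {
    # europe :- scotland   (assumes "scotland" is applied as a tag)
    "europe": lambda m: "scotland" in m.get("tags", ()),
    # large :- width(X), height(Y), X > 1000, Y > 1000
    "large": lambda m: (
        "width" in m and "height" in m
        and m["width"] > 1000 and m["height"] > 1000
    ),
}


def categories(meta: Metadata) -> Set[str]:
    """Return the names of every category whose rule holds for meta."""
    return {name for name, rule in CATEGORIES.items() if rule(meta)}


photo = {"tags": ["scotland"], "width": 1600, "height": 1200}
thumb = {"tags": ["scotland"], "width": 160, "height": 120}
```

Here categories(photo) contains both europe and large, while categories(thumb) contains only europe. An atom with no width or height simply fails the large rule, just as the Prolog goal width(X) would fail.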

How to handle embedded links?