Semantic Web (last updated: 20/4/05 10:00 UTC)


The semantic web is the future of the Internet: a more usable web of information. See bookmarks here relating to the semantic web.

The Problem

Information is not as available as it should be.

Beyond Google

Luke Breuer wrote this article for the California Tech, a biweekly newspaper put out by Caltech undergrads.

Ten years ago, you were told to imagine a world where information was networked and available to everyone. Today, the Internet only partially fulfills that promise. Searching the Internet is like shopping in a flea market: it is often hard to find what you want, and even if you find it, its quality is unpredictable. One solution is to include data that describes the content of websites. This is called metadata; an example of its implementation follows.

It is the beginning of term and your desktop just croaked; now you want a laptop. Acting out one of the newer verbs in the English language, you google the phrase "buy laptop" and get approximately ten million results. Fortunately, the first few results seem promising: "Laptop buying guide," "Compare prices and ratings," and "Buy Laptop." Contrast this with a hypothetical search on the semantic web, getting results such as "Laptop reliability vs. price," "Battery life," and "Technical support ratings." While each of Google's top ten results is an individual web page, the top ten results of the semantic web search are compilations from thousands of websites, each of which specifies the laptop's name, its price, and other relevant data, all with metadata. These metadata give semantics, or meaning, to the data.

Before Google introduced its PageRank technology, search engines typically used only the most basic data available: plain text. Google made a breakthrough in search technology by processing hyperlinks as well: the more pages link to a site, the more likely you will want to look at it. Processing anything more than the text and hyperlinks in a typical website is a task that only a sophisticated AI or a person can do. The semantic web solves this problem by inserting "invisible" information into websites. For an online store, this invisible information, or metadata, may indicate where the store is located, what it sells, and how much items cost. If hundreds of stores were to use metadata, then a semantic search could return results from all the stores at once, organized by item type, price, and availability.
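As a rough illustration of the online store example, the sketch below merges metadata records from several stores and returns them grouped by item type, cheapest first, with an optional availability filter. The field names (store, item_type, name, price, in_stock), the sample records, and the function name semantic_search are all assumptions made for illustration; they are not a real metadata vocabulary or an existing search API.

```python
from collections import defaultdict

# Hypothetical metadata records, as if extracted from three store websites.
STORE_METADATA = [
    {"store": "Store A", "item_type": "laptop", "name": "Model X", "price": 1200.0, "in_stock": True},
    {"store": "Store B", "item_type": "laptop", "name": "Model X", "price": 1150.0, "in_stock": False},
    {"store": "Store C", "item_type": "laptop", "name": "Model Y", "price": 900.0, "in_stock": True},
    {"store": "Store A", "item_type": "desktop", "name": "Model Z", "price": 700.0, "in_stock": True},
]


def semantic_search(records, item_type=None, available_only=False):
    """Group matching records by item type, cheapest first within each group."""
    grouped = defaultdict(list)
    for record in records:
        # Skip records of the wrong type when a type filter is given.
        if item_type is not None and record["item_type"] != item_type:
            continue
        # Skip out-of-stock items when only available items are wanted.
        if available_only and not record["in_stock"]:
            continue
        grouped[record["item_type"]].append(record)
    return {
        kind: sorted(items, key=lambda r: r["price"])
        for kind, items in grouped.items()
    }


if __name__ == "__main__":
    results = semantic_search(STORE_METADATA, available_only=True)
    for kind, items in results.items():
        print(kind)
        for item in items:
            print(f"  {item['name']} at {item['store']}: ${item['price']:.2f}")
```

The point of the sketch is that no text parsing is needed: because every store labels its data the same way, combining results from many sites reduces to filtering and sorting.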

One of the largest obstacles to the semantic web approach is volume. Without sufficient data described with metadata, semantic searches cannot return enough useful results. The first Google result contains data compiled from multiple sources; a semantic search would require hundreds if not thousands of sources to outperform Google. The net result is that very few website developers include metadata: it simply does not benefit them.

The solution is to inject the necessary activation energy by providing enough seed data and metadata to make semantic searches useful. Once the value of the semantic web approach has been demonstrated, participation in the revolution is the logical next step. Enter WebMark, a gateway to the semantic web. WebMark is an online collaborative effort to store and describe websites. Once you find a good website, you add a link to WebMark along with a short description and some keywords. Others looking for what you found can first search WebMark and make use of the research you have already done.
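The add-then-search workflow described above can be sketched as a small in-memory index. This is only a guess at the general idea, not WebMark's actual design: the Bookmark and BookmarkIndex classes, the ranking rule (count of matching query terms), and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    url: str
    description: str
    keywords: set = field(default_factory=set)


class BookmarkIndex:
    """Hypothetical in-memory bookmark store; not WebMark's real implementation."""

    def __init__(self):
        self._bookmarks = []

    def add(self, url, description, keywords):
        # Lowercase the keywords so that searches are case-insensitive.
        normalized = {k.strip().lower() for k in keywords if k.strip()}
        self._bookmarks.append(Bookmark(url, description, normalized))

    def search(self, query):
        """Return bookmarks ordered by how many query terms they match."""
        terms = query.lower().split()
        scored = []
        for bookmark in self._bookmarks:
            description_words = set(bookmark.description.lower().split())
            score = sum(
                1 for term in terms
                if term in bookmark.keywords or term in description_words
            )
            if score:
                scored.append((score, bookmark))
        # Stable sort on the score alone keeps insertion order among ties.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [bookmark for _, bookmark in scored]


if __name__ == "__main__":
    index = BookmarkIndex()
    # Placeholder URLs for illustration only.
    index.add("https://example.org/ee52", "Course website for EE 52", ["ee52", "course"])
    index.add("https://example.org/laptops", "Laptop buying guide", ["laptop", "buy"])
    for hit in index.search("buy laptop"):
        print(hit.url, "-", hit.description)
```

Here the keywords supplied by the person who added the link act as the metadata: a later searcher benefits from that description without having to find or evaluate the page again.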

Others and I have populated WebMark with links to other websites in order to demonstrate its usefulness. Search for almost any class you are currently taking: type the abbreviation with no spaces (e.g. "ee52"), press enter, and you will find either the course website or a list of course websites including the one you searched for. Use this small example to experience the value of the semantic web. Then, register an account and start adding websites with the appropriate metadata. With your help, the value of the semantic web will grow exponentially.