Computers are dumb and cannot understand the words on this page. A human could figure out that my blog is mostly about economics, but a computer would have a hard time doing so. The Semantic Web aims to change this by tagging, or marking up, content in a standardised way that a computer can, in some sense, understand.

One of the problems with implementing this idea is all the work required to tag existing content. Rather than doing this manually, it would be more attractive to have an automatic tag-generating system. However, there’s a chicken-and-egg problem: if the computer can’t understand the words in the first place, how can it tag them?

This is where something like Wikipedia comes in. Wikipedia is a collection of documents that have already been classified by humans into appropriate categories. By parsing the economics pages on Wikipedia, a computer can get some sense of what words are typically associated with economics content. With this “knowledge”, the computer could then scan other websites and determine which are about economics and apply appropriate keywords.
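To make the idea concrete, here is a minimal sketch of learning word associations from pre-categorised documents and then classifying new text. I’ve assumed a simple naive Bayes word-count classifier, which is just one of many possible techniques and not necessarily what any real semantic tool uses. The `train` and `classify` functions are hypothetical, and the four training snippets are made-up stand-ins for categorised Wikipedia pages, not real article text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Count word frequencies per category from (category, text) pairs."""
    counts = {}           # category -> Counter of word frequencies
    doc_totals = Counter()  # category -> number of training documents
    for category, text in labelled_docs:
        counts.setdefault(category, Counter()).update(tokenize(text))
        doc_totals[category] += 1
    vocab = set()
    for word_counts in counts.values():
        vocab.update(word_counts)
    return counts, doc_totals, vocab


def classify(text, model):
    """Return the category with the highest naive Bayes log-probability."""
    counts, doc_totals, vocab = model
    total_docs = sum(doc_totals.values())
    best, best_score = None, -math.inf
    for category, word_counts in counts.items():
        total_words = sum(word_counts.values())
        # Start from the prior: the share of training documents in this category.
        score = math.log(doc_totals[category] / total_docs)
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so an unseen word doesn't zero out the score.
            score += math.log((word_counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical stand-ins for human-categorised Wikipedia pages.
training = [
    ("economics", "Inflation erodes purchasing power as central banks set interest rates."),
    ("economics", "Supply and demand determine market prices and trade."),
    ("biology", "Cells divide and DNA carries genetic information in organisms."),
    ("biology", "Evolution by natural selection shapes species over time."),
]
model = train(training)
print(classify("The central bank raised interest rates to fight inflation.", model))
```

The query sentence shares words like “interest”, “rates” and “inflation” with the economics snippets, so it scores higher there. The add-one smoothing is a design choice that keeps a single unfamiliar word from ruling a category out entirely.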

This seems to be a somewhat unexpected potential source of value for a site like Wikipedia. I wonder if it will spur further development of more traditional encyclopedias. Publishers of textbooks and the like may also have some valuable fodder for semantic algorithms.

SemanticHacker has a nice demo on their site of what can be done. Enter some text and it will try to identify what the text is about, and fetch related articles on Wikipedia.

by aaron.