It’s probably the most public test yet of the advances in linked open structured data and semantic text analysis, so I’m closely following this tournament pitting IBM’s supercomputer Watson against the two most successful Jeopardy champions. I suspect that they’re using the same publicly available data sets that we’re using to build Alive.cn.
I wonder, however, why they chose to rely only on electronically fed questions rather than going the final mile and adding a voice recognition interface on top of the system. Voice recognition accuracy has gotten very good these days, but I wonder if the last few percentage points of error would make a critical difference against the best human players.
There have been some other truly AMAZING projects in this field. Two I’d like to highlight:
- Google Squared: This Google Labs experiment is an amazing mash-up of topic extraction and turning unstructured web data into structured data. Simply type in any category (example: “Chinese Emperors”) and it will bring up a spreadsheet of items in that category along with some of their properties. Next, you can add your own properties (“Inventions”) and it will automatically fill in the results using data searched from the web and converted back into structured data. It’s truly one of the most remarkable things to come out of Google, and with a bit more work (say, a voice recognition interface) it could be a mainstream breakthrough.
- OpenCalais Topic Extraction: Another semantic analysis tool that will pull out “topics” automatically and link them against linked open data. Try out the free demo and copy-and-paste a news article. After submitting the article, you’ll see it has linked together topics on the side automatically.
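To make the “pull out topics and link them” idea a bit more concrete, here is a toy sketch in Python. It is not how OpenCalais works and it does not call its API; it just looks up words in a tiny hand-written dictionary (the `GAZETTEER` name and its contents are my own assumptions) and maps them to URIs that follow DBpedia’s naming pattern, purely for illustration.

```python
import re

# Hypothetical, hand-written gazetteer: lowercase surface form -> linked-data URI.
# The URIs follow DBpedia's naming pattern but are illustrative assumptions only.
GAZETTEER = {
    "ibm": "http://dbpedia.org/resource/IBM",
    "watson": "http://dbpedia.org/resource/Watson_(computer)",
    "jeopardy": "http://dbpedia.org/resource/Jeopardy!",
    "google": "http://dbpedia.org/resource/Google",
}


def extract_topics(text):
    """Return (surface form, URI) pairs for known topics, in order of first appearance."""
    found = []
    seen = set()
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        key = word.lower()
        # Report each known topic once, even if it appears several times.
        if key in GAZETTEER and key not in seen:
            seen.add(key)
            found.append((word, GAZETTEER[key]))
    return found


article = "IBM built Watson to compete on Jeopardy against human champions."
for surface, uri in extract_topics(article):
    print(surface, "->", uri)
```

A real service would need to handle multi-word names, ambiguity (which “Watson”?) and a far larger knowledge base, which is exactly where the hard work is.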
Like I’ve mentioned before, I feel that we’re right at a tipping point: over the next several years, advances in knowledge extraction and interpolation will have a revolutionary effect on everything, including how we interact with computers, and will bring exponential advances in data forecasting. Projects like Wikipedia (an unstructured data source) are just the beginning.
P.S. My favorite comment about the Man versus Machine Jeopardy contest: “Why couldn’t they have programmed Watson to use the voice of Sean Connery?”