Companies large and small are betting that the semantic web is ready to party, er, parse. Now that the building blocks and standards are falling into place, it's time to develop business plans and use cases for a web of deeply linked data. That's the focus of my most recent piece in The Economist.
A key player promises to be Reuters, the traditional news service, which has opened up its recently acquired text mining service Calais via an open API. Any website can send its raw text through the Calais natural language engine -- very soon even a lonely blogger. It's a win-win situation, as Calais evangelist Tom Tague told me. Reuters gets to see a lot of unstructured data and can train its algorithms via crowdsourcing. That allows it to build understanding and context -- and over time develop a clearinghouse of meaning. The users, meanwhile, can piggyback on a scalable technology that returns context in under a second when it's working properly.
That's the plan, but until a service like Calais can make sense of domains beyond business and finance, we have some way to go. If it works, though, another highly complex task will have been commoditized. That raises the question of what the value proposition will be for startups such as Radar Networks, which have spent years building a homegrown platform to perform similar feats of sorting, connecting and interpreting.
Right now, the whole semantic space is a bustling construction site, according to AdaptiveBlue founder Alex Iskold. He did a splendid job summarizing the key components in a recent post on Read/WriteWeb. Semantic technology can do search one better -- but it's not the killer app by default. Simply hunting for nouns in a standard search engine like Google works well, but it does not yield satisfying results for the more complicated queries a human would ask another human, according to Alex. He thinks enterprises will board the semantic train before consumers do, for two reasons: they sit on silos of data that are waiting to be organized and monetized, and they want to brag about the latest technology to look smarter than the competition. “Solid use cases are still rare, it is more about experimentation,” he says.
Among the newer entrants is Qitera, a semantic engine (in beta) based in San Francisco, founded by three Germans. Joerg and Carlo walked me through their vision of how a web of meaning can be woven by tapping into services such as Freebase and Calais, and I've been testing it for about a month now alongside Twine. Instead of a pure top-down approach, says Qitera's cofounder Joerg Lamprecht, they want to leverage algorithms and human curation. Since it's a closed service right now, it has technophiles on RWW guessing whether it competes head-on with Radar's Twine.
Yes, it does. In Qitera, users can build their own semantic triples (A knows B, B lives in C, A has been on vacation in C and read book D etc.) -- basically drawing almost free-form connections between entities such as people, places, events and so forth. They can also use a Firefox plug-in as they surf along. The human touch augments the entities and relationships the service can suck in from other sources -- like a mindmap on steroids. For practical consumer applications, says Carlo, semantic parsing and mark-up need to go further and include concepts from popular culture: showbiz, sports, shopping. Users, after all, don't care how something works online. They will think "meaning ex machina" is cool when they see a service that understands what an album is and how it relates to a particular label, a promoter or related bands. If users slap those things on, so be it.
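For readers who want to see what such triples look like in practice, here is a minimal sketch in Python. It is not Qitera's implementation, which is closed; the TripleStore class, the predicate names and the helper function are all hypothetical, made up purely to illustrate the idea of subject-predicate-object statements and a simple query across them.

```python
from collections import namedtuple

# A triple is one statement: subject -> predicate -> object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TripleStore:
    """Toy in-memory store; a hypothetical illustration, not a real product API."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples that fit the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


def acquaintances_living_in(store, person, place):
    """Follow two links: whom does `person` know, and who of them lives in `place`?"""
    return [
        t.obj
        for t in store.match(subject=person, predicate="knows")
        if store.match(subject=t.obj, predicate="lives_in", obj=place)
    ]


# The example statements from the text (predicate names are assumptions).
store = TripleStore()
store.add("A", "knows", "B")
store.add("B", "lives_in", "C")
store.add("A", "vacationed_in", "C")
store.add("A", "read", "D")
```

Calling acquaintances_living_in(store, "A", "C") walks from A to the people A knows and keeps those who live in C. The point of the structure is that every new statement, whether typed in by a user or pulled in from another source, becomes one more link that queries like this can follow.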
Ironically, those questions would all be no-brainers for a record store employee, but those guys have been killed off by the same technology trends that now claim to bring knowledge back in automated form. So let's start making sense!