Monday, October 8, 2007

Semantic Technology / AI 3.0

Originally posted on oldestgeek May 27, 2007 and moved over

It seems to me I've heard that song before
It's from an old familiar score
Sammy Cahn 1942

The Semantic Technology Conference (May 18-22, 2007) at the Fairmont in San Jose had an all too familiar quality. Rather than misquote Yogi Berra, I use the Sammy Cahn lyric as my theme.

The exhibits all seemed to be a set of tools that could answer every question about your information as long as you had defined an “Ontology” (more below). I didn't see an example of any useful application, a sure symptom of a technology looking for a market.

Semantic Technology is a term invented or popularized by Tim Berners-Lee, inventor of the World Wide Web. I define the WWW as a level of abstraction over the Internet. Berners-Lee puts it another way, "take the hypertext idea and connect it to the TCP and DNS ideas".

Berners-Lee then envisioned the Semantic Web. Quoting Wikipedia, "The semantic web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a form that can be understood, interpreted and used by software agents". Can you spot the elephant in the room in this paragraph?

It's “natural language”, the Holy Grail of AI that is also classified as “AI-complete” (in the AI world that means, “We have no idea how to do that”). It's how Captain Kirk talked to the computer and the computer understood him.



Furthermore, Berners-Lee built a model as shown. It's wonderful and logical. There are several layers in the model, they all fit together wonderfully, and all but one can be implemented in a direct manner.

Can you see the elephant in that room?

It's “Ontology”.

Ontology


According to Tom Gruber at Stanford University, the meaning of ontology in the context of computer science is “a description of the concepts and relationships that can exist for an agent or a community of agents.” He goes on to specify that an ontology is generally written, “as a set of definitions of formal vocabulary.”

Think of ordinary people in a business defining an ontology by "creating a description of the concepts and relationships that" describe what they do using a complex layered system that resembles object-oriented programming to create a "set of definitions of formal vocabulary.” Think it will happen? I don't either.
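To make Gruber's definition concrete, here is a minimal sketch of what such "concepts and relationships" look like under the hood: subject-predicate-object triples, the same shape RDF uses. The business concepts and relation names below are invented for illustration, not taken from any real ontology.

```python
# A toy "ontology" for a small business, written as subject-predicate-object
# triples. All names here are hypothetical, chosen only to illustrate the idea.

triples = {
    ("Invoice", "is_a", "Document"),
    ("PurchaseOrder", "is_a", "Document"),
    ("Customer", "is_a", "Party"),
    ("Supplier", "is_a", "Party"),
    ("Invoice", "billed_to", "Customer"),
    ("PurchaseOrder", "sent_to", "Supplier"),
}

def kinds_of(concept):
    """Return every concept declared as a kind of `concept`."""
    return sorted(s for (s, p, o) in triples if p == "is_a" and o == concept)

print(kinds_of("Document"))  # ['Invoice', 'PurchaseOrder']
print(kinds_of("Party"))     # ['Customer', 'Supplier']
```

Even this toy version shows the problem: someone has to sit down and decide that an Invoice is a Document and is billed to a Customer, and the real tools demand far more formality than a set of tuples.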

Every time I asked how one creates an ontology, I was shown a tool of mind-numbing complexity for doing it. Several people admitted that creating an ontology was the crux of the problem, and there was no clear answer to it. Nobody has a tool for generating an ontology per se.


Why AI 3.0?


My definition of AI is “pattern matching with a Turing machine”. I studied it as an analyst at Dataquest. We concluded that AI was more like “artificial insanity” and that its only useful products were the tools created by AI researchers, e.g. LISP. I used to say that I wasn't skeptical about AI, I was cynical. Now I'm just somewhat dubious.

The characteristics of an AI technology area/conference usually are:

* Great technology
* Easy to prototype but doesn't scale up
* VCs can be sold, but there are few customers, all bleeding-edge types.
* Market shoots up and then disappears
* Very few reference applications (expert systems folks all pointed to the same 7 applications)
* Useful for super-geeks but impossible for mortals to understand or use
* Inability to take suggestions from non-AI types



So, does AI ever work?


AI works when a great amount of resources is devoted to a clearly defined and limited domain. Computer chess is the prime example. Before IBM devoted its biggest machine to it, an automaton, El Ajedrecista, was built in 1912 that played an endgame. Claude Shannon wrote extensively on the subject in 1950. Since then, some of the best minds in computing, and a lot of PhD theses, were devoted to computer chess. My guess is that the effort and research expended on computer chess would cost billions in today's currency. Interestingly, computer chess has been dismissed as not "real" AI by the AI community. I guess if it can't dialog with Jim Kirk, it isn't AI!!

Another example is the resources devoted to voice-response systems driven by a narrowly defined set of rules. Call the Sears help line for appliances and you will see what I mean. There are computer design tools and other examples of useful AI. All work in a clearly restricted domain.
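The appliance help line illustrates why a restricted domain works: the system never has to understand speech, only to spot a handful of expected words. A minimal sketch of that kind of rule table (the menu wording and keywords are invented, not Sears's actual system):

```python
# A minimal sketch of a rule-driven voice-response menu. This is keyword
# spotting, not understanding: it only works because the domain is tightly
# restricted. Rules and prompts are hypothetical.

RULES = {
    "washer": "Press 1 for washer repair.",
    "dryer": "Press 2 for dryer repair.",
    "refrigerator": "Press 3 for refrigerator repair.",
}

def respond(utterance):
    # Anything outside the restricted domain falls through to a human.
    for keyword, reply in RULES.items():
        if keyword in utterance.lower():
            return reply
    return "Please hold for an agent."

print(respond("My dryer is broken"))  # Press 2 for dryer repair.
print(respond("Tell me a joke"))      # Please hold for an agent.
```

Outside its three appliances, the "AI" instantly gives up, which is exactly the point: narrow rules plus a human fallback is what shipping AI looked like.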

A big problem is that many AI concepts can be quickly prototyped but never scale up. In the 80s, language translation was an early example. Any skilled and polyglot programmer can whip up an example in a short time. The idea that ate huge chunks of VC dollars was, “It's clearly possible, so if we just devote time and get a bigger and faster system, we'll have something.” What they ended up with was a bad investment. There is a kind of translation available online that is useful enough to give an idea of what is being said. Try out the translate link next to Google hits.

Another problem for companies in this space is the “black-hole” market. Any interesting new technology will find about 1200-2000 customers amongst researchers, financial institutions, universities, and other bleeding edge types. There are usually two years of such sales, and then, nothing. The black hole has sucked everything up and no light is shining.


Where is Semantic Technology Headed?


So is Semantic Technology doomed to collapse in the face of the scalability problem? be a black-hole market? lose its way in the technology forest? never actually work?

Yes, if it has to sell the tools to business or really wants to do a (full) Tim Berners-Lee Semantic Web. Yes, if it can't scale up (not yet demonstrated), and never mind the rest of the article. Yes, if it can't stop saying "ontology" and learn how to talk to mortals/customers.

I can't say if this technology actually works until I see more of it.

It can succeed if it finds domains where a large mass of data exists and for which there is both a high general interest and a good supply of (hopefully free) domain experts.

What are those domains?

Open Ontologies (using open licenses such as Creative Commons). Pandora is a poster child for great use of open Semantic Technology. The ontology is public and there seems to be a never-ending supply of volunteers to update it. There is also a public gene ontology.

Genealogy. There is a huge genealogy database online (Ancestry.com). You can do your own genealogy with some current PC tools and put it online with many others. I've done so, but I can't figure out how to connect to other people's work. We're talking really fuzzy data here.

Sports Stats. Also a lot online. How about using sports data and making predictions? Apply to fantasy sports, a narrow but very deep market.

Social Networking. Can it make for more interesting social networking? Link together sites in an interesting way, e.g. Facebook, MySpace, et al.

Job hunting/hiring. A lot of factors are considered in looking for a job or for employees.

Public Data. The census bureau, the patent office, and other government agencies have large amounts of data that are well understood and many experts available. Same for property data.

It was obvious that some of the companies exhibiting were working with intelligence agencies. In these and other domains Semantic Technology may find its way.

If such applications appear, create a buzz, are of value, then Semantic Technology will be drawn into organizations that are then able to see the value of the effort. There is always the killer app that appears out of some strange quarter. If I knew what it was, I'd probably be doing that. If the technology (it's not an industry nor a market) progresses as it is now, it will fade from view in two to four years.

I can only wish them well and say,

Vaya con Dios!