Semantic web

The Semantic Web

Not sure Topic Maps should be in this chapter.

Weaving the Web: RDF and OWL

Topic maps use "He" or "She". There is nothing other than quotations. RDF uses "I": I say this and I put it here. When other people refer to it, they appropriate the thing that is there, not necessarily what was intended to be said.

The illusion of Topic maps is that there is only one ontology that can be used to represent any kind of knowledge. The illusion of RDF is that there is only one place to express things: the web. Both approaches are interesting and complementary, but both are insufficient. We need to do better than this.

RDF is based on the simplistic assertion that once a statement has been made, it is universally valid and exploitable. This hides the fact that the statement may only have useful meaning in the specific context in which it was created and can be meaningless elsewhere.

Semantic Interoperability. The Semantic Maze: A critical view of the Semantic Web

One of the important projects currently being developed is known as the Semantic Web. Many research projects, new products, etc., are positioning themselves under this umbrella.

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."¹

The problem with this quotation is that it is self-contradictory. "Well-defined meaning" is an ideal that may not be reachable: nothing guarantees that meaning is well-defined, and there is no possible agreement on what "well-defined" means. A piece of information that some consider well-defined may be considered poorly defined by others. Another contradiction is what it means for computers and people to work in cooperation.

Computers can work in cooperation, but they don't need to rely on well-defined meanings. Computers simply need common connection hubs to link pieces of information together, regardless of whether the meaning is well-defined or not. For example, when computers rely on strings of characters to relate terms, they will not only relate subjects that have nothing in common except their names, but they will also miss relations between subjects that have much in common but are not designated by the same name.

These two situations are quite common.
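The two situations can be shown with a minimal sketch. The two sources, their labels, and their descriptions are invented for illustration; the only assumption is that subjects are related whenever their names are the same string.

```python
# Hypothetical records from two independent sources (invented data).
source_a = {
    "Jaguar": "large cat native to the Americas",
    "Car": "road vehicle with four wheels",
}
source_b = {
    "Jaguar": "British manufacturer of luxury vehicles",
    "Automobile": "road vehicle with four wheels",
}


def link_by_name(a, b):
    """Relate two subjects whenever their names are the same string."""
    return sorted(set(a) & set(b))


links = link_by_name(source_a, source_b)

# Links that join subjects with nothing in common except the name.
false_links = [name for name in links if source_a[name] != source_b[name]]

# Subjects that describe the same thing but are never linked,
# because they are designated under different names.
missed_links = [
    (name_a, name_b)
    for name_a, desc_a in source_a.items()
    for name_b, desc_b in source_b.items()
    if desc_a == desc_b and name_a != name_b
]

print(links)         # names shared by both sources
print(false_links)   # shared names whose descriptions differ
print(missed_links)  # same description, different names
```

In this toy case the only link produced by name matching is a wrong one, and the only pair that should have been linked is missed.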

Another quotation appearing on the same page of the W3C presentation on the Semantic Web reads as follows:

"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries."²

Here the word "data" is used instead of "information" with "well-defined meaning". Data is very far from meaning. Promoting data interoperability is a noble goal, but it is somewhat different from promoting semantic interoperability. It is a much more modest objective, which can further the automation of computer-driven processes, but this is very different from exchanging meaning.

Kim H. Veltman gives this definition of semantic interoperability:

"Semantic interoperability is the ability of information systems to exchange information on the basis of shared, pre-established and negotiated meanings of terms and expressions."

According to this definition, semantic interoperability works best in an environment where systems are able to communicate with one another and where there is a one-to-one correspondence between statements: essentially, where systems conform to a single ontology. This can be achieved in a variety of cases, as illustrated by various applications built under the Semantic Web umbrella, but it cannot generally work for information systems that have not been prepared to interoperate.
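A small sketch may make the limitation concrete. It assumes a hypothetical table of terms negotiated in advance between two systems; the term names and records are invented for illustration and are not taken from any real vocabulary.

```python
# Hypothetical one-to-one agreement negotiated in advance between two systems.
AGREED_TERMS = {
    "author": "creator",
    "title": "name",
}


def translate(record, agreed_terms):
    """Return the fields that could be translated and the terms that could not."""
    translated = {}
    unmapped = []
    for term, value in record.items():
        if term in agreed_terms:
            translated[agreed_terms[term]] = value
        else:
            # No pre-established meaning exists for this term.
            unmapped.append(term)
    return translated, unmapped


# A record from a system that took part in the agreement.
prepared = {"author": "A. Writer", "title": "An Example"}

# A record from a system that was never prepared to interoperate.
unprepared = {"writer": "A. Writer", "heading": "An Example"}

prepared_result = translate(prepared, AGREED_TERMS)
unprepared_result = translate(unprepared, AGREED_TERMS)
print(prepared_result)
print(unprepared_result)
```

The exchange succeeds only for the record whose terms were agreed on beforehand; the second record carries the same content but nothing in it can be translated.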

The difference between RDF and the Data Projection Model is that RDF uses URLs only as information items in the predicates, whereas the Data Projection Model can use any string of characters. The Data Projection Model makes it possible to distinguish between the signifier (the string represented by the addresses) and the signified (the content of an information item). RDF does not allow this, because it is limited to expressing relations with triples containing just URLs.

RDF is both too simple and too complex. A programmer's paradise, a user's nightmare. Drift into mechanism for finding research. Not necessarily useful applications for users. The Web is all there is.

Topic Mapping

The story of Topic Maps, or what gets standardized.

Mapping as an outside overlay over an existing bunch of information items.

Standards as a way to make information networks independent from any proprietary software/format claim.

Insert Slide with 20 possible scenarios for topic maps. Insert part of the Topic Map Book which is reusable. Insert the introduction to XTM in associatedtopics.org.
Many things missing

Insert STM (Simple Topic Maps), Insert SWIM.

The RM defines what an association is by making explicit all the components of which it is made. It explains what a topic is by putting the notion of subject at the core of the model (the subject location uniqueness objective, which is central to the whole topic maps paradigm). It describes names and occurrences as specialized, pre-defined assertion types. These assertion types make up a conceptual model (not a data model), which is what makes topic maps directly usable. They also allow other kinds of applications to be regarded as topic maps.

The main advantage of topic maps is that they provide for the merging of diverse information objects. Limiting the merging capability of topic maps is like clipping a bird's wings and then being surprised to observe that the damned thing is not flying any more.
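Merging can be sketched in a few lines. This is a deliberately simplified illustration, not the merging rules of any topic map standard or model: it assumes each topic carries a single subject identifier, and that two topics with the same identifier represent the same subject. The `Topic` class, `merge_maps` function, and example.org addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    subject: str  # hypothetical subject identifier
    names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)


def merge_maps(*maps):
    """Merge topics from several maps: one topic per subject, characteristics unioned."""
    merged = {}
    for topic_map in maps:
        for topic in topic_map:
            # Build a fresh topic per subject so the input maps are not modified.
            target = merged.setdefault(topic.subject, Topic(topic.subject))
            target.names |= topic.names
            target.occurrences |= topic.occurrences
    return merged


PUCCINI = "http://example.org/subject/puccini"
TOSCA = "http://example.org/subject/tosca"

map_a = [Topic(PUCCINI, {"Puccini"}, {"http://example.org/bio.html"})]
map_b = [
    Topic(PUCCINI, {"Giacomo Puccini"}, {"http://example.org/operas.html"}),
    Topic(TOSCA, {"Tosca"}),
]

merged = merge_maps(map_a, map_b)
print(sorted(merged[PUCCINI].names))
```

The two maps were written independently, yet after merging there is a single topic for the shared subject that carries the names and occurrences contributed by both.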

There are other possible models which may be more appropriate than the SAM in various circumstances. I am not sure I will never need something other than the things which are currently defined as "occurrences", to take an example.

It's important to enable full interoperability at the software level. All software needs to produce interchangeable topic maps, and the RM is a way to make clear what the application sees after a topic map is processed. This is something which is needed. We need to assess whether the way the available software understands topic map information is consistent. What is currently called SAM does that at a shallow level. What is currently called RM does that at a deeper level.

A standard is not a standard if it only reflects the internals of the products of one or two vendors. Nobody is going to take such a standard seriously. It amounts to putting clothes around a proprietary solution and calling it non-proprietary, even if two or three vendors have made some kind of agreement together. If potential users discover that fact, they won't adopt it. Instead, the standard should rely on a powerful, high-level, "neutral" design that accommodates what's there and has the potential to expand it when needed. That's exactly what the RM provides.

The RM expresses the foundation on which topic maps are built. Without it, topic maps become a limited perspective, imprisoning their users in a narrow, short-term point of view that will become obsolete as soon as something else shows up that actually performs the job topic maps are supposed to do, only cheaper, better, and more efficiently. Let's take an example. Somebody invents a new standard that renames the topic map concepts: what we call an association becomes a "relation", what is called a topic becomes a "knowledge unit", and what is called a scope becomes a "domain". Neat, no? Suppose now that this "new" standard receives the endorsement of big players, both institutions and big software corporations. There you go: the market gets what it needs, and the topic map standard, crippled by its own limited-scope vocabulary, disappears from sight. The world still goes on, and the principles on which topic maps were built are still there. But topic map vendors go out of business.

The real world wants to invest in projects that are intended to be around for several years or decades, not just adopt the latest technical fashion because there are tools available to play with. In other words, it's good to have tools, but tools are not enough to make a standard work. A standard like this one should not be centered on how tools are exploited. It should impose on tool vendors a pattern of behavior enabling them to play their part of the game, not necessarily all of it.

Even if the standard is written in such a way that no single tool can cover all of its aspects, the standard will still be valuable. It might even be better that way.

Tim Berners-Lee, James Hendler, Ora Lassila, "The Semantic Web", Scientific American, May 2001. ↩
Semantic Web, World Wide Web Consortium, http://www.w3.org/2001/sw/ ↩