Miscellaneous

Historical notes

Historical threads

The historical thread:

• Luca Pacioli and double entry bookkeeping

• Filippo Brunelleschi and perspective

• Galileo and Classical Relativity

• Gottfried Leibniz and Monadology

• Lavoisier and Indestructibility of Mass

• Paul Adrien Maurice Dirac and quantum states

• Doug Engelbart and the oN-Line System (NLS)

What does this have to do with us? Perspective results from projecting a three-dimensional reality onto a two-dimensional plane. The image (painting, drawing, or photograph) is on a flat surface.
The printing industry has accustomed us to flat representations as the universal way to transmit information. In that respect, computers have not (yet) changed anything: we are still viewing information on a flat surface.

http://en.wikipedia.org/wiki/Leonardo_da_Vinci

The Data Projection Model states that information items only appear in connection with other information items. No information item is ever isolated, and every item is related to other items via binary relations.

Since binary relations are not usually optimal for presenting information in a useful way, the information can be recomposed into models that make it meaningful. Several different models can be built upon the same information items, making it possible to see the same information from multiple perspectives.

The Data Projection Model enables information to be accountable, because any information item is now a hub interconnected with other information items of different nature. By following a thread, it is possible to trace back the origin or the raison d'être of a particular piece of information.
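
The three paragraphs above can be illustrated with a minimal sketch. It is not the notation of the Data Projection Model itself: the item names, the relation names, and the three helper functions are all hypothetical, chosen only to show items that exist solely through binary relations, one possible perspective recomposed from them, and a thread followed back to its origin.

```python
from collections import defaultdict

# Hypothetical data: every statement is a binary relation (subject, relation, object).
relations = [
    ("invoice-17", "issued_by", "Acme"),
    ("invoice-17", "paid_from", "account-A"),
    ("account-A", "owned_by", "Alice"),
    ("report-3", "cites", "invoice-17"),
]

def neighbours(item):
    """Return every relation in which the item takes part, in either role."""
    return [r for r in relations if item in (r[0], r[2])]

def view_by_relation(name):
    """Recompose the relations into one perspective: subject -> list of objects."""
    view = defaultdict(list)
    for subject, relation, obj in relations:
        if relation == name:
            view[subject].append(obj)
    return dict(view)

def trace(item, seen=None):
    """Follow outgoing relations from an item to list what it rests upon."""
    seen = set() if seen is None else seen
    path = []
    for subject, relation, obj in relations:
        if subject == item and obj not in seen:
            seen.add(obj)
            path.append((subject, relation, obj))
            path.extend(trace(obj, seen))
    return path
```

Calling trace("report-3") walks from the report to the invoice it cites, then to the issuer, the account, and the account owner. A different perspective on the same items only requires another function over the same list of relations, not another copy of the data.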

Information may not be interpreted the way it was intended in the first place. The real question is whether there is a way to avoid this. This question is of the same kind as advocating a single language for the whole world, or a single model within an organization, even a major one. The dimension of interpretation is key here, and goes together with the fact that multiple perspectives can co-exist on a given set of information units, on knowledge, on lore, etc. Just as in a democracy, it is not only accepted but considered healthy that different points of view, even contradictory ones, co-exist.

Transmitting meaning is one of the most challenging things that exist. It looks simple: let's agree about what we mean, and disclose it to others. This is the basic principle at the core of education, business, and human interaction in general.

Accountability of information can be paralleled with accountability of financial transactions. Double entry bookkeeping, which records each transaction as a double operation of crediting one account and debiting another, is considered the foundation of accounting. It makes it possible to trace back the origin of the money involved in every transaction. In order to make information accountable, we can decide, ???

Computerized data being what they are, we had better express them in a way that computers can understand. Skipping the level of assembly language, that means that, unless some natural-language guessing is built in, we are pretty much limited to strings of characters. This is the major principle that makes search engines work. You type a word, and the engine crawls the web until the string of characters that constitutes the word is recognized. In reality, the processes are more complex than that.
The principle behind any computer representation of data is that a common schema is applied to a number of instances, and of course the bulk of the work is to make sure that a chunk of data belongs to a certain predefined type.
There is a tendency to create comprehensive, unified models that span a multiplicity of different cases. But in doing so, we underestimate the nuances and the complexity of reality.

This book addresses another problem that companies often have to deal with: the management of the content of information. Information is often organized the classical way, using folders and subfolders, following the paper model. This process is made more formal by using taxonomies, and sometimes ontologies. Such organizational schemes imply that information has to be sorted according to a schema that people should be able to comply with. The schema usually results from an agreement in which information was modeled so as to maximize usefulness and usability. The ontology layer sometimes adds automatic processing, which uses formal algorithms to handle patterns that can further enhance the usability of such an information architecture. This all seems like a good idea, but it rarely works. Or if it works, it is only in the short term. Some day, somebody will introduce a new information item that can't be classified within one of the existing categories. One would think the classification system would then need to be changed, but that is often a very costly operation, not only because the consensus agreed upon has to be changed, but also because several technologies, sometimes quite expensive, which have been built on top of the existing classification system, have to be rethought and adjusted. This is not always possible. The result is that those who lack a proper category among the existing ones make some kind of compromise and use a category which is not exactly the one they would have wanted, but is close enough. The problem with that approach, which seems to make sense in the short term, is that it is the beginning of a drift which will progressively move the classification schema further from reality. In the end, the whole system may crash, because it is too far from what is really wanted.

Information can be altered along the way, by being encoded to fit a particular kind of transmission protocol required by one of the servers. Or it can be filtered, or customized according to specific user profiles, such as the way search engine results pages are organized, with "sponsored links" depending on our queries. Information can also be recorded by governments for security-related purposes, by credit bureaus to establish the liability factor for individuals or businesses, but also by identity thieves, or other criminals, who will exploit the data they gather to steal money out of bank accounts, among other nasty things. There is no way for an individual who creates information and transmits it electronically to know what happens to it. This holds for personal data as well as for information which is published on a web site, or sent through email. In other words, a lot of different things happen to the information we send, or the information we receive. It is usually not under our control, as individuals, and we usually don't know the details of what is going on. But since all the events happening along the way are caused by humans, it is possible to understand what they are, and some well-equipped institutions are able to trace what is going on in a much more precise way. For example, it is possible, although frequently not easy, to recover from identity theft because the various events that triggered it can be traced back, much as programmers trace bugs by following step by step what is going on under the hood.

The multiplicity of information sources is a fact of life; so is the diversity of their interpretations. Between the moment a unit of information is created and the moment it is used, a diversity of processes occur. The traditional publishing process by which an intellectual creation by an author gets to a reader still exists, but it only accounts for a part of the information that circulates. Online information may not be easily traceable, but it can be transmitted fast, and is therefore becoming the privileged way of exchanging information.

This book aims at providing solutions for auditability, accountability, and transparency of information-related processes. The solutions are based on the premise that the information world should be flattened first. The book will show how things can be disclosed in ways that can be processed and understood. The world view that it contains makes it possible to have concurrent perspectives applied to the same information without requiring an agreement on meaning before doing anything with it.

First carved in stone, writing was later done on parchment and paper, and is now done online. Printing, i.e. an industrial process to duplicate information so that it can be propagated everywhere it needs to go, is now being supplemented by information made available over networks that make it accessible from virtually anywhere.

DPM and Double Entry Bookkeeping

The DPM is merely a notation for binary relations. In that sense, it's not different from RDF. So, what does it bring to the table? Two things:

• It simplifies the use of RDF, and for that reason it can act as a booster for semantic technologies.

• There is something conceptual hidden in the background, and this is what needs to be expressed, in the same way that double entry bookkeeping can be said to be the condition for accounting. What is it? It has to do with the concept of accountability itself. Accountability has two meanings: one is "to count", the other is "to be responsible for". The first meaning is where debit and credit accounts must balance, so that their difference is zero. This is where the laws of accounting come into play. The second meaning is where the ability to trace back the trajectory, origins, and interactions between transactions comes into play.
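
The two meanings of accountability can be sketched on a toy ledger. This is only an illustration under stated assumptions: the account names, the amounts, and the Entry, balances, and history names are hypothetical, and the sign convention (debits counted positive, credits negative) is one choice among several.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """One transaction: the same amount debits one account and credits another."""
    debit: str
    credit: str
    amount: int  # in cents, to avoid floating-point rounding
    memo: str

# Hypothetical ledger, for illustration only.
ledger = [
    Entry(debit="cash", credit="capital", amount=100_000, memo="owner investment"),
    Entry(debit="equipment", credit="cash", amount=30_000, memo="laptop purchase"),
]

def balances(entries):
    """First meaning, 'to count': debits add, credits subtract, per account."""
    totals = defaultdict(int)
    for e in entries:
        totals[e.debit] += e.amount
        totals[e.credit] -= e.amount
    return dict(totals)

def history(entries, account):
    """Second meaning, 'to be responsible for': every entry touching an account."""
    return [e for e in entries if account in (e.debit, e.credit)]
```

Because each entry adds and subtracts the same amount, the sum of all balances is zero by construction, whatever the entries are. The history function gives the trace: asking for the history of "equipment" returns the entry that moved money out of "cash", and the history of "cash" in turn leads back to "capital".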

DPM and Topic Maps

Here I need to express conceptually why DPM is the continuation of what topic maps were supposed to be. Topic Maps were originally designed as an application of an abstract hyperlinking model, the HyTime hyperlink module. There was nothing in the HyTime module which was specific to topic navigation. Topic Mapping came as a generic way to design electronic indexes, glossaries, and tables of contents in a publishing/library environment. But the big step forward with topic maps came with the realization that information could be managed from outside (as an aerial view, so to speak), as an overlay. TM opens up the manageability of information assets.

DPM can represent the linking part of topic maps, i.e. the semantic space: nodes in a graph, where a node is anything that can become a subject (a node is "subjectable"). It provides accountability in addition to semantic browsing, as well as a common platform between topic maps, RDF, and other conceptual frameworks to encode semantics.

Miscellaneous notes (not for print)

(old) What is this book for?

Publishing is a technical task, but information representation is key. The issue is: how can we improve the degree of control people have over the information they own? Sharing information cannot be done without some form of networking: not only in the sense of the pipes (the Internet) but also in the sense of people.

This book will promote the following ideas:

• Bottom-up approach for making sense of information, whatever this information is.

• Multiple perspectives to view the same information.

• An overview of the state of the art in semantic information technologies.

The Web is using us (YouTube: http://www.youtube.com/watch?v=6gmP4nk0EOE)

We need to rethink:

copyright, authorship, identity, rhetorics, ethics, aesthetics, governance, privacy, commerce, love, family, ourselves. [Michael Wesch]

Web 2.0 Buzz words:

Blog

Mash-up

Ajax

YouTube

Flickr

Yahoo

R/A [Richer ... Applications]

Flash

Drag and Drop

Service Oriented Architecture

Feeds/RSS/Web Services

Social Web:

Tagging
Wiki
Podcast
Blogging

(old) Introduction

We are now able to use the various alphabets or ideographic systems existing in the world to represent information on computers. This was not the case until a recent period, and there are still issues of compatibility between systems using different encoding schemes for character sets. In addition, we use different formats. Even when using a universal notation serving as a lingua franca, such as the Extensible Markup Language (XML), we are still using various schemas and diversity comes back even when standard notations and encodings are used.

Lost in Technology

Suddenly I realized how dependent I have become on information technologies. But I am not alone. If the Internet were out of order for a while, the amount of disruption throughout the world economy would be gigantic.

The dependency on information technologies often amounts to a dependency on information technologists. The complexity of the technicalities has reached such a level that understanding what is going on under the hood is out of reach not only for individual users but also for many of the people in charge of making the decisions about where to go with the information system. But not grasping all the details doesn't mean that anyone should lose all hope of having some grasp over the course of events. Information is too important to be left to the technicians. This book aims at providing the intellectual tools needed to reverse the relationship with technology. It is technology that should be at the service of the decision makers, not the reverse. This book is about restoring the ability to think in the domain of information technologies, i.e. to comprehend intellectually what is at stake by means of concepts that can be worked upon, regardless of the actual technology used, and to focus on the problems to be solved and the various methods to solve them. Paradoxically, going deeper into the technical details is what allows one to build a higher level of understanding. Chemistry is about combining the constituents of matter, the atoms, into higher level constructs, the molecules, that have properties of their own, which are used to describe the perceived behavior of matter. This behavior, without the knowledge of the atomic structure, would remain somewhat arcane. Understanding what the atoms are in the world of information sheds light on how information behaves. It may place us in a better position to resist being distracted by the abundance of ever-changing technological acronyms and the apparent complexity of a technology that only a handful of technical people can understand.
This book is about giving those of us who want it the necessary baggage to make up our own minds about what matters in terms of what information is and how it is handled.

This book represents a turning point in my career. I have been deeply involved in technology and, somewhat like the infamous Dr. Frankenstein, have contributed to creating a monster which has been "stolen" by technicians whereas it was initially aimed at information owners. The book is a breaking point because it is a way for me to split, sometimes quite painfully, from my former colleagues with whom I have shared a lot and whom I respect, but I lean towards serving the interests of those who own and manage information, rather than those who build products to manage information. But this proposition itself is somewhat counterintuitive. In the universe of information technology, there are product makers and users. A computer user eventually becomes again a writer, or an accountant, or an engineer, or an architect, or a doctor, or a lawyer, or a government employee, a researcher, etc. When products really become part of our daily life, their "product-ness" disappears and the activity reappears. There was a time when being able to use a telephone was considered a special skill.

Most users are shopping for products that would help them solve the problems they are facing. They are ready to pay significant amounts of money provided they see measurable benefits.
Apparently, this makes perfect sense. At the beginning of the personal computer era, some thirty years ago, there was an optimistic feeling that the new world opened by computer technology was virtually limitless. Since then, the World Wide Web has emerged and is now ubiquitous, but now that computers have become commodities as widespread as telephones, we have also experienced the limits of concepts such as artificial intelligence (more artificial than intelligent), computer-based learning (which does not replace the human relation between a professor and a student), automatic translation (which rarely produces text readable in the target language), and electronic handling of medical records (which may be imprecise when a specific case cannot be described using predefined categories).

We are like mice, prisoners in a cage, but our cage is virtual.
The cage resembles the digital wall being erected by the United States at several chosen locations along the Mexican border: cameras, sensors, radars, GPS systems, every piece of technology imaginable that can help see where people are, be it during the day or during the night, create records, identify them, and trigger alerts that would help track them if they cross the border. This technique may prevent a handful of individuals from entering the United States, but it resembles erecting medieval fortresses where boiling oil has been replaced by digital cameras.

What exactly are we leaving aside when we rely on computers to fulfill our daily tasks? How confident should we be that the problems we are solving are the same as the ones we are uttering? The purpose is to free ourselves, as human beings, from a self-imposed new slavery that limits our potential to what machines are able to provide us. For half a century computers have been used by corporations, and for thirty years by the general public, to store, sort, and retrieve information. In order to allow computers to do that, information has to follow certain schemas. These schemas are designed to manipulate predefined forms, also known as databases. Computerized forms look "smarter" than paper forms because they are often programmed to detect, at creation time, whether they are correctly filled in or not. For example, if I put my telephone number where the computer expects my date of birth, the error can be immediately detected. The principle of a computer-based user interface is that the interface has built-in options, usually presented in a menu, and users have to select the options they are interested in. On computer screens, options are presented visually. On computer-driven telephone systems, messages containing options are presented one at a time, and users can interrupt by pressing their selection or uttering it.
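
The date-of-birth example can be made concrete with a small sketch. It assumes a hypothetical form whose date_of_birth field must be written as YYYY-MM-DD; the field name, the format, and both function names are illustrative choices, not a description of any particular system.

```python
from datetime import datetime

def parse_date_of_birth(text):
    """Return a date if the text matches YYYY-MM-DD, otherwise None."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None

def validate_form(form):
    """Check a (hypothetical) form at creation time and list what is wrong."""
    errors = []
    if parse_date_of_birth(form.get("date_of_birth", "")) is None:
        errors.append("date_of_birth: expected YYYY-MM-DD")
    return errors
```

A telephone number such as "+1 212 555 0142" typed into the field is rejected at once, which is the "smart" behavior described above. So is "31 January 1970", a perfectly good date of birth written in a form the schema did not anticipate.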

What all of these applications have in common is their rigidity.
In order to be formal, information has to be structured a certain way. Even search engines use algorithms that are looking for certain patterns.

How do we know what computers really do? How have they been programmed? We don't. Do we care? Usually not. The results yielded by computers seem to make sense, most of the time.
Except for some glitches. For example, I once wanted to book a plane ticket from Brussels, Belgium to Stockholm, Sweden on the web site of an American airline. All the flights I found went through either New York or Chicago. I used another method to book my flight – and switched to another airline. And there are times when computers provide answers that look right, and are not. And we don't always know it.
Furthermore, we obviously don't get what we don't get. In other words, it is possible that there is a much better answer to our question than the one we found, but we will never know it. We don't even know whether that is the case or not. The reason why we still need education, training, and expertise is that it is our prior knowledge that guides us in deciding where to find the information we need, and helps us determine whether the information we have found is reliable. The specialists - such as doctors, lawyers, engineers, etc. - must be able to look for the kind of information they need in a given context, even an ordinary one, and they may want to know the features, and the limitations, of the tools that help them find a solution.

Note that I am talking here about content experts, not computer science experts. There is no reason why a computer scientist should know better. Printing press operators don't need to understand the content of the books they are producing. Computer scientists are the printers of modern times. They do not own all the answers. If it seems they do, we need to change this perception, and put them where they belong: tools, and not masters. What applies to machines also applies to the personnel in charge of operating them. Computer scientists may be in charge of managing the various ways in which information flows, but they should not be put in charge of mastering its content.

We have now entered the Information Age, and this move is irreversible. But the changes we have witnessed during these past years are so ubiquitous, so radical and so far-reaching, that we are only starting to measure their consequences, even if many of us have already seen changes in the way we live. Several industries have reorganized their activity: the media industry, the communication industry, the music industry, the banking and financial industry, the healthcare industry, the travel booking industry, among others. The world has become a smaller planet.
It has been flattened. [footnote: See Thomas Friedman, The World is Flat, A Brief History of the Twenty-First Century, Farrar, Straus and Giroux, New York, 2005, and Alex Wright, Glut, Joseph Henry Press, Washington DC, 2007. ]

Horror Stories

Big Brother. No more individual freedom. Privacy is history.
Computers take over. Computer people take over.
Intelligence becomes purely artificial
Government is taken over by computers.
Elections are driven by software. Democracy becomes a joke.
Kids are addicted.

More nuanced reality

Computers have lost their appeal as they have become commodities.
Computers don't even work right.
Schemas don't apply universally. One taxonomy for everyone.
An expensive joke

Room for hope

Much information is available. Some of it is useful.
Social networks help people connect together.
Borders are
Administrative ordeals have been eased.

The Times That Are A'Changin...

We are now overwhelmed with information. There is so much out there, and so much is produced all the time, that the traditional ways civilization used to cope with it are reaching their limits. Creating, publishing, distributing, searching, finding, storing, recycling, transmitting, regulating, protecting information: everything has changed, or is about to change. Information, in its digital form, is not limited to text; it includes sound, video, and images. The hardware devices used to access information are fairly recent, at least as universal means. Telephones, typewriters, books, compact disk players, tape recorders, television sets, cameras, and radios are becoming collectors' items, or may some day only be found in antique stores. The pace at which this revolution is taking place is astounding. The first personal computers only appeared on the market in the 1980s, the World Wide Web has developed since 1993, email started to be adopted during the 1990s, and social networking tools appeared in the 2000s. In the 2010s, we are witnessing the emergence and generalization of media consumption tools, such as tablets and e-readers, that will continue to transform the ways information is accessed. In brief, the situation has not stabilized yet, and we don't even have in place the legal apparatus that enables intellectual property and creative rights to be granted as they were in the past. In a sense, it is still too early to arrive at a comprehensive understanding of what is happening, because history is still being written before our eyes. But that doesn't mean we should not try to understand what is happening.

In such a situation, several big companies are occupying the space and controlling the production of the hardware and the wires -- or wireless devices -- that are used for transmitting information, and are creating isolated continents of information which yield maximum profit, at the expense of the ability to interchange. Even social networks that are based on sharing are creating isolated sharing continents that are huge and transcend national borders, but are based on the commitment to comply with a given company's policy. As a counterpoint, communities of users are promoting transparency and openness in order to provide an alternative. And the governments of the world, depending on how democratic they are, are setting up policies and regulations that attempt to protect -- or limit -- freedom of speech, individual privacy, as well as free trade.

In the current situation, which can be characterized as the great wilderness, quasi-monopolies emerge, and companies that own a lot of information, including information which is vital to the survival of entire nations, are obviously becoming too big to fail. What if the company that owns the software defining the most universally used text format goes under? What if the company that provides the hardware and software indispensable for accessing an increasingly isolated body of content goes under? It would be a disaster comparable to, if not worse than, the destruction of the library in Alexandria, Egypt, that contained unique copies that were the testimonies of the Greek civilization. Our dependency on a handful of quasi-monopolies can be compared with the dependency on the railroad or oil barons of the 19th and early 20th centuries. On a historical scale, the situation we are in today is transient. It will fall apart, one way or another, when society in general realizes what benefits can be derived from leveraging access to and handling of information for the common good.

In this context, the technocrats are still heading the field. The number of standards, products, protocols, languages, platforms, and methodologies available is overwhelming and is constantly growing. This is the reign of acronyms, which function as a barrier to entry in these technical fields. If you don't know the acronyms, you are lost from the beginning. Their role is similar to that of mathematical formulas in the middle of a text for lay persons, which just show readers how unsophisticated they are, and emphasize how smart the people are who can decipher them. Whether the technical discourse makes sense or not is not in question. The technocrats live in their own world and don't want to be disturbed. This book aims at breaking this barrier and penetrating the secret, well-protected universe of information technology in a conceptual way, accessible to people who are curious to understand how things work, so that they can influence the decisions taken without being stopped at the gate by the technicians. This is something that can benefit policy makers as well as information consumers, whether they are clients of information services or simple users of a given technology.