The Digital Mess

Maintenance Procrastination

Information, like the products of many other human activities, is a mess. The difference is that information processed by computers is often presented as well organized. In reality, no matter how much effort we spend organizing it, information constantly evolves: new information is added, old information becomes obsolete, points of view on what information means change over time, and information eventually returns to its natural state of mess. And the mess keeps accumulating. It is similar to the place where we live. When we first move in, we make sure that every object has its allocated place. But, even with the best resolutions, we start to spread things all over. We have the unique faculty of not even seeing the mess we create, because we "know" where things are: we remember a chain of events, sometimes fairly complicated, that leads us to conclude, for example, that a given telephone bill must be on the table in the hallway, surrounded by newspapers. Furthermore, there is a difference between one's personal mess and a mess that is collectively maintained and used by a group of people, such as a family or a company. One can search one's own mess, but searching someone else's mess is much harder. The efficiency and quality of teamwork depend on how information is shared, that is, how it is organized and retrieved.

When an information system is designed, it is carefully thought out by experts who spend a significant amount of time and energy making sure that everything fits. Categories are defined and used to classify things. Together, the categories form a structure, which may be described with schemas that themselves comply with a number of standards. All is well. As time goes by, new things inevitably turn up in unexpected places. One way to handle this would be to create a new category, but this obvious move is not always possible: the system may not allow it, the workflow may be too complex, or the small detail may not seem worth all the trouble. Confronted with this kind of delicate situation, people tend to be both flexible and conservative: they classify something in a category where it does not quite fit, but would if the category were slightly stretched, and they can make an argument to justify why they did it. Sometimes people take the time to document what they are doing, so that a few years later it is possible to understand why things look the way they do. When they don't, the gap between the original intent of the knowledge structure and reality keeps widening over time. Eventually the system becomes so disconnected from reality that it can no longer be used reliably. It is time for a big overhaul of the information system using updated technologies, and the re-engineering often translates into huge expenditures. Had the mess been considered a feature rather than a bug, it would have been possible to design methods to manage it day by day, rather than ignoring it until it became unbearable.

This old problem has a well-known solution. In a house, it is called housekeeping. The word says it all. Nobody wants to move just because a house has become messy. In addition to regular cleaning, we also need to maintain the house to preserve it. The problem is that cleaning and maintenance are considered the most boring, repetitive, unrewarding activities one can imagine. The temptation to ignore them is permanent, and the easiest way to do so is to deny that there is a problem. Why would we need to clean? The mess doesn't accumulate as long as we pretend it doesn't exist. Denial of the need for maintenance is widespread, and at all levels: from children who don't see why they should brush their teeth, to the Federal Government refusing to fund the maintenance of bridges as long as they are still standing, even though they could present a hazard, or New Orleans failing to fix its levee system. This syndrome can be called "maintenance procrastination".

The main reason we despise maintenance is that the net effect of a job properly done is that nothing happens. When a bridge is properly maintained, it simply continues to stand and serve its purpose, allowing vehicles to travel back and forth on it. But reaching that result may be quite expensive. While it is easy to denounce the lack of maintenance after a disaster strikes, it is much harder to prove beforehand that without maintenance something is certain to fall apart. The return on investment may be a hard sell. The effect of global warming on our planet could also turn out to be a catastrophic case of negligence, of deferred maintenance. It is interesting, and disheartening, to notice that in America these questions hit a nerve and carry a partisan bias rarely found elsewhere in the world. Health insurance, which is also a form of personal maintenance, falls into a similar category: an unnecessary burden until it becomes necessary, by which time it is too late. When disaster strikes, it is too late to negotiate the costs, and people will pay whatever it takes to get out of a painful situation.

Again, the situation is no different with information systems. One reason day-to-day maintenance is considered an unnecessary burden is the general perception that technology evolves so rapidly that there is no need to stabilize existing systems over the long term. This may be broadly true today, but it is a temporary state. The scaffolding and mobile homes that make up our information systems today will some day rest on more stable ground, and we will need to shift to taking care of their real value, beyond using the most recent tools available on the market. It is time to mature and think over the long term, rather than keep spending heavily on catastrophic, disaster-driven business models. As technology loses its luster, people become more concerned with what they can and cannot do with it.

Therefore, thinking that systems, if well designed, can last forever without care or maintenance is a delusion. It rests on the naïve belief that the architects of a new solution have taken everything necessary into account, forgetting that even the best of them can only think about requirements that exist at the time of conception. Admittedly, it is not possible to foresee every change that time will bring. But the more detailed and "user-friendly" systems are, the more vulnerable they are to future change: the more thoroughly tasks have been analyzed and decomposed using the best practices of the time, the less they can be changed to accommodate new kinds of behavior. Such systems need to be replaced altogether, with no way of making gradual, controlled improvements. They are systems with no removable parts, like cars that would need to be replaced after a flat tire because the tires could not be removed. Computers, especially laptops, are increasingly designed to be non-upgradable: if you want to improve one, your only choice is to buy a new one. But these are only symptoms of an industry in its infancy, and infancy doesn't last forever.

What do computers "understand"?

The digital society runs on computers, which work by modifying the values of binary digits, i.e. zeros and ones that are stored internally and constitute the computer's "memory". A computer sees everything as true or false, black or white. It is thanks to our ability to describe information, at the most elementary level, as a sequence of bits that we are able to make use of computer technology. The digitization of everything, whether text, images, or sound, together with the ability to store and transmit information worldwide practically instantaneously, is the core of this technology. But the fact that information is digitized doesn't mean that our world view should be binary as well.
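To make digitization concrete, here is a minimal Python sketch of a short word being turned into zeros and ones and back again. It assumes text is encoded as UTF-8, which is one common convention among several; the function names `to_bits` and `from_bits` are hypothetical, chosen only for illustration. Note that the bits carry no meaning by themselves: it is the agreed-upon encoding that lets us read them back as text.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 encoding of `text` as space-separated groups of 8 bits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Rebuild the original text from the 8-bit groups produced by to_bits."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```

The same sequence of bits could just as well be read as a number or as part of an image; only the convention shared by the writer and the reader decides.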

Left to themselves, computers do not "understand" anything. They are sometimes able to give us the illusion that they understand what we are looking for, or how we feel ("sentiment analysis"), but this is an artifact of algorithms that perform calculations, some of which yield results bearing some relation to what we are searching for. Search engines present a list of results that is supposed to contain the references we are looking for. Most of the time they do a good job, especially, as Google puts it, if we are "feeling lucky". However, there is no guarantee that we have not missed the most important information we were looking for, information that exists, unbeknownst to the algorithms.

The more we use computers, the more we are confronted with situations where we know we don't get exactly what we are looking for. For example, the automated phone systems that companies use to answer their customers often fail to address the very issue people are calling about. This may seem a mere inconvenience, but the frustration it generates may end up driving away customers who care about good service. It amounts to this tautological statement: automation can only go as far as an automated process can be designed. This is particularly visible in automatic translation. When a translation is based on recognizing a sentence that is already in the system's database, having been translated by a human, the result is very good. When translation algorithms are applied, anything goes, and the results can be quite hilarious for those who already know the target language. But they can be misleading for those who have no way of knowing what the original text was.
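The contrast between reusing a human translation and applying a mechanical rule can be illustrated with a deliberately naive Python sketch. It is a toy, not a description of how modern translation systems work: the tiny French-to-English "translation memory" and word list below are hypothetical data invented for illustration. A sentence found in memory gets its human translation; anything else falls back to word-by-word substitution.

```python
# Hypothetical translation memory: whole sentences already translated by a human.
TRANSLATION_MEMORY = {
    "il pleut des cordes": "it's raining cats and dogs",
}

# Hypothetical word-for-word dictionary, used only as a fallback.
WORD_DICTIONARY = {
    "il": "it", "pleut": "rains", "des": "some",
    "cordes": "ropes", "ce": "this", "soir": "evening",
}


def translate(sentence: str) -> str:
    key = sentence.lower().strip()
    # 1. Exact match: reuse the translation a human already produced.
    if key in TRANSLATION_MEMORY:
        return TRANSLATION_MEMORY[key]
    # 2. Fallback: substitute word by word, leaving unknown words untouched.
    return " ".join(WORD_DICTIONARY.get(word, word) for word in key.split())


print(translate("Il pleut des cordes"))          # it's raining cats and dogs
print(translate("Il pleut des cordes ce soir"))  # it rains some ropes this evening
```

The idiom comes back correctly only because a human translated it beforehand; adding two words is enough to fall back to the literal, and absurd, version.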

One of the motivations for automating information processing is the ability to scale. The amount of information available is so overwhelming that no human group, however well organized, can realistically process it. Computerized applications go a long way toward providing us with information we can use, but they still cannot go all the way, to the point where information makes complete sense.

Too much or too little information?

It is possible to be inundated by information that is redundant, irrelevant, or uninteresting, and at the same time be deprived of important and valuable information.¹ TV news broadcasts select only a very small chunk of the information available. A government may choose to classify information in order to hide it from the public, for good or not-so-good reasons. It takes smart, educated people to figure out which information is relevant and what to make of it. The main thing we learn at school is how to learn and how to analyze what we receive. But how does a computer system know how to distinguish what is relevant from what is not?² The only answer is that it depends on the creators of the software that provides the information, who make choices that are usually not disclosed. For example, Google's search algorithm is proprietary, and without knowing how it works, it is very difficult to understand the order and ranking of results. It is also impossible to know which results are not displayed, or even that they exist. The problem becomes visible to information owners, who know their sources and may be surprised to see that the output is not what they expected.

In other words, despite the fact that the amount of information that flows is enormous, information doesn't flow bidirectionally.

Power to the technicians

In the corporate world, decision makers, instead of asking technologists to provide them with what they need, rely on technologists to tell them what they can do. This abdication to technology is the exact opposite of what the utopians envisioned when they saw technology as liberating, freeing us from tedious tasks and repetitive labor. That may very well have happened for a number of jobs, but many of the new jobs that have been created impose a certain level of servility, because the tasks to accomplish must fit what machines are able to do.

An easy way to obfuscate information without hiding it is to express things in a more complex way than necessary. Technicians draw their power from their ability to understand what laypersons can't. The systematic use of acronyms in government or in the corporate world keeps people in isolated silos of knowledge, separating those who know from those who don't. Decision makers are not always aware of all the implications of a choice they make, and have to live with the consequences afterwards. In other words, the lack of transparency sometimes benefits the technologists, who gain their power from being perceived as knowing more than the rest of the crowd.

Can complex things be expressed simply? It depends on what simplicity means. The laws of physics, for example, are an extraordinarily simple way to express how nature works. The mere fact that physical laws describe the real world using mathematical expressions is an achievement in itself. According to the law of gravitation, for example, bodies attract each other, and this is why the Earth revolves around the Sun. The explanation is pretty simple, but the phenomenon itself is quite complex. Reading Newton's original text is very difficult, because it is full of religious metaphors and constructions that he uses as scaffolding. The closer we are to a moment of innovation, the more complex things look. Only after some time, once an innovation has become common practice, do things become simple. When telephones were first installed in homes, many people needed help to use them, because they found them awkward. The same holds for computers and smartphones. Until personal computers became widespread, only secretaries and professional writers knew how to type on a keyboard. The generations born after a technology is invented find it quite natural, and don't even understand why older generations found it daunting.

Government and corporate information technology departments are often overwhelmed by the growing complexity of technologies, and rely on external contractors not only to provide them with tools but sometimes also to take over the management of their information content. And decisions on which technology to adopt are sometimes based more on the lobbying power of big technology firms than on a thorough review of the needs and the tasks to accomplish.

The adoption of technology-based solutions has also increased the power of technologists, who decide on the architecture of the systems. This sometimes results in a loss of accountability. Governments are like any other consumer of technological products: they purchase what is available on the market. But the limitations of the products often determine the boundaries of what can be done. The balance of power between users and vendors is of crucial importance. A few powerful companies and organizations, the government among them, are able to make vendors create new products tailored to their specific needs when no product currently available satisfies them. It is interesting to note, however, that government technology czars rarely take advantage of this power, and instead behave like ordinary customers of products made by big software corporations.

To take our destiny back into our own hands, we have to understand how this information society works. This book aims to give readers confidence that such understanding is not reserved for a technologically literate minority, leaving everyone else on the sidelines. Things are complex, but not out of reach. To understand how things work, we have to use our capacity for thinking, sometimes take a critical view of what is proposed to us, and look beyond the short term and the talk of the day. Let's try.

Old Notes

Found in a notebook

The problem with this approach is that it enables only one perspective on information, valid at one moment and for a given purpose, be it explicit or implicit. It imposes a worldview that seems "obvious" to those who created it but is far from obvious to those who use it, especially after some time has passed, when the authors of that information model are no longer available because they have gone elsewhere, retired, or died. And the world still moves on.

The obsolescence of information models is a phenomenon well known to librarians, who struggle with classification schemas that are sometimes more than a century old. Organizations face a similar problem with information systems that were carefully designed only a few years ago, or less than a decade ago, but failed to account for things that were either unknown at the time or considered out of scope or irrelevant.

Another thing that information systems lack is accountability. A lot of information is lost because it is usually impossible to keep track of the context in which it was created: the people who have that knowledge leave or retire without leaving anything behind.

One of the major problems in information systems is the so-called distinction between "data" and "metadata". In a book, it would be the distinction between the title page and the content of the book. In electronic information, metadata usually describes the date of creation, the names of the authors, the version of the software used to capture the information, etc. That distinction is valid only within a limited scope, which is the book itself. If the book is an item in a collection, then the title of the book and the authors' names are no longer metadata but data. They are part of the "instances" that make up a library catalog. For the library catalog, the metadata is the structure of the items that collectively constitute that catalog. Now, if some institution, or the World Wide Web, integrates several library catalogs, what was considered metadata for each catalog becomes mere data.
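This shift of perspective can be sketched in a few lines of Python. The sketch is only an illustration under simple assumptions: the classes `Book` and `Catalog`, the function `integrate`, and the catalog name "City Library" are all hypothetical. The same title and authors are metadata from the book's point of view, plain records from the catalog's point of view, and a catalog's own structure becomes just another record once several catalogs are integrated.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    # Seen from the book itself, title and authors describe the content: metadata.
    title: str
    authors: list[str]
    content: str


@dataclass
class Catalog:
    name: str
    # The structure shared by every record: this is the catalog's metadata.
    schema: tuple[str, ...] = ("title", "authors")
    # Seen from the catalog, each book's title and authors are plain data.
    records: list[dict] = field(default_factory=list)

    def add(self, book: Book) -> None:
        self.records.append({key: getattr(book, key) for key in self.schema})


def integrate(catalogs: list[Catalog]) -> list[dict]:
    """Seen from a union catalog, each catalog's name and schema become data too."""
    return [
        {"catalog": c.name, "schema": list(c.schema), "records": len(c.records)}
        for c in catalogs
    ]


library = Catalog("City Library")
library.add(Book("Too Big to Know", ["David Weinberger"], "..."))
print(library.records)       # [{'title': 'Too Big to Know', 'authors': ['David Weinberger']}]
print(integrate([library]))  # [{'catalog': 'City Library', 'schema': ['title', 'authors'], 'records': 1}]
```

Nothing in the code marks a field as "data" or "metadata" once and for all; the label depends on which level of the structure one is looking from.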

xxx said: "Someone's data is someone else's metadata". This is a matter of perspective. It depends on who is looking, and for what purpose: a perspective is the distortion of reality seen from someone's vantage point. Multiple perspectives exist on practically everything. This is what makes the world such a messy place, but it is also what enables freedom. Freedom of speech is the right to express one's point of view, which may differ from the majority's and may sometimes seem crazy to some. And that's OK, especially in a democratic society.

Furthermore, perspective is defined as the theory or method used when applying a projection that renders three dimensions into two. For example, a painting or a photograph uses a certain perspective to render three-dimensional reality on a two-dimensional flat surface. Depending on the perspective, the result can be very different. Maps of the Earth, for instance, make continents look very different depending on which perspective is chosen. We are so used to some perspectives that an unusual one seems unreal, almost impossible: take a map of the Earth with Antarctica on top and the North Pole at the bottom. The same is true of text seen as if in a mirror, which is how typographers composed text for centuries.

¹ Mark Andrejevic, Infoglut: How Too Much Information is Changing the Way We Think and Know, Routledge, 2013.
² David Weinberger, Too Big To Know: Rethinking Knowledge Now that the Facts aren't the Facts, Experts are Everywhere, and the Smartest Person in the Room is the Room, Basic Books, 2011.