The Knowledge Organizers
Knowledge Sharing
Concept Mapping
Information Visualization
Topic Mapping
Tacit Knowledge
Categorization
Memory Creation
Context
Topic maps speak in the third person: "He", "She", or "It". Everything in them is a quotation. RDF speaks in the first person: "I". I say this and I put it here. When other people refer to it, they appropriate what is there, not necessarily what was intended to be said.
The illusion of Topic maps is that there is only one ontology that can be used to represent any kind of knowledge. The illusion of RDF is that there is only one place to express things: the web. Both approaches are interesting and complementary, but both are insufficient. We need to do better than this.
RDF is based on the simplistic assertion that once a statement has been made, it is universally valid and exploitable. This hides the fact that the statement may only have useful meaning in the specific context in which it was created and can be meaningless elsewhere.
How to connect pieces of information?
Information items relevant to the same subject need to be connected, as do related subjects. When searching for information about a particular subject, it is important to find as many of its occurrences as possible. Occurrences of a subject can be missed because the terms used to designate it differ from one document to another, or because an author treats as distinct subjects that the person searching regards as one and the same.
The underlying principle is the subject collocation objective [1]: one location per subject in the topic space. The subject becomes a hub that concentrates all the links to the pieces of information available about it. The subject is the organizing principle around which everything else is arranged. Topics form a map, which is a set of dots (nodes) connected by arcs. All connections express semantic relationships, and it is up to the information owner to describe what these relationships are about.
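The hub idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the `TopicMap` class, its method names, and the Puccini/Tosca data are hypothetical and do not come from any topic map standard or library.

```python
from collections import defaultdict


class TopicMap:
    """Toy topic space with one hub per subject (hypothetical sketch)."""

    def __init__(self):
        # subject -> set of locators pointing at information about it
        self.occurrences = defaultdict(set)
        # arcs (source, relation label, target); the labels are chosen
        # by the information owner
        self.associations = set()

    def add_occurrence(self, subject, locator):
        self.occurrences[subject].add(locator)

    def associate(self, source, relation, target):
        self.associations.add((source, relation, target))

    def related(self, subject):
        """Return (relation, other subject) pairs for arcs touching subject."""
        pairs = []
        for source, relation, target in sorted(self.associations):
            if source == subject:
                pairs.append((relation, target))
            elif target == subject:
                pairs.append((relation, source))
        return pairs


tm = TopicMap()
tm.add_occurrence("Puccini", "doc1.html#p3")
tm.add_occurrence("Puccini", "doc7.html")
tm.add_occurrence("Tosca", "doc7.html")
tm.associate("Puccini", "composed", "Tosca")
```

Everything known about "Puccini" is reached through a single entry, so a document never has to link individually to every other document on the same subject.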
There is no such thing as a hierarchy of subjects. Instead, subjects can be related through associations, some of which may be hierarchical, but the semantics of the associations can be as nuanced as necessary, depending on the context.
Advantages: consistency and economy (it avoids redundancy). Collocation avoids having to link separately to everything that has a certain relationship with a subject.
A subject is an object of discourse. It is anything about which something can be said.
Everything is a subject, even a name. A subject can be named in a variety of ways, depending on the context (for example, which language is used).
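Naming by context can be illustrated with a small sketch. The `Subject` class, the identifier string, and the use of language codes as scopes are assumptions made for the example, not a prescribed model.

```python
class Subject:
    """A subject carrying several names, each valid in one scope (context)."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.names = {}  # scope -> name

    def add_name(self, name, scope):
        self.names[scope] = name

    def name_in(self, scope, default_scope="en"):
        # Fall back to the default scope when the requested one has no name.
        return self.names.get(scope, self.names.get(default_scope))


city = Subject("subject:city-of-vienna")  # hypothetical identifier
city.add_name("Vienna", "en")
city.add_name("Wien", "de")
city.add_name("Vienne", "fr")
```

The three names designate one subject; only the context decides which one is shown.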
Connect things prepared to be connected: ontologies (upper-level or local), schemas, and models, all very nicely engineered. This facilitates communication within a well-delimited group and leads to the emergence of a private sphere of discourse for that group.
Merging with the unknown. Prepare for finding things that have not been prepared to be merged.
Inside / outside: Expanding the Document Boundaries.
- A document. An index is the same as a subject catalog for several documents.
- Cross-reference and bibliographic reference. A cross-reference is internal, while a bibliographic reference is external.
- Metadata vs. data. Metadata are data added "outside" the document to qualify it (e.g., to describe its subject).
- Semantic Markup is "internal metadata". This is information added inside the document that describes its structure, at any level of granularity desired.
Difference between an authority record and a bibliographic record. The first is the catalog entry for a library, the second is a simple reference to the document.
Independence of semantics with respect to document type. Subjects are independent of document types: films, books, etc. can be on the same subject. Relationships between subjects are independent of where these subjects occur.
Interrelated Ontologies of many types.
Controlled vocabularies.
For the definition of ontology, see J. Sowa, http://users.bestweb.net/~sowa/ontology/.
Finding aids express a world view.
The finding aids that we routinely use today seem to have always existed. But tables of contents, glossaries, indexes, and library classifications are quite diverse, and they imply perspectives on the organization of knowledge and world-views. The taxonomies, vocabularies, and ontologies that are being created will also reflect a certain world-view. If we want them to remain valid over a long period of time, it is preferable to be as explicit as possible and to disclose, as much as we can, the principles that have been used to design them.
Taxonomies
A taxonomy is a classification system which was originally invented to describe and identify living species. The term has been generalized to describe things or concepts. Taxonomies are based on classes and instances, and are used to ease retrieval of information.
However, taxonomies contain classes that depend on the context and the history of their inception. Looking at a taxonomy doesn't necessarily convey a sense of completeness or consistency. In order to understand how it is organized, we need to know the context in which it was created. Here are a few examples of taxonomies:
A well-known taxonomy is the one published by Yahoo at https://www.yahoo.com/everything. The "Products and Services" taxonomy is organized as an alphabetic index, with a few high-level terms that contain categories as diverse as "Advertising", "Answers", "Autos", "Beauty", "Celebrity", "Dating", "Developer Network", "Downloads", "Fantasy Sports", "Finance", "Flickr", "Groups", "Help", etc. The terms used in this taxonomy seem quite arbitrary, and probably reflect the categories that users search for most often.
Another taxonomy is the Library of Congress classification [2], with the following top-level categories: "General Works", "Philosophy, Psychology, Religion", "Auxiliary Sciences of History", "World History and History of Europe, Asia, Africa, Australia, New Zealand, etc.", and "History of the Americas". The last of these comprises two classes, Class E and Class F. Class E contains "America" and "United States", while Class F contains "United States local history", "British America (including Canada)", "Dutch America", "French America", "Latin America, Spanish America", etc.
Another well-known taxonomy is MeSH (Medical Subject Headings), the National Library of Medicine's controlled vocabulary thesaurus [3].
Many companies and organizations have developed their own internal taxonomies, used to retrieve information based on the categories considered the most relevant to their specific contexts.
Library classifications
A library classification is a system of coding and organizing library material (books, serials, audiovisual materials, computer files, maps, manuscripts, realia) according to its subject [4]. There are three types of library classifications: enumerative, hierarchical, and faceted. An enumerative classification is an alphabetical list of subject headings. A hierarchical classification divides subjects hierarchically, from most general to most specific. A faceted (or analytico-synthetic) classification divides subjects into mutually exclusive, orthogonal facets.
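A faceted classification can be sketched as follows. The catalogue entries, the facet names (`form`, `subject`, `language`), and the `select` helper are all hypothetical, chosen only to show how orthogonal facets combine freely at retrieval time.

```python
# Hypothetical catalogue: each item is described along independent facets.
ITEMS = {
    "Tosca (DVD)": {"form": "film", "subject": "opera", "language": "it"},
    "Opera: A History": {"form": "book", "subject": "opera", "language": "en"},
    "Birds of Europe": {"form": "book", "subject": "ornithology", "language": "en"},
}


def select(items, **facets):
    """Return the sorted titles whose description matches every requested facet."""
    return sorted(
        title
        for title, description in items.items()
        if all(description.get(facet) == value for facet, value in facets.items())
    )


opera_items = select(ITEMS, subject="opera")
english_books = select(ITEMS, form="book", language="en")
```

In a hierarchical scheme each item would sit at one place in a tree; here the same item can be reached through any combination of facets.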
Vocabularies
Vocabularies are lists of terms commonly used within a given community and recognizable as such. Vocabularies can be authority keyword lists in a library classification system, lists of terms in use within a given science or industry, or lists of field or element names comprising a structure.
Taxonomies
Taxonomies are classifications of terms by types, which describe a hierarchical organization of knowledge, by class, subclass, superclass. Taxonomies represent a strongly biased view of the domain terms, even if they result from an agreement.
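As a minimal sketch, a taxonomy in this sense can be held as a mapping from each class to its superclass. The class names and the helper functions below are hypothetical and assume single inheritance.

```python
# Hypothetical fragment of a taxonomy: class -> direct superclass.
SUPERCLASS = {
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}


def ancestors(cls):
    """Walk up the hierarchy and list every superclass, nearest first."""
    chain = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        chain.append(cls)
    return chain


def is_subclass(cls, other):
    """True if `other` appears anywhere above `cls` in the hierarchy."""
    return other in ancestors(cls)
```

The bias mentioned above is visible even here: deciding that "dog" belongs under "mammal" rather than, say, under "pet" is a choice made by whoever built the mapping.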
Ontologies
Ontologies represent relationship types which connect terms together, whether they are types defined in a taxonomy or general terms from a vocabulary. An ontology defines the logic of the connections between information items, because rules can be built that depend on the relationship type. Thesauri are a form of ontology.
The term "ontology" is also used to mean "controlled, structured vocabulary" [5].
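To illustrate how a rule can depend on the relationship type, here is a small sketch in which only relations declared transitive produce new facts. The relation names, the facts, and the `infer` function are assumptions made for the example, not part of any ontology language.

```python
# Hypothetical relationship types, each stating whether a transitivity rule applies.
RELATION_TYPES = {
    "part_of": {"transitive": True},
    "adjacent_to": {"transitive": False},
}

FACTS = {
    ("Montmartre", "part_of", "Paris"),
    ("Paris", "part_of", "France"),
    ("France", "adjacent_to", "Spain"),
    ("Spain", "adjacent_to", "Portugal"),
}


def infer(facts, relation_types):
    """Close the fact set under transitivity, for transitive relations only."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for a, rel, b in list(closed):
            if not relation_types[rel]["transitive"]:
                continue
            for c, rel2, d in list(closed):
                if rel2 == rel and c == b and (a, rel, d) not in closed:
                    closed.add((a, rel, d))
                    changed = True
    return closed


inferred = infer(FACTS, RELATION_TYPES)
```

The rule derives that Montmartre is part of France, but it does not conclude that France is adjacent to Portugal, because "adjacent_to" is not declared transitive.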
Distinction between Class and Instance, superclass/subclass
Depending on the kind of model used to represent semantics, a given term may describe a class, an instance, or both. In some contexts, even the distinction between class and instance may become irrelevant, much as the distinction between metadata and data has come to seem inapplicable in many cases.
Classification schemas facilitate discovery, but they can also prevent it from happening.
One or several taxonomies
Three rules are defined for a taxonomy to function: uniqueness, universality, and stability [6]. These rules seem obvious, rational, and self-consistent.
The Challenge of Uniqueness
However, these rules only work within a limited universe of discourse. Uniqueness can only be achieved by communities of people who know each other: if they don't even know that the others exist, there is no way to achieve it. It nevertheless remains an objective to strive for.
The Challenge of Universality
Universality is another noble objective, but it may be too ambitious and may even have negative side effects, because nothing limits the extent to which one discourse could claim to be the only universally valid and acceptable one. However, what defines the boundary of a scientific discipline is the universality of its terms.
The Challenge of Stability
Stability is easier to achieve in print than online. Once a work is printed, it is fixed for good: it stays as it is, and if something needs to be changed, the result is another edition. Information provided online can easily be changed without notice. Its content may have been altered, or it may simply no longer exist.
- Elaine Svenonius, The Intellectual Foundation of Information Organization, 2000 [NYPL:JBE 01-453]. ↩
- [National Institutes of Health, U.S. National Library of Medicine, Medical Subject Headings](https://www.nlm.nih.gov/pubs/factsheets/mesh.html) ↩
- The "Nuts and Bolts" of Taxonomy and Classification, compiled by R. Hays Cummins, Interdisciplinary Studies, Miami University, http://jrscience.wcp.muohio.edu/lab/TaxonomyLab.html ↩