ISO/IEC JTC 1/SC34 N185

ISO/IEC JTC 1/SC34

Information Technology --

Document Description and Processing Languages

Title: Preliminary working draft for Topic Map Conceptual Model
Source: JTC1/SC34/WG3, Daniel Rivers-Moore
Project: New Proposal
Project editor: N/A
Status: Preliminary working draft for discussion
Action:
Date: 8 December 2000
Summary:
Distribution: SC34 and Liaisons
Refer to:
Supersedes:
Reply to: Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 423 574-6973
Facsimile: +1 423 574-1896
E-mail: mailto:[email protected]
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
11 West 42nd Street
New York, NY 10036
Tel: +1 212 642 4976
Fax: +1 212 840 2298
E-mail: [email protected]

XML Topic Maps

Introduction

World, Words and Web

The world contains many things - human beings, animals, plants, inanimate objects, thoughts, beliefs, words, languages, concepts, computer systems, individuals, groups, companies, towns, countries, relationships, messages, documents, books, information, actions, emotions, successful and unsuccessful attempts at communication, to name but a few. Any of these things, from the most specific to the most general, from the most concrete to the most abstract, can be talked about or written about. Some, but not all, can be photographed or drawn. Some, but not all, exist within computer systems or can be addressed by those systems.

Words and language are the primary (but not the only) means by which we human beings communicate with one another about the things in the world. We also use gestures, pictures, sounds, works of art and more. But words have a peculiar power because, within a restricted community that shares a common language, they give us the best hope we have (though certainly not the guarantee) of communicating unambiguously.

Computer systems are among the tools we use to communicate about things in the world, to aid us in reasoning about the world, or to act directly on the world to bring about change. The World Wide Web is a computer system, consisting of many subsystems, operated and used by many human beings. We cannot use the Web to send an emotion, a thought, an action, a mountain, a town, a person, or a concept directly to another system or human being. But we can and do use it to send words, pictures, sounds, documents, database tables and other richly structured information objects that describe these things, and may even, in the case of emotions or thoughts, convey them, or in the case of actions, cause them to occur.

Topic Maps and the 'Semantic Web'

Sending words, pictures and structured information objects is very useful and powerful, and being able to do it on a worldwide scale even more so. But there is a sting in the tail. To communicate with one another, we need to convey not just words, but the meanings of those words; not just computer files, but the intent with which and for which those computer files were created. And the wider the community of people, or the number and variety of computer systems, with which we are communicating, the harder it is to ensure that the person or system receiving the words, pictures or other information, will correctly glean from them the meaning intended by the sender.

The 'Semantic Web' is a Web in which meanings can be conveyed and not just words, pictures or data. It does not exist today. Topic Maps can be used as a means towards building a semantic Web, not because they remove the gap between words or information objects and their meanings, but because they recognise that gap for what it is, and work within its limitations. A mountaineer who climbs a peak without crampons or ropes may get there by luck or supreme skill, but to enable the world at large to scale the mountain, it is better to recognise its challenges and provide simple tools to address them, rather than ask people to climb based only on their supreme desire to reach the top!

Peaks and Chasms

To continue the mountaineering image, we can say that the mountain we are trying to climb turns out to be traversed by deep chasms. We want to use information objects in a computer system to communicate something we have in mind to some other person. There are three mountain peaks here, looking at each other over apparently uncrossable chasms. The three mountain peaks are our mind with its thoughts and concepts, the real world things we are thinking of, and the information objects we use as a means of communication. These three domains are different in nature yet deeply connected. Our minds have thoughts that reflect some aspect of the real world, and the information objects describe those aspects of the real world that correspond to our thoughts. When the recipients of our communication interpret an information object, we hope it will evoke in their minds thoughts about the real world that somehow match our own. If it does, then the communication has been successful. Yet we can send neither the thought itself, nor the real world objects about which we are thinking. The thought, the information object used to express it, and the real-world things that both of them represent, are utterly distinct. They inhabit different domains and the gap between them seems unbridgeable.

Seen from another point of view, however, there is only one mountain peak, and no chasms at all. The computer system and the information objects it sends and receives, our minds and all their thoughts, are themselves part of the real world. They too can be thought of, described in information objects and communicated about. The whole system is deeply interconnected; it can point at itself, describe itself and communicate about itself. Yet, in any particular act of communication, a clear distinction must exist between the information object and the information it carries. Otherwise, we fall into the chasms of infinite regress, circular definition and, in the computer system, unending processing loops that hang the computer and leave the user staring at a frozen screen or a big red error message.

The XTM Conceptual Model

With the thoughts and images evoked by the Introduction in our minds, we can now present the XML Topic Maps conceptual model. We do this through a series of information objects (words and diagrams), which aim to be clear and simple enough to convey the XTM conceptual model to a human reader, yet precise and formal enough to provide a basis for XTM implementations that will be able to run in real-world computer systems and use the XTM interchange syntax defined in the XTM DTD.

The diagrams used in this section are 'class diagrams' and 'object diagrams', using the conventions of the Unified Modelling Language (UML). In a class diagram, each rectangle represents a class of objects (a kind of thing that can exist), and the words in the rectangle are the name of that class. The lines and arrows between the rectangles represent relationships that exist or can exist between instances of those classes (individual things of those kinds). In an object diagram, each rectangle represents an individual object, and the words in the rectangle are the name of the individual, followed by a colon, followed by the name of the class of which it is an instance. The lines between the rectangles represent relationships that exist between those individual objects.

A Resource is an addressable object

The first diagram is extremely simple. It shows a single rectangle labeled 'Resource', and states that the defining characteristic of a Resource is that it is addressable. This means that it is possible for a computer system to determine whether the things referred to by two Resource references are or are not the same. Examples of Resources are records in database files, electronic documents, images and sounds, strings of characters, and XML elements and attributes. These are all things that can exist within a computer system. They are 'addressable' in the sense that the system can retrieve them and make deterministic comparisons between them to establish their identity or difference.

Resource (class diagram)
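
As an informal illustration of addressability, the following Python sketch models a Resource as nothing more than an address and compares two references by that address alone. The class layout, the `address` field and the `same_resource` function are assumptions made for this example and are not defined by this draft; a real system would also need rules for normalising addresses before comparing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """An addressable object. `address` is a hypothetical field (e.g. a URI string)."""
    address: str


def same_resource(a: Resource, b: Resource) -> bool:
    # Deterministic identity test: only the addresses are compared,
    # so no knowledge of the world outside the system is required.
    return a.address == b.address


# Two references to the same document fragment, and one to a different document.
doc_ref_1 = Resource("http://example.org/doc.xml#intro")
doc_ref_2 = Resource("http://example.org/doc.xml#intro")
other = Resource("http://example.org/other.xml")

print(same_resource(doc_ref_1, doc_ref_2))
print(same_resource(doc_ref_1, other))
```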

Subjects of discourse may or may not be addressable

Most of the things of interest in the real world are not addressable in the sense described above. To determine whether two references to things such as human feelings, physical objects, mountains, people or countries refer to the same thing requires understanding of the real world as it exists outside the confines of the computer system. Any or all of these things may be a Subject of discourse, but they are not addressable and so do not count as Resources. On the other hand, we sometimes want to talk about a particular electronic document, character string or database record. These things are indeed addressable Resources and may also become a Subject of discourse. This diagram shows that the Subject class has two subclasses, Addressable Subject and Non-addressable Subject, and that Addressable Subject is also a subclass of Resource.

Types of Subject (class diagram)
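
The subclass relationships described above can be mirrored, as a rough sketch, by a small Python class hierarchy. The Python class names follow the names used in the text; the constructor parameters (`address`, `description`) and the example values are assumptions made for illustration only.

```python
class Resource:
    """An addressable object (the address is a hypothetical string)."""

    def __init__(self, address: str) -> None:
        self.address = address


class Subject:
    """Anything at all that can be a subject of discourse."""


class NonAddressableSubject(Subject):
    """A Subject that lives outside the computer system."""

    def __init__(self, description: str) -> None:
        self.description = description


class AddressableSubject(Subject, Resource):
    """A Subject that is also a Resource, so it inherits from both classes."""

    def __init__(self, address: str) -> None:
        Resource.__init__(self, address)


mont_blanc = NonAddressableSubject("the mountain Mont Blanc")
report = AddressableSubject("http://example.org/report.xml")

print(isinstance(report, Resource), isinstance(mont_blanc, Resource))
```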

Topics reify Subjects within a computer system

A Topic is a Resource that is used to stand in for a Subject within the computer system. A Topic can be manipulated and reasoned about, and can have statements made about it. It acts as a proxy within the system for the physical or abstract, addressable or non-addressable real-world thing that is its Subject. In this sense, it is said to 'reify' the Subject, meaning that it makes the Subject 'real' from the point of view of the system. This diagram shows that a Topic is a Resource that reifies exactly one Subject. The direction of the horizontal arrow indicates that the Topic provides an indication of what its Subject is, but that the converse is not the case. The '0..*' label indicates that any Subject may be reified by zero or more Topics. The comment explains that the ideal case is for each Subject to be reified by no more than one Topic.

A Topic reifies a Subject (class diagram)
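
The one-way nature of the reification relationship can be sketched in Python as follows. Each Topic holds exactly one Subject, while a Subject holds no reference back to its Topics, so finding the Topics that reify a Subject requires a search. The field names, the `topics_reifying` helper and the example addresses are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Subject:
    label: str  # hypothetical human-readable label, for this example only


@dataclass(frozen=True)
class Topic:
    address: str      # a Topic is a Resource, so it is addressable
    subject: Subject  # exactly one Subject per Topic


def topics_reifying(subject: Subject, topics: List[Topic]) -> List[Topic]:
    # The Subject has no back-reference, so the reifying Topics
    # (zero or more of them) are found by scanning the known Topics.
    return [t for t in topics if t.subject == subject]


paris = Subject("the city of Paris")
rome = Subject("the city of Rome")
topics = [
    Topic("map-a.xtm#paris", paris),
    Topic("map-b.xtm#paris-fr", paris),
]

print(len(topics_reifying(paris, topics)), len(topics_reifying(rome, topics)))
```

Here the Subject 'Paris' is reified by two Topics, which is permitted but falls short of the ideal case of at most one Topic per Subject.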

Resources can describe Subjects

Though the Subject of a Topic may not be addressable within the system, it is always possible to provide a human-interpretable description of it. The term 'Subject Descriptor' is used to denote a Resource whose human-interpretable content is capable of conveying a clear and unambiguous indication of which particular physical or abstract, addressable or non-addressable real-world thing is the Subject of the Topic. This diagram shows that a Subject Descriptor is a Resource that provides a definitive description of a Subject. The '0..*' label indicates that any Subject may have zero or more Subject Descriptors.

Subject Descriptor (class diagram)

Only Addressable Subjects can be referenced directly

In most cases, the Subject of a Topic can only be referenced through the use of human-interpretable Subject Descriptors. However, in the special case where the Subject of the Topic is a Resource, it can be referenced directly. This diagram brings together and amplifies the main points of the previous two diagrams. It shows, as we saw before, that a Topic reifies a Subject and that a Subject Descriptor is a Resource that indicates what the Subject is. It also shows that a Topic may reference any number of Subject Descriptors, and that if the Subject is a Resource (an Addressable Subject), the Topic can reference it directly.

Referencing the Subject (class diagram)
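
One possible way to picture the two kinds of reference is a Topic record with a list of Subject Descriptor addresses and an optional direct reference to its Subject. This is only a sketch: the field names `subject_descriptors` and `subject_resource`, and all of the addresses, are assumptions made for this example rather than constructs defined by the draft or by the XTM DTD.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Topic:
    address: str
    # Addresses of Resources that describe the Subject for a human reader.
    subject_descriptors: List[str] = field(default_factory=list)
    # Address of the Subject itself; set only when the Subject is a Resource.
    subject_resource: Optional[str] = None

    def subject_is_addressable(self) -> bool:
        return self.subject_resource is not None


# A mountain cannot be addressed, so it can only be described.
everest = Topic(
    address="map.xtm#everest",
    subject_descriptors=["http://example.org/descriptors/everest.html"],
)

# An electronic document is a Resource, so it can be referenced directly.
draft = Topic(
    address="map.xtm#working-draft",
    subject_resource="http://example.org/drafts/working-draft.htm",
)

print(everest.subject_is_addressable(), draft.subject_is_addressable())
```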

A Topic Map consists of one or more sets of Topics

A Topic Map may bring together Topics that are defined by syntactic constructs (such as XTM-conforming XML Topic elements) in many Resources. All those Topics that are defined within a given Resource can be referred to as a Topic Set. When the Topics from different Topic Sets are combined to make up the Topic Map, those that are found to have the same Subject are merged into a single Topic. The number of Topics in the Topic Map may thus be less than the sum of the numbers of Topics in the constituent Topic Sets.

Topic Map and Topic Set (class diagram)
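
The merging of Topic Sets can be sketched as follows. The draft does not say here how two Topics are 'found to have the same Subject', so this example simply assumes that each Topic carries a subject key (a plain string) and that equal keys mean equal Subjects. The function name `merge_topic_sets`, the key format and the use of name sets as the merged content are all hypothetical.

```python
def merge_topic_sets(topic_sets):
    """Combine Topic Sets into one Topic Map, merging Topics that share a Subject.

    Each Topic Set is a list of (subject_key, names) pairs. The result maps
    each subject_key to the union of the names of the merged Topics.
    """
    topic_map = {}
    for topic_set in topic_sets:
        for subject_key, names in topic_set:
            # Topics with an already-seen Subject are folded into one entry.
            topic_map.setdefault(subject_key, set()).update(names)
    return topic_map


set_a = [("subj:paris", {"Paris"}), ("subj:rome", {"Rome"})]
set_b = [("subj:paris", {"City of Light"}), ("subj:oslo", {"Oslo"})]

topic_map = merge_topic_sets([set_a, set_b])

# Four Topics went in; the two Topics for Paris were merged into one.
print(len(set_a) + len(set_b), len(topic_map))
```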

Relationships are applicable within defined Scopes

Relationships among things rarely apply in all circumstances or for all time. Here we introduce the notion of Scope, which is best described as the context within which a particular relationship pertains. A Scope comprises a set of Topics which place limits on the validity of the relationship. For example, the relationship of "ally" between two countries may be limited to a particular time period or a particular conflict, or both. XTM allows a relationship to be asserted, but constrained by being associated with a Scope consisting of one or more Topics. The meaning of this is that the relationship only applies within the context of all the Topics belonging to the Scope. This diagram shows that a Scope consists of zero or more Topics, and that Topics may be added to the Scope to limit the context that the Scope defines.

A Scope is a set of Topics (class diagram)
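
One way to read 'the relationship only applies within the context of all the Topics belonging to the Scope' is as a subset test: a scoped relationship applies in a given context when every Topic in its Scope is part of that context. The sketch below adopts that reading. The `applies_in` function, the topic identifiers, and the treatment of an empty Scope as unconstrained are assumptions made for this example and are not stated by the draft.

```python
def applies_in(scope: frozenset, context: frozenset) -> bool:
    # True when every Topic in the Scope is present in the context.
    # An empty Scope places no limits, so it applies in any context.
    return scope <= context


unconstrained = frozenset()
during_ww2 = frozenset({"world-war-2"})
# Adding a Topic to a Scope narrows the context in which it applies.
ww2_from_1941 = during_ww2 | {"period-1941-1945"}

context = frozenset({"world-war-2"})

print(applies_in(unconstrained, context))
print(applies_in(during_ww2, context))
print(applies_in(ww2_from_1941, context))
```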

Only one Topic may have a given Base Name within a given Scope

One important relationship is that between a Topic and its name or names. A Topic may have many names, applicable in different contexts. The notion of Base Name is of a relationship involving a String, known as the Base Name String, a Topic, and a Scope that defines the context within which that String is considered to be a name for that Topic. There is a constraining rule that only one Topic may have a given Base Name within a given Scope. This means that the combination of Base Name and Scope can be used to identify a Topic uniquely.

Base Name within Scope (class diagram)
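
Because the combination of Base Name String and Scope identifies at most one Topic, it can serve as a lookup key. The following sketch keeps an index keyed on that pair and rejects a second Topic that claims an existing key. The class `BaseNameIndex`, its methods and the choice to raise an error on conflict are assumptions made for this illustration; the draft states the constraint but does not say how an implementation should react to a violation.

```python
class BaseNameIndex:
    """Maps (Base Name String, Scope) to the single Topic that may hold it."""

    def __init__(self) -> None:
        self._index = {}

    def add(self, name: str, scope, topic: str) -> None:
        key = (name, frozenset(scope))
        # setdefault stores the Topic if the key is new, else returns the holder.
        holder = self._index.setdefault(key, topic)
        if holder != topic:
            raise ValueError(
                f"{name!r} is already the Base Name of {holder!r} in this Scope"
            )

    def lookup(self, name: str, scope):
        return self._index.get((name, frozenset(scope)))


index = BaseNameIndex()
# The same string may name different Topics in different Scopes.
index.add("Paris", {"france"}, "topic-paris-fr")
index.add("Paris", {"texas"}, "topic-paris-tx")

print(index.lookup("Paris", {"france"}))
print(index.lookup("Paris", {"texas"}))
```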

Associations relate Topics within a Scope

Topics may be related to one another as players of Roles in an Association. An Association has one Role for each way in which Topics may be involved in it. For each Role, there may be zero or more Topics that play that Role. The Association is itself a Topic whose Subject is the relationship between the Subjects of the Topics that are players of its Roles, and each Role is a Topic whose Subject is the role played in the relationship by those Subjects. The Scope, if present, serves to limit the context within which the Association is valid.

Association between Topics (class diagram)
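
As a minimal sketch of this structure, an Association can be held as a mapping from each Role to the set of Topics that play it, together with an optional Scope. Topics are represented here by plain string identifiers; the field names, method names and the example alliance are hypothetical and serve only to illustrate the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class Association:
    topic_id: str  # the Association is itself a Topic
    # Each Role (itself a Topic) maps to the Topics that play it.
    roles: Dict[str, Set[str]] = field(default_factory=dict)
    # Topics limiting the context in which the Association is valid.
    scope: FrozenSet[str] = frozenset()

    def add_player(self, role: str, player: str) -> None:
        self.roles.setdefault(role, set()).add(player)

    def players(self, role: str) -> Set[str]:
        return self.roles.get(role, set())


alliance = Association(
    topic_id="alliance-1",
    scope=frozenset({"world-war-2"}),
)
alliance.add_player("ally", "france")
alliance.add_player("ally", "united-kingdom")
# A Role may exist with no players at all.
alliance.roles.setdefault("mediator", set())

print(sorted(alliance.players("ally")), alliance.players("mediator"))
```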

Association Templates define classes of Association

An Association Template defines a class of Associations characterized by the Roles that it has and the classes of thing that can play those Roles. The Association Template is itself an Association whose Roles are in one-to-one correspondence with the Roles of the instance Associations that conform to it, and the players of whose Roles are classes of which players of the corresponding Roles in the instance Associations must be instances. Any player of a Role in the instance Association must be an instance of at least one class that itself is a player of the corresponding Role in the Association Template. An Association Template for marriage, for example, may have two Roles, 'may-be-husband', played by the class of men, and 'may-be-wife', played by the class of women. Any instance Association that conforms to this template would have a husband role, whose player must be an instance of the class that plays the role of 'may-be-husband' in the Association Template, and a wife role, whose player must be an instance of the class that plays the role of 'may-be-wife' in the Association Template. In other words, the husband in a marriage that conforms to this template must be a man, and the wife must be a woman. In a different society, the 'may-be-husband' Role in the marriage Association Template may have two players, the class of men and the class of boys, and the 'may-be-wife' Role in the marriage Association Template may have two players, the class of women and the class of girls. This would mean that the husband in a marriage that conforms to this template must be either a man or a boy, and the wife must be either a woman or a girl.

Association Template (class diagram)
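
The conformance rule in the marriage example can be expressed as a short check. The sketch below assumes that the one-to-one correspondence between instance Roles and template Roles is supplied explicitly as a mapping, and that class membership is available as a simple lookup table. The function `conforms`, its parameters and the individuals named in the example are hypothetical.

```python
def conforms(instance_roles, template_roles, role_map, classes_of):
    """Check an instance Association against an Association Template.

    instance_roles: instance Role -> set of players
    template_roles: template Role -> set of classes that play it
    role_map:       instance Role -> corresponding template Role (one-to-one)
    classes_of:     player -> set of classes the player is an instance of
    """
    if set(instance_roles) != set(role_map):
        return False  # the Roles do not correspond one-to-one
    for role, players in instance_roles.items():
        allowed = template_roles[role_map[role]]
        for player in players:
            # Each player must be an instance of at least one allowed class.
            if not (classes_of.get(player, set()) & allowed):
                return False
    return True


role_map = {"husband": "may-be-husband", "wife": "may-be-wife"}
template_1 = {"may-be-husband": {"man"}, "may-be-wife": {"woman"}}
template_2 = {"may-be-husband": {"man", "boy"}, "may-be-wife": {"woman", "girl"}}
classes_of = {"john": {"man"}, "mary": {"woman"}, "tom": {"boy"}}

marriage_a = {"husband": {"john"}, "wife": {"mary"}}
marriage_b = {"husband": {"tom"}, "wife": {"mary"}}

print(conforms(marriage_a, template_1, role_map, classes_of))
print(conforms(marriage_b, template_1, role_map, classes_of))
print(conforms(marriage_b, template_2, role_map, classes_of))
```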

Topic Occurrences are Associations between Topics and Resources

Association Templates can be used in very powerful ways, to build structures of meaningful relationships between Topics and Resources. Several important Association Templates will be defined in the remainder of this specification. However, we shall begin with one that is fundamental to the notion of Topic Maps. This is the TopicOccurrence template. It is structured as follows: The TopicOccurrence Association has two roles, a TopicRole and an OccurrenceRole. The TopicRole may be played by any Topic at all, but the OccurrenceRole must be played by a Topic whose Subject is a Resource. The meaning of this is that a TopicOccurrence associates a Topic with a Resource. The Resource is one that is relevant to the Topic in some way, and is known as an occurrence of the Topic. An Association that uses the TopicOccurrence Template as its template may itself have a class-instance association with another Topic whose subject is an 'occurrence type'. Examples of occurrence type might be definition, mention, or description, meaning that the Resource in question defines, mentions or describes the Topic of which it is an occurrence.
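
The constraint on the OccurrenceRole can be illustrated with a small sketch in which a TopicOccurrence refuses a player whose Subject is not a Resource. The Python field names (`subject_resource`, `occurrence_type`), the use of a string for the occurrence type and the example addresses are assumptions made for this example; only the names TopicOccurrence, TopicRole and OccurrenceRole come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Topic:
    topic_id: str
    # Address of the Subject when the Subject is a Resource, otherwise None.
    subject_resource: Optional[str] = None


@dataclass(frozen=True)
class TopicOccurrence:
    topic: Topic       # player of the TopicRole: any Topic at all
    occurrence: Topic  # player of the OccurrenceRole: Subject must be a Resource
    occurrence_type: Optional[str] = None  # e.g. "definition" or "mention"

    def __post_init__(self) -> None:
        if self.occurrence.subject_resource is None:
            raise ValueError(
                "the OccurrenceRole must be played by a Topic whose Subject is a Resource"
            )


everest = Topic("everest")
article = Topic("everest-article", "http://example.org/articles/everest.html")

occ = TopicOccurrence(topic=everest, occurrence=article, occurrence_type="description")
print(occ.occurrence.subject_resource, occ.occurrence_type)
```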

Class-Instance Associations between Topics may be asserted

TBD

Class-Subclass Associations between Topics may be asserted

TBD

Other published association templates

TBD

Mapping to the ISO 13250 SGML Syntax

TBD