TITLE: | Topic Maps -- Reference Model Use Cases |
SOURCE: | Patrick Durusau, Steven R. Newcomb |
PROJECT: | Topic Maps |
PROJECT EDITORS: | Michel Biezunski, Martin Bryan, Steven R. Newcomb |
STATUS: | Editor's Draft, Revision 1.7 |
ACTION: | For review and comment |
DATE: | 3 November 2003 |
SUMMARY: | |
DISTRIBUTION: | SC34 and Liaisons |
REFER TO: | |
SUPERSEDES: | |
REPLY TO: | Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman), Y-12 National Security Complex, Information Technology Services, Bldg. 9113 M.S. 8208, Oak Ridge, TN 37831-8208, U.S.A. Telephone: +1 865 574-6973. Facsimile: +1 865 574-1896. E-mail: [email protected]. Web: http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm; Mr. G. Ken Holman (ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada), Crane Softwrights Ltd., Box 266, Kars, Ontario K0A-2E0, CANADA. Telephone: +1 613 489 0999. Fax: +1 613 489 0995. E-mail: [email protected] |
0 | Introduction |
The Topic Maps -- Reference Model (TMRM) provides a systematic way to make explicit the subjects and relationships that were implicit in the HyTime and XML interchange syntaxes in ISO 13250:2002. The TMRM does not extend ISO 13250:2002; rather, it provides an information model that enables meaningful construction, description, evaluation and comparison of syntaxes, data models, and processing models, and of the design choices that such models embody. The TMRM does not constrain syntaxes, data models, or processing models, but it does enable disclosure of certain essential aspects of each such model.
For example, ISO 13250:2002 requires a topic link (<topic>) element to have a unique ID attribute (id), and a <topic> may optionally have a subject identity attribute (identity, 5.2.1). From the standpoint of the information model that is the TMRM, the values of these two attributes are viewed quite differently:
id: The unique ID attribute is simply an addressing convenience of the original HyTime syntax. Although most syntaxes will have similar addressing requirements, no addressing semantics are built into the TMRM. The TMRM offers no guidance about data addressing, just as the TMRM is agnostic on all other application design questions.
identity: The subject identity attribute, on the other hand, is an example of a kind of idea that is the central focus of the TMRM: the idea of using data to identify a subject in some arbitrary but specific way. According to the TMRM, every topic has at least one such "subject identity discrimination property (SIDP)", and the structure and interpretation of each such property must be declared.
In preparation for the remainder of this discussion, we define two terms:
An information model is a model of all the possible semantic components of a topic map, including the relationships between them. It is an idealized, quintessential model that constrains neither knowledge applications nor their implementations. Under the information model that is the TMRM, designers of topic map applications are free to decide which subjects will be treated as topics for the purposes of merger. For example, some applications, such as content management applications, may treat information objects as topics, so that they can be managed and indexed like other subjects. Other applications may not want to manage such subjects, while still others may wish to make only some of them available as role players in relationships. Information models are different from data models, which are application-specific. A data model necessarily represents choices about which subjects will be reified as topics, and about how the subjects of those topics will be specified and internally represented. These design choices collectively determine, for example, which information objects will be treated as topics for the purpose of merger. The information model that is the TMRM enables the designers of data models to disclose systematically the choices that their models embody, revealing, for example, whether information objects are treated as topics.
An assertion is a set of subjects that collectively specify a relationship between two or more subjects. ISO 13250 used the more natural term association to describe relationships between topics, but its definition of that term excluded other kinds of relationships among topics that ISO 13250 also defines, such as occurrences and scopes. Rather than overload the term association with a different meaning, the TMRM uses assertion both for the relationships described as associations in ISO 13250:2002 and for all the other relationship constructs of ISO 13250, including those that ISO 13250 excludes from the term association. The term assertion, together with the single model for assertions that the TMRM provides, allows the TMRM to offer a unified information model of topic maps that is neutral with respect to all models of interchange syntaxes, data, and topic map processing.
The disengagement of the information model of topic maps from any particular syntax, data model or processing model is not merely a theoretical or abstract exercise; it allows topic maps to serve important practical purposes. For example, the proposed Topic Maps Data Model (TMDM) explicitly disallows the merger of locator items (6.8), a design decision that is perfectly suitable for some applications. But disallowing the merger of locator items may not meet the requirements of applications that need to create reverse indexes. The TMRM does not require any data model to provide such merger, but it does enable users to modify a topic map data model intelligently to meet their requirements. It does so by providing an information model that makes explicit the relationships between the information objects of a topic map syntax or data model.
It should be emphasized that the TMRM is not and should not be construed as a syntax, data model or processing model for topic maps. It is an explication of an information model that was obscured by the interchange syntaxes that the topic map community originally formulated for topic maps.
1 | TMRM Use Cases |
The following use cases illustrate the utility of the TMRM for authors and users of topic maps, as well as for the topic map community that is creating software for such authors and users.
1.1 | Disclosure: Part 1 |
The proposed TMDM (6.1) explicitly notes that "Applications are therefore allowed to merge topics as they see fit". In the current specification, ISO/IEC 13250:2002, merger (or the appearance of merger) is based on identity attributes, but the structures of the referents ("subject descriptors") of identity attributes are not defined, nor is any mechanism provided for disclosing those structures for a given topic map.
Use Case 1: Changing Software: A user of topic map software wishes to change system vendors. The user can supply both the inputs from which the topic map was made and the final result produced by the existing system. Since both the current standard and the proposed TMDM allow application-based decisions on merger, the question arises: How is the user to provide a prospective vendor with the merging rules used by the user's current software?
While ISO 13250:2002 and the proposed TMDM (6.8 Merging Locator Items) specify some merger rules, and both note that others are possible, both leave critical information undisclosed, and neither provides a basis for disclosing such information.
The TMRM offers a mechanism that enables the documentation of merger rules. Declarations of SIDPs (subject identity discrimination properties) allow vendors, authors and users of topic maps to ascertain exactly how the subject of a topic is determined to be the same as or different from the subjects of other topics.
Once one steps beyond the merging rules associated with the interchange syntaxes defined in ISO 13250, or beyond the merging rules associated with the information items of the TMDM, there is no uniform or standard basis for disclosing such additional merging rules, and no vocabulary with which to disclose them. The TMRM provides an explicit theory of the general context in which merging occurs, and a uniform nomenclature for the components of that context, independent of any particular syntax, data or processing model. Disclosure mechanisms based on that nomenclature are beyond the purview of the TMRM, but they should be considered by SC34/WG3 for inclusion in the restatement of ISO 13250.
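Because the TMRM leaves disclosure mechanisms open, the following is only an illustration of what Use Case 1 calls for: SIDP declarations kept as ordinary data that a user could hand to a prospective vendor. The `SIDPDeclaration` class, its fields, the `disclose` function and the JSON layout are hypothetical assumptions, not part of the TMRM, ISO 13250 or the TMDM.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SIDPDeclaration:
    """Hypothetical declaration of one subject identity discrimination property."""

    name: str            # property name as it appears in the topic map
    structure: str       # declared structure of the property's values
    interpretation: str  # declared meaning of those values
    merge_rule: str      # when two values are taken to denote the same subject


def disclose(declarations: list[SIDPDeclaration]) -> str:
    """Serialize SIDP declarations into a JSON document (hypothetical format)."""
    return json.dumps({"sidps": [asdict(d) for d in declarations]}, indent=2)


# Hypothetical rules a user might document for their current system.
CURRENT_RULES = [
    SIDPDeclaration(
        name="identity",
        structure="single URI",
        interpretation="address of a subject descriptor",
        merge_rule="topics merge when their URIs are equal",
    ),
    SIDPDeclaration(
        name="base-name-in-scope",
        structure="(name string, set of scoping topics)",
        interpretation="a name valid within the given scope",
        merge_rule="topics merge when name string and scope are both equal",
    ),
]

if __name__ == "__main__":
    print(disclose(CURRENT_RULES))
```

A prospective vendor reading such a document would learn which properties drive merging in the user's current system, instead of having to reverse-engineer them from the inputs and the final result.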
1.2 | Disclosure: Part 2 |
The TMRM provides a means of denoting "other" (i.e., non-subject-discriminating) properties (OPs) of subjects. In the absence of the TMRM, there is no nomenclature for disclosing the structures, semantics and applications of such "other" properties.
Use Case 2: Merging Diverse Topic Maps: A user is faced with the task of merging two topic maps. Both go beyond the proposed TMDM in that they have properties that would be described as SIDPs or OPs in TMRM terms, and nothing in the TMDM corresponds to those properties. For the merger to be meaningful, the user must determine which properties are intended to govern merging operations, so it is vital to distinguish between the subject identity discrimination properties (SIDPs) and the other properties (OPs). By allowing this distinction to be disclosed, the TMRM enables the user to perform the merger on an informed and rational basis, in full knowledge of the techniques that the sources of those topic maps used to make the inferences that governed their own merging operations.
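A minimal sketch of the distinction Use Case 2 relies on: only properties declared as SIDPs decide whether two topics denote the same subject, while OPs are simply carried into the merged topic. Topics are modelled here as plain dictionaries mapping property names to sets of values, and the `sidp_names` parameter stands in for the disclosed declarations; the representation, function names and data are hypothetical.

```python
from typing import Optional

# Hypothetical representation: property name -> set of values.
Topic = dict[str, set[str]]


def same_subject(a: Topic, b: Topic, sidp_names: set[str]) -> bool:
    """True if the topics share a value for at least one declared SIDP."""
    return any(a.get(p, set()) & b.get(p, set()) for p in sidp_names)


def merge(a: Topic, b: Topic, sidp_names: set[str]) -> Optional[Topic]:
    """Merge two topics if their SIDPs say they denote the same subject.

    Only SIDPs are consulted to decide whether to merge; once merging is
    decided, the values of every property (SIDPs and OPs) are combined.
    """
    if not same_subject(a, b, sidp_names):
        return None
    return {p: a.get(p, set()) | b.get(p, set()) for p in a.keys() | b.keys()}


# Hypothetical data: "subject-locator" is declared as an SIDP, "postal code" as an OP.
SIDPS = {"subject-locator"}
t1 = {"subject-locator": {"http://example.org/people/42"}, "postal code": {"37831"}}
t2 = {"subject-locator": {"http://example.org/people/42"}, "postal code": {"37830"}}
t3 = {"subject-locator": {"http://example.org/people/99"}, "postal code": {"37831"}}

if __name__ == "__main__":
    print(merge(t1, t2, SIDPS))  # same locator: merged, both postal codes kept
    print(merge(t1, t3, SIDPS))  # shared postal code alone does not merge
```

Had the author of one map declared "postal code" as an SIDP instead, `merge(t1, t3, {"postal code"})` would merge t1 and t3. The outcome depends entirely on the classification, which is why the classification needs to be disclosed.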
Use Case 3: Merging on Unauthorized Bases: A user is faced with the task of merging two topic maps, but wishes to depart from the standard merging rules of the proposed TMDM. The user wants more merging to occur than would occur under the TMDM, in the hope that a smaller, more interconnected, more useful topic map will be the result. Accordingly, the user decides that merging should occur on the basis of weaker evidence than the TMDM requires. Some of the topics in one of the topic maps have the following properties:
name
aliases
street address
military service number
social security number
postal code
and the second topic map has topics with the following properties:
crime
date of crime
suspect aliases
postal code
The user wants to merge topics on the basis that aliases alone are sufficient for discriminating subjects. How does the user define the desired merger? How can the nature of the merger that was done be communicated to the users of the topic map that results from that merger? The TMRM provides the nomenclature necessary for such definitions and communications.
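A minimal sketch of the merger described in Use Case 3, assuming each topic is a dictionary of property-value sets and treating the first map's "aliases" and the second map's "suspect aliases" as one hypothetical alias SIDP. Because such merging is transitive (topic A shares an alias with B, and B with C), the sketch groups topics with a union-find structure. The `MERGE_RULE` record is one hypothetical way of keeping the rule alongside the result so that it can be communicated; the function names and data are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical record of the user's non-standard rule, kept with the result
# so that users of the merged map can see why topics were merged.
MERGE_RULE = {
    "sidp": "alias",
    "source properties": ["aliases", "suspect aliases"],
    "rule": "topics sharing any alias denote the same subject",
}


def find(parent: list[int], i: int) -> int:
    """Return the root of i's group, halving the path as it goes."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_by_alias(topics: list[dict], alias_props: set[str]) -> list[dict]:
    """Merge topics that share an alias, directly or through a chain of topics."""
    parent = list(range(len(topics)))
    first_seen: dict[str, int] = {}  # alias -> first topic that carried it
    for i, topic in enumerate(topics):
        for prop in alias_props:
            for alias in topic.get(prop, set()):
                if alias in first_seen:
                    # Join this topic's group to the group that saw the alias first.
                    parent[find(parent, i)] = find(parent, first_seen[alias])
                else:
                    first_seen[alias] = i
    groups: dict[int, dict] = defaultdict(dict)
    for i, topic in enumerate(topics):
        merged = groups[find(parent, i)]
        for prop, values in topic.items():
            merged.setdefault(prop, set()).update(values)
    return list(groups.values())


map1 = [
    {"name": {"J. Smith"}, "aliases": {"Jack", "Smitty"}, "postal code": {"37831"}},
    {"name": {"R. Jones"}, "aliases": {"Bob"}, "postal code": {"37830"}},
]
map2 = [
    {"crime": {"burglary"}, "date of crime": {"2003-10-01"}, "suspect aliases": {"Jack"}},
    {"crime": {"fraud"}, "date of crime": {"2003-09-15"}, "suspect aliases": {"Slim"}},
]
result = merge_by_alias(map1 + map2, set(MERGE_RULE["source properties"]))
```

Here J. Smith and the burglary record merge through the shared alias "Jack", leaving three topics instead of four. Because aliases are weak evidence, the same rule would also merge distinct people who happen to share a nickname; recording `MERGE_RULE` with the result lets downstream users judge that risk.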
1.3 | Documenting Data Models |
Just as the needs of users vary, so do their world views, and so do the data models that they use to interact with their information. It is to be expected that users will build data models that yield the advantages of topic maps, but without conforming to the proposed TMDM.
Use Case 4a: Unreified Geographic Coordinates: Users of geographic information probably will not think of the coordinates of a particular geographic location as subjects in the sense in which that term is used in the topic map community. They are more likely to think of the geographic coordinates of a location as properties of the location, and to represent the location (considered as a fully-privileged subject) as a topic. The proposed TMDM does not recognize or provide for such properties, so such a topic map would require an extension of the TMDM to support the user's view of their own data.
Use Case 4b: Reified Geographic Coordinate Reports: For some purposes, however, the reports of geographic coordinates may themselves be regarded as fully-privileged subjects. They may then need to be represented as topics, so that assertions can be made about them and so that all the information concerning each coordinate report is accessible from a single merged topic. (This may be required in order to capture and manage metadata about the coordinate report.) In this case, a special SIDP class is needed for topics whose subjects are reports of geographic coordinates.
While all of the foregoing could be done by simply extending the TMDM in some unspecified way, there is currently no nomenclature by which to disclose either the extension to the proposed TMDM or the treatment of these properties as topics. The TMRM provides the nomenclature on which disclosure of extensions to the proposed TMDM could be based.
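The contrast between Use Cases 4a and 4b can be sketched as follows. In the first view, coordinates are simply OP values on a location topic; in the second, each coordinate report is itself a topic, merged on a hypothetical SIDP class whose value is assumed, for illustration only, to be the pair (reporting source, report identifier). The property names, the choice of identifying pair, the `CoordinateReport` class and the data are all assumptions; neither the TMRM nor the TMDM defines them.

```python
from dataclasses import dataclass, field

# Use Case 4a: coordinates are other properties (OPs) of a location topic.
# The values are illustrative only.
location = {
    "name": {"Example Site"},
    "latitude": {"36.0"},    # OP: describes the location, does not identify it
    "longitude": {"-84.3"},
}


# Use Case 4b: each coordinate report is a subject in its own right.
@dataclass
class CoordinateReport:
    source: str     # hypothetical SIDP component: who made the report
    report_id: str  # hypothetical SIDP component: the source's own identifier
    metadata: dict[str, set[str]] = field(default_factory=dict)  # OPs

    @property
    def sidp(self) -> tuple[str, str]:
        """Value of the hypothetical coordinate-report SIDP class."""
        return (self.source, self.report_id)


def merge_reports(reports: list[CoordinateReport]) -> list[CoordinateReport]:
    """Merge report topics with equal SIDP values, pooling their metadata."""
    merged: dict[tuple[str, str], CoordinateReport] = {}
    for r in reports:
        target = merged.setdefault(r.sidp, CoordinateReport(r.source, r.report_id))
        for prop, values in r.metadata.items():
            target.metadata.setdefault(prop, set()).update(values)
    return list(merged.values())


reports = [
    CoordinateReport("survey-team-a", "r-001", {"latitude": {"36.0"}, "instrument": {"GPS"}}),
    CoordinateReport("survey-team-a", "r-001", {"accuracy": {"5 m"}}),
    CoordinateReport("survey-team-b", "r-001", {"latitude": {"36.1"}}),
]
merged_reports = merge_reports(reports)
```

The two reports from survey-team-a collapse into one topic carrying all of their metadata, while the report from survey-team-b stays separate. Which of these two designs a data model adopts is exactly the kind of choice that the TMRM's nomenclature is meant to disclose.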
2 | Concluding Remarks |
The TMRM meets the above-described use cases by providing a standard nomenclature and definition of the basic components of a topic map in a way that is divorced from any syntax, data model, or processing model. Rather than defining a list of properties of topics, as does the proposed TMDM, the TMRM provides general semantic/information mechanisms, including SIDPs, OPs, and assertions, that can be used to define the essential features of any topic map application, including applications that would require extensions to the TMDM. It is not reasonable to expect the TMDM to meet all user requirements in all applications of topic maps. If it were possible for a single universal data model to exist, any such model would probably have been patented long ago.
The TMRM defines the components of an assertion, and the properties of assertion components. However, the TMRM does not require that any syntax, data model, processing model, or implementation make all such components visible to or manipulable by users, nor does it require that there be underlying support for making all of these components visible or manipulable. It does enable disclosure of the design choices made by the creators of syntaxes, data models, processing models and implementations. By reference to the TMRM, the authorized semantic interpretations of a topic map can be unambiguously, comprehensively, and accurately communicated.
Any proposed syntax, data model or processing model for topic maps can be defined in TMRM terms. The TMRM is designed so that such definitions, once made, provide significant benefits to users of topic maps, such as the disclosure of merging rules and of data model extensions described in the use cases above.