ISO/IEC JTC 1/SC34 N0235
ISO/IEC JTC 1/SC34
Information Technology --
Document Description and Processing Languages
Title: Discussion Paper on ISO/IEC 13250:2000 Topic Maps - Defect Report
Source: ISO/IEC JTC 1 / SC 34 / WG 3
Project:
Project editor: Michel Biezunski and Steve Newcomb
Status:
Action: For review and comment.
Date: 25 July 2001
Summary: This document is distributed for review and discussion at the WG 3 meeting on 11 August 2001 in Montréal, Quebec. Its intent is to provide guidance for the development of the full defect report on 13250.
Distribution: SC34 and Liaisons
Refer to:
Supersedes:
Reply to: Dr. James David Mason
ISO/IEC 13250:2000: How Two Syntaxes Can Make One Standard
July 24, 2001
Michel Biezunski and Steven R. Newcomb
Introduction
------------------------------------------
The ISO/IEC 13250:2000 "Topic Maps" International Standard appears about to incorporate a second interchange syntax, the XTM DTD. However, it does not explain to what degree, and exactly how, the two syntaxes are functionally equivalent. The standard should explain this.
How to Describe the Semantic Commonalities of the Syntaxes?
------------------------------------------
One might think that there are two ways to formalize the semantic commonalities
of the two syntaxes:
(1) Describe a rigorous syntactic transformation process that will show how instances of one syntax can be transformed into instances of the other syntax, or
(2) Describe how instances of each syntax can be transformed into instances of the common underlying model (which could be, but need not be, a syntactic model), and describe how instances of the underlying model can be transformed into instances of each syntax.
The first approach might seem easier, at least superficially. However, if we select it, we focus on just two syntaxes. We would fail to recognize that information with the character of topic map information may be expressed in many different notations. It is highly desirable to be able to federate all kinds of "finding information", not just the finding information that happens to be expressed in one of only two syntaxes. For example, it would be inappropriate to prevent instances of RDF or NewsML from being understood as interchangeable topic map documents, with their information directly available to topic map application software. If we adopt the first approach, RDF and NewsML instances would be available only indirectly. They would first need some sort of syntactic transformation into a syntactic topic map, which would then, in turn, be parsed as a topic map and made available to topic map applications. The extra overhead and inconvenience of this transformation would be a barrier to using RDF and NewsML instances as sources of topic map information.
Unlike the first approach, the second approach is applicable to any number of notations, even though ISO 13250 itself would apply it only to the two syntaxes. The second approach is more ambitious, in that it requires the underlying foundational model to be made explicit. Over the long term, it will also make topic map applications far more ubiquitous and omnivorous.
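The following Python sketch illustrates the second approach. It rests on deliberately simplifying assumptions. The "underlying model" is reduced to a hypothetical `Topic` record holding only base names and subject indicator URIs. Only a few elements of each syntax are read: `topic`, `topname` and `basename` for the HyTime-based 13250 syntax, and `topic`, `baseName`, `baseNameString` and `subjectIndicatorRef` for XTM. The reader functions and sample fragments are illustrations, not normative definitions. The architectural point is that each notation needs only one reader into the common model. Supporting a further notation (RDF or NewsML, say) would therefore mean writing one more reader, not a transformation to and from each existing syntax.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace URIs used by XTM 1.0 documents (XTM elements and xlink:href).
XTM = "{http://www.topicmaps.org/xtm/1.0/}"
XLINK = "{http://www.w3.org/1999/xlink}"


@dataclass(frozen=True)
class Topic:
    """Hypothetical, drastically simplified common model of one topic."""
    names: frozenset = frozenset()
    indicators: frozenset = frozenset()


def read_iso13250(text: str) -> dict:
    """Hypothetical reader for the HyTime-based 13250 syntax (topic/topname/basename)."""
    root = ET.fromstring(text)
    return {
        t.get("id"): Topic(
            names=frozenset(b.text.strip() for b in t.iter("basename") if b.text)
        )
        for t in root.iter("topic")
    }


def read_xtm(text: str) -> dict:
    """Hypothetical reader for XTM (topic/baseName/baseNameString, subjectIndicatorRef)."""
    root = ET.fromstring(text)
    return {
        t.get("id"): Topic(
            names=frozenset(
                s.text.strip() for s in t.iter(XTM + "baseNameString") if s.text
            ),
            indicators=frozenset(
                r.get(XLINK + "href") for r in t.iter(XTM + "subjectIndicatorRef")
            ),
        )
        for t in root.iter(XTM + "topic")
    }


# Illustrative fragments expressing the same information in each syntax.
ISO_DOC = """<topicmap>
  <topic id="montreal"><topname><basename>Montreal</basename></topname></topic>
</topicmap>"""

XTM_DOC = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="montreal"><baseName><baseNameString>Montreal</baseNameString></baseName></topic>
</topicMap>"""

if __name__ == "__main__":
    # Both readers yield the same instance of the common model.
    print(read_iso13250(ISO_DOC) == read_xtm(XTM_DOC))
```

For these two fragments, both readers yield equal models. In this toy model, that is the sense in which the two syntaxes are functionally equivalent.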
The Difference Between Topic Map Syntax and Topic Map Information
------------------------------------------
The structure of the topic maps that are represented for interchange in either the existing HyTime-based syntax of 13250 or the newly contributed XTM syntax is *not* identical to the syntactic structure of the documents used to interchange them. Therefore, neither 13250-based nor XTM-based topic map documents are "ready-to-use" by application-specific logic. In other words, a syntactically represented topic map does not reflect exactly what a topic map software application would be expected to understand from it. Before a topic map software application can be expected to perform its application-specific functions, generic processing must make the topic map "ready-to-use". This is the processing needed to understand the topic map that an interchangeable instance of it is designed to represent.
From an economic standpoint, there are significant advantages in using a distinct software module, commonly called a "topic map engine" or a "topic map parser", that implements this generic processing. We urge that the term "topic map parsing" be reserved to mean all of the aspects of "topic map processing" that must be performed by all topic map software that takes, as input, interchangeable topic maps conforming to either the HyTime-based or the XTM-based syntax. We urge that the term "topic map processing" be used generically, to refer to any kind of processing, including both topic map parsing (as just defined) and application-specific processing of ready-to-use topic maps.
Four rules must be applied by all topic map parsers:
-- the subject-based merging rule
-- the name-based merging rule
-- the node-demander rule
-- the no-redundancy rule
These rules are already implicit in 13250. We propose that 13250 should emphasize their definitions and explain their ramifications. These explanations will be invaluable to users of the standard who need to create conventions for understanding instances of various notations (both ISO and non-ISO) as sources of topic map information.
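The two merging rules can be illustrated with a short Python sketch. It is one simplified reading of those rules, not the algorithm specified by 13250 or by either proposed underlying model. A hypothetical `Topic` holds only base names and subject indicator URIs. Every name is assumed to be in the unconstrained scope, whereas in the standards name-based merging applies only to names in the same scope. The node-demander rule is not modelled. Topics that share a subject indicator (subject-based merging) or a base name (name-based merging) are grouped with a union-find structure, and each group becomes a single topic. In the spirit of the no-redundancy rule, no two resulting topics then share an indicator or a name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """Hypothetical simplified topic: base names (scope ignored) and subject indicator URIs."""
    names: frozenset = frozenset()
    indicators: frozenset = frozenset()


def merge_topics(topics: list) -> list:
    """Merge topics that share a subject indicator or a base name.

    The result contains no two topics with a common indicator or name.
    """
    parent = list(range(len(topics)))

    def find(i: int) -> int:
        # Walk up to the group's representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # A key shared by two topics means they are taken to be about the same subject.
    first_seen = {}
    for i, topic in enumerate(topics):
        keys = [("indicator", u) for u in topic.indicators]  # subject-based rule
        keys += [("name", n) for n in topic.names]           # name-based rule (simplified)
        for key in keys:
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    # Collapse each group into one topic carrying all of its members' characteristics.
    groups = {}
    for i, topic in enumerate(topics):
        groups.setdefault(find(i), []).append(topic)
    return [
        Topic(
            names=frozenset().union(*(t.names for t in group)),
            indicators=frozenset().union(*(t.indicators for t in group)),
        )
        for group in groups.values()
    ]


if __name__ == "__main__":
    psi = "http://example.org/psi/montreal"  # hypothetical subject indicator
    merged = merge_topics([
        Topic(names=frozenset({"Montreal"}), indicators=frozenset({psi})),
        Topic(names=frozenset({"Ville de Montreal"}), indicators=frozenset({psi})),
        Topic(names=frozenset({"Ville de Montreal"})),
        Topic(names=frozenset({"Quebec"})),
    ])
    for topic in merged:
        print(sorted(topic.names), sorted(topic.indicators))
```

Union-find makes merging transitive in a single pass. In the example, the first two topics share an indicator and the second and third share a name, so all three become one topic. The "Quebec" topic remains separate.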
We urge that 13250 should fully explain and constrain the topic map parsing process, but only to the extent of describing the rules and goals of the parsing process, and how these rules and goals are to be applied in the case of each of the two syntaxes. For the Topic Maps software industry, this is the least-constraining approach that is consistent with 13250's goal of facilitating universal and accurate understanding of Topic Maps information. This approach allows software vendors to compete on the grounds of product differentiation, without unduly increasing the cost of merging disparate topic maps emanating from multiple, differently specialized software applications.
Two Underlying Models Have Been Proposed
------------------------------------------
Two different underlying models, both expressed in terms of how XTM instances
should be understood by topic map parsers, have been contributed to the discussion.
Both deserve serious attention.
- An "XML Infoset"-like model, called "A Topic Map Data Model", has been proposed by Lars Marius Garshol.
- A "Processing Model for XTM 1.0" has been proposed by Michel Biezunski and Steven R. Newcomb.
The two proposals do not necessarily contradict each other, and the advantages
and drawbacks of each of them should be studied.
The underlying model that will be adopted by ISO must clarify how specific
applications of Topic Maps can be defined and identified.
The documents that are available for study include:
- Lars Marius Garshol, "A Topic Map Data Model -- An infoset-based proposal", http://www.ontopia.net/topicmaps/materials/proc-model.html
- Michel Biezunski and Steven R. Newcomb, "Topicmaps.net's Processing Model for XTM 1.0, version 1.0.1" [now sometimes called "PMTM4"], http://www.topicmaps.net/pmtm4.htm
Other materials offer help in understanding PMTM4:
- Biezunski/Newcomb, "The Structure of Topic Maps Foundations," http://www.topicmaps.net/struct.htm
- Biezunski/Newcomb, "A Topic Maps Graph in XML," http://www.topicmaps.net/simpleTMGraph3.htm and http://www.topicmaps.net/simpleTMGraph3.dtd
- Biezunski/Newcomb, "An API to a Topic Maps Graphs in XML," http://www.topicmaps.net/TMGraphAPI3.htm and http://www.topicmaps.net/TMGraphAPI3.dtd
The decisions taken on these issues will influence the work needed to complete the topic map query language and the topic map constraint language that are currently in progress.
We encourage the members of the ISO working group WG3 to read these documents
and to send questions and comments to the newly created mailing list for discussion.
(The subscription server is http://www.isotopicmaps.org/mailman/listinfo/sc34wg3 )