TITLE: | Guide to the topic map standards |
SOURCE: | SC34 |
PROJECT: | ISO 13250 |
PROJECT EDITOR: | M. Biezunski, S. Newcomb, and M. Bryan |
STATUS: | |
ACTION: | For information |
DATE: | 2002-06-23 |
DISTRIBUTION: | JTC1, SC34 and Liaisons |
REFER TO: | |
SUPERSEDES: | N278 |
REPLY TO: | Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman) Y-12 National Security Complex Bldg. 9113, M.S. 8208 Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-18964 Network: [email protected] http://www.y12.doe.gov/sgml/sc34/ ftp://ftp.y12.doe.gov/pub/sgml/sc34/ Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat American National Standards Institute 11 West 42nd Street New York, NY 10036 Tel: +1 212 642 4976 Fax: +1 212 840 2298 Email: [email protected] |
This document is a guide to the current topic maps standardization activities. It describes how the current problems came to be, what those problems are, and what is being done to solve them, in that order, for ease of understanding.
It is hoped that this guide will enable outsiders to the process to understand what is happening, and make it easier for them to contribute to the process.
The topic maps work started out within the International Organization for Standardization (ISO), in a part of it today known as SC 34 (SC is short for subcommittee). This subcommittee works with SGML, DSSSL, HyTime, font standards, topic maps, the new XML schema language framework called DSDL, and other things. SC34 is divided into three working groups (WGs), and the topic maps work is done by WG3.
The first substantial result of the topic maps effort was ISO 13250:2000, an ISO standard that defined a syntax for topic maps. This syntax was an SGML DTD, which used the ISO 10744 HyTime standard for linking and addressing, and so the syntax is known as HyTM (short for HyTime Topic Maps). When HyTM was completed, there were three known issues with the syntax.
In order to resolve these issues and adapt topic maps to the web, the TopicMaps.Org organization was set up to create a new topic map syntax based on XML and URIs. The syntax TopicMaps.Org created is known as XTM (XML Topic Maps), and it solves the problems described above. Today the HyTM syntax is rarely used; most people use XTM, precisely for these three reasons.
In October 2001 the XTM DTD was accepted into ISO 13250, and so the second edition of ISO 13250 now contains two syntaxes: HyTM and XTM.
Some problems remain, however. The current ISO 13250 defines two interchange syntaxes (XTM and HyTM), but does not explain how they relate to one another. There are a number of non-trivial differences between the syntaxes, which is what makes this a problem. For example, the structure of topic names is different in the two syntaxes. In HyTM the structure of names is as shown below.
<topname scope="...">
  <basename scope="...">...</basename>
  <dispname scope="...">...</dispname>
  <sortname scope="...">...</sortname>
</topname>
In XTM, however, the structure is as shown below.
<baseName>
  <scope>...</scope>
  <baseNameString>...</baseNameString>
  <variant>
    <parameters>...</parameters>
    <variantName>...</variantName>
  </variant>
</baseName>
The problem is how to relate display and sort names to variant names, and also how the different ways to specify scope match up. This is just one example of the differences between the two syntaxes, and given these differences, it is not obvious how to map between them. This is a problem, since implementors are likely to choose different approaches, and this is likely to cause interoperability problems.
Another problem is that both syntax specifications in the current ISO 13250 fail to specify what implementations are to do in a number of situations. The basics are clear, but in a number of places the specifications do not say precisely what is supposed to happen. In some of these cases developers have interpreted the specification text differently, and this causes interoperability problems. If different implementations interpret the same topic map differently, topic map applications may work with only a single implementation, which defeats the purpose of having a standard in the first place.
ISO SC34 has also resolved to create two new topic map standards: the Topic Map Query Language (TMQL) and the Topic Map Constraint Language (TMCL).
Both of these standards need to explain how the constructs in them are evaluated, but the existing ISO 13250 does not provide a suitable basis for such definitions. For example, when TMQL defines the "find all base names of topic X in scope Y"-operator it needs to explain carefully and formally what that operator does. This could be done in terms of the XTM syntax, but it would then be difficult to see how to apply it to the HyTM syntax. The explanation would also become very involved, as XTM provides many different ways to express the same thing, and merging of topics within the topic map must be performed before queries can be done.
So while the community is generally satisfied with the two syntaxes, their specifications are in need of improvement on three counts:
ISO SC34's solution to this is the topic map data model work that was started in May 2001, and is now beginning to produce tangible results, in the form of N0298R1 and N0299. (See also the SAM home page.) The TMQL and TMCL work is currently waiting for the data model work before continuing, as both depend on the outcome of that work.
ISO SC34's current plan is to revise ISO 13250 into a multi-part standard that resolves the problems described in the previous section. A key part of this new edition of the standard will be what is known as the Standard Application Model (SAM), a formal data model for topic maps. This model will be based on the same formalism as the XML Information Set. It will define the allowed structure of topic maps, as well as how to perform key operations such as merging and duplicate removal. The SAM is what will allow SC34 to solve the problems with the interpretations of the specifications, relate HyTM and XTM to one another, and create a foundation for TMQL and TMCL.
The problem with the interpretation of the ISO 13250:2000 and XTM 1.0 specifications will be solved by writing new specifications for the HyTM and XTM syntaxes based on the SAM. The new versions of the syntax specifications will describe how to build an instance of the SAM model from a document in a given syntax, but will not change the syntaxes themselves. That is, they will say such things as "for each <topic> element in the document, create a topic item", "for each <baseName> child of the <topic> element, create a base name item and add it to the [base names] property of the corresponding topic item," and so on.
This will be done more formally than in the examples above, and in a way that leaves much less room for interpretation. Rewriting the syntax specifications in this way will also solve the problem of how to relate the XTM syntax to HyTM, and vice versa. The SAM will then serve as a common point of reference for the two syntaxes, and parts of the syntaxes can be compared by comparing the SAM instances they produce.
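The construction rules quoted above ("for each <topic> element, create a topic item", and so on) can be illustrated with a small sketch. This is only an illustration, not the SAM itself: the class names TopicItem and BaseNameItem, the function build_topic_items, and the sample document are all assumptions made for this example, and the XTM namespace, scopes, and variants are left out for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Simplified XTM-like input (assumption: no namespace, no scope, no variants).
XTM_SAMPLE = """
<topicMap>
  <topic id="puccini">
    <baseName><baseNameString>Giacomo Puccini</baseNameString></baseName>
    <baseName><baseNameString>Puccini</baseNameString></baseName>
  </topic>
  <topic id="tosca">
    <baseName><baseNameString>Tosca</baseNameString></baseName>
  </topic>
</topicMap>
"""


@dataclass
class BaseNameItem:
    """Hypothetical stand-in for a base name item."""
    value: str


@dataclass
class TopicItem:
    """Hypothetical stand-in for a topic item with a [base names] property."""
    source_id: str
    base_names: list = field(default_factory=list)


def build_topic_items(xtm_text):
    """Build topic items from a simplified XTM document."""
    root = ET.fromstring(xtm_text.strip())
    topics = []
    # "For each <topic> element in the document, create a topic item."
    for topic_el in root.findall("topic"):
        item = TopicItem(source_id=topic_el.get("id"))
        # "For each <baseName> child, create a base name item and add it
        # to the [base names] property of the corresponding topic item."
        for name_el in topic_el.findall("baseName"):
            text = name_el.findtext("baseNameString", default="")
            item.base_names.append(BaseNameItem(value=text))
        topics.append(item)
    return topics


topics = build_topic_items(XTM_SAMPLE)
```

A HyTM reader written against the same two classes would produce comparable objects, which is the sense in which a shared model lets the two syntaxes be compared.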
This solution will work even for new topic map syntaxes, should any new syntaxes be created in the future, and it provides a way to relate non-standard topic map syntaxes (such as LTM and AsTMa) to the standard ones. It also provides a way to make mappings from syntaxes that do not directly represent topic maps, but closely related information, such as NewsML and XFML.
The SAM provides a much more suitable basis for TMQL and TMCL, since it unites the different syntaxes and provides a much more convenient basis for operator definitions. Defined using the SAM, the "find all base names of topic X in scope Y"-operator would become something like "traverse the [base names] property of topic item X and return all base name items whose [scope] property contains topic item Y". (In practice the definition is likely to be somewhat different, but this is the basic idea.) TMQL and TMCL will then also be applicable to any topic map syntax that has a mapping to the SAM model.
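The operator wording above translates almost directly into code once topic maps are held as model objects rather than as syntax. The following is a minimal sketch under stated assumptions: the classes, the function base_names_in_scope, and the representation of scoping topics as plain strings are all hypothetical and are not taken from the SAM or from TMQL.

```python
from dataclasses import dataclass, field


@dataclass
class BaseNameItem:
    """Hypothetical base name item with a [scope] property."""
    value: str
    scope: frozenset = frozenset()  # identifiers of the scoping topics


@dataclass
class TopicItem:
    """Hypothetical topic item with a [base names] property."""
    ident: str
    base_names: list = field(default_factory=list)


def base_names_in_scope(topic, scoping_topic):
    """Traverse [base names] of the topic and keep every base name
    item whose [scope] contains the given scoping topic."""
    return [name for name in topic.base_names if scoping_topic in name.scope]


norway = TopicItem(
    ident="norway",
    base_names=[
        BaseNameItem("Norway", frozenset({"english"})),
        BaseNameItem("Norge", frozenset({"norwegian"})),
    ],
)
norwegian_names = base_names_in_scope(norway, "norwegian")
```

Because the function only touches model properties, it does not matter whether the topic item was originally read from XTM, HyTM, or some other syntax.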
Although the new specifications will be clearer than the previous versions, it will still be necessary to verify that implementations actually do conform to the specifications. This is best done by creating a conformance test suite, much like those already created for XML and XSLT. It is easy to create a set of topic map documents in the XTM and HyTM syntaxes, but harder to define what their correct interpretation is.
One way to do it is to create a so-called canonical syntax. In this syntax, every logically equivalent topic map would be represented as exactly the same sequence of bytes. This means that in order to see how a topic map engine interprets an XTM file, one could import that file into the engine, and then export it using the canonical syntax. The test suite could then consist of a set of XTM and HyTM documents with their corresponding canonical representations, and conformance testing could be automated.
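The idea of a canonical form can be sketched in a few lines. The example below is not the Canonical Topic Map syntax and makes strong simplifying assumptions: a topic map is reduced to pairs of a subject URI and a collection of base name strings, and the function canonicalize and its line-based output format are invented for this illustration.

```python
def canonicalize(topics):
    """Serialize (subject URI, names) pairs to bytes so that logically
    equivalent inputs yield identical output.

    Hypothetical output format, for illustration only.
    """
    merged = {}
    for subject, names in topics:
        # Collapse repeated topics and duplicate names.
        merged.setdefault(subject, set()).update(names)

    lines = []
    for subject in sorted(merged):           # fixed topic order
        lines.append(f"topic {subject}")
        for name in sorted(merged[subject]):  # fixed name order
            lines.append(f"  name {name}")
    return ("\n".join(lines) + "\n").encode("utf-8")


# Two inputs that differ only in ordering and duplication.
first = [
    ("http://example.org/tosca", ["Tosca"]),
    ("http://example.org/puccini", ["Puccini", "Giacomo Puccini"]),
]
second = [
    ("http://example.org/puccini", ["Giacomo Puccini", "Puccini", "Puccini"]),
    ("http://example.org/tosca", ["Tosca"]),
]
same_bytes = canonicalize(first) == canonicalize(second)
```

A conformance test could then compare the bytes an engine exports against a stored reference file, with no need to interpret either one.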
The new ISO 13250 standard is going to contain just such a Canonical Topic Map syntax. It is expected that a conformance test suite will be developed, either within OASIS or within ISO, once the necessary infrastructure is in place. There also exists an early proposal for such a canonical syntax.
The new ISO 13250 will also include a model known as the Reference Model, which is a more abstract graph model of topic maps. In this model, names and occurrence resources turn into nodes on the same level as topics, and they are related to their topics using an association-like structure of nodes and arcs. The result is a model that uses fewer constructs than the SAM, and which can be extended without changing the metamodel.
The Reference Model provides a mechanism for explaining the relationships between different knowledge representations, such as topic maps, RDF, and KIF. This will make it easier for topic maps to interoperate with these other knowledge representations.
It is planned that the SAM part of the standard will include a normative mapping of the SAM to the Reference Model. The TMQL and TMCL standards will thus relate to the Reference Model through the SAM. Obviously, it is very important that the SAM and the RM are consistent, and much work will go into ensuring that this is the case.
Below is shown a conceptual diagram of the relationships between the different parts of the new ISO 13250, as well as TMQL and TMCL:
The parts of the new ISO 13250 standard will be:
There is currently no clear timeframe for the finalization of these specifications.
In order for topic maps created by different parties to merge correctly, it is crucial that these parties use the same identifiers for their topics. This is unlikely to happen by itself, however, and therefore three Technical Committees (TCs) have been formed within OASIS to work on something called published subjects. These are URIs and descriptions for concepts considered important by some publisher.
The three OASIS TCs are:
The published subjects activity within OASIS will layer on top of specifications produced by ISO SC34, and will not in any way interfere with what SC34 is doing.
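Why shared identifiers matter for merging can be shown with a small sketch. It assumes a deliberately reduced picture in which each topic is keyed by a single subject URI and carries only a set of names; the function merge_topic_maps and the example.org URIs are hypothetical and do not come from any OASIS or ISO specification.

```python
def merge_topic_maps(*topic_maps):
    """Merge topic maps given as {subject URI: set of names}.

    Topics with the same subject URI are treated as the same topic,
    so their names are combined. Simplified for illustration.
    """
    merged = {}
    for topic_map in topic_maps:
        for subject_uri, names in topic_map.items():
            merged.setdefault(subject_uri, set()).update(names)
    return merged


NORWAY = "http://psi.example.org/country/no"  # hypothetical published subject
SWEDEN = "http://psi.example.org/country/se"  # hypothetical published subject

map_a = {NORWAY: {"Norway"}}
map_b = {NORWAY: {"Norge"}, SWEDEN: {"Sverige"}}

merged = merge_topic_maps(map_a, map_b)
```

Had the two parties chosen different URIs for Norway, the result would contain two separate topics for the same country, which is the outcome published subjects are meant to prevent.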