ISO/IEC JTC 1/SC34 N0395
Title: Canonical XTM
Source: Lars Marius Garshol, Steve Pepper, JTC1/SC34
Project: ISO 13250
Project editor: Steven R. Newcomb, Michel Biezunski, Martin Bryan
Status: First committee draft
Action: For review and comment
Date: 2003-04-04
Summary:
Distribution: SC34 and Liaisons
Refer to:
Supersedes:
Reply to: Dr. James David Mason
This version:
Latest version:
Authors:
Lars Marius Garshol, Ontopia <[email protected]>
Steve Pepper, Ontopia <[email protected]>
This specification describes serialization rules and an output format for topic maps that conform to ISO 13250:200x Topic Maps: Standard Application Model. Its purpose is to enable the development of conformance test suites for topic map processors by ensuring that all processing defined therein has been performed correctly.
This document is intended to become part of the new ISO 13250 standard. For more information on this process see [tm-guide].
This is $Revision: 1.3 $.
1 Introduction
2 General serialization rules
3 General ordering rules
3.1 String values
3.2 Null values
3.3 Comparing sets
4 Ordering and serialization
4.1 The topic map item
4.1.1 Serialization
4.2 Topic items
4.2.1 Ordering
4.2.2 Serialization
4.3 Topic name items
4.3.1 Ordering
4.3.2 Serialization
4.4 Variant items
4.4.1 Ordering
4.4.2 Serialization
4.5 Occurrence items
4.5.1 Ordering
4.5.2 Serialization
4.6 Association items
4.6.1 Ordering
4.6.2 Serialization
4.7 Association role items
4.7.1 Ordering
4.7.2 Serialization
4.8 Locator items
4.8.1 Ordering
4.8.2 Serialization
A Canonical XTM DTD
B References
This specification describes serialization rules and an output format for topic maps that conform to ISO 13250:200x Topic Maps: Standard Application Model ([SAM]). Its purpose is to enable the development of conformance test suites for topic map processors by ensuring that all processing defined therein has been performed correctly.
Logically equivalent topic maps that are serialized in accordance with this specification have the exact same byte-by-byte representation and can thus be easily compared.
The goal of this specification is not to define rules that ensure a deterministic result for all possible conforming topic maps, since doing so would require a prohibitive (and perhaps impossible) level of complexity. The goal is rather to define rules that allow the deterministic serialization of a subset of all possible topic maps that is large enough to enable conformance testing of all aspects of ISO 13250:200x Topic Maps: Standard Application Model.
The output format described in this specification uses a syntax that is a subset of the XTM syntax specified in [XTM]. Topic maps serialized according to this specification can therefore easily be processed by any conforming topic map processor.
Before serialization, the topic map must be processed in accordance with the requirements in ISO 13250:200x Topic Maps: Standard Application Model.
The output document must be a canonical XML document as defined in [xml-c14n]. In addition, a line feed (U+000A) must be inserted after every end tag, and likewise after every start tag of an element that has element content or is empty.
This section describes general ordering rules. Ordering rules for sets of specific item types are described in the sections for the individual item types.
String values are ordered in lexicographical order, based on UCS code point values.
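As an illustration of code point ordering, here is a minimal Python sketch. It relies on the fact that Python 3 compares str values code point by code point; the helper name order_strings is hypothetical and not part of this specification.

```python
def order_strings(values):
    """Return the strings sorted by Unicode code point (hypothetical helper).

    Python 3 compares str objects code point by code point, so the
    built-in sort already gives the required lexicographical order.
    """
    return sorted(values)


# Upper-case letters precede lower-case ones, and U+00E9 follows all ASCII.
names = ["b", "B", "a", "\u00e9", "Z"]
print(order_strings(names))
```

Note that this ordering is purely numeric: no locale-specific collation or Unicode normalization is applied in the sketch.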
Object properties with null value are considered to be ordered before properties with a value.
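One way to express this rule is as a sort key, sketched below in Python. It assumes that a null property value is represented by None; the name null_first_key is invented for the example.

```python
def null_first_key(value):
    """Sort key that places None before any string (illustrative helper).

    The first tuple component separates null from non-null values; the
    second orders the non-null values by code point.
    """
    return (0, "") if value is None else (1, value)


# The empty string is a value, so it is ordered after null.
print(sorted(["b", None, "", "a"], key=null_first_key))
```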
Before they can be compared, sets must be sorted using the specific ordering rules for the item types of which they are composed. They are then compared element by element, starting from the beginning of the set until either:
1. a pair of elements is different, in which case the order of the sets is determined by the order of those elements; or
2. one of the sets is exhausted before the other, in which case the set with the smaller number of elements is considered to be ordered before the one with the greater number of elements.
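The two rules above can be sketched as a three-way comparison function. This is a non-normative Python illustration; compare_sets and compare_strings are hypothetical names, and the default item comparison simply orders strings by code point.

```python
from functools import cmp_to_key


def compare_strings(a, b):
    """Three-way comparison of two strings by code point."""
    return (a > b) - (a < b)


def compare_sets(left, right, compare_items=compare_strings):
    """Return -1, 0 or 1 according to the set comparison rules.

    Both sets are first sorted with the item-specific ordering, then
    compared pairwise. The first differing pair decides the order; if
    one set is exhausted first, the smaller set is ordered first.
    """
    key = cmp_to_key(compare_items)
    left_sorted = sorted(left, key=key)
    right_sorted = sorted(right, key=key)
    for a, b in zip(left_sorted, right_sorted):
        result = compare_items(a, b)
        if result != 0:
            return result
    return (len(left_sorted) > len(right_sorted)) - (len(left_sorted) < len(right_sorted))


print(compare_sets({"a", "b"}, {"a", "c"}))  # differing pair ("b", "c")
print(compare_sets({"a"}, {"a", "b"}))       # first set exhausted first
```

For item types other than strings, a caller would pass the comparison function for that item type as compare_items.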
Is this conformant to the XML c14n with respect to namespaces?
Should we always output an ID attribute, or only when the topic map is reified?
The topic map item is serialized as a <topicMap> element with the following attributes:
· xmlns: "http://www.topicmaps.org/cxtm/1.0/"
· xmlns:xlink: "http://www.w3.org/1999/xlink"
· id: "tm"
The topic map is serialized by first serializing all topic items in the [topics] property, and then serializing all association items in the [associations] property.
The [base locator] is used to create relative URIs for locator items as described in section 4.8 Locator items.
The following properties are ignored:
· [reifier] (redundant)
· [source locators] (not used for conformance testing)
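The overall shape of the output can be sketched as follows. This Python fragment is an illustration only: it assumes that the topic and association items have already been ordered and serialized into strings that carry their own trailing line feeds, and the function name serialize_topic_map is hypothetical. The attribute order shown (namespace declarations first, then id) reflects one reading of canonical XML.

```python
CXTM_NS = "http://www.topicmaps.org/cxtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def serialize_topic_map(topic_fragments, association_fragments):
    """Wrap pre-serialized fragments in the <topicMap> element.

    Assumption: each fragment is a string for one already-ordered item
    and ends with a line feed. Topics are written before associations.
    """
    start_tag = '<topicMap xmlns="%s" xmlns:xlink="%s" id="tm">\n' % (CXTM_NS, XLINK_NS)
    body = "".join(topic_fragments) + "".join(association_fragments)
    return start_tag + body + "</topicMap>\n"


print(serialize_topic_map(['<topic id="t1">\n</topic>\n'], []))
```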
A set of topic items is ordered by comparing the following properties in the order given:
1. [subject addresses]
2. [subject identifiers]
3. [topic names]
4. [occurrences]
5. [roles played]
In cases where the criteria given above are not sufficient to determine the order of two topic items a warning must be issued.
Note:
The criteria given above will not suffice when two topic items have no [subject addresses] or [subject identifiers] and also have an identical set of characteristics. If one or both of those topic items are referenced from another item, the results of canonicalization will not be deterministic.
Issue (cxtm-topic-roles-played-order):
The rules for ordering association role items in section 4.7 Association role items are not sufficient to make comparisons of [roles played] properties in all cases.
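The property-by-property comparison of topic items, including the required warning, might be sketched like this. The sketch is deliberately simplified and makes a strong assumption: each property is represented as a set of strings standing in for the real locator, topic name, occurrence and association role items. The names compare_topics and order_topics are hypothetical.

```python
import warnings
from functools import cmp_to_key

# Properties in the order in which they are compared.
TOPIC_PROPERTIES = [
    "subject addresses",
    "subject identifiers",
    "topic names",
    "occurrences",
    "roles played",
]


def compare_string_sets(left, right):
    """Compare two sets of strings; sorted lists compare element by
    element, and a shorter list that is a prefix of the other is smaller."""
    a, b = sorted(left), sorted(right)
    return (a > b) - (a < b)


def compare_topics(a, b):
    """Three-way comparison of two topics given as dicts of string sets."""
    for prop in TOPIC_PROPERTIES:
        result = compare_string_sets(a.get(prop, ()), b.get(prop, ()))
        if result != 0:
            return result
    # None of the criteria distinguishes the two topic items.
    warnings.warn("order of two topic items could not be determined")
    return 0


def order_topics(topics):
    return sorted(topics, key=cmp_to_key(compare_topics))
```

In this sketch a topic without [subject addresses] is ordered before one that has them, because the empty set is exhausted first.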
Each topic item is serialized as follows:
A <topic> element is output with its id attribute set to the value "tN", where N is the number of the topic item in order of serialization, starting with 1. The content of the <topic> element is constructed as follows:
· If any of the topic item's [subject identifiers], [subject addresses] or [reified] properties have non-null values, a <subjectIdentity> element is output and its content is constructed as follows:
   1. If the [subject addresses] property is not the empty set, one <resourceRef> subelement is output for each locator item with an xlink:href attribute whose value is determined by the locator item.
   2. If the [subject identifiers] property is not the empty set, one <subjectIndicatorRef> subelement is output for each locator item with an xlink:href attribute whose value is determined by the locator item.
   3. If the [reified] property is not null, one <subjectIndicatorRef> subelement is output for the item ("A") that is the value of that property. The value of the <subjectIndicatorRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topicMap>, <baseName>, <variant>, <occurrence>, <association>, or <member> element to which item "A" gives rise.
· Following this, the topic item's [topic names] and [occurrences] properties are serialized, in that order, in accordance with the rules for serializing topic name items and occurrence items.
The following properties are ignored:
· [roles played] (redundant)
· [source locators] (not used for conformance testing)
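The construction of the <topic> element and its <subjectIdentity> child can be sketched with Python's xml.etree.ElementTree. This is an illustration under several assumptions: the locator values are passed in as already-ordered strings, the id of the element to which a reified item gives rise is supplied by the caller as reified_id, element names are written without the default namespace for brevity, and topic names and occurrences are left out. The function name build_topic is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def build_topic(number, subject_addresses=(), subject_identifiers=(), reified_id=None):
    """Build the <topic> element for the topic serialized in position `number`."""
    topic = ET.Element("topic", {"id": "t%d" % number})
    if subject_addresses or subject_identifiers or reified_id is not None:
        identity = ET.SubElement(topic, "subjectIdentity")
        # 1. subject addresses become <resourceRef> elements
        for href in subject_addresses:
            ET.SubElement(identity, "resourceRef", {XLINK_HREF: href})
        # 2. subject identifiers become <subjectIndicatorRef> elements
        for href in subject_identifiers:
            ET.SubElement(identity, "subjectIndicatorRef", {XLINK_HREF: href})
        # 3. the reified item is referenced by the id of its element
        if reified_id is not None:
            ET.SubElement(identity, "subjectIndicatorRef", {XLINK_HREF: "#" + reified_id})
    return topic


topic = build_topic(3, ["a.html"], ["psi/x"], reified_id="a1")
print(topic.get("id"), [child.tag for child in topic[0]])
```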
A set of topic name items is ordered by comparing the following properties in the order given:
1. [value]
2. [variants]
3. [type]
4. [scope]
Each topic name item is serialized as follows:
A <baseName> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "bnN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <baseName> element that is output with an id attribute. The content of the <baseName> element is constructed as follows:
· If the [type] property is not null, an <instanceOf> subelement is output containing a <topicRef> subelement. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [type] property.
· If the [scope] property is not the empty set, a <scope> subelement is output containing one <topicRef> subelement for each topic item in the value of the property. The value of each <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that gives rise to the <topicRef> element.
· If the [variants] property is not the empty set, it is serialized in accordance with the rules for serializing variant items.
· A <baseNameString> element is output whose content is the value of the [value] property.
The following property is ignored:
· [source locators] (not used for conformance testing)
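The id counter described above advances only when an element is actually output with an id attribute. A minimal Python sketch of that behaviour follows; the class name IdCounter is hypothetical, and the same pattern would apply with the other prefixes used in this specification ("v", "o", "a" and "ar").

```python
import itertools


class IdCounter:
    """Hands out prefixed ids such as "bn1", "bn2", ... (illustrative)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._numbers = itertools.count(1)

    def next_id(self, reifier):
        """Return the next id if the item is reified, otherwise None.

        The counter is not advanced for items without a reifier, so the
        numbering has no gaps.
        """
        if reifier is None:
            return None
        return "%s%d" % (self.prefix, next(self._numbers))


counter = IdCounter("bn")
print([counter.next_id(r) for r in ("topicA", None, "topicB")])
```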
A set of variant items is ordered by comparing the following properties in the order given:
1. [value]
2. [resource]
3. [scope]
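Because a variant item has either a [value] or a [resource], the rule that null is ordered before any value has a visible effect here. The following Python sketch shows one possible sort key. It assumes that variants are given as dicts, that null is represented by None, and that the topics in [scope] are represented by strings that already reflect their order; all of these, and the name variant_key, are assumptions made for the example.

```python
def null_first(value):
    """Place None before any string value."""
    return (0, "") if value is None else (1, value)


def variant_key(variant):
    """Sort key comparing [value], then [resource], then [scope]."""
    return (
        null_first(variant["value"]),
        null_first(variant["resource"]),
        sorted(variant["scope"]),
    )


variants = [
    {"value": "Puccini, G.", "resource": None, "scope": {"sort"}},
    {"value": None, "resource": "puccini.png", "scope": {"display"}},
    {"value": "Puccini", "resource": None, "scope": {"sort"}},
]
for variant in sorted(variants, key=variant_key):
    print(variant["value"], variant["resource"])
```

With this key, variants that refer to a resource are ordered before variants that carry a string value, since their [value] property is null.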
Each variant item is serialized as follows:
A <variant> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "vN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <variant> element that is output with an id attribute. The content of the <variant> element is constructed as follows:
· A <parameters> subelement is output containing one <topicRef> subelement for each topic item in the value of the [scope] property. The value of each <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that gives rise to the <topicRef> element.
· A <variantName> element is output and its content is constructed as follows:
   o If the [value] property is not null, the element's content is a <resourceData> element whose content is the value of the [value] property.
   o If the [resource] property is not null, the element's content is a <resourceRef> element with an xlink:href attribute whose value is determined by the locator item that is the value of that property.
The following property is ignored:
· [source locators] (not used for conformance testing)
A set of occurrence items is ordered by comparing the following properties in the order given:
1. [value]
2. [resource]
3. [type]
4. [scope]
Each occurrence item is serialized as follows:
An <occurrence> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "oN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <occurrence> element that is output with an id attribute. The content of the <occurrence> element is constructed as follows:
· If the [type] property is not null, an <instanceOf> subelement is output containing a <topicRef> subelement. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [type] property.
· If the [scope] property is not the empty set, a <scope> subelement is output containing one <topicRef> subelement for each topic item in the value of the property. The value of each <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that gives rise to the <topicRef> element.
· If the [value] property is not null, a <resourceData> element is output whose content is the value of the [value] property.
· If the [resource] property is not null, a <resourceRef> element is output with an xlink:href attribute whose value is determined by the locator item that is the value of that property.
The following property is ignored:
· [source locators] (not used for conformance testing)
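The following Python sketch assembles an <occurrence> element along the lines described above, showing how references to topics are formed by prefixing "#" to the id of the corresponding <topic> element. It assumes that topic ids such as "t2" have already been assigned, that scope ids are passed in their final order, and that element names are written without the default namespace for brevity. The names build_occurrence and topic_ref are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def topic_ref(parent, topic_id):
    """Append a <topicRef> pointing at the <topic> element with this id."""
    return ET.SubElement(parent, "topicRef", {XLINK_HREF: "#" + topic_id})


def build_occurrence(type_id=None, scope_ids=(), value=None, resource=None,
                     occurrence_id=None):
    """Build an <occurrence> element; occurrence_id is set only if reified."""
    attrs = {} if occurrence_id is None else {"id": occurrence_id}
    occurrence = ET.Element("occurrence", attrs)
    if type_id is not None:
        topic_ref(ET.SubElement(occurrence, "instanceOf"), type_id)
    if scope_ids:
        scope = ET.SubElement(occurrence, "scope")
        for topic_id in scope_ids:
            topic_ref(scope, topic_id)
    if value is not None:
        ET.SubElement(occurrence, "resourceData").text = value
    if resource is not None:
        ET.SubElement(occurrence, "resourceRef", {XLINK_HREF: resource})
    return occurrence


occ = build_occurrence(type_id="t2", scope_ids=["t5", "t7"], value="1858-12-22")
print([child.tag for child in occ])
```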
A set of association items is ordered by comparing the following properties in the order given:
1. [type]
2. [scope]
3. [roles]
Each association item is serialized as follows:
An <association> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "aN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <association> element that is output with an id attribute. The content of the <association> element is constructed as follows:
· If the [type] property is not null, an <instanceOf> subelement is output containing a <topicRef> subelement. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [type] property.
· If the [scope] property is not the empty set, a <scope> subelement is output containing one <topicRef> subelement for each topic item in the value of the property. The value of each <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that gives rise to the <topicRef> element.
· The [roles] property is serialized in accordance with the rules for serializing association role items.
The following property is ignored:
· [source locators] (not used for conformance testing)
A set of association role items is ordered by comparing the following properties in the order given:
1. [type]
2. [role playing topic]
Each association role item is serialized as follows:
A <member> element is output. If and only if the value of the [reifier] property is not null, an id attribute is specified and given the value "arN", where N is the value of a counter that starts at 1 and is incremented by 1 for each <member> element that is output with an id attribute. The content of the <member> element is constructed as follows:
· An <instanceOf> subelement is output containing a <topicRef> subelement. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [type] property.
· A <topicRef> subelement is output. The value of the <topicRef> element's xlink:href attribute is set to the concatenation of "#" and the value of the id attribute of the <topic> element created by the topic item that is the value of the [role playing topic] property.
The following property is ignored:
· [source locators] (not used for conformance testing)
A set of locator items is ordered by comparing the following properties in the order given:
1. [notation]
2. [reference]
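Ordering locator items on two properties maps naturally onto a tuple sort key, as in the Python sketch below. The representation of a locator item as a dict is an assumption, as is the choice to compare the [reference] strings exactly as stored; order_locators is a hypothetical name.

```python
def order_locators(locators):
    """Sort locator items by [notation], then by [reference]."""
    return sorted(locators, key=lambda loc: (loc["notation"], loc["reference"]))


locators = [
    {"notation": "URI", "reference": "http://example.org/b"},
    {"notation": "HyTime", "reference": "z"},
    {"notation": "URI", "reference": "http://example.org/a"},
]
for locator in order_locators(locators):
    print(locator["notation"], locator["reference"])
```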
Locator items in the values of [subject identifiers] properties of topic items give rise to <subjectIndicatorRef> elements.
Locator items in the values of [subject addresses] properties of topic items, or [resource] properties of variant or occurrence items, give rise to <resourceRef> elements.
Other locator items are ignored.
When a locator item is not ignored and its [notation] property has the value "URI", the xlink:href attribute of the corresponding element is set to a value determined by the locator item's [reference] property. That value must be a minimal URI relative to the [base locator] of the topic map item.
Note:
Relative URIs are required in order to remove dependencies on the source locations of the input topic map.
Ed. Note:
Do we need text to cover the escaping of URIs or is this handled by SAM?
When the notation of the locator is not "URI", the value of the corresponding element's xlink:href attribute is set to the concatenation of the [notation] property, ":", and the [reference] property.
Ed. Note:
Is this a satisfactory way of handling locators that are not URIs?
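The two cases for computing the xlink:href value can be sketched in Python as follows. This specification does not define "minimal relative URI" precisely, so the relativize function below is only one plausible interpretation: it handles references on the same scheme and authority as the base locator and leaves all others absolute. Both function names are hypothetical, and the example URIs are invented.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def relativize(base, reference):
    """Return `reference` relative to `base` (simplified interpretation)."""
    b, r = urlsplit(base), urlsplit(reference)
    if (b.scheme, b.netloc) != (r.scheme, r.netloc) or not r.path.startswith("/"):
        return reference  # different server or no hierarchical path
    if r.path == b.path and r.query == b.query:
        # Same document: only the fragment (if any) is needed.
        return urlunsplit(("", "", "", "", r.fragment))
    base_dir = posixpath.dirname(b.path) or "/"
    rel = posixpath.relpath(r.path, start=base_dir)
    if r.path.endswith("/"):
        rel += "/"  # relpath drops the trailing slash of directory paths
    if ":" in rel.split("/")[0]:
        rel = "./" + rel  # keep the first segment from reading as a scheme
    return urlunsplit(("", "", rel, r.query, r.fragment))


def locator_href(notation, reference, base_locator):
    """Compute the xlink:href value for a locator item."""
    if notation == "URI":
        return relativize(base_locator, reference)
    return notation + ":" + reference


base = "http://example.org/maps/opera.xtm"
print(locator_href("URI", "http://example.org/maps/opera.xtm#puccini", base))
print(locator_href("URI", "http://example.org/psi/italy", base))
print(locator_href("HyTime", "id(puccini)", base))
```

The sketch does not address URI escaping or normalization, which the editorial note above leaves open.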
<!ELEMENT topicMap ( topic*, association* ) >
<!ATTLIST topicMap
  id          ID    #FIXED 'tm'
  xmlns       CDATA #FIXED 'http://www.topicmaps.org/cxtm/1.0/'
  xmlns:xlink CDATA #FIXED 'http://www.w3.org/1999/xlink' >

<!ELEMENT topic ( subjectIdentity?, baseName*, occurrence* ) >
<!ATTLIST topic id ID #REQUIRED >

<!ELEMENT instanceOf ( topicRef ) >

<!ELEMENT subjectIdentity ( resourceRef*, subjectIndicatorRef* ) >

<!ELEMENT topicRef EMPTY >
<!ATTLIST topicRef xlink:href CDATA #REQUIRED >

<!ELEMENT subjectIndicatorRef EMPTY >
<!ATTLIST subjectIndicatorRef xlink:href CDATA #REQUIRED >

<!ELEMENT baseName ( instanceOf?, scope?, baseNameString, variant* ) >
<!ATTLIST baseName id ID #IMPLIED >

<!ELEMENT baseNameString ( #PCDATA ) >

<!ELEMENT variant ( parameters, variantName ) >
<!ATTLIST variant id ID #IMPLIED >

<!ELEMENT variantName ( resourceRef | resourceData ) >

<!ELEMENT parameters ( topicRef+ ) >

<!ELEMENT occurrence ( instanceOf?, scope?, ( resourceRef | resourceData ) ) >
<!ATTLIST occurrence id ID #IMPLIED >

<!ELEMENT resourceRef EMPTY >
<!ATTLIST resourceRef xlink:href CDATA #REQUIRED >

<!ELEMENT resourceData ( #PCDATA ) >

<!ELEMENT association ( instanceOf?, scope?, member+ ) >
<!ATTLIST association id ID #IMPLIED >

<!ELEMENT member ( roleSpec, topicRef ) >
<!ATTLIST member id ID #IMPLIED >

<!ELEMENT roleSpec ( topicRef ) >
<!ATTLIST roleSpec id ID #IMPLIED >

<!ELEMENT scope ( topicRef+ ) >
[xml-c14n] Canonical XML Version 1.0, John Boyer, Author/Editor. World Wide Web Consortium. 15 March 2001.
ISO/IEC 13250:2002 Topic Maps, ISO, Geneva, 2002.
[SAM] The Standard Application Model for Topic Maps, SC34/WG3 draft, 2003.
[tm-guide] Guide to the topic map standardization process, Lars Marius Garshol, 2002-06-23, ISO/IEC JTC1 SC34/N0323.
[XTM] The XML Topic Maps (XTM) Syntax 1.1, SC34/WG3 draft, 2003.