ISO/IEC JTC 1/SC 34 N0934
ISO/IEC JTC 1/SC 34
Information Technology --
Document Description and Processing Languages
TITLE: Revised 13250-6 Topic Maps -- Compact Syntax -- Issues
SOURCE: Mr. Lars Heuer; Mr. Gabriel Hopmans; Dr. Sam Gyun Oh
PROJECT: WD 13250-6: Information technology - Topic Maps - Compact syntax
PROJECT EDITOR: Mr. Lars Heuer; Mr. Gabriel Hopmans; Dr. Sam Gyun Oh; Mr. Steve Pepper
STATUS: Draft
ACTION: Review
DATE: 2007-11-16
DISTRIBUTION: SC34 and Liaisons
REPLY TO: Dr. James David Mason (ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada) Crane Softwrights Ltd. Box 266, Kars, ON K0A-2E0 CANADA Telephone: +1 613 489-0999 Facsimile: +1 613 489-0995 Network: [email protected] http://www.jtc1sc34.org
Revised CTM - Issues for the ISO meeting Kyoto 2007
Item identifier syntax
For item identifiers, a new marker (') was introduced.
Action: Needs approval from WG and TMQL-editors. Related TMQL issue: tmql-item-references
Multiline comments
Multiline comments of the form (: comment :) were introduced.
Action: Needs approval from WG and TMQL-editors.
Reifier syntax
CTM uses a tilde (~) to introduce the reifying topic. TMQL uses a slightly different syntax (<~) and uses the tilde as an axis shortcut.
Action: Is the difference acceptable?
Template imports
CTM provides a mechanism to import templates. In Montreal there was a request to remove this feature and to unify its functionality with the "include" directive. The editors were unable to follow that request because the unification led to problems with "prefix" declarations.
Action: Is unification required, or is the editors' decision acceptable?
undef literal
CTM introduced the undef literal.
Action: Decide whether undef should be part of CTM. If it belongs to CTM, the definition and semantics of undef must be provided by TMQL.
Related TMQL issue: tmql-undef-vs-null
Angle brackets around IRIs
We introduced an "angle brackets" syntax for IRIs.
Action: TMQL should reintroduce IRIs that can be enclosed in angle brackets. Related TMQL issue: tmql-iri-ambiguity
Meaningful whitespace
CTM provides two ways to delimit topics: the '.' and an empty line. The latter makes topic declarations such as the following impossible:
    neil-young

    - "Neil Young"    # The name is detached from the topic "neil-young"
    created           # A CTM processor would assume a topic here, because of the following empty line

    (album: harvest-moon, creator: neil-young)    # Oh oh, the user wanted an association
Action: Do we want to keep meaningful whitespace, or should '.' be the only topic end-delimiter? Is the empty line a theoretical problem, or does it occur in real-life documents? (Meaningful whitespace forces the user to keep the parts of a topic declaration, such as names and occurrences, close together.)
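The effect of the empty-line delimiter can be illustrated with a minimal Python sketch. It is not a CTM parser: it only assumes that a processor closes the current block at every empty line. The function name `split_blocks` is hypothetical, and the line layout of the sample input is an assumed reading of the example above.

```python
def split_blocks(source):
    """Split text into blocks, treating every empty line as a delimiter.

    Assumption: an empty (or whitespace-only) line always ends the
    current block, as the empty-line topic delimiter would.
    """
    blocks, current = [], []
    for line in source.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


# Assumed layout: the user meant ONE topic and ONE association.
example = '''neil-young

- "Neil Young"
created

(album: harvest-moon, creator: neil-young)
'''

blocks = split_blocks(example)
for number, block in enumerate(blocks, start=1):
    print(number, repr(block))
```

Under these assumptions the input falls apart into three blocks instead of the two declarations the user intended: the name is separated from `neil-young`, and `created` is separated from its role list.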
Unicode escape sequences
Unicode escape sequences are currently limited to string literals.
Action: Would Unicode escape sequences make sense for topic identifiers, too? This would require a preprocessing step: first, all Unicode escape sequences are replaced by the characters they denote. Only characters permitted by the grammar rule at that position are allowed; if a resulting character conflicts with the grammar, an error is issued.
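The preprocessing step described above could look roughly like the following Python sketch. Two things are assumptions made only for illustration: the escape form (a backslash, `u`, and four hexadecimal digits) and the identifier pattern, which is a simplified stand-in for the real grammar rule. The name `decode_identifier` is hypothetical.

```python
import re

# Assumption: escapes are written as backslash-u plus four hex digits.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

# Assumption: simplified stand-in for the identifier grammar rule.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_\-]*")


def decode_identifier(raw):
    """Replace escapes by their characters, then validate the result."""
    # Step 1: replace every escape sequence by the character it denotes.
    decoded = ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)
    # Step 2: reject the result if it conflicts with the grammar rule.
    if not IDENTIFIER.fullmatch(decoded):
        raise ValueError("illegal identifier after unescaping: %r" % decoded)
    return decoded


# U+002D is the hyphen, which the assumed identifier rule allows.
print(decode_identifier(r"neil\u002Dyoung"))

# U+0020 is a space, which the assumed identifier rule forbids.
try:
    decode_identifier(r"neil\u0020young")
except ValueError as error:
    print("error:", error)
```

Validating after replacement, rather than before, is what makes the escape mechanism safe: an escape cannot smuggle a character into a position where the grammar would not accept it literally.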
QNames vs. IRIs
CTM and TMQL parsers may be unable to decide whether a token is meant as an IRI or as a QName.
Suppose the parser encounters foo:bar: is that a QName or an IRI? According to RFC 3987, the parser may interpret it as an IRI and not as a QName.
Action: Decide among the following options:
We enforce that foo:bar is interpreted as an IRI unless foo was previously declared as a prefix. Problem: The parser would never detect undeclared prefixes, since it assumes an IRI.
We enforce that either the IRI notation or the QName notation requires delimiters.
We enforce that CTM (and TMQL) parsers are aware of the official IRI schemes. If a parser detects foo:bar and foo is not an official IRI scheme, it assumes a QName. If the prefix foo was not declared, an error is reported. Problem: The parser must be aware of the schemes. At a minimum, it must know the "rootless" schemes, that is, those with no slash after the colon (currently 15 schemes).
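The first and third options can be compared with a small Python sketch. The function names are hypothetical, and `ROOTLESS_SCHEMES` is only an illustrative subset of registered schemes, not the list of 15 mentioned above. The second option is not sketched because it resolves the ambiguity in the grammar rather than in the parser logic.

```python
# Assumption: illustrative subset of registered "rootless" IRI schemes.
ROOTLESS_SCHEMES = {"mailto", "urn", "tel", "news"}


def split_token(token):
    """Return the part before the first colon; fail if there is no colon."""
    head, colon, _ = token.partition(":")
    if not colon:
        raise ValueError("not a QName or IRI candidate: %r" % token)
    return head


def classify_by_prefix(token, prefixes):
    """First option: a QName only if the prefix was declared, else an IRI."""
    return "qname" if split_token(token) in prefixes else "iri"


def classify_by_scheme(token, prefixes, schemes=ROOTLESS_SCHEMES):
    """Third option: a known scheme means IRI; otherwise a QName whose
    prefix must have been declared."""
    head = split_token(token)
    # Sketch-level choice: a known scheme wins over a same-named prefix.
    if head.lower() in schemes:
        return "iri"
    if head not in prefixes:
        raise ValueError("undeclared prefix: %r" % head)
    return "qname"


declared = {"foo"}
print(classify_by_prefix("foo:bar", declared))   # declared prefix
print(classify_by_prefix("fop:bar", declared))   # typo silently becomes an IRI
print(classify_by_scheme("mailto:someone@example.org", declared))
try:
    classify_by_scheme("fop:bar", declared)      # typo is reported
except ValueError as error:
    print("error:", error)
```

The sketch makes the trade-off visible: the first option needs no scheme list but lets a mistyped prefix pass as an IRI, while the third reports the mistake at the cost of maintaining scheme knowledge in the parser.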