ISO/IEC JTC 1/SC34 N0403, 2003-04-03
Title: |
AsTMa! Language Definition |
Source: |
Robert Barta, Bond University |
Project: |
ISO 18048 |
Project editor: |
Lars Marius Garshol, H. Holger Rath |
Status: |
Individual contribution |
Action: |
For review and comment |
Date: |
2003-04-11 |
Summary: |
Specification of the AsTMa! topic map constraint language |
Distribution: |
SC34 and Liaisons |
Refer to: |
|
Supercedes: |
|
Reply to: |
Dr. James David Mason |
Robert Barta Bond University
[email protected]Copyright © 2002, 2003 Robert Barta
Constraining Topic Map (TM) instances is a necessary component of TM engineering. To make these constraints explicit we propose a Topic Map Constraint Language, AsTMa!. As such, AsTMa! is a part of the AsTMa language family which was designed to facilitate authoring, but also constraining and querying topic maps. This document provides a semi-formal definition of the language. Note that this is still an evolving project and that this document is a current snapshot only.
This document has no formal status. It is a
technical report of Bond University.
1 Introduction
2 Conventions
3
Validation Relation
4
Maplets and Maplet Denotations
5
Unconditional Constraints
6 Maplet
Patterns
7
Quantified Constraints
8 Constraints
9 Functions
10 Syntax
Extensions
11 Moderators
12 Summary
13
Acknowledgements
A
Appendix: Language Skins (Non-normative)
B
Appendix: Predefined Functions (Normative)
C
Appendix: Syntax (Normative)
D
Appendix: External Functions (Non-normative)
E
Appendix: Directives (Normative)
F
Appendix: References (Non-normative)
The objective of any TMCL (Topic Map Constraint Language) is to allow to formally specify rules for Topic Map documents.
In some respect a TMCL has a similar purpose as schema languages for relational databases and for XML applications. In all cases the constraint is supposed to characterize the possible values of the data (or the document). As Topic Map data may not only reside in documents containing Topic Map notated text but may also be generated from other data sources, we abstract away from documents and only regard Topic Map instances.
Constraints behave like type definitions: Given a constraint and a particular instance of a topic map it must be well-defined and computable in a reasonable time whether the instance validates against the constraint, or not. In the case of Topic Maps a constraint language must allow to specify
· a vocabulary, i.e. a set of topics,
· their respective types and relationships between them, and
· patterns which must, may or must not occur within a given topic map.
For our purposes we collectively use the term ontology to encompass all the above aspects of a particular knowledge domain. Accordingly, AsTMa! can be used as an ontology specification language.
As AsTMa! is making use of the authoring language AsTMa= it is mostly line oriented. It extends the authoring language by introducing variables which can be bound to text patterns similar to regular expressions and also variables which bind whole structures like in Prolog. We also peruse concepts of other languages e.g. the quantifiers 'forall' and 'exists' from logic-oriented languages and path expressions from XPath to add expressitivity or simplify the notation.
Following the general theme of the AsTMa* language family, AsTMa! was designed not so much as an orthogonal language, but more bias toward human authoring.
FIXME: Maybe make design goals more explicit?
FIXME: use of regexp
FIXME: use of grammars (EBNF)
FIXME: use of relational algebra
Given a particular map M, one may now ask whether M conforms to an ontology specified by a constraint O. Formally, we write O ||- M if M validates against the ontology O.
In practical settings, though, we will have to validate a map against several ontologies. Examples are situations where a customer has an ontology different from that of a vendor, or, when a company-wide ontology is refined and extended by a branch ontology. In these cases, topic map instances will have to comply with several such constraints. Technically this means that we will have to find ways to merge, i.e. combine constraints by logical operators having the following properties:
O1 ^ O2 ||- M <=> O1 ||- M ^ O2 ||- M
and, because of symmetry
O1 v O2 ||- M <=> O1 ||- M v O2 ||- M
With these requirements set, in the following we will specify the semantics of AsTMa!. For every map and for every AsTMa! expression this validation relation will be defined and can so be checked automatically.
We semi-formally define the relation ||- bottom-up. We start formalizing atomic parts of a map, maplets, and extend these to patterns which can be matched against topic map instances. Finally, we add quantifiers and boolean operators to allow the construction of rules.
To formally capture when exactly a topic map instance conforms to a constraint, we introduce maplets (Fig.1). A maplet is a part of a map, consisting of a single association and all topic characteristics of the topics mentioned in that association. This includes the association type, the association scope and all players and roles in the members.
Fig.1 Maplet connecting Topics |
To also allow a single topic to be a maplet, we introduce a trivial association of type sum-ergo-sum and assume that any topic is a player in it:
(sum-ergo-sum)
topic: any-topic-here
The topic characteristics are represented by a tuple (type, scope, value).
The type is indicated by a URI and specifies whether the characteristics is a baseName, an occurrence with a topic reference or inline data, etc. This is done using the predefined PSIs defined in [XTM].
The scope is another URI indicating the one topic which scopes the characteristic (We only use one scoping topic) in AsTMa*. For the unconstrained scope we will use the PSI defined in [XTM].
The value is a string comprising the value
of the characteristics.
We will use maplets as atomic constructs. Obviously, any map can be constructed from these maplets or, conversely, can be fragmented into maplets.
Merging Maplet to build a Map |
As operator + to combine maplets we simply use the merge operator as defined in the XTM standard[XTM00]. Consequently, any map M can be broken down into maplets m1, ...,mN:
M = + Nn=1 mn
We call the set of maplets GM
= {m1, ..., mn} for a particular map M the generator
of M. M is a submap of M', M << M', if GM << GM'.
Obviously, the set of all submaps of M, 2M, is a lattice.
We distinguish between maplets and maplet denotations which are text fragments in AsTMa= whereas maplets themselves are an abstraction of a representation an TM processor would use after having processed the maplet denotation (this is comparable to SAM[SAM03]). Thus, a maplet may be represented by a number of different maplet denotations.
We formalize the set of all maplet denotations as M=.
Every maplet denotation is an unconditional constraint. Unconditional constraints only help to define the vocabulary aspect of an ontology. Per-se they do not impose any restrictions on a topic map structure and they only set the stage for more complex constraints. Nonwithstanding, an unconditional constraint is a constraint.
Any given map M satisfies any given unconditional constraint c= is element of M=:
c= ||- M
It is straightforward to define a merge operator between unconditional constraints (we will later extend this to general constraints). If c1 and c2 are unconditional constraints, then c1 + c2 is also an unconditional constraint defined via
c1 + c2 ||- M <=> c1 ||- M ^ c2 ||- M
Again, as any map satisfies any unconditional constraint, that map also satisfies any merge of unconditional constraints. This also implies that any AsTMa= denotation A is an AsTMa! constraint, although all maps M will trivially conform to it:
A ||- M
Maplet patterns themselves are no constraints but we will need them to build compound constraints later. Basically, maplet patterns are maplet denotations in AsTMa= with some of the text being replaced by variables and regular expressions.
Example:
$t (tutorial)
bn: /tutorial/i
oc: m|http://|
Here we simply reuse the matching semantics of regular expressions to define whether a particular maplet matches a maplet pattern. The maplet denotated by
astma (tutorial)
bn: AsTMa Tutorial
oc: http://astma.it.bond.edu.au/
matches the maplet pattern from above, whereas the maplet denotated by
astma (tutorial)
bn: AsTMa Tutorial
oc: no URL here
does not.
Before we define more expressive pattern matching we need variables which can be bound to values during the pattern matching process described below. After that we can define patterns for an open and closed expectation we could have on matching maplets.
FIXME unification-not: Maybe make a
note that his is only about matching, not unification or reasoning. Unification
would be necessary whenever two ontologies are going to be compared.
AsTMa! distinguishes between three different kinds of variables:
Reference Variables
Reference variables can only have topic identifiers as values and can match against topic identifiers.
Regular Expression Variables
These variables always appear as part of an Extended Regular Expression. They can only contain strings and can be matched against strings
Structural Variables
These variables are bound to whole submaps.
Since it is clear from the context what kind of variable is meant, there is no need to distinguish between them syntactically.
All variables are prefixed with the character $ or @.
The name itself must follow the pattern \w[[:alnum:]-_]+.
While variables of a different kind inhabit different namespaces, it is
erroneous, though, to have within one scope two or more variables of different
kinds but with the same name. This is mainly thought as a safety guard against
casual errors.
The scope of a variable is determined by the syntactic form of a constraint and will be defined later. Variables are implicitely declare by simply using them. In it's scope any further occurrence of a variable implies that all these occurrences must be bound to the same value during one match.
A notable deviation from the usual rules is that
variables with different names must be bound to different values. This
somewhat unorthodox interpretation helps to keep constraints concise and can be
overridden if necessary by using the =~
operator, as in $a =~ $d.
During the matching process patterns will be matched against maplets. In this process variables within the patterns will be assigned values from the maplets (variable binding).
If - during such a match - an unbound variable is matched against a maplet value, that variable will be bound to that value for this match. This binding will remain the same for a single match; for such a match the variable will not change its value (immutable). Thus the set of variable bindings is characterizing a match.)
If - during a match - an already bound variable
will be matched against a maplet value, the value of the variable and that in
the maplet will have to be identical. Otherwise the match would not succeed.
During the matching process variables are assigned values. In some cases we expect the variable to hold exactly one value. These variables are those which are prefixed with $. In other cases, though, we allow one variable to hold several values which are acquired during the matching process. This is indicated by the prefix @.
Example:
bn: $bn # $bn will get one value during one match
some-role : @roles # @roles will get several values during one match
Note that these variables are technically bags and not sets as they may contain a certain value more than once. Thus there is no inherent sequence. It also implies that two or more occurrences of a particular plural variable are compared as bags.
If a singular variable $v is to be matched against a particular constant, then the notation $v =~ value can be used. If $v should be matched against some other $w, then $v =~ $w can be used.
For a plural variable @v the value must be a bag for which we will use the
notation @v =~ (value1, value2,
...). To make two bags be matched to the same values, we simply use
@v =~ @w.
Reference variables can appear anywhere in a maplet pattern where a topic reference can appear. During the matching process the variable will be bound to a topic reference (or to several topic references for plural variables).
Example:
$car (ferrari)
In cases where we do not care about the value to
be matched we can also use the wildcard *.
(Note, though, that this is no wildcard in the sense of regular
expressions.)
A regular expression (RE) can appear anywhere in a maplet pattern where arbitrary text is allowed, that is in occurrences, basenames and subject indicators. In the pattern matching process a corresponding maplet part of the map will be matched against regular expressions in a maplet pattern.
The regular expressions themselves follow the Perl syntax and are enclosed by the '/' character as in /.*/. As in Perl any other character can be used if the pattern starts with the character 'm' and the following character is the same as the last non-space character on this line.
Example:
bn: m|regexp pattern here|
In the above pattern we have introduced one regular expression using '|' as pattern terminator. This pattern will be matched against the string of a base name characteristic.
The use of Extended REs also allows to capture matched strings into variables. Here we make use of the facility to add a capture name for a text pattern using the notation (?var-name:pattern).
Example:
bn: /(?:b=.*)/
Here we introduced a variable b which will capture any basename string of a matched maplet.
For notational convenience text variables can also written without a pattern in the $ notation if that is simply /^\S.*\S$/ (any string except with no leading or trailing blanks). So, (?var-name:^\S.*\S$)) is equivalent to $var-name.
FIXME explicit-var-match: Is it necessary to add the syntax $v =~ /P/?
If we are not interested in the value itself and just need a placeholder, we can also use the wildcard character *. As now no name is associated with a matched value, such variables do not take part in the pattern matching process.
Example:
bn: *
In the course of matching a maplet against a pattern various components of topics (such as baseNames, occurrences, etc.) and associations will have to be matched. This section contains the rules governing this process.
The individual components of a maplet pattern are:
instance-class patterns
With such a pattern an instance-class relationship is constrained, as, for example in $car (ferrari). This also includes instance-class constraints for associations.
(scoped) topic characteristics
Topic characteristics are constrained by using notations like bn: $basename or oc @ S ($type) : /http:/. Association roles are not regarded as characteristics here.
association role
Components like these constrain players and the roles in associations, eg. as in author: $a
For the following we assume that a given maplet should be matched against a component of a given maplet pattern. When a match is successful and how value bindings will occur is covered by this list.
· (topic instance-of:)
If a maplet pattern component is of the form
topic (type)
then a maplet set matches if there is a maplet of type instance-of where topic and type play the roles instance and class, respectively; or there is a maplet of type instance-of where topic plays the role instance and some other topic T plays the class, whereby T is a sub-type of type.
If topic is a variable and it is already bound to a value, the match will only succeed if that value plays an instance role in an instance-of association and the topic type matches like above. If it is not bound, it will be bound to any topic id in the maplet set.
FIXME: Not perfectly clear. And not overly elegant.
If the type is a variable and it is bound to a value there must be an association like outlined above between the type and the topic to make the match succeed. If the variable is not bound it will be bound to any value where such an association exists.
· (association instance-of:) If a maplet pattern component is of the form
(type)
then a maplet set matches if there is a maplet of type type. If type is a variable $v then it depends whether $v is already bound to a value or not. If it is then there must be a maplet in the set of exactly this type to make the match succeed. Otherwise, $v will be bound to a type of any maplet in the set.
FIXME: scoped association to be added.
· (basename constant:) If a maplet pattern component is of the form
bn: some-text
then a maplet matches if there is a topic characterstics of type 'baseName' which exactly matches the given text. All leading and trailing blanks are ignored, so is the scope.
· (basename pattern:)If a maplet pattern component is of the form
bn-or-oc-or-in: (?v=P)
with a variable v and a pattern P, then a maplet matches if a topic characteristic is of type 'baseName', 'occurrenceRef', or 'occurrenceData', respectively, and the according value matches the P. The scope value is ignored. If v is not yet bound to a value, it will be bound to the complete value of the characteristic (not only the part of the string which was matched). If v is already bound to a value, then that value must be identical to that of the base name string. The pattern P is ignored in this case.
· (characteristic scope:) If a maplet pattern component is of the form
bn-or-oc-or-in @ scope : ...
then additional to the rules for that particular topic characteristic type the scope is matched:
o If the scope is *, it will be ignored.
o If the scope is a constant S, then the topic characteristic in the maplet must have this as scope value.
o If the scope is a reference variable $s, then it depends whether $s is already bound to a value. If it is, then this value must be identical with the scope value in the topic characteristic. If not, then $s is bound to that value.
· (characteristic type:) If a maplet pattern component is of the form
bn-or-oc-or-in (type) : ...
then additional to the rules for that particular topic characteristic the type is matched:
o If the type is *, it will be ignored.
o If the type is a constant T, then the topic characteristic in the maplet must either have this as value or any other type which is a direct or indirect sub-type of T.
FIXME: match-global-view: need global view to other maplets here.
FIXME: Need somewhere definition of sub-type.
o If the type provided in the component is a reference variable $t, then it depends whether $t is already bound to a value. If it is, then this value must be identical with the type in the topic characteristic. If not, then $t is bound to that value.
FIXME: define-uris: need to define URIs for all of these types?
· (association role:) If a maplet pattern component is of the form
role : ...
then a maplet set will match if there is a maplet which contains the role given. If the role is not a constant, but a variable, then it depends whether this variable is already bound. If it is ... Otherwise the rule for matching the association player must be applied.
FIXME: this has to be factored out. Really.
· (association player:) If a maplet pattern component is of the form
... : player : ...
then a maplet set will match if there is a maplet which contains the role given. If the role is not a constant, but a variable, then it depends whether this variable is already bound.....
FIXME: this has to be factored out. Really.
Otherwise the rule for matching the association role must be applied.
· (subject indicator:) If a maplet pattern component is of the form
sin : value
then a set of maplets match if
there is a topic characteristics of type 'subjectIndicator' and the value
matches.
If p is a maplet pattern, then [ p ] is a open maplet pattern. A maplet set S matches [ p ] if it satifies all of the following:
· all topic characteristic components in p can be matched to one and the same set of topic characteristics in one maplet m is element of S.
· all topic identifiers in an topic instance-of component of p match the player which is characterized by exactly the same topic characteristic set.
· All topic identifier in a association instance-of within p match the topic in that map.
FIXME: Need MUCH MUCH MUCH MORE formalism for this!!!
· All association role components in p match.
· All association player components in p match.
The maplet set can contain
additional information which has no counterparts in p.
Example: Consider the open maplet pattern
[ * (tutorial)
bn: /tutorial/i
oc: /http:/ ]
The maplet denoted via
astma (tutorial)
bn: AsTMa Tutorial
bn @ de : AsTMa Einfuehrung
oc: http://astma.it.bond.edu.au/tutorial1
oc: http://astma.it.bond.edu.au/tutorial2
would match as it has a matching
id, the type (instanceOf) matches and all characteristics (basename and
occurrence) in the pattern have a matching counterpart in the maplet. Any
additional characteristics (the basename with a different scope and the second
additional occurrence) has no influence on the matching outcome.
If p is a maplet pattern, then ] p [ is a closed maplet pattern. A maplet set S matches ] p [ if it satisfies all of matching rules of a open maplet pattern, but all topics are minimal in the sense that they do not contain additional roles and topic characteristics which are explicitely not matched.
FIXME: well could be clearer.
Example: Flipping the [] brackets in the earlier example gives us such an closed maplet pattern:
] * (tutorial)
bn: /tutorial/i
oc: /http:/ [
Now the maplet above will not
match anymore as the additional basename and the occurrence violate the maplet
pattern which allows only one basename and only one occurrence.
Quantified constraints build the core of the AsTMa! language. They finally allow to define rules which a conforming map must, may, or must not follow. To cater for that we have to introduce quantifiers and with them structural variables.
In contrast to regular expression variables and
reference variables, structural variables are bound to whole submaps (maplet
sets) when these are successfully matched against a maplet pattern. Otherwise
the same rules for variables apply for them.
Syntactically, AsTMa! follows the line-oriented paradigm and introduces also indentation to structure expressions syntactically:
· A constraint is not interrupted by an empty line or a line containing a comment.
· Constraints on the same logical level must be indented to the same column. Inconsistent indentation must be flagged by an AsTMa! parser.
Examples of valid constraints are:
# Example 1
exist $t [ * (person) ]
# Example 2, means the same, but with confusing line breaks
exists
$t [
* (person)
]
# Example 3
forall [ $t (person) ]
=> exists [ (is-employed-at)
employee : $t ] # indentation not enforced
# Example 4
forall [ $t (person) ]
=> exists [ (is-employed-at)
employee : $t ] # indentation not enforced
and # indentation important
exists [ (is-married-with)
partner : $t ] # indentation not enforced
Examples of invalid indentations are:
# Example 5
forall [ $t (person) ]
=> exists [ (is-employed-at)
employee : $t ]
and # indentation invalid
exists [ (is-married-with) # indentation invalid
partner : $t ]
If p is an open or closed maplet pattern and $v is a structural variable, then exists $v p is a (quantified) constraint.
If a variable is not needed as it is not necessary to refer to the matched submap in other parts of the constraint, then the variable can also be omitted as in exists p. In this case an internally generated variable will be used.
If c is such a constraint, then var(c) is the set of variables mentioned in that constraint. The scope of these variables starts with their first appearance and reaches until the end of the maplet pattern.
Informally, a map M conforms to th constraint exists $v p, if there is at least one submap m << M matching p.
More formally, we define a function b which computes all variable assignments for the successful matches. In this process only the smallest submaps of M are regarded which can satisfy p.
If var(c)
is the set of variables involved and we introduce an arbitrary order on these
variables (e.g. using alphanumeric sorting), then every single binding is an
ordered tuple with one value for each variable. For $v the matched submap will be used. Technically, b (exists $v p, M) returns thus a
relation. In the following this allows us to base the semantics on relational
algebra.
Some constraints only should apply to specific submaps of the map under consideration. To specify such constraints we have to first select these submaps and check then the rest of the constraint with each of them.
For variables this means that all variables mentioned in the condition are visible also in the rest of the constraint. This defines the scope of variables.
Quantified constraints can also be combined using the usual boolean operators AND, OR and NOT although some restrictions apply to reduce the expressiveness of the language.
Example: The following is a valid quantified constraint:
forall $t [ $p (person) ]
=> exists $t [ bn @ unconstrained-scope: * ]
and
exists [ (is-employed-by)
employee : $p ]
whereas the following is not:
forall $t [ $p (person) ]
=> not forall $t [ bn @ unconstrained-scope: * ] # invalid
=> exists [ (is-employed-by)
employee : $p ]
If c is a quantified constraint, p is an open or closed maplet pattern and $v is a structural variable, then forall $v p => c is a (conditional) quantified constraint. As with existence quantified constraints, the structural variable can be omitted if it is not needed. In that case an internally generated variable will be used.
Example X: Let us consider
forall [ $c (car) ]
=> exists [ $c
bn: /ferrari/i ]
Such a conditional constraint would enforce that all submaps of the map under consideration which include a (direct or indirect) instance of a car also include the some topic such that it has base name characteristic which's value matches the word ferrari.
More formally, b is a function computing the assignment of values to variables. b (forall $v p, M) is computed as all tuples which have values for their corresponding variables in such a way, that p is successfully matched with a submap of M. In this case only the biggest submaps which match are considered.
For conditional constraints the bindings are computed as
b (f => g, M) = b (f, M) |><|r b (g, M)
whereby |><|r is the natural right join as known from the relational algebra.
Example X: In the following example we first try to match cars. For all of these matched maplets there must exist another maplet in the map matching the is-property-of association.
forall [ $t (car)
bn: /ferrari/i ]
=> exists [ (is-property-of)
owner : millionaire
object: $t ]
In these cases we will often have
to refer to topics, associations or parts thereof. This can be done with
variables as above. For every topic of type car
which we find in our map, its topic identifier will be bound to the variable $t. Our expectation is that the same topic
plays the role of the object in
an is-property-of association.
If c is a quantified constraint, p is an open or closed maplet pattern and $v is a structural variable, then forall $v p => not c is a (conditional) quantified constraint.
Intuitively, we check first all submaps matching the pattern and then have to ensure that none of these submaps match successfully with c. More formally, we define b to be:
b (f => not g, M) = b (f, M) - pivar(f) b (f => g, M)
by simply deducting those
bindings from b (f, M) which
follow f => g.
If c and d are quantified constraints, p is an open or closed maplet pattern and $v is a structural variable, then forall $v p => c and d is a (conditional) quantified constraint.
The and takes not only care that both constraints c and d have to be matched by every submap matching p. In this case all variables are within one scope coercing appearences in c and d to have the same values. This is formalized by using the natural join on the individual bindings:
b (f => c and d, M) = b (f
=> c, M) |><| = b (f => d, M)
If c and d are quantified constraints, p is an open or closed maplet pattern and $v is a structural variable, then forall $v p => c or d is a (conditional) quantified constraint.
In constrast to the conjunction variables in c and d are completely independent. The individual branches can be evaluated separately:
b (f => c or d, M) = b (f
=> c, M) union b (f => d, M)
Finally, we can build constraints. First we define that any quantified constraint is already a constraint. The validity of such constraints is defined as
exists $v p ||- M <=> b (exists $v p, M) is not empty
and if c = forall $v p then
c => d ||- M <=> b (c, M) = pivar(c) (b (c => d, M))
In the first case a map M is valid if there is at least one binding for the variables in b. In the second case all the bindings which forall $v p has in M must also be bindings for forall $v p => d whereby the projection operator pi only filters out those parts of the tuples which are relevant for c.
To build more complex constraints, we introduce
boolean constants and operators to combine constraints.
For formal reasons we define true and false as constraints. All maps satisfy true, no map satisfies false:
true ||- M for all maps M
false ||- M for all maps M
In a next step we define the not operator. It allows to cover situations in which we are interested not to occur in conforming maps. If c is a constraint, then not c is defined via:
( not c ) ||- M <=> not (
c ||- M )
If c1 and c2 are constraints then c1 and c2 is a constraint, whereby we define
c1 and c2 ||- M <=> b (c1, M) |><| = b (c2, M) is non-empty
Obviously, we use the natural
join to define the conjunction of tuple sets. This implies that all variables
have to be shared.
Also c1 or c2 is a constraint defined by
c1 or c2 ||- M <=> c1 ||- M v c2 ||- M
Here all variables are
independent from each other; the clauses can be evaluated separately.
AsTMa! also offers language constructs to define new (boolean) functions. Their primary purpose is to keep constraints concise and to reuse particular domain knowledge.
Functions can be declared at any place within an AsTMa! instance and can be referenced via their name. The names follow the pattern /\w[[:alnum:]-_]+/.
function-definition -> 'function' function-name '(' [ formal-param-list ]')' 'returns' constraint
param-list -> < ',', parameter-name [ ':' value ] >
Functions can have parameters. Formal parameters are listed within a () pair in a comma separated list and their names also follow the pattern /\w[[:alnum:]-_]+/. If for a parameter there exists a default value it is specified after the name following a colon (:). Optionally these values can be enclosed by "" to avoid ambiguity.
A function is referenced via its name followed by a list of actual parameters. The assignment of actual parameters to formal parameters is processed positional. If the actual parameter list is shorter than the formal parameter list, the defaults will be used. If there are no defaults, an error will be flagged.
FIXME: no recursion, this is more macro expansion.
Example: The following defines a function to control whether a particular topic T is used as a topic or association type:
function is-used-as-type (T)
returns
exists [ * (T) ]
OR
exists [ (T) ]
Outside the function it can be referenced like in the following:
is-used-as-type (elephant)
FIXME: Should functions only be for top-level constraints?
An AsTMa! processor must understand the predefined functions listed in Appendix ?.
In this section we extend the AsTMa! syntax
without adding more expressiveness. The added constructs serve as convenience
and help to keep real-world constraints concise.
When using the exists operator often individual situations are requested as in
exists [ $c (car)
bn: /ferrari/i ]
AND
exists [ (is-owned-by)
owner : rho
property : $c ]
If the exists keyword is not followed by a structural variable, the individual patterns can be combined into one:
exists [ $c (car)
bn: /ferrari/i
(is-owned-by)
owner : rho
property : $c ]
FIXME: does this work for forall?
What about variables?
In some case it is necessary to constrain that topics should not have particular characteristic. Following the AsTMa! core language this can be done by not allowing the topic to exhibit that characteristic:
not exists $t [ bn @ de : * ]
Effectively, we disallowed all topics to contain base names scoped to the german language. The same can also be achieved with the following abridged syntax:
forall [ ! bn @ de : * ]
The internal ! operator works for occurrences (oc), inline text occurrences (in) and basenames (bn).
If two or more negations are used then the not conditions are ANDed:
forall [ ! bn : /stopword/
! oc : m|http://www.stop.com/| ]
# is equivalent with
NOT exists $t [ bn : /stopword/ ]
AND
NOT exists $t [ oc : m|http://www.stop.com/| ]
Conversely, if we would want to constrain that a particular basename does not occur in some topics we notate:
exists [ ! bn @ de : * ]
which is equivalent with
not forall [ bn @ de : * ]
FIXME: This only can work at the outest level? Allow it?
FIXME: should we allow bn @ ! de : /..../ or oc (!tutorial) : ....?
Also regular expressions within maplet patterns can be negated as the following example shows:
forall [ bn : !/ferrari/ ]
Here we specified that all topics must never have a basename matching the brand above. As such, this is equivalent to
not exists [ bn : /ferrari/ ]
# is equivalent with
forall [ !bn: /ferrari/ ]
# is equivalent with
forall [ $bn !~ /ferrari/ ]
The internal ! operator can also be used within maplet patterns constraining associations. The following constraint
forall [ (is-some-assoc)
! role : * ]
is equivalent with
forall $a [(is-some-assoc) ]
=> not exists $a [role : * ]
Again, multiple !'s
with a pattern are resolved by AND-ing the individual conditions.
In some situations a constraint might include a particular quantity, a lower or an upper bound specifying how often a pattern should occur in a conforming map. For this purpose we extend the exists quantifier and allow a range modifier like below:
exists {4} [ (is-member-of)
group : string-quartett
member: $p
$p (musician) ]
The range modifier {4} sets a lower and the upper limit of the number of matches within a conforming map to 4. If the limits differ, then we can use the Perl-like syntax {lower,upper}:
exists {2,3} [ $c (car)
bn: /ferrari/i
(is-owned-by)
owner : rho
property : $c ]
In this example we constrain a map to have at least 2, but no more than 3 topics of type car having a base name matching ferrari case-insensitively whereby these cars are owned by rho.
Note, that with this notation variables within a pattern morph from singular to plural.
To translate
exists {m,n} [ # condition ]
into the core AsTMa! syntax, we first build constraints for the lower bound:
exists $v1 [ # condition ]
AND
exists $v2 [ # condition ]
AND
...
AND
exists $vm [ # condition ]
As it is inherent in AsTMa! that different variables MUST match different values, we have prescribed the existence of (at least) m instances matching the condition.
For the upper bound, n we add clauses to prohibit the existence of more instances:
...
AND
not exists $vm_plus_1 [ # condition ] # for m+1
AND
...
not exists $vn [ # condition ]
An AsTMa! parser must flag an error if m is bigger than n.
FIXME: add plural variables here?
In other situations it might be necessary to restrict the number elements within one maplet pattern. Application thereof are limitations on the number of members in an association or the number of particular characteristics within one topic. The extended syntax allows {n,m} constructs also there.
To constrain, for example, that a child must have exactly two parents we would notate that as
forall $a [ (are-parents-of) ]
=> exists $a ] (are-parents-of)
parent{2} : *
child : * [
which is equivalent to
forall $a [ (are-parents-of) ]
=> exists $a ] (are-parents-of)
parent : *
parent : *
child : * [
If, as another example, we want to enforce that every topic must contain at least two basenames, then
forall $t [ * ] # all topics
=> exists $t [ bn{2,} : * ]
The mapping into the AsTMa! core languages happens like in the previous section.
FIXME: This needs more elaboration.
Maplet patterns which constrain associations can be relaxed by allowing options for the members. These options are separated by the pipe character |:
exists [ (everybody-loves-someone)
lover: romeo
loved: shirley | kylie | julia ]
The above constraint is equivalent with
exists [ (everybody-loves-someone)
lover: romeo
loved: shirley ]
OR
exists [ (everybody-loves-someone)
lover: romeo
loved: kylie ]
OR
exists [ (everybody-loves-someone)
lover: romeo
loved: julia ]
If options are used for several
members, then all possible combinations are enumerated.
To use constraints as topics in associations, constraints can also be reified in the same way as topics and associations in AsTMa=:
forall [ .... ] is-reified-by constraint-0815
In some cases regular expressions are not expressive enough to constrain string values. Examples are values which have to come from a particular domain such as natural numbers, some finite set of enumeratable values or URLs which represent a downloadable resource at the moment of checking the constraint.
AsTMa! implementations MAY offer particular functions which may be used inside maplet patterns. An example might be
forall $a [ * (online-article) ]
=> suggested exists $a [ oc (download): ?is_live_url ]
After having identified all online articles in a map we check whether for these articles at least one occurrence of type download can be downloaded at validation time.
Another example is to constrain a resourceData occurrence to be a date:
forall $w [ * (world-war) ]
=> exists $w [ in (start-date) : ?is_date ]
Function references can occur anywhere where regular expressions are allowed. When invoked, they get passed the string value matching this particular pattern part (in this sense, the regular expression m|.*| will be used to bind the string, so ignoring leading and trailing blanks).
If a particular function is not implemented by an
implementation, it may flag a warning condition, but otherwise behave as if the
function returned TRUE.
An existence quantified constraint can have either of three modifiers: mandated, suggested or derived. The default is mandated.
The moderators do not change anything on the validity relation; they only indicate to the processor information how to regard constraints.
If a constraint contains an existence quantified constraint with the moderator mandated then that constraint MUST be satisfied by a conforming map. A validator will report an error if that is not the case.
Example X:
forall [ $car (ferrari) ]
=> mandated exists [ (is-property-of)
property: $car ]
Some ontologies not only constrain future maps in a formal sense but also to express current best practices. Others may be input for editing environments directing them to prompt information from the user which may be optional. As such, a map not conforming to such weaker constraints can still be valid. If a constraint contains an existence quantified constraint with the moderator suggested then a validator will not flag an error but only a warning.
Example X:
forall [ $car (ferrari) ]
=> suggested exists [ has-color
object: $car
color : red ]
If a constraint contains a derived moderator this constraint is completely ignored by a validator. In this case the constraint poses not really a constraint for a conforming map; instead it is used to express an ontological property which should be silently assumed to hold for any map. With this it is possible to enrich map with ontological knowledge before they are queried by a query processor.
Example X:
forall [ (is-property-of)
object : $car
owner : $owner ]
=> forall [ $car (ferrari) ]
=> derived exists [ $owner (millionair ) ]
In a semi-formal way we have tried to define AsTMa! as logic oriented pattern matching language. First we homogenized topics and associations introducing the concept of a maplet. It forms a uniform element which can be used to build complete maps.
Then we extended the notation of the authoring language AsTMa= by regular expressions and then defined an upper and a lower bound on how a pattern matches a maplet. Using these open and closed maplet patterns we added quantifiers which build constraints on maps on the whole, not just patterns for individual maplets.
Using normal logic operations on quantified constraints allowed us to build fairly complex constraints which provide ways to identify particular submaps first before constraining other parts of the map.
Much of the initial ideas emerged in the discussions with Jan Gylta and Daniel Eliasson when they worked on using Topic Maps for syndication. Alexander Zangerl contributed significantly to improve the clarity of the text. Erica Santosaputri triggered many changes in the language semantics when she implemented a AsTMa! to Prolog converter in a proof-of-concept study.
Depending on the provienience different people prefer different syntax for particular language elements. This section hosts these syntactical abberrations from the language definition. An AsTMa! processor might allow free mixing or might prescribe a particular skin.
Skin AsTMa
Uses exists and forall as defined in this document.
Skin XQuery
This skin uses the same syntax like AsTMa! with the following exceptions:
· Instead of exists $t p one can write SOME $t SATISFIES p
· Instead of forall $t p one can write EVERY $t SATISFIES p
Skin XAsTMa
This skin is an XML representation of an AsTMa! expression and parallels the syntax definition defined in Appendix C1 whereby using XML features. The following expression
forall [ $b
bn : /pattern/ ]
=> exists ] (some-association)
role1 : player
role2 : $b [
would be written as
<a:constraint xmlns:a="http://astma.it.bond.edu.au/ns/astma!/1.0/"
xmlns:t="http://www.topicmaps.org/xtm/1.0/"
xmlns:xlink="http://www.w3.org/1999/xlink">
<a:forall>
<a:pattern open="yes">
<t:topic id="$b">
<t:baseName>
<t:baseNameString>/pattern/</t:baseNameString>
</t:baseName>
</t:topic>
</a:pattern>
<a:exists>
<a:pattern open="no">
<t:association>
<t:instanceOf>
<t:topicRef xlink:href="#some-association"/>
</t:instanceOf>
<t:member>
<t:roleSpec>
<t:topicRef xlink:href="#role1"/>
</t:roleSpec>
<t:topicRef xlink:href="#player"/>
</t:member>
<t:member>
<t:roleSpec>
<t:topicRef xlink:href="#role2"/>
</t:roleSpec>
<t:topicRef xlink:href="#$b"/>
</t:member>
</t:association>
</a:pattern>
</a:exists>
</a:forall>
</a:constraint>
A more formal definition will follow.
Following functions must be assumed to be declared by any AsTMa! processor:
# Synopsis:
#
# Name: transitivity
#
# Property: a R b ^ b R c => a R c
#
# Invocation: transitivity (is-located-in, location, region)
#
function transitivity (relation, from, to)
returns
forall [ (relation)
from : $a
to : $b ]
=> forall [ (relation)
from : $b
to : $c ]
=> derived exists [ (relation)
from : $a
to : $c ] is-reified-by transitivity
# Synopsis
#
# Name: symmetry
#
# Property: a R b => b R a
#
# Invocation: symmetry (is-adjacent,neighbor1,neighbor2)
#
function symmetry (relation, from, to)
returns
forall [ (relation)
from : $a
to : $b ]
=> derived exists [ (relation)
from : $b
to : $a ] is-reified-by symmetry
# Synopsis
#
# Name: reflexivity
#
# Property:
#
# Invocation: forall [ $x (regions) ]
# => reflexivity (contains, container, containee, $x)
#
function reflexivity (relation, role1, role2, var)
returns
exists [ (relation)
role1 : $x
role2 : $x ]
# Synopsis
#
# Name: is-used-....
#
# Description: return true or false depending whether the given
# topic is used in a particular way or not.
#
# Invocation: is-used-as-type (car)
#
function is-used (T)
returns
is-used-as-type(T)
or
is-used-as-scope(T)
or
is-used-as-player(T)
or
is-used-as-role(T)
function is-used-as-type (T)
returns
is-used-as-topic-type (T)
or
is-used-as-assoc-type (T)
or
is-used-as-occur-type (T)
function is-used-as-topic-type (T)
returns
exists [ * (T) ]
function is-used-as-assoc-type (T)
returns
exists [ (T) ]
function is-used-as-occur-type (T)
returns
exists [ oc (T) : * ]
function is-used-as-scope (T)
return
is-used-as-bn-scope (T)
or
is-used-as-oc-scope (T)
or
is-used-as-assoc-scope (T)
function is-used-as-bn-scope (T)
return
exists [ bn @ T : * ]
function is-used-as-oc-scope (T)
return
exists [ oc @ T : * ]
function is-used-as-assoc-scope (T)
return
exists [ T @ (*) ]
function is-used-as-player (T)
return
exists [ (*)
* : T ]
function is-used-as-role (T)
return
exists [ (*)
T : * ]
AsTMa! is line oriented and follows the same syntactic rules as AsTMa=. All keywords are case sensitive, so are all quantifiers and boolean operators. Variables have to follow the pattern ($|@)\w[[:alnum:]-_]+ as described above.
Constraints must not separated by empty lines or comments. It is recommended but not necessary to indent nested constraints.
The core language has the following syntax:
constraint -> bool-constraint | function-definition
bool-constraint -> '(' bool-constraint ')' |
or-constraint
or-constraint -> and-constraint [ 'OR' bool-constraint ]
and-constraint -> simple-constraint [ 'AND' bool-constraint ]
simple-constraint -> 'forall' variable-name pattern '=>'
(
bool-constraint |
exists-constraint
) |
function-invocation
exists-constraint -> ('derived' | 'suggested' | 'mandated') 'exists' variable-name pattern
name -> /^\$\w[[:alnum]]/
function-definition -> 'function' function-name '(' [ formal-param-list ]')' 'returns' constraint
param-list -> < ',', parameter-name [ ':' default-value ] >
function-invocation -> function-name '(' [ actual-param-list ] ')'
pattern -> '[' maplet-pattern ']' |
']' maplet-pattern ']'
maplet-pattern -> is an AsTMa= expression with regular expressions, wildcards and variables
default-value -> either a string enclosed with "" or a constant
The extended syntax incorporates all syntactical extensions of the core language defined so far.
tbd
AsTMa! defines only a minimal set of functions, mainly to support patterns containing regular expressions. Implementations are free to offer (a) more functions and (b) a mechanism to allow applications to register functions.
Below is a non-normative list of suggested functions.
· is_integer (in string): returns non-zero if the only parameter is a string representation of an integer.
· is_float (in string): returns non-zero if the only parameter is a string representation of an float.
· is_uri (in string): returns non-zero if the only parameter is a valid URI.
· is_urn (in string): returns non-zero if the only parameter is a valid URN.
· is_url (in string): returns non-zero if the only parameter is a valid URL.
· is_live_url (in string): returns non-zero if the only parameter is a valid URL and can be referenced at checking time.
AsTMa! allows directives in the same way as AsTMa=. This implies that AsTMa! directives have to be introduced by a '%' character at the beginning of a line.
The precise semantics of a directive is up to the implementation. There are no mandatary directives an implementation has to provide. Typical examples are
% optimize : speed
% optimize : memory
which could influence runtime properties of the validation process to optimize for either processing speed or memory consumption.
A proposal for a TMQL, Based on OQL and an underlying Object Model of XTM
Christoph Schmidt, Feb 2001, posted to the tmql mailing list
TMQL Draft(Topic Map Query Language)
Ann Wrightson, Ontopia, BSI, 7 Nov 2000 (corrected 28 Nov 2000), http://www.y12.doe.gov/sgml/sc34/document/0186.doc
XQuery
XQuery 1.0: An XML Query Language, W3C Working Draft 20 December 2001, http://www.w3.org/TR/xquery/
Answer is just a question [of matching Topic Maps]
Rafal Ksiezyk, XML Europe 2000, proceedings, http://www.gca.org/papers/xmleurope2000/papers/s22-03.html
XML Topic Maps (XTM) 1.0
Specification, Steve Pepper, Graham Moore, Steven R. Newcomb, Michel Biezunski, http://www.topicmaps.org/xtm/
Creating semantically valid topic maps
Geir Ove GrØnmo, Ontopia, http://www.ontopia.net/topicmaps/materials/tm-schemas-paper.pdf
Topic Map Fundamentals for Knowledge Representation
H. H. Rath, in XML Topic Maps, Creating and Using Topic Maps for the Web, Ed: J. Park, 2002, Addison Wesley
SAM, The Standard Application Model for Topic Maps
Ed: L.M. Garshol, http://www.y12.doe.gov/sgml/sc34/document/0396.htm
Mastering Regular Expressions
Jeffrey E. F. Friedl; 2nd Edition July 2002, O'Reilly
Prolog for Programmers
F. Kluźniak, S. Szpakowicz, 1985, Academic Press
XML Path Language (XPath), Version 1.0
Ed: J. Clark, S. DeRose, W3C, http://www.w3.org/TR/xpath
Requirements for a Web Ontology Language
Ed: J. Heflin, R. Volz, J. Dale,
W3C, http://www.w3.org/TR/webont-req/