ISO/IEC JTC 1/SC34 N0364
ISO/IEC JTC 1/SC34
Information Technology --
Document Description and Processing Languages
Title: Datatypes for document content validation
Source: Martin Bryan, editor
Project: Document Schema Description Languages
Project editor: Martin Bryan
Status: For discussion
Action:
Date: 10 December 2002
Summary:
Distribution: SC34 and Liaisons
Refer to:
Supersedes:
Reply to:
Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman), Y-12 National Security Complex, Information Technology Services, Bldg. 9113 M.S. 8208, Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-1896 E-mail: mailto:[email protected] http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm
Ms. Sara Hafele Desautels, ISO/IEC JTC 1/SC 34 Secretariat, American National Standards Institute, 25 West 43rd Street, New York, NY 10036 Tel: +1 212 642-4937 Fax: +1 212 840-2298 E-mail: [email protected]
Datatypes for document content validation
Part 5 of the Document Schema Description Language (DSDL) defines a set
of primitive datatypes, a set of DSDL datatypes, a set of commonly required
derived datatypes and a method for defining customized datatypes that can be
used to validate the contents of specific elements and attributes within DSDL
document instances. The specification also includes a set of constraints that
can be used to limit the range of primitive datatypes and their
derivatives.
The following goals have influenced the way in which datatypes are
defined in this specification:
- Minimize the number of primitive datatypes
- Provide datatypes for constructs defined in other parts of DSDL
- Define an extensive set of derived datatypes within the standard
- Allow users to define customized datatypes based on primitive or
derived datatypes, or by extending existing customized definitions
- Allow derived datatypes to be defined by assigning limits to existing
datatypes
- Allow datatypes to be defined through matching values to patterns
- Make it possible to define valid subsets of permitted values of
datatypes
- Make it possible to restrict the use of specific values within a
datatype.
1. Primitive Datatypes
The following minimal set of primitive datatypes can be used to derive
other datatypes:
- String (string)
A contiguous sequence of parsed characters that
conform to a specified character set, or to the default ISO 10646
Universal Character Set (UCS) if no character set
has been specified during validation. Every character in
the string has a corresponding UCS code point, which is an
integer.
Constraints: fixedLength; minLength; maxLength; pattern
- Boolean (boolean)
A binary value. Can be expressed using the
strings "true" or "false" or the integers 0 and 1.
- Fixed point number (decimal)
A sequence of digits which can
optionally contain a decimal point (expressed either as a period or a comma)
separating a sequence of one or more integer digits on the left from one or
more decimal digits on the right. The integer digits can optionally be preceded
by a plus sign (+) that confirms that the number has a positive value (the
default) or a hyphen (-) that indicates that it has a negative
value.
Note: Commas may not be used as digit-group (thousands) separators within the integer digits.
Values between 1 and -1 must have an integer part of 0 (for example, 0.5 rather than .5).
Constraints:
totalDigits; decimalDigits; minInclusive; maxInclusive; minExclusive;
maxExclusive; pattern
Derived datatypes: integer; positiveInteger;
negativeInteger; nonPositiveInteger; nonNegativeInteger; long; int; short;
unsignedLong; unsignedInt; unsignedShort
[Issue 1-1: Do we need to take the
Fract and Accum fixed-point datatypes being proposed for C in ISO WDTR 18037
(http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n972.pdf) into account?]
- Floating point number (real)
A number consisting of a mantissa
expressed as a decimal followed, optionally, by the character "E" or "e",
followed by an exponent expressed as an integer. Both the mantissa and the
exponent can optionally be preceded by a plus sign (+) that confirms that the
following number has a positive value (the default) or a hyphen (-) that
indicates it has a negative value.
Constraints: minInclusive;
maxInclusive; minExclusive; maxExclusive; pattern
[Issue 1-2: Should
floating point numbers be constrained as far as maximum length is
concerned?]
[Issue 1-3: Should special patterns be defined to identify
positive and/or negative infinity?]
Derived datatypes: double
- Date and/or Time (dateTime)
A specific instance of Gregorian time,
defined using the ISO 8601 extended format CCYY-MM-DDThh:mm:ss.sss±hh:mm
where "CC" represents the century (optionally preceded by a hyphen to identify
dates preceding the Gregorian calendar start point), "YY" the year, "MM" the
month and "DD" the day. The letter "T" is the date/time separator and "hh",
"mm", "ss.sss" represent hours, minutes and seconds (including fractional
seconds following a period) respectively. Where the time is specified for a
timezone other than the Coordinated Universal Time (UTC) zone the relevant time
offset must be entered after either a plus sign (+) or a hyphen (-) to specify
the number of hours and minutes difference from UTC/GMT. The letter Z may be
used in place of the plus or minus, without following numbers, to confirm that
the default Coordinated Universal Time zone has been used to specify the time.
Both the timezone and the time data are optional.
Note: The CCYY value
may not be 0000. Months are defined using a pair of digits in the range 01 to
12. Days are defined using a pair of digits in the range 01 to 31, with certain
values being forbidden in combination with specific values for the month. Hours
(hh) are defined as a pair of digits in the range 00 to 23, minutes (mm) are
defined as a pair of digits in the range 00 to 59 and seconds are defined as a
decimal number in the range 00.000 to 59.999.
[Issue 1-4: Do we need a
mechanism to allow people to specify that the last day of the month should be
the one that applies, irrespective of its number?]
Derived datatypes:
Gregorian date (CCYY-MM-DD); Gregorian year (CCYY); recurring month (--MM--,
expressed as a pair of digits in the range 01 to 12); recurring day (----DD,
expressed as a pair of digits in the range 01 to 31); time
(Thh:mm:ss.sss±hh:mm where hour is expressed as a pair of digits in the
range 00 to 23, minute is expressed as a pair of digits in the range 00 to 59
and seconds are expressed as a decimal number in the range 00.000 to 59.999)
- Period of time (period)
Period of Gregorian time defined using the
ISO 8601 PnYnMnDTnHnMnS format, where
nY represents the number of years, nM the number of months,
nD the number of days, 'T' is the date/time separator, nH the
number of hours, nM the number of minutes and nS the number of
seconds. Any numeric value (n) can be expressed as a decimal of
arbitrary precision, provided that all the numbers and letters that would
otherwise follow it are omitted.
[Issue 1-5: The XML Schema representation of period only allows
decimals to be used to qualify seconds. Should DSDL be similarly constrained,
or should it remain more compatible with ISO 8601?]
- Hexadecimal binary sequence (hexBinary)
Sequence of hexadecimal
numbers (in the range 00 to FF) that encode a finite length sequence of binary
octets.
Constraints: fixedLength; minLength;
maxLength; pattern
- Base64 binary sequence (base64Binary)
A binary stream encoded using the Base64
Content-Transfer-Encoding defined in Section 6.8 of IETF RFC 2045.
Constraints: fixedLength; minLength; maxLength; pattern
[Issue 1-6: Should the language-independent datatypes defined in ISO
11404 (http://std.dkuug.dk/jtc1/sc22/wg11/docs/iso11404.pdf) that have not been
incorporated into XML Schema Part 2 be considered, given that ISO 11404 is
currently under review by SC22, having first been published in 1996?
NB: ISO 11404 primitives are:
primitive-type = boolean-type | state-type | enumerated-type |
character-type | ordinal-type | time-type | integer-type |
rational-type | scaled-type | real-type | complex-type | void-type
The types that specifically need to be considered for
inclusion are "state-type", "ordinal-type", "scaled-type" and "void-type". The
"boolean-type", "character-type", "time-type" and "real-type" can be equated to
existing definitions. Whether "complex-type" and "enumerated-type" are true
datatypes or expressions of ways in which datatypes can be created from basic
types is questionable.
Other ISO 11404 types of possible interest include:
generated-type = pointer-type | procedure-type | choice-type |
aggregate-type
aggregate-type = record-type | set-type | sequence-type | bag-type |
array-type | table-type]
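The lexical forms described above for the decimal and real primitives can be sketched as regular expressions. The following Python fragment is an illustration only: the names DECIMAL, REAL, is_decimal and is_real are hypothetical, and Python's regular expression syntax is assumed here because this draft does not yet define a pattern grammar.

```python
import re

# Assumed reading of the decimal primitive: optional sign, one or more
# integer digits, then optionally a period or comma followed by one or
# more decimal digits. The comma acts as a decimal point, never as a
# thousands separator.
DECIMAL = re.compile(r"[+-]?[0-9]+([.,][0-9]+)?")

# Assumed reading of the real primitive: a decimal mantissa, optionally
# followed by "E" or "e" and an optionally signed integer exponent.
REAL = re.compile(r"[+-]?[0-9]+([.,][0-9]+)?([Ee][+-]?[0-9]+)?")


def is_decimal(text: str) -> bool:
    """True if the whole string has the decimal lexical form."""
    return DECIMAL.fullmatch(text) is not None


def is_real(text: str) -> bool:
    """True if the whole string has the real lexical form."""
    return REAL.fullmatch(text) is not None
```

Under these assumptions "-0,5" and "1.5e-3" are accepted, while ".5" is rejected because values between 1 and -1 need an integer part of 0.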
2. DSDL Datatypes
The following datatypes are used to identify constructs that conform to
DSDL constraints:
NB: This list will need to be revised in the light of the development
of DSDL.
[Issue 2-1: At what point is datatype validation applied? Do we really
need all of these?]
- String with no whitespace that conforms to DSDL naming rules
(localName)
<datatype
name="localName">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName"/>
</constraints>
</datatype>
- Name where namespace prefix has been replaced by URI mapped to
namespace prefix (qualifiedName)
<datatype name="qualifiedName">
<constraints
base="string">
<pattern
asDefinedIn="http://www.iso.ch/jtc1/sc34/ISO19757/Part2.dsdl#name"/>
</constraints>
</datatype>
[Issue 2-2: Is
qualifiedName really a datatype? Am I right in recording it as the URI+LocalName
as per Part 2 rather than Prefix+LocalName as is done in XML?]
[Issue 2-3:
Is Prefix also required to record the namespace prefix associated with the
URI?]
- String with no whitespace containing name characters without
restriction on Letter for first character (NMTOKEN)
<datatype name="NMTOKEN">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#NT-Nmtoken"/>
</constraints>
</datatype>
[Issue 2-4: Should we
continue to use SGML-style names that are all caps, or require the use of
nameToken, etc, to be more conformant with other datatype names?]
- Tokenized string containing only valid name tokens
(NMTOKENS)
<datatype
name="NMTOKENS">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#NT-Nmtokens"/>
</constraints>
</datatype>
- Tokenized string containing any sequence of characters other than
spaces, tabs and control characters (tokenizedString)
<datatype name="tokenizedString">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#NT-Enumeration"/>
</constraints>
</datatype>
[Issue 2-5: Do we need to
allow for tokenized strings that contain punctuation other than that which is
valid in names or should we stick to the restricted definition of XML? For
example, tokens such as name+, name? and M&S would not be valid name tokens
but could be valid within a tokenized string.]
- Name used as unique identifier (ID) [May need to be generalized to
cope with DSDL keys.]
<datatype
name="ID">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#id"/>
</constraints>
</datatype>
- Name used to reference a unique identifier (IDREF) [May need to be
generalized to cope with DSDL key references.]
<datatype name="IDREF">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#idref"/>
</constraints>
</datatype>
- Tokenized string containing names that will be used to reference
unique identifiers (IDREFS)
<datatype name="IDREFS">
<constraints
base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#idref"/>
</constraints>
</datatype>
[Issue 2-8:
There is no distinguishing differentiator between the definitions of IDREF and
IDREFS in the XML specification. Is this acceptable: i.e. must their patterns
have different pointers?]
- Name used to identify formally defined DSDL entity
(ENTITY)
<datatype
name="ENTITY">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#entname"/>
</constraints>
</datatype>
- Tokenized string containing names that identify formally defined DSDL
entities (ENTITIES)
<datatype
name="ENTITIES">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#entname"/>
</constraints>
</datatype>
[Issue 2-9:
There is no distinguishing differentiator between the definitions of ENTITY and
ENTITIES in the XML specification. Is this acceptable: i.e. must their patterns
have different pointers?]
- String containing a name that identifies a formally defined DSDL
notation (NOTATION)
<datatype
name="NOTATION">
<constraints base="string">
<pattern
asDefinedIn="http://www.w3.org/TR/1998/REC-xml-19980210/#NT-NotationDecl"/>
</constraints>
</datatype>
[Issue 2-10:
Neither NotationDecl nor NotationType is quite correct.
NotationType allows name groups, which is OK for permitted sets of attribute
values, but not for the datatype, which must be to a single notation.
NotationDecl is a pointer to a notation declaration, only the Name part of
which is relevant to the datatype definition. But do we really want to point
simply to the definition of a Name?]
- String containing a name that identifies a primitive, derived or
customized DSDL datatype (datatypeName)
<datatype name="datatypeName">
<constraints
base="string">
<pattern
asDefinedIn="#DatatypeName"/>
</constraints>
</datatype>
[Issue 2-6: What other types of DSDL data types will there be?]
[Issue 2-7: Should we include DSSSL types such as quantity, pair, real
(if floating point is not defined in an acceptable form for DSSSL) and number
(if not directly mappable to the fixed point number construct)?]
3. Commonly Required Derived Datatypes
The following commonly used datatypes can be derived from primitive
datatypes:
- Resource Identifier (URI)
A string containing a sequence of valid characters that forms a Uniform
Resource Identifier (URI) as defined in IETF RFC 2396, as amended by IETF RFC
2732, or an Internationalized Resource Identifier (IRI) once that specification
has been formally approved as an IETF standard. Values can be absolute or
relative, and may have an optional fragment identifier.
<datatype name="URI">
<constraints base="string">
<choice>
<pattern
asDefinedIn="http://www.ietf.org/rfc/rfc2396.txt
http://www.ietf.org/rfc/rfc2732.txt"/>
<pattern
asDefinedIn="http://www.ietf.org/rfc/rfc????.txt"/>
</choice>
</constraints>
</datatype>
- Integer
Fixed point number with no decimal point
<datatype name="integer">
<constraints base="decimal">
<decimalDigits>0</decimalDigits>
</constraints>
</datatype>
[Issue 3-1: Do
we need the different bit length and positive/negative variants for integers to
be defined as separate datatypes as shown in the following entries?]
- Positive integer (positiveInteger)
Integer greater than
0
<datatype
name="positiveInteger">
<constraints
base="integer">
<minExclusive>0</minExclusive>
</constraints>
</datatype>
- Negative integer (negativeInteger)
Integer less than 0
<datatype
name="negativeInteger">
<constraints
base="integer">
<maxExclusive>0</maxExclusive>
</constraints>
</datatype>
- Non-positive integer (nonPositiveInteger)
Integer less than or
equal to 0
<datatype
name="nonPositiveInteger">
<constraints
base="integer">
<maxInclusive>0</maxInclusive>
</constraints>
</datatype>
- Non-negative integer (nonNegativeInteger)
Integer greater than or equal
to 0
<datatype
name="nonNegativeInteger">
<constraints
base="integer">
<minInclusive>0</minInclusive>
</constraints>
</datatype>
- 64-bit integer (long)
Integer in range -9223372036854775808 to
9223372036854775807
<datatype
name="long">
<constraints base="integer">
<minInclusive>-9223372036854775808</minInclusive>
<maxInclusive>9223372036854775807</maxInclusive>
</constraints>
</datatype>
- 32-bit integer (int)
Integer in range -2147483648 to
2147483647
<datatype
name="int">
<constraints base="integer">
<minInclusive>-2147483648</minInclusive>
<maxInclusive>2147483647</maxInclusive>
</constraints>
</datatype>
- 16-bit integer (short)
Integer in range -32768 to 32767
<datatype name="short">
<constraints base="integer">
<minInclusive>-32768</minInclusive>
<maxInclusive>32767</maxInclusive>
</constraints>
</datatype>
[Issue
3-2: Do we need 8-bit integers (bytes) as well, given that DSDL is
based on a 16-bit syntax, UTF16?]
- 64-bit positive integer (unsignedLong)
Integer in range 0 to
18446744073709551615
<datatype
name="unsignedLong">
<constraints
base="integer">
<minInclusive>0</minInclusive>
<maxInclusive>18446744073709551615</maxInclusive>
</constraints>
</datatype>
- 32-bit positive integer (unsignedInt)
Integer in range 0 to
4294967295
<datatype
name="unsignedInt">
<constraints
base="integer">
<minInclusive>0</minInclusive>
<maxInclusive>4294967295</maxInclusive>
</constraints>
</datatype>
- 16-bit positive integer (unsignedShort)
Integer in range 0 to
65535
<datatype
name="unsignedShort">
<constraints
base="integer">
<minInclusive>0</minInclusive>
<maxInclusive>65535</maxInclusive>
</constraints>
</datatype>
[Issue 3-3:
Do we need 8-bit bytes as well, given that DSDL is based on a 16-bit syntax,
UTF16?]
- 64-bit floating point number (double)
IEEE double-precision 64-bit
floating point number conforming to IEEE 754-1985.
[Issue 3-4: Do we need separate datatypes for 16 and 32
bit floating numbers?]
<datatype
name="double">
<constraints base="real">
<minExclusive>-1e971</minExclusive>
<maxExclusive>1e971</maxExclusive>
</constraints>
</datatype>
[Issue 3-5: How can
we constrain the exponent to be in the range -1075 to 970, and ensure the
mantissa does not exceed 2^53?]
- Specific Time (time)
Thh:mm:ss.sss±hh:mm subset of the
dateTime primitive
<datatype
name="time">
<constraints base="dateTime">
<pattern>T[0-2][0-9]:[0-5][0-9]:[0-5][0-9](.[0-9]([0-9]([0-9])?)?)?
([+-][0-5][0-9]:("00"|"15"|"30"|"45"))?</pattern>
</constraints>
</datatype>
[Issue 3-6: How can
the constraints on the values of time be accurately expressed? For example, the
pattern suggested above does not restrict hours to the range 00 to 23.]
- Gregorian Date (gDate)
CCYY-MM-DD subset of the dateTime
primitive
<datatype
name="gDate">
<constraints
base="dateTime">
<pattern>[0-9](4)-[0-1][0-9]-[0-3][0-9]</pattern>
</constraints>
</datatype>
[Issue 3-7: How can
the constraints on the values of date be accurately expressed?]
- Gregorian Recurring Month (gMonth)
Subset of the dateTime
primitive where CCYY and DD are replaced by - to give --MM--, with no time
specified.
<datatype
name="gMonth">
<constraints
base="dateTime">
<pattern>--[0-1][0-9]--</pattern>
</constraints>
</datatype>
- Gregorian Recurring Day Of Month (gDay)
Subset of the dateTime
primitive where CCYY and MM are replaced by - to give ----DD, with no time
specified.
<datatype
name="gDay">
<constraints
base="dateTime">
<pattern>----[0-3][0-9]</pattern>
</constraints>
</datatype>
[Issue 3-8: Should
Day Within Week be a derived datatype?]
- Gregorian Recurring Day in Month (gMonthDay)
Subset of the
dateTime primitive where CCYY is replaced by - to give --MM-DD, with no
time specified.
<datatype
name="gMonthDay">
<constraints
base="dateTime">
<pattern>--[0-1][0-9]-[0-3][0-9]</pattern>
</constraints>
</datatype>
- Gregorian Month In Year (gYearMonth)
Subset of the dateTime
primitive consisting solely of CCYY-MM.
<datatype name="gYearMonth">
<constraints base="dateTime">
<pattern>[0-9](4)-[0-1][0-9]</pattern>
</constraints>
</datatype>
- Gregorian Duration (gDuration)
Start and end dates expressed as
two dateTime primitives separated by a slash (/)
(CCYY-MM-DDThh:mm:ss.sss±hh:mm/CCYY-MM-DDThh:mm:ss.sss±hh:mm)
[Issue
3-9: How can this constraint be expressed? (Should duration be a primitive?)]
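As one possible answer to Issues 3-6 and 3-7, the value constraints on time and date can be sketched in Python. This is an assumption-laden illustration, not a proposed DSDL pattern: the names TIME, GDATE, is_time and is_gdate are hypothetical, Python regular expression syntax is assumed, the time zone offset is allowed any minute value from 00 to 59, and years before 0001 are not handled.

```python
import re
from datetime import date

# Hours limited to 00-23 by alternation, minutes and seconds to 00-59,
# up to three fractional digits, then an optional "Z" or +hh:mm / -hh:mm.
TIME = re.compile(
    r"T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?"
    r"(Z|[+-]([01][0-9]|2[0-3]):[0-5][0-9])?"
)

# CCYY-MM-DD shape only; the calendar rules are checked separately.
GDATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_time(text: str) -> bool:
    """True if the whole string matches the restricted time pattern."""
    return TIME.fullmatch(text) is not None


def is_gdate(text: str) -> bool:
    """True if the string is CCYY-MM-DD and names a real calendar day."""
    match = GDATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups())
    try:
        # date() rejects year 0, month 13, 30 February and similar values.
        date(year, month, day)
    except ValueError:
        return False
    return True
```

The sketch suggests that hour limits can be expressed in a pattern alone, whereas day-in-month limits are easier to express as a calendar check than as a pattern.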
[Issue 3-10: Do we need to define all the alternative ISO 8601 date
variants (e.g. without hyphens and colons, etc), or will the limited dateTime
primitive definitions shown above be sufficient?]
[Issue 3-11: What other options do we need to allow for? Should ISO 639
language be included (if so what about the IETF rules re extensions)? What
about currency as used in XForms, in support of ISO 4217 (or is this simply an
application of decimal)?]
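The numeric limits quoted for the integer datatypes in this section follow from their bit widths, and can be checked with a short calculation. The Python sketch below is illustrative; the function names are hypothetical, and it assumes that Python's float is an IEEE 754 64-bit double, which holds on mainstream platforms. Its last part bears on Issue 3-5: a bound written with a decimal exponent of 971 lies outside the range of a 64-bit double, whose largest finite value is roughly 1.8e308.

```python
import math
import sys


def signed_range(bits: int) -> tuple:
    """Smallest and largest two's-complement integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits: int) -> tuple:
    """Smallest and largest unsigned integer of the given width."""
    return 0, 2 ** bits - 1


# Largest finite double: a 53-bit mantissa scaled by a power of two.
LARGEST_DOUBLE = float((2 ** 53 - 1) * 2 ** 971)


def is_finite_double(lexical: str) -> bool:
    """True if the literal converts to a finite 64-bit float."""
    return math.isfinite(float(lexical))


print(signed_range(64))
print(unsigned_range(16))
print(LARGEST_DOUBLE == sys.float_info.max)
```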
4. Constraining Properties
The following properties can be used to constrain datatypes that are
derived from strings:
- fixed length (fixedLength)
Integer defining number of UCS
characters that must be contained in a valid string
or
- maximum length (maxLength)
Integer defining the maximum number of
UCS characters that can occur in a valid string
- minimum length (minLength)
Integer defining the minimum number of
UCS characters that must occur in a valid string
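Because these lengths are counted in UCS characters rather than in bytes or UTF-16 code units, an implementation needs to count code points. A minimal Python sketch follows; the function name and keyword parameters are hypothetical, and it assumes that fixedLength, when given, is used instead of the minimum and maximum.

```python
def check_length(value, fixed_length=None, min_length=None, max_length=None):
    """Check a string against the length properties described above."""
    # In Python 3, len() on a str counts Unicode code points, so a
    # character outside the Basic Multilingual Plane counts once.
    count = len(value)
    if fixed_length is not None:
        return count == fixed_length
    if min_length is not None and count < min_length:
        return False
    if max_length is not None and count > max_length:
        return False
    return True
```

For example, a string holding the single character U+1D11E satisfies a maxLength of 1 under this reading, although it occupies two 16-bit code units in UTF-16.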
The following properties can be used to constrain datatypes that are
derived from fixed point and floating point numbers:
- maximum value: inclusive (maxInclusive)
The highest value that an
entered number is permitted to have
- maximum value: exclusive (maxExclusive)
A value that the entered
number must be less than
- minimum value: inclusive (minInclusive)
The lowest value that an
entered number is permitted to have
- minimum value: exclusive (minExclusive)
A value that the entered
number must be greater than
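The difference between the inclusive and exclusive forms can be shown with a small Python sketch. The function name and parameters are hypothetical, values are passed as strings that use a period as the decimal point, and Python's decimal module is assumed so that fixed point values are compared exactly.

```python
from decimal import Decimal


def check_range(value, min_inclusive=None, max_inclusive=None,
                min_exclusive=None, max_exclusive=None):
    """Check a numeric literal against the four range properties."""
    number = Decimal(value)
    if min_inclusive is not None and number < Decimal(min_inclusive):
        return False
    if max_inclusive is not None and number > Decimal(max_inclusive):
        return False
    # Exclusive bounds also reject a value equal to the bound itself.
    if min_exclusive is not None and number <= Decimal(min_exclusive):
        return False
    if max_exclusive is not None and number >= Decimal(max_exclusive):
        return False
    return True
```

With these definitions the value 0 satisfies a minInclusive of 0 but not a minExclusive of 0, which is the distinction between nonNegativeInteger and positiveInteger.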
The following additional properties can be used to constrain fixed point
numbers:
- maximum number of digits, including any decimal point (totalDigits)
- maximum number of digits that can follow the decimal point
(decimalDigits)
The following properties can be used to constrain any datatype:
- validation pattern (pattern)
[Issue 4.2: Should more than one pattern grammar be allowed for by
applying a pointer to the relevant grammar at some higher level in the
syntax?]
5. Deriving Customized Datatypes
Each type defined in a DSDL schema must be
assigned a unique identifier as its name, which must not be identical to any of
the names assigned to primitive or derived datatypes defined in this standard,
or to the name of any customized datatype imported into the schema. The
datatype name must conform to the rules for defining DSDL names.
[Issue 5-1: Should datatype names be DSDL unique identifiers or keys?
How can we ensure that imported datatypes do not share the same name?]
Each set of constraints defined for a datatype shall be based on either
a primitive datatype, a derived datatype defined in this standard or a
customized datatype defined in, or imported into, the same schema. The unique
identifier of the base datatype must be specified as an attribute of the
constraint element. Where more than one constraint element is defined for a
datatype the constraints are applied sequentially to create a "compound
datatype" made up of components of different datatypes.
Note: It is an error if two consecutive constraint elements have the
same base type, or have base types derived from the same primitive
datatype.
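The rule in this note can be checked by resolving each base type to its primitive and comparing neighbours. The Python sketch below is illustrative: BASE_OF lists only some of the derivations given in this document, and the function names are hypothetical.

```python
# Base of each derived datatype, following the definitions in this
# document. Primitive datatypes have no entry. The table is partial.
BASE_OF = {
    "integer": "decimal",
    "positiveInteger": "integer",
    "nonNegativeInteger": "integer",
    "long": "integer",
    "int": "integer",
    "short": "integer",
    "double": "real",
    "time": "dateTime",
    "gDate": "dateTime",
    "URI": "string",
    "localName": "string",
    "NMTOKEN": "string",
}


def primitive_of(name):
    """Follow the derivation chain until a primitive is reached."""
    while name in BASE_OF:
        name = BASE_OF[name]
    return name


def consecutive_bases_ok(bases):
    """False if two neighbouring constraint elements share a primitive.

    Two identical base types always share a primitive, so this one
    comparison covers both halves of the rule.
    """
    primitives = [primitive_of(base) for base in bases]
    return all(a != b for a, b in zip(primitives, primitives[1:]))
```

A sequence of bases such as string, integer, string passes this check, while int followed by decimal does not, because both resolve to decimal.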
[Issue 5-6: Do we need elements around sets of constraints to allow the
definition of structured constructs, such as arrays and tables? If so, do we
also need sets, bags and repeatable sequences to be definable?]
Constraining properties shall be entered as subelements of the
constraints definition using elements whose name is shown in brackets after the
name of the property. Only those properties relevant for the primitive datatype
from which the datatype is derived may be defined. The only property that can
be specified more than once in each set of constraints is the pattern property,
which can be duplicated as many times as necessary to indicate all the known
patterns for the datatype. (Patterns are checked for in the order listed in the
instance.)
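The ordered checking of repeated patterns can be sketched as follows. This Python fragment assumes that a value is acceptable when it matches at least one of the listed patterns, and it uses Python regular expression syntax as a stand-in for the pattern grammar that this draft leaves open; the function name is hypothetical.

```python
import re


def first_matching_pattern(value, patterns):
    """Return the index of the first pattern matching the whole value.

    Patterns are tried in the order listed. None is returned when no
    pattern matches.
    """
    for index, pattern in enumerate(patterns):
        if re.fullmatch(pattern, value):
            return index
    return None
```

Returning the index rather than a plain true or false lets a caller report which of the known patterns a value was recognized by.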
[Issue 5-2: How will DSDL allow us to manage the fact that models of the
constraints element will be dependent on the value given to the base attribute,
which may in its turn be derived from a customized datatype rather than a
primitive?]
[Issue 5-3a: How will patterns be specified? (Is this a good name, given
its application in Part 2, or should it be renamed datatypePattern?) How can we
restrict the number of times a part of a pattern is repeated? Should users be
able to use the asDefinedIn attribute to reference external definitions of
datatypes?]
Where only specified values are to be permitted a list of validValues
may be specified. These values can be assigned a value for the rule attribute
of either "no-others" to indicate that only specified values are valid, or of
"with-others" to indicate that the list of validValues only indicates
currently known values, which the user can extend by entering any other value
that conforms to the constraints assigned to the datatype. Where no value is
specified for the rule attribute the default value of "no-others" applies.
Optionally a statement of meaning can be assigned to each value. The
contents of all values entered within a single accept element must be unique.
Where the datatype of a valid value differs from that assigned as
the base datatype of the containing constraints element, the optional datatype
attribute must be used to indicate the datatype of the entered value.
Note: The mixing of datatypes within lists is deprecated, even though
it has been enabled.
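The effect of the rule attribute can be sketched in Python. The names accepts, values_unique and conforms are hypothetical; conforms stands for whatever check applies the other constraints assigned to the datatype, and listed values are assumed to be acceptable without further checking.

```python
def values_unique(values):
    """True if no value is listed more than once."""
    return len(set(values)) == len(values)


def accepts(value, valid_values, rule="no-others", conforms=lambda v: True):
    """Apply a validValues list under the given rule.

    "no-others" (the default) admits only listed values. "with-others"
    also admits any unlisted value that satisfies the other constraints.
    """
    if rule not in ("no-others", "with-others"):
        raise ValueError("unknown rule: " + rule)
    if value in valid_values:
        return True
    return rule == "with-others" and conforms(value)
```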
The invalidValues element can be used to identify specific values that
may not be used as valid entries. Optionally a statement of the reason why the
value is invalid can be assigned to each value. The contents of all values
entered within a single reject element must be unique.
[Issue 5-4: Should datatype be allowed to have structured values? (NB:
XML Schema defines permitted enumeration values as attributes, which allows the
element itself to consist solely of annotation.)]
[Issue 5-5: Given that RelaxNG has exceptPatterns, do we also need the
ability to inhibit values as part of the datatype definition? What is the
relationship between patterns declared to be invalid for the whole datatype and
those declared to be invalid for a specific element?]
The following example shows how customized datatypes can be expressed
using the elements defined in Appendix A.
<datatype
name="type-a">
<constraints base="string">
<fixedLength>3</fixedLength>
<pattern>[a-fA-F(3)]</pattern>
<validValues
rule="no-others">
<accept>
<value>abc</value>
<meaning>Latin
alphabet</meaning>
</accept>
<accept>
<value>def</value>
<meaning>Braille
alphabet</meaning>
</accept>
...
</validValues>
<invalidValues>
<reject>
<value>bad</value>
<reason>Can be
confused with bed.</reason>
</reject>
</invalidValues>
</constraints>
</datatype>
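How a validator might apply the type-a definition above can be sketched in Python. This is one possible reading, not a defined processing model: the dictionary layout and the validate function are hypothetical, the pattern is assumed to mean three letters in the range a to f in either case and is written in Python regular expression syntax, and only the two accepted values actually shown in the example are listed.

```python
import re

# Hypothetical in-memory form of the type-a definition.
TYPE_A = {
    "fixed_length": 3,
    "pattern": r"[a-fA-F]{3}",  # assumed meaning of the example pattern
    "valid_values": {"abc", "def"},
    "rule": "no-others",
    "invalid_values": {"bad": "Can be confused with bed."},
}


def validate(value, spec):
    """Return (is_valid, reason); reason is None for a valid value."""
    if len(value) != spec["fixed_length"]:
        return False, "wrong length"
    if re.fullmatch(spec["pattern"], value) is None:
        return False, "pattern not matched"
    # Rejected values are reported with the reason recorded for them.
    if value in spec["invalid_values"]:
        return False, spec["invalid_values"][value]
    if spec["rule"] == "no-others" and value not in spec["valid_values"]:
        return False, "not a listed value"
    return True, None


print(validate("abc", TYPE_A))
print(validate("bad", TYPE_A))
```

In this sketch the invalidValues check runs before the validValues check so that a rejected value is reported with its stated reason; the draft does not prescribe an order.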
Appendix A: DSDL Schema for the Specification of Datatypes
The following (yet-to-be-validated) DSDL schema can be used to validate
datatype definitions:
[Issue A.1: Can we validly use DSDL datatypes to define the
specification for DSDL datatypes?]
<grammar
datatypeLibrary="http://www.iso.ch/jtc1/sc34/iso19757/Part5.dsdl"
ns="http://relaxng.org/ns/structure/1.0"
xmlns="http://relaxng.org/ns/structure/1.0">
<start>
<ref name="datatypeLibrary"/>
</start>
<element name="datatypeLibrary">
<optional>
<attribute name="name">
<data type="ID"/>
</attribute>
</optional>
<choice>
<group>
<oneOrMore>
<ref name="importedDefinitions"/>
</oneOrMore>
<zeroOrMore>
<ref name="datatype"/>
</zeroOrMore>
</group>
<oneOrMore>
<ref name="datatype"/>
</oneOrMore>
</choice>
</element>
<element name="importedDefinitions">
<attribute name="source">
<data type="URI"/>
</attribute>
<empty/>
</element>
<element name="datatype">
<attribute name="name">
<data type="localName"/>
</attribute>
<oneOrMore>
<ref name="constraints"/>
</oneOrMore>
</element>
<element name="constraints">
<attribute name="base">
<data type="datatypeName"/>
</attribute>
<zeroOrMore>
<choice>
<!-- Only the pattern element may be repeated -->
<ref name="pattern"/>
<ref name="validValues"/>
<ref name="invalidValues"/>
<ref name="minExclusive"/>
<ref name="maxExclusive"/>
<ref name="minInclusive"/>
<ref name="maxInclusive"/>
<ref name="fixedLength"/>
<ref name="minLength"/>
<ref name="maxLength"/>
<ref name="totalDigits"/>
<ref name="decimalDigits"/>
</choice>
</zeroOrMore>
</element>
<element name="pattern">
<optional>
<attribute name="asDefinedIn">
<data type="URI"/>
</attribute>
<attribute name="notation">
<data type="NOTATION"/>
</attribute>
</optional>
<data type="string"/>
</element>
<element name="validValues">
<optional>
<attribute name="rule">
<data type="string">with-others</data>
<data type="string">no-others</data>
</attribute>
</optional>
<ref name="accept"/>
</element>
<element name="value">
<data type="string"/>
<optional>
<attribute name="datatype">
<data type="datatypeName"/>
</attribute>
</optional>
</element>
<element name="accept">
<oneOrMore>
<ref name="value"/>
<zeroOrMore>
<element name="meaning">
<oneOrMore>
<anyName>
<except>
<nsName/>
</except>
</anyName>
<text/>
</oneOrMore>
</element>
</zeroOrMore>
</oneOrMore>
</element>
<element name="invalidValues">
<ref name="reject"/>
</element>
<element name="reject">
<oneOrMore>
<ref name="value"/>
<zeroOrMore>
<element name="reason">
<oneOrMore>
<anyName>
<except>
<nsName/>
</except>
</anyName>
<text/>
</oneOrMore>
</element>
</zeroOrMore>
</oneOrMore>
</element>
<element name="fixedLength">
<data type="integer"/>
</element>
<element name="minLength">
<data type="integer"/>
</element>
<element name="maxLength">
<data type="integer"/>
</element>
<element name="minExclusive">
<data type="integer"/>
</element>
<element name="maxExclusive">
<data type="integer"/>
</element>
<element name="minInclusive">
<data type="integer"/>
</element>
<element name="maxInclusive">
<data type="integer"/>
</element>
<element name="totalDigits">
<data type="integer"/>
</element>
<element name="decimalDigits">
<data type="integer"/>
</element>
</grammar>