Understanding DTDs
The XHTML document type definitions are stored on the W3C Web site, where you can access the following DTD files:
XHTML 1.0 Frameset
XHTML 1.0 Transitional
XHTML 1.0 Strict
XHTML 1.1 DTD Modules
You can review the code on these pages to better understand the rules of each DTD. Let's examine how DTDs can be used to declare the valid elements and attributes of a document.
Declaring a Document Element
In a valid document, every single element used in the document must be declared in the DTD. An element type declaration specifies the name of the element and indicates what kind of content the element can contain. It can even specify the order in which elements appear in the document. The syntax of an element declaration is:
<!ELEMENT element content-model>
where element is the name of the element. The element name is case-sensitive. The content-model specifies what type of content the element contains. Generally, elements contain either text or other elements. DTDs define five different types of element content:
- Any elements. There are no restrictions on the element's content.
- Empty elements. The element cannot store any content.
- Character data. The element can only contain a text string.
- Elements. The element can only contain child elements.
- Mixed. The element contains both a text string and child elements.
ANY Content
The most general type of content model is ANY, which allows the declared element to store any type of content. The syntax for declaring that an element can contain anything is:
<!ELEMENT element ANY>
EMPTY Content
The EMPTY content model is reserved for elements that store no content. The syntax for an empty element declaration is:
<!ELEMENT element EMPTY>
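As a brief illustration of the ANY and EMPTY content models, the following sketch declares one element of each kind in a document's internal DTD. The element names notes and divider are hypothetical and chosen only for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE notes [
  <!-- notes may hold text, any declared child elements, or nothing -->
  <!ELEMENT notes ANY>
  <!-- divider may hold no content at all -->
  <!ELEMENT divider EMPTY>
]>
<notes>Some text <divider /> and some more text.</notes>
```

In the XHTML 1.0 DTDs, elements such as br and img are declared as EMPTY in the same way.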
Character Content
Elements that can store only text strings are declared as follows:
<!ELEMENT element (#PCDATA)>
The keyword #PCDATA stands for "parsed character data." Parsed character data is any well-formed text string. Most text strings are well-formed, except those that contain the symbols XML reserves for markup: "<" and "&" must be written as the entity references &lt; and &amp;, and ">" must be escaped when it follows "]]". Note that child elements are not allowed with this declaration.
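A minimal sketch of a character-content declaration follows. The element name title and its text are assumptions made for this example; note how the ampersand in the text is written as an entity reference.

```xml
<?xml version="1.0"?>
<!DOCTYPE title [
  <!-- title may contain only text, never child elements -->
  <!ELEMENT title (#PCDATA)>
]>
<title>Research &amp; Development</title>
```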
Element Content
The most complicated element declaration is for elements that contain child elements. The syntax for declaring that an element contains only child elements is:
<!ELEMENT element (child elements)>
where child elements is a list of child elements. The simplest content model would consist of a single child associated with a parent element. If an element can contain several child elements, they can be listed either as a sequence or as a set of choices. The syntax of a sequence is:
<!ELEMENT element (child1, child2, ...)>
where child1, child2, and so forth represent the sequence of child elements within the parent element.
The order of the child elements in the document must match the order defined in the element declaration. The other way of listing child elements, choice, presents a set of possible child elements. The syntax of the choice model is:
<!ELEMENT element (child1 | child2 | ...)>
where child1, child2, and so forth are the possible child elements of the parent element. With this form the parent element must contain exactly one of the listed children, so unlike the sequence model, the choice model imposes no order on them.
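The sequence and choice forms can be sketched together as follows. All element names here (customer, name, contact, phone, email) are hypothetical, and the sample data is invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE customer [
  <!-- Sequence: name must come first, followed by contact -->
  <!ELEMENT customer (name, contact)>
  <!-- Choice: a contact holds either a phone or an email -->
  <!ELEMENT contact (phone | email)>
  <!ELEMENT name  (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customer>
  <name>Ana Ruiz</name>
  <contact>
    <email>ana@example.com</email>
  </contact>
</customer>
```

Under these declarations, placing contact before name would make the document invalid, as would placing both a phone and an email inside contact.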
Mixed Content
As the name implies, an element with mixed content contains both character data and child elements. The syntax for declaring mixed content is:
<!ELEMENT element (#PCDATA | child1 | child2 | ...)*>
This means that the parent element can contain character data and any number of the specified child elements, in any order, or it could contain no content at all.
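A short sketch of a mixed-content declaration is shown below. The element names para, em, and strong are assumptions for this example only.

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!-- para may interleave text with em and strong in any order -->
  <!ELEMENT para (#PCDATA | em | strong)*>
  <!ELEMENT em     (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
]>
<para>Text may <em>mix</em> freely with <strong>child elements</strong>.</para>
```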
Declaring Element Attributes
In a DTD you also declare the attributes associated with each element. To do this you must add an attribute-list declaration to the document's DTD. The attribute-list declaration accomplishes the following:
- Lists the names of all of the attributes associated with a specific element
- Specifies the data type of the attribute
- Indicates whether the attribute is required or optional
- If necessary, provides a default value for the attribute
The syntax for declaring a list of attributes is:
<!ATTLIST element attribute1 type1 default1
attribute2 type2 default2
attribute3 type3 default3 ... >
where element is the name of the element associated with the attributes, attribute is the name of an attribute, type is the attribute's data type, and default indicates whether the attribute is required or implied, and whether it has a fixed value or a default value.
In actual practice, declarations for elements with multiple attributes are easier to interpret if the attribute declarations are defined separately rather than in one long declaration. An equivalent form in the DTD would be:
<!ATTLIST element attribute1 type1 default1>
<!ATTLIST element attribute2 type2 default2>
<!ATTLIST element attribute3 type3 default3>...
The XML parser will combine the different statements into a single attribute declaration. If the processor encounters more than one declaration for the same attribute, it will use the first and ignore any later ones. Attribute-list declarations can be located anywhere within the document type declaration, although it is easier to work with attribute declarations that are located next to the declaration for the element they’re associated with.
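The two equivalent forms might look like this for a hypothetical photo element with src and caption attributes. The names are invented for illustration; the CDATA type and the #REQUIRED and #IMPLIED defaults are described in the sections that follow.

```xml
<!ELEMENT photo EMPTY>

<!-- Form 1: one declaration listing both attributes -->
<!ATTLIST photo src     CDATA #REQUIRED
                caption CDATA #IMPLIED>

<!-- Form 2: the same attributes declared separately.
     A DTD would normally use one form or the other, not both. -->
<!ATTLIST photo src     CDATA #REQUIRED>
<!ATTLIST photo caption CDATA #IMPLIED>
```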
All attribute values are text strings, but you can control what type of text is used with the attribute. Attribute values can be placed into three general categories: string, enumerated, and tokenized. Each of these categories gives you varying degrees of control over the attribute’s content. Let’s investigate these categories in greater detail.
String Types
String types are the simplest form for the attribute value. The content of an attribute that is declared as a string type is not checked by the XML parser beyond the basic rules of well-formedness, which means that string types can contain blank spaces and any character except those reserved by XML (chiefly the < and & characters, along with the quotation mark that delimits the value), even symbols that are not part of ASCII text. To declare an attribute value as string type, use the attribute type:
attribute CDATA
Enumerated Types
Attributes that are limited to a set of possible values are enumerated types. The general form of an enumerated type is:
attribute (value1 | value2 | value3 | ...)
where value1, value2, and so forth are the allowed values for the specified attribute.
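As a sketch, an enumerated attribute might be declared as follows. The shipment element, its method attribute, and the three values are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE shipment [
  <!ELEMENT shipment EMPTY>
  <!-- method must be exactly one of the three listed values -->
  <!ATTLIST shipment method (ground | air | overnight) #REQUIRED>
]>
<shipment method="air" />
```

A value outside the list, such as method="sea", would make the document invalid under this declaration.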
Tokenized Types
Tokenized types are text strings that must follow certain rules for the format and content. The syntax for declaring an attribute as a tokenized type is:
attribute token
where token is the type of token being applied to the attribute. There are seven tokenized types, described in the following table:
| Tokenized Type | Description |
| --- | --- |
| ID | Used to create a unique identifier for the element that carries the attribute |
| IDREF | Used to allow an attribute to reference the ID attribute from another element |
| IDREFS | A list of ID references, separated by blank spaces |
| NMTOKEN | A name token whose value is restricted to the characters allowed in XML names |
| NMTOKENS | A list of name tokens, separated by blank spaces |
| ENTITY | A reference to an external file, usually one containing non-XML data |
| ENTITIES | A list of entity references, separated by blank spaces |
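The ID and IDREF types are often used together. The sketch below assumes a hypothetical staff document in which each employee carries a unique empID and may point to a manager; none of these names come from a real DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (employee, employee)>
  <!ELEMENT employee (#PCDATA)>
  <!-- empID must be unique across the whole document -->
  <!ATTLIST employee empID   ID    #REQUIRED>
  <!-- manager must match the ID value of some element in the document -->
  <!ATTLIST employee manager IDREF #IMPLIED>
]>
<staff>
  <employee empID="e100">Ana</employee>
  <employee empID="e101" manager="e100">Ben</employee>
</staff>
```

ID values must themselves be valid XML names, which is why the sample identifiers begin with a letter rather than a digit.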
Attribute Defaults
The final part of an attribute declaration is the attribute default. There are four possible defaults: #REQUIRED, #IMPLIED, a default value, and a fixed default value. The following table describes each of these possible values.
| Attribute Default | Description |
| --- | --- |
| #REQUIRED | The attribute must appear with every occurrence of the element |
| #IMPLIED | The attribute is optional |
| "default" | The attribute is optional. If an attribute value is not specified, a validating XML parser will supply the default value |
| #FIXED "default" | The attribute is optional. If an attribute value is specified, it must match the default value |
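The four kinds of default can be sketched side by side. The order element and its attributes are hypothetical and serve only to show one declaration of each kind.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order EMPTY>
  <!ATTLIST order id       CDATA           #REQUIRED>     <!-- must always be present -->
  <!ATTLIST order note     CDATA           #IMPLIED>      <!-- may be omitted -->
  <!ATTLIST order status   (open | closed) "open">        <!-- "open" if omitted -->
  <!ATTLIST order currency CDATA           #FIXED "USD">  <!-- if present, must be "USD" -->
]>
<order id="A17" />
```

For this document, a parser that reads the DTD would be expected to report status as "open" and currency as "USD" even though neither attribute is written in the order tag.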
Working with Entities
One of the strengths of XML is that a document’s content can be stored in multiple files and in multiple formats. These storage units are called entities. The most fundamental entity is the XML document itself, known as the document entity, but entities can refer to other items as well, including:
- A text string
- A DTD
- An element or attribute declaration
- An external file containing character or binary data
Entities can be declared in a DTD. The syntax for declaring an entity depends on how the entity is classified. There are three factors involved in classifying an entity: 1) the content of the entity, 2) how the entity is constructed, and 3) where the definition of the entity is located. Let’s consider each of these in turn.
An entity that is part of an XML document’s content is called a general entity. General entities are often used as placeholders for text strings that the author wants to repeat throughout the document or within other documents. An entity that is not part of the document’s content is called a parameter entity. Parameter entities are used to store the various declarations found in a DTD. Those declarations can then be shared among multiple documents.
If the entity is constructed using well-formed XML text, it is a parsed entity. A company’s address and phone number would be one such example. If the entity is constructed from non-XML data, it is an unparsed entity. A graphic image file would be an example of an unparsed entity.
Finally, if the entity can be defined with a text string within the document’s DTD, it’s an internal entity. If the definition relies on the content of an external file, particularly a non-XML file, it’s an external entity. The following table summarizes these entity types.
| Entity Classification | Types | Description |
| --- | --- | --- |
| What does the entity refer to? | General vs. Parameter | General entities are used only with the contents of an XML document. Parameter entities are used only with the contents of a DTD. |
| How is the entity constructed? | Parsed vs. Unparsed | Parsed entities consist entirely of well-formed XML content. Unparsed entities are constructed from non-XML data, including non-text data. |
| Where is the entity located? | Internal vs. External | An internal entity is defined within a declaration in the document’s DTD. An external entity is defined in an external file. |
General Parsed Entities
Like elements and attributes, general entities are declared within the document’s DTD. The syntax for declaring a general internal entity is:
<!ENTITY entity "value">
where entity is the name you’ve assigned to the entity and value is the general entity’s value. The entity name follows the same rules that apply to all XML names: there can be no blank spaces in the name and the name must begin with either a letter or underscore. The entity value itself must be well-formed XML text. This can be a simple text string, or it can be a text string containing XML tags.
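A minimal sketch of a general internal entity follows. The entity name company, its value, and the memo element are all assumptions made for this example. Within the document, a general entity is referenced by writing its name between an ampersand and a semicolon.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!-- Declare the placeholder once in the DTD -->
  <!ENTITY company "Acme Widgets, Inc.">
]>
<memo>Welcome to &company;. Everyone at &company; is glad you are here.</memo>
```

When the document is parsed, each &company; reference is replaced by the entity's value, so changing the declaration updates every occurrence.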
General External Entities
General entities can also refer to values located in external files. The advantage of an external entity is that it can be accessed by several XML documents, and if the entity is modified, those documents will automatically reflect that change. The syntax for declaring a general external entity is:
<!ENTITY entity SYSTEM "URL">
where URL indicates the location of the file containing the entity data.
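As a sketch, an external general entity might be declared and used like this. The file name disclaimer.txt is hypothetical and is assumed to sit alongside the document and to contain plain, well-formed text.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report (#PCDATA)>
  <!-- disclaimer.txt is a hypothetical file stored next to this document -->
  <!ENTITY disclaimer SYSTEM "disclaimer.txt">
]>
<report>Quarterly results. &disclaimer;</report>
```

Whether the file is actually retrieved can depend on the parser, since a non-validating parser is not required to read external entities.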
Parameter Entities
The other type of entity is the parameter entity, which is used to store content for a DTD. Parameter entities are declared using a form similar to general entities. For internal parameter entities, the syntax is:
<!ENTITY % entity "value">
where entity is the name of the parameter entity and value is a text string of the entity’s value. For external parameter entities, the syntax is:
<!ENTITY % entity SYSTEM "URL" >
where URL is the URL of the file containing the entity’s value. To reference a parameter entity within the DTD you use the syntax:
%entity;
where entity is the name assigned to the parameter entity.
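The following sketch shows a parameter entity being reused in two element declarations. The names inline, para, quote, em, and strong are hypothetical. The fragment is assumed to live in an external DTD file, because XML 1.0 does not allow a parameter-entity reference inside another declaration when the DTD is written within the document itself.

```xml
<!-- Contents of a hypothetical external DTD file -->

<!-- Store a reusable piece of a content model -->
<!ENTITY % inline "em | strong">

<!-- Each reference expands to: em | strong -->
<!ELEMENT para   (#PCDATA | %inline;)*>
<!ELEMENT quote  (#PCDATA | %inline;)*>
<!ELEMENT em     (#PCDATA)>
<!ELEMENT strong (#PCDATA)>
```

Adding a new inline element then requires changing only the entity declaration rather than every content model that uses it.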