Building Blocks of XML Documents
From the DTD point of view, all XML documents are made up of the following building blocks:
- Elements
- Attributes
- Entities
- PCDATA
- CDATA
Each is discussed in detail below.
1) Elements:-
Elements are the main building blocks of XML documents. In a DTD, XML elements are declared with an element declaration, which takes one of the following forms:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
a. Empty Elements
Empty elements are declared with the category keyword EMPTY:
<!ELEMENT element-name EMPTY>
Example:
<!ELEMENT br EMPTY>
XML example:
<br />
b. Elements with Parsed Character Data
Elements with only parsed character data are declared with #PCDATA inside parentheses:
<!ELEMENT element-name (#PCDATA)>
Example:
<!ELEMENT from (#PCDATA)>
c. Elements with any Contents
Elements declared with the category keyword ANY can contain any combination of parsable data:
<!ELEMENT element-name ANY>
Example:
<!ELEMENT note ANY>
d. Elements with Children (sequences)
Elements with one or more children are declared with the names of the child elements inside parentheses:
<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1, child2, ...)>
Example:
<!ELEMENT book (title, author, info, body)>
When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children. The top-level declaration of the “book” element is:
<!ELEMENT book (title, author, info, body)>
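As a sketch of what a full declaration could look like, the DTD below also declares each child of “book”. Giving every child #PCDATA content is an assumption made for illustration; the text above does not say what the children contain, and the sample values are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- The parent lists its children in the required order -->
  <!ELEMENT book (title, author, info, body)>
  <!-- Each child is declared as well; #PCDATA content is assumed here -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT info (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<book>
  <title>Sample Title</title>
  <author>Sample Author</author>
  <info>Sample info</info>
  <body>Sample body text</body>
</book>
```

Swapping two children in the document, for example placing “author” before “title”, would make it invalid against this DTD, because the comma-separated sequence fixes the order.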
e. Declaring Only One Occurrence of an Element
The example below declares that the child element “genre” must occur once and only once inside the “book” element:
<!ELEMENT element-name (child-name)>
Example:
<!ELEMENT book (genre)>
f. Declaring Minimum One Occurrence of an Element
The + sign in the example below declares that the child element “genre” must occur one or more times inside the “book” element:
<!ELEMENT element-name (child-name+)>
Example:
<!ELEMENT book (genre+)>
g. Declaring Zero or More Occurrences of an Element
The * sign in the example below declares that the child element “genre” can occur zero or more times inside the “book” element:
<!ELEMENT element-name (child-name*)>
Example:
<!ELEMENT book (genre*)>
h. Declaring Zero or One Occurrence of an Element
The ? sign in the example below declares that the child element “genre” can occur zero or one time inside the “book” element:
<!ELEMENT element-name (child-name?)>
Example:
<!ELEMENT book (genre?)>
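The occurrence indicators can also be combined in a single content model. The following is a minimal sketch, assuming a “book” with exactly one “title”, one or more “author” elements, an optional “info”, and any number of “genre” elements. This particular combination and the #PCDATA children are hypothetical, chosen only to show each indicator at work.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- title: exactly once; author: one or more;
       info: zero or one; genre: zero or more -->
  <!ELEMENT book (title, author+, info?, genre*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT info (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
]>
<book>
  <title>Sample Title</title>
  <author>First Author</author>
  <author>Second Author</author>
  <!-- info is optional and omitted here -->
  <genre>Fiction</genre>
  <genre>Drama</genre>
</book>
```

Removing both “author” elements, or adding a second “title”, would make this document invalid against the DTD.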
i. Declaring either/or Content
The example below declares that the “book” element must contain a “title” element, an “author” element, an “info” element, and either a “genre” or a “body” element:
Example:
<!ELEMENT book (title, author, info, (genre|body))>
j. Declaring Mixed Content
The example below declares that the “book” element can contain zero or more occurrences of parsed character data, “title”, “author”, “info”, or “genre” elements:
Example:
<!ELEMENT book (#PCDATA|title|author|info|genre)*>
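To make mixed content concrete, here is a small sketch that assumes the mixed-content declaration is applied to a “book” element whose children hold plain text. The child declarations and the sample text are illustrative assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Text and the listed children may appear in any order, any number of times -->
  <!ELEMENT book (#PCDATA|title|author|info|genre)*>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT info (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
]>
<book>
  This text sits directly inside the book element.
  <title>Sample Title</title>
  More text, followed by child elements in no particular order:
  <genre>Fiction</genre>
  <author>Sample Author</author>
</book>
```

Unlike a comma-separated sequence, a mixed-content model does not constrain the order or the number of the listed children.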
2) Attributes:-
Attributes provide extra information about elements. Attributes are always placed inside the opening tag of an element and always come in name/value pairs. The following “payment” element has additional information about the payment type:
<payment type="check" />
The name of the element is “payment”. The name of the attribute is “type”. The value of the attribute is “check”. Since the element itself is empty, it is closed by “ /”.
In a DTD, attributes are declared with an ATTLIST declaration. An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
DTD example:
<!ATTLIST payment type CDATA "check">
XML example:
<payment type="check" />
The attribute-type can be one of the following:
Type | Description |
CDATA | The value is character data |
(en1|en2|..) | The value must be one from an enumerated list |
ID | The value is a unique id |
IDREF | The value is the id of another element |
IDREFS | The value is a list of other ids |
NMTOKEN | The value is a valid XML name token |
NMTOKENS | The value is a list of valid XML name tokens |
ENTITY | The value is an entity |
ENTITIES | The value is a list of entities |
NOTATION | The value is a name of a notation |
xml: | The value is a predefined xml value |
The default-value can be one of the following:
Value | Explanation |
value | The default value of the attribute |
#REQUIRED | The attribute is required |
#IMPLIED | The attribute is not required |
#FIXED value | The attribute value is fixed |
A Default Attribute Value
In the example below, the “square” element is defined to be an empty element with a “width” attribute of type CDATA. If no width is specified, it has a default value of 0:
DTD:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
Valid XML:
<square width="100" />
#REQUIRED
Syntax
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
We use the #REQUIRED keyword if we don’t have an option for a default value, but still want to force the attribute to be present.
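A minimal sketch of #REQUIRED in use follows. The element name “person” and the attribute name “number” are hypothetical, picked only for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person EMPTY>
  <!-- "number" must be present on every person element -->
  <!ATTLIST person number CDATA #REQUIRED>
]>
<!-- Valid: the required attribute is supplied -->
<person number="5677" />
```

Writing the element as <person /> instead, with no “number” attribute, would be rejected by a validating parser.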
#IMPLIED
Syntax
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
We use the #IMPLIED keyword if we don’t want to force the author to include an attribute, and we don’t have an option for a default value.
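A minimal sketch of #IMPLIED in use follows. The names “contacts”, “contact”, and “fax” are hypothetical; the wrapper element exists only so that both valid forms fit in one document.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact*)>
  <!ELEMENT contact EMPTY>
  <!-- "fax" is optional and has no default value -->
  <!ATTLIST contact fax CDATA #IMPLIED>
]>
<contacts>
  <!-- Valid: the optional attribute is supplied -->
  <contact fax="555-667788" />
  <!-- Also valid: the optional attribute is left out -->
  <contact />
</contacts>
```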
#FIXED
Syntax
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
We use the #FIXED keyword when we want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, a validating XML parser will report an error.
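A minimal sketch of #FIXED in use follows. The names “sender” and “company” and the value “Example Corp” are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE sender [
  <!ELEMENT sender EMPTY>
  <!-- "company" always has the value "Example Corp" -->
  <!ATTLIST sender company CDATA #FIXED "Example Corp">
]>
<!-- Valid: the attribute carries exactly the fixed value -->
<sender company="Example Corp" />
```

Omitting the attribute, as in <sender />, is also valid, and the fixed value then applies. Supplying a different value, such as company="Other Corp", is not valid.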
Enumerated Attribute Values
Syntax
<!ATTLIST element-name attribute-name (en1|en2|..) default-value>
We use enumerated attribute values when we want the attribute value to be one of a fixed set of legal values.
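A minimal sketch of an enumerated attribute follows, reusing the “payment” element from earlier. The list of legal values (check, cash) and the default of “cash” are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE payment [
  <!ELEMENT payment EMPTY>
  <!-- "type" must be check or cash; cash is used when it is omitted -->
  <!ATTLIST payment type (check|cash) "cash">
]>
<!-- Valid: "check" is one of the enumerated values -->
<payment type="check" />
```

Under this declaration, <payment /> is treated as if type="cash" had been written, while a value outside the list, such as type="card", is not valid.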
3) Entities:-
Some characters have a special meaning in XML, like the less-than sign (<), which marks the start of an XML tag. To use such a character as ordinary text, we write an entity reference instead. Most of us know the HTML entity “&nbsp;”. This “non-breaking space” entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
The following entities are predefined in XML:
Entity References | Character |
&lt; | < |
&gt; | > |
&amp; | & |
&quot; | " |
&apos; | ' |
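As a short illustration of the predefined entities, the sketch below uses a hypothetical “message” element whose text contains characters that would otherwise be read as markup.

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<!-- Each entity reference is replaced by its character when parsed -->
<message>Use &lt;b&gt; for bold text &amp; &quot;quotes&quot; as needed.</message>
```

After parsing, an application reading the content of “message” sees the text: Use <b> for bold text & "quotes" as needed.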
4) PCDATA:-
PCDATA means parsed character data. We can think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that will be parsed by a parser: the parser examines it for entities and markup, so tags inside the text are treated as markup and entities are expanded. For that reason, parsed character data should not contain literal &, <, or > characters; these need to be represented by the &amp;, &lt;, and &gt; entities, respectively.
5) CDATA:-
CDATA means character data. CDATA is the text that will not be parsed by a parser. Tags inside the text will not be treated as markup and entities will not be expanded.
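The difference between parsed and unparsed text can be sketched as follows. The example assumes that the unparsed text is written in a CDATA section, which starts with <![CDATA[ and ends with ]]>; the element names “example”, “parsed”, and “unparsed” are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE example [
  <!ELEMENT example (parsed, unparsed)>
  <!ELEMENT parsed (#PCDATA)>
  <!ELEMENT unparsed (#PCDATA)>
]>
<example>
  <!-- Parsed text: special characters are written as entity references -->
  <parsed>1 &lt; 2 &amp; 3 &gt; 2</parsed>
  <!-- CDATA section: the same characters are taken literally -->
  <unparsed><![CDATA[1 < 2 & 3 > 2, and <b>this</b> is not a tag]]></unparsed>
</example>
```

Inside the CDATA section, <b> is delivered to the application as plain text rather than being treated as an element.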