Building blocks of XML documents

From the DTD point of view, all XML documents are made up of the following building blocks:

Elements
Attributes
Entities
PCDATA
CDATA

Each of them is discussed in detail below.

1) Elements:-

Elements are the main building blocks of XML documents. In a DTD, XML elements are declared with an element declaration with the following syntax:

<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>

a. Empty Elements

Empty elements are declared with the category keyword EMPTY:

<!ELEMENT element-name EMPTY>

Example:

<!ELEMENT br EMPTY>

XML example:

<br />

b. Elements with Parsed Character Data

Elements with only parsed character data are declared with #PCDATA inside parentheses:

<!ELEMENT element-name (#PCDATA)>

Example:

<!ELEMENT from (#PCDATA)>

c. Elements with any Contents

Elements declared with the category keyword ANY can contain any combination of parsable data:

<!ELEMENT element-name ANY>

Example:

<!ELEMENT note ANY>

d. Elements with Children (sequences)

Elements with one or more children are declared with the names of the child elements inside parentheses:

<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1, child2,…)>

Example:

<!ELEMENT book (title, author, info, body)>

When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children. The full declaration of the “book” element is:

<!ELEMENT book (title, author, info, body)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT info (#PCDATA)>
<!ELEMENT body (#PCDATA)>
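
As a sketch of how these declarations fit together, the document below embeds them in an internal DOCTYPE and supplies the four children in the declared order. The text values are placeholders chosen purely for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, author, info, body)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT info (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- The children must appear in exactly this order to be valid -->
<book>
  <title>Sample Title</title>
  <author>Sample Author</author>
  <info>Sample info text</info>
  <body>Sample body text</body>
</book>
```

Swapping title and author, or leaving out body, would make the document invalid against this DTD.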

e. Declaring Only One Occurrence of an Element

The example below declares that the child element “genre” must occur once and only once inside the “book” element:

<!ELEMENT element-name (child-name)>

Example:

<!ELEMENT book (genre)>

f. Declaring Minimum One Occurrence of an Element

The + sign in the example below declares that the child element “genre” must occur one or more times inside the “book” element:

<!ELEMENT element-name (child-name+)>

Example:

<!ELEMENT book (genre+)>

g. Declaring Zero or More Occurrences of an Element

The * sign in the example below declares that the child element “genre” can occur zero or more times inside the “book” element:

<!ELEMENT element-name (child-name*)>

Example:

<!ELEMENT book (genre*)>

h. Declaring Zero or One Occurrence of an Element

The ? sign in the example below declares that the child element “genre” can occur zero or one times inside the “book” element:

<!ELEMENT element-name (child-name?)>

Example:

<!ELEMENT book (genre?)>
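
The occurrence indicators can also be combined in a single content model. The sketch below is illustrative only: the particular mix of children and indicators is an assumption made for the example, and the text values are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- exactly one title, one or more genre, zero or more author, at most one info -->
  <!ELEMENT book (title, genre+, author*, info?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT info (#PCDATA)>
]>
<book>
  <title>Sample Title</title>
  <genre>Fiction</genre>
  <genre>Mystery</genre>
  <!-- no author elements here: allowed by the * indicator -->
  <info>Sample info text</info>
</book>
```

Removing both genre elements, or adding a second title or info, would make this document invalid.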

i. Declaring either/or Content

The example below declares that the “book” element must contain a “title” element, an “author” element, an “info” element, and either a “genre” or a “body” element:

Example:

<!ELEMENT book (title, author, info, (genre|body))>
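
To illustrate the choice, here is a sketch of one valid instance. It assumes the choice is written as a parenthesised group, (genre|body), and that every child holds only text; the values are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- (genre|body) is a choice group: exactly one of the two must appear -->
  <!ELEMENT book (title, author, info, (genre|body))>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT info (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<book>
  <title>Sample Title</title>
  <author>Sample Author</author>
  <info>Sample info text</info>
  <genre>Fiction</genre>
</book>
```

Replacing the genre element with a body element would also be valid; including both, or neither, would not.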

j. Declaring Mixed Content

The example below declares that the “book” element can contain zero or more occurrences of parsed character data, “title”, “author”, “info”, or “genre” elements:

Example:

<!ELEMENT book (#PCDATA|title|author|info|genre)*>
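
A minimal sketch of mixed content follows, assuming the mixed-content model is declared on the “book” element and that the children hold only text. Text and child elements may be interleaved freely and in any order.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- In mixed content, #PCDATA must come first and the group must end with * -->
  <!ELEMENT book (#PCDATA|title|author|info|genre)*>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT info (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
]>
<book>Some loose text, then <title>Sample Title</title> by
<author>Sample Author</author>, followed by more loose text.</book>
```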

2) Attributes:-

Attributes provide extra information about elements. Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following “payment” element has additional information about payment type:

<payment type="check" />

The name of the element is “payment”. The name of the attribute is “type”. The value of the attribute is “check”. Since the element itself is empty, it is closed by “ /”.

In a DTD, attributes are declared with an ATTLIST declaration. An attribute declaration has the following syntax:

<!ATTLIST element-name attribute-name attribute-type default-value>

DTD example:

<!ATTLIST payment type CDATA "check">

XML example:

<payment type="check" />

The attribute-type can be one of the following:

Type	Description
CDATA	The value is character data
(en1|en2|..)	The value must be one of the values in an enumerated list
ID	The value is a unique id
IDREF	The value is the id of another element
IDREFS	The value is a list of other ids
NMTOKEN	The value is a valid XML name token
NMTOKENS	The value is a list of valid XML name tokens
ENTITY	The value is an entity
ENTITIES	The value is a list of entities
NOTATION	The value is a name of a notation
xml:	The value is a predefined xml value
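
As an illustration of a few of these types working together, the sketch below uses ID, IDREF, and NMTOKEN. The element and attribute names (“library”, “key”, “writer”, “format”) are hypothetical and chosen only for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (author+, book+)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT book (#PCDATA)>
  <!-- ID: the value must be unique within the document -->
  <!ATTLIST author key ID #REQUIRED>
  <!-- IDREF: the value must match the ID of some element in the document -->
  <!ATTLIST book writer IDREF #REQUIRED>
  <!-- NMTOKEN: a single name token, with no whitespace -->
  <!ATTLIST book format NMTOKEN #IMPLIED>
]>
<library>
  <author key="a1">Sample Author</author>
  <book writer="a1" format="paperback">Sample Title</book>
</library>
```

A validating parser would reject this document if two elements shared the key “a1”, or if writer pointed at a value that no ID attribute carries.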

The default-value can be one of the following:

Value	Explanation
value	The default value of the attribute
#REQUIRED	The attribute is required
#IMPLIED	The attribute is not required
#FIXED value	The attribute value is fixed

A Default Attribute Value

In the example below, the “square” element is defined to be an empty element with a “width” attribute of type CDATA. If no width is specified, it has a default value of 0:

DTD:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">

Valid XML:

<square width="100" />

#REQUIRED

Syntax

<!ATTLIST element-name attribute-name attribute-type #REQUIRED>

Example

DTD:
<!ATTLIST person number CDATA #REQUIRED>

Valid XML:

<person number="5677" />

Invalid XML:

<person />

We use the #REQUIRED keyword if we don’t have an option for a default value, but still want to force the attribute to be present.

#IMPLIED

Syntax

<!ATTLIST element-name attribute-name attribute-type #IMPLIED>

Example

DTD:
<!ATTLIST contact fax CDATA #IMPLIED>

Valid XML:

<contact fax="555-667788" />
or
<contact />

We use the #IMPLIED keyword if we don’t want to force the author to include an attribute, and we don’t have an option for a default value.

#FIXED

Syntax

<!ATTLIST element-name attribute-name attribute-type #FIXED "value">

Example

DTD:
<!ATTLIST sender company CDATA #FIXED "Microsoft">

Valid XML:

<sender company="Microsoft" />

Invalid XML:

<sender company="W3Schools" />

We use the #FIXED keyword when we want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, a validating XML parser will report an error.

Enumerated Attribute Values

Syntax

<!ATTLIST element-name attribute-name (en1|en2|..) default-value>

Example

DTD:
<!ATTLIST payment type (check|cash) "cash">

XML example:

<payment type="check" />
or
<payment type="cash" />

We use enumerated attribute values when we want the attribute value to be one of a fixed set of legal values.

3) Entities:-

Some characters have a special meaning in XML, like the less-than sign (<), which marks the start of an XML tag. Most of us know the HTML entity “&nbsp;”. This “non-breaking space” entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.

The following entities are predefined in XML:

Entity References	Character
&lt;	<
&gt;	>
&amp;	&
&quot;	"
&apos;	'
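
A short sketch of entity expansion: the document below uses the predefined entity references and also declares one internal entity in the DTD. The entity name “publisher” and its replacement text are hypothetical, introduced only for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!-- Hypothetical internal entity: &publisher; expands to "Example Press" -->
  <!ENTITY publisher "Example Press">
]>
<note>Published by &publisher;. Use &lt;b&gt; for bold &amp; &quot;quotes&quot;.</note>
```

After parsing, the text of the note element reads: Published by Example Press. Use <b> for bold & "quotes".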

4) PCDATA:-

PCDATA means parsed character data. We can think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that will be parsed by a parser: the text is examined for entities and markup, tags inside the text are treated as markup, and entities are expanded. Because of this, parsed character data must not contain raw & or < characters, and > should be escaped as well; these need to be represented by the &amp;, &lt;, and &gt; entities, respectively.
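
For instance, a sketch using a “from” element declared as (#PCDATA), with made-up text, shows how the special characters are written in parsed character data:

```xml
<?xml version="1.0"?>
<!DOCTYPE from [
  <!ELEMENT from (#PCDATA)>
]>
<!-- The parser delivers the text as: Tom & Jerry <tj@example.com> -->
<from>Tom &amp; Jerry &lt;tj@example.com&gt;</from>
```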

5) CDATA:-

CDATA means character data. CDATA is the text that will not be parsed by a parser. Tags inside the text will not be treated as markup and entities will not be expanded.
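
Inside element content, unparsed text of this kind is written as a CDATA section, delimited by <![CDATA[ and ]]>. The sketch below assumes a “body” element declared as (#PCDATA); the code-like text inside the section is made up for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE body [
  <!ELEMENT body (#PCDATA)>
]>
<!-- Nothing inside the CDATA section is treated as markup or as an entity reference -->
<body><![CDATA[if (a < b && b > c) { print("<done>"); }]]></body>
```

The only character sequence that cannot appear inside a CDATA section is ]]>, because it marks the end of the section.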