XML – Understanding XML Syntax

Understanding XML Syntax

XML languages use tags to mark up text. As a web developer, you’re probably familiar with the concept of text markup:

<p>Here is an introduction to XML.</p>

The previous line is XHTML, but it’s XML too. In XHTML, you know that the <p> tag indicates a paragraph of text. All tags in XHTML have predefined meanings.

XML allows you to create your own tags, so you can rewrite the previous markup as follows:

<intro>Here is an introduction to XML.</intro>

In this example, the <intro> tag tells you the purpose of the text it marks. A big advantage of XML is that tags can describe their content, which is why XML languages are often called self-descriptive.
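As a small illustration of a self-descriptive tag, the sketch below parses the <intro> element with Python's standard xml.etree.ElementTree module and reads back the tag name and its text. The choice of Python and of this parser is an assumption made for the example; the original text does not prescribe any tool.

```python
import xml.etree.ElementTree as ET

# Parse the custom <intro> element used in the text above.
element = ET.fromstring("<intro>Here is an introduction to XML.</intro>")

# The tag name itself tells a program what the text is for.
print(element.tag)   # intro
print(element.text)  # Here is an introduction to XML.
```

Any XML parser would accept the same markup, because the parser does not need to know in advance what <intro> means.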

XML is flexible enough to allow the creation of many types of languages to describe data. The only constraint on documents written in these XML vocabularies is that they must be well formed.

Well-Formed Documents

XML documents are well-formed if they meet the following criteria:

• The document contains one or more elements.

• The document contains a single document element, which encloses all other elements.

• Each element closes properly.

• Element names are case sensitive.

• Attribute values are enclosed in quotation marks, and an attribute cannot appear without a value.
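One way to try these criteria out is to hand sample strings to an XML parser and see which ones it rejects. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name introduced only for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if the parser accepts xml_text as well formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One document element that closes properly.
print(is_well_formed("<intro>Here is an introduction to XML.</intro>"))  # True

# No elements at all.
print(is_well_formed(""))  # False

# Two top-level elements instead of a single document element.
print(is_well_formed("<intro>One</intro><intro>Two</intro>"))  # False

# An element that never closes.
print(is_well_formed("<intro>Never closed"))  # False
```

The parser only checks well-formedness here; it does not check whether the tags belong to any particular vocabulary.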

I will describe all of these criteria in this chapter, but a few points are worth noting now. XML languages are case sensitive; this means, for example, that the tag <intro> is not the same as <Intro> or <INTRO>. In XML, these are three different tags. Until the days of XHTML, HTML ignored case, so <P> and <p> were equivalent tags.
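Case sensitivity is easy to observe with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the <doc> wrapper and the three spellings of intro are illustrative names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Three elements whose names differ only in case, inside an illustrative <doc> wrapper.
root = ET.fromstring(
    "<doc><intro>a</intro><Intro>b</Intro><INTRO>c</INTRO></doc>"
)

# The parser keeps them apart as three distinct tag names.
tags = [child.tag for child in root]
print(tags)  # ['intro', 'Intro', 'INTRO']

# A lookup for "intro" matches only the lowercase element.
lowercase_matches = root.findall("intro")
print(len(lowercase_matches))  # 1

# A closing tag in a different case does not match its opening tag.
try:
    ET.fromstring("<intro>text</Intro>")
    case_mismatch_accepted = True
except ET.ParseError:
    case_mismatch_accepted = False
print(case_mismatch_accepted)  # False
```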

All XML elements must have a matching closing tag in the same case as the opening tag. Thus, an <intro> tag must have a corresponding </intro> tag. If an opening and closing tag pair contains no content, you can shorten it to a single empty-element tag such as <intro/>. Again, the contrast is with HTML, where you could write a single <p> tag to add a paragraph break.

The order of the tags is also important in XML. The tags that open first must close last:

<chapter><intro>Here is an introduction to XML.</intro></chapter>
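The short sketch below, which assumes Python's standard xml.etree.ElementTree module, illustrates two of the points above: an empty element written as a tag pair and its shortened single-tag form parse to the same thing, and properly nested tags produce a parent-child structure.

```python
import xml.etree.ElementTree as ET

# An empty element written in full and in its shortened form.
long_form = ET.fromstring("<intro></intro>")
short_form = ET.fromstring("<intro/>")

# Both forms have the same name, no text, and no children,
# and they serialize identically.
same_serialization = ET.tostring(long_form) == ET.tostring(short_form)
print(same_serialization)  # True

# Properly nested tags: <intro> opens inside <chapter> and closes before it.
chapter = ET.fromstring(
    "<chapter><intro>Here is an introduction to XML.</intro></chapter>"
)
print(chapter.tag)     # chapter
print(chapter[0].tag)  # intro
```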

HTML pages had no such requirement in practice. Browsers would accept the following HTML, even though it is not well formed as XML:

<p><strong>Paragraph text</p></strong>
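To see how an XML parser treats this overlapping markup, the sketch below feeds it to Python's standard xml.etree.ElementTree module (an assumption made for this example) and then tries a corrected version in which the tags close in reverse order of opening.

```python
import xml.etree.ElementTree as ET

overlapping = "<p><strong>Paragraph text</p></strong>"
nested = "<p><strong>Paragraph text</strong></p>"

# The overlapping version is rejected because </p> arrives while <strong> is still open.
try:
    ET.fromstring(overlapping)
    result = "parsed"
except ET.ParseError as error:
    result = f"not well formed: {error}"
print(result)

# The corrected version closes <strong> before <p> and parses without error.
paragraph = ET.fromstring(nested)
print(paragraph[0].tag)   # strong
print(paragraph[0].text)  # Paragraph text
```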

In XML, attributes always use quotation marks around their values:

<intro type="chapter">

It doesn’t matter whether you use single or double quotation marks, but they must be present, and the opening and closing marks must match. This was not required in HTML. Similarly, some HTML attributes, such as the nowrap attribute of the <td> tag, could be written without a name-value pair:

<td nowrap>A table cell</td>

This kind of shortened attribute is not allowed in XML. You need to replace it with something like this:

<td nowrap="true">A table cell</td>
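The attribute rules can be checked the same way as the element rules. The following sketch assumes Python's standard xml.etree.ElementTree module; the unquoted nowrap=true variant is an extra case added here for illustration.

```python
import xml.etree.ElementTree as ET

# Single and double quotation marks give the same attribute value.
double_quoted = ET.fromstring('<intro type="chapter"/>')
single_quoted = ET.fromstring("<intro type='chapter'/>")
print(double_quoted.get("type") == single_quoted.get("type"))  # True

# Try three ways of writing the nowrap attribute.
candidates = [
    "<td nowrap>A table cell</td>",         # attribute without a value
    "<td nowrap=true>A table cell</td>",    # value without quotation marks
    '<td nowrap="true">A table cell</td>',  # quoted name-value pair
]

results = {}
for candidate in candidates:
    try:
        ET.fromstring(candidate)
        results[candidate] = True
    except ET.ParseError:
        results[candidate] = False
    print(results[candidate], candidate)
```

Only the last form is accepted by the parser; the first two fail before any element is returned.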