XFox Prower's XML Tutorial: Page 1

What is XML?

XML is a markup language for structuring and organizing data. It was created after HTML and shares much of its syntax. It is the basis of many languages, including XHTML, XSLT, RSS, SVG, and XUL. Before learning any of those, you'll need to know the basics of XML, which is what this tutorial covers.


Creating The XML File

Open up your favorite plain text editor (Notepad, SimpleText, Nano, etc.) and save a new file as page.xml ("page" can be any valid name you choose, but the "xml" file extension is necessary). You can open the XML file in a web browser such as Firefox. Since the file contains no data yet, it is normal to see an error. With both the browser and text editor open, you can save your changes in the text editor and refresh the browser to see them (F5 key in Firefox).


Error Handling

Your XML file must be well-formed in order to be viewable. This means the XML processor (the web browser in this case) must not encounter any problems when parsing it. If you made a mistake and the XML file is not well-formed, the processor must halt at the error and will not continue beyond it. Rather than displaying your XML file, the processor will display an error message, along with the line and column where the error happened. The error is usually self-explanatory and enough for you to know what went wrong. Fix it, save, and refresh; repeat this process until all errors are gone.
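
A browser is not the only XML processor that halts and reports a position. As a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of this tutorial), the same behavior looks like this. The BROKEN document and the find_error helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <date> is opened on line 3 but never closed.
BROKEN = """<?xml version="1.0"?>
<ReleaseDates>
  <date>1995
</ReleaseDates>
"""


def find_error(text):
    """Return None if text is well-formed, else (message, line, column)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser stops at the first problem and records where it was.
        line, column = err.position
        return str(err), line, column
    return None


print(find_error(BROKEN))
print(find_error("<ReleaseDates></ReleaseDates>"))  # prints None
```

The parser only notices the problem on line 4, when it meets a closing tag that does not match the tag still open, so that is the line it reports.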


The XML Declaration

Now that the file has been created and saved, the first thing we enter is the XML Declaration. It must be the very first thing in the file, starting at line 1, column 1.

<?xml version="1.0" encoding="iso-8859-1"?>

The XML Declaration tells the processor which version of XML to parse the page as. Use version 1.0; a version 1.1 also exists, but it is rarely used or needed. We're also using the XML Declaration to declare the character encoding, which tells the processor how the bytes of the file map to characters. Other encodings exist (UTF-8 is assumed when none is declared), but this is enough to get started.
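
To see why the declared encoding matters, here is a small sketch, again assuming Python's standard xml.etree.ElementTree. The <name> tag and its content are invented sample data.

```python
import xml.etree.ElementTree as ET

# The byte 0xE9 is the letter e with an acute accent in ISO-8859-1.
# The declaration tells the parser to decode the bytes that way.
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><name>Caf\xe9</name>'
name = ET.fromstring(raw)
print(name.text)

# Without a declaration the parser assumes UTF-8, where a lone 0xE9
# byte is not a valid character, so parsing fails.
try:
    ET.fromstring(b"<name>Caf\xe9</name>")
    parsed_without_declaration = True
except ET.ParseError:
    parsed_without_declaration = False
print(parsed_without_declaration)  # prints False
```

The declaration only helps if the file really is saved in the encoding it names.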


The DOCTYPE

The DOCTYPE (short for "Document Type Declaration") goes right after the XML Declaration, before the root tag. It points to or contains a Document Type Definition (DTD), which gives the processor a vocabulary of which tags are allowed in the page and where they can be used. If you're using this tutorial for plain XML, you'll make your own DTD. If you're going to be working with XHTML, you'll use one of the XHTML DOCTYPEs, which are explained in the XHTML Tutorial. Your DTD will allow you to define different rules and entities for your XML file. You'll learn how to create a DTD for your files later on.


Tags

Tags define the different sections of data in the XML file. Every opening tag you create must have a closing tag to end that section. Here is what a simple opening tag looks like:

<tag>

This opening tag starts with a less-than sign (<), followed immediately by the name of the tag (tag), followed by the greater-than sign (>). And here is what the closing tag looks like:

</tag>

The closing tag starts with a less-than sign (<), followed immediately by a forward slash (/), followed immediately by the tag name (tag), followed by the greater-than sign (>). The difference between the opening tag and the closing tag is that the closing tag has a forward slash (/) before the tag name. Be sure that the tag name in the closing tag matches the tag name in the opening tag exactly. Tag names are case-sensitive.
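
If you want to check the matching rule outside the browser, here is a rough sketch using Python's standard xml.etree.ElementTree (an assumption, any XML parser would do). The is_well_formed helper is a made-up name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text, False if it halts."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<tag></tag>"))  # True: names match
print(is_well_formed("<tag></Tag>"))  # False: "tag" and "Tag" differ in case
print(is_well_formed("<tag>"))        # False: no closing tag at all
```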


Root Tag

The first tag in the XML file is the root tag. All other tags in the page must be contained inside this tag (children of the root). In other words, the root tag is the first to open and the last to close.

<?xml version="1.0" encoding="iso-8859-1"?>
<ReleaseDates>
</ReleaseDates>
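
As a quick check of the "one root" rule, here is a sketch assuming Python's standard xml.etree.ElementTree. The <Extra> tag is invented to show what happens when a second top-level tag follows the root.

```python
import xml.etree.ElementTree as ET

page = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<ReleaseDates>
</ReleaseDates>
"""

root = ET.fromstring(page)
print(root.tag)  # prints ReleaseDates

# A second tag after the root has closed is not allowed.
try:
    ET.fromstring("<ReleaseDates></ReleaseDates><Extra></Extra>")
    two_roots_accepted = True
except ET.ParseError as err:
    two_roots_accepted = False
    print(err)
```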


Qualified Names

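In XML, you create your own tags. This means there is no predefined list of tags (although languages based on XML, such as XHTML, have their own predefined list of tags). The names you give to your tags are important for making your XML file easy to understand and maintain, and they must also be valid. Here is a list of things you should know when naming your tags:

Names are case-sensitive.
Names must start with a letter or an underscore (_).
After the first character, names may contain letters, digits, hyphens (-), underscores (_), and periods (.).
Names cannot contain spaces.
Names starting with "xml" (in any mix of upper and lower case) are reserved.
The colon (:) is reserved for namespaces, so avoid it in your own names.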
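
One rough way to test a candidate name is to let a parser try it. The sketch below assumes Python's standard xml.etree.ElementTree; name_is_accepted is a made-up helper that wraps the name in an opening and closing tag and reports whether the result parses. It is only a quick check, not a full implementation of the naming rules.

```python
import xml.etree.ElementTree as ET


def name_is_accepted(name):
    """Use name as a tag and report whether the document parses."""
    try:
        ET.fromstring("<" + name + "></" + name + ">")
        return True
    except ET.ParseError:
        return False


candidates = ["game", "_game", "game-2", "game.title",
              "2game", "-game", "game title"]
for candidate in candidates:
    print(candidate, name_is_accepted(candidate))
```

The first four names are accepted. The last three are rejected: a name cannot start with a digit or a hyphen, and it cannot contain a space.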


Content Between Tags

Above, we taught you about tags that open and close. Inside tags like these (between the opening and closing tags), you can put more content. The name of the tag you opened should reflect the purpose of the content inside it. Here is an example:

<greeting>
Welcome to The Tails Archive!
</greeting>

And of course, you can put more tags inside of open tags to further organize things. This is called "nesting":

<greeting>
	<line1>Welcome To The Tails Archive!</line1>
	<line2>All Tails Fans are welcome here ^_^.</line2>
</greeting>

All tags in the XML file are "related". In the above example, <line1> and <line2> are both children of <greeting>. They both share the same parent. This means <line1> and <line2> are siblings.
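
The parent and child relationships are exactly what a program sees after parsing. A minimal sketch, assuming Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

doc = """<greeting>
	<line1>Welcome To The Tails Archive!</line1>
	<line2>All Tails Fans are welcome here ^_^.</line2>
</greeting>"""

greeting = ET.fromstring(doc)

# Iterating over a parent yields its children, in document order.
children = [child.tag for child in greeting]
print(children)  # prints ['line1', 'line2']

# find() looks up a child by tag name.
print(greeting.find("line1").text)  # prints Welcome To The Tails Archive!
```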


Well-formed Nested Tags

A basic rule of XML is that you must always have the same number of closing tags as opening tags. And when you nest tags, you must always close the most recently opened tag first before closing anything else. Otherwise, the XML processor will halt with an error about an open tag not matching the closing tag that follows, or about a closing tag for a tag that isn't currently open.

Correct:

<a>
	<b></b>
</a>

Incorrect:

<a>
	<b></a>
</b>


Whitespace

Whitespace is defined as characters you cannot see: spaces, tabs, and newlines. As long as your XML is well-formed, you can use whitespace between your tags to make it easier to understand and edit. You can indent nested tags and the structure will still remain well-formed. These are all valid:

<a><b></b></a>
<a><b>
</b></a>
<a>
	<b>
	</b>
</a>
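
All three forms have the same tag structure, but the whitespace itself is still passed along to the program as text. The sketch below assumes Python's standard xml.etree.ElementTree and parses each form separately.

```python
import xml.etree.ElementTree as ET

variants = [
    "<a><b></b></a>",
    "<a><b>\n</b></a>",
    "<a>\n\t<b>\n\t</b>\n</a>",
]

results = []
for text in variants:
    a = ET.fromstring(text)
    b = a.find("b")
    # Same structure every time, but b's text content differs.
    results.append(([child.tag for child in a], b.text))
    print([child.tag for child in a], repr(b.text))
```

In every case <a> has the single child <b>. Only the text inside <b> changes: nothing at all, a newline, or a newline followed by a tab.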


Attributes

Attributes allow you to give more "flavor" to a tag without having to use a different tag name for different uses. Here's an example of an attribute being used in a tag:

<game title="Tails Adventures"></game>

In the above example, "title" is the attribute and "Tails Adventures" is its value. This lets us reuse the <game> tag for many different games instead of inventing a new tag name for each one. A tag can also have multiple attributes:

	<game title="Tails Adventures" year="1995">
		<review author="TailsFan" date="2005/Apr/17th">This is the best Tails game ever!</review>
	</game>

It's very important that the value of each attribute you use is enclosed in quotation marks. Unlike tag names, attribute values are allowed to contain spaces, slashes, and many other characters.

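Here are the rules of attributes:

Attribute names follow the same naming rules as tag names.
Every attribute must have a value, enclosed in matching single or double quotation marks.
An attribute name may appear only once in the same tag.
Attributes go in the opening tag, never in the closing tag.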
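
Attribute values can be read back by name, and a parser will reject tags that break the quoting or uniqueness rules. A small sketch, assuming Python's standard xml.etree.ElementTree; the parses helper is a made-up name.

```python
import xml.etree.ElementTree as ET

doc = """<game title="Tails Adventures" year="1995">
	<review author="TailsFan" date="2005/Apr/17th">This is the best Tails game ever!</review>
</game>"""

game = ET.fromstring(doc)
print(game.get("title"), game.get("year"))  # values are always text
print(game.find("review").attrib)


def parses(text):
    """Return True if the parser accepts text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<game year=1995></game>"))                # False: no quotes
print(parses('<game year="1995" year="1996"></game>'))  # False: duplicate
print(parses("<game title='Tails Adventures'></game>")) # True: single quotes
```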


Empty Tags

You should be familiar with the concept of all opening tags requiring a closing tag. There is another type of tag called an "Empty Tag." In Empty Tags, the opening tag is also the closing tag. An Empty Tag does not contain data inside it because it doesn't have a separate closing tag. They can still use attributes. If you see opening tags as positive and closing tags as negative, then picture Empty Tags as neutral. Here's an example:

<magazine title="Nintendo Power" volume="189">
	<bookmark page="31" section="Star Fox Assault" />
</magazine>

Empty Tags have a slash at the end to designate the closing. If you forget the slash, the page will fail to parse as it expects to find an end tag.
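
Here is a sketch of both cases, assuming Python's standard xml.etree.ElementTree: the example above parsed as written, and the same text with the slash removed.

```python
import xml.etree.ElementTree as ET

doc = """<magazine title="Nintendo Power" volume="189">
	<bookmark page="31" section="Star Fox Assault" />
</magazine>"""

magazine = ET.fromstring(doc)
bookmark = magazine.find("bookmark")

# The Empty Tag keeps its attributes but has no text and no children.
print(bookmark.get("page"), bookmark.text, len(bookmark))  # prints 31 None 0

# Remove the slash and <bookmark> becomes an opening tag that never closes.
broken = doc.replace(" />", ">")
try:
    ET.fromstring(broken)
    broken_accepted = True
except ET.ParseError as err:
    broken_accepted = False
    print(err)
```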


Entities

You may have noticed that using characters such as < and & as data caused errors in your XML file. That is because they are reserved for markup. The angle brackets are used for tags, and the ampersand (&) is reserved for entities. A literal < or & must never occur as data; a literal > is tolerated in most places, but it's good practice to escape it too. You can use entities to express any character using character codes or defined names. Here are the predefined entity names in XML (this is what you must use in order to put reserved characters in the page).

Use &lt; for < (Less Than).
Use &gt; for > (Greater Than).
Use &amp; for & (Ampersand).
Use &quot; for " (Quotation marks).
Use &apos; for ' (Apostrophe).

All entities start with an ampersand (&), followed by the name or character code, and end with a semicolon (;). The list above uses entity names which are predefined in XML. You can (and for reserved characters, must) use entities in content between tags as well as in attribute values. Reserved characters and entities are not allowed in tag names or attribute names.
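
A parser turns the predefined entities back into the characters they stand for. The sketch below assumes Python's standard xml.etree.ElementTree for reading and xml.sax.saxutils.escape for going the other way; the <review> content is invented sample data.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

doc = '<review author="&quot;TailsFan&quot;">Tails &amp; Sonic: 1 &lt; 2</review>'
review = ET.fromstring(doc)

print(review.text)           # prints Tails & Sonic: 1 < 2
print(review.get("author"))  # prints "TailsFan"

# escape() replaces &, < and > so text is safe to place between tags.
escaped = escape("Tails & Sonic: 1 < 2")
print(escaped)               # prints Tails &amp; Sonic: 1 &lt; 2
```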

To use character codes in entities, you can specify the value in decimal or in hex. The value is the character's Unicode code point; for plain ASCII characters, it is the same value you'd expect to find in a hex editor.

You can use &#64; to represent the @ character using decimal.
You can use &#x40; to represent the @ character using hex.

All types of entities start with an ampersand and end with a semicolon. For named entities, the name goes in between. For entities by character code, a pound sign (#) follows the ampersand, and the character code follows the pound sign. If the code is hex, put an x before the number. Otherwise, it is interpreted as decimal.
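
Both spellings give the same character once parsed. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the <c> tag is a placeholder name.

```python
import xml.etree.ElementTree as ET

from_decimal = ET.fromstring("<c>&#64;</c>").text
from_hex = ET.fromstring("<c>&#x40;</c>").text
print(from_decimal, from_hex)  # prints @ @

# The same numbers, seen from the other direction.
print(ord("@"), hex(ord("@")))  # prints 64 0x40

# Codes above the ASCII range work the same way.
accented = ET.fromstring("<c>&#233;</c>").text
print(accented == "\u00e9")  # prints True
```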

Later on, when we get into creating the doctype, you will learn how to make your own named entities. This will allow you to use an entity name you choose to represent a character or set of characters (such as a sentence) which you can easily insert anywhere in the page.
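
As a small preview, here is a sketch of a custom entity declared inside the DOCTYPE and expanded by the parser. It assumes Python's standard xml.etree.ElementTree, and the entity name "site" is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE greeting [
	<!ENTITY site "The Tails Archive">
]>
<greeting>Welcome to &site;!</greeting>"""

greeting = ET.fromstring(doc)

# The parser replaces &site; with the text declared in the DOCTYPE.
print(greeting.text)
```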


Comments

You can enter comments (notes separate from the XML data) into your XML document. They'll be visible when the raw XML is viewed but won't affect the data. For formatted languages built upon XML, such as XHTML and XSL, comments will not be displayed but will still be visible in the source. Comments are useful for leaving notes if your markup gets complex. You may put comments between the opening and closing tags of any element, but not inside a tag itself. Comments start and end differently than tags:

<tag>
<!-- Comments inside -->
</tag>

Comments start with <!-- and end with -->. Anything, including <, >, and &, may appear inside comments, as their contents are not parsed as markup. However, -- may not appear inside a comment except to end it, or the document will not be well-formed.
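
Both halves of that rule can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree, which leaves comments out of the parsed result by default; the parses helper is a made-up name.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Reserved characters are fine inside a comment.
print(parses("<tag><!-- 1 < 2 & 3 > 2 --></tag>"))   # True

# A double hyphen in the middle of a comment is not.
print(parses("<tag><!-- not -- allowed --></tag>"))  # False

# The comment does not become a child of <tag>.
tag = ET.fromstring("<tag><!-- Comments inside --></tag>")
print(len(tag))  # prints 0
```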


CDATA Sections

For areas in the document that must not be parsed as markup, you may mark them as CDATA sections. By default, all content is parsed as XML; this is called Parsed Character Data, or PCDATA. But inside a CDATA section, all data is treated as literal text. Entities and tags are not parsed inside these sections and therefore will not cause well-formedness errors. Here are three separate examples of CDATA sections:

<tag>
	<![CDATA[ <ooh>Text</ooh> ]]>
</tag>
<tag>
	<![CDATA[ <><> blah -- ]]>
</tag>
<script type="text/javascript">
<![CDATA[
	for(i=0;i<4;i++)
		{
		}
]]>
</script>

CDATA sections start with <![CDATA[ and end with ]]>. A CDATA section can contain any character, but the character sequence ]]> cannot occur anywhere except when closing the CDATA section. This is similar to the rule for -- in comments.

In XHTML, for example, JavaScript and CSS may be embedded in the page. See the third example above for some JavaScript. As PCDATA (the default), < and > should be used only for start and end tags, while & is reserved for entities. If these characters appeared in any other form, the document wouldn't be well-formed. In that case, the contents of the tag need to be enclosed in a CDATA section so that they are not parsed as XML.
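
Here is a sketch of what a program receives from a CDATA section, assuming Python's standard xml.etree.ElementTree. The markup is taken from the examples above, with each one parsed as its own document.

```python
import xml.etree.ElementTree as ET

tag = ET.fromstring("<tag><![CDATA[ <ooh>Text</ooh> ]]></tag>")

# The <ooh> "tags" arrive as plain text, not as a child element.
print(repr(tag.text))  # prints ' <ooh>Text</ooh> '
print(len(tag))        # prints 0

script = ET.fromstring("""<script type="text/javascript">
<![CDATA[
	for(i=0;i<4;i++)
		{
		}
]]>
</script>""")
print("i<4" in script.text)  # prints True

# The same JavaScript without CDATA is rejected because of the bare <.
try:
    ET.fromstring("<script>for(i=0;i<4;i++){}</script>")
    bare_script_accepted = True
except ET.ParseError:
    bare_script_accepted = False
print(bare_script_accepted)  # prints False
```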


