XML

XML is the latest buzzword in web development. It is supposed to solve everyone's problems, be really simple and easy to use, and have a lot of power.

Well… it won't solve all your problems, but it does offer a lot of flexibility.

The whole point of XML is that it is what you want it to be. It is a markup language that you can customise yourself. "How then", I hear you ask, "can it be shown by browsers, if it uses tags that they've never seen before?" Well, frankly, it can't. XML on its own can't really be presented in a meaningful way by a browser. For that, you need XSLT to transform it into XHTML; I will cover this in a later article.

Enough talk; let's get down to some code!

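To declare an XML document, start it with the following line and serve it with an XML MIME type such as "text/xml" (or "application/xml"). Strictly speaking, the declaration is optional in XML 1.0, but it is good practice, particularly when the encoding is not UTF-8.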

<?xml version="1.0" encoding="ISO-8859-1"?>

Once you've done that, you can then proceed to use whatever tags you like, in whatever structure you like, provided you follow the basic rules of XML (outlined in the XHTML tutorial): the document has exactly one root tag, tag and attribute names are case-sensitive, every tag must be closed or self-closing, tags must be properly nested, and every attribute must have a value enclosed in quotation marks (single or double). Unlike XHTML, XML itself does not require lower-case names, which is why the pubDate tag in the feed below is perfectly legal.
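If you want to check whether a document follows these rules, the easiest way is to hand it to an XML parser and see if it complains. Here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree module; the is_well_formed helper and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one good and three broken.
good = '<note priority="high"><to>Reader</to><br/></note>'
unclosed = '<note><to>Reader</note>'         # <to> is never closed
unquoted = '<note priority=high></note>'     # attribute value not quoted
mismatched_case = '<Note></note>'            # names are case-sensitive

for name, sample in [("good", good), ("unclosed", unclosed),
                     ("unquoted", unquoted), ("mismatched_case", mismatched_case)]:
    print(name, is_well_formed(sample))
```

Only the first sample should come back as well-formed; the parser rejects the other three outright rather than trying to guess what was meant.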

e.g.

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
  <channel>
    <title>rakaz</title>
    <link>http://www.rakaz.nl/nucleus/</link>
    <description></description>
    <language>en-us</language>
    <generator>Nucleus v3.15</generator>
    <copyright>Copyright 2005, Niels Leenheer</copyright>
    <category>Weblog</category>
    <item>
      <title>Nucleus skins</title>
      <link>http://www.rakaz.nl/nucleus/index.php?itemid=49</link>
      <description>...</description>
      <category>nucleus</category>
      <comments>http://www.rakaz.nl/nucleus/index.php?itemid=49</comments>
      <pubDate>Thu, 3 Feb 2005 22:01:05 +0100</pubDate>
    </item>
  </channel>
</rss>

That is XML used to create an RSS feed for Rakaz's blog.

There are, however, more things you can do with XML. Namespaces allow you to mix several different markup languages in one document without their tag names clashing. Using a namespace, you can put XHTML markup in an XML document, and vice versa. To use a namespace, you must first declare it.

xmlns:namespacename="http://www.w3.org/TR/html4/"

That code usually goes into the first tag in the document. 'namespacename' is the prefix you will use to refer to the namespace, and the URI following it should be unique to that markup language. The URI isn't actually used to do anything but identify the namespace; nothing has to exist at that address. You may declare multiple namespaces, as long as each prefix is declared only once on a given tag.

To use the namespace, you put the prefix, followed by a colon, in front of the name of any tag you want in that namespace.

<namespacename:tagname>...</namespacename:tagname>

You may also use the default namespace format.

xmlns="http://www.w3.org/TR/html4/"

That code would be put into a tag, and that tag and all its unprefixed children would then belong to the specified namespace. This saves us from having to type the prefix in front of all the tags.
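To see that the prefixed and default forms really mean the same thing, you can ask a parser what it thinks each tag is called. The following is a rough sketch in Python using the standard library's xml.etree.ElementTree, which writes a namespaced tag as {URI}name; the doc and table tags and the 'h' prefix are invented for this example.

```python
import xml.etree.ElementTree as ET

URI = "http://www.w3.org/TR/html4/"

# Same namespace, declared once with a prefix and once as the default.
prefixed = ET.fromstring(f'<doc xmlns:h="{URI}"><h:table/></doc>')
default = ET.fromstring(f'<doc><table xmlns="{URI}"/></doc>')

prefixed_tag = prefixed[0].tag
default_tag = default[0].tag

print(prefixed_tag)                 # the URI in braces, then the tag name
print(prefixed_tag == default_tag)  # both forms give the same qualified name
print(prefixed.tag)                 # the outer tag is in no namespace at all
```

The prefix itself never shows up in the result: what counts is the URI it stands for, so two documents can use different prefixes for the same namespace and still agree.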

So, how can we use this to embed XHTML in XML? Well, we would have to declare the XHTML namespace, which the W3C defines as http://www.w3.org/1999/xhtml: xmlns:xhtml="http://www.w3.org/1999/xhtml"

We would then have to prefix all XHTML elements with the 'xhtml' prefix.

e.g.

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <channel>
    <title>rakaz</title>
    <link>http://www.rakaz.nl/nucleus/</link>
    <description></description>
    <language>en-us</language>
    <generator>Nucleus v3.15</generator>
    <copyright>Copyright 2005, Niels Leenheer</copyright>
    <category>Weblog</category>
    <item>
      <title>Nucleus skins</title>
      <link>http://www.rakaz.nl/nucleus/index.php?itemid=49</link>
      <description><xhtml:p>This is some <xhtml:strong>sample</xhtml:strong> text.</xhtml:p></description>
      <category><strong xmlns="http://www.w3.org/1999/xhtml">default namespace usage</strong></category>
      <comments>http://www.rakaz.nl/nucleus/index.php?itemid=49</comments>
      <pubDate>Thu, 3 Feb 2005 22:01:05 +0100</pubDate>
    </item>
  </channel>
</rss>
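Reading the embedded XHTML back out of a feed like this takes a little care, because the parser only finds namespaced tags if you tell it which URI your prefix stands for. Below is a small sketch in Python with the standard library's xml.etree.ElementTree, assuming a cut-down copy of the feed above and the XHTML namespace URI that the W3C assigns, http://www.w3.org/1999/xhtml; the NS mapping and variable names are my own.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups; the prefix here is local to this script.
NS = {"xhtml": "http://www.w3.org/1999/xhtml"}

feed = """<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <channel>
    <item>
      <title>Nucleus skins</title>
      <description><xhtml:p>This is some <xhtml:strong>sample</xhtml:strong> text.</xhtml:p></description>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(feed)

# The RSS tags are in no namespace, so a plain path finds them.
description = root.find("channel/item/description")

# The XHTML tags need the prefix map to be resolved.
strong = description.find("xhtml:p/xhtml:strong", NS)
strong_text = strong.text

# Flatten the embedded markup to plain text.
plain_text = "".join(description.itertext())

print(strong_text)
print(plain_text)
```

A newsreader that doesn't understand XHTML could use something like the last step to fall back to plain text.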

Another advanced feature of XML is the CDATA syntax. This nifty little thingamabob allows you to put markup into an XML document, but have it treated as plain text. This is useful for preventing people from using XHTML in blog comment syndications, for example. To use CDATA, just surround the required markup with <![CDATA[ and ]]>. The one thing a CDATA section cannot contain is the sequence ]]> itself, since that is what ends it.

e.g.

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
  <channel>
    <title>rakaz</title>
    <link>http://www.rakaz.nl/nucleus/</link>
    <description></description>
    <language>en-us</language>
    <generator>Nucleus v3.15</generator>
    <copyright>Copyright 2005, Niels Leenheer</copyright>
    <category>Weblog</category>
    <item>
      <title>Nucleus skins</title>
      <link>http://www.rakaz.nl/nucleus/index.php?itemid=49</link>
      <description><![CDATA[<description>The XML tags in <item>this</item> section won't be parsed, preventing unwanted markup from accidentally messing up newsreaders, etc.</description>]]></description>
      <category>nucleus</category>
      <comments>http://www.rakaz.nl/nucleus/index.php?itemid=49</comments>
      <pubDate>Thu, 3 Feb 2005 22:01:05 +0100</pubDate>
    </item>
  </channel>
</rss>
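You can watch the difference CDATA makes by parsing the same text with and without it. This is a minimal sketch in Python, again assuming the standard library's xml.etree.ElementTree; the three description snippets are invented for the comparison.

```python
import xml.etree.ElementTree as ET

# The same content three ways: wrapped in CDATA, as real markup, and escaped by hand.
with_cdata = ET.fromstring(
    "<description><![CDATA[Some <b>bold</b> text]]></description>")
without_cdata = ET.fromstring(
    "<description>Some <b>bold</b> text</description>")
escaped = ET.fromstring(
    "<description>Some &lt;b&gt;bold&lt;/b&gt; text</description>")

cdata_text = with_cdata.text        # the tags survive as literal characters
cdata_children = len(with_cdata)    # number of child tags the parser found
plain_children = len(without_cdata)

print(repr(cdata_text))
print(cdata_children, plain_children)
print(cdata_text == escaped.text)
```

Inside CDATA the parser sees no child tags at all, just one run of text, and the result is identical to escaping every < and > by hand. CDATA simply saves you the typing.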

Hopefully this little XML tutorial has been of some use. Next time, I will try to give a fairly useful article on XSLT, even though it is a hugely complex language, and I haven't learnt it all yet :P