Friends

Wednesday, October 26, 2011

XML Tutorial

What is XML?

XML (eXtensible Markup Language) is a meta-language; that is, it is a language in which other languages are created. In XML, data is "marked up" with tags, similar to HTML tags. In fact, XHTML, a reformulation of HTML, is an XML-based language, which means that XHTML follows the syntax rules of XML.
XML is used to store data or information. This data might be intended to be read by people or by machines. It can be highly structured data, such as data typically stored in databases or spreadsheets, or loosely structured data, such as data stored in letters or manuals.

XML Benefits

Initially XML received a lot of excitement, which has since died down somewhat. This isn't because XML is any less useful, but rather because it doesn't provide the Wow! factor that other technologies, such as HTML, do. When you write an HTML document, you see a nicely formatted page in a browser - instant gratification. When you write an XML document, you see an XML document - not so exciting. However, with a little more effort, you can make that XML document sing!
This section discusses some of the major benefits of XML.

XML Holds Data, Nothing More

XML does not really do much of anything. Rather, developers can create XML-based languages that store data in a structured way. Applications can then use this data to do any number of things.

XML Separates Structure from Formatting

One of the difficulties with HTML documents, word processor documents, spreadsheets, and other forms of documents is that they mix structure with formatting. This makes it difficult to manage content and design, because the two are intermingled.
As an example, in HTML, there is a <u> tag used for underlining text. Very often, it is used for emphasis, but it also might be used to mark a book title. It would be very difficult to write an application that searched through such a document for book titles.
In XML, the book titles could be placed in <book_title> tags and the emphasized text could be placed in <em> tags. The XML document does not specify how the content of either tag should be displayed. Rather, the formatting is left up to an external stylesheet. Even though the book titles and emphasized text might appear the same, it would be relatively straightforward to write an application that finds all the book titles. It would simply look for text in <book_title> tags. It also becomes much easier to reformat a document; for example, to change all emphasized text to be italicized rather than underlined, but leave book titles underlined.
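To make this concrete, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree module. The review and p elements and the sample text are made up for illustration; only the book_title and em tag names come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the <review> and <p> elements are assumptions.
SAMPLE = """<review>
 <p>I <em>really</em> enjoyed <book_title>Moby-Dick</book_title>
 and <book_title>Emma</book_title>.</p>
</review>"""

def find_book_titles(xml_text):
    """Return the text of every <book_title> element, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("book_title")]

titles = find_book_titles(SAMPLE)
print(titles)
```

The emphasized text in the em element is skipped because the search is by element name, not by how the text happens to be displayed.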

XML Promotes Data Sharing

Very often, applications that hold data in different structures must share data with one another. It can be very difficult for a developer to map the different data structures to each other. XML can serve as a go-between. Each application's data structure is mapped to an agreed-upon XML structure. Then all the applications share data in this XML format. Each application only has to know two structures, its own and the XML structure, to be able to share data with many other applications.

XML is Human-Readable

XML documents are (or can be) read by people. Perhaps this doesn't sound so exciting, but compare it to data stored in a database. It is not easy to browse through a database and read different segments of it as you would a text file. Take a look at the XML document below.

Code Sample: XMLBasics/Demos/Paul.xml

<?xml version="1.0"?>
<person>
 <name>
  <firstname>Paul</firstname>
  <lastname>McCartney</lastname>
 </name>
 <job>Singer</job>
 <gender>Male</gender>
</person>
Code Explanation
It is not hard to tell from looking at this that the XML is describing a person named Paul McCartney, who is a singer and is male.
Do people read XML documents? Programmers do (hey, we're people too!). And it is easier for us if the documents we work with are easy to read.

XML is Free

XML doesn't cost anything to use. It can be written with a simple text editor or one of the many freely available XML authoring tools, such as XML Notepad. In addition, many web development tools, such as Dreamweaver and Visual Studio .NET, have built-in XML support. There are also many free XML parsers, such as Microsoft's MSXML (downloadable from microsoft.com) and Xerces (downloadable at apache.org).

XML in Practice

Content Management

Almost all of the leading content management systems use XML in one way or another. A typical use would be to store a company's marketing content in one or more XML documents. These XML documents could then be transformed for output on the Web, as Word documents, as PowerPoint slides, in plain text, and even in audio format. The content can also easily be shared with partners who can then output the content in their own formats.
Storing the content in XML makes it much easier to manage content for two reasons.
  1. Content changes, additions, and deletions are made in a central location and the changes will cascade out to all formats of presentation. There is no need to be concerned about keeping the Word documents in sync with the website, because the content itself is managed in one place and then transformed for each output medium.
  2. Formatting changes are made in a central location. To illustrate, suppose a company had many marketing web pages, all of which were produced from XML content being transformed to HTML. The format for all of these pages could be controlled from a single XSLT, and a sitewide formatting change could be made by modifying that XSLT.
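The idea of one source feeding several outputs can be sketched in a few lines of Python. Note that this sketch does not use XSLT (Python's standard library has no XSLT processor); plain functions stand in for the stylesheets. The content, item, heading, and body element names and the sample text are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical marketing content stored once, in XML.
CONTENT = """<content>
 <item><heading>New Product</heading><body>Ships in May.</body></item>
 <item><heading>Price Cut</heading><body>Save 10% on renewals.</body></item>
</content>"""

def to_html(root):
    """Render each item as an HTML heading and paragraph."""
    parts = []
    for item in root.findall("item"):
        heading = escape(item.findtext("heading"))
        body = escape(item.findtext("body"))
        parts.append(f"<h2>{heading}</h2>\n<p>{body}</p>")
    return "\n".join(parts)

def to_text(root):
    """Render each item as a single line of plain text."""
    return "\n".join(
        f"{item.findtext('heading')}: {item.findtext('body')}"
        for item in root.findall("item")
    )

root = ET.fromstring(CONTENT)
html_output = to_html(root)
text_output = to_text(root)
print(html_output)
print(text_output)
```

A content change is made once in the XML, and a formatting change is made once in the function for that output, which mirrors the two reasons listed above.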

Web Services

XML Web services are small applications or pieces of applications that are made accessible on the Internet using open standards based on XML. Web services generally consist of three components:
  • SOAP - an XML-based protocol used to exchange messages with Web services over the Internet.
  • WSDL (Web Services Description Language) - an XML-based language for describing a Web service and how to call it.
  • Universal Description, Discovery, and Integration (UDDI) - the yellow pages of Web services. UDDI directory entries are XML documents that describe the Web services a group offers. This is how people find available Web services.

RDF / RSS Feeds

RDF (Resource Description Framework) is a framework for writing XML-based languages to describe information on the Web (e.g., web pages). RSS 1.0 (RDF Site Summary) is an implementation of this framework; it is a language that adheres to RDF and is used to describe web content. Website publishers can use RSS to make content available as a "feed", so that web users can access some of their content without actually visiting their site. Often, RSS is used to provide summaries with links to the company's website for additional information.
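As a rough illustration of how an application might read a feed, the Python sketch below pulls the summaries and links out of a small feed. The feed itself is made up (example.com addresses, invented headlines) and uses the simplified RSS 2.0 layout of channel and item elements rather than the RDF-based form.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified RSS 2.0-style feed used only for illustration.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
 <channel>
  <title>Example News</title>
  <link>http://www.example.com/</link>
  <description>A made-up feed</description>
  <item>
   <title>First story</title>
   <link>http://www.example.com/stories/1</link>
  </item>
  <item>
   <title>Second story</title>
   <link>http://www.example.com/stories/2</link>
  </item>
 </channel>
</rss>"""

def read_items(feed_text):
    """Return a (title, link) pair for every item in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.findall("channel/item")
    ]

items = read_items(FEED)
for title, link in items:
    print(title, "->", link)
```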

XML Documents

An XML document is made up of the following parts.
  • An optional prolog.
  • A document element, usually containing nested elements.
  • Optional comments or processing instructions.

The Prolog

The prolog of an XML document can contain the following items.
  • An XML declaration
  • Processing instructions
  • Comments
  • A Document Type Declaration

The XML Declaration

The XML declaration, if it appears at all, must appear on the very first line of the document with no preceding white space. It looks like this.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
This declares that the document is an XML document. The version attribute is required, but the encoding and standalone attributes are not. If the XML document relies on markup declarations in an external DTD that set defaults for attributes or declare entities, then standalone must be set to no.
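The "very first line, no preceding white space" rule is easy to check with a parser. The sketch below uses Python's xml.etree.ElementTree; the parses helper is a name invented for this example, and the xml_declaration argument to ET.tostring assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# The declaration is accepted only at the very start of the document.
at_start = parses('<?xml version="1.0"?><beatles/>')
after_space = parses(' <?xml version="1.0"?><beatles/>')
print(at_start, after_space)

# Writing a document out with a declaration (Python 3.8+).
with_declaration = ET.tostring(
    ET.Element("beatles"), encoding="UTF-8", xml_declaration=True
)
first_line = with_declaration.decode("utf-8").splitlines()[0]
print(first_line)
```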

Processing Instructions

Processing instructions are used to pass parameters to an application. These parameters tell the application how to process the XML document. For example, the following processing instruction tells the application that it should transform the XML document using the XSL stylesheet beatles.xsl.
<?xml-stylesheet href="beatles.xsl" type="text/xsl"?>
As shown above, processing instructions begin with <? and end with ?>.
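An application can read processing instructions back out of a document. Here is a minimal sketch with Python's xml.dom.minidom; the processing_instructions function name is an assumption of this example, and the one-line beatles document is just a stand-in.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet href="beatles.xsl" type="text/xsl"?>
<beatles/>"""

def processing_instructions(xml_text):
    """Return (target, data) for each processing instruction in the prolog."""
    document = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]

pis = processing_instructions(DOC)
print(pis)
```

The XML declaration on the first line looks like a processing instruction, but the parser does not report it as one; only xml-stylesheet is returned here.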

Comments

Comments can appear throughout an XML document. As in HTML, they begin with <!-- and end with -->.
<!--This is a comment-->

A Document Type Declaration

The Document Type Declaration (or DOCTYPE Declaration) has three roles.
  1. It specifies the name of the document element.
  2. It may point to an external Document Type Definition (DTD).
  3. It may contain an internal DTD.
The DOCTYPE Declaration shown below simply states that the document element of the XML document is beatles.
<!DOCTYPE beatles>
If a DOCTYPE Declaration points to an external DTD, it must either specify that the DTD is on the same system as the XML document itself or that it is in some public location. To do so, it uses the keywords SYSTEM and PUBLIC. It then points to the location of the DTD using a relative Uniform Resource Identifier (URI) or an absolute URI. Here are a couple of examples.
Syntax
<!--DTD is on the same system as the XML document-->
<!DOCTYPE beatles SYSTEM "dtds/beatles.dtd">
Syntax
<!--DTD is publicly available-->
<!DOCTYPE beatles PUBLIC "-//Webucator//DTD Beatles 1.0//EN"
     "http://www.webucator.com/beatles/DTD/beatles.dtd">
As shown in the second declaration above, public identifiers are divided into three parts:
  1. An organization (e.g., Webucator)
  2. A name for the DTD (e.g., Beatles 1.0)
  3. A language (e.g., EN for English)
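The pieces of a DOCTYPE Declaration can be inspected from code. The sketch below uses Python's xml.dom.minidom, which records the declaration without downloading the DTD. Splitting the public identifier on "//" is simply a convenient way to show the three parts described above, not an official parsing rule.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!DOCTYPE beatles PUBLIC "-//Webucator//DTD Beatles 1.0//EN"
     "http://www.webucator.com/beatles/DTD/beatles.dtd">
<beatles/>"""

doctype = minidom.parseString(DOC).doctype

# The name of the document element, then the public and system identifiers.
print(doctype.name)
print(doctype.publicId)
print(doctype.systemId)

# Split the public identifier into organization, DTD name, and language.
_, organization, dtd_name, language = doctype.publicId.split("//")
print(organization, "|", dtd_name, "|", language)
```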

Elements

Every XML document must have at least one element, called the document element. The document element usually contains other elements, which contain other elements, and so on. Elements are denoted with tags. Let's look again at Paul.xml.

Code Sample: XMLBasics/Demos/Paul.xml

<?xml version="1.0"?>
<person>
 <name>
  <firstname>Paul</firstname>
  <lastname>McCartney</lastname>
 </name>
 <job>Singer</job>
 <gender>Male</gender>
</person>
Code Explanation
The document element is person. It contains three elements: name, job and gender. Further, the name element contains two elements of its own: firstname and lastname. As you can see, XML elements are denoted with tags, just as in HTML. Elements that are nested within another element are said to be children of that element.
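The same parent and child relationships can be seen from code. This is a small sketch using Python's xml.etree.ElementTree, with the contents of Paul.xml pasted in as a string so the example runs on its own.

```python
import xml.etree.ElementTree as ET

PAUL = """<?xml version="1.0"?>
<person>
 <name>
  <firstname>Paul</firstname>
  <lastname>McCartney</lastname>
 </name>
 <job>Singer</job>
 <gender>Male</gender>
</person>"""

person = ET.fromstring(PAUL)

# Iterating over an element yields its children in document order.
children = [child.tag for child in person]
name_children = [child.tag for child in person.find("name")]

print(person.tag)       # the document element
print(children)         # children of person
print(name_children)    # children of name
```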

Empty Elements

Not all elements contain other elements or text. For example, in XHTML, there is an img element that is used to display an image. It does not contain any text or elements within it, so it is called an empty element. In XML, empty elements must be closed, but they do not require a separate close tag. Instead, they can be closed with a forward slash at the end of the open tag as shown below.
<img src="images/paul.jpg"/>
The above code is identical in function to the code below.
<img src="images/paul.jpg"></img>
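A quick way to convince yourself that the two forms are equivalent is to parse both and compare the results. The describe helper below is a name made up for this sketch, which uses Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<img src="images/paul.jpg"/>')
long_form = ET.fromstring('<img src="images/paul.jpg"></img>')

def describe(element):
    """Summarize an element as (tag, attributes, text, number of children)."""
    return (element.tag, dict(element.attrib), element.text, len(element))

same = describe(short_form) == describe(long_form)
print(describe(short_form))
print(same)
```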

Attributes

XML elements can be further defined with attributes, which appear inside of the element's open tag as shown below.
Syntax
<name title="Sir">
 <firstname>Paul</firstname>
 <lastname>McCartney</lastname>
</name>
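Reading an attribute from code is straightforward. In this Python sketch (xml.etree.ElementTree), the suffix attribute is deliberately one that does not exist in the sample, to show what happens when an attribute is missing.

```python
import xml.etree.ElementTree as ET

NAME = """<name title="Sir">
 <firstname>Paul</firstname>
 <lastname>McCartney</lastname>
</name>"""

name = ET.fromstring(NAME)

title = name.get("title")      # value of the title attribute
suffix = name.get("suffix")    # no such attribute, so this is None

full_name = "%s %s %s" % (
    title, name.findtext("firstname"), name.findtext("lastname")
)
print(full_name)
```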

CDATA

Sometimes it is necessary to include sections in an XML document that should not be parsed by the XML parser. These sections might contain content that will confuse the XML parser, perhaps because it contains content that appears to be XML, but is not meant to be interpreted as XML. Such content can be nested in CDATA sections. The syntax for CDATA sections is shown below.
Syntax
<![CDATA[
 This section will not get parsed
 by the XML parser.
]]>
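The effect of a CDATA section is easiest to see by parsing one. In this Python sketch (xml.etree.ElementTree), the example and code element names are made up; the point is that the text inside the CDATA section comes back exactly as written, angle brackets and ampersands included.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose <code> element holds text that looks like markup.
DOC = """<example>
 <code><![CDATA[<b>5 < 6 && 6 > 5</b>]]></code>
</example>"""

code_text = ET.fromstring(DOC).findtext("code")
print(code_text)
```

Without the CDATA section, the parser would treat the b tags as a child element and would reject the bare < and & characters.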

White Space

In XML data, there are only four white space characters.
  1. Tab
  2. Line-feed
  3. Carriage-return
  4. Single space
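There are several important rules to remember with regard to white space in XML.
  1. White space within the content of an element is significant; that is, the XML processor will pass these characters to the application or user agent.
  2. White space in attribute values is normalized; that is, tabs and line breaks are replaced with spaces and, for attributes declared with a type other than CDATA, neighboring white spaces are condensed to a single space.
  3. White space that appears only in between elements (for example, indentation) is usually ignored by applications, although the parser still reports it.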
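The following Python sketch (xml.etree.ElementTree) shows what a parser actually hands to an application. The note and child element names and the label attribute are made up for the example. Note that ElementTree still reports the white space that sits between elements; whether to ignore it is left to the application.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a line feed and a tab; the element content
# has leading and trailing spaces; a line feed follows the <child/> element.
DOC = '<note label="one\n\ttwo">  hello  <child/>\n</note>'

note = ET.fromstring(DOC)

label = note.get("label")         # line feed and tab are each replaced by a space
text = note.text                  # white space in element content is passed through
tail = note.find("child").tail    # white space after <child/> is reported too

print(repr(label))
print(repr(text))
print(repr(tail))
```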

xml:space Attribute

The xml:space attribute is a special attribute in XML. It can only take one of two values: default and preserve. This attribute instructs the application how to treat white space within the content of the element. Note that the application is not required to respect this instruction.

XML Syntax Rules

XML has relatively straightforward, but very strict, syntax rules. A document that follows these syntax rules is said to be well-formed.
  1. There must be one and only one document element.
  2. Every open tag must be closed.
  3. If an element is empty, it still must be closed.
    • Poorly-formed: <tag>
    • Well-formed: <tag></tag>
    • Also well-formed: <tag />
  4. Elements must be properly nested.
    • Poorly-formed: <a><b></a></b>
    • Well-formed: <a><b></b></a>
  5. Tag and attribute names are case sensitive.
  6. Attribute values must be enclosed in single or double quotes.
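Any XML parser will enforce these rules for you. The sketch below wraps Python's xml.etree.ElementTree in a small is_well_formed helper (a name invented for this example) and runs it over the samples from the list above, plus a few extra cases for the rules that have no sample.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = [
    "<tag>",                # open tag never closed
    "<tag></tag>",
    "<tag />",
    "<a><b></a></b>",       # improperly nested
    "<a><b></b></a>",
    "<a></A>",              # names are case sensitive
    "<a x=1></a>",          # attribute value not quoted
    "<a/><b/>",             # more than one document element
]

results = {sample: is_well_formed(sample) for sample in samples}
for sample, ok in results.items():
    print("%-16s %s" % (sample, "well-formed" if ok else "poorly-formed"))
```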

Special Characters

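There are five special characters that cannot always be written literally in XML documents. The < and & characters must always be escaped in element content and attribute values; the others need escaping only in certain contexts, such as a quote character inside an attribute value delimited by the same kind of quote. These characters are replaced with predefined entity references as shown in the table below.
Special Characters
Character    Entity Reference
<            &lt;
>            &gt;
&            &amp;
"            &quot;
'            &apos;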
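When XML is generated from code, the escaping is usually delegated to a library rather than done by hand. Here is a minimal Python sketch using xml.sax.saxutils.escape, which handles &, <, and > by default and accepts extra replacements for the quote characters. The sample string and the t element are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

original = 'Tom & Jerry said "5 < 6"'

# Escape &, < and > plus, optionally, the two quote characters.
escaped = escape(original, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the entity references back into the original characters.
round_trip = ET.fromstring("<t>%s</t>" % escaped).text
print(round_trip)

# All five predefined entity references, decoded by the parser.
decoded = ET.fromstring("<t>&lt;&gt;&amp;&quot;&apos;</t>").text
print(decoded)
```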

Creating a Simple XML File

The following is a relatively simple XML file describing the Beatles.

Code Sample: XMLBasics/Demos/Beatles.xml

<?xml version="1.0"?>
<beatles>
 <beatle link="http://www.paulmccartney.com">
  <name>
   <firstname>Paul</firstname>
   <lastname>McCartney</lastname>
  </name>
 </beatle>
 <beatle link="http://www.johnlennon.com">
  <name>
   <firstname>John</firstname>
   <lastname>Lennon</lastname>
  </name>
 </beatle>
 <beatle link="http://www.georgeharrison.com">
  <name>
   <firstname>George</firstname>
   <lastname>Harrison</lastname>
  </name>
 </beatle>
 <beatle link="http://www.ringostarr.com">
  <name>
   <firstname>Ringo</firstname>
   <lastname>Starr</lastname>
  </name>
 </beatle>
 <beatle link="http://www.webucator.com" real="no">
  <name>
   <firstname>Nat</firstname>
   <lastname>Dunn</lastname>
  </name>
 </beatle>
</beatles>
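To see how an application might use a file like this, here is a short Python sketch (xml.etree.ElementTree). It works on a shortened copy of Beatles.xml with only two entries so that the example stays small, and the full_names function and its include_unreal parameter are names invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Beatles.xml: two of the five entries.
BEATLES = """<?xml version="1.0"?>
<beatles>
 <beatle link="http://www.paulmccartney.com">
  <name>
   <firstname>Paul</firstname>
   <lastname>McCartney</lastname>
  </name>
 </beatle>
 <beatle link="http://www.webucator.com" real="no">
  <name>
   <firstname>Nat</firstname>
   <lastname>Dunn</lastname>
  </name>
 </beatle>
</beatles>"""

def full_names(root, include_unreal=True):
    """Return "first last" for each beatle, optionally skipping real="no"."""
    names = []
    for beatle in root.findall("beatle"):
        if not include_unreal and beatle.get("real") == "no":
            continue
        first = beatle.findtext("name/firstname")
        last = beatle.findtext("name/lastname")
        names.append("%s %s" % (first, last))
    return names

beatles = ET.fromstring(BEATLES)
everyone = full_names(beatles)
real_only = full_names(beatles, include_unreal=False)
print(everyone)
print(real_only)
```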
  1. Open XMLBasics/Exercises/Xml101.xml
  2. Add a required prerequisite: "Experience with computers".
  3. Add the following to the topics list:
    • XML Documents
      • The Prolog
      • Elements
      • Attributes
      • CDATA
      • XML Syntax Rules
      • Special Characters
    • Creating a Simple XML File
  4. Add a modifications element that shows the modifications you've made.

Code Sample: XMLBasics/Exercises/Xml101.xml

<?xml version="1.0"?>
<course>
 <head>
  <title>Introduction to XML</title>
  <course_num>XML101</course_num>
  <course_length>3 days</course_length>
 </head>
 <body>
  <prerequisites>
   <prereq>Experience with Word Processing</prereq>
   <prereq optional="true">Experience with HTML</prereq>
   <!-- Add a required prerequisite: "Experience with computers"  -->
  </prerequisites>
  <outline>
   <topics>
    <topic>XML Basics
     <topics>
      <topic>What is XML?</topic>
      <topic>XML Benefits
       <topics>
        <topic>XML Holds Data, Nothing More</topic>
        <topic>XML Separates Structure from Formatting</topic>
        <topic>XML Promotes Data Sharing</topic>
        <topic>XML is Human-Readable</topic>
        <topic>XML is Free</topic>
       </topics>
      </topic>
      <!-- 
       Add the following to the topics list ("XML Documents" and "Creating a Simple XML File" should be siblings of "What is XML?" and "XML Benefits"):
       
       -XML Documents
        -The Prolog
        -Elements
        -Attributes
        -CDATA
        -XML Syntax Rules
        -Special Characters
       -Creating a Simple XML File
       -->
     </topics>
    </topic>
   </topics>
  </outline>
 </body>
 <foot>
  <creator>Josh Lockwood</creator>
  <date_created>2002-07-25</date_created>
  <modifications madeby="Colby Germond" date="2003-05-05">
   <modification type="insert">Added HTML prerequisite</modification>
   <modification type="edit">Fixed some typos</modification>
  </modifications>
  <!-- 
   Add a modifications element that shows the modifications you've made.
   -->
 </foot>
</course>
