Chapter 2. Technologies in XML 19
only after it has parsed the whole document. However, once the document has
been created in memory, it can be navigated and changed. A DOM parser is an
example of a tree-based parser.
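The tree-based behavior described above can be sketched as follows. This is a minimal illustration, assuming Python's standard xml.dom.minidom module as a stand-in DOM implementation (it is not one of the parsers discussed in this chapter); the sample document and its element names are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for illustration.
XML_TEXT = "<order><item sku='A1'>Widget</item><item sku='B2'>Gadget</item></order>"

# The whole document is parsed into an in-memory tree before any access.
doc = parseString(XML_TEXT)

# Navigate: nodes can be visited in any order, any number of times.
items = doc.getElementsByTagName("item")
skus = [item.getAttribute("sku") for item in items]

# Change: modify the tree in place (update an attribute, append an element).
items[0].setAttribute("sku", "A9")
new_item = doc.createElement("item")
new_item.setAttribute("sku", "C3")
new_item.appendChild(doc.createTextNode("Gizmo"))
doc.documentElement.appendChild(new_item)

# Re-read the tree to observe the changes.
updated_skus = [item.getAttribute("sku") for item in doc.getElementsByTagName("item")]
print(skus)
print(updated_skus)
```

Because the complete tree is held in memory, the same nodes can be revisited and edited freely, at the cost of memory proportional to the document size.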
Event-based parsing
These parsers process the document as they encounter its tags. This is a
data-centric view of the XML: whenever an element or tag is encountered,
it (or its contents) can be processed. However, the parser cannot backtrack
once a tag has been passed. The parser returns the element, its attributes, and
its contents. An event-based parser never attempts to build a structure of the
data, and therefore its memory requirements are lower. It is useful when one is
looking in the document only for certain elements. A SAX parser is an
example of an event-based parser.
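As a rough sketch of the event-based style, the following example picks out only certain elements from a document. It assumes Python's standard xml.sax module as a stand-in SAX implementation (not one of the parsers discussed in this chapter); the ItemHandler class, the sample document, and its element names are hypothetical.

```python
import xml.sax

# Hypothetical sample document, used only for illustration.
XML_TEXT = (b"<order><item sku='A1'>Widget</item>"
            b"<note>rush</note><item sku='B2'>Gadget</item></order>")


class ItemHandler(xml.sax.ContentHandler):
    """Collects only <item> elements; all other elements are ignored."""

    def __init__(self):
        super().__init__()
        self.items = []        # (sku, text) pairs collected so far
        self._in_item = False  # True while inside an <item> element
        self._sku = None
        self._text = []

    def startElement(self, name, attrs):
        # Called when an opening tag is encountered.
        if name == "item":
            self._in_item = True
            self._sku = attrs.get("sku")
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self._text.append(content)

    def endElement(self, name):
        # Called when a closing tag is encountered; the element is now complete.
        if name == "item":
            self.items.append((self._sku, "".join(self._text)))
            self._in_item = False


handler = ItemHandler()
xml.sax.parseString(XML_TEXT, handler)
print(handler.items)
```

The handler keeps only the small amount of state it needs for the elements of interest; no tree is built, and once an element has been passed it cannot be revisited without parsing the document again.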
The most popular XML parser on the market is the Apache XML Project’s
Xerces. The Xerces parsers provide XML parsing and generation, and are
fully validating parsers available for both Java and C++, implementing the W3C
XML and DOM (Level 1 and 2) standards, as well as the SAX (Level 2) standard.
The parsers also support XML Schema. This parser has been incorporated into
the IBM set of products (WebSphere, Application Studio and DB2).
Another parser family is IBM’s XML4J (XML Parser for Java) and XML4C. XML4J is
a validating XML parser written in 100% pure Java, whereas XML4C is a
validating XML parser written for C++. Each provides classes for parsing,
generating, manipulating, and validating XML documents. Both parsers support the
XML 1.0 Recommendation and associated standards (DOM 1.0, SAX 1.0, DOM 2.0).
XML4J contains implementations of DOM Level 2, SAX Level 2, and parts of the
W3C XML Schema, but these are experimental at this stage. XML4C is supported
on most operating systems, including AIX and Linux.
Both parsers are open source and have the same code base; the XML4J
parser has the latest code enhancements, while Xerces has been through
production-level testing.
2.2 DTD and XML Schema
DTDs and XML Schema are both used to describe structured information;
however, in the last two years acceptance of XML Schema has gained
momentum. Both DTDs and schemas are building blocks for XML documents
and consist of elements, tags, attributes, and entities.
XML Schemas evolved to overcome limitations in DTDs. The W3C has published
three documents, the latest update being in May 2001: