Monday, January 25, 2010

XML Parser

Java API for XML Processing (JAXP) is one of the Java XML programming APIs. It provides the capability to validate and parse XML documents. Its three basic parsing interfaces are:
  • the Document Object Model parsing interface or DOM interface
  • the Simple API for XML parsing interface or SAX interface
  • the Streaming API for XML or StAX interface (added in JDK 6; separate jar available for JDK 5)

    XML Parsing
    Traditionally, XML APIs are either:
  • tree based - the entire document is read into memory as a tree structure for random access by the calling application
  • event based - the application registers to receive events as entities are encountered within the source document.
    Both have advantages: the former (e.g. DOM) allows random access to the document, while the latter (e.g. SAX) has a small memory footprint and is typically much faster.

    These two access metaphors can be thought of as polar opposites. A tree-based API allows unlimited random access and manipulation, while an event-based API is a 'one shot' pass through the source document.
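To make the tree-based side concrete, here is a minimal sketch using the DOM classes that ship with the JDK. The class name `DomSketch`, the helper `titleAt`, and the sample `<library>` document are made up for illustration; only the `javax.xml.parsers` and `org.w3c.dom` calls are real API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
class DomSketch {
    // Reads the entire document into an in-memory tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Random access: returns the title attribute of the <book> at the given index.
    static String titleAt(Document doc, int index) {
        NodeList books = doc.getElementsByTagName("book");
        Element book = (Element) books.item(index);
        return book.getAttribute("title");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book title=\"A\"/><book title=\"B\"/></library>";
        Document doc = parse(xml);
        // Nodes can be visited in any order, and as often as needed.
        System.out.println(titleAt(doc, 1));
        System.out.println(titleAt(doc, 0));
        // The tree can also be modified in place.
        doc.getDocumentElement().appendChild(doc.createElement("book"));
        System.out.println(doc.getElementsByTagName("book").getLength());
    }
}
```

The price of this convenience is that the whole document sits in memory before the first lookup can happen.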

    Event-based XML Parsing Libs
    SAX is the Simple API for XML, originally a Java-only API. SAX was the first widely adopted API for XML in Java, and is a "de facto" standard.

    A parser that implements SAX (i.e., a SAX parser) functions as a stream parser with an event-driven API. The user defines a number of callback methods that are called when events occur during parsing. The SAX events include:
  • XML Text nodes
  • XML Element nodes
  • XML Processing Instructions
  • XML Comments

    An event is fired when each of these XML features is encountered, and again when its end is reached. XML attributes are provided as part of the data passed to element events. SAX parsing is unidirectional; previously parsed data cannot be re-read without starting the parsing operation again.
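The callback style described above might look roughly like the following sketch, which uses the SAX parser bundled with the JDK. The class name `SaxSketch` and the log format are assumptions made for illustration. Comment events are not shown: in SAX 2 they arrive through the optional `org.xml.sax.ext.LexicalHandler` interface rather than through `DefaultHandler`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of any library.
class SaxSketch {
    // Parses the XML once and records each callback as a line of text.
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Attributes arrive together with the element start event.
                log.add("start " + qName + " attrs=" + attrs.getLength());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text node in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void processingInstruction(String target, String data) {
                log.add("pi " + target);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String line : events("<note id=\"1\">hello<?target data?></note>")) {
            System.out.println(line);
        }
    }
}
```

Because the parser drives the calls, any state the application needs (such as which element it is currently inside) has to be kept in the handler itself.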

    DOM-based XML Parsing Libs
    JDOM
    An open source Java-based document object model for XML, designed specifically for the Java platform so that it can take advantage of Java's language features. JDOM integrates with the Document Object Model (DOM) and the Simple API for XML (SAX), and supports XPath and XSLT. It uses external parsers to build documents.

    DOM4J
    An open source Java library for working with XML, XPath and XSLT. It is compatible with DOM, SAX and JAXP standards.

    Streaming API for XML (StAX)
    An application programming interface (API) to read and write XML documents, originating from the Java programming language community.

    StAX was designed as a middle ground between the tree-based and event-based approaches. In the StAX metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, 'pulling' information from the parser as it needs it. This differs from an event-based API such as SAX, which 'pushes' data to the application and requires the application to maintain state between events in order to keep track of its location within the document.
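A minimal sketch of the pull style, using the `javax.xml.stream` cursor API available since JDK 6. The class name `StaxSketch`, the helper `bookTitles`, and the sample document are hypothetical; the point is only that the application's own loop decides when to advance the cursor.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of any library.
class StaxSketch {
    // Moves the cursor through the document and collects the title of every <book>.
    static List<String> bookTitles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed to it.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "book".equals(reader.getLocalName())) {
                    // A null namespace URI means the namespace is not checked.
                    titles.add(reader.getAttributeValue(null, "title"));
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book title=\"A\"/><book title=\"B\"/></library>";
        System.out.println(bookTitles(xml));
    }
}
```

Since the loop belongs to the application, it could stop early (for example after the first match) without reading the rest of the document.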

    StAX XML Parsing Libs
    Apache Axiom
    A lightweight XML object model built on top of StAX that also provides lazy object building.
  • 2 comments:

    anon_anon said...

    there is also vtd-xml that is far more advanced than DOM, SAX, JDOM etc...

    http://vtd-xml.sf.net