Monday, September 10, 2012


XML parsing - traditional parsers: 

Pull-parsing

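Pull parsers are similar to SAX in that they read a document sequentially as a stream and do not store the entire document in memory. The difference is in who drives the parsing: a SAX parser pushes events to callback handlers, whereas with a pull parser the application asks for the next event when it is ready for it.
Pull parsers include XMLReader in PHP, XmlReader in .NET, and StAX in Java.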


Pull parsers treat the document as a series of items which are read in sequence. They contain the concept of a cursor which points at the current item and can only be moved forward through the incoming file.
In effect, the parser is an iterator that sequentially visits the various elements, attributes, and data in an XML document.
There are then methods which can use this iterator, test the current item to see its type and, if it is the expected type, pull out aspects of the element such as its text value or attributes.
These methods also have the task of moving the cursor on to the next item.
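
As a rough illustration of the cursor idea, here is a minimal sketch using StAX. The class name CursorSketch, the describe method, and the sample document are made up for this example; IS_COALESCING is switched on so that each run of text arrives as a single item.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorSketch {

    // Walks the cursor forward over the document and records each item it visits.
    static List<String> describe(String xml) {
        List<String> items = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to deliver adjacent character data as one item.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int type = reader.next(); // move the cursor on to the next item
                // Test the type of the current item, then pull out what we need.
                if (type == XMLStreamConstants.START_ELEMENT) {
                    items.add("start:" + reader.getLocalName());
                } else if (type == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                    items.add("text:" + reader.getText());
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    items.add("end:" + reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return items;
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not taken from a real catalog.
        System.out.println(describe("<book><title>Dune</title></book>"));
    }
}
```

Each call to next() moves the cursor one item forward; there is no call that moves it back, which is why the items come out strictly in document order.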


Advantages
  1. Pull-parsing code can be more straightforward to understand and maintain than SAX parsing code.
  2. Pull-parsing can be faster and more memory efficient than DOM, because the whole document is never held in memory.
  3. Can be used to read XML data into objects.


Disadvantages
  1. More difficult to use than DOM, with a steeper learning curve.
  2. Tends to produce a long if-else chain in code, which can become messy and hard to maintain.
  3. You can only move in a forward direction.
  4. No XML file validation.


How to parse a document with StAX
  1. Create the parser factory object. This factory is then used to create the parser.
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();
  2. Create a file input stream and pass it to the factory method that creates the parser (reader).
    InputStream input = new FileInputStream(new File("C:/STAX/catalog.xml"));
    XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);
  3. Call the hasNext() method to see if there are more events remaining, then call next() to move the cursor on. The next() method returns an int code identifying the type of the event the cursor now points at.
    while (xmlStreamReader.hasNext()) {
        int event = xmlStreamReader.next();
    }
  4. In order to skip a type of event you don't want to process, add a simple check which identifies this type of event and moves on to the next one.
    if (event == XMLStreamConstants.ENTITY_DECLARATION) {
        event = xmlStreamReader.next();
    }
  5. Once the cursor is on an element, you can extract data from that element. In this example, we get the element's name.
    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
    }
  6. You can also loop over the attributes of an element and get the name or value of an attribute based on its index in the loop, with getAttributeCount() giving the upper bound, e.g.
    xmlStreamReader.getAttributeLocalName(i);  // name of attribute i
    xmlStreamReader.getAttributeValue(i);      // value of attribute i
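
Put together, the steps above might look something like the sketch below. The class name StaxWalkthrough, the listElements method, and the in-memory sample XML are assumptions made so the example runs without the catalog.xml file, whose contents are not shown here. Checked XMLStreamException errors are wrapped in an unchecked exception to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxWalkthrough {

    // Returns one line per start element: the element name followed by name=value pairs.
    static List<String> listElements(InputStream input) {
        List<String> lines = new ArrayList<>();
        try {
            // Create the factory, then the reader from the input stream.
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);

            // Keep advancing the cursor while events remain.
            while (xmlStreamReader.hasNext()) {
                int event = xmlStreamReader.next();

                // Skip every event type we are not interested in.
                if (event != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }

                // Pull the element name out of the current event.
                StringBuilder line = new StringBuilder(xmlStreamReader.getLocalName());

                // Loop over the attributes by index.
                for (int i = 0; i < xmlStreamReader.getAttributeCount(); i++) {
                    line.append(' ')
                        .append(xmlStreamReader.getAttributeLocalName(i))
                        .append('=')
                        .append(xmlStreamReader.getAttributeValue(i));
                }
                lines.add(line.toString());
            }
            xmlStreamReader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the contents of catalog.xml.
        String xml = "<catalog><journal title=\"XML Weekly\" year=\"2012\"/></catalog>";
        InputStream input = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        listElements(input).forEach(System.out::println);
    }
}
```

To read from disk instead, pass a FileInputStream for the catalog file to listElements. Note that closing the XMLStreamReader does not close the underlying stream, so the caller is still responsible for closing it.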

In the next entry I will be talking about the general drawbacks of using DOM, SAX or StAX and how our new approach to parsing solves these problems. 


Check out our free developer version at http://www.sxml.com.au:8080/Expresso/login.jsp or find out more at www.sxml.com.au


