XML parsing - traditional parsers:
Pull-parsing
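Pull parsers are
similar to SAX in that they read a document sequentially as a stream and do not
store the entire document in memory. The difference is in who drives the parse:
a SAX parser pushes events to your callbacks, whereas with a pull parser your
code asks for the next event when it is ready for it. In that sense these
parsers are the next step up from SAX.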
Pull parsers include XMLReader in PHP, XmlReader in .NET and
StAX in Java.
Pull parsers treat
the document as a series of items which are read in sequence. They
contain the concept of a cursor which marks the current position in the
incoming document and can only be moved forward.
There is an iterator that sequentially visits the
various elements, attributes, and data in an XML document.
There are then methods which can use this
iterator, test the current item to see its type and, if it is the
expected type, pull out aspects of the element such as text value or
attributes.
These methods also have the task of moving the
cursor on to the next element.
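To make the cursor idea concrete, here is a minimal sketch using StAX. The class name CursorDemo, the method describe and the sample XML are made up for illustration. It walks a small in-memory document once, front to back, and records what the cursor sees at each stop.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: the names here are illustrative only.
public class CursorDemo {

    // Moves the cursor through the document and notes each item it lands on.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver adjacent text as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        List<String> seen = new ArrayList<>();
        while (reader.hasNext()) {
            int event = reader.next(); // advance the cursor; there is no way back
            if (event == XMLStreamConstants.START_ELEMENT) {
                seen.add("start:" + reader.getLocalName());
            } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                seen.add("text:" + reader.getText().trim());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                seen.add("end:" + reader.getLocalName());
            }
        }
        reader.close();
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String item : describe("<catalog><book>StAX</book></catalog>")) {
            System.out.println(item);
        }
    }
}
```

Note that the calling code decides when to call next(), which is what separates this style from SAX callbacks.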
Advantages
- Pull-parsing code can be more straightforward to understand and maintain than SAX parsing code.
- Pull-parsing can be faster and more memory efficient than DOM
- Can be used to read objects
Disadvantages
- More difficult to use than DOM and has a tougher learning curve.
- Tends to produce a long if-else chain inside the read loop, which can become messy and hard to maintain
- You can only go in a forward direction
- No XML file validation by default
How to parse a document with StAX
- Create the parser factory object. This factory is then used to create the parser.
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
- Create a file input stream and pass it to the factory method to create the parser (reader).
InputStream input = new FileInputStream(new File("C:/STAX/catalog.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);
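- Call the hasNext() method to see if there are more events remaining (typically as the condition of a while loop), then call next() to move the cursor on. next() returns an int identifying the type of the event the cursor is now on.
int event = xmlStreamReader.next();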
- In order to skip a type of event you don't want to process, add a simple check which identifies this type and moves on to the next event. Note that event is a plain int, so it is compared directly with the constant.
if (event == XMLStreamConstants.ENTITY_DECLARATION) {
event = xmlStreamReader.next(); }
- You can get each element using a method which pulls in the next element. You can then extract data from that element. In this example, we get the element's name.
if(event==XMLStreamConstants.START_ELEMENT){
System.out.println("Element Local Name:"+xmlStreamReader.getLocalName()); }
- You can also loop over the attributes of an element and get the name or value of an attribute based on its index in the loop, e.g.
xmlStreamReader.getAttributeLocalName(i)
xmlStreamReader.getAttributeValue(i)
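Putting the steps above together, a complete program might look something like the sketch below. The class name StaxWalkthrough, the method listElements and the sample XML are my own for illustration. It reads from an in-memory stream so it can be run without a file on disk; a FileInputStream for your own catalog file can be passed in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical walkthrough class: the names here are illustrative only.
public class StaxWalkthrough {

    // Returns one line per start element: its local name followed by its attributes.
    static List<String> listElements(InputStream input) throws XMLStreamException {
        // Steps 1 and 2: create the factory, then the reader from the input stream.
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);

        List<String> lines = new ArrayList<>();
        try {
            // Step 3: keep pulling events while any remain.
            while (xmlStreamReader.hasNext()) {
                int event = xmlStreamReader.next();

                // Step 4: skip every event type we are not interested in.
                if (event != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }

                // Step 5: pull data out of the element the cursor is on.
                StringBuilder line = new StringBuilder(xmlStreamReader.getLocalName());

                // Step 6: loop over the attributes by index.
                for (int i = 0; i < xmlStreamReader.getAttributeCount(); i++) {
                    line.append(' ')
                        .append(xmlStreamReader.getAttributeLocalName(i))
                        .append('=')
                        .append(xmlStreamReader.getAttributeValue(i));
                }
                lines.add(line.toString());
            }
        } finally {
            // Closing the reader does not close the underlying input stream.
            xmlStreamReader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample document standing in for a catalog file.
        String xml = "<catalog><journal title=\"Sample\"/></catalog>";
        InputStream input = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        listElements(input).forEach(System.out::println);
    }
}
```

Even in this small example the read loop is where all the branching lives, which is why larger documents tend to lead to the long if-else chains mentioned under Disadvantages.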
In the next entry I will be talking about the general drawbacks of using DOM, SAX or StAX and how our new approach to parsing solves these problems.