Monday, September 10, 2012

XML parsing - traditional parsers - SAX 


Simple API for XML

SAX stands for Simple API for XML. It differs from DOM in the way it reads XML. DOM reads an entire document in one go and holds it all in memory as a tree. SAX reads the document sequentially as a stream, from start to finish, and does not keep what it has already read.
SAX is known as event-driven: the document is read serially and its contents are reported as callbacks to various methods on a handler object of the user's design.
So, a user writes some code to instantiate a SAX parser and read in the XML file.
Next, the user writes a series of methods which act on certain information pulled out of the file.
These methods can then take the extracted data and do things with it.
The methods themselves are triggered by events in the stream, such as the start of an element, the end of an element, or a run of character data.
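
The callback flow described above can be sketched in a few lines of Java. This is only an illustration under some assumptions: the class name SaxEventDemo, the method collectEvents and the sample XML are made up for this example, and the XML is read from a string rather than a file to keep the sketch self-contained. It records each event in the order the parser reports it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records SAX events in the order they arrive.
class SaxEventDemo {

    // Parses the given XML text and returns a description of each event.
    static List<String> collectEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();

        // DefaultHandler provides empty callbacks; override only what is needed.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in more than one chunk; ignore blank ones.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collectEvents("<order><item>book</item></order>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is stored by the parser itself. Whatever the handler does not record inside a callback is gone once the parser moves on.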


Advantages of using SAX
  1. SAX is fast and uses little memory, because it does not build a tree of the document
  2. SAX can handle large files
Problems
  1. SAX is difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed
  2. SAX is difficult to use for any kind of complicated search
  3. SAX can seem more daunting to learn for OO programmers, as the document arrives through callbacks rather than as a tree of objects to navigate.
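
To make the first problem concrete, here is a rough sketch of the bookkeeping an application has to do just to pull out the titles of books. The class name TitleFinder, the method findTitles, and the element names library, book and title are all invented for this example. The handler keeps its own stack of open elements, because SAX does not tell a callback where in the document it is.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every library/book/title element.
class TitleFinder extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();   // currently open elements, root first
    private final StringBuilder text = new StringBuilder();  // text seen since the last start tag
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text can arrive in several chunks, so accumulate it.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only a title directly under library/book counts.
        if (String.join("/", path).equals("library/book/title")) {
            titles.add(text.toString().trim());
        }
        path.removeLast();
    }

    static List<String> findTitles(String xml) throws Exception {
        TitleFinder finder = new TitleFinder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), finder);
        return finder.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>Dune</title></book></library>";
        System.out.println(findTitles(xml));
    }
}
```

With DOM the same lookup would be a walk over a tree that is already in memory. With SAX the stack and the text buffer are the application's responsibility.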




How to parse a document with SAX
  1. Create a Java class for the parsing. We will call it Sax.
  2. In the static main method, we will set up the parser and in the other methods we will handle the parsing callbacks.
  3. So, in the main method, create the parser factory object. This factory is then used to create the parser.
    SAXParserFactory spf = SAXParserFactory.newInstance();
  4. Create the parser from the factory object.
    SAXParser saxParser = spf.newSAXParser();
  5. Use the parser to create an XMLReader object.
    XMLReader xmlReader = saxParser.getXMLReader();
  6. Set the content handler to an instance of this Sax class, which contains the callback methods.
    xmlReader.setContentHandler(new Sax());
  7. Set an error handler to deal with any errors.
    xmlReader.setErrorHandler(new MyErrorHandler(System.err));
  8. Parse the XML file. Here convertToFileURL is a helper method (not shown) that turns the file name into a file URL.
    xmlReader.parse(convertToFileURL(fileName));
  9. Now, create the first callback method, startElement. This method is called each time the parser finds the start of an element, and it receives the element's namespace URI, local name, qualified name (qName) and attributes to be handled inside the method.
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException
  10. Finally, create another Java class to handle errors. Call it MyErrorHandler. It implements the org.xml.sax.ErrorHandler interface and contains methods to catch and report errors, e.g.
    private String getParseExceptionInfo(SAXParseException spe)
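
Putting the steps above together, a complete program might look roughly like the sketch below. Several parts are assumptions made for illustration and are not spelled out in the steps: the Sax class extends DefaultHandler, it counts element names as its example work, the method countElements wraps steps 3 to 8 so they can be called from main, convertToFileURL is written here as a one-line helper, and the body of MyErrorHandler is a guess at a reasonable implementation.

```java
import java.io.File;
import java.io.PrintStream;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class Sax extends DefaultHandler {

    // Illustrative state: how many times each element name was seen.
    private final Map<String, Integer> counts = new TreeMap<>();

    // Step 9: called by the parser each time the start of an element is read.
    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
        // The factory below is not namespace-aware, so qName carries the tag name.
        counts.merge(qName, 1, Integer::sum);
    }

    // Assumed helper: turns a file name into a file: URL string.
    private static String convertToFileURL(String fileName) {
        return new File(fileName).toURI().toString();
    }

    // Steps 3 to 8, wrapped in a method (an assumption) so main stays short.
    static Map<String, Integer> countElements(String fileName) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser saxParser = spf.newSAXParser();
        XMLReader xmlReader = saxParser.getXMLReader();
        Sax handler = new Sax();
        xmlReader.setContentHandler(handler);
        xmlReader.setErrorHandler(new MyErrorHandler(System.err));
        xmlReader.parse(convertToFileURL(fileName));
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(args[0]));
    }
}

// Step 10: reports warnings and turns errors into exceptions.
class MyErrorHandler implements ErrorHandler {
    private final PrintStream out;

    MyErrorHandler(PrintStream out) {
        this.out = out;
    }

    private String getParseExceptionInfo(SAXParseException spe) {
        return "URI=" + spe.getSystemId() + " Line=" + spe.getLineNumber() + ": " + spe.getMessage();
    }

    @Override
    public void warning(SAXParseException spe) {
        out.println("Warning: " + getParseExceptionInfo(spe));
    }

    @Override
    public void error(SAXParseException spe) throws SAXException {
        throw new SAXException("Error: " + getParseExceptionInfo(spe));
    }

    @Override
    public void fatalError(SAXParseException spe) throws SAXException {
        throw new SAXException("Fatal Error: " + getParseExceptionInfo(spe));
    }
}
```

Run it with the path to an XML file as the only argument. It prints a map of element names to the number of times each one appears.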



NEXT ENTRY

Next I am going to talk about StAX parsers. It is good to learn about different options available for parsing as each is better in different scenarios.
After that I will be talking about our cloud-based graphical alternative to traditional XML parsing.

Check out our free developer version at http://www.sxml.com.au:8080/Expresso/login.jsp



or find out more at www.sxml.com.au





