Monday, September 24, 2012


Expresso client code: the inner workings - Part 1: XML connection client code in Java


About client code

Client code is useful. It enables the Expresso parser to be accessed programmatically. Basically, you set up rules for parsing or web service modules using the GUI and then remotely access those rules from your own project using client code. This makes client code powerful!

Various types of client code

Client code comes in different languages. We are currently aiming to increase its usefulness by adding new languages so that you can use Expresso parser whether you are a Java, C++, Ruby or JavaScript developer. This list will keep growing and we are eager for suggestions of new languages we can support!

There are two types of client code: XML connection code and Web service module code. The former allows you to parse an XML file using your prepared rules. The latter allows you to consume a prepared vendor web service. 

How client code works

Client code connects to Expresso using HTTPS. It passes parameters in an HTTPS request, which Expresso processes, and it gets a result back.

The values returned from XML connection client code

A three dimensional array is returned from an XML connection. Most times you will only need one or two dimensions of this array.

The outer array is a list of rules which you parsed. If you only parsed using one rule, there will only be one item in this array.
So, we start with an array which has one element for each rule parsed. If we wish to handle the results of any one rule, we simply choose that element from the array.
For example, if you are parsing three rules and you wish to process the results of the first rule, simply use the first element of this array.

Each rule element contains a two-dimensional array.
This middle array contains one element for each return type.

So, what are return types? 

Simple rule - 1 return type
Well, if you have a rule which says "get element address and return it" you are returning each address element value. Your rule has one return type - the address.
In this case, you will have a one element array. The element will simply be a list of addresses.  

Complex rule - multiple return types
If you then say "get element address and return it AND also get element postcode" you are getting two return types - address and postcode. So, the results will contain a list of addresses and their corresponding postcodes.

In this case, you will have a two element array. The first element will be a list of addresses. The second element will be a list of postcodes.

As you can imagine, the inner array is the list of returned values, e.g. the list of addresses.

Here is an example of a simple rule. We start with this XML file:

<books>
<book>
A brief history of everything
</book>

<book>
History of Europe 1900 - present
</book>
</books>

We create a rule for this XML file. Our rule is as follows:
Return the text value of any element called book.

We then call client code to run our one rule search. The results are a three dimensional array as follows:

1. Our outer array is the list of rules. It will have one element as there is only one rule. We take this element and look inside. It is a two-dimensional array - the middle array.
2. The middle array will have one element in it as we are only returning one return type, i.e. the book value. We take this element and look inside. It is an array.
3. This inner array contains the value of each book element, i.e. "A brief history of everything" and "History of Europe 1900 - present".

We can loop through this array and print out the values.
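A minimal sketch of that loop in Java, assuming the client code returns the results as a `String[][][]` indexed [rule][return type][value]. The element type is an assumption, so check your generated client code; the hard-coded array stands in for a real response to the one-rule book search above, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class BookResults {
    // Collects every value from a results array shaped [rule][return type][value].
    static List<String> allValues(String[][][] results) {
        List<String> values = new ArrayList<>();
        for (String[][] rule : results) {           // outer array: one element per rule
            for (String[] returnType : rule) {      // middle array: one element per return type
                for (String value : returnType) {   // inner array: the returned values
                    values.add(value);
                }
            }
        }
        return values;
    }

    // Prints the book titles: first (only) rule, first (only) return type.
    static void printBooks(String[][][] results) {
        for (String title : results[0][0]) {
            System.out.println(title);
        }
    }

    // Hypothetical stand-in for a real response: 1 rule, 1 return type, 2 book values.
    static String[][][] sampleResponse() {
        return new String[][][] {
            { { "A brief history of everything", "History of Europe 1900 - present" } }
        };
    }
}
```

Calling `printBooks(sampleResponse())` prints the two titles, one per line. For a rule with several return types, `results[0][1]` would hold the second return type's values (e.g. the postcodes).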

The values returned from web service connection client code

Web service connection code is simpler. We return an array containing two elements. The first element is the XML response from the web service. The second element is the parsed XML response as an array of results. 

XML connection client code in Java in more detail


The steps involved in the client remote connection
There are three major parts to this client code:
  1. Setting parameter values
  2. Sending an HTTPS request 
  3. Reading the response 

Setting parameter values

We set various values for the parameters. Some of these are required, such as the username, password, connection name and company of the sender. 
There are then some optional parameters which allow you to specify the location of the XML file, the XML file itself (if on your system) and whether or not you wish to use a cached version of the file.

There are also advanced parameters which enable you to do things such as specify particular rules, supply parameters and sort results. 

Part 1: parameters

Required parameters

  1. Username - the name you use to log in to the website.
  2. Password - the password you use to log in to the website.
  3. Company - the company name you use to log in to the website.
  4. ConnectionName - the name of the XML connection you wish to parse. This is the name you supplied when creating the connection on the website.
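As a sketch, the four required values could be collected and form-encoded like this. The field names (`username`, `password`, `company`, `connectionName`) and the form-encoded request body are assumptions for illustration; use the names your generated client code actually sends.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class RequiredParams {
    // Hypothetical field names; the real ones come from the generated client code.
    static Map<String, String> required(String user, String password,
                                        String company, String connection) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("username", user);
        params.put("password", password);
        params.put("company", company);
        params.put("connectionName", connection);
        return params;
    }

    // Encodes the map as application/x-www-form-urlencoded: key=value pairs joined by '&'.
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }
}
```

For example, `formEncode(required("jo", "p@ss word", "Acme", "books"))` yields `username=jo&password=p%40ss+word&company=Acme&connectionName=books`; encoding each value keeps characters such as `@`, spaces or `&` in a password from corrupting the request.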

Optional simple parameters

  1. XML source - the source of the XML you will parse. You have three choices here: client, web or server.  
    1. You can use an XML file which you supply with the request, i.e. it is uploaded. This is client mode. It allows you to supply a new XML file with each request.
    2. You can use a web-based XML file. This is called web mode. When you set up a connection on the website you have the option to supply a URL rather than uploading an XML file. This URL is then used again to access the XML file. Since the URL has been saved with your account, you do not need to supply it.
    3. In most cases the XML file is uploaded to the website when creating a connection, and this XML file stored on the server is used for parsing. This is server mode.
    4. If no mode is supplied, server mode is used by default.
  2. XML File - If using client mode, the XML file is supplied with the request. This field is its location on your local system; it is specified here so that the file can be loaded as a string and sent with the request. This is only required in client mode.
  3. caching - This specifies whether or not the file will be parsed using a cached version. It defaults to false.
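A sketch of the optional simple parameters, assuming hypothetical field names `xmlSource`, `xmlFile` and `caching` and a plain string-to-string parameter map. It applies the rules above: server mode when no source is given, and the local file read into a string only in client mode.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;

class XmlSourceParams {
    // The three XML source choices described above.
    enum XmlSource { CLIENT, WEB, SERVER }

    // Adds the optional simple parameters to a parameter map.
    // Field names "xmlSource", "xmlFile" and "caching" are hypothetical.
    static void addSourceParams(Map<String, String> params, XmlSource source,
                                Path localXmlFile, boolean caching) {
        XmlSource mode = (source == null) ? XmlSource.SERVER : source; // server mode by default
        params.put("xmlSource", mode.name().toLowerCase(Locale.ROOT));
        if (mode == XmlSource.CLIENT) {
            // Client mode: the local file is loaded as a string and sent with the request.
            if (localXmlFile == null) {
                throw new IllegalArgumentException("client mode needs a local XML file");
            }
            try {
                params.put("xmlFile", Files.readString(localXmlFile));
            } catch (IOException e) {
                throw new UncheckedIOException("could not read " + localXmlFile, e);
            }
        }
        // Sent explicitly here; the documented default is false.
        params.put("caching", Boolean.toString(caching));
    }
}
```

Rejecting client mode without a file on the client side catches locally the situation that ERROR CODE 3 reports from the server.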

Optional advanced parameters

  1. mode - This allows you to parse with a selection of rules rather than all the rules associated with that connection. You can choose to parse a connection with all its associated rules by using mode = all. This is the default. You can specify one or more rules to parse with by listing these rules as the mode. Each rule should be separated by &. 
  2. sortBy - This allows you to sort the results in ascending order. For simple rules, sortBy should specify the rule name and 0, as there is only one possible return type to sort by. Otherwise, choose which of the return types to sort by, e.g. if returning the title and price of a list of books, choose 0 to sort by title and 1 to sort by price. 
  3. dynamic parameters - These can be used to modify rules on the fly depending on user input. You can add a new value to a rule and this value will be used with the rule. For example, you can have a rule which searches for tag = book and price > 5.00. You can then add a parameter of 10.00 to the rule and the rule will become tag = book and price > 10.00. 
  4. URL parameters - if you are using a web based XML source and the URL changes with each request you can supply URL parameters to dynamically create the URL where the XML file is found. 
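A small sketch of building the mode value described above: "all" when no rules are listed, otherwise the rule names joined by &. It assumes the parameters are sent form-encoded, in which case the & separators inside the value have to be percent-encoded (%26) so they are not read as the start of a new parameter. The wire field name `mode` and the rule names are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

class ModeParam {
    // "all" (the default) or the selected rule names separated by '&'.
    static String modeValue(List<String> ruleNames) {
        if (ruleNames == null || ruleNames.isEmpty()) {
            return "all";
        }
        return String.join("&", ruleNames);
    }

    // In a form-encoded body the '&' separators inside the value become %26,
    // so the server can decode them back into a single mode value.
    static String encodedModeField(List<String> ruleNames) {
        return "mode=" + URLEncoder.encode(modeValue(ruleNames), StandardCharsets.UTF_8);
    }
}
```

With two hypothetical rules, `encodedModeField(List.of("titles", "prices"))` gives `mode=titles%26prices`, while an empty list falls back to `mode=all`.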

Part 2: sending the request
The request is sent via HTTPS to the Expresso parser and the results are returned. 
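A minimal sketch of this step using the JDK's `HttpsURLConnection`. It assumes the parameters travel as a form-encoded POST body; the endpoint is a placeholder, since the real address of the Expresso remote parser comes from the generated client code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.HttpsURLConnection;

class ExpressoRequest {
    // Prepares a POST connection; no network traffic happens until the body is written.
    // The cast fails for a non-HTTPS URL, which keeps requests on HTTPS.
    static HttpsURLConnection open(String endpoint) throws IOException {
        HttpsURLConnection conn =
            (HttpsURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // the form body is written to the request
        conn.setRequestProperty("Content-Type",
            "application/x-www-form-urlencoded; charset=UTF-8");
        return conn;
    }

    // Sends the form-encoded parameters and returns the raw response body as text.
    static String post(String endpoint, String formBody) throws IOException {
        HttpsURLConnection conn = open(endpoint);
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write(formBody.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

The raw text returned by `post` still has to be checked for errors and turned into the three-dimensional array; that format is defined by the server and is not reproduced here.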

Part 3: Dealing with the returns

The returns are checked for errors and then the 3 dimensional array is looped through and the values are stored and printed out. 


Part 4: Possible error messages 

ERROR CODE 1: incorrect user details

This means that your username, password or company is not correct. 

ERROR CODE 2: userFileStore is missing

This error means that the username or company you supplied does not exist on the server. Check these parameters and contact SXML Help if this happens.

ERROR CODE 3: file is missing from request. Please ensure this field has been added

This means that the fileForXMLUpload parameter is blank and that you have chosen client as your XML file source. Ensure that the correct local location for the XML file to be uploaded is supplied. 

ERROR CODE 4: remote file name on server does not exist at this location

The file you are trying to parse does not exist on the server. This can be caused by choosing not to save the file when creating an XML connection or by deleting an XML connection. Check that you have correctly spelled the XML connection name supplied, that this connection exists and that the 'save file' option is set to true. 

ERROR CODE 5 - parsing error

This means that there was an error parsing this XML file. The error details are supplied. 

ERROR CODE 6: file not saved on remote Server. Please login to web page to upload file

The file you are trying to parse does not exist on the server. This can be caused by choosing not to save the file when creating an XML connection or by deleting an XML connection. Check that you have correctly spelled the XML connection name supplied, that this connection exists and that the 'save file' option is set to true. 


ERROR CODE 7: cache could not be located

This means that the cache related to the XML file does not exist. Ensure that you choose 'caching' as true when creating the connection. 
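The seven error codes above map naturally onto a Java enum, so client code can turn a numeric code into a readable hint. This is a sketch: the enum name and the short hints are illustrative, and how the code is carried in the response is not described here, so extracting it from the response is left out.

```java
import java.util.Optional;

enum ExpressoError {
    INCORRECT_USER_DETAILS(1, "Username, password or company is not correct."),
    USER_FILE_STORE_MISSING(2, "Username or company does not exist on the server."),
    FILE_MISSING_FROM_REQUEST(3, "Client mode was chosen but no XML file was supplied."),
    REMOTE_FILE_NOT_FOUND(4, "The file to parse does not exist on the server."),
    PARSING_ERROR(5, "The XML file could not be parsed; see the supplied details."),
    FILE_NOT_SAVED(6, "The file is not saved on the server; upload it via the web page."),
    CACHE_NOT_FOUND(7, "No cache exists for this XML file; enable caching on the connection.");

    final int code;
    final String hint;

    ExpressoError(int code, String hint) {
        this.code = code;
        this.hint = hint;
    }

    // Looks up an error by its numeric code; empty if it is not one of the seven listed.
    static Optional<ExpressoError> fromCode(int code) {
        for (ExpressoError e : values()) {
            if (e.code == code) {
                return Optional.of(e);
            }
        }
        return Optional.empty();
    }
}
```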



Monday, September 17, 2012


Expresso Parser time trial benchmarks - The results are in!

 Time trials 

The Expresso parser was tested in its non-caching mode.
The parser was tested against the leading Java XML parsers in the field: Xerces DOM, Woodstox StAX, Piccolo SAX and VTD-XML.

When Expresso was tested against VTD-XML, both parsers were tested in non-caching mode. 

Each parsing was the first and only parsing of the file by the parser, there were no loops involved or any other complexities.

Files used


The Expresso parser was tested on a simple search for TAG = PERSONA on the Shakespeare play john.xml.

Files of various sizes were used, ranging from 850 KB to 2.5 MB.

Larger files were simply the file john.xml repeated multiple times.
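For a sense of the workload, here is what a simple TAG = PERSONA search looks like when written by hand against the JDK's built-in StAX API, the same streaming API that Woodstox implements. This is only an illustration of the query, not the benchmark harness, and it produces no timing figures; the sample document is a tiny hypothetical fragment in the style of the Shakespeare XML files.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PersonaSearch {
    // Streams through the document once and collects the text of every PERSONA element.
    static List<String> personae(String xml) {
        List<String> found = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "PERSONA".equals(r.getLocalName())) {
                    // Reads the text content up to the matching end tag.
                    found.add(r.getElementText());
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return found;
    }
}
```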


Results

 



Check out our free developer version at http://www.sxml.com.au:8080/Expresso/login.jsp



Expresso Parser Large File parsing - The results are in!

The Expresso parser works well with massive XML files including files up to 35GB in size.

As the parser is not limited by file size it is potentially possible to parse files of any size.

The Expresso parser is limited only by the number of returned elements.

According to the latest tests, Expresso can now return 230,000 elements under normal JVM memory conditions.

That's right, 230,000 elements!





Accessing Expresso remotely: 


The power of client code 

Expresso client code allows you to interact remotely with the Expresso Parser through your own application in either Java or JavaScript. The number of supported access languages will be extended in future to include C++, Ruby and Node.js, among others.

Accessing Expresso remotely 

Expresso client code is used to access the service. It is available for both XML parsing connections and web service modules, and presently in Java, and in JavaScript with JSON.

The client code generation page

The Expresso client code generation page is accessible by clicking on the 'client code' header.



On the generation page, select a language and select either XML connections or web services. Click generate. Client code is generated on the screen.




The client code is already populated with your username, company and the last connection or web service which you accessed as well as any web service parameters you supplied.

All you have to do is paste this code into your own project, add your access password (your normal Expresso account password) and start using the client code.

Java access through HTTPS

The Java client code works as follows.
  1. The parameters needed for the request are created and given values.
  2. An HTTPS request is created using the URL of the Expresso Remote Parser.
  3. The HTTPS request is sent and the results are arranged in a three-dimensional array.
  4. The outer array has one element for each rule which has been parsed. 
  5. The middle array contains each return set.
  6. Each inner array contains each of the return types within the return set.

JavaScript access through JSON

The JavaScript client code works as follows.
  1. Within script tags, a JSON request over HTTPS is created.
  2. The various required parameters are added to the JSON request.
  3. The JSON request is sent and the result is available for processing

Updating client code graphically

You can modify your client code parameters using our graphical tool. When you generate client code simply click 'modify client code graphically' and you are brought to a page where you can update any parameter and the resulting client code is produced. 




Forms Wizard and dynamic parameter forms

Expresso allows you to create an HTML form and backing Java servlet code. You can then publish this HTML code to your website, where users can supply parameter values for XML parsing rules and have their stored XML file parsed with those parameters.
For example, you might wish to create an XML parsing rule which is TAG = SHOP and tag value = "X", where X is a value entered by a user. With Forms you can easily do this. You can create a form linked to a rule where the user enters a value for part of the rule.

To create a dynamic parameter form

  • Click on the Forms header to get to the forms section. Choose the XML connection for which you wish to create a form.

  • The rules associated with this XML connection are listed. You now go through each rule, deciding whether or not to create form elements for that rule.
  • For each rule, you can choose a form element for any rule part.
  • Choose a field name for the form element.

 
  1. When you have chosen all the form elements the client code is created.
  2. You can paste this client code into your own project and start using it straight away.




 





 

 

Monday, September 10, 2012

Getting started with Expresso parser 






Today I will show you how to parse an XML file with Expresso.  We will be dealing with the visual aspect of parsing a file.


Register for a free developer account

  1. To register, simply enter a username, password and company. If you are not currently working for a company, enter your username again in this field. 
  2. Choose the free developer version and click 'register'. 


Login to the site


  1. Enter your username, password and company and click 'login'. You will be brought to the dashboard.







The dashboard
The main area is called the dashboard. This contains a list of all XML connections and web services which you have created. Each can be edited.







Create a new XML connection

  1. In dashboard, click 'add new connection' and you are brought to a new page.
  2. Enter a name for the connection, upload or specify an XML file URI and choose settings and click save.
  3. The new connection will now be shown on the dashboard.







Create a new parsing rule 

  1. Click on the rules section of the connection listing to see that connection's rules.
  2. Click 'add new' to add a new parsing rule. A popup box appears. Fill in the details.
  3. To search for child elements, click 'add child'. A hierarchy of search rules can be created.
  4. You can return various aspects of an element, including tag name, tag value, attribute name and attribute value.
  5. You can use regular expressions and mathematical operators in the search.
  6. You can search for child elements or descendants.
  7. Click save to save the rule. 
  8. Once you save a rule, it appears on the right-hand side of the screen. When you select a new connection, only the selected connection's rules are shown. 











See results
  1. Click 'run' on a rule to run it against the XML file. The results are shown on screen.










XML Parsing in the cloud

Why cloud parsing?

We chose to provide a cloud-based parser because we believe that the painful integration layer that companies go through can be minimized or avoided. 

My husband once asked me why companies buy third-party software and still take months to start using it.

My answer: The integration layer. 






What does cloud parsing mean for us?
Instead of downloading an XML parser and adding it as a third-party library to your existing project, you can simply register, log in to our website and start parsing online. 

No more conflicts
This means that you can use our parser from any environment as long as you have a web browser!
There are no more conflict issues with JVMs, other 3rd party libraries or other parsers. 

We handle the complexity
If you are running your application on something like Node.js, then outsource all the mathematical complexity of XML parsing to us!
We handle the parsing and send you back your results over HTTPS. 



How to get started with Expresso parser 

  1. Register for a free developer version
  2. Log in and click the browse button to upload an XML file or enter the URL of an HTTP-based file.
  3. Click a button to open the graphical parser which lets you visually generate parsing rules.
  4. Click "run" to test your newly created parsing rule against the XML file you are using and see the results on screen.
How to connect with client code 
  1. After creating your rules you can connect remotely to the parser using Java, or JavaScript with JSON.
  2. Our client code generator creates client code specific to your user details and to the last XML connection you created.
  3. Simply paste this code into your project and immediately start connecting to the parser remotely.
  4. Results from each parsing can be returned as Java arrays or as JSON objects.

How to share files 
  1. If you have a medium or large business with multiple users you can set up groups and roles graphically within Expresso.
  2. You can then use these groups and roles to share XML file connections and associated parsing rules securely among your team.
  3. You can use roles to limit access, e.g. to read-only.

How to access global Web services 
  1. We are creating a library of globally used web services which are set up using our parser.
  2. You can browse through these, choose web services you like and add them to your parsing suite.
  3. You can then graphically modify the web service method used and any parameters and consume the web service from the parsing suite and remotely using client code. 





The benefits of Expresso Parser



  1. Faster parsing
  2. Parse larger files with no memory restrictions
  3. Parse files simply.
  4. Graphically set search rules for parsing without the need for any parsing code.
  5. Parsing is error-proof with a GUI for rules setting
  6. XML search results can be tested immediately
  7. Instantly see the results of parsing without any parsing code.
  8. Modify parsing rules dynamically without changing the parsing code.
  9. Carry out complex searches on large XML files quickly and efficiently.
  10. Can handle inheritance searches
  11. Permanently store, manage and re-run graphically created XML parsing rules.
  12. No integration layer
  13. Business rules are separate from underlying code.
  14. No costly, time consuming coding changes needed when business rules change.



  1. Faster parsing
SXML parses faster than DOM and SAX and is even faster when in caching mode. The SXML caching is based on virtual tag ids and so allows for changes in the values of elements in an XML file as well as alterations in tag values.

  2. Parse larger files with no memory restrictions
SXML allows a user to parse large files in caching mode without memory restrictions. XML files of 15GB can be parsed without any speed decrease.

  3. Parse files simply
Files are parsed based on preset rules. No parsing code is needed. Client code is needed only to connect to the server and to obtain results for a specified file.

  4. Graphically set search rules for parsing without the need for any parsing code.
Say goodbye to complicated, time consuming and error prone XML parsing code. XML parsing can now be set graphically and tested dynamically on an uploaded XML file. The small amount of client code needed to access the SXML server remotely for later parsing is generated automatically.


  5. Parsing is error-proof with a GUI for rules setting
Since parsing rules are set with a graphical user interface, parsing is not error-prone in the way that writing parsing code is.

  6. XML search results can be tested immediately
When a user creates a search rule, they can see the results immediately, so they are able to test the validity of their search rules ensuring that they are returning the correct data from the XML file.

  7. Instantly see the results of parsing without any parsing code.
Graphically set a parsing rule for a file and see the results of the parsed file appear on the screen.

  8. Modify parsing rules dynamically without changing the parsing code.
Manage and maintain multiple search rules for each file. Modify search rules dynamically without having to update the client code.

  9. Carry out complex searches on large XML files quickly and efficiently.
Complex searches can be carried out intuitively using SXML Categories, the powerful engine behind the rules parser. SXML categories uses graph theory to allow the parser to search for results from an XML file while bypassing the regular, time consuming tree navigation.

  10. Can handle inheritance searches
SXML allows a user to search an XML file for elements based on the value of their parent, ancestor or sibling elements.

  11. Permanently store, manage and re-run graphically created XML parsing rules.
Each user keeps their own store of XML rules for each XML file which they have tested. These rules can be modified, re-tested or deleted. The user can remotely parse a file based on one or more of the search rules which have been developed for this file.

  12. No integration layer
SXML is available as a service. It does not need any integration layer as it is not installed on a user's machine. There are no interaction issues with various software versions and no security issues with having a new piece of software installed.

  13. Business rules are separate from underlying code.
The XML rules are stored separately from the XML parsing code and can be viewed, managed and modified using the GUI.

  14. No costly, time-consuming coding changes needed when business rules change.
When a business rule changes, why spend three months or more updating parsing code for XML files containing the underlying data? Why risk future errors with XML parsing code in order to facilitate changes in business rules?
In the real world business rules change all the time, so you need an XML parser which will be automatically updated when you graphically set new business rules. SXML allows rules to be changed online using the Rules parser, and no code changes are required to parse the XML with these rules.


For now, our beta version is available here. Try our XML Parser.


Or find out more at www.sxml.com.au