Monday, April 27, 2009

XQuery Unit Tests

I had a fright last week - Wolfgang asked for a copy of the test harness I'd used to evaluate different implementations of a lookup table. This is code I wrote some time ago and have tinkered with since: good enough for our internal use but ... well, pretty bad code.

I have to confess here that as a lone XQuery programmer, my code doesn't get the level of critique it needs. The Wikibook has been disappointing in that regard: I've published thousands of lines of code there and not a single criticism or improvement has been posted. Typos in the descriptions are occasionally corrected by helpful souls, graffiti is erased by others, but as a forum for honing coding skills - forget it. In my role as project leader on our FOLD project (now coming to an end), I see and review lots of my students' code as well as the code Dan McCreary contributes to the WikiBook, so I do quite a bit of reviewing. However, I am only too conscious of the lacunae in my own XQuery knowledge which, perhaps through over-kindness or because everyone is so busy, remain unexposed for too long. I'm envious of my agile friends who have been pair-programming for years. Perhaps there should be a site to match up lonely programmers for occasional pairing.

Anyway the test suite got a bit of work one day last week and it's looking a bit better.

Here is a sample test script. As a test script to test the test runner, it has the unusual property that some failed tests are good, since failing is what's being tested. Here it is running.
Here is another, used to test the lookup implementations, and one to test the geodesy functions.

Version 1 of the test runner executed tests and generated a report in parallel. A set of tests may have a common set of modules to import, plus common prefix (prolog) and suffix (epilog) code. For each test, the modules are dynamically loaded, the code is concatenated and then evaled inside a catch:


let $extendedCode := concat($test/../prolog, $test/code, $test/../epilog)
let $output := util:catch("*", util:eval($extendedCode), "Compile error")


The output is compared with a number of expected values. Comparison may be string-based, element-based, or based on a substring being present or absent. (I also need to add numerical comparison with a defined tolerance.) A test must meet all expectations to pass.
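As a sketch of how an expectation might be evaluated (the function, element names and comparison types here are illustrative, not the actual harness code; an element-based comparison would use deep-equal()):

declare function local:meets($output as xs:string, $expected as element(expected)) as xs:boolean {
    (: substring must be present :)
    if ($expected/@type = 'contains')
    then contains($output, string($expected))
    (: substring must be absent :)
    else if ($expected/@type = 'excludes')
    then not(contains($output, string($expected)))
    (: default: whitespace-normalised string comparison :)
    else normalize-space($output) = normalize-space($expected)
};

(: a test passes only if every expectation is met :)
let $pass := every $expectation in $test/expected
             satisfies local:meets($output, $expectation)
return ...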

To get a summary of the results requires either running the sequence of tests recursively, or collecting the test results in a constructed element and then analysing them. Recursion would be suitable for a simple sum of passes and fails, but it closely binds the analysis to the testing. An intermediate document decouples testing from reporting, thus providing greater flexibility in the analysis but requiring temporary documents.

So version 2 constructed a sequence of test results, and then merged these results with the original test set to generate the report. Collating two sequences is a common idiom which in functional languages must either recurse over both, iterate over one sequence whilst indexing into the other, or iterate over an extracted common key and index into both. The reporting is currently done in XQuery but it should be possible to use XSLT; either the collating would need to be done before the XSLT step, or XSLT would take on the collating task. Not a happy situation.
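For example, iterating over the tests while indexing into the results by position might be sketched as (variable names are illustrative):

(: collate each test with its result, relying on the two sequences being in the same order :)
for $test at $i in $testSet/test
let $result := $results[$i]
return
    <tr>
        <td>{string($test/@name)}</td>
        <td>{string($result/@pass)}</td>
    </tr>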

So last week in comes version 3. Now the step which executes the tests augments each test with new attributes (pass, timing) and elements (output), and similarly augments each expectation with the results of its evaluation, so that one single, enhanced document is produced with the same schema as the original (the augmented data has to be optional anyway, since some tests may be tagged to be ignored). Transformation of the complete document to HTML is then straightforward, either inline or in a pipeline with XQuery or XSLT. The same transformation can be run on the un-executed test set.

Augmenting the test set is slightly harder in XQuery than it would be in XSLT. For example, after executing each test, the augmented test is recreated with:


element test {
$test/@*,
attribute pass {$pass},
attribute timems {$timems},
$test/(* except expected),
element output {$output},
$expectedResults
}


This approach means that, once again, handling the construction of temporary documents is a key requirement for XQuery applications.

But I'm still not quite happy with version 3. As so often I'm struggling with namespaces in the test scripts - now where's my pair?

Sunday, April 19, 2009

Implementing a table look-up in XQuery

Handling temporary XML fragments in the eXist XML db has improved markedly in version 1.3. I have been looking again at an example of processing MusicXML documents which I first wrote up in the XQuery wikibook.

The code requires a translation from the note name (A, B) to the MIDI note value for each note. The pitch of a note is defined by a structure like:

<pitch>
<step>C</step>
<alter>1</alter>
<octave>3</octave>
</pitch>



One approach is to use an if-then-else construct:

declare function local:MidiNote($thispitch as element(pitch) ) as xs:integer
{
let $step := $thispitch/step
let $alter :=
if (empty($thispitch/alter)) then 0
else xs:integer($thispitch/alter)
let $octave := xs:integer($thispitch/octave)
let $pitchstep :=
if ($step = "C") then 0
else if ($step = "D") then 2
else if ($step = "E") then 4
else if ($step = "F") then 5
else if ($step = "G") then 7
else if ($step = "A") then 9
else if ($step = "B") then 11
else 0
return 12 * ($octave + 1) + $pitchstep + $alter
} ;

but this cries out for a table lookup as a sequence:

declare variable $noteStep :=
(
<note name="C" step="0"/>,
<note name="D" step="2"/>,
<note name="E" step="4"/>,
<note name="F" step="5"/>,
<note name="G" step="7"/>,
<note name="A" step="9"/>,
<note name="B" step="11"/>
);

declare function local:MidiNote($thispitch as element(pitch) ) as xs:integer
{
let $alter := xs:integer(($thispitch/alter,0)[1])
let $octave := xs:integer($thispitch/octave)
let $pitchstep := xs:integer($noteStep[@name = $thispitch/step]/@step)
return 12 * ($octave + 1) + $pitchstep + $alter
} ;


or an XML element:

declare variable $noteStep :=
<steps>
<note name="C" step="0"/>
<note name="D" step="2"/>
<note name="E" step="4"/>
<note name="F" step="5"/>
<note name="G" step="7"/>
<note name="A" step="9"/>
<note name="B" step="11"/>
</steps>;

declare function local:MidiNote($thispitch as element(pitch) ) as xs:integer
{
let $alter := xs:integer(($thispitch/alter,0)[1])
let $octave := xs:integer($thispitch/octave)
let $pitchstep := xs:integer($noteStep/note[@name = $thispitch/step]/@step)
return 12 * ($octave + 1) + $pitchstep + $alter
} ;


We could also store the table in the database, since it is constant.
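For instance (the database path here is invented for illustration), only the variable declaration changes and the lookup function is untouched:

declare variable $noteStep := doc("/db/music/steps.xml")/steps;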

eXist does some optimisation of XPath expressions, but it does not factor out the invariant expression $thispitch/step in the XPath predicate.
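Factoring the sub-expression out by hand is straightforward; a sketch of the hand-optimised element lookup:

declare function local:MidiNote($thispitch as element(pitch) ) as xs:integer
{
let $alter := xs:integer(($thispitch/alter,0)[1])
let $octave := xs:integer($thispitch/octave)
(: factor the invariant sub-expression out of the predicate :)
let $step := $thispitch/step
let $pitchstep := xs:integer($noteStep/note[@name = $step]/@step)
return 12 * ($octave + 1) + $pitchstep + $alter
} ;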

I wrote a test suite to time these various implementations. Typically this shows that factoring the sub-expression reduces the execution time by 25%. However, even with this optimisation, the structure lookup is disappointingly slow. It is about 50% slower than the if/then expression when stored on disk, and 100% slower when in memory.

This aspect of XQuery performance is important if XQuery is to be used for general application development, since data structures such as indexed and associative arrays have to be represented as sequences of atomic values or elements. This performance is not really surprising, and there may be more performance to be gained by indexing the database element.
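If the table is stored in the database, a range index on the name attribute is the obvious candidate. The index configuration syntax varies between eXist versions, so this collection.xconf fragment is only a sketch:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <!-- range index on the note name attribute -->
        <create path="//note/@name" type="xs:string"/>
    </index>
</collection>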

Wednesday, April 08, 2009

XQuery module for Geodesy

I wrote my first attempt at Mercator to/from Latitude/Longitude conversion functions about 2 years ago when working on a case study for SPA2007. Part of this was documented in the XQuery Wikibook article on UK bus stops and Ordnance Survey coordinates. At the time I did not appreciate why my coordinates were just a bit off, but fudged the difference. Last month I used the same functions to map pedal cycle data, but as gardens and roofs appeared to be much more dangerous than roads, I thought I'd better try harder, and learnt about Helmert transformations.
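For reference, the seven-parameter (small-angle) Helmert transformation on cartesian coordinates takes the form, in LaTeX notation:

\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} =
\begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix} +
\begin{pmatrix} 1+s & -r_z & r_y \\ r_z & 1+s & -r_x \\ -r_y & r_x & 1+s \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}

where t_x, t_y, t_z are the translations, s is the scale change and r_x, r_y, r_z are the (small) rotations in radians.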

My latest attempt is now available in the XQuery Examples Google Project and the Wikibook article has been revised to use this module. The formulae come mainly from the OS Guide to coordinate systems. PHP code on Barry Hunter's http://www.nearby.org site was also useful. The test suite for this module is still being added to. I have struggled with rounding, which is needed both to get positions at sensible resolutions and for testing, and I'm not yet happy. Some tests need visual inspection and there is a problem with heights.

The module depends on the eXist math module, a prime candidate for cross-implementation standardization by the EXQuery initiative. In the latest version (v1-3) of the module, the math:atan2() function has its parameters in the correct order (y, x), but older releases had these parameters reversed, as in v1-2.

The design of the module uses elements with attributes in the same namespace as the module to define compound structures such as LatLongs, Ellipsoids and Projections. These are defined in an associated schema. Compile-time checking in eXist is limited to checking the element name, since eXist is not schema-aware, although full schema-awareness would be of benefit in this module.
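As a purely hypothetical illustration (the element name, attributes and namespace below are invented; the real ones are defined in the module and its schema), an ellipsoid might be represented as:

<!-- hypothetical names and namespace, for illustration only -->
<geo:Ellipsoid xmlns:geo="http://www.cems.uwe.ac.uk/xmlwiki/geo"
    name="Airy1830" a="6377563.396" b="6356256.909"/>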

Suggestions for additions to this module are most welcome, as of course is any review of the code.

Sunday, April 05, 2009

Dashboards and Widgets in XQuery

Jim Fuller's recent article on Dashboards in XQuery makes a very good case for using XQuery for mashups generally. Jim's dashboard application reminded me of work I had been doing with my students last term on a configurable web page containing widgets to display NOAA weather data, RSS feeds, Google Maps and their own choice of data source. For this we used PHP with Simple XML, but to demonstrate the power of XQuery, I needed to show them the same application built on XQuery. It also seemed to me that the business dashboard would benefit from a design which split widget code from company-specific data.

The basic architecture of the approach here is to create a set of generalised widgets as XQuery functions and to define the specifics of the widget in a configuration file.

Here are a couple of configurations: jim.xml, which is based on Jim's Dashboard example, and dsa.xml, which is based on our course weather display.

The second example has a page refresh rate set, but I'm working on making the refresh rate widget-specific rather than page-specific using AJAX.

In the demo interface code, each widget links to its XQuery source code and hence to the code of called functions. Widget executions are timed and there is a link to the configuration file itself. Here is a basic main script:

import module namespace widgets = "http://www.cems.uwe.ac.uk/xmlwiki/widgets" at "widgets.xqm";

declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes
doctype-public=-//W3C//DTD XHTML 1.0 Transitional//EN
doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

let $configuri := request:get-parameter("config",())
let $config := doc($configuri)/config
return
<html>
<head>
{$config/title}
<link rel="stylesheet" href="{$config/@css}" type="text/css"/>
</head>
<body>
<h1>{$config/title/text()} </h1>
{
for $section in $config/section
return
<div>
<h3>{$section/@title/string()} </h3>
{widgets:render($section)}
</div>
}
</body>
</html>


Run it

There are a couple of ideas used in the code.

Each widget is defined by its own section in the configuration file. As a simple example, to configure the widget to get the date and time:

<section column="c2" title="Time in Bristol">
<datetime>
<format>EE dd/MM HH:mm</format>
</datetime>
</section>


The main script processes the configuration file, and each widget is rendered by calling a function of the same name. The code for this dispatching currently uses the eXist function util:eval(), within a util:catch call, to implement late binding:



declare function widgets:render($section as element(section)) as element(div) {
(: dispatch section to the matching function :)
let $component := $section/*[1]
let $widget := local-name($component)
let $function := concat("widgets:", $widget, "($component)")
return
<div>
{util:catch( "*", util:eval($function), <span>Missing or bad widget.</span>) }
</div>
};



A safer alternative would be to use typeswitch:

declare function widgets:render2($section as element(section)) as element(div) {
let $component := $section/*[1]
return
typeswitch ($component)
case element(datetime) return widgets:datetime($component)
case element(SQL) return widgets:SQL($component)
case element(monitor) return widgets:monitor($component)
....
default return
<div> Missing widget {local-name($component)} </div>
};


but this needs updating every time a new widget is added to the module.
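A third option in eXist, avoiding both string evaluation and a hard-wired typeswitch, would be the higher-order functions util:function() and util:call(). A sketch, assuming every widget function takes the component element as its single argument:

declare function widgets:render3($section as element(section)) as element(div) {
    let $component := $section/*[1]
    (: look up the widget function by name and arity, then call it :)
    let $fn := util:function(
        QName("http://www.cems.uwe.ac.uk/xmlwiki/widgets",
              concat("widgets:", local-name($component))), 1)
    return
        <div>
            {util:catch("*", util:call($fn, $component), <span>Missing or bad widget.</span>)}
        </div>
};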

To help with processing table data from different sources such as SQL, Excel and Google Spreadsheets, support functions transform these to a common XML structure. This standard form is then transformed to its final HTML with the XSLT included in the configuration.

For example here is the configuration for an SQL widget, reading data from a (readonly) MySQL database:

<section column="c2" title="Overpaid executives">
<SQL>
<username>cwstudent</username>
<password>cwstudent</password>
<database>jdbc:mysql://stocks.cems.uwe.ac.uk/Emp</database>
<query>select ename,sal from emp where sal > 2000 order by sal desc</query>
<xsl>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="table">
<table border="1">
<xsl:apply-templates select="row"/>
</table>
</xsl:template>
<xsl:template match="row">
<tr>
<td>
<xsl:value-of select="ename"/>
</td>
<td>
<xsl:value-of select="sal"/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
</xsl>
</SQL>
</section>


and its widget code:


declare function widgets:SQL($component as element(SQL)) as element(div) {
<div>
{
let $connection := sql:get-connection("com.mysql.jdbc.Driver", $component/database, $component/username, $component/password)
let $result := sql:execute($connection, $component/query, false())
let $table:= widgets:sql-result-to-table($result)
return transform:transform($table,$component/xsl/*,())
}
</div>
};


widgets:sql-result-to-table converts the sql result to a common internal XML format:


declare function widgets:sql-result-to-table($result as element(sql:result)) as element(table) {
<table>
{for $row in $result/sql:row
return
<row>
{for $col in $row/sql:field
return
element {$col/@name}
{string($col)}
}
</row>
}
</table>
};


This simplifies the XSLT transformation to the final HTML div element.
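For the query above, the intermediate form handed to the XSLT would look something like this (the values are invented for illustration):

<table>
    <row>
        <ename>KING</ename>
        <sal>5000</sal>
    </row>
    <row>
        <ename>SCOTT</ename>
        <sal>3000</sal>
    </row>
</table>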

There's more to do on this framework, including adding support for access to password-protected Google Spreadsheets, Calendar and RSS feeds, for which Jim's code will be handy. The framework also needs to support widgets from other modules, and I'm not sure about the intermediate XML format; I'd also like to use this approach more widely, to layer the weather widgets as well. Getting the right split between widget code and configuration data is always tricky of course. However this framework seems quite useful and I intend to revisit some of the WikiBook examples to restructure them as widgets.

I'll post the full code to the XQuery Wikibook shortly.

Wednesday, April 01, 2009

Parameterised MS Word Documents with XQuery

It's coming round to exam time again at UWE, Bristol and as usual I've been struggling to get mine written. The XQuery-based FOLD application (which supports staff and students in our School) generates exam front pages containing exam details such as module code and title, examination date, length and time, as HTML which had to be copied (poorly) into MS Word. This wasn't very satisfactory and it would be better to generate a Word document with the completed front page and sample pages with headers and footers. I'd put this off as it seemed too complicated. The Word XML format, wordml, is one route but it looked daunting to generate from scratch.

However for this application I only need to make some small edits to a base document. The most obvious approach was to 'parameterise' the Word document with place-holders. Unique place-holders can be edited in with Word before the document is saved as XML. Fields which are not editable in MS Word, such as the author and timestamps, can be parameterised by editing the wordml directly. To instantiate a new Word document, the place-holders in the wordml are replaced with their values.
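In the saved wordml, a placeholder is just literal text inside a run, so (much simplified) it appears as something like:

<w:p>
    <w:r>
        <w:t>F_ModuleCode</w:t>
    </w:r>
</w:p>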

Treating this as string replacement is easier than editing the XML directly, even if this was possible in XQuery. The XQuery script reads the wordml document, serializes the XML as a string, replaces the placeholders in the string with their values and then converts back to XML for output.

Although this is not a typical task for XQuery and would be written in a similar way in other scripting languages, it is possible in XQuery with the help of a pair of functions which should be part of a common XQuery function library. In eXist these are util:serialize() to convert from XML to a string and the inverse, util:parse().

The function needs to replace multiple strings, so we use an XML element to define the name/value pairs:

let $moduleCode := request:get-parameter("moduleCode",())
..
let $replaceStrings :=
<replacement>
<replace string="F_ModuleCode" value="{$moduleCode}"/>
<replace string="F_Title" value="{$title}"/>
<replace string="F_LastAuthor" value="FOLD"/>
..
</replacement>

and a recursive function to do the replacements:

declare function local:replace($string as xs:string, $replacements as element(replace)*) as xs:string {
if (empty($replacements))
then $string
else
let $replace := $replacements[1]
let $rstring := replace($string,string($replace/@string),string($replace/@value))
return
local:replace($rstring,subsequence($replacements,2))
};

After gathering the parameter values and formatting a replacement element, the new document is generated by:

let $template := doc("/db/FOLD/doc/examtemplate.xml")
let $stemplate := util:serialize($template,"method=xml")
let $mtemplate := local:replace($stemplate,$replaceStrings/*)
return
util:parse($mtemplate)

Here the generated wordml is displayed in the browser, from where it can be saved and then loaded into Word. I found that the directive at the front of the wordml:

<?mso-application progid="Word.Document"?>

is used by the Windows OS to associate the file with MS Word, so the media type is just the standard text/xml. However it is helpful to define a suitable default file name, using a function in eXist's HTTP response module, the counterpart of the request module used to access URL parameters:

let $dummy := response:set-header('Content-Disposition', concat('attachment;filename=',concat("Exam_",$moduleCode,".xml") ))
let $dummy := response:set-header('Content-Type','application/msword')

The document could also be saved directly to the database, or all documents generated eagerly ready for use.

This approach feels like a bit of a hack, but it took only an hour to develop and is a major improvement on the previous approach. Changes to the base document will need re-parameterisation, but that seems a small overhead for slowly changing standard documents. XQuery forces a recursive approach to the string replacements, where an iterative updating approach would avoid copying the (rather large) string, but performance is fast enough for this task; indeed eXist string handling is very fast. My MS Office users will be happier, but I still need to think about the Unix and Mac OS users.

Review of IBM developerWorks article by Brian Carey

I've just come across an article published on IBM's developerWorks, "Use XQuery for the presentation layer" by Brian Carey. This illustrates the value of storing complex data in XML form and using XQuery to select it and transform it to HTML. Whilst the main message is well presented, the implementation, beneath its layers of Java, is over-complicated in a couple of ways.

Brian makes the curious observation under the heading Using XQuery prevents "cheating" that "You cannot write business logic in XQuery because XQuery focuses exclusively on querying and transformation". This rather ignores the fact that a significant number of applications are built solely on XQuery as the server-side language. The consequence is that a simple web application retains a complex and unnecessary Java middle tier acting as Controller in an MVC architecture.

Brian's web application provides a browsing interface to a collection of products, using AJAX to update the page and XQuery to select products and transform them to an HTML table.

Implementing this in XQuery on the open source eXist XML database as an example, we need only use the HTTP interface functions provided to couple the HTTP requests directly to the XQuery script. For eXist the additions would be:


declare variable $docName as xs:string := "lures.xml";
declare variable $configuration as xs:string := request:get-parameter("configuration",());
declare variable $usage as xs:string := request:get-parameter("usage",());


It might be objected that this binds the resources and interfaces too closely to the script. Indeed, the only benefit of the Java controller layer is this de-coupling. We can achieve the same effect in XQuery with a local configuration file, to aid portability and testing:

<ConfigurationFile>
<lureFileName>/db/Wiki/BC/lures2.xml</lureFileName>
</ConfigurationFile>


and adding these lines to the XQuery script:


declare variable $config := /ConfigurationFile;
declare variable $docName := $config/lureFileName;


I've implemented Brian's code with eXist on the XQuery Wikibook server and the relevant scripts are here:

http://www.cems.uwe.ac.uk/~cjwallac/apps/BC/index.html


I changed only the script address in the HTML code and corrected an error in the AJAX function where request.readyState was mistyped (took me a while to track that down!). Middle layer all gone. Storage of the whole application in eXist would be the normal deployment, but was not possible without editing because the HTML page is not valid XHTML.

One impediment to the use of XQuery as an application development language is that functions which extend the XPath function set with functionality such as HTTP interfacing are implementation-dependent, limiting portability. A new initiative, EXQuery, seeks to remedy this problem by developing a cross-platform library.

One other feature of Brian's implementation is the structure of the XML file. The start of the file looks like

<lures>
<casting>
<minnows brand="Yohuri" style="deep" size="3">
<minnow color="midnight blue">


But since the data is intended to be searched by usage (e.g. casting) and configuration (e.g. minnow), this leads to XQuery code like:



if ($configuration = 'minnow' and $usage = 'casting') then
    for $minnows in doc($docName)//casting/minnows
    return ...
else if ($configuration = 'minnow' and $usage = 'trolling') then
    ...

..


and because the child structures are actually all the same, this leads to unmaintainable and repetitive code.

A fix is possible in XQuery, using local-name() to filter on the node names themselves.
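A sketch of the local-name() approach, assuming the plural group element is named by appending 's' to the configuration value:

for $group in doc($docName)/lures/*[local-name() = $usage]
                                 /*[local-name() = concat($configuration, 's')]
return ...

A better approach would be to flatten the whole file, changing the representation of the configuration and usage concepts from elements to attributes: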

<lures>
<lure brand="Yohuri" style="deep" size="3" color="midnight blue" usage="casting" configuration="minnow">

..


and the query becomes


for $lure in doc($docName)//lure[@configuration = $configuration][@usage=$usage]
return ..



In summary, this implementation might be a "cheat" but cheating by reducing complexity and lines of code seems a good thing to do.