XML Prague 2007

The list of sessions is complete now. Participants are encouraged to present their posters during the conference.

Conference Sessions

Processing XML With Fun, Eric van der Vlist
XProc: An XML Pipeline Language, Norman Walsh
Python and XML, Uche Ogbuji
Applications of XML pipelines to web applications with XPL, Erik Bruchez
XLinq, Štěpán Bechynský
DocBook, Norman Walsh
Leapfrogging microformats with XML, linking, and more, Uche Ogbuji
Open XML Overview, Štěpán Bechynský
XML Processing by Streaming, Mohamed Zergaoui
Processing OpenDocument, Lars Oppermann
Generative XPath, Oleg Parashchenko
Beyond the simple pipeline: managing processes over time, Geert Bormans
A Generic Transformation Architecture, Bryan Rasmussen
PHP XML BoF , organized by Shaun Rowe
eXist developers meeting , organized by Adam Retter

Processing XML With Fun
Eric van der Vlist
Dyomedea

If you find XML processing dull and boring, then you are probably using last century's techniques such as the DOM, and this talk is for you.

You will see during the two days of XML Prague 2007 that you have no excuse to process XML without fun. In this presentation I'll do a quick review of the most promising techniques that can save you from the DOM without losing the power of XML. It can be seen as a road map that tries to give you the big picture before the following speakers lead you through more detailed areas.

The focus of the talk will be on XML pipeline languages, data binding APIs and programming extensions.

XProc: An XML Pipeline Language
Norman Walsh
Sun Microsystems

This presentation will explore the design and continued progress on XProc: An XML Pipeline Language currently being developed by the XML Processing Model Working Group at the W3C. The presentation will identify some of the use cases associated with XProc, describe highlights of the current design, and discuss the state of the latest working draft. If possible, the presentation will include a demonstration of XProc pipelines in action.

Python and XML
Uche Ogbuji
Zepheira

Python is a popular language for general-purpose development, including Web development, but it has not always had the coziest relationship with XML. There is a history of often unnecessary philosophical differences between boosters of Python and of XML. Partly because of this the state of the art has been unfortunately slow to develop, despite the work of many open-source developers in the XML-SIG and elsewhere. At present there are several options for XML processing in Python. Because of Python's flexibility and the breadth of library and tool options, it can be a very productive language for XML projects. In this session Uche Ogbuji, long-time Python and XML columnist, discusses the more prominent ways to process XML in Python, touching on pros, cons and other characteristics of each. The presentation will include a good deal of code samples and battle stories.
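
As a rough illustration of what "several options" can mean in practice, the sketch below reads the same small document with two modules from Python's standard library, xml.dom.minidom and xml.etree.ElementTree. The sample document and variable names are invented for this example, and the choice of these two modules is an assumption rather than a summary of what the session covers.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Invented sample document, used only for this comparison.
DOC = "<library><book><title>First</title></book><book><title>Second</title></book></library>"

# Option 1: the W3C-style DOM API (xml.dom.minidom).
dom = xml.dom.minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]

# Option 2: the ElementTree API, with its limited path syntax.
root = ET.fromstring(DOC)
et_titles = [title.text for title in root.findall("book/title")]

print(dom_titles)
print(et_titles)
```

Both approaches build the whole document in memory; they differ mainly in how much ceremony is needed to reach the data.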

Applications of XML pipelines to web applications with XPL
Erik Bruchez
Orbeon

The XProc XML pipeline language is well on its way to being standardized at the W3C. But what exactly are XML pipelines good for? And how do they work in practice?

In this talk, we attempt to answer these questions by presenting use cases for XML pipelines implemented with XPL, a close cousin of XProc. We show in particular how XML pipelines fill a niche in the constantly evolving web applications ecosystem. Can XML pipelines help deal with multiple web browsers? With REST services? With the plethora of syndication formats such as RSS and Atom? With Ajax? We suggest that the answer is yes in all these cases.

We also show how XML pipelines can play a particularly interesting role when used in conjunction with XForms.

The talk will feature live demonstrations using open source software.

XLinq
Štěpán Bechynský
Microsoft

There are two major perspectives for thinking about and understanding XLinq. From one perspective you can think of XLinq as a member of the LINQ Project family of technologies, with XLinq providing an XML Language Integrated Query capability along with a consistent query experience for objects, relational databases (DLinq), and other data access technologies as they become LINQ-enabled. From another perspective you can think of XLinq as a full-featured in-memory XML programming API comparable to a modernized, redesigned Document Object Model (DOM) XML programming API plus a few key features from XPath and XSLT.

XLinq represents a new, modernized in-memory XML Programming API. XLinq was designed to be a cleaner, modernized API, as well as fast and lightweight. XLinq uses modern language features (e.g., generics and nullable types) and diverges from the DOM programming model with a variety of innovations to simplify programming against XML. Even without Language Integrated Query capabilities XLinq represents a significant stride forward for XML programming.

DocBook
Norman Walsh
Sun Microsystems

This presentation will briefly introduce DocBook and discuss the ongoing development of DocBook V5.0. There will be plenty of opportunity for audience participation and wide-ranging discussion of issues directly, or at least tangentially, related to DocBook.

Leapfrogging microformats with XML, linking, and more
Uche Ogbuji
Zepheira

Some microformats are straightforward, useful and unobjectionable. Others, including many of the popular ones, abuse HTML, are poorly specified, and are quite prone to confusion. When designed and applied without careful consideration, microformats can detract from the value of the structured information they seek to provide. Beyond the simplest class of microformats it is often better to avoid hackery and embrace the native structure of the Web. XML and other natural data representation technologies such as JSON are just as viable as many of their counterparts in microformats. The main argument against these is that microformats provide graceful degradation for unsophisticated Web clients. But such graceful degradation can also be achieved through the power of linking. A Web page can still be a Web page, and not a scaffolding for a bunch of ill-fitting and ill-specified records. All it has to do is link to those records in their native format. More sophisticated browsers can be dressed up with all the AJAX gear you like, loading simple, linked XML or JSON into dynamic views while crawlers and legacy Web clients can access the structured information through user-actuated links. This session discusses these simple techniques, and provides detailed reasons for why one should be a little bit cautious in the face of the microformats hype.

Open XML Overview
Štěpán Bechynský
Microsoft

Office XML Formats for the 2007 Office system introduce or improve many types of solutions involving documents that developers can build. You can access the contents of an Office document in Office XML Formats by using any tool or technology capable of working with ZIP archives. The document content can then be manipulated using any standard XML processing techniques, or for parts that exist as embedded native formats, such as images, processed using any appropriate tool for that object type.

You will see the basic concepts of the Open Packaging Conventions (OPC), WordprocessingML and SpreadsheetML. OPC is fundamental for all document types, WordprocessingML is used for text documents and SpreadsheetML is used for spreadsheets. There are more "MLs" but you will see them just briefly.
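
The "ZIP archive plus standard XML processing" idea can be sketched with Python's standard library alone. The example below builds a toy package in memory and reads paragraph text back out of it. The part name word/document.xml is the conventional location of the main part in Word documents, but it is treated here as an assumption (a real package identifies its parts through relationships), and the toy package omits everything else a valid file needs, so it illustrates the access pattern only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Toy main document part with two paragraphs (invented content).
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>Hello</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Prague</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

def make_toy_package() -> bytes:
    """Build an incomplete, illustration-only package in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Assumed part name; real packages locate parts via relationships.
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buf.getvalue()

def paragraph_texts(package: bytes) -> list:
    """Open the ZIP container and collect the text of each w:p element."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs

texts = paragraph_texts(make_toy_package())
print(texts)
```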

XML Processing by Streaming
Mohamed Zergaoui
Innovimax

The first part will present the state of the art of XML streaming processing by reviewing the products in place (Joost, Cocoon, Saxon, etc.), the APIs available (SAX, StAX, XOM), languages (CDuce, XDuce, XJ), and the specs in progress or stalled (STX, XML Processing, XQuery Update). It will also cover what is currently in preparation (i.e. an XML Streaming XG at W3C) and take the time to present what was already done in SGML times (Balise and Omnimark, the cursor idea that can be found in Arbortext OID in ACL, etc.).

The second part will present the areas where work still has to be done and give some hints on an elaborated vision of XML processing through different kinds of processes: around constraints, normalizing, streamable paths, multilayer transformation, and last but not least constraint-aware streamable paths. Some light will be shed on static analysis of XSLT and XQuery to detect streamable instances. What are the needed evolutions of the cursor model? What is the added value of XDuce-like languages?
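
For readers unfamiliar with the streaming style, here is a minimal sketch using SAX, one of the APIs named above, through Python's standard xml.sax module. The handler sees one event at a time and keeps only a running total, so no tree is ever built. The element and attribute names (order, item, amount) are invented for this illustration.

```python
import xml.sax

class TotalHandler(xml.sax.ContentHandler):
    """Accumulates the 'amount' attribute of every <item> as events arrive."""

    def __init__(self):
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        # Called once per start tag; nothing about earlier elements is retained.
        if name == "item":
            self.total += int(attrs.get("amount", "0"))

# Invented sample input.
data = b"<order><item amount='2'/><item amount='5'/><note>rush</note></order>"

handler = TotalHandler()
xml.sax.parseString(data, handler)
print(handler.total)
```

Memory use depends on the state the handler chooses to keep rather than on document size, which is the property that makes questions such as "is this path streamable?" worth analysing.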

Processing OpenDocument
Lars Oppermann
Sun Microsystems

This paper explains the basic structure of OpenDocument, an open XML-based file format for office applications, and how standard XML processing tools can be used to create applications beyond the scope of traditional office-productivity applications. We will introduce the OpenDocument package structure and the main package components such as content, styles, and metadata. The package structure can be accessed with standard mechanisms available in a wide range of platforms. We will introduce a set of Java classes that facilitate the access of resources included in an OpenDocument package, which are available as part of the OpenOffice.org project. We will show how XSLT can be used to extract specific information from text documents and spreadsheets. Best practices derived from our own work with XSLT on OpenDocument will be provided.

We will demonstrate how documents can be used as input to business processes and how such processes can assemble documents from scratch or based on predefined templates. Similar scenarios have previously required the automation of a full office-productivity application. For many scenarios such solutions neither scaled very well, nor did they always offer the robustness required in a component of a back-end service. Open-standards-based file formats thus offer the possibility to efficiently process documents in environments that go beyond the limits of traditional desktop applications. We will also explore practical limitations to the kinds of processing which can be performed outside the context of an application which actually renders the document.
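
Assembling a document from a template without running an office application can be sketched as follows. This is a simplified illustration in Python rather than the Java classes or XSLT the paper describes. The template text, the assemble function and its parameters are hypothetical, and the resulting package is deliberately incomplete: a real OpenDocument file also needs META-INF/manifest.xml and normally styles and metadata parts.

```python
import io
import zipfile
from string import Template
from xml.sax.saxutils import escape

# Hypothetical content.xml template with two placeholders.
CONTENT_TEMPLATE = Template(
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:p>Dear $customer, your balance is $balance.</text:p>'
    '</office:text></office:body>'
    '</office:document-content>'
)

def assemble(customer: str, balance: str) -> bytes:
    """Fill the template and wrap it in a (partial) OpenDocument package."""
    content = CONTENT_TEMPLATE.substitute(
        customer=escape(customer), balance=escape(balance)
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes first and is stored uncompressed.
        zf.writestr(
            "mimetype",
            "application/vnd.oasis.opendocument.text",
            compress_type=zipfile.ZIP_STORED,
        )
        zf.writestr("content.xml", content)
        # Omitted in this sketch: META-INF/manifest.xml, styles.xml, meta.xml.
    return buf.getvalue()

package = assemble("Tom & Co", "42 EUR")
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    first_entry = zf.infolist()[0]
    content_xml = zf.read("content.xml").decode("utf-8")
print(first_entry.filename, len(package))
```

Because the work is plain string and ZIP handling, a back-end service could run it many times without the overhead of a desktop application, which is the scaling argument made above.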

Generative XPath
Oleg Parashchenko
Saint-Petersburg State University

The most convenient approach to navigate over XML trees is to use XPath queries. But there is no reason to limit ourselves to XML only. Indeed, it's useful to have XPath for navigating over arbitrary tree-like structures. There are a number of projects, in which developers have tried to implement XPath over project-specific hierarchical data. Unfortunately, most of these attempts resulted in something that resembled XPath, but was not XPath. The problem is that implementing XPath, even version 1.0, is a difficult task. We propose an alternative approach. Generative XPath is an XPath 1.0 processor that can be adapted to different hierarchical memory structures and different programming languages. Customizing Generative XPath to a specific environment is several orders of magnitude easier than implementing XPath from scratch.

The Generative XPath framework consists of three components:

XPath compiler,
XML virtual machine,
native (customization) layer.

The XPath compiler transforms XPath expressions into executable code for the virtual machine. During execution, the code interacts with the native layer to access the tree nodes and their properties.

This paper explains what the virtual machine is, what is expected from the customization layer, and how they work together. Also, background information about the design and implementation of Generative XPath is given.
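
The division of labour between compiler, virtual machine and native layer can be illustrated with a deliberately tiny sketch. It is not Generative XPath and does not implement XPath 1.0: it handles only absolute paths made of child steps (names or *), and every name in it (NativeLayer, compile_path, run, the CHILD instruction) is invented for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

@dataclass
class NativeLayer:
    """Hypothetical customization layer: the only code that knows the tree's shape."""
    name: Callable[[Any], str]
    children: Callable[[Any], Iterable[Any]]

def compile_path(path: str) -> List[Tuple[str, str]]:
    """Toy 'compiler': turn '/a/b/*' into a list of CHILD instructions."""
    return [("CHILD", step) for step in path.split("/") if step]

def run(program: List[Tuple[str, str]], root: Any, native: NativeLayer) -> List[Any]:
    """Toy 'virtual machine': executes instructions, touching nodes only via the native layer."""
    current = [root]
    for op, arg in program:
        if op != "CHILD":
            raise ValueError(f"unknown instruction: {op}")
        current = [
            child
            for node in current
            for child in native.children(node)
            if arg == "*" or native.name(child) == arg
        ]
    return current

# A non-XML tree: (name, [children]) tuples, here resembling a file system.
tree = ("", [("etc", [("hosts", []), ("passwd", [])]), ("home", [("alice", [])])])

# Adapting to this structure takes two small functions.
tuple_layer = NativeLayer(name=lambda n: n[0], children=lambda n: n[1])

program = compile_path("/etc/*")
result_names = [tuple_layer.name(n) for n in run(program, tree, tuple_layer)]
print(result_names)
```

Supporting a different data structure means writing a new NativeLayer and nothing else, which is the kind of customization effort the abstract contrasts with implementing XPath from scratch.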

Beyond the simple pipeline: managing processes over time
Geert Bormans
Independent Consultant

This paper shows an application that takes XML pipeline processing to the next level. The paper shows the first results of a project that requires version management of pipeline components. As will be shown, the approach taken in this project is generally applicable for building complex XML processing systems with less risk.

A company sends printed statements to customers on a regular basis. They are legally bound to archive an exact copy of the statement that was sent. Storing the scanned statements or a PDF version would dramatically increase the hardware storage requirements. Instead, the approach was taken to archive the XML source document of the different statements and to version manage the processes with it, ready for execution.

At the heart of the solution sits a pipeline component management system and a process execution kernel. The steps in the processes are all defined as URI-addressable resources. The kernel pulls all these resources together at run time and executes the pipeline as requested. The architecture used in this project has general applicability for the version management of pipeline components. When a complex XML processing application is layered on top of this architecture, changes can be made to processes without breaking the parts that already work. This can be extremely helpful when an application requires changes to the schema, the meta-data-model, the link resolution mechanisms, etc.
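
The idea of versioned, URI-addressable steps resolved at run time can be sketched in a few lines. Everything here is hypothetical: the urn:example URIs, the in-memory registry standing in for the component management system, and the string-to-string step functions standing in for real XML transformations. It shows the principle only, not the project's implementation.

```python
from typing import Callable, Dict, List

Step = Callable[[str], str]

# Stand-in for the component management system: every version of a step
# keeps its own URI, so older versions are never overwritten.
REGISTRY: Dict[str, Step] = {
    "urn:example:step:layout:v1": lambda doc: doc.upper(),
    "urn:example:step:layout:v2": lambda doc: doc.title(),
    "urn:example:step:footer:v1": lambda doc: doc + " | Page 1",
}

def execute(pipeline: List[str], document: str) -> str:
    """Toy execution kernel: resolve each step URI at run time and apply it in order."""
    for uri in pipeline:
        document = REGISTRY[uri](document)
    return document

# The pipeline archived alongside an old statement pins the old layout step...
archived_pipeline = ["urn:example:step:layout:v1", "urn:example:step:footer:v1"]
# ...while new statements use the newer version of that step.
current_pipeline = ["urn:example:step:layout:v2", "urn:example:step:footer:v1"]

source = "statement for bob"
archived_output = execute(archived_pipeline, source)
current_output = execute(current_pipeline, source)
print(archived_output)
print(current_output)
```

Re-running the archived pipeline on the archived source reproduces the original output even after the layout step has moved on, which is what makes storing source plus process a substitute for storing the rendered statement.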

A Generic Transformation Architecture
Bryan Rasmussen
Danish OIOXML

This talk describes the implementation of a Generic Transformation Architecture and the requirements and difficulties of an XSL-T framework in which the potential number of inputs and the potential number of outputs must be considered infinite.

The implementation will show the following benefits:

Increased transportability of transformations.
Better separation of developer and maintainer duties in XSL-T-centric development (along a 3-tiered model).
Improved cross-media development.
The XSL-T-implemented templating language Funx will be discussed, along with how configuration of multiple media solutions requires composability.
Configuration of such a framework.

A theoretical and practical focus will also be given to the needs of the Danish Government OIO project for XSL-T usage, given the ISB repository of namespaces and XML-based formats, with thousands of potential input formats - http://isb.oio.dk

Demonstrations and discussion of examples will include:

How implementing the well-known DocBook XSL-T transformations in this method makes them easier for others to maintain and work with.
Examples of various document types transformed into various media provided as part of the GTA, cross-media scripting using Funx.
Code examples pertinent to generating media in XSL-FO, HTML, Open Office with XForms.
Generation of XSL-T, using it in XML Pipelines, using it for generation of Example XML files from templates.

PHP XML BoF
organized by Shaun Rowe

Discussion of state of the art techniques for processing XML with PHP.

eXist developers meeting
organized by Adam Retter

Informal session of developers of eXist XML database.
