The list of sessions is complete now. Participants are encouraged to present their posters during the conference.
Processing XML With Fun
If you find XML processing dull and boring, then you are probably using last century's techniques such as the DOM and this talk is for you.
During the two days of XML Prague 2007 you will see that you have no excuse to process XML without fun. In this presentation I'll give a quick review of the most promising techniques that can save you from the DOM without losing the power of XML. It can be seen as a road map that gives you the big picture before the following speakers lead you through more detailed areas.
The talk will focus on XML pipeline languages, data binding APIs and programming extensions.
XProc: An XML Pipeline Language
This presentation will explore the design of, and continued progress on, XProc: An XML Pipeline Language, currently being developed by the XML Processing Model Working Group at the W3C. The presentation will identify some of the use cases associated with XProc, describe highlights of the current design, and discuss the state of the latest working draft. If possible, the presentation will include a demonstration of XProc pipelines in action.
Python and XML
Python is a popular language for general-purpose development, including Web development, but it has not always had the coziest relationship with XML. There is a history of often unnecessary philosophical differences between boosters of Python and of XML. Partly because of this, the state of the art has been unfortunately slow to develop, despite the work of many open-source developers in the XML-SIG and elsewhere. At present there are several options for XML processing in Python. Because of Python's flexibility and the breadth of library and tool options, it can be a very productive language for XML projects. In this session Uche Ogbuji, long-time Python and XML columnist, discusses the more prominent ways to process XML in Python, touching on pros, cons and other characteristics of each. The presentation will include plenty of code samples and battle stories.
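As a minimal sketch of one of those options, the standard library's `xml.etree.ElementTree` module can parse a small document and pull out values by element name. The sample document and the `session_titles` helper are invented here for illustration and are not taken from the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
SAMPLE = """<sessions>
  <session><title>Processing XML With Fun</title></session>
  <session><title>Python and XML</title></session>
</sessions>"""


def session_titles(xml_text):
    """Return the text of every <title> found directly under a <session>."""
    root = ET.fromstring(xml_text)
    return [session.findtext("title") for session in root.findall("session")]


print(session_titles(SAMPLE))
```

ElementTree is only one of the available choices; other libraries trade this simplicity for features such as full XPath or validation.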
Applications of XML pipelines to web applications with XPL
The XProc XML pipeline language is well on its way to being standardized at the W3C. But what, exactly, are XML pipelines good for? And how do they work in practice?
In this talk, we attempt to answer these questions by presenting use cases for XML pipelines implemented with XPL, a close cousin of XProc. We show in particular how XML pipelines fill a niche in the constantly evolving web applications ecosystem. Can XML pipelines help deal with multiple web browsers? With REST services? With the plethora of syndication formats such as RSS and Atom? With Ajax? We suggest that the answer is yes in all these cases.
We also show how XML pipelines can play a particularly interesting role when used in conjunction with XForms.
The talk will feature live demonstrations using open source software.
There are two major perspectives for thinking about and understanding XLinq. From one perspective you can think of XLinq as a member of the LINQ Project family of technologies, with XLinq providing an XML Language Integrated Query capability along with a consistent query experience for objects, relational databases (DLinq), and other data access technologies as they become LINQ-enabled. From another perspective you can think of XLinq as a full-featured in-memory XML programming API, comparable to a modernized, redesigned Document Object Model (DOM) XML programming API plus a few key features from XPath and XSLT.
XLinq represents a new, modernized in-memory XML programming API. It was designed to be cleaner than its predecessors, as well as fast and lightweight. XLinq uses modern language features (e.g., generics and nullable types) and diverges from the DOM programming model with a variety of innovations to simplify programming against XML. Even without Language Integrated Query capabilities, XLinq represents a significant stride forward for XML programming.
This presentation will briefly introduce DocBook and discuss the ongoing development of DocBook V5.0. There will be plenty of opportunity for audience participation and wide-ranging discussion of issues directly, or at least tangentially, related to DocBook.
Leapfrogging microformats with XML, linking, and more
Some microformats are straightforward, useful and unobjectionable. Others, including many of the popular ones, abuse HTML, are poorly specified, and are quite prone to confusion. When designed and applied without careful consideration, microformats can detract from the value of the structured information they seek to provide. Beyond the simplest class of microformats it is often better to avoid hackery and embrace the native structure of the Web. XML and other natural data representation technologies such as JSON are just as viable as many of their counterparts in microformats. The main argument against these is that microformats provide graceful degradation for unsophisticated Web clients. But such graceful degradation can also be achieved through the power of linking. A Web page can still be a Web page, and not a scaffolding for a bunch of ill-fitting and ill-specified records. All it has to do is link to those records in their native format. More sophisticated browsers can be dressed up with all the AJAX gear you like, loading simple, linked XML or JSON into dynamic views, while crawlers and legacy Web clients can access the structured information through user-actuated links. This session discusses these simple techniques and gives detailed reasons why one should be a little cautious in the face of the microformats hype.
Open XML Overview
Office XML Formats for the 2007 Office system introduce or improve many types of document-based solutions that developers can build. You can access the contents of an Office document in Office XML Formats by using any tool or technology capable of working with ZIP archives. The document content can then be manipulated using any standard XML processing techniques or, for parts that exist as embedded native formats, such as images, processed using any appropriate tool for that object type.
You will see the basic concepts of Open Packaging Conventions (OPC), WordprocessingML and SpreadsheetML. OPC is fundamental for all document types, WordprocessingML is used for text documents and SpreadsheetML is used for spreadsheets. There are more "MLs" but you will see them just briefly.
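The ZIP-plus-XML access described in this abstract can be sketched with nothing but Python's standard library. The example below is a rough illustration under stated assumptions: the package is a tiny in-memory stand-in (not a complete, valid document), the helper names are hypothetical, and the main part is assumed to live at the conventional path `word/document.xml`, whereas a robust reader would locate it through the package relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML main document elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_sample_docx():
    """Build a tiny stand-in package in memory (hypothetical, incomplete)."""
    document = (
        '<w:document xmlns:w="{ns}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>OPC</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    ).format(ns=W_NS)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document)
    return buf.getvalue()


def paragraph_texts(package_bytes, part="word/document.xml"):
    """Open the package as a ZIP archive and join the text runs per paragraph."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read(part))
    paragraphs = []
    for p in root.iter("{%s}p" % W_NS):
        runs = [t.text or "" for t in p.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(runs))
    return paragraphs


print(paragraph_texts(make_sample_docx()))
```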
XML Processing by Streaming
The first part presents the state of the art in streaming XML processing by reviewing the available products (Joost, Cocoon, Saxon, etc.), APIs (SAX, StAX, XOM), languages (CDuce, XDuce, XJ), and specifications in progress or stalled (STX, XML Processing, XQuery Update). It also covers what is currently in preparation (i.e. an XML Streaming XG at W3C) and takes the time to present what was already done in the SGML era (Balise and Omnimark, the cursor idea that can be found in Arbortext OID in ACL, etc.).
The second part presents the areas where work remains to be done and gives some hints toward an elaborated vision of XML processing through different kinds of processes: around constraints, normalizing, streamable paths, multilayer transformation, and last but not least constraint-aware streamable paths. Some light will be shed on static analysis of XSLT and XQuery to detect streamable instances. What evolutions of the cursor model are needed? What is the added value of XDuce-like languages?
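To make the streaming idea concrete, here is a minimal sketch using SAX, one of the APIs named above, through Python's standard `xml.sax` module. The handler sees one event at a time and keeps only a few counters, so memory use does not grow with document size. The `ElementCounter` class and `count_elements` function are hypothetical names chosen for this illustration.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count elements by name and track nesting depth without building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.depth = 0
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        self.depth -= 1


def count_elements(xml_bytes):
    """Stream the document through the handler; return (counts, max depth)."""
    handler = ElementCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.counts, handler.max_depth


print(count_elements(b"<a><b/><b><c/></b></a>"))
```

The trade-off is the usual one for event-based streaming: anything the handler needs from earlier in the document has to be remembered explicitly, which is where cursor models and streamable path analysis come in.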
This paper explains the basic structure of OpenDocument, an open XML-based file format for office applications, and how standard XML processing tools can be used to create applications beyond the scope of traditional office-productivity applications. We will introduce the OpenDocument package structure and the main package components such as content, styles, and metadata. The package structure can be accessed with standard mechanisms available on a wide range of platforms. We will introduce a set of Java classes that facilitate access to resources included in an OpenDocument package and which are available as part of the OpenOffice.org project. We will show how XSLT can be used to extract specific information from text documents and spreadsheets. Best practices derived from our own work with XSLT on OpenDocument will be provided.
We will demonstrate how documents can be used as input to business processes and how such processes can assemble documents from scratch or based on predefined templates. Similar scenarios have previously required the automation of a full office-productivity application. For many scenarios such solutions neither scaled very well nor always offered the robustness required in a component of a back-end service. Open-standards-based file formats thus offer the possibility to efficiently process documents in environments that go beyond the limits of traditional desktop applications. We will also explore practical limitations to the kinds of processing which can be performed outside the context of an application which actually renders the document.
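As a rough illustration of reading package components without an office application, the sketch below uses Python's standard library rather than the Java classes or XSLT covered in the paper. It assumes the usual package entries `mimetype` and `meta.xml`; the sample package is a tiny in-memory stand-in (not a complete, valid document) and the helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def make_sample_odt():
    """Build a tiny stand-in package in memory (hypothetical, incomplete)."""
    meta = (
        '<office:document-meta xmlns:office="{office}" xmlns:dc="{dc}">'
        "<office:meta><dc:title>Sample document</dc:title></office:meta>"
        "</office:document-meta>"
    ).format(**NS)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("meta.xml", meta)
    return buf.getvalue()


def describe_package(package_bytes):
    """Return the declared media type and the title stored in the metadata."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        mimetype = zf.read("mimetype").decode("ascii")
        meta_root = ET.fromstring(zf.read("meta.xml"))
    title = meta_root.findtext("office:meta/dc:title", namespaces=NS)
    return {"mimetype": mimetype, "title": title}


print(describe_package(make_sample_odt()))
```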
The most convenient way to navigate XML trees is to use XPath queries. But there is no reason to limit ourselves to XML only. Indeed, it's useful to have XPath for navigating arbitrary tree-like structures. There are a number of projects in which developers have tried to implement XPath over project-specific hierarchical data. Unfortunately, most of these attempts resulted in something that resembled XPath, but was not XPath. The problem is that implementing XPath, even version 1.0, is a difficult task. We propose an alternative approach. Generative XPath is an XPath 1.0 processor that can be adapted to different hierarchical memory structures and different programming languages. Customizing Generative XPath to a specific environment is several orders of magnitude easier than implementing XPath from scratch.
The Generative XPath framework consists of three components:
The XPath compiler transforms XPath expressions into executable code for the virtual machine. During execution, the code interacts with the native layer to access the tree nodes and their properties.
This paper explains what the virtual machine is, what is expected from the customization layer, and how they work together. Also, background information about the design and implementation of Generative XPath is given.
Beyond the simple pipeline: managing processes over time
This paper shows an application that takes XML pipeline processing to the next level. The paper shows the first results of a project that requires version management of pipeline components. As will be shown, the approach taken in this project is generally applicable for building complex XML processing systems with less risk.
A company sends printed statements to customers on a regular basis. It is legally bound to archive an exact copy of each statement that was sent. Storing the scanned statements or a PDF version would dramatically increase the hardware storage requirements. Instead, the approach taken was to archive the XML source document of each statement and to version-manage the processes with it, ready for execution.
At the heart of the solution sit a pipeline component management system and a process execution kernel. The steps in the processes are all defined as URI-addressable resources. The kernel pulls all these resources together at run time and executes the pipeline as requested. The architecture used in this project has general applicability for the version management of pipeline components. If a complex XML processing application were layered on top of the architecture presented above, it would be possible to make changes to processes without breaking the working part. This can be extremely helpful when an application requires changes to the schema, the metadata model, the link resolution mechanisms, etc.
A Generic Transformation Architecture
This talk describes the implementation of a Generic Transformation Architecture and the requirements and difficulties of an XSL-T framework in which the number of potential input formats and the number of potential output formats must both be treated as unbounded.
The implementation will show the following benefits:
Theoretical and practical attention will also be given to the needs of the Danish Government OIO project for XSL-T usage, given the ISB repository of namespaces and XML-based formats, with its thousands of potential input formats (http://isb.oio.dk).
Demonstrations and discussion of examples will include:
Discussion of state-of-the-art techniques for processing XML with PHP.
Informal session for developers of the eXist XML database.