Wednesday, May 30, 2012

EPUB 3 updates

oXygen XML Editor version 14 updates its EPUB support to EPUB 3.0. We announced the beta last week on the oxygen-user list, but since then we have already made some further improvements to the EPUB 3 support: the OPF and OCF schemas were updated, and a new document template for creating an EPUB 3.0 archive was added. So, if you want to test the EPUB support, read the announcement to get the test license key for version 14, but use the installation kits below instead:

Mac OS X
Windows 64-bit

Saturday, May 5, 2012

Getting from an EPUB to an XML file

I just noticed the following on my Twitter stream (!/matthewdiener/status/198163400587612160):
Has anyone used oXygen XML Editor (or any other program) to transform an ePUB to XML? Any XSLT out there? #eprdctn
5/4/12 12:33 AM
As there is no solution that I know of, I wrote the following XSLT 2.0 stylesheet that converts an EPUB into a single XML file. It first reads META-INF/container.xml to find the content (OPF) file, then looks in the OPF manifest for the toc (NCX) file. It then copies the toc file, adding inside each content element the document (or the element with the given id) that its src attribute points to, thus obtaining one single XML file. An xml:base attribute is added on each included content so that relative references inside it still resolve correctly. Here is the stylesheet:

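<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:c="urn:oasis:names:tc:opendocument:xmlns:container"
    xmlns:opf="http://www.idpf.org/2007/opf"
    xmlns:ncx="http://www.daisy.org/z3986/2005/ncx/"
    exclude-result-prefixes="xs c opf ncx">
    <!-- URL of the EPUB file; in oXygen it can be set to ${afu} -->
    <xsl:param name="epub" select="'file:/Users/george/Documents/workspace/eXml/samples/epub/flowers.epub'"/>
    <!-- META-INF/container.xml points to the content (OPF) file -->
    <xsl:variable name="containerFile" select="concat('zip:', $epub, '!/META-INF/container.xml')"/>
    <xsl:variable name="containerData" select="document($containerFile)"/>
    <xsl:variable name="content" select="$containerData/c:container/c:rootfiles/c:rootfile[@media-type='application/oebps-package+xml']/@full-path"/>
    <xsl:variable name="contentFile" select="concat('zip:', $epub, '!/', $content)"/>
    <xsl:variable name="contentData" select="document($contentFile)"/>
    <!-- The NCX href is resolved relative to the OPF file that declares it -->
    <xsl:variable name="tocData" select="document($contentData/opf:package/opf:manifest/opf:item[@media-type='application/x-dtbncx+xml']/@href)"/>
    <!-- Entry point: copy the toc (NCX) file, expanding its content references -->
    <xsl:template name="main">
        <xsl:apply-templates select="$tocData" mode="copy"/>
    </xsl:template>
    <xsl:template match="node() | @*" mode="copy">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" mode="copy"/>
        </xsl:copy>
    </xsl:template>
    <!-- Add the referenced content inside each content element; xml:base holds its
         absolute URL so that relative references in it still resolve -->
    <xsl:template match="ncx:content[@src]" mode="copy">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:choose>
                <xsl:when test="contains(@src, '#')">
                    <xsl:variable name="file" select="substring-before(@src, '#')"/>
                    <xsl:variable name="id" select="substring-after(@src, '#')"/>
                    <xsl:variable name="doc" select="document($file, .)"/>
                    <xsl:attribute name="xml:base" select="base-uri($doc)"/>
                    <xsl:apply-templates select="$doc//*[@id=$id]" mode="copy"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:variable name="doc" select="document(@src)"/>
                    <xsl:attribute name="xml:base" select="base-uri($doc)"/>
                    <xsl:apply-templates select="$doc" mode="copy"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>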
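The same lookup chain (container.xml, then the OPF manifest, then the NCX, then the referenced content) can also be followed outside oXygen. The Python sketch below is only an illustration under some assumptions: it uses the standard zipfile and xml.etree.ElementTree modules instead of oXygen's zip: URLs and Saxon, it assumes the EPUB declares an NCX (media type application/x-dtbncx+xml) and that the content files are well-formed XHTML, and it writes archive-internal paths into xml:base rather than full URLs. The names find_toc, epub_to_xml and sample_epub, and the file names in the sample archive, are hypothetical.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces of the OCF container, the OPF package and the NCX table of contents.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "ncx": "http://www.daisy.org/z3986/2005/ncx/",
}
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def find_toc(epub: zipfile.ZipFile) -> str:
    """Follow container.xml -> OPF -> NCX and return the NCX path inside the archive."""
    container = ET.fromstring(epub.read("META-INF/container.xml"))
    rootfile = container.find(
        "c:rootfiles/c:rootfile[@media-type='application/oebps-package+xml']", NS)
    opf_path = rootfile.get("full-path")
    package = ET.fromstring(epub.read(opf_path))
    item = package.find(
        "opf:manifest/opf:item[@media-type='application/x-dtbncx+xml']", NS)
    # Manifest hrefs are relative to the OPF file's directory.
    opf_dir = posixpath.dirname(opf_path)
    return posixpath.normpath(posixpath.join(opf_dir, item.get("href")))


def epub_to_xml(epub: zipfile.ZipFile) -> ET.Element:
    """Return the NCX tree with the referenced content added inside each <content>."""
    toc_path = find_toc(epub)
    toc_dir = posixpath.dirname(toc_path)
    ncx = ET.fromstring(epub.read(toc_path))
    # Snapshot with list() so the appended XHTML is not walked during the loop.
    for content in list(ncx.iter("{%s}content" % NS["ncx"])):
        src = content.get("src")
        if not src:
            continue
        href, _, frag = src.partition("#")
        path = posixpath.normpath(posixpath.join(toc_dir, href))
        doc = ET.fromstring(epub.read(path))
        if frag:
            # Keep only the first element whose id matches the fragment.
            part = next((e for e in doc.iter() if e.get("id") == frag), None)
        else:
            part = doc
        content.set(XML_BASE, path)  # archive-internal path, not a full URL
        if part is not None:
            content.append(part)
    return ncx


def sample_epub() -> zipfile.ZipFile:
    """Build a tiny, hypothetical EPUB-like archive in memory."""
    files = {
        "META-INF/container.xml":
            '<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container"'
            ' version="1.0"><rootfiles><rootfile full-path="OEBPS/content.opf"'
            ' media-type="application/oebps-package+xml"/></rootfiles></container>',
        "OEBPS/content.opf":
            '<package xmlns="http://www.idpf.org/2007/opf"><manifest>'
            '<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>'
            '</manifest></package>',
        "OEBPS/toc.ncx":
            '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/"><navMap>'
            '<navPoint id="p1"><content src="ch1.xhtml"/></navPoint>'
            '<navPoint id="p2"><content src="ch2.xhtml#s2"/></navPoint>'
            '</navMap></ncx>',
        "OEBPS/ch1.xhtml":
            '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>One</p></body></html>',
        "OEBPS/ch2.xhtml":
            '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
            '<p id="s1">Skipped</p><div id="s2">Two</div></body></html>',
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, text in files.items():
            z.writestr(name, text)
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    print(ET.tostring(epub_to_xml(sample_epub()), encoding="unicode"))
```

Running the script prints the expanded NCX. As in the stylesheet, a file referenced by several navPoints is included once per reference, and a src with a fragment brings in only the element carrying that id.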

Note that this is an XSLT 2.0 stylesheet and its entry point is a named template called main. In oXygen, configure a transformation scenario that uses Saxon 9 PE as the XSLT processor, then use the "Advanced options" button next to the processor to set the initial template to "main".
The XML URL should be left empty, as the stylesheet reads its input through the epub parameter.
The stylesheet has one parameter, called "epub". If you set it to ${afu}, which stands for archive file URL, oXygen expands it at run time to the URL of the archive/EPUB currently open in the Archive Browser view.