
A Text Structure “Epischema”
for TEI

Gerrit Imsieke (@gimsieke), le-tex publishing services GmbH (@letexml)

XML Prague, 2017-02-11

http://tiny.cc/epischema

What this presentation is about

Schema customization, in particular:

Narrowing what is allowed in certain contexts

  • that can be specified by a grammar
  • that is applied on top of a base grammar
  • that permits accurate and efficient content completion
  • by adding a so-called epischema.

Descriptive vs. Prescriptive Schemas

  • Descriptive
    • flexibility to mark up diverse content
  • Prescriptive
    • enables authoring support
    • predictability for conversion tools

Structural Divisions in TEI

<body>
   <div>
     <head>Chapter 1</head>
     <p>Text.</p>
     <div>
       <head>1.1 A Section</head>
       <p>More text.</p>
     </div>
     <div>
       <head>1.2 Another Section</head>
       <p>Even more text.</p>
     </div>
   </div>
</body>

Structural Divisions in TEI

<body>
   <div type="chapter">
     <head>Chapter 1</head>
     <p>Text.</p>
     <div type="section">
       <head>1.1 A Section</head>
       <p>More text.</p>
     </div>
     <div type="section">
       <head>1.2 Another Section</head>
       <p>Even more text.</p>
     </div>
   </div>
</body>

Further structural constraints: Bibliography

Bibliography divs should come after the sections. They should contain an optional head, followed by a listBibl without a head of its own.

<body>
   <div type="chapter">
     <head>Chapter 1</head>
     <p>Text.</p>
     <div type="section">
       […]
     </div>
     <div type="bibliography">
       <head>“Bibliography” heading here?</head>
       <listBibl>
         <head>or here?</head>
         <bibl n="1">Bibliographic Entry</bibl>
       </listBibl>
     </div>
   </div>
</body>
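
Read against the rule stated above, a conforming bibliography division might look like the following sketch. The heading text and the second entry are made up for illustration; only the position of head (on the div, not inside listBibl) follows from the constraint.

```xml
<div xmlns="http://www.tei-c.org/ns/1.0" type="bibliography">
  <!-- the optional heading belongs to the div ... -->
  <head>Bibliography</head>
  <!-- ... and the listBibl carries no head of its own -->
  <listBibl>
    <bibl n="1">Bibliographic Entry</bibl>
    <!-- illustrative second entry, not from the original slide -->
    <bibl n="2">Another Bibliographic Entry</bibl>
  </listBibl>
</div>
```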

Further structural constraints: Floating Text

floatingText should be restricted to certain types (sidebar, box, letter, …)

<div type="section">
 <head n="1.1">A Section</head>
 <p>A paragraph.</p>
 <floatingText type="box">
   <body>
     <div type="section">
       <head>An interspersed Box</head>
     </div>
   </body>
 </floatingText>
 <p>Another paragraph.</p>
</div>

Desirable features of a constraint mechanism

  • No dependence on pre-existing schema design decisions
    (content building block granularity, in particular)
  • No need to know the internal wirings of the base schema at all
  • In-editor and batch validation support
  • Context-dependent, constraint-aware content completion

How to enforce structural constraints?

Candidates:

  • Extend the schema
  • Schematron
  • “Epischema”

Extend the schema

With RELAX NG, designing schemas that can be restricted without complete redefinition is more difficult than designing schemas that are easy to extend.

Eric van der Vlist (2003): RELAX NG, Ch. 12, Writing Extensible Schemas

Extend the schema

From the tei_allPlus.rng customization:

<define name="div">
  <element name="div">
     <a:documentation>(text division) 
     contains a subdivision of the front, body, 
     or back of a text.</a:documentation>
     <group> [49 lines]
     <ref name="att.global.attributes"/>
     <ref name="att.divLike.attributes"/>
     <ref name="att.typed.attributes"/>
     <ref name="att.declaring.attributes"/>
     <empty/>
  </element>
</define>

No hooks for context-dependent div attributes

Extend the schema

  • Lots of redundancy in forking div into context-dependent definitions.
  • TEI’s ODD metaschema language not (yet?) suited, either.
  • Several other approaches are conceivable.
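
To make the redundancy from the first bullet concrete, here is a rough sketch of what forking div could look like in RELAX NG XML syntax. The pattern names div.chapter and div.section are hypothetical and do not exist in tei_allPlus.rng. The comments stand in for the base content model and attribute classes, which would have to be copied into every fork.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.tei-c.org/ns/1.0">
  <!-- only here to make the sketch a complete grammar -->
  <start>
    <ref name="div.chapter"/>
  </start>
  <!-- hypothetical fork 1: a div that can only be a chapter -->
  <define name="div.chapter">
    <element name="div">
      <attribute name="type"><value>chapter</value></attribute>
      <!-- first copy of the base content model and attribute
           references would go here, with nested divs rewired -->
      <zeroOrMore><ref name="div.section"/></zeroOrMore>
    </element>
  </define>
  <!-- hypothetical fork 2: the same definition all over again -->
  <define name="div.section">
    <element name="div">
      <attribute name="type"><value>section</value></attribute>
      <!-- second copy of the base content model and attribute
           references would go here -->
      <zeroOrMore><ref name="div.section"/></zeroOrMore>
    </element>
  </define>
</grammar>
```

Every additional context (part, bibliography, glossary, and so on) would add one more near-identical definition.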

Extend the schema

Still:

  • Knowledge and manipulation of base schema’s internal wirings necessary.
  • XSD assertions are of as little use for content completion as Schematron.

Schematron

<pattern id="top-level-div">
  <rule context="tei:body/tei:div">
    <assert test="@type = ('part', 'chapter')">
      Type must be 'part' or 'chapter'.</assert>
  </rule>
</pattern>
<pattern id="chapter-div">
  <rule context="tei:div[@type = 'chapter']/tei:div">
    <assert test="@type = ('section', 'bibliography')">
      Type must be 'section' or 'bibliography'.</assert>
  </rule>
</pattern>

Schematron

<pattern id="bib">
  <rule context="tei:div[@type = 'bibliography']">
    <report test="following-sibling::tei:div[@type = 
                      ('part', 'chapter', 'section')]">
      No regular content allowed after bibliography.
    </report>
  </rule>
</pattern>

…and several more rules
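
As an illustration of what such further rules might look like, the following sketch covers the bibliography and floatingText constraints stated earlier. It is not taken from the original Schematron: the pattern ids are made up, and the list of floatingText types contains only the three values named in this presentation, although the original list is open-ended.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <!-- hypothetical rule: the heading goes on the div, not on listBibl -->
  <pattern id="bib-list">
    <rule context="tei:div[@type = 'bibliography']/tei:listBibl">
      <report test="tei:head">
        listBibl in a bibliography div must not have a head.</report>
    </rule>
  </pattern>
  <!-- hypothetical rule: floatingText needs one of the named types -->
  <pattern id="floating-text-type">
    <rule context="tei:floatingText">
      <assert test="@type = ('sidebar', 'box', 'letter')">
        Type must be 'sidebar', 'box', or 'letter'.</assert>
    </rule>
  </pattern>
</schema>
```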

Schematron

Applying the chapter Schematron to the previous example TEI document produces two validation messages.

Schematron

  • orthogonal, non-invasive
  • no content completion
  • translating a complex grammar into assertions cumbersome
    (although there is a tool for that)

Add a second grammar

Like this:

<?xml-model 
  href="https://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" 
  type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model 
  href="http://www.le-tex.de/resource/schema/tei-cssa/docbook-like-divs.rng" 
  type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?> 
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  […]

Add a second grammar:
The “almost anything” pattern

start = tei-div-epi_anything
tei-div-epi_anything =
  element * - (div | floatingText | listBibl) {
    tei-div-epi_any-atts,
    (text
     | (tei-div-epi_floatingText-poetry
        | tei-div-epi_floatingText-other
        | tei-div-epi_front
        | tei-div-epi_body 
        | tei-div-epi_back
        | tei-div-epi_anything))*
  }
  | tei-div-epi_floatingText-poetry
  | tei-div-epi_floatingText-other

Add a second grammar (body)

tei-div-epi_body =
  element body {
    tei-div-epi_any-atts,
    ((tei-div-epi_part | tei-div-epi_anything)*
     | (tei-div-epi_chapter | tei-div-epi_anything)*)
  }
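
Because the two alternatives are separate starred groups, a body holds either parts or chapters at its top level, never a mix of both. The made-up instance below would therefore be rejected. This assumes that tei-div-epi_part matches div elements with type="part" (its definition is not shown here) and that the grammar declares the TEI namespace as its default.

```xml
<body xmlns="http://www.tei-c.org/ns/1.0">
  <!-- a part and a chapter as siblings: neither alternative allows both -->
  <div type="part">
    <head>Part I</head>
  </div>
  <div type="chapter">
    <head>Chapter 1</head>
    <p>Text.</p>
  </div>
</body>
```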

Add a second grammar (chapters, sections)

tei-div-epi_chapter =
  element div {
    attribute type { "chapter" },
    tei-div-epi_not-type-atts,
    tei-div-epi_sections,
    (tei-div-epi_glossary | tei-div-epi_bibliography)*
  }

tei-div-epi_sections =
  (tei-div-epi_section | tei-div-epi_anything)*
  | (tei-div-epi_sect1 | tei-div-epi_anything)*

Add a second grammar (listBibl)

tei-div-epi_bibliography =
  element div {
    attribute type { "bibliography" },
    tei-div-epi_not-type-atts,
    element head {
      tei-div-epi_any-atts, (text | tei-div-epi_anything)*
    }?,
    element listBibl { tei-div-epi_any-atts, bibl* }
  }
bibl =
  element bibl { 
    tei-div-epi_any-atts, (text | tei-div-epi_anything)* 
  }
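
Within this pattern, listBibl may contain nothing but bibl elements, so the “or here?” variant from the earlier bibliography slide is ruled out. As far as the definitions shown here go, the epischema would reject the following instance:

```xml
<div xmlns="http://www.tei-c.org/ns/1.0" type="bibliography">
  <listBibl>
    <!-- not allowed: the pattern admits only bibl children here -->
    <head>Bibliography</head>
    <bibl n="1">Bibliographic Entry</bibl>
  </listBibl>
</div>
```

Moving the head up to become the first child of the div makes the instance match the pattern.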

Add a second grammar

  • orthogonal, non-invasive
  • constricts as desired, at least with respect to validation
  • in oXygen, only the first associated schema will be used for content completion

Wrap both schemas in NVDL

Document/schema association

<?xml-model type="application/xml"
href="http://www.le-tex.de/resource/schema/tei-cssa/tei_allPlus_docbook-like-divs.nvdl"  
schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  […]

Wrap both schemas in NVDL

<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"
  startMode="tei">
  <mode name="tei">
    <namespace ns="http://www.tei-c.org/ns/1.0">
      <validate useMode="extensions" 
        schema="http://[…]/tei_allPlus.rng"/>
      <validate useMode="allow"
        schema="http://[…]/docbook-like-divs.rng"/>
    </namespace>
  </mode>

Wrap both schemas in NVDL

  <mode name="extensions">
    <namespace ns="http://www.w3.org/1998/Math/MathML">
      <attach/>
    </namespace>
    <namespace ns="http://www.w3.org/2000/svg">
      <attach/>
    </namespace>
  </mode>
  <mode name="allow">
    <anyNamespace>
      <allow/>
    </anyNamespace>
  </mode>
</rules>

Wrap both schemas in NVDL

Looking good so far in oXygen 18.1:

'chapter' and 'part' as choices for a top-level div

Wrap both schemas in NVDL

However:

oXygen 18: Completion suggestions = union of individual schema suggestions
oXygen 19: Completion suggestions = intersection of individual schema suggestions

Thanks, George Bina!

Alternative Epischema Approaches

  • XSD
    • possible with XSD 1.1 (thanks to xs:any/@notQName)
    • however, associating multiple XSDs not supported by tools
  • HyTime Architectural Forms
    • just kidding
  • Good news: You can put a RELAX NG epischema on top of any other base schema association mechanism (YMMV wrt your tools’ compound validation / content completion capabilities)

Other Epischema use cases

  • Within TEI: limit what is allowed in text/body/div
  • Prescriptive HTML authoring schemas
    • don’t use div, use section
    • no paragraph-like content after a section ends
    • context-aware @class values
  • DITA maybe?

Conclusion

Epischema

  • it can be done
  • it is useful
  • tool support is on its way
    • oXygen 19 due out in April, 2017
  • it has a cool name

Thank you!

http://tiny.cc/epischema

The full paper is on p. 195 of the XML Prague 2017 Proceedings