Lesson: make your terminology aware of xml namespaces

Mark Bussey edited this page Aug 27, 2014 · 9 revisions

This tutorial is known to work with OM version 3.0.4.
Please update this wiki to reflect any other versions that have been tested.

Goals

  • Declare an XML namespace on a Terminology so that its Terms match namespaced XML elements

Explanation

Steps

Step 1: Think about what the XML is going to look like

Unlike in the previous lessons, suppose you want to create MODS XML that uses the mods namespace and has a root node of <mods> instead of <fields>:

<mods version="3.0" xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <name>
   <namePart type="given">Zoia</namePart>
   <namePart type="family">Horn</namePart>
   <role>
     <roleTerm type="text">Author</roleTerm>
     <roleTerm type="code">AUT</roleTerm>
   </role>
 </name>
</mods>

Disclaimer about MODS: We are not actually creating valid MODS XML here. We're just using the mods namespace as an example.

Step 2: Modify the Terminology

Reopen fancy_book_metadata.rb and modify the line that calls t.root so that it declares a path of "mods" instead of "fields" and an :xmlns that points to the URI of your namespace.

t.root(:path=>"mods", :xmlns=>"http://www.loc.gov/mods/v3")

You also need to update the .xml_template method to match these changes:

  def self.xml_template
    Nokogiri::XML.parse('<mods xmlns="http://www.loc.gov/mods/v3"/>')
  end

Setting the :xmlns on the root of an OM Terminology makes it the default namespace for all of the Terms.
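
The template needs the same xmlns because of how XML default namespaces work: a default namespace declared on the root applies to the root and to every unprefixed descendant. The following is a minimal sketch of that rule using REXML, which ships with Ruby, rather than OM or Nokogiri. The helper name root_and_child_namespaces and the <name/> child element are made up for this illustration.

```ruby
require "rexml/document"

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical helper: returns the namespace URI of the root element
# and of its first child element.
def root_and_child_namespaces(xml)
  root = REXML::Document.new(xml).root
  child = root.elements[1] # REXML element indexes start at 1
  [root.namespace, child.namespace]
end

# Same root as the xml_template above, plus one illustrative <name/> child.
template = %(<mods xmlns="#{MODS_NS}"><name/></mods>)

# Both entries are the MODS URI: the unprefixed child inherits the default namespace.
p root_and_child_namespaces(template)
```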

Step 3: Try modifying an OM Document based on the Terminology

bundle console

Require the FancyBookMetadata class definition.

require "./fancy_book_metadata"
fancybook = FancyBookMetadata.new

Now rerun the same commands we ran in the last lesson and see what's different about the resulting XML.

fancybook.name.given_name = "Zoia"
 => "Zoia" 
fancybook.name.family_name = "Horn"
 => "Horn" 
fancybook.name.role.text = "author"
 => "author" 
fancybook.name.role.code = "AUT"
 => "AUT" 
fancybook.name(1).family_name = "Caesar"
 => "Caesar" 
fancybook.name(1).given_name = "Julius"
 => "Julius" 
fancybook.name(1).role.text = "Contributor"
 => "Contributor" 
fancybook.name(1).role.code = "CON"
 => "CON" 
puts fancybook.to_xml
<mods xmlns="http://www.loc.gov/mods/v3">
  <name><namePart type="given">Zoia</namePart><namePart type="family">Horn</namePart><role><roleTerm type="text">author</roleTerm><roleTerm type="code">AUT</roleTerm></role></name>
  <name><namePart type="family">Caesar</namePart><namePart type="given">Julius</namePart><role><roleTerm type="text">Contributor</roleTerm><roleTerm type="code">CON</roleTerm></role></name>
</mods>
 => nil 

Now the mods namespace is declared as the xmlns on the root of the XML document. Though this looks like a small change, it has an important impact on how XPath queries are run against any XML documents you parse with this Terminology.

Put the following into a file called funny_sample.xml:

<mods xmlns:mods="http://www.loc.gov/mods/v3">
  <name>
    <namePart type="given">Zoia</namePart>
    <namePart type="family">Horn</namePart>
    <role>
      <roleTerm type="text">author</roleTerm>
      <roleTerm type="code">AUT</roleTerm>
    </role>
  </name>
  <name>
    <namePart type="family">Caesar</namePart>
    <namePart type="given">Julius</namePart>
    <role>
      <roleTerm type="text">Contributor</roleTerm>
      <roleTerm type="code">CON</roleTerm>
    </role>
  </name>
</mods>

Notice that the document declares the mods namespace prefix, but none of the nodes use it, so every element is in no namespace. Because the Terminology's XPath queries look for elements in the mods namespace, they will not find these nodes.

bundle console
require "./fancy_book_metadata"

file = File.new("funny_sample.xml")
funnysample = FancyBookMetadata.from_xml(file)
funnysample.name.count
 => 0

Now edit the funny_sample.xml file so that the <name> elements and everything inside them are in the mods namespace:

<mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:name>
    <mods:namePart type="given">Zoia</mods:namePart>
    <mods:namePart type="family">Horn</mods:namePart>
    <mods:role>
      <mods:roleTerm type="text">author</mods:roleTerm>
      <mods:roleTerm type="code">AUT</mods:roleTerm>
    </mods:role>
  </mods:name>
  <mods:name>
    <mods:namePart type="family">Caesar</mods:namePart>
    <mods:namePart type="given">Julius</mods:namePart>
    <mods:role>
      <mods:roleTerm type="text">Contributor</mods:roleTerm>
      <mods:roleTerm type="code">CON</mods:roleTerm>
    </mods:role>
  </mods:name>
</mods>

Now re-open the file, parse it again and re-run the query for names. Because the <name> nodes are now in the mods namespace, they will be found by the XPath query.

bundle console
require "./fancy_book_metadata"

file = File.new("funny_sample.xml")
funnysample = FancyBookMetadata.from_xml(file)
funnysample.name.count
 => 2 
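
The difference between the two versions of funny_sample.xml can be reproduced outside OM. The sketch below uses REXML from Ruby's standard distribution instead of the Nokogiri-based OM stack, trims the sample down to empty name elements, and binds the oxns prefix by hand. The helper name count_names is made up for this illustration.

```ruby
require "rexml/document"
require "rexml/xpath"

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical helper: counts <name> elements that belong to the MODS namespace.
def count_names(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//oxns:name", { "oxns" => MODS_NS }).size
end

# The prefix is declared but never used, so the elements are in no namespace.
unprefixed = <<~XML
  <mods xmlns:mods="#{MODS_NS}">
    <name/>
    <name/>
  </mods>
XML

# The same elements, now carrying the mods: prefix.
prefixed = <<~XML
  <mods xmlns:mods="#{MODS_NS}">
    <mods:name/>
    <mods:name/>
  </mods>
XML

puts count_names(unprefixed) # 0
puts count_names(prefixed)   # 2
```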

How does this work? How does OM handle these namespaces? In short, when you have declared a namespace on your Terminology, OM injects that namespace into its XPath queries. Look at the XPath for the name Term.

funnysample.name.xpath
 => "//oxns:name" 

In the earlier lessons, when the Terminology did not define :xmlns, calling xpath on the name Term would have returned //name. Now it returns //oxns:name. The "oxns" prefix is a placeholder that OM uses to mean "whichever namespace has been set as the default namespace on the Terminology". To see which namespaces a Terminology uses, call the namespaces method on the OM Document's associated Terminology.

FancyBookMetadata.terminology.namespaces
 => {"xmlns"=>"http://www.loc.gov/mods/v3", "oxns"=>"http://www.loc.gov/mods/v3"} 
funnysample.class.terminology.namespaces
 => {"xmlns"=>"http://www.loc.gov/mods/v3", "oxns"=>"http://www.loc.gov/mods/v3"} 
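
The prefix in a query only has to be bound to the right URI; it does not have to match the prefix, if any, used in the document. That is why //oxns:name can find both the unprefixed elements that OM generates under a default xmlns and the mods:-prefixed elements in funny_sample.xml. A small sketch of this, again using REXML from Ruby's standard distribution rather than OM itself, with a made-up helper name (names_found) and trimmed-down documents:

```ruby
require "rexml/document"
require "rexml/xpath"

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical helper: counts <name> elements in the MODS namespace,
# binding whatever prefix the caller chooses to the MODS URI.
def names_found(xml, query_prefix)
  doc = REXML::Document.new(xml)
  bindings = { query_prefix => MODS_NS }
  REXML::XPath.match(doc, "//#{query_prefix}:name", bindings).size
end

# One document uses a default namespace, the other uses the mods: prefix.
default_ns_doc = %(<mods xmlns="#{MODS_NS}"><name/><name/></mods>)
prefixed_doc   = %(<mods xmlns:mods="#{MODS_NS}"><mods:name/><mods:name/></mods>)

# Every prefix finds both names in both documents, because only the URI matters.
%w[oxns mods anything].each do |prefix|
  puts "#{prefix}: #{names_found(default_ns_doc, prefix)} / #{names_found(prefixed_doc, prefix)}"
end
```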

Next Step

Go on to Lesson: Parse an Existing XML File with a Terminology or return to the Tame your XML with OM page.