RAD VI: Java APIs from JRuby

[RAD stands for Ruby Ape Diaries, of which this is part VI.] The reason I first built the Ape in JRuby was so I could get at all those nice Java APIs, and that turned out to be a good reason. Of course, there is a bit of impedance mismatch, and I ended up writing some glue code. I kind of suspect that, should JRuby catch on, there’s going to be scope for quite a bit of this glue-ware. This fragment is just a few samples of the genre, to illustrate it and perhaps provoke thought.

All but one of these examples omit an initial require 'java' and a bunch of lines of include_class.

Table of Contents · I can’t imagine that anyone would actually want to read the whole thing.

Parsing an XML Document
XML Escaping and Unescaping
Serializing XML
XPath
Attribute Hashing
Rubyfying a NodeList
Calling Jing

Parsing an XML Document · There are lots of ways to do this, but I’m using about the plainest-vanilla Java approach. This is somewhat but not entirely unlike REXML::Document.new.

def Document.new(text)
  begin
    unless @@dbf
      @@dbf = DocumentBuilderFactory.newInstance
      @@dbf.setNamespaceAware true
    end

    db = @@dbf.newDocumentBuilder
    @dom = db.parse(InputSource.new(StringReader.new(text)))

  rescue NativeException
    @last_error = $!.to_s
  end
end

XML Escaping and Unescaping · Not strictly Java-related, but kind of interesting. I’m sure this is somewhere in Ruby but for some reason I couldn’t find it, so this code survives in the native-Ruby version of the Ape.

def Parser.escape(text)
  text.gsub(/([&<'">])/) do
    case $1
    when '&' then '&amp;'
    when '<' then '&lt;'
    when "'" then '&apos;'
    when '"' then '&quot;'
    when '>' then '&gt;'
    end
  end
end

def Parser.unescape(text)
  text.gsub(/&([^;]*);/) do
    case $1
    when 'lt' then '<'
    when 'amp' then '&'
    when 'gt' then '>'
    when 'apos' then "'"
    when 'quot' then '"'
    else $&  # leave unrecognized references untouched instead of deleting them
    end
  end
end
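
A quick way to sanity-check an escape/unescape pair like this is a round trip. Here is a minimal sketch using a table-driven variant rather than case statements; the XmlText module and its constant names are made up for illustration, and this variant passes references it does not recognize (such as &nbsp;) through unchanged.

```ruby
# Hypothetical table-driven variant of the escape/unescape pair.
module XmlText
  ESCAPES = {
    '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
    "'" => '&apos;', '"' => '&quot;'
  }.freeze

  UNESCAPES = {
    'amp' => '&', 'lt' => '<', 'gt' => '>',
    'apos' => "'", 'quot' => '"'
  }.freeze

  # Replace each of the five XML special characters with its entity.
  def self.escape(text)
    text.gsub(/[&<>'"]/) { |ch| ESCAPES[ch] }
  end

  # Replace the five predefined entities; anything else stays as written.
  def self.unescape(text)
    text.gsub(/&([^;&]*);/) { |whole| UNESCAPES.fetch($1, whole) }
  end
end

raw = %q{Tom & "Jerry" <tag>}
escaped = XmlText.escape(raw)
puts escaped
puts XmlText.unescape(escaped) == raw
```

Escaping the ampersand in the same pass as the other characters matters: doing it in a separate later pass would double-escape the entities already produced.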

Serializing XML · The XML is stored in an org.w3c.dom structure. This is just a partial implementation; it works for me because I produce an entity-free DOM. This is more elaborate than it needs to be because I thought I was going to have to take care of all the namespace-declaration book-keeping, but no, it seems that the DOM has all the xmlns goo appearing as if they were actual attributes. Go figure.

It’s interesting to note that you access Java static fields, such as the Node type constants below, using Ruby’s :: separator.

  class Unparser

    def initialize output
      @output = output
    end

    def unparse doc
      unparse1 doc.dom
    end

    def unparse1 node
      case node.getNodeType
      when Node::CDATA_SECTION_NODE, Node::TEXT_NODE
        out Parser.escape(node.getNodeValue)
      when Node::COMMENT_NODE, Node::PROCESSING_INSTRUCTION_NODE
        # bah
      when Node::DOCUMENT_NODE
        node = node.getFirstChild
        unparse1 node
      when Node::DOCUMENT_TYPE_NODE
        node = node.getNextSibling
        unparse1 node
      when Node::ELEMENT_NODE
        unparseStartTag node
        Nodes.each_node(node.getChildNodes) { |child| unparse1(child) }
        unparseEndTag node
      when Node::ENTITY_NODE, Node::ENTITY_REFERENCE_NODE, Node::NOTATION_NODE
        raise(ArgumentError, "Floating XML goo, can't serialize")
      else
        raise(ArgumentError, "Unrecognized node type #{node.getNodeType}")
      end
    end

    def out str
      @output << str.to_s
    end

    def unparseStartTag node
      out '<'
      unparseName node
      Nodes.each_node(node.getAttributes) { |a| unparseAttribute(a) }
      out '>'
    end

    def unparseAttribute attr  
      out ' '
      unparseName attr
      out '="'
      out Parser.escape(attr.getNodeValue)
      out '"'
    end

    def unparseName node
      out node.getNodeName
    end

    def unparseEndTag node
      out '</'
      unparseName node
      out '>'
    end
  end
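
Since out only ever calls << on its target, a serializer built this way can write to a String, an Array, or an IO without caring which. A minimal plain-Ruby sketch of that design follows; TagWriter is a hypothetical stand-in for Unparser, and a Hash stands in for the DOM attribute map so that it runs without JRuby.

```ruby
require 'stringio'

# Hypothetical miniature of the serializer: output plumbing only, no DOM.
class TagWriter
  def initialize(output)
    @output = output  # anything that responds to <<
  end

  def out(str)
    @output << str.to_s
  end

  # attrs is a plain Hash standing in for the DOM attribute map.
  def start_tag(name, attrs = {})
    out '<'
    out name
    attrs.each do |key, value|
      out ' '
      out key
      out '="'
      # escape the characters that would break a double-quoted attribute
      out value.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('"', '&quot;')
      out '"'
    end
    out '>'
  end
end

attrs = { 'href' => 'x?y=1&z=2' }

buffer = String.new   # a mutable String sink
TagWriter.new(buffer).start_tag('a', attrs)

pieces = []           # an Array sink collects the fragments
TagWriter.new(pieces).start_tag('a', attrs)

io = StringIO.new     # an IO-like sink
TagWriter.new(io).start_tag('a', attrs)

puts buffer
puts pieces.join == buffer
puts io.string == buffer
```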

XPath · The three useful XPath functions in REXML are each, first, and match. Here are the JRuby versions.

  class XPath 
    @@xpf = nil

    def XPath.match node, path, namespaces={}
      node = fixNode node
      xp = XPath.newXP namespaces
      collect xp.evaluate(path, node, XPathConstants::NODESET)    
    end

    def XPath.each node, path, namespaces = {}
      node = fixNode node
      xp = XPath.newXP namespaces
      list = xp.evaluate(path, node, XPathConstants::NODESET)
      collect(list).each { |node| yield node }
    end

    def XPath.first node, path, namespaces = {}
      node = fixNode node
      xp = XPath.newXP namespaces
      xp.evaluate(path, node, XPathConstants::NODE)
    end

I’ve tested them and they do what you’d expect; the real Ruby versions dropped in without anything breaking. Now, there are some interesting housekeeping functions, all private. In particular there’s newXP, which needs to exist because a Java XPath needs an interface to call back to for namespace-prefix mappings. (Cognoscenti will note the theft of a line of code from REXML::XPath.)


    def XPath.newXP namespaces
      raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash
      @@xpf ||= XPathFactory.newInstance
      xp = @@xpf.newXPath

      xp.namespaceContext= NSCT.new namespaces
      return xp
    end

You might wonder how you implement a Java interface in JRuby. No problem: inherit from it just like any other class-like thingie. There’s no need to implement any of the interface’s methods if you don’t use them. This feels strange, exotic, and cool to me.

  class NSCT < NamespaceContext

    def initialize namespaces
      super()  # Required due to bug JRuby-66
      @namespaces = namespaces
    end

    def getNamespaceURI prefix
      if prefix == 'xml' 
        XMLConstants::XML_NS_URI
      else
        @namespaces[prefix]
      end
    end
  end
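
The NamespaceContext contract has a few more corners than a single prefix lookup: the reserved xmlns prefix, an empty-string result for unbound prefixes, and the reverse lookup getPrefix. Here is a sketch of that lookup logic in plain Ruby, deliberately without the Java superclass so it can be exercised outside JRuby. The PrefixMap name and its constants are hypothetical; the two URIs are the values that XMLConstants defines for XML_NS_URI and XMLNS_ATTRIBUTE_NS_URI, and ArgumentError stands in for Java's IllegalArgumentException.

```ruby
# Hypothetical plain-Ruby model of a namespace-prefix lookup table.
class PrefixMap
  XML_NS_URI   = 'http://www.w3.org/XML/1998/namespace'
  XMLNS_NS_URI = 'http://www.w3.org/2000/xmlns/'

  def initialize(namespaces)
    @namespaces = namespaces  # Hash of prefix => namespace URI
  end

  # Prefix to URI; the reserved prefixes are fixed, unbound ones map to ''.
  def getNamespaceURI(prefix)
    raise ArgumentError, 'prefix must not be nil' if prefix.nil?
    case prefix
    when 'xml'   then XML_NS_URI
    when 'xmlns' then XMLNS_NS_URI
    else @namespaces.fetch(prefix, '')
    end
  end

  # URI to prefix; nil when nothing is bound to the URI.
  def getPrefix(uri)
    return 'xml' if uri == XML_NS_URI
    return 'xmlns' if uri == XMLNS_NS_URI
    @namespaces.key(uri)
  end
end

ctx = PrefixMap.new('atom' => 'http://www.w3.org/2005/Atom')
puts ctx.getNamespaceURI('atom')
puts ctx.getNamespaceURI('xml')
puts ctx.getPrefix('http://www.w3.org/2005/Atom')
```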

Attribute Hashing · REXML gives each element an attributes member which is a hash of attribute value by name. Here’s the Java-based namespace-sensitive version.

  class Attributes

    # should only be called by Element
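    def initialize node
      @node = node  # kept so prefixes can be resolved later
      @attrsNode = node.getAttributes
    end

    def [] name
      if name =~ /^(.*):(.*)$/
        # resolve the prefix against the element's in-scope namespaces
        ns = @node.lookupNamespaceURI $1
        anode = @attrsNode.getNamedItemNS(ns, $2)
      else
        anode = @attrsNode.getNamedItem(name)
      end
      # nil for a missing attribute, as with a Ruby hash
      anode && anode.getNodeValue
    end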

  end

Rubyfying a NodeList · The problem is that lots of things in the Java DOM are NodeLists, which are retrieved by number. You can’t do anything in Ruby without an each-like method, so here it is.

    def each_node(list)
      len = list.getLength
      (0 ... len).each do |i|
        yield list.item(i)
      end
    end
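
One step further along the same road is to wrap the list in a class that mixes in Enumerable, which brings map, select and friends along for free once each exists. A minimal sketch, assuming only that the wrapped object answers getLength and item(i) the way a DOM NodeList does; NodeListEnum and FakeNodeList are hypothetical names, and the fake exists only so the example runs without JRuby.

```ruby
# Wraps anything with getLength and item(i), such as a DOM NodeList.
class NodeListEnum
  include Enumerable

  def initialize(list)
    @list = list
  end

  # Enumerable builds map, select, to_a and the rest on top of each.
  def each
    return to_enum(:each) unless block_given?
    (0...@list.getLength).each { |i| yield @list.item(i) }
  end
end

# Stand-in for a Java NodeList, backed by a Ruby Array.
class FakeNodeList
  def initialize(items)
    @items = items
  end

  def getLength
    @items.length
  end

  def item(i)
    @items[i]
  end
end

nodes = NodeListEnum.new(FakeNodeList.new(%w[title author entry]))
puts nodes.map(&:upcase).join(', ')
puts nodes.select { |n| n.include?('u') }.inspect
```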

Calling Jing · As I said before, the Jing API is fairly gnarly. This is perhaps the most extreme example of brute-forcing the way across the impedance mismatch. In this case I include all the includes.

require 'java'
include_class 'com.thaiopensource.validate.rng.CompactSchemaReader'
include_class 'com.thaiopensource.validate.ValidationDriver'
include_class 'org.xml.sax.InputSource'
include_class 'java.io.StringReader'
include_class 'java.io.StringWriter'
include_class 'com.thaiopensource.xml.sax.ErrorHandlerImpl'
include_class 'com.thaiopensource.util.PropertyMapBuilder'
include_class 'com.thaiopensource.validate.ValidateProperty'

class Validator
  attr_reader :error

  def initialize(text)
    @error = false
    @schemaError = StringWriter.new
    schemaEH = ErrorHandlerImpl.new(@schemaError)
    properties = PropertyMapBuilder.new
    properties.put(ValidateProperty::ERROR_HANDLER, schemaEH)

    @driver = ValidationDriver.new(properties.toPropertyMap,
                                   CompactSchemaReader.getInstance)
    if !@driver.loadSchema(InputSource.new(StringReader.new(text)))
      @error = @schemaError.toString
    end
  end

  def validate(text)
    if @driver.validate(InputSource.new(StringReader.new(text)))
      return true
    else
      @error = @schemaError.toString
      return false
    end
  end
end

August 22, 2006

By Tim Bray.