Object Commando languages, development and design


Clojure Protocols Part 3

Recently there have been some changes to the Clojure Protocols code out on GitHub. They are not huge changes, but they are enough that the examples I wrote in Part 1 and Part 2 will no longer work. I thought I'd finish out my protocol blog entries by showing how I used protocols, with the new syntax. I also have a better understanding of how reify can be used (thanks Meikel) and will include some of that.

First, the goal of the protocol usage. I have been working on some comparisons and evaluations of triplestores. Triplestores can be used to store RDF data, which is a series of subject/predicate (or property)/object triples. There are many triplestores out there, and many of them have several interfaces. For example, Oracle has a JDBC interface that uses stored procedures and a Jena API that incorporates pieces of the Jena framework.

This was some pretty low-hanging fruit from an abstraction perspective. Whether inserting a new triple through Oracle JDBC, Jena (with Oracle) or one of the other triplestore implementations, on the surface the operation is the same: take this subject, predicate and object and store it. The same could be said for querying with SPARQL or deleting entries. I ended up with a protocol named TriplestoreOperations, shown below:

(ns revelytix.triplestore-operations)

(defprotocol TriplestoreOperations
  "Interface for the various operations allowed by a triple store"
  (create-graph [impl graph-name] "Creates a new graph of name graph-name")
  (delete-graph [impl graph-name] "Deletes graph graph-name if graph exists")
  (insert-quad [impl graph-name subject predicate object]
    "Creates a new triple, data is assumed to be a full URI"))

This syntax is the same as before. The first argument is used to pass in the implementation of TriplestoreOperations. The graph-name, or model in Oracle terms, is what holds the triples. The protocol lives in one namespace (triplestore-operations above) and the implementations of the interface are in separate namespaces. The first is an Oracle JDBC implementation of TriplestoreOperations. It's parameterized by the database connection details and the name of the table to store the data in.

(ns oracle.oracle-jdbc
  (:use clojure.contrib.sql
        revelytix.triplestore-operations))

(deftype OracleJdbcOperations [db table-name]
  TriplestoreOperations
  (delete-graph [impl graph-name]
    (let [drop-model-string (create-sql-string DROP-MODEL-SQL graph-name)
          drop-table-string (create-sql-string DROP-TABLE-SQL table-name)]
      (with-connection db
        (with-open [drop-model-statement (.prepareCall (connection) drop-model-string)]
          (drop-entailment-if-exists db graph-name "RDFS")
          (.execute drop-model-statement)
          (do-commands drop-table-string)))))
  (create-graph [impl graph-name]
    (let [create-model-string (create-sql-string CREATE-MODEL-SQL graph-name table-name)
          create-table-string (create-sql-string CREATE-TABLE-SQL table-name)]
      (with-connection db
        (with-open [create-model-statement (.prepareCall (connection) create-model-string)]
          ;; Create the table first, then the model
          (do-commands create-table-string)
          (.execute create-model-statement)))))
  (insert-quad [impl graph-name subject predicate object]
    (create-family-triple table-name db graph-name subject predicate object)))

(defn create-oracle-jdbc-triplestore-instance [table-name]
  (OracleJdbcOperations *oracle-jdbc-props* table-name)) ;; Awkward, see below
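To try the same deftype shape without a database, the protocol can be implemented against an in-memory map. This is only a minimal sketch and not part of the project described here: the InMemoryOperations type, the atom-backed store and the example URIs are all made up for illustration. It also assumes the syntax of released Clojure (1.2 and later), where an instance of a deftype is constructed with a trailing dot.

```clojure
(defprotocol TriplestoreOperations
  "Interface for the various operations allowed by a triple store"
  (create-graph [impl graph-name] "Creates a new graph of name graph-name")
  (delete-graph [impl graph-name] "Deletes graph graph-name if graph exists")
  (insert-quad [impl graph-name subject predicate object]
    "Creates a new triple, data is assumed to be a full URI"))

;; Hypothetical stand-in for a real triplestore: `store` is an atom
;; holding a map of graph-name -> set of [subject predicate object].
(deftype InMemoryOperations [store]
  TriplestoreOperations
  (create-graph [impl graph-name]
    (swap! store assoc graph-name #{}))
  (delete-graph [impl graph-name]
    (swap! store dissoc graph-name))
  (insert-quad [impl graph-name subject predicate object]
    ;; fnil lets an insert into a graph that was never created still work
    (swap! store update-in [graph-name] (fnil conj #{})
           [subject predicate object])))

(def store (atom {}))
(def ops (InMemoryOperations. store))

;; Callers only see the protocol functions; the first argument
;; selects the implementation.
(create-graph ops "http://example.org/graph")
(insert-quad ops "http://example.org/graph"
             "http://example.org/alice"
             "http://example.org/knows"
             "http://example.org/bob")
```

Swapping in a different backend then only means passing a different first argument to the same three functions.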

One difference between the above code and the code in Part 1 or Part 2 is that the previous version of deftype left the implementation parameter implicit. So the create-graph function above would have had only a single parameter. I like the change; I found the original code a little confusing and kept wondering where the first parameter went.

The next implementation of the TriplestoreOperations protocol is a Jena one. The code below makes use of reify, which feels a little more like idiomatic Clojure and makes the implementation of a protocol seem less special and different from plain functions. I like the reify syntax over deftype and I've been moving my code over to use it. I'm going to cut a decent portion of the implementation below because it mostly calls Java APIs and is a bit noisy:

(ns jena-operations
  (:use revelytix.triplestore-operations))

(defn create-jena-operations-instance [jena-support-impl]
  (reify TriplestoreOperations
    ;; Jena creates graphs implicitly, so there is nothing to do here
    (create-graph [impl modelString] nil)
    (delete-graph [impl modelString]
      (with-triplestore-connection
        ;; ...
        ))
    (insert-quad [impl modelString subject predicate object]
      (with-triplestore-connection
        ;; ...
        ))))

The reify call above also creates a new instance that implements the TriplestoreOperations protocol, with the functions defined inline. There's also no need to separately create an instance of the type, as is done in the previous example. From a functionality perspective the end result of deftype and reify is the same; there's just a different way to get there. Reading through some of the docs, the main difference seems to be that deftype generates a named type that can be instantiated many times, while reify produces an instance of an anonymous type in place and can close over local bindings such as jena-support-impl.

One difference between Jena and the Oracle JDBC interface is that graphs don't need to be created explicitly using Jena, so create-graph does nothing. The above code also reflects the syntax change: the implementation parameter no longer disappears.

Another interesting part is that the JenaOperations instance is parameterized by another protocol called JenaSupport. What I have found is that many vendors support the Jena APIs, but they implement them slightly differently. It's definitely not as pluggable as something like JDBC. This JenaOperations implementation is generic for the Jena APIs and is used by several triplestores with Jena implementations. The JenaSupport protocol abstracts things like getting a Jena connection and creating the correct implementation of Model, which differ from implementation to implementation.
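The layering of one protocol on top of another can be sketched without any Jena classes. Everything below is an illustration under stated assumptions: ConnectionSupport and its open-connection function are invented stand-ins for JenaSupport (whose real functions are not shown in this post), the "connection" is just an atom that records what was inserted, and the protocol is trimmed to a single function.

```clojure
(defprotocol TriplestoreOperations
  "Trimmed to one function to keep the sketch short"
  (insert-quad [impl graph-name subject predicate object]))

;; Hypothetical stand-in for JenaSupport; the function name is invented.
(defprotocol ConnectionSupport
  (open-connection [impl] "Returns a vendor-specific connection"))

(defn create-operations-instance
  "Generic implementation that leaves vendor details to support-impl."
  [support-impl]
  (reify TriplestoreOperations
    (insert-quad [impl graph-name subject predicate object]
      ;; support-impl is captured from the enclosing function
      (let [conn (open-connection support-impl)]
        (swap! conn conj [graph-name subject predicate object])
        nil))))

(defn create-fake-vendor-support
  "Fake vendor whose 'connection' is an atom that records quads."
  [recorded-quads]
  (reify ConnectionSupport
    (open-connection [impl] recorded-quads)))

(def recorded (atom []))
(def ops (create-operations-instance (create-fake-vendor-support recorded)))

(insert-quad ops "graph" "subject" "predicate" "object")
```

The generic instance never names a vendor; each vendor only has to supply its own implementation of the support protocol.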

Development Gotchas

I have found a few issues when developing Clojure code that uses protocols. I'm using Leiningen and Lein Swank for development. First, I found that if I had AOT compilation enabled and had run lein install, the protocol definition resulted in compiled code in the classes directory of the project. This caused a problem when I tried to change a protocol definition. I'd make a change in Emacs, load the file with the updated protocol code, and the code would behave as if I had made no change to the protocol at all. What was happening was that the old version of the code, the one that had the interface code generated, was still on the classpath in the classes directory. Removing that code (through lein clean or something similar) allowed my changes to take effect. This problem stumped me for a couple of hours. I can avoid it entirely by just not using AOT compilation (I don't really need it), but others might not be able to.
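For reference, this is roughly where that choice lives in a Leiningen project file. Treat it as a sketch under assumptions: the project name and versions are placeholders, and the key shown is the :aot key used by current Leiningen releases, which may differ in older ones.

```clojure
;; project.clj (project name and versions are placeholders)
(defproject triplestore-eval "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.2.0"]]
  ;; With the :aot entry left out or commented, the protocol namespace
  ;; is not compiled ahead of time, so no stale generated interface
  ;; ends up in the classes directory.
  ;; :aot [revelytix.triplestore-operations]
  )
```

If AOT compilation is needed, running lein clean before reloading a changed protocol is the workaround described above.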

Another gotcha I found was in the loading of files that use implementations of protocols. In the example above, let's say I have a test file (I'll call it test-A) that executes functions from TriplestoreOperations on the JenaOperations implementation, which in turn uses the Oracle implementation of JenaSupport. Just loading the test-A.clj file does not cause the loading of the Jena implementation of TriplestoreOperations or the Oracle version of JenaSupport. Rather, it just complains that there is no implementation of TriplestoreOperations for 'nil'. Loading those files individually fixes the problem; it just doesn't happen automatically for me.
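The complaint about nil is what any protocol function reports when its first argument is nil and nothing implements the protocol for nil. The sketch below reproduces that message in a single file; the ops var is a made-up stand-in for whatever would have held the implementation had its namespace been loaded. In general, Clojure only loads namespaces that are named in a :require or :use clause (or loaded by hand), so listing the implementing namespaces in the test's ns form is one likely way to get them loaded automatically.

```clojure
(defprotocol TriplestoreOperations
  (create-graph [impl graph-name] "Creates a new graph of name graph-name"))

;; Stand-in for an implementation that was never supplied because the
;; namespace defining it was not loaded.
(def ops nil)

;; Calling a protocol function on nil throws IllegalArgumentException;
;; capture the message instead of letting it propagate.
(def error-message
  (try
    (create-graph ops "http://example.org/graph")
    (catch IllegalArgumentException e
      (.getMessage e))))

(println error-message)
```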
