Object Commando languages, development and design

4Nov/116

The Magical Island of Kanren – core.logic Intro Part 1

Overview

Previously I blogged on Appendo the Great without going into much detail on core.logic, what it is, how to use it etc. Hopefully this blog post will begin to fix that. This post was also inspired by a tweet from Ambrose Bonnaire-Sergeant on explaining a basic logic program with an unbound variable. That tweet got me thinking about how I would explain it.

First, core.logic is a Clojure implementation of miniKanren. miniKanren is a Logic Programming library embedded in Scheme. As the site states, it's mini in that there is another Kanren system with more bells and whistles, but is quite a bit more complex. The core.logic implementation follows the Scheme version pretty closely from a library user's perspective, so examples in books/papers such as Byrd's dissertation or The Reasoned Schemer translate pretty closely to the equivalent Clojure code.

The approach taken by the miniKanren folks gives the developer a lot of power by letting them pick the right tool for the job. Embedding the logic programming paradigm into a functional language allows you to use functional tools to solve your problems and delve into logic when it makes things easier. There's no need to interface with another language, you're in Clojure and you can use it directly.

Environment

Getting core.logic incorporated into your project works the same as any other Java or Clojure library, just include it in your project dependencies. Including it in your Leiningen project.clj would look like:

  :dependencies [...
                  [org.clojure/core.logic "0.6.5"]
                  ...]

Then lein deps and you're set. The examples below assume having a namespace setup like:

(ns logic-examples.ancenstors
  (:refer-clojure :exclude [==])
  (:use [clojure.core.logic]))

The Island of Kanren

Imagine the magical island of Kanren. On this island, just declaring things makes them so. There can be multiple right answers to a question and sometimes there are no answers at all. Here, seemingly familiar operations take on a different meaning and the natives speak with a unique dialect. Your code lives in functions and macros, just like any Clojure program. The key difference is your Kanren programs are really more like puzzles that you want the Kanren-ites to solve. Your logic functions on their own aren't much good, but the Kanren natives can do amazing things with them to solve your puzzle. The only way to get answers off of the island is to send your puzzle over with a raft [ ]. Unfortunately, there are some small holes in the raft and we need to put a bucket on it before we send the raft over. I'll call this bucket q, but you can call it whatever you like. Armed with our raft and bucket we can send it over with a puzzle and the Kanren natives will solve the program and send the bucket back. This bucket will contain the things (bindings) that the natives needed to solve the program. The easiest, yet still confusing program, just declares something that is always true:

(run* [q]
  (== 1 1))
= > (_.0)

It may seem strange at first, but the natives of Kanren have sent back a bucket with a single value _.0 in it. In the language of Kanren, this means anything and nothing. It means to solve the puzzle I sent them (which just declares that 1 is 1) q can take on any value. The number 7, a list of strings, nil etc. This makes sense as the Kanren natives can solve the puzzle with anything since the q does not actually play a role in solving the puzzle. We could make it a little more difficult and send over another raft asking for a 1

(run* [q]
  (== q 1))
= > (1)

Here we sent over a puzzle that could only be solved with q having the value of 1. There is no other way for the Kanren-ites to solve the puzzle (and still have it work), so they sent back a value of 1 for q. The reason I relate q to a bucket in a raft, is that's our only way of getting information off of the island. All other information stays on the island:

(run* [q]
  (fresh [x y]
    (== x "foo")
    (== y "bar")
    (== q 1)))
= > (1)

The fresh call here is just a way to get a new variable (er, new buckets). This new variable is just like q, but it always stays on the island. Note our answer is the same as before because nothing has changed with the bucket vs. the program before. It's possible that q has more than one value (examples will be below) and it's also possible for q to be a collection:

(run* [q]
  (fresh [x y]
    (== x "foo")
    (== y "bar")
    (== q [1 x y])))
= > ([1 "foo" "bar"])

The above gets the information from x and y by stuffing it in q. So q is where we want the answer to be stored since that is what comes back on the raft. If we don't put the answer in the q bucket, we won't get anything useful. It's also possible that q has no answer. We could play a trick on the Kanren-ites and send them an unsolvable puzzle:

(run* [q]
  (== q 1)
  (== q 2))
= > ()

They saw through my ruse. It's not possible to solve a puzzle where q is both 1 and 2, so they send back an empty bucket.

Less Metaphor more Examples

Above, I covered the mechanics of running a core.logic program, er, puzzle. A logic program follows some different rules, so they need to be ran inside the run* block. The above examples only used run* which asks for all possible solutions to the program. Another key point from the above program is that I never assigned a value to q. I declared that q should equal 1. The Kanren-ites needed to take it from there. Instead I could have declared it equal to anything, a string, list, character etc. To cover some of the core.logic building blocks, I'll side-step integers and lists for the time being. To demonstrate some of these basic features, I'll be using some of the relation and fact features of core.logic. First I'll start with a father and mother relation:

(defrel father Father Child)
(defrel mother Mother Child)

(facts father [['Vito 'Michael]
               ['Vito 'Sonny]
               ['Vito 'Fredo]
               ['Michael 'Anthony]
               ['Michael 'Mary]
               ['Sonny 'Vicent]
               ['Sonny 'Francesca]
               ['Sonny 'Kathryn]
               ['Sonny 'Frank]
               ['Sonny 'Santino]])

(facts mother [['Carmela 'Michael]
               ['Carmela 'Sonny]
               ['Carmela 'Fredo]
               ['Kay 'Mary]
               ['Kay 'Anthony]
               ['Sandra 'Francesca]
               ['Sandra 'Kathryn]
               ['Sandra 'Frank]
               ['Sandra 'Santino]])

From this basic setup we can ask some basic questions:

(run* [q]
  (father 'Vito q))
= > (Sonny Michael Fredo)

(run* [q]
  (father q 'Michael))
= > (Vito)

These questions use the father relation. Given Vito is the father, what values of q would satisfy the father relation? We could do the same thing for the reverse. Given a child, who's the father? Note in the first example that there are three valid solutions to the relation, and any of them are correct. If I only asked for one solution:


(run 1 [q]
  (father 'Vito q))
= > (Sonny)

I would just get Sonny.

Conde

To ask better questions, we'll need conde. First lets define as a function, what the definition of parent is:

  (defn parent [p child]
    (conde
      ((father p child))
      ((mother p child))))

(run* [q]
  (parent q 'Michael))
= > (Vito Carmela)

The function parent above uses something new. It's one of the core building blocks of core.logic, called conde. I think of conde as a very fancy or. If you think of an or statement in a functional or imperative languages, it finds the first true value, then stops. On the Island of Kanren, there can be many correct answers to a puzzle, and again the Kanren-ites are trying to solve the puzzle every way possible. So in the above, there are two possibilities when using parent according to the defined relation. It tries the father relation first. If that passes, it will count as one answer. It will then come back through, act as if the first condition failed and will run the second statement. In this case, the first and second line are mutually exclusive, so it's pretty similar to a straight or.

(run* [q]
  (fresh [x y z]
    (conde
      ((== x "1st"))
      ((== y "2nd"))
      ((== z "3rd")))
    (== q [x y z])))
= > (["1st" _.0 _.1] [_.0 "2nd" _.1] [_.0 _.1 "3rd"])

The above illustrates this trying of the various clauses. Each of the above three clauses will always pass if executed. The three answers returned are all separate, so the first answer only has a value for x. The y and z are still fresh (and could be any value). The same is true for the second and third answer.

More Relations

These functions can also stack on each other:

(defn grandparent [gparent child]
  (fresh [p]
    (parent gparent p)
    (parent p child)))

(run* [q]
  (grandparent q 'Anthony))
= > (Vito Carmela)

(run* [q]
  (grandparent 'Vito q))
= > (Francesca Frank Vicent Mary Kathryn Santino Anthony)

The definition of grandparent is that the grandparent gparent is the parent of some new person p and that person p is the parent of child. The Kanren-ites can also solve more complex puzzles where the grandparent and child is unknown but the parent is known:

(run* [q]
  (fresh [gparent p child]
    (== p 'Michael)
    (parent p child)
    (grandparent gparent child)
    (== q {:grandparent gparent :parent p :child child})))

= > ({:grandparent Vito, :parent Michael, :child Mary}
     {:grandparent Vito, :parent Michael, :child Anthony}
     {:grandparent Carmela, :parent Michael, :child Mary}
     {:grandparent Carmela, :parent Michael, :child Anthony})

There are a few things interesting about the above example. First no child or grandparent was specified. In this puzzle, we specified the parent (in this case Michael) but nothing else. The Kanren-ites had to use this constrant in trying to find the grandparent(s) and children. The second interesting piece of the above is the usage of the map as the value of q. Since there are several answers and several pieces of information for each answer, I stuffed the results in a map. For the puzzle to be solved, the appropriate values needed to be plugged into the map.

What Next?

There's a lot more to cover in core.logic. Specifically, I was pretty light on the magic and "special-ness" of relation functions, variables etc. I'll cover that in my next post along with doing useful things with lists and matching.

Since the library is fairly new, there's not a ton of resources on it yet. For more information on core.logic specifically, Ambrose's Logic Starter has good information and there's some in the core.logic README. Byrd's dissertation and The Reasoned Schemer are very good. They are written in Scheme, but they translate to Clojure easily.

13Oct/114

Appendo the Great

I have been spending a some time recently trying to get a better understanding of logic programming. I was very much inspired by David Nolen's core.logic library. I am reading several different papers and books trying to understand the paradigm. All of them talked about an append function, but I was having difficulty understanding why it was so important.

From the Reasoned Schemer, conso, the relational form of cons, gets the title conso the magnificent. And conso certainly is magnificent, but I think appendo should get some credit too. In core.logic and miniKanren, append is called appendo. Like concat in Clojure it appends two lists together to create a third list. It can be used as a building block for many other functions. The definition of the function (from core.logic) is pretty small:

(defne appendo [x y z]
  ([() _ y])
  ([[?a . ?d] _ [?a . ?r]] (appendo ?d y ?r)))

defne above is syntax sugar around matche. It creates a function (named appendo) that accepts three arguments. It matches x, y and z with the three arguments in the list at the beginning of each clause. In this case the first clause is matched cons style. ?a is declared as the first item and ?d is declared as the rest. The second item is ignored, the third is split in the same way. Note that the ?a is shared here on the first and third parameter of the match. This makes the first item in x, the first item in z. Assuming the x and z can be matched as above, appendo is recursively called on the rest of each list (including y, untouched). Something similar to the shared ?a is going on when the recursion bottoms out (when x is empty). In this clause y is declared as z. This makes y the tail of a cons like expression (specifically the ?r) from the previous recursive call. That's a lot packed into such a small amount of code! Below I will build the case for appendo's new title.

Building Blocks

To run the below code, you need to use the right namespaces. Something like this:

(ns logic-examples.seq
  (:refer-clojure :exclude [inc reify ==])
  (:use [clojure.core.logic minikanren prelude nonrel match disequality]))

I thought I understood the appendo function, but I didn't understand why each source spent a lot of time on it. It seemed straight-forward enough. Then I had that ah-ha moment while reading up on Prolog's prefix and suffix functions. Here's something roughly equivalent in core.logic that I wrote before my ah-ha:

(defne prefixo [x y]
  ([[] _])
  ([[?a . ?xs] [?a . ?ys]] (prefixo ?xs ?ys)))

This declares that x is a prefix of y. So it could be used to delcare an x as [1 2] and y as [1 2 3 4 5]. Like above, it uses defne, matching in a similar way. I'm using the shared ?a? to ensure the first item of x is the first item of y and continue recursively. I then began to think of appendo as less like concat in Clojure and more like what list could I concat with this other list to yield some result list (user specified or not). That realization led me to change my prefixo function to be the following:

  (defn prefixo2 [x y]
    (fresh [a]
           (appendo x a y)))

This function appends x with a new (or fresh) variable to yield the value y. So substituting values into the appendo call (using the above example): (appendo [1 2 3] a [1 2 3 4 5]). When unified, a is obviously [4 5] because appending [1 2 3] with [4 5] would satisfy the result list [1 2 3 4 5]. With some great core.logic magic, a will match anything that would appear after items in x in order to satisfy y. Part of the mind-bending nature of logic programming is the mixture of declaring, binding, predicates and the wiggle room that result variable gives you. As an example, there's no need to provide a defined y in the above call:

(run* [q]
  (prefixo2 [1 2 3] q))
= > ((1 2 3 . _.0))

This code runs prefixo but leaves y fresh. This lets core.logic decide what the value should be. So it is asking what will satisfy prefixo2 when x is [1 2 3]. The answer from core.logic is (1 2 3 . _.0). ._0 is the tail of the list and could represent anything, meaning that y could be a list with 1, 2, 3 followed by nothing or 4 or ["Klaatu" "Barada" "Nikto"]. These arbitrary values are numbered to distinguish between different values in the case that there could be multiple arbitrary values that could be (but don't have to be) different. More examples of _.0 are below. The above code will also handle other constraints on the list:

(defn lasto
  "Declares x as the last item in y"
  [x y]
  (fresh [a]
         (appendo a [x] y)))

(run 3 [q]
  (lasto "Hail to the king baby" q)
  (prefixo2 [1 2 3] q))
= > ((1 2 3 "Hail to the king baby")
        (1 2 3 _.0 "Hail to the king baby")
        (1 2 3 _.0 _.1 "Hail to the king baby"))

This basically says give me three lists where 1, 2, 3 is the prefix and the last element is "Hail to the king baby". A proper [ Army of Darkness ending. The easiest one is the list containing only those four items, which is the first returned. The next solution says any one element can exist between 3 and "Hail to the king baby". The next solution says any two elements can be between the theird and the last elements etc. Passing 3 to run tells core.logic how many solutions you want it to come up with. Using run* will ask for all solutions (not a good idea with this example code). This code has two fresh variables _.0 and _.1. These could be any value. They could be different values or they could both be the number 42. They are labeled distinctly because they could have distinct values.

More appendo fun

Seeing what I could do with prefix, I then followed that with suffix:

(defn suffixo
  "Declares the elements of x as the end of list y"
  [x y]
  (fresh [a]
         (appendo a x y)))

Nothing new there. membero is a function in core.logic that declares that x (an element) occurs in a list y. A version can also be built out of appendo (though probably not as efficiently):

(defn my-membero
  "A (probably inefficient) function stating x is an element in y"
  [x y]
  (fresh [a b]
         (appendo a (llist x b) y)))         

(run* [q]
     (my-membero q [1 2 3 4 5]))
= > (1 2 3 4 5)

(run 5 [q]
  (my-membero 3 q))
= > ((3 . _.0) (_.0 3 . _.1) (_.0 _.1 3 . _.2) (_.0 _.1 _.2 3 . _.3) (_.0 _.1 _.2 _.3 3 . _.4))

The first example of it's use gives a list of [1 2 3 4 5] and asks core.logic to come up with all of the members (which is just each element). The second asks the opposite question. Declaring that 3 is in the list, what are the lists that would satisfy it. The first result has 3 as the first element (followed by anything), the second list has 3 as the second element (followed by anything) and so on.

Using appendo to find sublists was interesting:

(defn sublisto
  "The list x (in sequence) appears in y"
  [x y]
  (fresh [a b c]
         (appendo a b y)
         (appendo x c b)))

= > (set (run* [q]
        (sublisto q [1 2 3 4 5])))
= > #{() (1) (2) (1 2 3) (2 3) (3) (3 4) (4 5) (2 3 4) (1 2 3 4 5) (1 2) (2 3 4 5) (4) (3 4 5) (1 2 3 4) (5)}

Chaining it together with my-membero can find all sublists containing a particular number:

(set (run* [q]
       (sublisto q [1 2 3 4 5])
       (my-membero 2 q)))
= > #{(2) (1 2 3) (2 3) (2 3 4) (1 2 3 4 5) (1 2) (2 3 4 5) (1 2 3 4)}

Or with the help of adjacent, we can find all sublists where two numbers occur next to each other:

(defn adjacent
  "Declares x and y as adjacent items in list z"
  [x y z]
  (fresh [a b]
    (appendo a (llist x y b) z)))

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 5 1 q)))         

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

This will only find sublists that have x followed by y, and it must be in that order. To make adjacent a little more flexible, the below version will find x and y next to each other, regardless of order:

(defn adjacent
  "Declares x and y as adjacent items in list z"
  [x y z]
  (fresh [a b]
         (conde
          ((appendo a (llist x y b) z))
          ((appendo a (llist y x b) z)))))

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 5 1 q)))         

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 1 5 q)))

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

(= (set (run* [q]
          (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
          (adjacent 5 1 q)))
   (set (run* [q]
          (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
          (adjacent 1 5 q))))

= > true

That's a lot of power from a little function. I think appendo deserves the title. The code for the functions above can also be found here.

12Apr/100

Clojure Protocols Part 3

Recently there have been some changes to the Clojure Protocols code out on Github. Not huge changes, but enough that the examples I wrote from Part 1 and Part 2 will no longer work. I thought I'd finish out my protocol blog entries by showing how I used it and include the new syntax. I also have a better understanding on how reify can be used (thanks Meikel) and will include some of that. First the goal of protocol usage. I have been working on some comparisons and evaluations of triplestores. Triplestores can be used to store RDF data which is a series of subject/predicate (or property)/object triples. There are many triplestores out there and of the triplestores that are out there, many have several interfaces. For example, Oracle has a JDBC interface that uses stored procedures and a Jena API that incorporates pieces of the Jena framework. This was some pretty low hanging fruit from an abstraction perspective. Whether inserting a new triple in Oracle JDBC, Jena (with Oracle) or one of the other triplestore impelementations, on the surface, it is the same. Take this subject, predicate and object and store it. The same could be said for querying it with SPARQL or deleting entries. I ended up with a protocol named TriplestoreOperations like below:

(ns revelytix.triplestore-operations)

(defprotocol TriplestoreOperations
  "Interface for the various operations allowed by a triple store"
  (create-graph [impl graph-name] "Creates a new graph of name graph-name")
  (delete-graph [impl graph-name] "Deletes graph graph-name if graph exists")
  (insert-quad [impl graph-name subject predicate object]
    "Creates a new triple, data is assumed to be a full URI")
  ;;...)

This syntax is the same. The first argument is used to pass in the implementation of TriplestoreOperations. The graph-name or model in Oracle terms, is what is going to hold the triples. The protocol exists in one namespace (called triplestore-operations above) and the implementations of the interfaces are in separate namespaces. The first is an Oracle JDBC implementation of TriplestoreOperations. It's parameterized by the database connection details and the name of the table to store the data in.

(ns oracle.oracle-jdbc
  (:use clojure.contrib.sql
	triplestore-operations))

(deftype OracleJdbcOperations [db table-name]  TriplestoreOperations
  (delete-graph [impl graph-name]
	(let [drop-model-string (create-sql-string DROP-MODEL-SQL graph-name)
	      drop-table-string (create-sql-string DROP-TABLE-SQL table-name)]
	  (with-connection db
	      (with-open [drop-model-statement (.prepareCall (connection) drop-model-string)]
		(do
		  (drop-entailment-if-exists db graph-name "RDFS")
		  (.execute drop-model-statement)
		  (do-commands drop-table-string))))))
  (create-graph [impl graph-name]
      (let [createModelString (create-sql-string CREATE-MODEL-SQL graph-name table-name)
	    createTableString (create-sql-string CREATE-TABLE-SQL table-name)]
	(do (with-connection db
	      (with-open [createModelStatement
                                    (.prepareCall (connection) createModelString)]
		(do-commands createTableString)
		(.execute createModelStatement))))))
  (insert-quad [impl graph-name subject predicate object]
	       (create-family-triple table-name db graph-name subject predicate object))
  ;;...)

  (defn create-oracle-jdbc-triplestore-instance [table-name]
           (OracleJdbcOperations *oracle-jdbc-props* table-name)) ;;Awkward see below

One difference between the above code and the code in Part 1 or Part 2 is that the implementation parameter in the previous version of deftype disappeared. So the create-graph function above would have had only had a single parameter. I like the change, I found the original code a little confusing, wondering where the first parameter went etc. The next implementation of the TriplestoreOperations protocol was a Jena implementation of the protocol. The below code makes use of the reify function and feels a little more idomatic Clojure and less like the implementation of a protocol is something special and different from just functions. I like the refiy syntax over deftype and I've been moving my code over to use it. I'm going to cut a decent portion of the implementation below because it mostly calls Java APIs and is a bit noisy:

(ns jena-operations
  (:use triplestore-operations)
  ;;...)

(defn create-jena-operations-instance [jena-support-impl]
  (reify TriplestoreOperations
	  (create-graph [impl modelString] nil)
	  (delete-graph [impl modelString]
			(with-triplestore-connection ;...)
	  (insert-quad [impl modelString subject predicate object]
		       (with-triplestore-connection ;;...)
          ;;...))

The reify function call above also creates a new instance of the protocol TriplestoreOperations with the functions defined in line. There's also not a need to create an instance of the type like is being done in the previous example. The end result, deftype or reify from a functionality perspective is the same, there's just a different way to get there. Reading through some of the docs, it looks like reify is more dynamic and deftype results in generated code. One difference between Jena and the Oracle JDBC interface is that graphs don't need to be created explicitly using Jena, so that method does nothing. The above code is slightly different as well in that the implementation parameter no longer disappears. Another interesting part is that the JenaOperations instance is parametrized by another protocol called JenaSupport. What I have found is that many vendors support the Jena APIs, but they implement it slightly different. It's definitely not as pluggable as something like JDBC. This JenaOperations implementation is generic for the Jena APIs and is used by several triplestores with Jena implementations. The JenaSupport protocol abstracts things like getting a Jena connection, creating the correct implementation of Model etc which is different from implementation to implementation.

Development Gotchas

I have found a few issues when developing Clojure code that uses protocols. I'm using Leiningen and Lein Swank for development of the code. First I found that if I had AOT compilation enabled, and had run lein install, the protocol definition results in compiled code in the classes directory of the project. Where this caused a problem was when I tried to change a protocol definition. I'd make a change in Emacs, load the file with the updated protocol code and behaviour of the code would be such that I made no change to the protocol at all. What was happening was the old version of the code, the one that had the interface code generated, was still on the class path in the classes directory. Removing that code (through lein clean or something similar) allowed my changes to take affect. This problem stumped me for a couple of hours. I can avoid this entirely by just not using the AOT compilation (I don't really need it) but others might not.

Another gotcha I found was in the loading of files that use implementations of protocols. In the example above, let's say I have a test file (I'll call it test-A) that executes functions from TriplestoreOperations on the JenaOperations implementation that in turn uses the Oracle implementation of JenaSupport. Just loading test-A.clj file does not cause the loading of the Jena implementation of the TriplestoreOperations, or the Oracle version of JenaSupport. Rather it just complains that there is not an implementation of TriplestoreOperations for 'nil'. Loading those files individually fixes the problem, it just doesn't do that automatically for me.

Filed under: Clojure, Languages No Comments
29Mar/102

Clojure Protocols Part 2

Stale code warning

There have been small changes to the protocols code in Clojure. The below post is still useful, but a few details of the example code is different. See part 3 for the updated syntax.

Clojure Protocols Part 2

This is the second in the series of blog entries on Clojure protocols. The first can be found here. This entry continues by using protocols to implement Java interfaces and reify interfaces/protocols inline in a function invocation. First I'll use reify to define an implementation of the TextOutput interface in-line of the function call. I'll change the italics syntax to the MediaWiki italics format:

(println (output-string (reify TextOutput
		      (output-string [x] (str "''" x "''"))) "stuff"))
''stuff''

The acceptable things to reify are Interfaces (in Java) protocols or Object. I've not yet find a use for reify in code that I have written. One of the things that can be passed to reify are regular Java interfaces. This can also be passed to deftype to define Clojure implementations of Java interfaces. An implementation of Comparator looks like below:

(deftype ThreeCompare [] java.util.Comparator
	       (compare [o1 o2]
			(cond (= o1 3) -1
			      (= o2 3) 1
			      :else (.compareTo o1 o2)))
	       (equals [other] (isa? other ThreeCompare)))

The deftype above implements the protocol java.util.Comparator that when sorting a list of numbers will always put any values of 3 first in the list followed by the rest in ascending order. This can be used like any Java implementation of Comparator:

(def java-list (java.util.ArrayList. (list 1 2 3 4 5 6 7 8)))
(java.util.Collections/sort java-list (ThreeCompare))
(println java-list)

A nice side benefit of deftype is something that reminded me of records in OCaml:

(deftype Point [x y])
(defn midpoint [point1 point2]
	    (Point (/ (+ (:x point1) (:x point2)) 2)
		   (/ (+ (:y point1) (:y point2)) 2)))
(println (midpoint (Point -1 2) (Point 3 -6)))
#:Point{:x 1, :y -2}

The above code defines a new type Point, a midpoint function that takes two points and return a new Point that represents the midpoint of the two points.

Default Implementations

One feature I was looking for when I first incorporated protocols into some existing Clojure code was the concept of a default implementation of a protocol within a namespace. I think this would be a pretty typical usage of a protocol, you might have several implementations, but generally you're only working with one at a time. In my case, I was testing three implementations of a protocol for an integration test. I wanted to run the same tests on all three implementations of the protocol. This presented a problem because the deftest macro doesn't allow passed in parameters and yet each function that I called needed to be parametrized based on the implementation I was testing. I first attacked this problem with a bound variable and then just had all functions called on the protocol use the bound variable as their implementation. Then when a switch to another implementation was needed, I'd change the implementation assigned to the bound variable. This worked for me because it was just test code, but I think this will come up more in the future.

26Mar/103

Clojure Protocols Part 1

Stale code warning

There have been small changes to the protocols code in Clojure. The below post is still useful, but a few details of the example code is different. See part 3 for the updated syntax.

Clojure Protocols

Protocols are a new feature in Clojure, set to be released in the next version. They provide polymorphism in a very Clojure-ish way. I think it's a great lightweight polymorphism implementation that has a lot of potential. In true Clojure style I think it meets the polymorphism objective and yet doesn't need to totally change the way you already write your code in Clojure. I'm breaking this entry into more than one piece to show some different ways that Clojure protocols can be used. Because it's so new, there are not a lot of docs out there on it, but Rich does some good documentation on the macros themselves. If you want to try these examples, make sure you're running off of the 1.2 version of Clojure (from Clojars or a local build from the Clojure git repo). First I'll start by defining a simple protocol:

(defprotocol TextOutput
	  (output-string [x string]))

In Java terms, I'm defining a TextOutput interface (actually a Java interface is being created, but more on that later), that has a single function named output-string that includes no implementation details. The input to this function is a little tricky though. I specified a parameter x and another one called string. The first parameter will be used to pass the implementation of the interface into the function. You don't need to write code to handle the parameter x and when you write your implementation, you'll act like it doesn't exist. A wiki type text output of an italics string would look like:

(deftype ItalicsOutput [] TextOutput
	       (output-string [string] (str "_" string "_")))

I have begun thinking about this in Java terms as a class ItalicsOutput that implements the TextOutput interface. Here in the output-string function, I only specify one parameter (not two). Next you can use this implementation with the following code:

(output-string (ItalicsOutput) "stuff")
"_stuff_"

I'm telling Clojure I want it to execute the output-string function, on the implementation (ItalicsOutput) (more in this below) with the argument "stuff". I think that below is a little more readable:

(def italics-impl (ItalicsOutput))
(output-string italics-impl "stuff")
"_stuff_"

Which just assigns the instantiated implementation to a variable which can then be used. These implementations can also have parameters, like:

(deftype PrefixedOutput [prefix-string] TextOutput
	       (output-string [string] (str prefix-string " " string)))

I think passing a variable in makes the instantiation step make a little more sense:

(def prefix-with-more (PrefixedOutput "more"))
(output-string prefix-with-more "stuff")
"more stuff"

Both implementations can be used together as well:

(defn print-all []
	(let [italics-impl (ItalicsOutput)
	     prefix-with-more (PrefixedOutput "more")]
	     (println (output-string italics-impl "stuff"))
	     (println (output-string prefix-with-more "stuff"))))

With output that would look like:

(print-all)
_stuff_
more stuff