Object Commando languages, development and design


Clojure Protocols Part 1

Stale code warning

There have been small changes to the protocols code in Clojure. The post below is still useful, but a few details of the example code are different. See part 3 for the updated syntax.

Clojure Protocols

Protocols are a new feature in Clojure, set to be released in the next version. They provide polymorphism in a very Clojure-ish way. I think it's a great lightweight polymorphism implementation with a lot of potential: in true Clojure style, it meets the polymorphism objective without forcing you to change the way you already write Clojure code.

I'm breaking this entry into more than one piece to show some different ways that Clojure protocols can be used. Because the feature is so new, there aren't a lot of docs out there yet, but Rich has written good documentation on the macros themselves. If you want to try these examples, make sure you're running the 1.2 version of Clojure (from Clojars or a local build from the Clojure git repo). I'll start by defining a simple protocol:

(defprotocol TextOutput
  (output-string [x string]))

In Java terms, I'm defining a TextOutput interface (a Java interface actually is being created, but more on that later) that has a single function named output-string and includes no implementation details. The input to this function is a little tricky, though. I specified a parameter x and another one called string. The first parameter is used to pass the implementation of the interface into the function. You don't need to write code to handle the parameter x, and when you write your implementation you'll act like it doesn't exist. A wiki-style text output that italicizes a string would look like:

(deftype ItalicsOutput [] TextOutput
  (output-string [string] (str "_" string "_")))

I have begun thinking about this in Java terms as a class ItalicsOutput that implements the TextOutput interface. Here in the output-string function, I only specify one parameter (not two). Next you can use this implementation with the following code:

(output-string (ItalicsOutput) "stuff")

I'm telling Clojure to execute the output-string function on the implementation (ItalicsOutput) (more on this below) with the argument "stuff". I think the version below is a little more readable:

(def italics-impl (ItalicsOutput))
(output-string italics-impl "stuff")

This just assigns the instantiated implementation to a variable, which can then be used. These implementations can also have parameters, like:

(deftype PrefixedOutput [prefix-string] TextOutput
  (output-string [string] (str prefix-string " " string)))

I think passing an argument in makes the instantiation step make a little more sense:

(def prefix-with-more (PrefixedOutput "more"))
(output-string prefix-with-more "stuff")
"more stuff"

Both implementations can be used together as well:

(defn print-all []
  (let [italics-impl (ItalicsOutput)
        prefix-with-more (PrefixedOutput "more")]
    (println (output-string italics-impl "stuff"))
    (println (output-string prefix-with-more "stuff"))))

With output that would look like:

_stuff_
more stuff
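
For comparison, here is a sketch of the whole example in the syntax that shipped, assuming Clojure 1.2 or later (the stale code warning at the top refers to these changes). Two things differ from the code above: each method names its instance argument explicitly (called this here, though any name works), and instances are created with the Java-style constructor form (ItalicsOutput.) rather than (ItalicsOutput). The output-all helper is a hypothetical name added for illustration, and the :on-interface lookup leans on an implementation detail of the protocol map rather than a documented API, so treat that last line as a curiosity.

```clojure
(defprotocol TextOutput
  (output-string [x string]))

;; Released syntax: the instance argument is written out explicitly.
(deftype ItalicsOutput []
  TextOutput
  (output-string [this string] (str "_" string "_")))

(deftype PrefixedOutput [prefix-string]
  TextOutput
  (output-string [this string] (str prefix-string " " string)))

;; Instances are created with the constructor form (note the trailing dot).
(output-string (ItalicsOutput.) "stuff")          ; => "_stuff_"
(output-string (PrefixedOutput. "more") "stuff")  ; => "more stuff"

;; Hypothetical helper: one call site, several implementations.
;; Dispatch follows the type of the first argument to output-string.
(defn output-all [impls string]
  (map #(output-string % string) impls))

(output-all [(ItalicsOutput.) (PrefixedOutput. "more")] "stuff")
;; => ("_stuff_" "more stuff")

;; Checking the "class implements interface" reading from the REPL.
(satisfies? TextOutput (ItalicsOutput.))                 ; => true
(instance? (:on-interface TextOutput) (ItalicsOutput.))  ; => true
```

The last two lines back up the Java-terms description: the type defined with deftype is an instance of the interface that defprotocol generated.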

OCaml in the Real World

There's a great video on Yaron Minsky's blog of a talk he gave on why Jane Street chose to go with OCaml. It's definitely worth the time to watch. What's interesting about the decision is how casual it was, and that it was based on merit and success. Oftentimes the standard language at an organization is decreed. Yaron describes a decision based on track record instead: he used OCaml for a research project there, it went well, and so more people were brought in to work on it. It turned out it was easier to find good developers when advertising for OCaml developers. I thought this was interesting, because it goes against a fairly commonly held belief in our industry.

Lower Cost for Reuse

One interesting point Yaron makes on the technical merits of OCaml concerns reuse. He describes a very typical copy-and-paste problem in larger code bases, and says it was much worse in object-oriented languages than in OCaml. When pressed for more details he didn't have any hard proof, but suggested that higher-order functions contribute to this. I agree with him that OCaml leads to less code duplication and cleaner code. It's true that duplicate code is a sign that some refactoring needs to take place, but why doesn't it take place? We all know that duplicated code is bad. I think the reason this is worse in object-oriented languages is that the barrier to entry for reuse is high. To have a small piece of reusable code, it must be put into a class; if it's an instance method, you need to create an instance of that class, and so on. If you have a few lines that are duplicated between two classes, what do you do?

  • Create a new class C (3 - 5 lines of code)
  • Create a method in C that wraps those few lines (3 lines of code)
  • Create an instance of that class in Class A (1 line of code)
  • Swap out the duplicate code for a call to the shared code (swap 3 lines for 1)
  • Create an instance of that class in Class B (1 line of code)
  • Swap out the duplicate code for a call to the shared code (swap 3 lines for 1)

Is this worth it? We just traded 3 lines of duplicate code for 10+. In the Java world this is still worth it, but it took quite a bit of work to refactor those 3 lines. Coincidentally, we have just about the same amount of duplicate code as before (i.e. creating the instance of class C is repeated, and so is the method call). The benefit is that the business logic isn't what is duplicated; it's the fluff code. So if the business logic is centralized, we have achieved quite a bit, but what about the duplication of the fluff code? In OCaml the steps would be:

  • Create a new function (1 line)
  • Add those few common lines (3 lines)
  • Swap out the duplicate code in A for a call to the shared code (3 lines for 1)
  • Swap out the duplicate code in B for a call to the shared code (3 lines for 1)

The fluff in defining this shared code is minimal, because defining a function is minimal.
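
The four steps above can be sketched as follows. This is an illustration in Clojure, the language of the protocol examples earlier on this page, rather than OCaml, on the assumption that the cost of defining a function is similarly low in both. The names normalize-name, greet-customer and label-shipment are hypothetical: the first stands in for the few shared lines, the other two for the places that used to duplicate them.

```clojure
(require '[clojure.string :as str])

;; Steps 1 and 2: one new function holding the lines that used to be duplicated.
(defn normalize-name [raw]
  (str/capitalize (str/trim raw)))

;; Step 3: the first caller swaps its copy for a call to the shared function.
(defn greet-customer [raw-name]
  (str "Hello, " (normalize-name raw-name)))

;; Step 4: the second caller does the same.
(defn label-shipment [raw-name]
  (str "Ship to: " (normalize-name raw-name)))

(greet-customer "  aLICE ")   ; => "Hello, Alice"
(label-shipment "bob")        ; => "Ship to: Bob"

;; The shared function can also be handed to a higher-order function as is.
(map normalize-name ["  carol" "DAVE "])   ; => ("Carol" "Dave")
```

There is no class to declare and no instance to create at either call site, so the only code added is the function definition itself.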