Object Commando languages, development and design


First Clojure Conj Highlights

The last couple of weeks have been busy! First Strange Loop, then the first Clojure Conj. It was the first Clojure conference of its kind and was packed with technical content. It was single track, so I didn't have to worry about missing anything. Below I'll summarize some of the highlights of the conference from my perspective.

(not= DSL macros) Christophe Grand

A very valuable talk. Christophe's central theme was that many of the gains in syntax that can be obtained from macros can also be achieved using functions and other Clojure fundamentals. This was the beginning of a theme for the conference. He gave specific examples of this from his framework, enlive. His stories centered on users wanting to extend functionality provided by the library. Since much of it was implemented with macros, users ended up frustrated and unable to do what they wanted. He then made some substantial changes to the codebase, reimplementing as functions much of the functionality that had been in the macros. Backing the library with functions, with a smaller macro layer on top, enabled his users to take better advantage of the framework. This boiled down to the realization that a DSL can exist in functions, through clear naming and composability, as well as in macros. He gave some excellent examples in the construction of a function-based DSL for regular expressions (code here) that ended up having functionality not possible with the normal regular expression DSL (or with macros). I've got it on my TODO list to look over Christophe's enlive and regular expression code more in-depth.
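To make the idea of a function-based DSL concrete, here is a minimal sketch of my own. It is not Christophe's code; the names lit, alt, many and in-order are hypothetical. Each function returns a regex fragment as a string, so fragments compose with ordinary function calls and can be built at runtime.

```clojure
;; Hypothetical combinators, NOT taken from Christophe's library.
(defn lit
  "Match the string s literally, escaping any regex metacharacters."
  [s]
  (java.util.regex.Pattern/quote s))

(defn alt
  "Match any one of the given fragments."
  [& ps]
  (str "(?:" (apply str (interpose "|" ps)) ")"))

(defn many
  "Match one or more repetitions of fragment p."
  [p]
  (str "(?:" p ")+"))

(defn in-order
  "Match the given fragments one after another."
  [& ps]
  (apply str ps))

;; Because these are plain functions, a pattern can be assembled from data.
(def version-re
  (re-pattern (in-order (many "[0-9]") (lit ".") (many "[0-9]"))))

(re-matches version-re "1.2")   ; => "1.2"
(re-matches version-re "1x2")   ; => nil
```

Since the pieces are values rather than syntax, they can be passed around, stored in collections and combined with map or reduce, which is the kind of extension that is hard to offer when everything lives in macros.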

Clojure Protocols - Sean Devlin

I'm pretty familiar with protocols and have used them at Revelytix quite a bit, but I still got a lot of value out of Sean's talk. His talk was mainly about creating a protocol over java.util.Date and other date-like types in Java. His example created a protocol with a to-ms function that converted some sort of date representation to a long representation of that date. From that small abstraction he built many functions that were consumers of it. I think this was interesting because it was a small abstraction that was really only used internally in that source file. The functions that were more useful to consumers of the library were not extensions of the protocol, but rather used the protocol internally. I thought this was particularly elegant because callers need to know nothing about protocols, or even that the code uses them.
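A rough sketch of that shape, reconstructed from memory rather than copied from Sean's slides: only the to-ms name comes from the talk, while the protocol name ToMs and the helper functions before? and ms-between are my own illustrations.

```clojure
;; The protocol name and helpers are hypothetical; only to-ms is from the talk.
(defprotocol ToMs
  (to-ms [this] "Convert a date-like value to epoch milliseconds (a long)."))

(extend-protocol ToMs
  java.util.Date
  (to-ms [d] (.getTime d))
  java.util.Calendar
  (to-ms [c] (.getTimeInMillis c))
  java.lang.Long
  (to-ms [n] n))

;; Consumer-facing functions use the protocol internally; callers never
;; need to know that a protocol is involved.
(defn before?
  "True if date-like a is strictly earlier than date-like b."
  [a b]
  (< (to-ms a) (to-ms b)))

(defn ms-between
  "Milliseconds from date-like a to date-like b."
  [a b]
  (- (to-ms b) (to-ms a)))

(before? (java.util.Date. 0) 1000)   ; => true
```

Supporting another date-like type then only requires one more to-ms implementation; before? and ms-between pick it up for free.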

Finger Trees - Chris Houser

Chouser gave a great talk on a data structure I had not heard of before: finger trees. This is a new (not quite done yet) feature that will be included in Clojure and that can have benefits over the other Clojure persistent data structures for specific use cases. His slides can be found on GitHub for more info. It seemed to me that the biggest wins of the data structure are the amortized constant-time splitting/appending and counting. It achieves this by storing metadata at each tree root node, which makes summarization very easy. I read over some of the code as he was speaking (the code can be found here) and am looking forward to going over it more in-depth.

Keynote - Rich Hickey

It was good to hear Rich Hickey speak in person. This was the first time I've heard him. He went over some of the things to come in the near term for Clojure. I would say the main focus of his talk was on improving the performance of Clojure. One of those performance improvements was declaring when variables are allowed to be rebound. He discussed the need to check whether something has been rebound, even though most of the time there is little to no chance that it has been. A good example of this is functions: the vast majority of the time they are not rebound or redefined in production code (typically only at the REPL when doing development), yet this overhead is always paid. He went over a new bit of metadata that would declare a variable as able to be rebound. I think this is a good change even without performance in mind. Right now we already have a convention that we put asterisks (or ear muffs) around variables that we intend for rebinding. This just codifies that convention, and we get a performance boost as a bonus. He also went in depth on some Java primitive performance changes he was making. These follow a similar theme, in that they reduce the complexity of the various primitive types and bring a performance boost as a bonus. The mismatch between Java primitives, auto-boxing and Clojure can be pretty confusing. His changes simplify many of those problems, and through some new function subclasses, auto-boxing can be avoided. There were a lot of useful nuggets in Rich's talk.
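For reference, this is roughly what the rebinding declaration looks like in the form that later shipped (the :dynamic metadata key, Clojure 1.3 and up). It may not match the syntax shown at the talk, and the var and function names here are made up for illustration.

```clojure
;; Requires Clojure 1.3+. *verbose* and log-level are hypothetical names.
;; The ear muffs are still the convention; :dynamic is what permits `binding`.
(def ^:dynamic *verbose* false)

(defn log-level
  "Pick a log level based on the current (possibly rebound) value of *verbose*."
  []
  (if *verbose* :debug :info))

(log-level)                               ; => :info
(binding [*verbose* true] (log-level))    ; => :debug
(log-level)                               ; => :info (the binding was thread-local and temporary)
```

A var without the :dynamic flag cannot be used with binding at all, which is what lets the compiler skip the rebinding check for ordinary functions.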

One Ring to Bind Them - Mark McGranaghan

This was a talk on ring specifically but I would say it had more to do with design of good Clojure APIs. He covered how abstractions built on top of Clojure sequences with well-named functions could create a very appealing API. This talk was the crescendo of the theme that using Clojure fundamentals can lead to great code. As a testament to the ease of use of the library, there are quite a few other libraries that have been built on top of it.
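The core of that API is small enough to sketch without the library itself: a request and a response are plain maps, a handler is a function from one to the other, and middleware is a function that wraps a handler. The handler and middleware names below are hypothetical, and nothing here depends on Ring being on the classpath.

```clojure
;; A handler: request map in, response map out.
(defn hello-handler [request]
  {:status  200
   :headers {"Content-Type" "text/plain"}
   :body    (str "Hello from " (:uri request))})

;; Middleware: takes a handler, returns a new handler.
;; wrap-server-header is a made-up example, not part of Ring.
(defn wrap-server-header [handler]
  (fn [request]
    (assoc-in (handler request) [:headers "Server"] "sketch")))

(def app (wrap-server-header hello-handler))

(app {:request-method :get :uri "/conj"})
;; => {:status 200,
;;     :headers {"Content-Type" "text/plain", "Server" "sketch"},
;;     :body "Hello from /conj"}
```

Because everything is ordinary data and ordinary functions, handlers can be tested by calling them directly with a map, and other libraries can build on the same shape without any special integration.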

From Concurrency to Parallelism - David Liebke

David gave a talk on some future Clojure functionality that provides more parallelism options. This was a great talk to have attended after hearing Guy Steele at Strange Loop. Much of what Guy discussed is in this experimental branch of Clojure. Slides of his talk can be found here. He started by comparing map and pmap and then went over how pmap is different from the new parallel reduce work. He gave some example stats on the performance characteristics of each. He had a pretty big caveat on how useful the stats were, but I think they did a good job of illustrating his point. Implementing this was definitely complex, and David went into that a bit, but using the functions seemed very idiomatic and not much different from using plain reduce. It will be exciting to see this progress.
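As a small illustration of the comparison, the sketch below runs the same work through map and pmap, and then through reduce and a parallel fold. The experimental branch David described was not released at the time, so I am using fold from clojure.core.reducers (available from Clojure 1.5) as a stand-in; its API may differ from what he showed. slow-square is a hypothetical function.

```clojure
;; Requires Clojure 1.5+ for clojure.core.reducers (used here as a stand-in
;; for the parallel reduce described in the talk).
(require '[clojure.core.reducers :as r])

(defn slow-square
  "Square n after a short pause, to simulate a non-trivial computation."
  [n]
  (Thread/sleep 10)
  (* n n))

(def xs (vec (range 1 9)))

;; Same results, but pmap evaluates elements on multiple threads.
(map slow-square xs)    ; => (1 4 9 16 25 36 49 64)
(pmap slow-square xs)   ; => (1 4 9 16 25 36 49 64)

;; Sequential reduce versus a fold that is free to split the vector,
;; reduce the pieces in parallel and combine the partial sums.
(reduce + (map #(* % %) xs))      ; => 204
(r/fold + (r/map #(* % %) xs))    ; => 204
```

The fold call reads almost exactly like the reduce call, which matches the impression from the talk that using the parallel version is not much different from using plain reduce.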

Step Away from the Computer - Rich Hickey

It was the keynote of the conference from Rich. It was a non-technical talk that focused on thinking things through without distractions before writing code or "solving" a problem. He went over some "how your brain works" that seemed similar to Pragmatic Thinking and Learning. He stressed the importance of thorough research, notes and evaluation when solving a problem. I think the areas that he discussed are something that developers can always improve upon.


There were many other good talks that I did not mention. It was good to get an update on Clojure support in Eclipse and to hear some of the motivations behind lazy test and zippers. The lightning talks were also good, especially the ones covering Aleph and Infer. There was also a lightning talk by Alex Miller on zippers over records, which we have been doing here at Revelytix. There was definitely a lot of excitement around Clojure at the conference. It was a very friendly atmosphere, with lots of discussion in-between and after sessions. I talked with several people interested in semantic web technologies, triplestores and the work we're doing at Revelytix. It has encouraged me to blog more about it!


Strange Loop Highlights

I thought I'd punch up my thoughts on Strange Loop last week here. I think the conference went very well. Sitting next to Alex at work keeps a steady stream of Strange Loop excitement going for months before the conference and I think it delivered. The Pageant was a great venue, with plenty of space. The Moonrise rooms could get a little cramped (standing room only, people sitting on the floor etc) but the bigger rooms were definitely adequate. Having lunch and dinner open is a nice bonus because the Loop has a ton of great restaurants. I didn't watch much of the Strange Passion Talks, but enjoyed having a few beers and talking with other developers. Below are the highlights of the conference from my perspective.

Hilary Mason - Machine Learning: A Love Story

I really enjoyed Hilary’s talk and her presentation style. She didn't use bulleted lists of what she was going to cover, just a background image that she would talk through (slides here). I thought that this was a good high-level overview of data mining basics. It brought back memories of grad school, working through Data Mining by Jiawei Han, which she mentioned after a question from the audience (it's the "purple" data mining book she referred to). The talk had several very practical applications of data mining and some good pointers for developers wanting to learn more.

Java Provisioning in the Cloud - Adrian Cole

Adrian gave a rundown of the goals of the JClouds project, the niche that it's trying to fill and some of the companies using it (along with info on how they are using it). He also covered some of the philosophy of what goes into JClouds and what doesn't. It really helped me appreciate the fine line that the JClouds folks walk in working with the various cloud providers. There was an example or two in Java, with several in Clojure. It was nice to meet Adrian face to face after quite a few discussions online!

Expression problem - Chouser

Great talk. He walked through a somewhat typical code path in Java that would necessitate a “wrapper” class. He then improved the code slightly by monkey patching in a Java-like syntax. Next he wrote the same code in Clojure, walking through the implementation and then moving it into multi-methods. From there he walked through a refactor of multi-methods to protocols (something we have been doing recently at Revelytix as well) and gave a rundown of the pros/cons of protocols vs. multi-methods. I found most of this talk to be review, but the material was covered well. The slides from the talk are here; there should also be video of it soon.
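A compact sketch of that multi-method to protocol refactor, using made-up shape types rather than Chouser's actual example. All names (Circle, Square, area-mm, Shape, area) are hypothetical.

```clojure
;; Hypothetical types for illustration; not the example from the talk.
(defrecord Circle [r])
(defrecord Square [side])

;; Multi-method version: dispatch on the result of an arbitrary function
;; of the arguments (here, simply the class of the argument).
(defmulti area-mm class)
(defmethod area-mm Circle [c] (* Math/PI (:r c) (:r c)))
(defmethod area-mm Square [s] (* (:side s) (:side s)))

;; Protocol version: dispatch only on the type of the first argument,
;; which is less flexible but faster and groups related functions together.
(defprotocol Shape
  (area [shape] "Area of the shape."))

(extend-protocol Shape
  Circle
  (area [c] (* Math/PI (:r c) (:r c)))
  Square
  (area [s] (* (:side s) (:side s))))

;; An existing type can join in without a wrapper class:
;; here a [width height] vector is treated as a rectangle.
(extend-protocol Shape
  clojure.lang.IPersistentVector
  (area [[w h]] (* w h)))

(area-mm (Square. 3))   ; => 9
(area (Square. 3))      ; => 9
(area [2 5])            ; => 10
```

The last extension is the expression problem in miniature: a type I did not write gains a new operation, and no wrapper or monkey patch is involved.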

How to Think about Parallel Programming: Not! - Guy Steele

Guy Steele's talk was on how we can make programs parallel. His focus was on certain operations having wiggle room in how the code is executed. Specifically, his example was a reduce type of operation where the calculations are not linear but rather divide and conquer: the divide happens on separate threads and the conquer step merges the results from those threads. His examples were in Fortress but seem easily applicable to other languages. I found that I have a big gap in my knowledge in this area and am looking forward to doing more research on this topic.
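Here is a minimal Clojure sketch of that divide-and-conquer shape, written by me and not taken from the talk (whose examples were in Fortress). The function name parallel-sum and the threshold of 1000 are arbitrary choices. The wiggle room comes from + being associative: the halves can be summed in any grouping and merged afterwards.

```clojure
(defn parallel-sum
  "Sum a vector of numbers by splitting it in half, summing the left half on
   another thread (a future) and the right half on the current thread, then
   merging the two partial sums. Small inputs fall back to a plain reduce."
  [v]
  (if (<= (count v) 1000)              ; threshold is an arbitrary assumption
    (reduce + v)
    (let [mid   (quot (count v) 2)
          left  (future (parallel-sum (subvec v 0 mid)))
          right (parallel-sum (subvec v mid))]
      (+ @left right))))               ; the "conquer" step

(parallel-sum (vec (range 100000)))    ; => 4999950000
```

A linear reduce forces a left-to-right order; this version only requires that the combining function be associative, which is exactly the kind of freedom the talk argued we should give the runtime.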

Enterprise NoSQL: Silver Bullet or Poison Pill? - Billy Newport

It's easy to get carried away with new technology. With all the hype around NoSQL, it's certainly ripe for misuse. I liked Billy's reality check on the technology. Slides from the talk are here. His point wasn't that you shouldn't use a NoSQL database, but rather there are trade-offs. He enumerated examples of clear wins and clear losses. He reminded the audience several times that the laws of physics don't change, even with Map/Reduce. I think I knew a lot of what he was saying already, it just wasn't as firmed up as after his talk. Thanks for keeping our feet on the ground Billy!

Querying Big Data Rapidly and Robustly with Cascalog - Nathan Marz

This was a talk given by Nathan Marz about his Clojure-based Hadoop/Cascading library. There was a lot of excitement during and after this talk. Nathan created a nice Clojure DSL that makes for a very succinct representation of Hadoop queries. It shows what a language with syntax as flexible as Clojure's can do. For those familiar with SPARQL, the syntax will look especially familiar, since his DSL and SPARQL are both based on Datalog. I definitely have that library on my "To Learn" list.

Outside In TDD - Brian Marick

Brian Marick went through some examples of his Clojure-based Midje mocking framework. I had some difficulty following some of the syntax and examples, so I think I'll need to download it and try it out before coming to any conclusions about the framework. I did like how Brian had molded Emacs and Slime into an even quicker REPL environment and included it with Midje. His point on filing down the rough edges of your development process (in this case with Emacs extensions) is a good take-away from the talk. With the combination of some Emacs extensions and the Midje Clojure code, it looks like you could focus on the testing task at hand in a pretty fast and slick way.


In conclusion, Strange Loop was a great investment of time with a lot of top notch speakers and great topics. There was a lot of technical content with very little marketing. There are several other talks I wanted to make and am hoping to catch the video of when they are released. I think Strange Loop's content is broad enough to appeal to any passionate developer and would definitely recommend going next year.


Strange Loop Talk – Triplestore Testing

I gave a talk yesterday at Strange Loop giving a high-level overview of what I've been working on for the last 6 months. The abstract of the talk was:

Determining what RDF repository to use for a project can be a
daunting task. With so many repository choices, benchmarks and usage
scenarios, where do you start? This talk discusses how Revelytix
answered that question. The talk will cover the test framework
written by Revelytix in Clojure, including a language for defining
tests, a harness for executing the tests and using CouchDB to store
the results. Example Clojure code will be included along with a
discussion around the available RDF benchmarks. The talk will also
discuss the test harness using EC2 instances for cheap performance
testing and how we interpreted those results using Incanter.

It was a short talk, so I went through the material pretty fast, but the slides can be found here.


Clojure 1.2 New Functions

Version 1.2 of Clojure was released not too long ago with lots of new features. Things like protocols, a metadata reader macro change etc are some of the bigger differences (nice slides on these big changes can be found here). In addition to these bigger changes, there are also a lot of very useful new functions added to Clojure core. Below are some of the new ones that I have found useful along with some info about them. Note that most of these existed in contrib before moving to core.


The group-by function takes a function as a first argument and a collection as the second:

(doc group-by)
([f coll])
  Returns a map of the elements of coll keyed by the result of
  f on each element. The value at each key will be a vector of the
  corresponding elements, in the order they appeared in coll.

A basic example of this would be to group a list of numbers by those that are odd and those that are even.

(group-by odd? (range 1 10))
=> {true [1 3 5 7 9], false [2 4 6 8]}

The above code calls odd? on each item from (range 1 10) and then puts the number in the true or false slot of the map, depending on whether or not it's odd. The important part here is that the function that is passed in could return anything, not just true and false. Similar to the above code, we could create groups keyed by the strings "odd" and "even":

(group-by #(if (odd? %) "odd" "even") (range 1 10))
=> {"odd" [1 3 5 7 9], "even" [2 4 6 8]}

Maybe we have a list of address data structures, which is just a list of the street, followed by the city, then state. We could group by city with a call like:

(def addresses (list '("742 Evergreen" "Springfield" "MO")
                     '("31 Spooner Street" "Quahog" "RI")
                     '("742 Evergreen" "Springfield" "VT")
                     '("742 Evergreen" "Springfield" "NT")))

(group-by second addresses)
=> {"Springfield" [("742 Evergreen" "Springfield" "MO")
                   ("742 Evergreen" "Springfield" "VT")
                   ("742 Evergreen" "Springfield" "NT")],
    "Quahog" [("31 Spooner Street" "Quahog" "RI")]}

The usage with the address like above is very similar to how I have used it.
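Since the grouping function can be anything callable, a keyword works too when the items are maps. The sketch below uses made-up data (the people vector is hypothetical) and adds a common follow-up step, counting the items in each group.

```clojure
;; Hypothetical data: maps instead of positional lists.
(def people [{:name "Homer" :city "Springfield"}
             {:name "Peter" :city "Quahog"}
             {:name "Marge" :city "Springfield"}])

;; A keyword is a function of a map, so it can be passed straight to group-by.
(def by-city (group-by :city people))

(map :name (get by-city "Springfield"))   ; => ("Homer" "Marge")

;; Turn each group into a count.
(into {} (for [[city residents] by-city]
           [city (count residents)]))
;; => {"Springfield" 2, "Quahog" 1}
```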


shuffle is a basic function that pseudo-randomly rearranges the elements of a collection using the basic java.util.Collections shuffle method.

(doc shuffle)
  Return a random permutation of coll

The usage of it is pretty intuitive:

(shuffle (range 1 10))
=> [6 8 1 2 4 3 7 9 5]

Pretty basic, but saved me some additional work.


reductions is an interesting function in that it's kind of a mingling of the map function and the reduce function. It's lazy like map, but you pass in an accumulator function (and optionally an initial value) like in reduce. It outputs a sequence of the intermediate accumulator values. As an example:

(reductions + 0 (range 1 10))
=> (0 1 3 6 10 15 21 28 36 45)

What I think is the most useful aspect of this is that it can maintain a lazy flow. So we can swap the finite range above for an infinite sequence:

(def natural-numbers (iterate inc 1))
(take 100 (reductions + 0 natural-numbers))
=> (0 1 3 6 10 15 21 28 36 45 55 66 78...)

One gotcha is the first element in the returned list. In the above two results, the initial value (0) was specified in the function call and it is also the first item returned in the result list.
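A quick side-by-side of the two arities shows the difference. When no initial value is given, the first element of the collection serves as the starting accumulator, so the result has one item fewer.

```clojure
;; With an initial value: the result starts with that value.
(reductions + 0 (range 1 10))   ; => (0 1 3 6 10 15 21 28 36 45)

;; Without one: the result starts with the first element of the collection.
(reductions + (range 1 10))     ; => (1 3 6 10 15 21 28 36 45)

;; Either way, the last element is what reduce would have returned.
(= (last (reductions + (range 1 10)))
   (reduce + (range 1 10)))     ; => true
```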


get-in is a useful function for getting items out of nested maps. Let's say we have a nested map that has a single letter as a key in the first map. The values at those keys are maps keyed by a two-character key, each of which has as its value a map with a three-character key, etc. To get the nested value of the third map, we could use get-in like below:

(def x {:a {:ab {:abc "123"}}, :b {:bc {:bcd "234"}}})
(get-in x [:a :ab :abc])
=> "123"

Thanks to Nate for showing me this function; I've since used it several times. Similar to this function are the assoc-in and update-in functions. They also operate on nested maps, but rather than retrieve a value they return an updated copy of the nested map (the original is immutable and stays unchanged). The documentation for get-in is below:

(doc get-in)
([m ks] [m ks not-found])
Returns the value in a nested associative structure,
where ks is a sequence of keys. Returns nil if the key is not present,
or the not-found value if supplied.
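Here is a short illustration of the not-found arity from the doc string, together with the assoc-in and update-in functions mentioned earlier, using the same map x as before.

```clojure
(def x {:a {:ab {:abc "123"}}, :b {:bc {:bcd "234"}}})

;; Missing key somewhere along the path: nil, or the supplied default.
(get-in x [:a :zz :abc])             ; => nil
(get-in x [:a :zz :abc] :missing)    ; => :missing

;; assoc-in sets the value at the end of the path.
(assoc-in x [:a :ab :abc] "999")
;; => {:a {:ab {:abc "999"}}, :b {:bc {:bcd "234"}}}

;; update-in applies a function (plus any extra args) to the current value.
(update-in x [:b :bc :bcd] str "5")
;; => {:a {:ab {:abc "123"}}, :b {:bc {:bcd "2345"}}}

;; Both return new maps; x itself is untouched.
(get-in x [:a :ab :abc])             ; => "123"
```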


spit is a very convenient function for writing the contents of a string to a file. Here are the docs for the function:

(doc spit)
([f content & options])
  Opposite of slurp.  Opens f with writer, writes content, then
  closes f. Options passed to clojure.java.io/writer.

Pretty self-explanatory. You don't need to worry about opening, flushing or closing a stream. Below is code that takes a string named info and outputs it to a file:

(def info "some info to be written to a file...")
(spit "/path/to/file/info.txt" info)
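A round trip with slurp makes the "opposite of slurp" part of the doc string concrete. This sketch writes to a temporary file instead of the placeholder path above, and also shows the :append option, which spit passes through to clojure.java.io/writer.

```clojure
(def info "some info to be written to a file...")

;; A temp file stands in for a real path; it is removed when the JVM exits.
(def tmp (doto (java.io.File/createTempFile "info" ".txt")
           (.deleteOnExit)))

(spit tmp info)
(slurp tmp)                        ; => "some info to be written to a file..."

;; By default spit overwrites; :append true adds to the end instead.
(spit tmp "\nmore" :append true)
(slurp tmp)                        ; => "some info to be written to a file...\nmore"
```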

Circular Lists with Clojure

I was working on some code that repeatedly executed a defined set of queries against a database for a given amount of time. Think throughput testing. The list of queries was very small (like 5) but the number of times each was executed was pretty high (500+). After it executed the 5th query, I wanted it to execute the first one again and continue on. So I thought the easiest way to go about this was to have a circular-type list. I figured in Clojure it probably wouldn't be circular, but rather a lazy list that repeated itself. I looked around the source a bit (the internet connection at work was down, so there was no Google searching) and I wasn't coming up with anything. Not finding anything that fit my needs, I thought about it and made a couple of attempts at a solution. I was pretty unhappy with the solutions I was coming up with. I went back to the Clojure source trying to figure out how best to implement this and I stumbled on cycle. It was exactly what I needed. What struck me about cycle was how concise and simple it was. First, its use:

(take 10 (cycle (list 1 2 3 4)))
;=> (1 2 3 4 1 2 3 4 1 2)

It takes the collection passed in and creates a lazy sequence of the items in the list repeated infinitely. I would keep consuming items from this infinite list for some defined amount of time and then stop:

(doseq [query (cycle (query-list...))
               :while (keep-going? @run-state)]
    (execute-query query...))

In the above code, run-state is an atom that is updated once a particular point in time has been reached, at which point the query run completes. What struck me about the cycle function was how simple it was:
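For anyone who wants to try the loop above, here is a self-contained variation. The query names, keep-going?, execute-query and the executed atom are all stand-ins I made up, and to keep the result deterministic the run-state atom is flipped after 12 executions rather than by a timer.

```clojure
;; Everything here is a hypothetical stand-in for the real harness.
(def queries ["q1" "q2" "q3" "q4" "q5"])
(def run-state (atom :running))
(def executed (atom []))

(defn keep-going? [state]
  (= state :running))

(defn execute-query
  "Record the query instead of hitting a database. After 12 executions,
   flip run-state, standing in for the timer in the real code."
  [query]
  (swap! executed conj query)
  (when (= 12 (count @executed))
    (reset! run-state :done)))

;; :while is checked before each item, so the loop stops as soon as
;; run-state changes, even though (cycle queries) is infinite.
(doseq [query (cycle queries)
        :while (keep-going? @run-state)]
  (execute-query query))

@executed
;; => ["q1" "q2" "q3" "q4" "q5" "q1" "q2" "q3" "q4" "q5" "q1" "q2"]
```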

(defn cycle
  "Returns a lazy (infinite!) sequence of repetitions of the
   items in coll."
  {:added "1.0"}
  [coll] (lazy-seq
          (when-let [s (seq coll)]
              (concat s (cycle s)))))

It's a very concise function. Here's a rundown of what it does. The lazy-seq macro takes as its body an expression that yields a sequence (or nil). It's lazy, so the body isn't evaluated until the sequence is actually consumed. Inside the lazy-seq call is a when-let, which just ensures that what is passed in is a collection that has at least one element in it. If it is not, the body of the lazy-seq is just nil, which gives an empty sequence. If it is, the sequence that lazy-seq is looking for comes from the concat call. concat is simple: it takes one or more collections and returns a single sequence containing all of the elements of each passed-in collection:

(concat (list 1 2 3 4) (list 5 6 7 8))
; =>(1 2 3 4 5 6 7 8)

With that info, the concat in cycle glues together the entire collection that was passed in, followed by a recursive call to itself (passing in the same collection). So with that, if the same 1, 2, 3, 4 list is passed in to cycle, the last line looks like:

(concat (list 1 2 3 4) (cycle (list 1 2 3 4)))

After the first four items are consumed, the recursive invocation of cycle will happen, again producing (list 1 2 3 4) followed by another recursive invocation. Since it's lazy and infinite, this can continue until the query execution time is up, whether it's 5 seconds or 5 hours.