Object Commando languages, development and design

14Sep/100

Clojure 1.2 New Functions

Version 1.2 of Clojure was released not too long ago with lots of new features. Protocols and a change to the metadata reader macro are some of the bigger differences (nice slides on these big changes can be found here). In addition to these bigger changes, a lot of very useful new functions were added to Clojure core. Below are some of the new ones that I have found useful, along with some info about them. Note that most of these existed in contrib before moving to core.

group-by

The group-by function takes a function as a first argument and a collection as the second:

(doc group-by)
-------------------------
clojure.core/group-by
([f coll])
  Returns a map of the elements of coll keyed by the result of
  f on each element. The value at each key will be a vector of the
  corresponding elements, in the order they appeared in coll.

A basic example of this would be to group a list of numbers by those that are odd and those that are even.

(group-by odd? (range 1 10))
=> {true [1 3 5 7 9], false [2 4 6 8]}

The above code calls odd? on each item from (range 1 10) and then puts the number in the true or false slot of the map, depending on whether or not it's odd. The important part here is that the function that is passed in could return anything, not just true and false. Similar to the above code, we could create groups keyed by the strings "odd" and "even":

(group-by #(if (odd? %) "odd" "even") (range 1 10))
=> {"odd" [1 3 5 7 9], "even" [2 4 6 8]}

Maybe we have a list of address data structures, which is just a list of the street, followed by the city, then state. We could group by city with a call like:

(def addresses (list '("742 Evergreen" "Springfield" "MO")
                             '("31 Spooner Street" "Quahog" "RI")
                             '("742 Evergreen" "Springfield" "VT")
                             '("742 Evergreen" "Springfield" "NT")))

(group-by second addresses)
=> {"Springfield" [("742 Evergreen" "Springfield" "MO")
                          ("742 Evergreen" "Springfield" "VT")
                          ("742 Evergreen" "Springfield" "NT")],
     "Quahog" [("31 Spooner Street" "Quahog" "RI")]}

The usage with the address like above is very similar to how I have used it.
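
The same idea works when the addresses are maps instead of positional lists. The sketch below is only an illustration: the address-maps data and the city-counts name are made up for this example. Because a keyword acts as a function of a map, it can be passed straight to group-by.

```clojure
;; Hypothetical address records, stored as maps rather than positional lists.
(def address-maps
  [{:street "742 Evergreen"     :city "Springfield" :state "MO"}
   {:street "31 Spooner Street" :city "Quahog"      :state "RI"}
   {:street "742 Evergreen"     :city "Springfield" :state "VT"}])

;; A keyword looks itself up in a map, so :city works as the grouping function.
(def by-city (group-by :city address-maps))

;; Turn each group into a count of the addresses it holds.
(def city-counts
  (into {} (map (fn [[city addrs]] [city (count addrs)]) by-city)))
;; city-counts => {"Springfield" 2, "Quahog" 1}
```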

shuffle

shuffle is a basic function that pseudo-randomly rearranges the elements of a collection using the basic java.util.Collections shuffle method.

(doc shuffle)
-------------------------
clojure.core/shuffle
([coll])
  Return a random permutation of coll

The usage of it is pretty intuitive:

(shuffle (range 1 10))
=> [6 8 1 2 4 3 7 9 5]

Pretty basic, but saved me some additional work.
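
Since the output is random, a sketch can only show properties of the result rather than a fixed answer. The names deck, shuffled and sample below are made up for illustration.

```clojure
(def deck (range 1 10))

;; shuffle returns a new vector; the input collection is left untouched.
(def shuffled (shuffle deck))

;; Same elements, just in some pseudo-random order.
(= (sort shuffled) (sort deck))   ; true
(vector? shuffled)                ; true

;; One way to pick three distinct elements at random.
(def sample (take 3 (shuffle deck)))
```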

reductions

reductions is an interesting function in that it's kind of a mingling between the map function and the reduce function. It's lazy like map, but you pass in an accumulator like in reduce. It outputs a lazy sequence of the intermediate accumulator values. As an example:

(reductions + 0 (range 1 10))
=> (0 1 3 6 10 15 21 28 36 45)

What I think is the most useful aspect of this is that it can maintain a lazy flow. So we can swap the 10 number range above with an infinite sequence

(def natural-numbers (iterate inc 1))
(take 100 (reductions + 0 natural-numbers))
=> (0 1 3 6 10 15 21 28 36 45 55 66 78...)

One gotcha is the first element in the returned list. In the above two results, the initial value (0) was specified in the function call and it is also the first item returned in the result list.
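
To make the gotcha concrete, here is a small sketch comparing the two arities. The def names are made up for illustration. Without an initial value, the first element of the result is simply the first element of the collection.

```clojure
;; No initial value: the result starts with the first item of the collection.
(def without-init (reductions + [1 2 3 4]))      ;=> (1 3 6 10)

;; With an initial value: that value is the first item of the result.
(def with-init (reductions + 0 [1 2 3 4]))       ;=> (0 1 3 6 10)

;; Any two-argument function works, for example a running maximum.
(def running-max (reductions max [3 1 4 1 5 9 2 6]))
;;=> (3 3 4 4 5 9 9 9)

;; The last intermediate value is what reduce would have returned.
(= (last with-init) (reduce + 0 [1 2 3 4]))      ;=> true
```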

get-in

This is a useful function for getting items out of nested maps. Let's say we have a nested map that has a single letter as a key in the first map. The value at each of those keys is a map keyed by a two-character key, which has a value of a map with a three-character key, etc. To get the nested value of the third map, we could use get-in like below:

(def x {:a {:ab {:abc "123"}}, :b {:bc {:bcd "234"}}})
(get-in x [:a :ab :abc])
=> "123"

Thanks to Nate for showing me this function, I've since used it several times. Similar to this function are the assoc-in and update-in functions. They are similar in that they operate on nested maps, but rather than retrieve a value they return a new map with the nested value changed (the original map is left untouched). The documentation for get-in is below:

(doc get-in)
-------------------------
clojure.core/get-in
([m ks] [m ks not-found])
Returns the value in a nested associative structure,
where ks is a sequence of ke(ys. Returns nil if the key is not present,
or the not-found value if supplied.
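
A short sketch of the not-found arity, along with assoc-in and update-in on the same map. The names y and z are made up for illustration.

```clojure
(def x {:a {:ab {:abc "123"}}, :b {:bc {:bcd "234"}}})

;; A missing key anywhere along the path gives nil...
(get-in x [:a :zz :abc])             ;=> nil
;; ...or the not-found value when one is supplied.
(get-in x [:a :zz :abc] :missing)    ;=> :missing

;; assoc-in returns a new map with the nested value replaced.
(def y (assoc-in x [:a :ab :abc] "999"))
(get-in y [:a :ab :abc])             ;=> "999"
(get-in x [:a :ab :abc])             ;=> "123" (x itself is unchanged)

;; update-in applies a function (plus extra args) to the nested value.
(def z (update-in x [:b :bc :bcd] str "!"))
(get-in z [:b :bc :bcd])             ;=> "234!"
```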

spit

spit is a very convenient function for writing the contents of a string to a file. Here's the docs for the function

(doc spit)
-------------------------
clojure.core/spit
([f content & options])
  Opposite of slurp.  Opens f with writer, writes content, then
  closes f. Options passed to clojure.java.io/writer.

Pretty self-explanatory. You don't need to worry about opening, flushing or closing a stream. Below is code that takes a string named info and outputs it to a file:

(def info "some info to be written to a file...")
(spit "/path/to/file/info.txt" info)
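
Here is a round trip with slurp as a quick sketch. It writes to a temporary file so it can be run anywhere; tmp-path is a name made up for this example. The :append option is one of the options passed through to clojure.java.io/writer.

```clojure
;; A throwaway file so the example does not depend on any real path.
(def tmp-path (.getPath (java.io.File/createTempFile "info" ".txt")))

;; By default spit overwrites whatever is in the file.
(spit tmp-path "first line\n")

;; With :append true the content is added to the end instead.
(spit tmp-path "second line\n" :append true)

;; slurp reads the whole file back as a string.
(slurp tmp-path)   ;=> "first line\nsecond line\n"
```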
12Aug/101

Circular Lists with Clojure

I was working on some code that repeatedly executed a defined set of queries against a database for a given amount of time. Think throughput testing. The list of queries was very small (like 5) but the number of times each was executed was pretty high (500+). After it executed the 5th query, I wanted it to execute the first one again and continue on. So I thought the easiest way to go about this was to have a circular-type list. I figured in Clojure it probably wouldn't be circular, but rather a lazy list that repeated itself. I looked around the source a bit (the internet connection at work was down, so there was no Google searching) and I wasn't coming up with anything. Not finding anything that fit my needs, I thought about it and made a couple of attempts at a solution. I was pretty unhappy with the solutions I was coming up with. I went back to the Clojure source trying to figure out how best to implement this and I stumbled on cycle. It was exactly what I needed. What struck me about cycle was how concise and simple it was. First, its use:

(take 10 (cycle (list 1 2 3 4)))
;=> (1 2 3 4 1 2 3 4 1 2)

It takes the collection passed in and creates a lazy sequence of the items in the list repeated infinitely. I would keep consuming items from this infinite list for some defined amount of time and then stop:

(doseq [query (cycle (query-list...))
               :while (keep-going? @run-state)]
    (execute-query query...))

In the above code, run-state is an atom that is updated once a particular point of time has been reached and then the query run completes. What struck me about the cycle function, was how simple it was:

(defn cycle
  "Returns a lazy (infinite!) sequence of repetitions of the
   items in coll."
  {:added "1.0"}
  [coll]
 (lazy-seq
          (when-let [s (seq coll)]
              (concat s (cycle s)))))

It's a very concise function. Here's a rundown of what it does. The lazy-seq macro takes as its body an expression that yields a sequence (or nil). It's lazy, so it doesn't actually realize the sequence until it is needed. Inside the lazy-seq call is a when-let, which just ensures that what is passed in is a collection with at least one element in it. If it is not, the when-let returns nil and the lazy sequence is empty. If it is, the sequence that the lazy-seq macro is looking for comes from the concat call. concat is simple: it takes one or more collections and returns a single sequence containing all of the elements of each passed-in collection:

(concat (list 1 2 3 4) (list 5 6 7 8))
; =>(1 2 3 4 5 6 7 8)

With that info, the concat in cycle glues together the entire collection that was passed in, followed by a recursive call to itself (passing in the same collection). So with that, if the same 1, 2, 3, 4 list is passed in to cycle, the last line looks like:

(concat (list 1 2 3 4) (cycle (list 1 2 3 4)))

After the first four items are consumed, the recursive invocation of cycle will happen, again producing (list 1 2 3 4) followed by another recursive invocation. Since it's lazy and infinite, this can continue until the query execution time is up, whether it's 5 seconds or 5 hours.
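
The query loop from earlier in this post can be tried out with stand-ins. Everything in this sketch other than cycle and doseq is hypothetical: the run-state atom here holds a countdown instead of reacting to a clock, keep-going? just checks that countdown, and execute-query records the query rather than hitting a database.

```clojure
;; Stand-in for the real run state: number of executions still allowed.
(def run-state (atom 7))

;; Stand-in predicate: keep going while the countdown is positive.
(defn keep-going? [remaining] (pos? remaining))

;; Stand-in for running a query: record it and tick the countdown.
(def executed (atom []))
(defn execute-query [query]
  (swap! executed conj query)
  (swap! run-state dec))

;; cycle is infinite, but :while ends the doseq as soon as the test fails.
(doseq [query (cycle [:q1 :q2 :q3])
        :while (keep-going? @run-state)]
  (execute-query query))

@executed   ;=> [:q1 :q2 :q3 :q1 :q2 :q3 :q1]
```

Because the sequence is lazy, only the seven items that were actually consumed are ever produced.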

13Jun/100

Mocking Clojure Protocols with Atticus

Atticus is a mocking library written by Hugo Duncan. For more information on Atticus, you can see Hugo's blog post from May here. I have added the ability to mock protocols to Atticus and would like some feedback as to the best approach for binding the mocked protocol instance. There's a survey below, but first a little background on the implementation.

Mocking Protocols

What makes the protocol instances a bit tough to mock is that they're not just straight Clojure functions. They dip a bit into Java and a bit into Clojure. With that in mind, Atticus without modification wouldn't allow the mocking of those protocols. Below is an example of some code that uses this new functionality:

(defprotocol Squared
  (square [impl x]))

(deftest mock-protocol-test
  (expects
   [(instance Squared
	      (square [impl y] (once (* y y))))]
   (is (= 9 (square instance 3)))))

Originally I had a marker function named mock-protocol-instance which didn't serve much purpose and was a bit awkward. Talking with Hugo, I switched it to the above syntax. The first item in that list is instance and is the symbol that the mocked protocol instance is bound to. The next item is the protocol followed by functions. The (once...) wrapped around the body of the function is existing functionality in Atticus that expands into the code to ensure the function was called exactly once. An example of mocking a regular function in Atticus is below:

(deftest test-cube
  (expects
   [(cube [y] (once (* y y y)))]
   (is (= 8 (cube 2)))))

The difference in this code and the first is that the binding syntax is familiar because it is similar to letfn. Below is an example of letfn:

(letfn [(add5 [x] (+ x 5))
	(subtract5 [x] (- x 5))]
  (= 7 (add5 (subtract5 7))))

In the style above, the first argument is the function name and so anything that refers to add5 in the body of the letfn gets the function bound above in the letfn. This letfn type of binding makes sense for Atticus when mocking functions. They both have similar goals: binding a function temporarily. Where this is a little more tricky is in mocking the protocol. In the first example above, the first element in the list is special. It's the symbol bound to the protocol instance. This is really more appropriate for a let style of binding, where one element is the symbol and the other is an expression. Unfortunately, switching to a let style binding for the expects macro will make the syntax a little more cumbersome for mocking functions because you would have to add "fn". This would probably look something like this:

(deftest mock-protocol-test-alt
  (expects
    [instance (Squared
                (square [impl y] (once (* y y))))
     cube (fn [y] (once (* y y y)))]
    (is (= 9 (square instance 3)))
    (is (= 8 (cube 2)))))

The above is just spitballing, but the key point is that expects would use a let binding style. The first is a let style binding of the protocol instance and the second is binding a function.
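
The trade-off between the two binding styles can be seen in plain Clojure, without Atticus involved at all. This is only a sketch of the syntax difference; the names letfn-result and let-result and the base binding are made up for illustration.

```clojure
;; letfn style: each binding is (name [args] body), so only functions fit.
(def letfn-result
  (letfn [(add5 [x] (+ x 5))
          (subtract5 [x] (- x 5))]
    (add5 (subtract5 7))))

;; let style: each binding is a symbol followed by any expression, so
;; functions need an explicit fn, but non-function values fit as well.
(def let-result
  (let [add5      (fn [x] (+ x 5))
        subtract5 (fn [x] (- x 5))
        base      7]
    (add5 (subtract5 base))))

[letfn-result let-result]   ;=> [7 7]
```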

Put it to a vote!

The question is, which one is better? Is sticking with the letfn binding and the brevity that it allows worth it even though the protocol mocking is a bit different? The letfn style is the first and second code examples above. Or is it confusing enough to warrant a little extra code around mocking functions (the example immediately above). Is there another approach that would be better? Below is just a quick survey on which one is preferable. Thanks for giving your input!


10Jun/103

Clojure Futures

Futures in Clojure are basically a way to execute some bit of code on a background thread. I was using them as a way to allow timeouts for long-running code. In this entry I'll give a rundown on how to use futures.

Future Basics

It is pretty easy to start using futures in Clojure. Most of the function calls start with (future...). First I'll start by creating a future that calls a long running sleep function:

(def f
  (future
    (println "Starting to sleep...")
    (Thread/sleep 600000)
    (println "Done sleeping.")))

There are two ways via Clojure to create a future: the first is with the future macro (used above), the other is the future-call function. The future macro is just syntactic sugar around future-call. Whatever you pass in to future will be wrapped in a no-arg function and passed to future-call. The future-call function just invokes the passed-in function on a background thread. The above code, even though it sleeps for ten minutes (600000 milliseconds), will return immediately. The object returned is a future (intentionally not getting into details about what is returned right now). This returned future allows you to peek inside the execution of the code passed in and get information like whether it is done or has been canceled:

(future-done? f)
=> false

(future-cancelled? f)
=> false

You can then decide to cancel the future:

(future-cancel f)
=>true

(future-cancel f)
=>false

The above will return false when it was not able to cancel the future. As an example, if we try to cancel the future again, it will return false, because it has already been canceled. Another set of useful functions is getting the value returned by the executing future. Let's say we're still computing Fibonacci numbers in the slow, recursive way:

(defn fib [x]
  (cond (zero? x) 0
            (= 1 x) 1
            :else
	    (+ (fib (- x 1)) (fib (- x 2)))))

Computing (fib 40) on my laptop takes about 10 seconds. Below is code to execute the code in a future and then use deref to pull the value out:

(def f (future-call #(fib 40)))
=> #'user/f

(time @f)
=> "Elapsed time: 8075.452326 msecs"
102334155

(time @f)
=> "Elapsed time: 0.082342 msecs"
102334155

The first deref (@) call will block until the future has completed running and then return the value. So timing the deref, it took about 8 seconds until returning the value. But if it has already computed the value, it will just return that computed value. So the second deref returns very quickly.
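
The pieces covered so far can be pulled together in one small sketch. The names a, b, sleeper and states are made up for illustration, and the sleep is only there to keep the future busy long enough to cancel it.

```clojure
;; The future macro and future-call with a no-arg function are equivalent.
(def a (future (+ 1 2)))
(def b (future-call (fn [] (+ 1 2))))
[@a @b]   ;=> [3 3]

;; A future that would run for a minute if left alone.
(def sleeper (future (Thread/sleep 60000) :never-returned))

;; Record each answer in order as the future is cancelled.
(def states
  [(future-done? sleeper)        ; false, still sleeping
   (future-cancel sleeper)       ; true, the cancel succeeded
   (future-cancelled? sleeper)   ; true
   (future-done? sleeper)        ; true, a cancelled future counts as done
   (future-cancel sleeper)])     ; false, it was already cancelled
;; states => [false true true true false]
```

Dereferencing a cancelled future throws a java.util.concurrent.CancellationException rather than returning a value.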

Timeouts

One thing that is hidden from users of futures in Clojure is what is actually returned when calling (future...). I say that it's hidden because when you use future-cancel, future-done? etc, it doesn't matter what kind of object is returned from the future function call. It's some object that you are able to pass to these other functions and it just works. Where you actually do need to know more about the implementation is when you want to have your future time out. The object returned by future-call is an implementation of java.util.concurrent.Future. With this information, you can use the Java APIs for the Future. Below is some code that creates a future that will time out before it finishes:

(def f (future-call #(fib 50)))
(.get f 10 java.util.concurrent.TimeUnit/SECONDS)

The last line does the same thing that we did before when we deref'd the future, but this time there is a timeout on how long we will wait for that deref to happen. If the timeout is reached (in this case 10 seconds), it will throw a java.util.concurrent.TimeoutException. Armed with the knowledge that the future is actually a java.util.concurrent.Future, the deref is just a more Clojure way of calling the get method on the future:

(def f (future-call #(fib 40)))
=> #'user/f

(time (.get f))
=> "Elapsed time: 10713.176672 msecs"
102334155

(time (.get f))
=> "Elapsed time: 0.075569 msecs"
102334155
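
To turn the timeout into a value instead of an exception, the call to get can be wrapped in a try. The sketch below uses made-up names (slow, via-get, via-deref) and a short sleep in place of fib. The last form assumes Clojure 1.3 or later, where deref gained a three-argument form taking a timeout in milliseconds and a value to return on timeout; it is not available in 1.2.

```clojure
(import '(java.util.concurrent Future TimeUnit TimeoutException))

;; A future that takes far longer than we are willing to wait.
(def slow (future (Thread/sleep 5000) :finished))

;; Java API: get with a timeout throws TimeoutException when time runs out.
(def via-get
  (try
    (.get ^Future slow 50 TimeUnit/MILLISECONDS)
    (catch TimeoutException _ :timed-out)))

;; Clojure 1.3+ only: deref with a timeout (ms) and a timeout value.
(def via-deref (deref slow 50 :timed-out))

;; Timing out does not stop the work, so cancel it explicitly.
(future-cancel slow)

[via-get via-deref]   ;=> [:timed-out :timed-out]
```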

Hey, where did my console output go?

I use Emacs and Slime for my development environment. One thing that I noticed was that code running in a future does not write to the console like code running from the Slime REPL. This is because code running in the Slime REPL gets the bound variables from the REPL thread, whereas the future code runs in a different thread that does not have those variables bound. bound-fn makes it easy to fix this problem:

(def f (future-call #(println "Hello World")))
=>#'user/f

(def f (future-call (bound-fn [] (println "Hello World"))))
=>Hello World
#'user/f
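
The same effect can be shown without a console by using a var of our own and a raw thread. This is a sketch with made-up names (*label*, run-on-new-thread, plain, carried). It uses the ^:dynamic metadata that Clojure 1.3 and later require for rebindable vars. As far as I know, futures in 1.3 and later convey the caller's bindings on their own, so a raw Thread is used here to show what bound-fn adds.

```clojure
;; A rebindable var with a root value.
(def ^:dynamic *label* "root")

(defn run-on-new-thread
  "Runs the no-arg function f on a brand-new thread and returns its result."
  [f]
  (let [result (promise)]
    (.start (Thread. ^Runnable (fn [] (deliver result (f)))))
    @result))

;; A plain fn sees only the root value on the other thread.
(def plain
  (binding [*label* "bound"]
    (run-on-new-thread (fn [] *label*))))

;; bound-fn captures the bindings in effect where it is created.
(def carried
  (binding [*label* "bound"]
    (run-on-new-thread (bound-fn [] *label*))))

[plain carried]   ;=> ["root" "bound"]
```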
17May/104

Easy Java Interop with Clojure

This past week I started writing some code to work with Amazon EC2 Instances. I started using the JClouds library. It's a great library for spinning up public AMIs in a cloud-neutral way, however it didn't do everything that I needed. Some of the things I needed were EC2 specific, so that's not so surprising. I fell back to the AWS SDK for Java, which basically just wraps calls to the Amazon web services. Using that library, I wrote some Clojure functions that wrapped the Java calls to do what I needed. Examples of what I needed would be to start up an existing EC2 EBS backed instance, stop an EBS instance and determine what state an EBS VM is in. This led to Clojure code that would build up some request objects and interpret some response POJOs. The API is a little awkward, even using Java. Starting, stopping and describing an instance via the API all require one thing: one or more instance ids. In the Amazon API, they have created a separate class for each (DescribeInstancesRequest, StopInstancesRequest and StartInstancesRequest) and have those classes include a method where you setInstanceIds rather than just calling a method and passing a list of Strings (or something similar). Working with this API helped me learn more about the Clojure Java Interop functions.

dot dot

The first feature that made my life easier was .. . This is a macro for chaining Java calls. It takes code that in Java would look like this:

instance.getState().getName()

And puts a similar feel in Clojure:

(.. instance getState getName)

Without this macro, you would have to:

(.getName (.getState instance))

It takes the return value of the first part (the inner expression above) and passes it into the second. The code above works great for chained method calls, but doesn't help much with side effects.
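
The same chaining can be tried at the REPL with nothing but java.lang.String, which avoids needing the AWS classes. This is just a sketch; the def names are made up for illustration. A method that takes arguments goes in parentheses inside the .. form.

```clojure
;; Java: "hello world".toUpperCase().substring(0, 5)
(def with-dot-dot (.. "hello world" toUpperCase (substring 0 5)))

;; The same calls written inside-out, without the macro.
(def nested (.substring (.toUpperCase "hello world") 0 5))

;; The general threading macro -> reads the same way with .method forms.
(def threaded (-> "hello world" .toUpperCase (.substring 0 5)))

[with-dot-dot nested threaded]   ;=> ["HELLO" "HELLO" "HELLO"]
```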

doto

I found doto useful when I needed to call setters in constructing POJOs. A call to determine the status of a running EC2 instance returns an object graph of several nested objects, which was particularly awkward to test since there were a decent number of objects to construct. Before I realized doto could help me, I had code that looked like below:

(defn single-instance-result-example []
    (let [reservation-list (ArrayList.)
           instance-list (ArrayList.)
           reservation (Reservation.)
           instance (Instance.)
           instance-result (DescribeInstancesResult.)]
       (.setInstanceId instance "testinstance")
       (.add instance-list instance)
       (.setInstances reservation instance-list)
       (.add reservation-list reservation)
       (.setReservations instance-result reservation-list)
       instance-result))

I then transformed this into some code that used some nested dotos

(defn single-instance-result-example []
  (doto (DescribeInstancesResult.)
    (.setReservations
      (doto (ArrayList.)
        (.add
          (doto (Reservation.)
            (.setInstances
              (doto (ArrayList.)
                (.add
                  (doto (Instance.)
                    (.setInstanceId "testinstance")))))))))))

I must admit that the dotos here took some time to get comfortable with. One interesting thing to note is that the above does not include all of the intermediate references to objects. In the first example above I always passed the objects in, such as (.setInstanceId instance "testinstance"). This is no longer necessary with doto. The intermediate let-bound variables are also not necessary. Seeing the above code, I felt like I still had some room for improvement. In the above example and in other areas of my code I was seeing a common pattern:

(doto (ArrayList.)
    (.add (doto...))
    (.add (doto...))
     ...)

So I created a quick macro that I called doto-list that would bundle that piece up:

(defmacro doto-list [& forms]
     `(doto (ArrayList.)
         ~@(map (fn [item] `(.add ~item)) forms)))

Which then made the function look like:

(defn single-instance-result-example []
  (doto (DescribeInstancesResult.)
    (.setReservations
      (doto-list
        (doto (Reservation.)
          (.setInstances
            (doto-list
              (doto (Instance.)
                (.setInstanceId "test")))))))))

Which I think is a nice improvement when I'm creating a decent amount of ArrayLists.
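
The macro can be exercised with plain JDK classes, without the AWS SDK on the classpath. In this sketch the names actions, request and copied are made up, and a HashMap stands in for the request POJOs. It also shows an alternative that needs no macro: ArrayList has a constructor that accepts any Collection, so a Clojure vector can be passed directly.

```clojure
(import '(java.util ArrayList HashMap))

;; The macro from the post: each form becomes an (.add ...) call in a doto.
(defmacro doto-list [& forms]
  `(doto (ArrayList.)
     ~@(map (fn [item] `(.add ~item)) forms)))

;; Builds an ArrayList holding the three strings, in order.
(def actions (doto-list "start" "stop" "describe"))

;; It nests inside other doto forms, here with a HashMap as a stand-in.
(def request
  (doto (HashMap.)
    (.put "instance-ids" (doto-list "i-1" "i-2"))))

;; Macro-free alternative: copy a Clojure vector into an ArrayList.
(def copied (ArrayList. ["start" "stop" "describe"]))
```

The macro still earns its keep when the elements are themselves built with doto, since each one can be written inline.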

memfn

The next piece I used in integrating with the AWS APIs was memfn. The call to describeInstances returns a List of Instance objects. There are quite a few fields on the Instance object (20+) and I was only interested in a few. Furthermore, I did not want the callers of the functions to have to know they were dealing with Java objects. One option was bean, which transforms an object into a map of its bean properties. It seemed like it would work for me, but it would pull over the fields I cared about along with a lot that I didn't. It would also require knowledge of the Java object and the generated map structure. I thought a better way to do this would be memfn. The memfn macro takes a method name (and optionally arguments) and returns a function that takes an object as a parameter. The function then invokes the method on that object when called. Getting an instance's public IP address with it looks like:

((memfn getPublicIpAddress) some-object)

This seemed closer to what I wanted, but what I really wanted was a function called "public-ip" that you could pass an instance to. So I ended up attaching a name to the memfn function that was returned by creating a little macro:

(defmacro defmemfn
     [name method-name & args]
     `(def ~name (memfn ~method-name ~@args)))

A call to defmemfn looks like:

(defmemfn public-ip getPublicIpAddress)

And asking for a public ip looks like:

(public-ip instance)
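
The macro can be tried against java.lang.String in place of the AWS Instance class. This is a sketch: upper-case, sub-string and upper-case-fn are names made up for illustration. Extra symbols after the method name become the arguments of the generated function.

```clojure
;; The macro from the post: give the function returned by memfn a name.
(defmacro defmemfn
  [name method-name & args]
  `(def ~name (memfn ~method-name ~@args)))

;; No-argument method: (upper-case s) calls s.toUpperCase().
(defmemfn upper-case toUpperCase)

;; Method with arguments: (sub-string s start end) calls s.substring(start, end).
(defmemfn sub-string substring start end)

(upper-case "abc")               ;=> "ABC"
(sub-string "hello" 1 3)         ;=> "el"

;; Being ordinary functions, they work with map and friends.
(map upper-case ["a" "b"])       ;=> ("A" "B")

;; An anonymous function with a type hint does the same job as memfn.
(def upper-case-fn #(.toUpperCase ^String %))
```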

Up Next - Testing

In conclusion, I think that working with the AWS APIs was easy thanks to Clojure's great Java integration. Testing this code was also much easier than I thought, which I'll post about next.