Object Commando languages, development and design

13Oct/114

Appendo the Great

I have been spending a some time recently trying to get a better understanding of logic programming. I was very much inspired by David Nolen's core.logic library. I am reading several different papers and books trying to understand the paradigm. All of them talked about an append function, but I was having difficulty understanding why it was so important.

From the Reasoned Schemer, conso, the relational form of cons, gets the title conso the magnificent. And conso certainly is magnificent, but I think appendo should get some credit too. In core.logic and miniKanren, append is called appendo. Like concat in Clojure it appends two lists together to create a third list. It can be used as a building block for many other functions. The definition of the function (from core.logic) is pretty small:

(defne appendo [x y z]
  ([() _ y])
  ([[?a . ?d] _ [?a . ?r]] (appendo ?d y ?r)))

defne above is syntax sugar around matche. It creates a function (named appendo) that accepts three arguments. It matches x, y and z with the three arguments in the list at the beginning of each clause. In this case the first clause is matched cons style. ?a is declared as the first item and ?d is declared as the rest. The second item is ignored, the third is split in the same way. Note that the ?a is shared here on the first and third parameter of the match. This makes the first item in x, the first item in z. Assuming the x and z can be matched as above, appendo is recursively called on the rest of each list (including y, untouched). Something similar to the shared ?a is going on when the recursion bottoms out (when x is empty). In this clause y is declared as z. This makes y the tail of a cons like expression (specifically the ?r) from the previous recursive call. That's a lot packed into such a small amount of code! Below I will build the case for appendo's new title.

Building Blocks

To run the below code, you need to use the right namespaces. Something like this:

(ns logic-examples.seq
  (:refer-clojure :exclude [inc reify ==])
  (:use [clojure.core.logic minikanren prelude nonrel match disequality]))

I thought I understood the appendo function, but I didn't understand why each source spent a lot of time on it. It seemed straight-forward enough. Then I had that ah-ha moment while reading up on Prolog's prefix and suffix functions. Here's something roughly equivalent in core.logic that I wrote before my ah-ha:

(defne prefixo [x y]
  ([[] _])
  ([[?a . ?xs] [?a . ?ys]] (prefixo ?xs ?ys)))

This declares that x is a prefix of y. So it could be used to delcare an x as [1 2] and y as [1 2 3 4 5]. Like above, it uses defne, matching in a similar way. I'm using the shared ?a? to ensure the first item of x is the first item of y and continue recursively. I then began to think of appendo as less like concat in Clojure and more like what list could I concat with this other list to yield some result list (user specified or not). That realization led me to change my prefixo function to be the following:

  (defn prefixo2 [x y]
    (fresh [a]
           (appendo x a y)))

This function appends x with a new (or fresh) variable to yield the value y. So substituting values into the appendo call (using the above example): (appendo [1 2 3] a [1 2 3 4 5]). When unified, a is obviously [4 5] because appending [1 2 3] with [4 5] would satisfy the result list [1 2 3 4 5]. With some great core.logic magic, a will match anything that would appear after items in x in order to satisfy y. Part of the mind-bending nature of logic programming is the mixture of declaring, binding, predicates and the wiggle room that result variable gives you. As an example, there's no need to provide a defined y in the above call:

(run* [q]
  (prefixo2 [1 2 3] q))
= > ((1 2 3 . _.0))

This code runs prefixo but leaves y fresh. This lets core.logic decide what the value should be. So it is asking what will satisfy prefixo2 when x is [1 2 3]. The answer from core.logic is (1 2 3 . _.0). ._0 is the tail of the list and could represent anything, meaning that y could be a list with 1, 2, 3 followed by nothing or 4 or ["Klaatu" "Barada" "Nikto"]. These arbitrary values are numbered to distinguish between different values in the case that there could be multiple arbitrary values that could be (but don't have to be) different. More examples of _.0 are below. The above code will also handle other constraints on the list:

(defn lasto
  "Declares x as the last item in y"
  [x y]
  (fresh [a]
         (appendo a [x] y)))

(run 3 [q]
  (lasto "Hail to the king baby" q)
  (prefixo2 [1 2 3] q))
= > ((1 2 3 "Hail to the king baby")
        (1 2 3 _.0 "Hail to the king baby")
        (1 2 3 _.0 _.1 "Hail to the king baby"))

This basically says give me three lists where 1, 2, 3 is the prefix and the last element is "Hail to the king baby". A proper [ Army of Darkness ending. The easiest one is the list containing only those four items, which is the first returned. The next solution says any one element can exist between 3 and "Hail to the king baby". The next solution says any two elements can be between the theird and the last elements etc. Passing 3 to run tells core.logic how many solutions you want it to come up with. Using run* will ask for all solutions (not a good idea with this example code). This code has two fresh variables _.0 and _.1. These could be any value. They could be different values or they could both be the number 42. They are labeled distinctly because they could have distinct values.

More appendo fun

Seeing what I could do with prefix, I then followed that with suffix:

(defn suffixo
  "Declares the elements of x as the end of list y"
  [x y]
  (fresh [a]
         (appendo a x y)))

Nothing new there. membero is a function in core.logic that declares that x (an element) occurs in a list y. A version can also be built out of appendo (though probably not as efficiently):

(defn my-membero
  "A (probably inefficient) function stating x is an element in y"
  [x y]
  (fresh [a b]
         (appendo a (llist x b) y)))         

(run* [q]
     (my-membero q [1 2 3 4 5]))
= > (1 2 3 4 5)

(run 5 [q]
  (my-membero 3 q))
= > ((3 . _.0) (_.0 3 . _.1) (_.0 _.1 3 . _.2) (_.0 _.1 _.2 3 . _.3) (_.0 _.1 _.2 _.3 3 . _.4))

The first example of it's use gives a list of [1 2 3 4 5] and asks core.logic to come up with all of the members (which is just each element). The second asks the opposite question. Declaring that 3 is in the list, what are the lists that would satisfy it. The first result has 3 as the first element (followed by anything), the second list has 3 as the second element (followed by anything) and so on.

Using appendo to find sublists was interesting:

(defn sublisto
  "The list x (in sequence) appears in y"
  [x y]
  (fresh [a b c]
         (appendo a b y)
         (appendo x c b)))

= > (set (run* [q]
        (sublisto q [1 2 3 4 5])))
= > #{() (1) (2) (1 2 3) (2 3) (3) (3 4) (4 5) (2 3 4) (1 2 3 4 5) (1 2) (2 3 4 5) (4) (3 4 5) (1 2 3 4) (5)}

Chaining it together with my-membero can find all sublists containing a particular number:

(set (run* [q]
       (sublisto q [1 2 3 4 5])
       (my-membero 2 q)))
= > #{(2) (1 2 3) (2 3) (2 3 4) (1 2 3 4 5) (1 2) (2 3 4 5) (1 2 3 4)}

Or with the help of adjacent, we can find all sublists where two numbers occur next to each other:

(defn adjacent
  "Declares x and y as adjacent items in list z"
  [x y z]
  (fresh [a b]
    (appendo a (llist x y b) z)))

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 5 1 q)))         

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

This will only find sublists that have x followed by y, and it must be in that order. To make adjacent a little more flexible, the below version will find x and y next to each other, regardless of order:

(defn adjacent
  "Declares x and y as adjacent items in list z"
  [x y z]
  (fresh [a b]
         (conde
          ((appendo a (llist x y b) z))
          ((appendo a (llist y x b) z)))))

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 5 1 q)))         

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 1 5 q)))

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

(= (set (run* [q]
          (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
          (adjacent 5 1 q)))
   (set (run* [q]
          (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
          (adjacent 1 5 q))))

= > true

That's a lot of power from a little function. I think appendo deserves the title. The code for the functions above can also be found here.

22Feb/110

CouchDB with Clojure

IBM developerWorks just published an article I wrote on CouchDB and Clojure. It uses clj-http, Clutch and CouchDB. It aims to cover the REST APIs of CouchDB by way of Clojure, using parallel examples in Clutch and straight HTTP calls using clj-http. It also discusses CouchDB views and replication. Big thanks to Tunde for Clutch, Mark for clj-http, the CouchDB folks and Alex for reviewing the article.

25Oct/102

First Clojure Conj Highlights

The last couple of weeks has been busy! First Strange Loop then the first Clojure Conj. The conference was the first Clojure conference of it's kind and was packed with technical content. It was single track so I didn't have to worry about missing anything. Below I'll summarize some of the highlights of the conference from my perspective.

(not= DSL macros) Christophe Grand

A very valuable talk. Christophe's central theme was that many of the gains in syntax that can be obtained from macros can also be achieved using functions and other Clojure fundamentals. This was the beginning of a theme for the conference. He gave specific examples of this with his framework enlive. His stories were around users wanting to extend some functionality provided for by the library. Since much of this was implemented with macros, the users of his library ended up being frustrated and were not able to do what they wanted. He then made some substantial changes to the codebase to implement much of the functionality that was in the macros as functions. Backing these by functions with a smaller macro layer on top enabled his users to better take advantage of his framework. This boiled down to the realization that the DSL can exist in functions through clear naming and composibility as well as in the macros. He gave some excellent examples in the construction of a function based DSL for regular expressions (code here)that ended up having functionality not possible with the normal regular expression DSL (or with macros). I've got it on my TODO list to took over Christophe's enlive and regular expression code more in-depth.

Clojure Protocols - Sean Devlin

I'm pretty familiar with protocols and have used them at Revelytix quite a bit, but I still got a lot of value out of Sean's talk. His talk was mainly around creating a protocol of java.util.Date and other date like types in Java. His example created a protocol with a to-ms function that basically converted some sort of date representation to a long representation of that date. From that small abstraction he built many functions that were consumers of that abstraction. I think this was interesting because it was a small abstraction that was really only used internally in that source file. The functions that were more useful to consumers of the abstraction were not extensions of the protocol, but rather used the protocol internally. I thought this was particularly elegant because they need to know nothing about protocols or even that code used them.

Finger Trees - Chris Houser

Chouser gave a great talk on a data structure I had not heard of before, called Finger Trees. This is a new (not quite done yet) feature that will be included in Clojure that can have benefits over the other Clojure persistent data structures for specific use cases. His slides can be found out on github for more info. Seemed to me like the biggest wins of the data structure are the amortized constant time splitting/appending and counting. It achieves this through storing metadata at each tree root node, which makes summarization very easy. I read over some of the code as he was speaking (the code can be found here) and am looking forward to going over it more in-depth.

Keynote - Rich Hickey

It was good to hear Rich Hickey speak in person. This was the first time I've heard him. He went over some of the things to come in the near term for Clojure. I would say the main focus of his talk was on improving the performance of Clojure. One of those performance improvements was declaring when variables are allowed to be rebound. He discussed the need to do some checks to see if something has been rebound and most of the time there is little to no chance that something has been rebound. A good examples of this are the vast majority of the time functions are not rebound or redefined in production code (typically only at the REPL when doing development), yet this overhead always happens. He went over a new bit of metadata that would declare a variable as able to be rebound. I think this is a good change even without performance in mind. Right now we already have a convention that we put asterisks (or ear muffs) around variables that we intend for rebinding. This just codifies that convention and we get a performance boost as a bonus. He also went in depth on some Java primitive performance changes he was making. This also follows a similar theme in that it is a reduction in the complexity of the various primitive types and a performance boost as a bonus. The mismatch being Java primitives, auto-boxing and Clojure can be pretty confusing. His changes simplify many of those problems and through some new function subclasses, auto-boxing can be avoided. There was a lot of useful nuggets from Rich's talk.

One Ring to Bind Them - Mark McGranaghan

This was a talk on ring specifically but I would say it had more to do with design of good Clojure APIs. He covered how abstractions built on top of Clojure sequences with well-named functions could create a very appealing API. This talk was the crescendo of the theme that using Clojure fundamentals can lead to great code. As a testament to the ease of use of the library, there are quite a few other libraries that have been built on top of it.

From Concurrency to Parallelism - David Liebke

David gave a talk on some future Clojure functionality of providing more parallelism options. This was a great talk to have attended after hearing Guy Steele at Strange Loop. Much of what Guy discussed is in this experimental branch of Clojure. Slides of his talk can be found here. He started with the comparing map and pmap and then went over how pmap is different from the new parallel reduce stuff. He gave some example stats on the performance characteristics of these things. He had a pretty big caveat on how useful the stats were, but I think they did a good job of illustrating his point. Implementing this was definitely complex and David went into that a bit, but using the functions seemed very idiomatic and not much different from using plain reduce. It will be exciting to see this progress.

Step Away from the Computer - Rich Hickey

It was the keynote of the conference from Rich. It was a non-technical talk that focused on thinking things through without distractions before writing code or "solving" a problem. He went over some "how your brain works" that seemed similar to Pragmatic Thinking and Learning. He stressed the importance of thorough research, notes and evaluation when solving a problem. I think the areas that he discussed are something that developers can always improve upon.

Conclusion

There were many other good talks that I did not mention. It was good to get an update on Clojure support in Eclipse, hear some of the motivations behind lazy test and zippers. The lightening talks were also good, especially the ones covering Aleph and Infer. There was also a lightening talk by Alex Miller on zippers over records that we have been doing here at Revelytix. There was definitely a lot of excitement around Clojure at the conference. It was a very friendly atmosphere and lots of discussion in-between and after sessions. I talked with several people interested in semantic web technologies, triplestores and the work we're doing at Revelytix. It has encouraged me to blog more about it!

14Sep/100

Clojure 1.2 New Functions

Version 1.2 of Clojure was released not too long ago with lots of new features. Things like protocols, a metadata reader macro change etc are some of the bigger differences (nice slides on these big changes can be found here). In addition to these bigger changes, there are also a lot of very useful new functions added to Clojure core. Below are some of the new ones that I have found useful along with some info about them. Note that most of these existed in contrib before moving to core.

group-by

The group-by function takes a function as a first argument and a collection as the second:

(doc group-by)
-------------------------
clojure.core/group-by
([f coll])
  Returns a map of the elements of coll keyed by the result of
  f on each element. The value at each key will be a vector of the
  corresponding elements, in the order they appeared in coll.

A basic example of this would be to group a list of numbers by those that are odd and those that are even.

(group-by odd? (range 1 10))
=> {true [1 3 5 7 9], false [2 4 6 8]}

The above code calls odd? on each item from (range 1 10) and then puts the number in the true or false slot of the map, depending on whether or not it's odd. The important part here is that the function that is passed in could return anything, not just true and false. Similar to the above code, we could create groups with the string "odd" and "even"

(group-by #(if (odd? %) "odd" "even") (range 1 10))
=> {"odd" [1 3 5 7 9], "even" [2 4 6 8]}

Maybe we have a list of address data structures, which is just a list of the street, followed by the city, then state. We could group by city with a call like:

(def addresses (list '("742 Evergreen" "Springfield" "MO")
                             '("31 Spooner Street" "Quahog" "RI")
                             '("742 Evergreen" "Springfield" "VT")
                             '("742 Evergreen" "Springfield" "NT")))

(group-by second addresses)
=> {"Springfield" [("742 Evergreen" "Springfield" "MO")
                          ("742 Evergreen" "Springfield" "VT")
                          ("742 Evergreen" "Springfield" "NT")],
     "Quahog" [("31 Spooner Street" "Quahog" "RI")]}

The usage with the address like above is very similar to how I have used it.

shuffle

shuffle is a basic function that pseudo-randomly rearranges the elements of a collection using the basic java.util.Collections shuffle method.

(doc shuffle)
-------------------------
clojure.core/shuffle
([coll])
  Return a random permutation of coll

The usage of it is pretty intuitive:

(shuffle (range 1 10))
=> [6 8 1 2 4 3 7 9 5]

Pretty basic, but saved me some additional work.

reductions

reductions is an interesting function in that it's kind of a mingling between the map function and the reduce function. It's lazy like map, but you pass in an accumulator like in reduce. It outputs a list of the intermediate accumulator values. As an example

(reductions + 0 (range 1 10))
=> (0 1 3 6 10 15 21 28 36 45)

What I think is the most useful aspect of this is that it can maintain a lazy flow. So we can swap the 10 number range above with an infinite sequence

(def natural-numbers (iterate inc 1))
(take 100 (reductions + 0 natural-numbers))
=> (0 1 3 6 10 15 21 28 36 45 55 66 78...)

One gotcha is the first element in the returned list. In the above two results, the initial value (0) was specific in the function call and it is also the first item returned in the result list.

get-in

This is a useful function for getting items out of nested maps. Let's say we have a nested map that has a single letter as a key in the first map. The value at those keys are a map keyed by a two character key which has a value of a map with a three character key etc. To get the nested value of the third map, we could use get-in like below;

(def x {:a {:ab {:abc "123"}}, :b {:bc {:bcd "234"}}})
(get-in x [:a :ab :abc])
=> "123"

Thanks to Nate for showing me this function, I've since used it several times. Similar to this function is the assoc-in and the update-in function. They are similar in that they operate on nested maps, but they modify the nested map, rather than retrieve a value. The documentation for get-in is below:

(doc get-in)
-------------------------
clojure.core/get-in
([m ks] [m ks not-found])
Returns the value in a nested associative structure,
where ks is a sequence of ke(ys. Returns nil if the key is not present,
or the not-found value if supplied.

spit

spit is a very convenient function for writing the contents of a string to a file. Here's the docs for the function

(doc spit)
-------------------------
clojure.core/spit
([f content & options])
  Opposite of slurp.  Opens f with writer, writes content, then
  closes f. Options passed to clojure.java.io/writer.

Pretty self explainatory. You don't need to worry about opening, flushing or closing a stream. Below is code that takes a string named info and outputs it to a file:

(def info "some info to be written to a file...")
(spit "/path/to/file/info.txt" info)
12Aug/101

Circular Lists with Clojure

I was working on some code that repeatedly executed a defined set of queries against a database for a give amount of time. Think throughput testing. The list of queries was very small (like 5) but the number of times each was executed was pretty high (500+). After it executed the 5th query, I wanted it to execute the first one again and continue on. So I thought the easiest way to go about this was to have a circular-type list. I figured in Clojure it probably wouldn't be circular, but rather a lazy list that repeated itself. I looked around the source a bit (the internet connection at work was down, so there was no Google searching) and I wasn't coming up with anything. Not finding anything that fit my needs, I thought about it and made a couple of attempts at a solution. I was pretty unhappy with the solutions I was coming up with. I went back to the Clojure source trying to figure out how best to implement this and I stumbled on cycle. It was exactly what I needed. What struck me about cycle, was how concise and simple it was. First, it's use:

(take 10 (cycle (list 1 2 3 4)))
;=> (1 2 3 4 1 2 3 4 1 2)

It takes the collection passed in and creates a lazy sequence of the items in the list repeated infinitely. I would keep consuming items from this infinite list for some defined amount of time and then stop:

(doseq [query (cycle (query-list...))
               :while (keep-going? @run-state)]
    (execute-query query...))

In the above code, run-state is an atom that is updated once a particular point of time has been reached and then the query run completes. What struck me about the cycle function, was how simple it was:

(defn cycle
  "Returns a lazy (infinite!) sequence of repetitions of the
   items in coll."
  {:added "1.0"}
  [coll]
 (lazy-seq
          (when-let [s (seq coll)]
              (concat s (cycle s)))))

It's a very concise function. Here's a rundown of what it does. The lazy-seq macro takes as it's body an expression that yields a sequence (or nil). It's lazy, so it doesn't actually fully realize the sequence. Inside the lazy-seq call is a when-let which just ensures that what is passed in is a collection that has at least one element in it. If it is not a collection of at least one element, the lazy-seq is just nil. If it does, the sequence that the lazy-seq function is looking for comes from the concat statement. Concat is simple, it takes one or more collections and returns a single collection containing all of the elements in each passed in collection:

(concat (list 1 2 3 4) (list 5 6 7 8))
; =>(1 2 3 4 5 6 7 8)

With that info, the concat in cycle glues together the entire collection that was passed in, followed by a recursive call to itself (passing in the same collection). So with that, if the same 1, 2, 3, 4 list is passed in to cycle, the last line looks like:

(concat (list 1 2 3 4) (cycle (list 1 2 3 4)))

After the first four items are consumed, the recursive invocation of cycle will happen, again producing (list 1 2 3 4) followed by another recursive invocation. Since it's lazy and infinite, this can continue until the query execution time is up, whether it's 5 seconds or 5 hours.