Object Commando languages, development and design

4Nov/116

The Magical Island of Kanren – core.logic Intro Part 1

Overview

Previously I blogged on Appendo the Great without going into much detail on core.logic, what it is, how to use it etc. Hopefully this blog post will begin to fix that. This post was also inspired by a tweet from Ambrose Bonnaire-Sergeant on explaining a basic logic program with an unbound variable. That tweet got me thinking about how I would explain it.

First, core.logic is a Clojure implementation of miniKanren. miniKanren is a Logic Programming library embedded in Scheme. As the site states, it's mini in that there is another Kanren system with more bells and whistles, but is quite a bit more complex. The core.logic implementation follows the Scheme version pretty closely from a library user's perspective, so examples in books/papers such as Byrd's dissertation or The Reasoned Schemer translate pretty closely to the equivalent Clojure code.

The approach taken by the miniKanren folks gives the developer a lot of power by letting them pick the right tool for the job. Embedding the logic programming paradigm into a functional language allows you to use functional tools to solve your problems and delve into logic when it makes things easier. There's no need to interface with another language, you're in Clojure and you can use it directly.

Environment

Getting core.logic incorporated into your project works the same as any other Java or Clojure library, just include it in your project dependencies. Including it in your Leiningen project.clj would look like:

  :dependencies [...
                  [org.clojure/core.logic "0.6.5"]
                  ...]

Then lein deps and you're set. The examples below assume having a namespace setup like:

(ns logic-examples.ancenstors
  (:refer-clojure :exclude [==])
  (:use [clojure.core.logic]))

The Island of Kanren

Imagine the magical island of Kanren. On this island, just declaring things makes them so. There can be multiple right answers to a question and sometimes there are no answers at all. Here, seemingly familiar operations take on a different meaning and the natives speak with a unique dialect. Your code lives in functions and macros, just like any Clojure program. The key difference is your Kanren programs are really more like puzzles that you want the Kanren-ites to solve. Your logic functions on their own aren't much good, but the Kanren natives can do amazing things with them to solve your puzzle. The only way to get answers off of the island is to send your puzzle over with a raft [ ]. Unfortunately, there are some small holes in the raft and we need to put a bucket on it before we send the raft over. I'll call this bucket q, but you can call it whatever you like. Armed with our raft and bucket we can send it over with a puzzle and the Kanren natives will solve the program and send the bucket back. This bucket will contain the things (bindings) that the natives needed to solve the program. The easiest, yet still confusing program, just declares something that is always true:

(run* [q]
  (== 1 1))
= > (_.0)

It may seem strange at first, but the natives of Kanren have sent back a bucket with a single value _.0 in it. In the language of Kanren, this means anything and nothing. It means to solve the puzzle I sent them (which just declares that 1 is 1) q can take on any value. The number 7, a list of strings, nil etc. This makes sense as the Kanren natives can solve the puzzle with anything since the q does not actually play a role in solving the puzzle. We could make it a little more difficult and send over another raft asking for a 1

(run* [q]
  (== q 1))
= > (1)

Here we sent over a puzzle that could only be solved with q having the value of 1. There is no other way for the Kanren-ites to solve the puzzle (and still have it work), so they sent back a value of 1 for q. The reason I relate q to a bucket in a raft, is that's our only way of getting information off of the island. All other information stays on the island:

(run* [q]
  (fresh [x y]
    (== x "foo")
    (== y "bar")
    (== q 1)))
= > (1)

The fresh call here is just a way to get a new variable (er, new buckets). This new variable is just like q, but it always stays on the island. Note our answer is the same as before because nothing has changed with the bucket vs. the program before. It's possible that q has more than one value (examples will be below) and it's also possible for q to be a collection:

(run* [q]
  (fresh [x y]
    (== x "foo")
    (== y "bar")
    (== q [1 x y])))
= > ([1 "foo" "bar"])

The above gets the information from x and y by stuffing it in q. So q is where we want the answer to be stored since that is what comes back on the raft. If we don't put the answer in the q bucket, we won't get anything useful. It's also possible that q has no answer. We could play a trick on the Kanren-ites and send them an unsolvable puzzle:

(run* [q]
  (== q 1)
  (== q 2))
= > ()

They saw through my ruse. It's not possible to solve a puzzle where q is both 1 and 2, so they send back an empty bucket.

Less Metaphor more Examples

Above, I covered the mechanics of running a core.logic program, er, puzzle. A logic program follows some different rules, so they need to be ran inside the run* block. The above examples only used run* which asks for all possible solutions to the program. Another key point from the above program is that I never assigned a value to q. I declared that q should equal 1. The Kanren-ites needed to take it from there. Instead I could have declared it equal to anything, a string, list, character etc. To cover some of the core.logic building blocks, I'll side-step integers and lists for the time being. To demonstrate some of these basic features, I'll be using some of the relation and fact features of core.logic. First I'll start with a father and mother relation:

(defrel father Father Child)
(defrel mother Mother Child)

(facts father [['Vito 'Michael]
               ['Vito 'Sonny]
               ['Vito 'Fredo]
               ['Michael 'Anthony]
               ['Michael 'Mary]
               ['Sonny 'Vicent]
               ['Sonny 'Francesca]
               ['Sonny 'Kathryn]
               ['Sonny 'Frank]
               ['Sonny 'Santino]])

(facts mother [['Carmela 'Michael]
               ['Carmela 'Sonny]
               ['Carmela 'Fredo]
               ['Kay 'Mary]
               ['Kay 'Anthony]
               ['Sandra 'Francesca]
               ['Sandra 'Kathryn]
               ['Sandra 'Frank]
               ['Sandra 'Santino]])

From this basic setup we can ask some basic questions:

(run* [q]
  (father 'Vito q))
= > (Sonny Michael Fredo)

(run* [q]
  (father q 'Michael))
= > (Vito)

These questions use the father relation. Given Vito is the father, what values of q would satisfy the father relation? We could do the same thing for the reverse. Given a child, who's the father? Note in the first example that there are three valid solutions to the relation, and any of them are correct. If I only asked for one solution:


(run 1 [q]
  (father 'Vito q))
= > (Sonny)

I would just get Sonny.

Conde

To ask better questions, we'll need conde. First lets define as a function, what the definition of parent is:

  (defn parent [p child]
    (conde
      ((father p child))
      ((mother p child))))

(run* [q]
  (parent q 'Michael))
= > (Vito Carmela)

The function parent above uses something new. It's one of the core building blocks of core.logic, called conde. I think of conde as a very fancy or. If you think of an or statement in a functional or imperative languages, it finds the first true value, then stops. On the Island of Kanren, there can be many correct answers to a puzzle, and again the Kanren-ites are trying to solve the puzzle every way possible. So in the above, there are two possibilities when using parent according to the defined relation. It tries the father relation first. If that passes, it will count as one answer. It will then come back through, act as if the first condition failed and will run the second statement. In this case, the first and second line are mutually exclusive, so it's pretty similar to a straight or.

(run* [q]
  (fresh [x y z]
    (conde
      ((== x "1st"))
      ((== y "2nd"))
      ((== z "3rd")))
    (== q [x y z])))
= > (["1st" _.0 _.1] [_.0 "2nd" _.1] [_.0 _.1 "3rd"])

The above illustrates this trying of the various clauses. Each of the above three clauses will always pass if executed. The three answers returned are all separate, so the first answer only has a value for x. The y and z are still fresh (and could be any value). The same is true for the second and third answer.

More Relations

These functions can also stack on each other:

(defn grandparent [gparent child]
  (fresh [p]
    (parent gparent p)
    (parent p child)))

(run* [q]
  (grandparent q 'Anthony))
= > (Vito Carmela)

(run* [q]
  (grandparent 'Vito q))
= > (Francesca Frank Vicent Mary Kathryn Santino Anthony)

The definition of grandparent is that the grandparent gparent is the parent of some new person p and that person p is the parent of child. The Kanren-ites can also solve more complex puzzles where the grandparent and child is unknown but the parent is known:

(run* [q]
  (fresh [gparent p child]
    (== p 'Michael)
    (parent p child)
    (grandparent gparent child)
    (== q {:grandparent gparent :parent p :child child})))

= > ({:grandparent Vito, :parent Michael, :child Mary}
     {:grandparent Vito, :parent Michael, :child Anthony}
     {:grandparent Carmela, :parent Michael, :child Mary}
     {:grandparent Carmela, :parent Michael, :child Anthony})

There are a few things interesting about the above example. First no child or grandparent was specified. In this puzzle, we specified the parent (in this case Michael) but nothing else. The Kanren-ites had to use this constrant in trying to find the grandparent(s) and children. The second interesting piece of the above is the usage of the map as the value of q. Since there are several answers and several pieces of information for each answer, I stuffed the results in a map. For the puzzle to be solved, the appropriate values needed to be plugged into the map.

What Next?

There's a lot more to cover in core.logic. Specifically, I was pretty light on the magic and "special-ness" of relation functions, variables etc. I'll cover that in my next post along with doing useful things with lists and matching.

Since the library is fairly new, there's not a ton of resources on it yet. For more information on core.logic specifically, Ambrose's Logic Starter has good information and there's some in the core.logic README. Byrd's dissertation and The Reasoned Schemer are very good. They are written in Scheme, but they translate to Clojure easily.

13Oct/114

Appendo the Great

I have been spending a some time recently trying to get a better understanding of logic programming. I was very much inspired by David Nolen's core.logic library. I am reading several different papers and books trying to understand the paradigm. All of them talked about an append function, but I was having difficulty understanding why it was so important.

From the Reasoned Schemer, conso, the relational form of cons, gets the title conso the magnificent. And conso certainly is magnificent, but I think appendo should get some credit too. In core.logic and miniKanren, append is called appendo. Like concat in Clojure it appends two lists together to create a third list. It can be used as a building block for many other functions. The definition of the function (from core.logic) is pretty small:

(defne appendo [x y z]
  ([() _ y])
  ([[?a . ?d] _ [?a . ?r]] (appendo ?d y ?r)))

defne above is syntax sugar around matche. It creates a function (named appendo) that accepts three arguments. It matches x, y and z with the three arguments in the list at the beginning of each clause. In this case the first clause is matched cons style. ?a is declared as the first item and ?d is declared as the rest. The second item is ignored, the third is split in the same way. Note that the ?a is shared here on the first and third parameter of the match. This makes the first item in x, the first item in z. Assuming the x and z can be matched as above, appendo is recursively called on the rest of each list (including y, untouched). Something similar to the shared ?a is going on when the recursion bottoms out (when x is empty). In this clause y is declared as z. This makes y the tail of a cons like expression (specifically the ?r) from the previous recursive call. That's a lot packed into such a small amount of code! Below I will build the case for appendo's new title.

Building Blocks

To run the below code, you need to use the right namespaces. Something like this:

(ns logic-examples.seq
  (:refer-clojure :exclude [inc reify ==])
  (:use [clojure.core.logic minikanren prelude nonrel match disequality]))

I thought I understood the appendo function, but I didn't understand why each source spent a lot of time on it. It seemed straight-forward enough. Then I had that ah-ha moment while reading up on Prolog's prefix and suffix functions. Here's something roughly equivalent in core.logic that I wrote before my ah-ha:

(defne prefixo [x y]
  ([[] _])
  ([[?a . ?xs] [?a . ?ys]] (prefixo ?xs ?ys)))

This declares that x is a prefix of y. So it could be used to delcare an x as [1 2] and y as [1 2 3 4 5]. Like above, it uses defne, matching in a similar way. I'm using the shared ?a? to ensure the first item of x is the first item of y and continue recursively. I then began to think of appendo as less like concat in Clojure and more like what list could I concat with this other list to yield some result list (user specified or not). That realization led me to change my prefixo function to be the following:

  (defn prefixo2 [x y]
    (fresh [a]
           (appendo x a y)))

This function appends x with a new (or fresh) variable to yield the value y. So substituting values into the appendo call (using the above example): (appendo [1 2 3] a [1 2 3 4 5]). When unified, a is obviously [4 5] because appending [1 2 3] with [4 5] would satisfy the result list [1 2 3 4 5]. With some great core.logic magic, a will match anything that would appear after items in x in order to satisfy y. Part of the mind-bending nature of logic programming is the mixture of declaring, binding, predicates and the wiggle room that result variable gives you. As an example, there's no need to provide a defined y in the above call:

(run* [q]
  (prefixo2 [1 2 3] q))
= > ((1 2 3 . _.0))

This code runs prefixo but leaves y fresh. This lets core.logic decide what the value should be. So it is asking what will satisfy prefixo2 when x is [1 2 3]. The answer from core.logic is (1 2 3 . _.0). ._0 is the tail of the list and could represent anything, meaning that y could be a list with 1, 2, 3 followed by nothing or 4 or ["Klaatu" "Barada" "Nikto"]. These arbitrary values are numbered to distinguish between different values in the case that there could be multiple arbitrary values that could be (but don't have to be) different. More examples of _.0 are below. The above code will also handle other constraints on the list:

(defn lasto
  "Declares x as the last item in y"
  [x y]
  (fresh [a]
         (appendo a [x] y)))

(run 3 [q]
  (lasto "Hail to the king baby" q)
  (prefixo2 [1 2 3] q))
= > ((1 2 3 "Hail to the king baby")
        (1 2 3 _.0 "Hail to the king baby")
        (1 2 3 _.0 _.1 "Hail to the king baby"))

This basically says give me three lists where 1, 2, 3 is the prefix and the last element is "Hail to the king baby". A proper [ Army of Darkness ending. The easiest one is the list containing only those four items, which is the first returned. The next solution says any one element can exist between 3 and "Hail to the king baby". The next solution says any two elements can be between the theird and the last elements etc. Passing 3 to run tells core.logic how many solutions you want it to come up with. Using run* will ask for all solutions (not a good idea with this example code). This code has two fresh variables _.0 and _.1. These could be any value. They could be different values or they could both be the number 42. They are labeled distinctly because they could have distinct values.

More appendo fun

Seeing what I could do with prefix, I then followed that with suffix:

(defn suffixo
  "Declares the elements of x as the end of list y"
  [x y]
  (fresh [a]
         (appendo a x y)))

Nothing new there. membero is a function in core.logic that declares that x (an element) occurs in a list y. A version can also be built out of appendo (though probably not as efficiently):

(defn my-membero
  "A (probably inefficient) function stating x is an element in y"
  [x y]
  (fresh [a b]
         (appendo a (llist x b) y)))         

(run* [q]
     (my-membero q [1 2 3 4 5]))
= > (1 2 3 4 5)

(run 5 [q]
  (my-membero 3 q))
= > ((3 . _.0) (_.0 3 . _.1) (_.0 _.1 3 . _.2) (_.0 _.1 _.2 3 . _.3) (_.0 _.1 _.2 _.3 3 . _.4))

The first example of it's use gives a list of [1 2 3 4 5] and asks core.logic to come up with all of the members (which is just each element). The second asks the opposite question. Declaring that 3 is in the list, what are the lists that would satisfy it. The first result has 3 as the first element (followed by anything), the second list has 3 as the second element (followed by anything) and so on.

Using appendo to find sublists was interesting:

(defn sublisto
  "The list x (in sequence) appears in y"
  [x y]
  (fresh [a b c]
         (appendo a b y)
         (appendo x c b)))

= > (set (run* [q]
        (sublisto q [1 2 3 4 5])))
= > #{() (1) (2) (1 2 3) (2 3) (3) (3 4) (4 5) (2 3 4) (1 2 3 4 5) (1 2) (2 3 4 5) (4) (3 4 5) (1 2 3 4) (5)}

Chaining it together with my-membero can find all sublists containing a particular number:

(set (run* [q]
       (sublisto q [1 2 3 4 5])
       (my-membero 2 q)))
= > #{(2) (1 2 3) (2 3) (2 3 4) (1 2 3 4 5) (1 2) (2 3 4 5) (1 2 3 4)}

Or with the help of adjacent, we can find all sublists where two numbers occur next to each other:

(defn adjacent
  "Declares x and y as adjacent items in list z"
  [x y z]
  (fresh [a b]
    (appendo a (llist x y b) z)))

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 5 1 q)))         

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

This will only find sublists that have x followed by y, and it must be in that order. To make adjacent a little more flexible, the below version will find x and y next to each other, regardless of order:

(defn adjacent
  "Declares x and y as adjacent items in list z"
  [x y z]
  (fresh [a b]
         (conde
          ((appendo a (llist x y b) z))
          ((appendo a (llist y x b) z)))))

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 5 1 q)))         

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

(set (run* [q]
       (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
       (adjacent 1 5 q)))

= > #{(1 2 3 4 5 1 2 3) (4 5 1 2 3) (3 4 5 1 2 3 4 5) ...}

(= (set (run* [q]
          (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
          (adjacent 5 1 q)))
   (set (run* [q]
          (sublisto q (apply concat (repeat 2 [1 2 3 4 5])))
          (adjacent 1 5 q))))

= > true

That's a lot of power from a little function. I think appendo deserves the title. The code for the functions above can also be found here.

22Feb/110

CouchDB with Clojure

IBM developerWorks just published an article I wrote on CouchDB and Clojure. It uses clj-http, Clutch and CouchDB. It aims to cover the REST APIs of CouchDB by way of Clojure, using parallel examples in Clutch and straight HTTP calls using clj-http. It also discusses CouchDB views and replication. Big thanks to Tunde for Clutch, Mark for clj-http, the CouchDB folks and Alex for reviewing the article.

4Feb/110

Getting Specific Document Revisions with Clutch 0.2.4

I had intended to get this out last week, but didn't quite make it. On 2/3/2011, version 0.2.4 of Clutch. One of the new features is support for getting a specific revision of a document. The code to do that looks like:

(clutch/get-document document-id {:rev revision-id} #{:rev})

The call to get a specific revision is the same as getting the most recent version except for the two extra paramters to the get-document function. The first is {:rev revision-id} and is a query-paramter map that Clutch uses when asking CouchDB for the document. In the example above, I'm asking CouchDB for a document with document-id document-id and a revision of revision-id. Typically when sending query paramters to CouchDB, you want to JSON encode them, however there are some special built-in CouchDB paramters (such as revision) that you do not want to encode using JSON. The last argument is #{:rev}, telling Clutch to not encode the :rev keypair in the query map. The third argument above can be any function that determines which revision to exclude.

CouchDB 1.0.1

Included in this release is changes needed for the Clojure view server to work with CouchDB 1.0.1 (thanks Tunde!). Previously, attempting to use the Clutch view server with the latest version of CouchDB would result in server side timeouts calling to the Clutch view code. For more information on setting up the Clojure view server see the README.

Tagged as: , No Comments
25Oct/102

First Clojure Conj Highlights

The last couple of weeks has been busy! First Strange Loop then the first Clojure Conj. The conference was the first Clojure conference of it's kind and was packed with technical content. It was single track so I didn't have to worry about missing anything. Below I'll summarize some of the highlights of the conference from my perspective.

(not= DSL macros) Christophe Grand

A very valuable talk. Christophe's central theme was that many of the gains in syntax that can be obtained from macros can also be achieved using functions and other Clojure fundamentals. This was the beginning of a theme for the conference. He gave specific examples of this with his framework enlive. His stories were around users wanting to extend some functionality provided for by the library. Since much of this was implemented with macros, the users of his library ended up being frustrated and were not able to do what they wanted. He then made some substantial changes to the codebase to implement much of the functionality that was in the macros as functions. Backing these by functions with a smaller macro layer on top enabled his users to better take advantage of his framework. This boiled down to the realization that the DSL can exist in functions through clear naming and composibility as well as in the macros. He gave some excellent examples in the construction of a function based DSL for regular expressions (code here)that ended up having functionality not possible with the normal regular expression DSL (or with macros). I've got it on my TODO list to took over Christophe's enlive and regular expression code more in-depth.

Clojure Protocols - Sean Devlin

I'm pretty familiar with protocols and have used them at Revelytix quite a bit, but I still got a lot of value out of Sean's talk. His talk was mainly around creating a protocol of java.util.Date and other date like types in Java. His example created a protocol with a to-ms function that basically converted some sort of date representation to a long representation of that date. From that small abstraction he built many functions that were consumers of that abstraction. I think this was interesting because it was a small abstraction that was really only used internally in that source file. The functions that were more useful to consumers of the abstraction were not extensions of the protocol, but rather used the protocol internally. I thought this was particularly elegant because they need to know nothing about protocols or even that code used them.

Finger Trees - Chris Houser

Chouser gave a great talk on a data structure I had not heard of before, called Finger Trees. This is a new (not quite done yet) feature that will be included in Clojure that can have benefits over the other Clojure persistent data structures for specific use cases. His slides can be found out on github for more info. Seemed to me like the biggest wins of the data structure are the amortized constant time splitting/appending and counting. It achieves this through storing metadata at each tree root node, which makes summarization very easy. I read over some of the code as he was speaking (the code can be found here) and am looking forward to going over it more in-depth.

Keynote - Rich Hickey

It was good to hear Rich Hickey speak in person. This was the first time I've heard him. He went over some of the things to come in the near term for Clojure. I would say the main focus of his talk was on improving the performance of Clojure. One of those performance improvements was declaring when variables are allowed to be rebound. He discussed the need to do some checks to see if something has been rebound and most of the time there is little to no chance that something has been rebound. A good examples of this are the vast majority of the time functions are not rebound or redefined in production code (typically only at the REPL when doing development), yet this overhead always happens. He went over a new bit of metadata that would declare a variable as able to be rebound. I think this is a good change even without performance in mind. Right now we already have a convention that we put asterisks (or ear muffs) around variables that we intend for rebinding. This just codifies that convention and we get a performance boost as a bonus. He also went in depth on some Java primitive performance changes he was making. This also follows a similar theme in that it is a reduction in the complexity of the various primitive types and a performance boost as a bonus. The mismatch being Java primitives, auto-boxing and Clojure can be pretty confusing. His changes simplify many of those problems and through some new function subclasses, auto-boxing can be avoided. There was a lot of useful nuggets from Rich's talk.

One Ring to Bind Them - Mark McGranaghan

This was a talk on ring specifically but I would say it had more to do with design of good Clojure APIs. He covered how abstractions built on top of Clojure sequences with well-named functions could create a very appealing API. This talk was the crescendo of the theme that using Clojure fundamentals can lead to great code. As a testament to the ease of use of the library, there are quite a few other libraries that have been built on top of it.

From Concurrency to Parallelism - David Liebke

David gave a talk on some future Clojure functionality of providing more parallelism options. This was a great talk to have attended after hearing Guy Steele at Strange Loop. Much of what Guy discussed is in this experimental branch of Clojure. Slides of his talk can be found here. He started with the comparing map and pmap and then went over how pmap is different from the new parallel reduce stuff. He gave some example stats on the performance characteristics of these things. He had a pretty big caveat on how useful the stats were, but I think they did a good job of illustrating his point. Implementing this was definitely complex and David went into that a bit, but using the functions seemed very idiomatic and not much different from using plain reduce. It will be exciting to see this progress.

Step Away from the Computer - Rich Hickey

It was the keynote of the conference from Rich. It was a non-technical talk that focused on thinking things through without distractions before writing code or "solving" a problem. He went over some "how your brain works" that seemed similar to Pragmatic Thinking and Learning. He stressed the importance of thorough research, notes and evaluation when solving a problem. I think the areas that he discussed are something that developers can always improve upon.

Conclusion

There were many other good talks that I did not mention. It was good to get an update on Clojure support in Eclipse, hear some of the motivations behind lazy test and zippers. The lightening talks were also good, especially the ones covering Aleph and Infer. There was also a lightening talk by Alex Miller on zippers over records that we have been doing here at Revelytix. There was definitely a lot of excitement around Clojure at the conference. It was a very friendly atmosphere and lots of discussion in-between and after sessions. I talked with several people interested in semantic web technologies, triplestores and the work we're doing at Revelytix. It has encouraged me to blog more about it!