Clojure and me has moved.

Thursday, August 13, 2009

About the subtitle

This post has moved, go to its new location
The subtitle for this blog may be misleading. Actually it reads "When the pupil is ready to learn, a teacher will appear."
I picked this Zen proverb because in early 2008 I was looking for a language with the following requirements: strong metaprogrammability, strong concurrency support, strong functional bias and bonus points for running on the JVM. I had prepared myself to not find the rare bird when I met Clojure (and it was nearly love at first sight).
So in that perspective I am the pupil and Clojure (the language, Rich Hickey and the whole community) the teacher.

Thursday, August 6, 2009

Clojure as fast as Java!

This post has moved, go to its new location
I wrote a prototypal implementation of IPersistentSet in Clojure written with new new and, surprisingly, on its first benchmark it's already on par with Java-based implementations.
(dotimes [i 10]
(let [n (int (Math/pow 2 (+ 10 i)))
_ (println "n=" n)
a (do
(print "clojure\t")
(time (doall (let [s (reduce conj empty-set (range 0 n 2))] (map #(get s %) (range n))))))
b (do
(print "java\t")
(time (doall (let [s (reduce conj #{} (range 0 n 2))] (map #(get s %) (range n))))))]
(println (= a b))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

n= 1024
clojure "Elapsed time: 1.006064 msecs"
java "Elapsed time: 0.736197 msecs"
true
n= 2048
clojure "Elapsed time: 1.813009 msecs"
java "Elapsed time: 1.328102 msecs"
true
n= 4096
clojure "Elapsed time: 3.590191 msecs"
java "Elapsed time: 2.608153 msecs"
true
n= 8192
clojure "Elapsed time: 7.046566 msecs"
java "Elapsed time: 5.807302 msecs"
true
n= 16384
clojure "Elapsed time: 16.015862 msecs"
java "Elapsed time: 10.284897 msecs"
true
n= 32768
clojure "Elapsed time: 29.803928 msecs"
java "Elapsed time: 23.850378 msecs"
true
n= 65536
clojure "Elapsed time: 68.79778 msecs"
java "Elapsed time: 63.947582 msecs"
true
n= 131072
clojure "Elapsed time: 132.082499 msecs"
java "Elapsed time: 113.433411 msecs"
true
n= 262144
clojure "Elapsed time: 292.149631 msecs"
java "Elapsed time: 265.39197 msecs"
true
n= 524288
clojure "Elapsed time: 595.265321 msecs"
java "Elapsed time: 698.711009 msecs"
true

To tell the truth, this benchmark is a bit unfair for Java: no dummy map and slightly different algorithms (half-full bitmap nodes become array-nodes, no leaf-node etc.).
Follow me on Twitter, here

What *warn-on-reflection* doesn't tell you about arrays

This post has moved, go to its new location
user=> (time (let [a (make-array Object 100)] (dotimes [i 10000000] (aset a (rem i 100) nil))))
Reflection warning, NO_SOURCE_PATH:840 - call to aset can't be resolved.
"Elapsed time: 136063.015272 msecs"

With aget/aset, the index must be hinted to be an int:
user=> (time (let [a (make-array Object 100)] (dotimes [i 10000000] (aset a (int (rem i 100)) nil))))
"Elapsed time: 1064.546402 msecs"

Wow, more than 100x faster (reflection is bad) but despite the compiler doesn't complain one can help it to choose a faster path:
user=> (time (let [a #^"[Ljava.lang.Object;" (make-array Object 100)] (dotimes [i 10000000] (aset a (int (rem i 100)) nil))))
"Elapsed time: 247.446882 msecs"

On the whole we get a 500x speed-up with only two type hints.

Thursday, July 30, 2009

Everybody loves the Sieve of Eratosthenes

This post has moved, go to its new location

If I judge by traffic logs for this blog, many newcomers want to compute prime numbers in Clojure.

A recent thread on the mailing list prompted me to test an idea I had for a while: to use a map to implement the Sieve of Eratosthenes.

The first implementation was this one:

(defn primes [max]
(let [enqueue (fn [sieve n factor]
(let [m (+ n factor)]
(assoc sieve m
(conj (sieve m) factor))))
next-sieve (fn [sieve candidate]
(if-let [factors (sieve candidate)]
(reduce #(enqueue %1 candidate %2)
(dissoc sieve candidate)
factors)
(enqueue sieve candidate candidate)))]
(apply concat (vals (reduce next-sieve {} (range 2 max))))))
where the sieve is a map from the next non-prime numbers to their factors. It's so naive that even numbers are tested for primality but it doesn't perform that bad: on my box, it takes 3s to compute all primes below 1,000,000 (it's roughly as fast as clojure.contrib.lazy-seqs/primes when the seq isn't yet memoized).

I wasn't happy with the way I handled the case where a non-prime was already checked off (ie was a key of the map): I was conjing onto the list of prime factors for this number. If instead I tried to check off n+p (where n is the already known non-prime and p the current prime) or n+2p or n+3p... until I found a yet unknown non-prime I wouldn't need to maintain a list of factors, nor to conj or reduce over it. And I was hoping that less allocations would yield better perfs.

Here is the second iteration:

(defn primes2 [max]
(let [enqueue (fn [sieve n factor]
(let [m (+ n factor)]
(if (sieve m)
(recur sieve m factor)
(assoc sieve m factor))))
next-sieve (fn [sieve candidate]
(if-let [factor (sieve candidate)]
(-> sieve
(dissoc candidate)
(enqueue candidate factor))
(enqueue sieve candidate candidate)))]
(vals (reduce next-sieve {} (range 2 max)))))
and it computes all the primes below 1,000,000 in 1.8s instead of 3s but it still tests even numbers for primality.

primes3 is primes2 modified to only test odd numbers:

(defn primes3 [max]
(let [enqueue (fn [sieve n factor]
(let [m (+ n (+ factor factor))]
(if (sieve m)
(recur sieve m factor)
(assoc sieve m factor))))
next-sieve (fn [sieve candidate]
(if-let [factor (sieve candidate)]
(-> sieve
(dissoc candidate)
(enqueue candidate factor))
(enqueue sieve candidate candidate)))]
(cons 2 (vals (reduce next-sieve {} (range 3 max 2))))))
and it computes the same list of primes in 1.5s.

Out of curiosity, I wrote a lazy version of primes3:

(defn lazy-primes3 []
(letfn [(enqueue [sieve n step]
(let [m (+ n step)]
(if (sieve m)
(recur sieve m step)
(assoc sieve m step))))
(next-sieve [sieve candidate]
(if-let [step (sieve candidate)]
(-> sieve
(dissoc candidate)
(enqueue candidate step))
(enqueue sieve candidate (+ candidate candidate))))
(next-primes [sieve candidate]
(if (sieve candidate)
(recur (next-sieve sieve candidate) (+ candidate 2))
(cons candidate
(lazy-seq (next-primes (next-sieve sieve candidate)
(+ candidate 2))))))]
(cons 2 (lazy-seq (next-primes {} 3)))))
and, surprisingly (better locality?), it computes the primes below 1,000,000 in 1s.

Monday, June 8, 2009

Linear Interpolation and sorted-map

This post has moved, go to its new location
Sorted collections (with subseq and rsubseq) can help when working with a partition of disjoint intervals, eg when you need to interpolate.
(defn interpolator 
"Takes a coll of 2D points (vectors) and returns
their linear interpolation function."
[points]
(let [m (into (sorted-map) points)]
(fn [x]
(let [[[x1 y1]] (rsubseq m <= x)
[[x2 y2]] (subseq m > x)]
(if x2
(+ y1 (* (- x x1) (/ (- y2 y1) (- x2 x1))))
y1)))))

;; => (map (interpolator [[0 0] [1 1] [3 2] [4 3]]) (range 0 9/2 1/2))
;; (0 1/2 1 5/4 3/2 7/4 2 5/2 3)

Moustache: syntax file and tests

This post has moved, go to its new location
Tests and a hairy syntax file.

Friday, June 5, 2009

Enlive selectors: documented!

This post has moved, go to its new location
Selectors syntax

Sunday, May 17, 2009

The Need for More Lack of Understanding

This post has moved, go to its new location
A recent post by Gilad Bracha echoed with my experience designing two small internal DSL in Clojure (Moustache and Enlive's selectors).

It's not the same kind of non-understanding which I have in mind. I'm talking about a macro/DSL being able to not understand what is passed to it.

I think it's an important feature for the user to be able to get out of your DSL and back in regular Clojure.

In both Moustache and Enlive, I avoid to give a special meaning to lists (in Clojure you also have vectors, maps and sets), hence lists always denote user-code and are embedded as-is in the macro expansion. (If I really had to use lists for a DSL, I would check a list doesn't start with a special value (eg do or unquote (~) — thanks to MB's comment) before processing it).

That's why in Enlive you can easily add your own selectors step [:p (my-selector-step arg1 arg2)] as long as (my-selector-step arg1 arg2) evaluates to a correct value (here a state machine).

That's also how Moustache supports wrapping another Ring handler or custom route validation.

Monday, May 4, 2009

Counting without numbers

This post has moved, go to its new location
I was writing something along these lines:
(loop [state init, n (count some-seq)]
(if (pos? n)
(recur value (dec n))
(ends here)))
when it struck me that seqs are numerals too!
(loop [state init, n some-seq]
(if (seq n)
(recur value (rest n))
(ends here)))
or:
(loop [state init, n (seq some-seq)]
(if n
(recur value (next n))
(ends here)))

Friday, May 1, 2009

Functionally growing a tree (2): insertion points and zippers

This post has moved, go to its new location
I just uploaded a library that builds on zippers but shifts emphasis from nodes to insertion-points (interstitial positions before, between and after nodes).
It eases growing trees from an empty root.
  ; show-ip represents the currently edited structure with * denoting the insertion-point
(defn show-ip [ip] (-> ip (insert-left '*) first z/root))
(def e (-> [] z/vector-zip (insertion-point :append)))
(-> e show-ip) ; [*]
(-> e (insert-left 1) show-ip) ; [1 *]
(-> e (insert-left 1) (insert-right 2) show-ip) ; [1 * 2]
(-> e (insert-left 1) (insert-right 2) left show-ip) ; [* 1 2]
(-> e (insert-left [1 2]) show-ip) ; [[1 2] *]
(-> e (insert-left [1 2]) left show-ip) ; [* [1 2]]
(-> e (insert-left [1 2]) left right show-ip) ; [[1 2] *]
(-> e (insert-left [1 2]) left next show-ip) ; [[* 1 2]]
(-> e (insert-left [1 2]) left next next show-ip) ; [[1 * 2]]
(-> e (insert-left [1 2]) left next next next show-ip) ; [[1 2 *]]
(-> e (insert-left [1 2]) left next next next next show-ip) ; [[1 2] *]
(-> e (insert-left [1 2]) left next next next next prev show-ip) ; [[1 2 *]]

Wednesday, April 29, 2009

.toUpperCase

This post has moved, go to its new location
I recently learnt that when you want to convert the case of a technical identifier (a tagname, a HTTP header etc.) you must not use plain .toUpperCase or .toLowerCase but specify Locale/ENGLISH.

Tuesday, April 28, 2009

Counting occurences (solution to the exercise)

This post has moved, go to its new location
What does (merge-with + {:a 12} {:b 4} {:a 3 :b 7}) return?
{:b 11, :a 15} when there are several values for a key, these values are merged (two at a time) using the specified function — here they are summed.
Can you count occurrences of each value in a collection using merge-with?
(apply merge-with + (map (fn [x] {x 1}) coll)) or, using for: (apply merge-with + (for [x coll] {x 1})).

Monday, April 27, 2009

Screenscraping with Enlive

This post has moved, go to its new location
(select (html-resource (java.net.URL. "http://clojure-log.n01se.net/")) [:#main [:a (attr? :href)]]) returns a seq of link nodes.

Counting occurences

This post has moved, go to its new location
One asked me how to count occurrences of each value in a collection. I answered (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} coll). Since it can take some time to get accustomed to the functional way of thought, here is how one can work such an expression out:
How to count all occurrences of 42?
(count (filter #{42} coll))
How to express count using reduce?
(defn my-count [coll] (reduce (fn [n _] (inc n)) 0 coll))
How to count all occurrences of 42 using reduce?
(reduce (fn [n _] (inc n)) 0 (filter #{42} coll))
Can you get rid of the filter?
(reduce (fn [n x] (if (= 42 x) (inc n) n)) 0 coll)
I'd like the result to be {42 occurences-count}.
(reduce (fn [m x] (if (= 42 x) (assoc m 42 (inc (m 42))) m)) {42 0} coll)
42 is hardcoded in four places, it's bad!
(reduce (fn [m x] (if (= 42 x) (assoc m x (inc (m x))) m)) {42 0} coll)
Can't you replace {42 0} with {}?
No (inc (m x)) would fail because (m x) would return nil.
How does one provide a default value when the key is not found ?
(a-map a-key default-value)
Can't you replace {42 0} with {}?
(reduce (fn [m x] (if (= 42 x) (assoc m x (inc (m x 0))) m)) {} coll)
I changed my mind I'd like you to count occurrences of each value.
Easy! (reduce (fn [m x] (assoc m x (inc (m x 0)))) {} coll) or, terser, (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} coll)

Exercise:
What does (merge-with + {:a 12} {:b 4} {:a 3 :b 7}) return?
Can you count occurrences of each value in a collection using merge-with?

Wednesday, April 22, 2009

Vicious bug

This post has moved, go to its new location
What's wrong with this code (do (defmacro foobar [x] (doto x println)) (foobar (+ 1 1)))?
Nothing.
Run it again.
The first time, foobar is treated as a function (its argument has been evaluated) because foobar is unknown before the whole top-level expression is compiled.
The second time, the var named foobar preexists and is a macro, (foobar (+ 1 1)) is expanded.
This behaviour bit me while using with-test to test macros. It's the kind of bug that goes unnoticed while developing incrementally and evaluating in the same REPL, it was waiting for me to start a fresh REPL.

Sanitizing HTML with Enlive

This post has moved, go to its new location
net.cgrand.enlive-html=> (sniptest "<div id=user-data>" 
[:#user-data] (html-content "code injection<script>alert('boo')</script>")
[:#user-data (but #{:p :br :a :strong :em})] nil)
"<html><body><div id=\"user-data\">code injection</div></body></html>"
You also need to remove most attributes but it's just a demo of something that was impossible with the old Enlive.

By the way, the old Enlive is no more. Long live the new Enlive!

Saturday, April 18, 2009

Make it work, make it right, make it fast

This post has moved, go to its new location
Make it work, make it right, make it fast.
Make it work
sort of done: the old Enlive,
Make it right
work in progress, the new Enlive,
Make it fast
once the "right" branch is merged into master.

See the README file to know what's new and here for an example.

Friday, April 17, 2009

Mapping every second item

This post has moved, go to its new location
I wanted to apply a function to every second item in a coll. I was considering writing something using interleave, take-nth and map or a combination of mapcat and partition when I thought of this:
(map #(%1 %2) (cycle [f identity]) coll)
I really love clojure's map parallel processing. (I should ask if every? and some could be allowed to take several colls.)

Wednesday, March 25, 2009

mapsplice

This post has moved, go to its new location
I needed to update a seq as follows: apply a given function to items at specified indices and insert the result (a seq) in place.
My first laborious attempt involved extracting subseqs of untouched items, mapping the function on the remaining items then joining everything back. Bleh! (Really, I don't like indices but this time I haven't found a way to work around.)
I was sad when, suddenly, I found this shortest solution:
(defn mapsplice [f coll indices]
(let [need-splice (map (set indices) (iterate inc 0))]
(mapcat #(if %2 (f %1) [%1]) coll need-splice)))
;; (mapsplice #(repeat % %) (range 10) [7 3])
;; (0 1 2 3 3 3 4 5 6 7 7 7 7 7 7 7 8 9)

Tuesday, March 3, 2009

Simple variations on state machines in Clojure

This post has moved, go to its new location
Given a transition function that takes the current state and an input value as arguments then (reduce transition-fn initial-state input) returns the final state.

If you are interested in intermediate states, you can use clojure.contrib.seq-utils/reductions instead of reduce.

If you want an asynchronous state machine, you can use an agent:
(def agt (agent initial-state))
;then each time you get an input-value:
(send agt transition-fn input-value)
If you add a watch to the agent, you can react to state transitions.

Monday, February 23, 2009

rest vs drop

This post has moved, go to its new location
Now that fully lazy sequences are in the trunk, (rest a-seq) and (drop 1 a-seq) aren't equivalent anymore:
user=> (def s (map #(do (println %) %) (range 10)))
#'user/s
user=> (def d (drop 1 s))
#'user/d
user=> (def r (rest s))
0
#'user/r

As one can see rest needs to realize the first element of s while drop doesn't. The corollary is that (drop n s) holds on the whole seq (including the nth first elements) while rest computes the first element and then discards it.

Saturday, January 31, 2009

Shadowing global names

This post has moved, go to its new location
A nice article on Clojure that gives one wrong tip: to use #'var-name to access a shadowed global.
Don't do that! You'd better use namespace-where-defined-or-alias/var-name.
There are several reasons not to use #'var-name:
  • it doesn't work if the value of your var is not invokable,
  • it evaluates to something different: it returns the var an not its value. It happens that vars proxy function calls to their values but a var can be rebound:
    (def a (map str (range 10)))
    (def b (map #'str (range 10)))
    (take 5 a)
    ("0" "1" "2" "3" "4")
    (take 5 b)
    ("0" "1" "2" "3" "4")
    (binding [str identity] (doall a))
    ("0" "1" "2" "3" "4" "5" "6" "7" "8" "9")
    (binding [str identity] (doall b))
    ("0" "1" "2" "3" "4" 5 6 7 8 9)
NB: @#'var-name is better than #'var-name but, really, just use the namespaced symbol.

Tuesday, January 27, 2009

Living on the bleeding edge

This post has moved, go to its new location
You can try clojure libs straight from the cloud:
(add-classpath "http://clojure-contrib.svn.sourceforge.net/viewvc/clojure-contrib/trunk/src/") ; sourceforge
(add-classpath "http://github.com/Lau-of-DK/clojureql/raw/master/src/"); github
(require '[clojure.contrib.duck-streams :as duck])
(require '[dk.bestinclass.clojureql :as ql])

Disclaimer:

  • add-classpath is not meant to be used anywhere but at the repl,
  • you'd better trust what you download,
  • if you are unlucky you can get an inconsistent snapshot,
  • use at your own risk!

Tuesday, January 20, 2009

Bindings and send

This post has moved, go to its new location
This morning on #clojure erohtar asked:
what is the best way to bind vars to a value when sending to an agent?

I don't know if it's the best way but it's mine:
(defmacro capture-and-send
"Capture the current value of the specified vars and rebind
them on the agent thread before executing the action."
[vars agent action & args]
(let [locals (map #(gensym (name %)) vars)]
`(let [~@(interleave locals vars)
action# (fn [& args#]
(binding [~@(interleave vars locals)]
(apply ~action args#)))]
(send ~agent action# ~@args))))
;; usage:
(capture-and-send [*out*] a f b c)

I post it here because erohtar needed it, I needed it once so others may need it.

Monday, January 19, 2009

Enlive: yet another HTML templating library

This post has moved, go to its new location
[UPDATE] I have rewritten Enlive, this posts doesn't work with the actual Enlive.
Enlive is a selector based templating library.
Its main design goal is to decouple html and presentation code, that's why Enlive templates are plain old html files (it will ease roundtripping with designers).
In code, you have to declare where (using CSS-like selectors) and how (using clojure code) to alter the html template.
The resulting templating functions are compiled (the html tree isn't transformed at runtime) and yields a seq of strings.

Here is an example:
(deftemplate microblog-template "net/cgrand/enlive_html/example.html" [title posts]
[:title] title
[:h1] title
[:div.no-msg] (when-not (seq posts) ~(html/show))
[:div.post] (for [{:keys [title body]} posts]
~(at
[:h2] title
[:p] body)))

;; at the repl:
net.cgrand.enlive-html.examples=> (apply str (microblog-template "Hello user!"
[{:title "post #1"
:body "hello with dangerous chars: <>&"}
{:title "post #2"
:body "dolor ipsum"}]))

"<html><head><title>Hello user!</title></head>
<body><h1>Hello user!</h1>
<div class=\"post\"><h2>post #1</h2>
<p>hello with dangerous chars: &lt;&gt;&amp;</p></div>
<div class=\"post\"><h2>post #2</h2>
<p>dolor ipsum</p></div></body></html>"
(NB: manually edited to add linebreaks.)

(Disclaimer: original idea by Ozzilee)

Sunday, January 11, 2009

Functionally growing a tree

This post has moved, go to its new location
I was trying to write a restartable parser in Clojure when it occured to me that I was doing it wrong by not using clojure.zip to build the parse tree.
Update: follow-up

Thursday, January 8, 2009

try-or, or-try, try-else or else-try?

This post has moved, go to its new location
I can't decide which name is best for this macro:
(defmacro try-or
"Evaluates exprs one at a time, from left to right. If a form returns a
value, this value is returned. If a form throws an exception, the next
form is evaluated.
If the last form throws an exception, the exception isn't caught."
([] nil)
([form] form)
([form & forms]
`(try
~form
(catch Exception e#
(try-or ~@forms)))))

Monday, January 5, 2009

Recursive seqs

This post has moved, go to its new location
Recursively defined sequences are pretty but difficult to get right when you don't want to assign the sequence to a var. Here is a couple of macros to ease recursive definitions.
(defmacro rec-cat 
"Similar to lazy-cat but binds the resulting sequence using the supplied
binding-form, allowing recursive expressions. The first collection
expression must not be recursive and must return a non-nil seq."
[binding-form expr & rec-exprs]
`(let [rec-rest# (atom nil)
result# (lazy-cat ~expr (force @rec-rest#))
~binding-form result#]
(swap! rec-rest# (constantly (delay (lazy-cat ~@rec-exprs))))
result#))

(defmacro rec-cons
"Similar to lazy-cons but binds the resulting sequence using the supplied
binding-form, allowing recursive expressions. The first expression must
not be recursive and must return a non-nil seq."
[binding-form expr & rec-exprs]
`(let [rec-rest# (atom nil)
result# (lazy-cons ~expr (force @rec-rest#))
~binding-form result#]
(swap! rec-rest# (constantly (delay (lazy-cat ~@rec-exprs))))
result#))

Examples:

Natural numbers (just like (iterate inc 0)):
(rec-cons naturals 0 (map inc naturals))
fibonnaci sequence:
(rec-cat fibs [0 1] (map + fibs (rest fibs)))
Chouser's cute reduction:
(defn reduction
"Returns a lazy seq of the intermediate values of the reduction (as
per reduce) of coll by f, starting with init."
([f coll]
(if (seq coll)
(rec-cons reductions (first coll) (map f reductions (rest coll)))
(cons (f) nil)))
([f init coll]
(rec-cons reductions init (map f reductions coll))))