Friday, April 17, 2009

Mapping every second item

I wanted to apply a function to every second item in a coll. I was considering writing something using interleave, take-nth and map or a combination of mapcat and partition when I thought of this:
(map #(%1 %2) (cycle [f identity]) coll)
I really love Clojure's ability to map over several collections in parallel. (I should ask whether every? and some could be allowed to take several colls too.)
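As a small illustration of the idiom above: map walks the cycled functions and the collection in lockstep, so f lands on the first, third, fifth... items. The helper name map-every-second and the sample data are made up for this sketch.

```clojure
;; Hypothetical helper name; it only wraps the one-liner from the post.
(defn map-every-second
  "Applies f to the 1st, 3rd, 5th, ... items of coll and leaves the others untouched."
  [f coll]
  (map #(%1 %2) (cycle [f identity]) coll))

(map-every-second inc [10 20 30 40 50])
;; => (11 20 31 40 51)

;; Swapping the order of the cycled functions targets the 2nd, 4th, ... items instead.
(map #(%1 %2) (cycle [identity inc]) [10 20 30 40 50])
;; => (10 21 30 41 50)
```

The same shape extends to longer patterns, for instance (cycle [f identity identity]) to touch every third item.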

3 comments:

michaelr said...

Hey Christophe,

I really enjoy reading the examples you post.

Question: suppose 'coll' is a collection of 10,000 items. This line of code will use 'cycle' to create another collection of 10,000 items. Isn't that a bit expensive from a performance point of view?

Christophe Grand said...

Hi Michael,

cycle would not create a 10,000-item collection; it would lazily realize the first 10,000 conses of an infinite seq, without keeping a reference to them. They become GCable as soon as they are processed, so they have a very short life span.
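A minimal sketch of that laziness, using made-up sample values: cycle hands back an infinite lazy seq, and map only pulls as many items from it as the shortest input requires.

```clojure
;; cycle returns an infinite lazy seq; only what is consumed gets realized.
(take 5 (cycle [:a :b]))
;; => (:a :b :a :b :a)

;; map stops at the end of the shortest input, so pairing the infinite
;; seq with a finite coll is safe.
(count (map #(%1 %2) (cycle [inc identity]) (range 10000)))
;; => 10000
```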

I didn't benchmark them, but one can count how many transient objects each of the other approaches creates (since that's what worries you):

(interleave (map f (take-nth 2 coll)) (take-nth 2 (rest coll))) would create 3*5,000 transient conses (5,000 for each take-nth plus 5,000 for map), and I'm not counting seq being called twice on coll.

(mapcat (fn [[a b]] [(f a) b]) (partition 2 coll)) would create 3*5,000 conses for partition (it yields a seq of 5,000 seqs of 2 conses each); the map part of mapcat would create 5,000 conses and 5,000 vectors; and the concat part would create 10,000 transient conses (seq on each vector).
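To make the comparison concrete, here is a sketch that runs the three approaches on the same input. The sample data and the use of inc as f are assumptions for illustration. On an even-length coll they agree; on an odd-length coll the interleave and partition versions drop the trailing item, while the cycle version keeps it.

```clojure
(def coll (range 10))   ; sample data, even length (assumption for this sketch)
(def f inc)             ; stand-in for the real function

(def with-cycle      (map #(%1 %2) (cycle [f identity]) coll))
(def with-interleave (interleave (map f (take-nth 2 coll)) (take-nth 2 (rest coll))))
(def with-mapcat     (mapcat (fn [[a b]] [(f a) b]) (partition 2 coll)))

with-cycle
;; => (1 1 3 3 5 5 7 7 9 9)
(= with-cycle with-interleave with-mapcat)
;; => true

;; Odd-length input: the last item has no partner.
(map #(%1 %2) (cycle [f identity]) (range 5))
;; => (1 1 3 3 5)
(mapcat (fn [[a b]] [(f a) b]) (partition 2 (range 5)))
;; => (1 1 3 3)
```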

NB: I used "cons" rather liberally to mean "objects which implement clojure.lang.ISeq".

NB2: A quick pointless benchmark on (range 100000) seems to show that (map #(%1 %2) (cycle [...]) coll) is indeed faster, but what I was after when I wrote this post was conciseness.

Christophe Grand said...

Michael,

You can redefine cycle using seq-utils/rec-seq:
(defn my-cycle [coll] (rec-seq c (concat coll c)))
which creates a cyclic seq of only two conses.

(map #(%1 %2) (my-cycle [...]) coll) is obviously faster: no garbage trumps easily collectable garbage.