Friday, April 17, 2009

Mapping every second item

I wanted to apply a function to every second item in a coll. I was considering writing something with interleave, take-nth and map, or with a combination of mapcat and partition, when I thought of this:
(map #(%1 %2) (cycle [f identity]) coll)
I really love that Clojure's map can walk several colls in parallel. (I should ask whether every? and some could be allowed to take several colls too.)
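As a minimal sketch of the one-liner in use: the helper name map-alternate is hypothetical (the post only gives the inline expression), and inc stands in for f. Because map stops at the end of the shortest coll, the infinite seq from cycle is harmless. The last form shows one way to get a multi-coll every? today by going through map.

```clojure
;; Hypothetical helper name; it only wraps the expression from the post.
(defn map-alternate
  "Applies f to the 1st, 3rd, 5th, ... items of coll and leaves the others untouched."
  [f coll]
  (map #(%1 %2) (cycle [f identity]) coll))

(map-alternate inc [1 2 3 4 5])
;; => (2 2 4 4 6)

;; Swapping the two functions targets the 2nd, 4th, ... items instead.
(map #(%1 %2) (cycle [identity inc]) [1 2 3 4 5])
;; => (1 3 3 5 5)

;; every? over two colls, emulated with map's multi-coll arity.
(every? true? (map < [1 2 3] [2 3 4]))
;; => true
```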

3 comments:

  1. Hey CHRISTOPHE,

    I really enjoy reading the examples you post.

    Question: Suppose 'coll' is a collection of 10,000 items. This line of code will use 'cycle' to create another collection of 10,000 items. Isn't that a bit expensive from a performance point of view?

  2. Hi Michael,

    cycle would not create a 10,000-item collection; it would lazily realize the first 10,000 conses of an infinite seq. Nothing keeps a reference to them, so they are GCable as soon as they are processed and have a very short life-span.

    I didn't benchmark them, but one can count how many transient objects (since that's what worries you) each of the other approaches creates:

    (interleave (map f (take-nth 2 coll)) (take-nth 2 (rest coll))) would create 3*5,000 transient conses (5,000 for each take-nth plus 5,000 for map) -- and I'm not counting seq being called twice on coll.

    (mapcat (fn [[a b]] [(f a) b]) (partition 2 coll)) would create 3*5,000 conses for partition (it yields a seq of 5,000 seqs of 2 conses each); the map part of mapcat would create 5,000 conses and 5,000 vectors; and the concat part would create 10,000 transient conses (seq on each vector).

    NB: I used "cons" rather liberally to mean "objects which implement clojure.lang.ISeq".

    NB2: A quick, pointless benchmark on (range 100000) seems to show that (map #(%1 %2) (cycle [...]) coll) is indeed faster, but what I was after when I wrote this post was conciseness.

  3. Michael,

    You can redefine cycle using seq-utils/rec-seq:
    (defn my-cycle [coll] (rec-seq c (concat coll c)))
    which creates a cyclic seq of only two conses.

    (map #(%1 %2) (my-cycle [...]) coll) is obviously faster: no garbage trumps easily collectable garbage.

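The three approaches compared in the comments can be checked side by side with a small sketch. The function names via-cycle, via-interleave and via-mapcat are hypothetical labels for the expressions quoted above, and inc stands in for f. On an even-length coll all three agree. On an odd-length coll only the cycle version keeps the last item, since interleave stops at its shorter argument and partition drops an incomplete trailing group.

```clojure
;; Hypothetical names wrapping the three expressions from the discussion.
(defn via-cycle [f coll]
  (map #(%1 %2) (cycle [f identity]) coll))

(defn via-interleave [f coll]
  (interleave (map f (take-nth 2 coll)) (take-nth 2 (rest coll))))

(defn via-mapcat [f coll]
  (mapcat (fn [[a b]] [(f a) b]) (partition 2 coll)))

;; Even number of items: same result from all three.
(via-cycle inc (range 6))      ;; => (1 1 3 3 5 5)
(via-interleave inc (range 6)) ;; => (1 1 3 3 5 5)
(via-mapcat inc (range 6))     ;; => (1 1 3 3 5 5)

;; Odd number of items: the last item survives only with cycle.
(via-cycle inc (range 5))      ;; => (1 1 3 3 5)
(via-interleave inc (range 5)) ;; => (1 1 3 3)
(via-mapcat inc (range 5))     ;; => (1 1 3 3)
```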