Closures

Matt Bossenbroek edited this page May 7, 2015 · 2 revisions

PigPen supports a number of different types of closures. For example, you can parameterize your functions like this:

(require '[pigpen.core :as pig])

(defn foo [p data]
  (->> data
    (pig/map (fn [x] (* p x)))))

In this example, p is a parameter to our function, but it's also available remotely for use in our map operation. For this to work, p must be an immutable data structure like a map or vector. Compiled functions and mutable structures like atoms won't work.
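As a further illustration of the immutable-data requirement, the parameter can be a map of options rather than a single number. This is only a sketch: the function name scale-and-shift and the :scale and :shift keys are made up for this example, and it uses pig/return and pig/dump to run the query locally.

```clojure
(require '[pigpen.core :as pig])

;; Hypothetical example: `opts` is a plain immutable map, so it can be
;; closed over by the function passed to pig/map.
(defn scale-and-shift [opts data]
  (->> data
    (pig/map (fn [x] (+ (:shift opts) (* (:scale opts) x))))))

;; Run locally against a small in-memory data set.
(->> (pig/return [1 2 3])
  (scale-and-shift {:scale 10, :shift 2})
  (pig/dump))
```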

You can also use let bindings - in this case v:

(defn bar [data]
  (let [v 42]
    (->> data
      (pig/map (fn [x] (* v x))))))

PigPen also supports using functions defined elsewhere in your code:

(defn square [x]
  (* x x))

(defn baz [data]
  (->> data
    (pig/map square)))

You can use them directly or within another function:

(defn baz2 [data]
  (->> data
    (pig/map
      (fn [x]
        (+ 1 (square x))))))

This is all made possible because the jar PigPen uses is an uberjar including both PigPen and your user code. When run remotely, functions are evaluated within the namespace they were originally defined in, and with the bindings that were present at that time.

This also means that aliases available when generating the script are available within the remote function calls:

(require '[clojure.data.json :as json])

(defn foo [data]
  (->> data
    (pig/map
      (fn [x]
        (json/read-str x :key-fn keyword)))))
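The idea of evaluating a function in its original namespace, with the bindings present when it was written, can be sketched in plain Clojure. The capture macro and restore function below are hypothetical names invented for this illustration; they are not PigPen's implementation and only approximate the mechanism described above.

```clojure
;; Hypothetical sketch; `capture` and `restore` are not PigPen functions.

(defmacro capture
  "Records the unevaluated form f, the namespace it was written in,
   and the values of the locals visible at the call site."
  [f]
  (let [locals (vec (keys &env))]
    `{:ns     '~(ns-name *ns*)
      :form   '~f
      :locals (zipmap '~locals ~locals)}))

(defn restore
  "Rebuilds the function by evaluating the form in its original
   namespace, with the captured locals re-bound around it."
  [{:keys [ns form locals]}]
  (let [bindings (vec (mapcat (fn [[sym value]] [sym (list 'quote value)])
                              locals))]
    (binding [*ns* (the-ns ns)]
      (eval (list 'let bindings form)))))

(defn times [p]
  (capture (fn [x] (* p x))))

(def captured (times 10))

(:form captured)        ; the source of the function, as data
(:locals captured)      ; the locals captured alongside it
((restore captured) 4)  ; => 40
```

Note that a macro like this sees every local at the call site, whether or not the function uses it.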

To prevent a local parameter from being serialized, mark it with ^:local:

(defn foo-local [^:local p data]
  (->> data
    #_(pig/map (fn [x] (* p x)))  ; this would fail
    (pig/map (fn [x] x))))

Or mark the data as local:

(->> (pig/return [1 2 3])
  (foo-local ^:local {:a 1, :b 2})
  (pig/store-tsv "output.tsv"))

However, closures have a limitation: if a function is already compiled, we can't serialize it and send it to the cluster. For this reason, all of the core PigPen commands that take user functions also contain a macro that traps the user function before it is evaluated.

What does this mean? You normally can't do this:

(defn foo [f data]
  (pig/map f data))

This doesn't work because f is already compiled by the time we get it. To do something like this, you would need to trap f yourself and call the primitive version of map.

For example:

(require '[pigpen.core.fn :as pig-fn])

(defn foo [f data]
  (pig-fn/map* f data))

(->>
  (pig/return [1 2 3])
  (foo (pig-fn/trap (fn [x] (* x x))))
  (pig/dump))

Because we called pigpen.core.fn/trap on our function, we can now pass it to pigpen.core.fn/map*, which is a version of pigpen.core/map that expects a quoted function.
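To see why a macro has to receive the function literal itself, here is a small plain-Clojure sketch. The show-form macro and wrapper function are hypothetical names made up for this illustration and are not part of PigPen; the sketch only shows what a macro is handed in each case.

```clojure
;; Hypothetical sketch; `show-form` is not part of PigPen.
(defmacro show-form
  "Returns the unevaluated form the macro was handed."
  [f]
  `'~f)

;; Given a function literal, the macro sees the full source.
(show-form (fn [x] (* x x)))   ; => (fn [x] (* x x))

;; Given a parameter, the macro sees only the symbol naming it;
;; the source of the function passed in is no longer available.
(defn wrapper [f]
  (show-form f))

(wrapper (fn [x] (* x x)))     ; => f
```

This is the situation in the failing example above: by the time a command's macro runs inside the wrapper function, all it can see is the symbol f.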