WSSCode Blog

A guide to Custom Map Types in Clojure

March 10, 2021

In Clojure, we use maps everywhere. Most of the time, these are the standard persistent map implementations that ship with Clojure, but maps are defined in terms of interfaces (protocols, in ClojureScript) that allow us to define new map types with custom implementations.

In practical terms, this means you can provide custom implementations for operations like get, find, count, reduce-kv, and so on.

What are custom map types?

One common usage for custom map types is to wrap some underlying type that behaves like a map.

The first example that comes to mind is the cljs-bean project. With this library you can read straight from JS objects using the same map interface we are familiar with from Clojure.

(bean #js {:a 1, :b 2})
=> {:a 1, :b 2}

This library is inspired by Clojure’s built-in bean function (which I learned about while researching this post).

(bean (java.util.Date.))
=>
{:day 3,
 :date 10,
 :time 1615387053466,
 :month 2,
 :seconds 33,
 :year 121,
 :class java.util.Date,
 :timezoneOffset 180,
 :hours 11,
 :minutes 37}
note

I dug into the source of bean to understand what it looks for in a class. Interestingly, it looks for specific method names: it exposes methods that start with get and is, such as getName, isVolatile, and getAddress.
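
As a small illustration of that naming convention, here is a sketch using java.lang.Thread, which has getName, isDaemon and isAlive methods. The choice of Thread and the name "worker" are mine, picked only because the class is always available on the JVM.

```clojure
; Thread exposes getName, isDaemon and isAlive, so bean surfaces
; them as :name, :daemon and :alive.
(def thread-bean (bean (Thread. "worker")))

(:name thread-bean)
; => "worker"

(contains? thread-bean :daemon)
; => true

(contains? thread-bean :alive)
; => true
```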

Clojure itself already supports map-style lookup on java.util.HashMap, for example:

(get (java.util.HashMap. {"foo" "bar" "baz" "quux"}) "foo")
=> "bar"
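
That built-in support only covers reading, though. A quick sketch of where a plain java.util.HashMap stops behaving like a Clojure map (hm is just an illustrative name):

```clojure
(def hm (java.util.HashMap. {"foo" "bar"}))

; reading works through Clojure's built-in support for java.util.Map
(get hm "foo")
; => "bar"
(contains? hm "foo")
; => true

; but it is not a Clojure map, so persistent operations are unavailable
(map? hm)
; => false
(try
  (assoc hm "baz" "quux")
  (catch ClassCastException _ :not-associative))
; => :not-associative
```

This gap is what a custom map type closes: the wrapper implements the persistent map interfaces on top of the underlying object.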

Specialised Map-like Structures

What is cool about being able to supply your own implementation is that we can take the map concept and extend it to new ideas.

For example, we can create a map that derefs any value that implements IDeref. The result would look like this:

(def m (my-custom-map {:a 1 :b (atom 2) :c (delay 3)}))

(:a m) ; => 1
(:b m) ; => 2
(:c m) ; => 3

The Plumbing library implements lazy graph resolution, also using a custom map type.

In Pathom 3, I introduced a new special map type called Smart Maps. It resembles what Plumbing does, but uses the Pathom engine for attribute resolution.

To learn more about Smart Maps, check the Smart Map documentation page.

How to create a Custom Map Type

The answer to this question is, unfortunately, not trivial.

Depending on which environment you run your code in, the options on how to make a custom map type will vary. Today I’ll cover implementation in Clojure, ClojureScript and Babashka environments. I’ll use CMT to refer to custom map types from now on.

CMT in Clojure

In my opinion, the best way to get a correct map type implementation in Clojure is using the Potemkin library.

Why do we need a library for that? I’ll quote the Potemkin docs:

A Clojure map implements the following interfaces: clojure.lang.IPersistentCollection, clojure.lang.IPersistentMap, clojure.lang.Counted, clojure.lang.Seqable, clojure.lang.ILookup, clojure.lang.Associative, clojure.lang.IObj, java.lang.Object, java.util.Map, java.util.concurrent.Callable, java.lang.Runnable, and clojure.lang.IFn.

Between them, there’s a few dozen functions, many with overlapping functionality, all of which need to be correctly implemented.

Despite this, there are only six functions which really matter: get, assoc, dissoc, keys, meta, and with-meta. def-map-type is a variant of deftype which, if those six functions are implemented, will look and act like a Clojure map.
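
To make that list concrete, the following sketch checks that an ordinary map literal is an instance of each interface the quote names (java.lang.Object is left out, since everything is one). The var name map-interfaces is mine.

```clojure
(def map-interfaces
  [clojure.lang.IPersistentCollection
   clojure.lang.IPersistentMap
   clojure.lang.Counted
   clojure.lang.Seqable
   clojure.lang.ILookup
   clojure.lang.Associative
   clojure.lang.IObj
   java.util.Map
   java.util.concurrent.Callable
   java.lang.Runnable
   clojure.lang.IFn])

; a plain map literal satisfies every one of them
(every? #(instance? % {:a 1}) map-interfaces)
; => true
```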

Let’s implement our auto-deref map described before using Potemkin:

(ns com.wsscode.ideref-map
  (:require [potemkin.collections :refer [def-map-type]])
  (:import (clojure.lang IDeref)))

(declare deref-map)

(defn auto-deref
  "If value implements IDeref, deref it, otherwise return original."
  [x]
  (if (instance? IDeref x)
    @x
    x))

(def-map-type DerefMapType [m]
  (get [_ k default-value] (auto-deref (get m k default-value)))
  (assoc [_ k v] (deref-map (assoc m k v)))
  (dissoc [_ k] (deref-map (dissoc m k)))
  (keys [_] (keys m))
  (meta [_] (meta m))
  (empty [_] (deref-map {}))
  (with-meta [_ new-meta] (deref-map (with-meta m new-meta))))

(defn deref-map [m]
  (->DerefMapType m))

(def m (deref-map {:a 1 :b (atom 2) :c (delay 3)}))

; reading keys, all IDeref evaluated
[(:a m)
 (:b m)
 (:c m)]
;=> [1 2 3]

; after assoc, it is still a DerefMapType
(-> (assoc m :foo (delay "bar"))
    :foo)
; => "bar"

; after dissoc, it is still a DerefMapType
(-> (dissoc m :b)
    :c)
; => 3

; empty returns the empty DerefMapType
(-> (assoc (empty m) :foo (delay "bar"))
    :foo)
; => "bar"

This is neat! The surface is quite small, and you get a complete custom map. You can also extend other methods if you wish. In Smart Maps I override the behavior of find. To do that, I used the (entryAt [this k]) method. To learn about other signatures like this, I suggest you check the Potemkin collections file.

The hard way

What if you don’t use Potemkin? What does a raw custom map implementation look like? To get a feel for it, check the lazymap implementation (this is what Plumbing uses for its custom maps). It doesn’t look like much fun.

note

Note that the lazymap example is not a complete map replacement: it has no transients and no concurrent interfaces.

Another thing you must consider when making any sort of lazy map is how to handle keys. Clojure doesn’t have a dedicated protocol for keys; instead, it relies on the seq of the map. For maps, Clojure expects that seq to be a sequence of map entries (MapEntry).

The important thing to notice is that if you do a naive implementation using the standard MapEntry from Clojure, you are going to realize all the values at that moment. This is a bad thing for lazy structures.

An example to illustrate this point:

; a naive custom lazy map
(deftype NaiveLazyMap [m]
  clojure.lang.ILookup
  (valAt [_ k] (auto-deref (get m k)))

  ; we need to implement this for Clojure to detect that this type supports `keys`
  ; keeping it as a dummy for the sake of the demo
  clojure.lang.IPersistentMap
  (assoc [this _ _] this)

  clojure.lang.Seqable
  (seq [this]
    ; this is the tricky part, because we need a sequence of MapEntry whose
    ; val needs to be dereffed, but we don't want to deref until the user
    ; tries to read it.

    ; doing it the naive way
    (seq
      (map (fn [[k v]] (clojure.lang.MapEntry. k (get this k))) m)))

  ; the keys implementation also requires that this type implements Iterable
  java.lang.Iterable
  (iterator [this]
    ; using iter for demo purposes to make this easy, but I'd like to point
    ; out the following discussion around this: https://ask.clojure.org/index.php/10303/interop-clojure-pattern-clojure-consider-adding-iter-clojure
    (clojure.lang.RT/iter
      ; same as in seq
      (map (fn [[k v]] (clojure.lang.MapEntry. k (get this k))) m))))

(let [m (->NaiveLazyMap
          {:a (delay (println "A") 1)
           :b (delay (println "B") 2)
           :c 3})]
  (keys m))
; prints A and B, showing they are getting realized, although we never used any value
A
B
=> (:a :b :c)

; note if we try the same using the Potemkin implementation, they don't get realized:
(let [m (deref-map {:a (delay (println "A") 1)
                    :b (delay (println "B") 2)
                    :c 3})]
  (keys m))
; no prints
; => (:a :b :c)

This is why you can see a custom MapEntry definition in the lazymap implementation. It allows the val part of the MapEntry to be lazy.

note

Potemkin also has its own custom MapEntry, and it behaves correctly in terms of keeping the values lazy.
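
To illustrate the idea of a lazy val, here is a minimal sketch of a map entry that holds a delay and only forces it when the val is read. This is not the lazymap or Potemkin implementation; the name SketchLazyEntry and the realized atom are hypothetical, and the type implements only the key/val accessors (no equality, hashing, or vector-like behavior).

```clojure
; an entry whose value is a delay, forced only on val / getValue
(deftype SketchLazyEntry [k v-delay]
  clojure.lang.IMapEntry
  (key [_] k)
  (val [_] (force v-delay))

  java.util.Map$Entry
  (getKey [_] k)
  (getValue [_] (force v-delay)))

; records which values have been realized so far
(def realized (atom []))

(def entry
  (->SketchLazyEntry :a (delay (swap! realized conj :a) 1)))

; reading the key does not touch the value
(key entry)
; => :a
@realized
; => []

; reading the val realizes it
(val entry)
; => 1
@realized
; => [:a]
```

A seq built from entries like this lets keys walk the whole map without realizing any value.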

Using proxy

I recently learned about this one when I found the bean implementation. It looks like this:

(defn proxy-deref-map
  "Wraps map m in a proxy of APersistentMap that derefs values on read."
  [m]
  (proxy [clojure.lang.APersistentMap]
         []
    (iterator []
      (clojure.lang.RT/iter
        ; same as in seq
        (map (fn [[k v]] (clojure.lang.MapEntry. k (auto-deref v))) m)))
    (containsKey [k] (contains? m k))
    (entryAt [k] (when (contains? m k)
                   (clojure.lang.MapEntry/create k (auto-deref (get m k)))))
    (valAt
      ([k] (auto-deref (get m k)))
      ([k default] (auto-deref (get m k default))))
    ; wrap the result so conj keeps returning a deref map
    (cons [v] (proxy-deref-map (conj m v)))
    (count [] (count m))
    (assoc [k v] (proxy-deref-map (assoc m k v)))
    (without [k] (proxy-deref-map (dissoc m k)))
    ; seq must return nil, not (), for an empty map
    (seq [] (seq (map (fn [[k v]] (clojure.lang.MapEntry. k (auto-deref v))) m)))))

(let [m (proxy-deref-map
          {:a (delay 1)
           :b (delay 2)
           :c 3})]
  [(:a m)
   (:b m)
   (:c m)])
; => [1 2 3]

This is easier than implementing the interfaces manually, but it still has the same problem as the NaiveLazyMap regarding MapEntry evaluation.

CMT in ClojureScript

Just when you think you have everything figured out, along comes ClojureScript and changes everything.

Jokes aside, ClojureScript had the advantage of hindsight and used it to provide better protocols for its types. It’s good that they are better, but that comes with a portability cost: you have to learn a different way to do it in ClojureScript.

For inspiration here, check the Bean implementation from cljs-bean.

Now that you know the names for everything, just go make yours!

The keys situation here is the same as in Clojure, and you also need some sort of custom lazy MapEntry to keep things really lazy.

You can find the Smart Map implementation of CMT at this link.

CMT in Babashka

The first thing I’d like to point out is that you can’t use deftype in Babashka. The available option is to use reify.

Other than that, the Babashka implementation is the same as in Clojure. Sadly, Potemkin doesn’t work with Babashka, so we have to do it the hard way.

For reference, you can check the code for Smart Maps in Babashka from Pathom 3 here.

important

The demos require Babashka v0.2.14 or up. At the time of this post, this version wasn’t released yet. If you are reading much later, it’s probably available. If you are here early, you can download the CI binary (mac, linux) and use it directly.

There was a lot of work involved in getting all of this working. A special shout-out to borkdude for being so helpful when I brought him all sorts of issues around it.

One important difference is how to detect the type of your custom map. Because we have to use reify instead of deftype, we don’t have an actual new type to check for using instance?.

To get around this issue, I created a new protocol just for this custom map type, implemented it only there, and then used satisfies? to check for it.
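
A minimal sketch of that marker-protocol idea, written for JVM Clojure. The names SketchMapMarker, marked-map and marked-map? are hypothetical and are not taken from the Pathom source; the reified object implements only lookup to keep the example short.

```clojure
; a protocol that exists only to tag our custom map
(defprotocol SketchMapMarker
  (sketch-map-marker [this] "Marker method; only the custom map implements it."))

(defn marked-map
  "Returns a reified, read-only map-like object wrapping m."
  [m]
  (reify
    clojure.lang.ILookup
    (valAt [_ k] (get m k))
    (valAt [_ k not-found] (get m k not-found))

    SketchMapMarker
    (sketch-map-marker [_] true)))

(defn marked-map?
  "True when x was created by marked-map."
  [x]
  (satisfies? SketchMapMarker x))

(marked-map? (marked-map {:a 1}))
; => true
(marked-map? {:a 1})
; => false
(:a (marked-map {:a 1}))
; => 1
```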

EDIT 16/03/2021

A lot has happened to Babashka between the original post and now. Some problems were found. For a moment it seemed we might even have to let it go… But borkdude doesn’t give up so easily!

After a few more days, a new solution arrived that uses proxy instead of reify, and it overcomes the previous challenges!

This also means that what I wrote above about custom map types on Babashka no longer works.

The current approach is like the proxy version I demoed in the Clojure part. Here is an example of a lazy map (non-naive, with proper handling of lazy map entries) that works in Babashka (version 0.3.0 or later):

(defn auto-deref
  "If value implements IDeref, deref it, otherwise return original."
  [x]
  (if (instance? clojure.lang.IDeref x)
    @x
    x))

(defn ->LazyMapEntry
  [key_ val_]
  (proxy [clojure.lang.AMapEntry] []
    (key [] key_)
    (val [] (force val_))
    (getKey [] key_)
    (getValue [] (force val_))))

(defn ->LazyMap [m]
  (proxy [clojure.lang.APersistentMap clojure.lang.IMeta clojure.lang.IObj]
         []
    (valAt
      ([k]
       (auto-deref (get m k)))
      ([k default-value]
       (auto-deref (get m k default-value))))
    (iterator []
      (.iterator ^java.lang.Iterable
        (eduction
          (map #(->LazyMapEntry % (delay (get this %))))
          (keys m))))

    (containsKey [k] (contains? m k))
    (entryAt [k] (when (contains? m k)
                   (->LazyMapEntry k (delay (get this k)))))
    (equiv [other] (= m other))
    (empty [] (->LazyMap (empty m)))
    (count [] (count m))
    (assoc [k v] (->LazyMap (assoc m k v)))
    (without [k] (->LazyMap (dissoc m k)))
    (seq [] (some->> (keys m)
              (map #(->LazyMapEntry % (delay (get this %))))))
    ; a lot of map users expect meta to work
    (meta [] (meta m))
    (withMeta [meta] (->LazyMap (with-meta m meta)))))

You can check that the map entries are lazy with:

(let [m (->LazyMap
          {:a (delay (println "A") 1)
           :b (delay (println "B") 2)
           :c 3})]
  (keys m))

This doesn’t print A or B, but calling seq (and letting the REPL print the resulting entries) does:

(let [m (->LazyMap
          {:a (delay (println "A") 1)
           :b (delay (println "B") 2)
           :c 3})]
  (seq m))

As a piece of extra news, this means Smart Maps are fully compatible with Babashka now 🎉!

