Tying the Knot, Redux

tl;dr: Use `RWS` and feed the `W` to the `R` for great justice.

A few weeks ago, while working on a side project, I went through some epic gyrations with the help of Stack Overflow, trying to figure out a good way to tie a particular kind of “knot” in Haskell. A bit of background first:

In an imperative language, where mutable data structures are common, if you want to build a data structure containing cycles (such as a circular linked list), you typically construct your nodes and then fix up the links after construction. For example:

``````public class Node<T> {
private final T value;
private Node<T> next;

public Node(T value) { this.value = value; }
protected void setNext(Node<T> next) { this.next = next; }
}

Node<Integer> n0 = new Node<Integer>(0);
Node<Integer> n1 = new Node<Integer>(1);
n0.setNext(n1);
n1.setNext(n0);
``````

Fine, this was Java, so you got what you paid for. Let’s say you want to do this in Haskell now, in which data is immutable (unless you do some shenanigans with `STRef`s, for example). You’d declare your data type:

``````data Node a = Node {
value :: a,
next  :: Node a
}
``````

And you can, as if by magic, write:

``````let n0 = Node { value = 0, next = n1 }
n1 = Node { value = 1, next = n0 }
``````

Of course there isn’t any magic occurring here; I don’t believe in magic. There are two behaviors at work:

• First, `let` bindings are recursive, meaning that variables declared and bound within a single `let` block can all reference one another regardless of order. In other words, the variable name `n1` is in scope for the declaration of `n0` (and vice-versa, of course).
• Second, expressions in Haskell are evaluated lazily by default. When we declare `let n0 = Node 0 n1`, it merely promises that when somebody needs the value of `n0`, it will be computed as a node containing the integer 0 and referencing the node `n1`, whatever that ends up being. We know that there is a node named `n1`, because (as discussed above) it was declared within the same `let` binding, but we don’t care about its value until the point when somebody eventually “forces” it.

These are very different from a typical imperative language, in which you’d expect a statement like to construct a node immediately and blow up, because `n1` is “uninitialized” or “null” or some such nonsense. Anyway, this is a simple form of a technique called tying the knot. If you want to declare a bunch of interdependent variables, comprising some sort of recursive data structure, then declare them all in a single `let` binding, make sure nothing prematurely forces their strict evaluation in the scope of the `let`, and you’re golden.

But what if you’re trying to build up a big, possibly cyclic data structure based on some user input? You’re going to be doing some sort of parsing, probably using monadic parser combinators or something similar, and because your “knot” depends on user input, there isn’t any obvious way to structure your code such that the knot is tied in a single `let` binding. This problem was the essence of my thrashing and flailing about on Stack Overflow.

Let’s talk about monads for a minute. I won’t get into much detail on this, as it’s sort of a separate large-ish topic, but I want to introduce the `Reader`, `Writer` and `State` monads for anybody who’s not seen them before:

• The `Reader` monad, sometimes called the `Environment` monad, simply lets you ask for some bit of immutable “global” state in the midst of a computation. For example:

``````printenv :: Show r => Reader r String
printenv = do
return (show env)
``````

This example is simplified in applicative style:

``````printenv :: Show r => Reader r String
``````

But whatever. Question: how is this different from regular function application?

``````printenv :: Show r => r -> String
printenv env = show env
``````

Answer: it’s not. The `map` and `>>=` operations just represent function composition and application. You provide the function’s argument when you run the monad, and that argument is what `ask` gives you. That is all.

• The `Writer` monad sounds intuitively like it should be the opposite of `Reader` somehow, but it’s probably best understood as its own separate thing. So never mind the fact that it has a function called `tell` (the obvious counterpart to `ask`)! `Writer` gives you something like a stream you can output to in the midst of a computation. To be precise, that stream can be any monoid, but just for an intuitive real-world example let’s say it’s a list of strings representing some logging information. For example:

``````log :: Show a => a -> Writer [String] ()
log a = tell [show a]
``````

Show and tell, nice.

• Finally, `Reader` and `Writer` provide a good way of isolating their respective effects from some other stateful monadic computation. We have a number of common monads representing computations that may not be idempotent or referentially transparent, such as `IO` and `ST`, but one of the simplest and most common is `State`. So let’s roll with that.

``````preincrement :: State Int Int
preincrement = do
i <- get
put (i + 1)
return (i + 1)

postincrement :: State Int Int
postincrement = do
i <- get
put (i + 1)
return i
``````

Here we’re emulating the unary `++i` and `i++` operations from languages like C. The `get` and `put` functions are sort of the primitive operations to read and write the `State`. For fun, we can simplify these further using a common helper, `modify`, and applicative style (thanks to Reddit user LordGreyhawk for the comment):

``````preincrement = modify (+1) *> get
postincrement = get <* modify (+1)
``````

Now, `Reader`, `Writer` and `State` are three great tastes that taste so great together, that there’s another monad called `RWS` which simply lumps them into a single type. For example, we can write:

``````type Contrived a = RWS Int [Int] Int a

contrived :: Contrived ()
contrived = do
s0  <- get         -- grab the current state
let s1 = s0 + env  -- increment the current state by the environment value
put s1             -- update the state to the new, incremented value
tell [s1]          -- log the new value
``````

Finally, you probably want to run one of these things eventually, which looks like:

``````runRWS :: RWS r w s a -> r -> s -> (a, s, w)
``````

This is the combination of `runReader`, `runWriter` and `runState`:

``````runReader :: Reader r a -> r -> a     -- consume a reader environment, produce a value
runWriter :: Writer w a -> (a, w)     -- produce a value and some writer output
runState  :: State s a -> s -> (a, s) -- consume an initial state, produce a value and final state
``````

In other words, running an `RWS` computation consumes a reader environment and an initial state, and produces both a value, a final state and some writer output.

Okay, monadic digression time over. This all turns out to be very convenient for knot-tying in general: all we need to do is feed the writer output back into the reader environment. Doing so, exploiting lazy evaluation, makes it so that you can indirectly reference your eventual output before you have produced it! The Ouroboros in question looks like this:

``````type Knot k s a = RWS k k s a

tie :: Knot k s a -> s -> (a, s, k)
tie knot s0 =
let (a, s1, k) = runRWS knot k s0 -- this is fucking interesting
in (a, s1, k)
``````

First, we say that `Knot` is a type synonym for an `RWS` monad, in which the reader and writer types are declared to be the same `k` thing. We haven’t talked yet about what might be a useful concrete type for `k`; we’ll get to that shortly. Next, `tie` takes a `Knot` and some initial state, and just like `runRWS`, it produces a triple containing some output value, the final state, and again one of these `k` things.

So check out the `let` binding: we call `runRWS`, which produces a triple containing a value `k` as its writer output. And what parameter do we pass as the reader environment for `runRWS`? Aha: `k` again. This is how we feed the snake its tail.

Note that if you’re feeling sassy, you can also implement `tie` using the fixed-point combinator:

``````tie :: Knot k s a -> s -> (a, s, k)
tie knot s =
fix \$ \ ~(_, _, k) -> runRWS knot k s -- this is severely fucking interesting
``````

Now, let’s try to figure out an appropriate concrete data type for `k`. There’s no single correct answer for this, it depends on the problem you’re trying to solve, but let’s figure one example out just to get some satisfaction here.

Take our circular linked list example from before. I’ll simplify it just a little bit here, and add a `Show` instance so we can see what happens when we run our stuff:

``````data Node = Node {
value :: Int,
next  :: Node
} deriving Show
``````

Okay, now we want to parse a list of pairs of `Int`s into our structure. For example, we want `[(0, 1), (1, 2), (2, 0)]` to be equivalent to the following code:

``````let n0 = Node { value = 0, next = n1 }
n1 = Node { value = 1, next = n2 }
n2 = Node { value = 2, next = n0 }
``````

Since the second component of each pair is referencing some other node by its list offset, this means we need a `Map` from list offsets to the nodes that will eventually be found there. The `Map` will be our `k` type for this example. We’ll write to this map (which is conveniently a monoid) using `tell`, and query it using `ask`. We’ll also want some state to keep track of what our current offset is and what raw data we have left to parse. Let’s do this:

``````data ParserState = ParserState {
current   :: Int,
remaining :: [(Int, Int)]
}

type NodeParser a = Knot (Map Int Node) ParserState a
``````

So far so good…

``````parse :: NodeParser ()
parse = do
s <- get
case remaining s of
[] -> return ()
((val, ref) : rest) -> do
tell \$ Map.singleton (current s) (Node val \$ m ! ref)
put  \$ ParserState (current s + 1) rest
parse
``````

We get the current state and see how much input we have remaining. If we’re done, we’re done. Otherwise, pull the current node’s value and reference to the next node off of the list. Look the reference up in our map (which we will have constructed by the time we’re done) to find the next node, and write the newly constructed current node to the writer output. Update the state and recurse.

Finally, let’s feed this thing some data:

``````example :: Node
example =
let (_, _, m) = tie parse \$ ParserState 0 [(0, 1), (1, 2), (2, 0)]
in (m Map.! 0)

main :: IO ()
main = putStrLn \$ show example
``````

And we get an infinitely recursive traversal of our three-node, circularly linked list:

``````Node {value = 0, next = Node {value = 1, next = Node {value = 2, next = Node {value = 0, ...
``````

Awesome.

Prologue: Two quick comments as I wrap up here.

1. This was obviously a toy example. It would be much more realistic to be accepting input from some external source such as a file, in which case your input is monadic. The `Knot` monad can be easily adapted into a `KnotT` monad transformer, on top of whatever underlying monad is needed (e.g. `IO`). Just implement it in terms of `RWST` instead of `RWS`.

2. I demonstrated knot-tying here in the context of parsing, but it works just as well in the context of serialization. The `Map` in that case is flipped: you need to look up the offsets at which each `Node` will eventually be stored, before they have necessarily been written there. This requires a bit of care to ensure that you don’t force strict evaluation of these offsets while writing. That also probably means you can’t do any nice output buffering performance tricks, but I’ll just say “whatever” to that for now.