tl;dr: Use
RWSand feed theWto theRfor great justice.
A few weeks ago, while working on a side project, I went through some epic gyrations with the help of Stack Overflow, trying to figure out a good way to tie a particular kind of “knot” in Haskell. A bit of background first:
In an imperative language, where mutable data structures are common, if you want to build a data structure containing cycles (such as a circular linked list), you typically construct your nodes and then fix up the links after construction. For example:
public class Node<T> {
private final T value;
private Node<T> next;
public Node(T value) { this.value = value; }
protected void setNext(Node<T> next) { this.next = next; }
}
Node<Integer> n0 = new Node<Integer>(0);
Node<Integer> n1 = new Node<Integer>(1);
n0.setNext(n1);
n1.setNext(n0);
Fine, this was Java, so you got what you paid for. Let’s say you want to do this in Haskell now, in which data is immutable (unless you do some shenanigans with STRefs, for example). You’d declare your data type:
data Node a = Node {
value :: a,
next :: Node a
}
And you can, as if by magic, write:
let n0 = Node { value = 0, next = n1 }
n1 = Node { value = 1, next = n0 }
Of course there isn’t any magic occurring here; I don’t believe in magic. There are two behaviors at work:
- First,
letbindings are recursive, meaning that variables declared and bound within a singleletblock can all reference one another regardless of order. In other words, the variable namen1is in scope for the declaration ofn0(and vice-versa, of course). - Second, expressions in Haskell are evaluated lazily by default. When we declare
let n0 = Node 0 n1, it merely promises that when somebody needs the value ofn0, it will be computed as a node containing the integer 0 and referencing the noden1, whatever that ends up being. We know that there is a node namedn1, because (as discussed above) it was declared within the sameletbinding, but we don’t care about its value until the point when somebody eventually “forces” it.
These are very different from a typical imperative language, in which you’d expect a statement like to construct a node immediately and blow up, because n1 is “uninitialized” or “null” or some such nonsense. Anyway, this is a simple form of a technique called tying the knot. If you want to declare a bunch of interdependent variables, comprising some sort of recursive data structure, then declare them all in a single let binding, make sure nothing prematurely forces their strict evaluation in the scope of the let, and you’re golden.
But what if you’re trying to build up a big, possibly cyclic data structure based on some user input? You’re going to be doing some sort of parsing, probably using monadic parser combinators or something similar, and because your “knot” depends on user input, there isn’t any obvious way to structure your code such that the knot is tied in a single let binding. This problem was the essence of my thrashing and flailing about on Stack Overflow.
Let’s talk about monads for a minute. I won’t get into much detail on this, as it’s sort of a separate large-ish topic, but I want to introduce the Reader, Writer and State monads for anybody who’s not seen them before:
-
The
Readermonad, sometimes called theEnvironmentmonad, simply lets you ask for some bit of immutable “global” state in the midst of a computation. For example:printenv :: Show r => Reader r String printenv = do env <- ask return (show env)This example is simplified in applicative style:
printenv :: Show r => Reader r String printenv = show <$> askBut whatever. Question: how is this different from regular function application?
printenv :: Show r => r -> String printenv env = show envAnswer: it’s not. The
mapand>>=operations just represent function composition and application. You provide the function’s argument when you run the monad, and that argument is whataskgives you. That is all. -
The
Writermonad sounds intuitively like it should be the opposite ofReadersomehow, but it’s probably best understood as its own separate thing. So never mind the fact that it has a function calledtell(the obvious counterpart toask)!Writergives you something like a stream you can output to in the midst of a computation. To be precise, that stream can be any monoid, but just for an intuitive real-world example let’s say it’s a list of strings representing some logging information. For example:log :: Show a => a -> Writer [String] () log a = tell [show a]Show and tell, nice.
-
Finally,
ReaderandWriterprovide a good way of isolating their respective effects from some other stateful monadic computation. We have a number of common monads representing computations that may not be idempotent or referentially transparent, such asIOandST, but one of the simplest and most common isState. So let’s roll with that.preincrement :: State Int Int preincrement = do i <- get put (i + 1) return (i + 1) postincrement :: State Int Int postincrement = do i <- get put (i + 1) return iHere we’re emulating the unary
++iandi++operations from languages like C. Thegetandputfunctions are sort of the primitive operations to read and write theState. For fun, we can simplify these further using a common helper,modify, and applicative style (thanks to Reddit user LordGreyhawk for the comment):preincrement = modify (+1) *> get postincrement = get <* modify (+1)
Now, Reader, Writer and State are three great tastes that taste so great together, that there’s another monad called RWS which simply lumps them into a single type. For example, we can write:
type Contrived a = RWS Int [Int] Int a
contrived :: Contrived ()
contrived = do
env <- ask -- read from the fixed environment
s0 <- get -- grab the current state
let s1 = s0 + env -- increment the current state by the environment value
put s1 -- update the state to the new, incremented value
tell [s1] -- log the new value
Finally, you probably want to run one of these things eventually, which looks like:
runRWS :: RWS r w s a -> r -> s -> (a, s, w)
This is the combination of runReader, runWriter and runState:
runReader :: Reader r a -> r -> a -- consume a reader environment, produce a value
runWriter :: Writer w a -> (a, w) -- produce a value and some writer output
runState :: State s a -> s -> (a, s) -- consume an initial state, produce a value and final state
In other words, running an RWS computation consumes a reader environment and an initial state, and produces both a value, a final state and some writer output.
Okay, monadic digression time over. This all turns out to be very convenient for knot-tying in general: all we need to do is feed the writer output back into the reader environment. Doing so, exploiting lazy evaluation, makes it so that you can indirectly reference your eventual output before you have produced it! The Ouroboros in question looks like this:
type Knot k s a = RWS k k s a
tie :: Knot k s a -> s -> (a, s, k)
tie knot s0 =
let (a, s1, k) = runRWS knot k s0 -- this is fucking interesting
in (a, s1, k)
First, we say that Knot is a type synonym for an RWS monad, in which the reader and writer types are declared to be the same k thing. We haven’t talked yet about what might be a useful concrete type for k; we’ll get to that shortly. Next, tie takes a Knot and some initial state, and just like runRWS, it produces a triple containing some output value, the final state, and again one of these k things.
So check out the let binding: we call runRWS, which produces a triple containing a value k as its writer output. And what parameter do we pass as the reader environment for runRWS? Aha: k again. This is how we feed the snake its tail.
Note that if you’re feeling sassy, you can also implement tie using the fixed-point combinator:
tie :: Knot k s a -> s -> (a, s, k)
tie knot s =
fix $ \ ~(_, _, k) -> runRWS knot k s -- this is severely fucking interesting
Now, let’s try to figure out an appropriate concrete data type for k. There’s no single correct answer for this, it depends on the problem you’re trying to solve, but let’s figure one example out just to get some satisfaction here.
Take our circular linked list example from before. I’ll simplify it just a little bit here, and add a Show instance so we can see what happens when we run our stuff:
data Node = Node {
value :: Int,
next :: Node
} deriving Show
Okay, now we want to parse a list of pairs of Ints into our structure. For example, we want [(0, 1), (1, 2), (2, 0)] to be equivalent to the following code:
let n0 = Node { value = 0, next = n1 }
n1 = Node { value = 1, next = n2 }
n2 = Node { value = 2, next = n0 }
Since the second component of each pair is referencing some other node by its list offset, this means we need a Map from list offsets to the nodes that will eventually be found there. The Map will be our k type for this example. We’ll write to this map (which is conveniently a monoid) using tell, and query it using ask. We’ll also want some state to keep track of what our current offset is and what raw data we have left to parse. Let’s do this:
data ParserState = ParserState {
current :: Int,
remaining :: [(Int, Int)]
}
type NodeParser a = Knot (Map Int Node) ParserState a
So far so good…
parse :: NodeParser ()
parse = do
s <- get
case remaining s of
[] -> return ()
((val, ref) : rest) -> do
m <- ask
tell $ Map.singleton (current s) (Node val $ m ! ref)
put $ ParserState (current s + 1) rest
parse
We get the current state and see how much input we have remaining. If we’re done, we’re done. Otherwise, pull the current node’s value and reference to the next node off of the list. Look the reference up in our map (which we will have constructed by the time we’re done) to find the next node, and write the newly constructed current node to the writer output. Update the state and recurse.
Finally, let’s feed this thing some data:
example :: Node
example =
let (_, _, m) = tie parse $ ParserState 0 [(0, 1), (1, 2), (2, 0)]
in (m Map.! 0)
main :: IO ()
main = putStrLn $ show example
And we get an infinitely recursive traversal of our three-node, circularly linked list:
Node {value = 0, next = Node {value = 1, next = Node {value = 2, next = Node {value = 0, ...
Awesome.
Prologue: Two quick comments as I wrap up here.
-
This was obviously a toy example. It would be much more realistic to be accepting input from some external source such as a file, in which case your input is monadic. The
Knotmonad can be easily adapted into aKnotTmonad transformer, on top of whatever underlying monad is needed (e.g.IO). Just implement it in terms ofRWSTinstead ofRWS. -
I demonstrated knot-tying here in the context of parsing, but it works just as well in the context of serialization. The
Mapin that case is flipped: you need to look up the offsets at which eachNodewill eventually be stored, before they have necessarily been written there. This requires a bit of care to ensure that you don’t force strict evaluation of these offsets while writing. That also probably means you can’t do any nice output buffering performance tricks, but I’ll just say “whatever” to that for now.
