Reading Your Future

In the Scala courses I teach, I usually try to spend as much time as I can discussing monads. I usually like to start with Option, showing how you can reduce something like:

if (a != null) {
  val b = somethingThatCouldFail(a)
  if (b != null) {
    val c = somethingElseThatCouldFail(b)
    if (c != null) somethingAwesome(c) else null
  } else null
} else null

to:

optionA flatMap { a =>
  somethingThatCouldFail(a) flatMap { b =>
    somethingElseThatCouldFail(b) map { c =>
      somethingAwesome(c)
    }
  }
}

to:

for {
  a <- optionA
  b <- somethingThatCouldFail(a)
  c <- somethingElseThatCouldFail(b)
} yield somethingAwesome(c)

I try to use that to motivate the general principle that, when you structure your code this way, you disentangle some “pure” behavior from some separate “monadic” behavior. I avoid formalizing what I mean by “pure” and “monadic,” instead waving my hands a little bit and saying that the pure behavior is generally the stuff we care about; in this case:

(somethingThatCouldFail andThen somethingElseThatCouldFail andThen somethingAwesome)(a)

… and the monadic behavior is generally the stuff we want to hide; in this case, the null checks. In other words, we hide the monadic logic, indicating its presence only in the type.

We derive the appropriate map and flatMap methods for Option, and although I find in my courses that this is a pretty decent place to start from, I don’t like leaving it this way. This treatment of the topic alone could easily lead students to assume that monads are solely for error-handling, in the same way that many monad tutorials for Haskell led readers to assume they were solely for IO. So we quickly branch into other practical uses of this design principle.

Two topics that often elicit post-traumatic stress responses from any Java web development refugee are asynchronous tasks and inversion of control. The former is an issue as it relates to concurrency: you might want to kick off a chain of asynchronous tasks, each depending on the previous, all in the context of some ExecutorService. The latter is an issue because you invariably wind up using a dependency injection framework such as Spring, which makes your code tricky to debug. And it happens that we can model both of these as monadic constructs.

So… I might be a bit late to the party, but I just noticed that there’s a very cute similarity between how those two things are modeled. That’s what I want to talk about in this post.


Composing Asynchronous Tasks with Future

Let’s start with Future, which represents a computation that might block or somehow take a while to complete. The type of this computation is really just a function of arity zero:

() => A

So a Future is just a simple class to wrap this function, like

class Future[A](val run: () => A)

You might use this for something like:

import java.net.{ ServerSocket, Socket }

val server: ServerSocket = new ServerSocket(31337)
val futureClient: Future[Socket] = new Future(server.accept)

Okay, but not particularly handy yet. If we simply want to get a client socket, we could just call server.accept directly, instead of our extra level of indirection futureClient.run(). The point of wrapping it up this way is to eventually do something with the client socket, modeling something like a callback: once run completes, producing its A value, we want to immediately do some work f: A => B. In other words, we want to compose the original Future[A] with f to produce a new Future[B].

For example, let’s say that once futureClient eventually produces an actual client socket we can work with, we want to get the number of bytes available for reading from it. Since this blog post is about monads, and since we looked briefly at Option already, we know we want to write something that looks like:

val futureAvailable: Future[Int] =
  futureClient map { _.getInputStream.available }

We implement map by constructing a new Future[B], whose new run function will just run the old one and apply f to the result:

class Future[A](val run: () => A) {
  def map[B](f: A => B): Future[B] =
    new Future(() => f(run()))
}

This is pretty useful, but I’ve sort of cheated here by choosing a function (available) that I know is non-blocking. Let’s be a bit more realistic, and say that we want to actually read a line from the client socket’s input stream, which could indeed block! We want to wrap this blocking call in a Future, just like we wrapped the blocking server.accept:

import java.io.{ BufferedReader, InputStreamReader }

def readLine(client: Socket): Future[String] = {
  val in = new BufferedReader(new InputStreamReader(client.getInputStream))
  new Future(in.readLine)
}

But then we have a problem when we try compose this function with our futureClient using map:

val futureFutureLine: Future[Future[String]] = futureClient.map(readLine)

We only want a Future[String], just like we only wanted a Future[Int] previously. We don’t want to have to say .run().run(). Okay, again, this is still a blog post about monads in Scala, so we know we want our code to look like:

val futureLine: Future[String] = futureClient.flatMap(readLine)

The naïve implementation of flatMap is even simpler than map:

class Future[A](val run: () => A) {
  def map[B](f: A => B): Future[B] =
    new Future(() => f(run()))

  def flatMap[B](f: A => Future[B]): Future[B] =
    f(run())
}

Just run this and apply f! This is algebraically correct — the types line up and the monad laws are satisfied — but it’s totally inappropriate for our purposes because we don’t want to actually run anything yet. The map method didn’t cause our Future[A] to be evaluated, and neither should flatMap. We instead need the following:

class Future[A](val run: () => A) {
  def map[B](f: A => B): Future[B] =
    new Future(() => f(run()))

  def flatMap[B](f: A => Future[B]): Future[B] =
    new Future(() => f(run()).run())
}

So flatMap will produce a new Future[B], whose run function will run the old one and apply f (just like map). Since f produces a Future[B], and we don’t want to end up with a Future[Future[B]], we finally run the result of f. All good.

Now that we have this plumbing complete, we can write silly code like:

for {
  client1 <- new Future(server.accept)
  client2 <- new Future(server.accept)
  line1 <- readLine(client1)
  line2 <- readLine(client2)
} yield line1 == line2

… which is a Future[Boolean] indicating whether or not two clients sent the same message to the server. The important thing that this illustrates, similar to the Option example, is that all the callback-related shenanigans are pretty much hidden from the logic we care about here:

readLine(server.accept) == readLine(server.accept)

Inversion of Control with Reader

Now onto the topic of inversion of control. The general idea in IoC is that we expect that somebody, somewhere, will provide the environment our code depends on, rather than having to obtain it explicitly ourselves. For example, say we require a JDBC Connection to issue some queries. We could either create a Connection ourselves (using something horrible like JNDI, for example):

import java.sql.{ Connection, ResultSet }
import javax.naming.{ Context, InitialContext }
import javax.sql.DataSource

def query(sql: String): ResultSet = {
  val context: Context =
    new InitialContext
  val dataSource: DataSource =
    context.lookup("java:comp/env/jdbc/FooDB").asInstanceOf[DataSource]
  val connection: Connection =
    dataSource.getConnection
  connection.createStatement.executeQuery(sql)
}

Or we could just have it “injected,” which just means that somebody will call this function with the appropriate “dependency”:

def query(sql: String, connection: Connection): ResultSet =
  connection.createStatement.executeQuery(sql)

The latter is obviously cleaner, and allows us to vary DataSource as needed (for example, we’re likely to use a mock data source in unit tests). It’s particularly amenable to a pure functional style, because a computation that produces a value of type A and requires an environment E to be “injected” is really just a function:

E => A

We can make this explicit in our example:

def query(sql: String): Connection => ResultSet =
  _.createStatement.executeQuery(sql)

But it comes at a price: everywhere I need a Connection somewhere in my code, I have to provide it explicitly.

def doSomeQueries(connection: Connection): Int = {
  val foo: ResultSet = query("SELECT COUNT(*) FROM Foo")(connection) // ew!
  val bar: ResultSet = query("SELECT COUNT(*) FROM Bar")(connection) // ewww!!
  val baz: ResultSet = query("SELECT COUNT(*) FROM Baz")(connection) // ewwwww!!!
  foo.next().getInt(1) + bar.next().getInt(1) + baz.next().getInt(1)
}

Threading the connection dependency into doSomeQueries and then into query is messy and brittle. Well, to be fair, everything to do with JDBC is messy and brittle; we just don’t want to make the situation worse. Monads have successfully hidden our messes for us thus far, so let’s treat E => A as a monad as well.

First, as before, we wrap our “dependency injection” E => A in a class:

class Reader[E, A](val run: E => A)

The name “reader” is intended to imply that you can read from some externally-provided environment (with the subtlety that you can only read, not modify the environment in any way). So we could write, for example:

def query(sql: String): Reader[Connection, ResultSet] =
  new Reader(_.createStatement.executeQuery(sql))

This is like a slightly more sophisticated Future[ResultSet]: it lets us reference the query results that we will eventually have, once somebody provides a DataSource to issue the query against. Now, we want to clean up our verbose doSomeQueries in monadic style:

val doSomeQueries: Reader[Connection, Int] = for {
  foo <- query("SELECT COUNT(*) FROM Foo")
  bar <- query("SELECT COUNT(*) FROM Bar")
  baz <- query("SELECT COUNT(*) FROM Baz")
} yield foo.next().getInt(1) + bar.next().getInt(1) + baz.next().getInt(1)

So, as always, we need to provide appropriate map and flatMap implementations:

class Reader[E, A](val run: E => A) {
  def map[B](f: A => B): Reader[E, B] =
    new Reader(env => f(run(env)))

  def flatMap[B](f: A => Reader[E, B]): Reader[E, B] =
    new Reader(env => f(run(env)).run(env))
}

Here, map is simple function composition: to map some f: A => B over a Reader[E, A], we create a new Reader[E, B] whose run function runs the original reader and applies f. The flatMap method is a bit more interesting: we do the same composition as in map, but since f itself produces a Reader[E, B], we have to “inject” the environment there as well. Otherwise we’d end up with a Reader[E, Reader[E, B]]!

tl;dr

The thing I noticed — and again, I’m probably the last person to notice this, but I figured I’d write about it anyway — is that the map and flatMap implementations of Future and Reader look nearly identical:

// map
new Future(() => f(run()))
new Reader(env => f(run(env)))

// flatMap
new Future(() => f(run()).run())
new Reader(env => f(run(env)).run(env))

In other words, we could essentially write:

type Future[A] = Reader[Unit, A]

… and we wouldn’t be too far off. In reality, typical implementations of Future (e.g. in Finagle or Akka) will also deal with issues like failed or timed out computations; Reader doesn’t typically deal with those issues (although hey, it could). Regardless, I think this provides an interesting observation in the “which is cooler: objects or functions” debate:

  • If we model our world as objects, the problem of sequencing long-running tasks doesn’t seem at all related to the problem of avoiding hardcoded dependencies.
  • If we model our world as functions, on the other hand, these two problems have nearly identical solutions.
Tagged , , ,

One thought on “Reading Your Future

  1. […] Rosen (@mergeconflict) makes us think about composition and monads in Reading Your Future. This just might be my highlight of the […]

Comments are closed.