I’ve been known to badmouth object-oriented programming, and I want to formalize a bit of that badmouthing in this post.
Before I start: the actual definition of OOP remains a topic of some controversy, with some people arguing it doesn’t even exist (I disagree), so I’m not going to go down that rathole. I’ll settle here for “I know it when I see it.” Those were the famous words of U.S. Supreme Court Justice Potter Stewart when asked to define hardcore pornography, and I think it’s apropos here because I would also like to talk about obscene coupling.
Anyway, there seems to be some general hand-wavey consensus that “objects” are entities that have properties, behavior and identity:
When we talk about object properties, we usually mean “fields,” as in the stuff that lives in a struct; more formally, the components of a product type. It’s commonly considered a best practice to regard fields as an “implementation detail,” hiding them away behind behaviors that express how to construct an object and how to access and modify its properties.
When we talk about object behavior, we usually mean “methods,” as in functions or procedures with open recursion. Typically this means that each function takes a handy, implicit self reference parameter, to provide a starting point from which to access fields of the receiving object. This is also complicated by dynamic dispatch: the caller of a method needn’t know whether that method is implemented in a superclass or a subclass.
When we talk about object identity, there’s some ambiguity: we might mean “value identity,” where two object references are considered identical if the fields of their referents are pairwise identical; we usually hand-write a method (like equals) to express that. On the other hand, we might mean “reference identity,” where two object references are considered identical only if they point to the same memory address.
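To make the two flavors of identity concrete, here's a minimal sketch in Java. The Point class and its fields are hypothetical, picked only for illustration; any small class with a hand-written equals would do.

```java
import java.util.Objects;

// Hypothetical class, used only to illustrate the two notions of identity.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Value identity: two points are equal if their fields are pairwise equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return x == that.x && y == that.y;
    }

    // hashCode has to agree with equals.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b)); // value identity: true
        System.out.println(a == b);      // reference identity: false
    }
}
```

Same fields, two different objects: equals says yes, == says no.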
This is pretty standard in the mainstream object-oriented programming languages I’m familiar with. People seem to accept all this as normal, and at least in the Java ecosystem, most commonly-used frameworks require developers to program in this JavaBeans-ish style.
Now put all that aside for a moment. Another thing most engineers at least claim to agree with, and accept as normal, is the proposition that coupling is bad. For whatever definition of “module” you choose—and note that this is not at all specific to OOP—when one module is allowed to explicitly depend on another, that almost inevitably devolves into a ball of mud in which all modules are allowed to depend on all other modules.
The notion that decoupling or “loose coupling” is preferable gives rise to all kinds of awkwardly-stated laws and misguided ideas about what objects should know about what other objects and what things should be objects at all. Despite our best efforts, at the end of the day we end up with the same ball of mud regardless. I think this is because an object itself, as defined above, is already too tightly coupled. We have this long-standing collective cognitive dissonance in which, although we all know that coupling is bad, we’re perfectly happy to accept the idea that data, behavior and identity should all be lumped together.
Let’s go through a simple scenario in Java: say I have a simple class called Dog (because I like dogs), which maybe has a few properties like name, height and weight… details aren’t important.
I want to log some information about Dog objects that are constructed, so I implement a Dog.toString method that formats the object’s properties as a String; that’s pretty typical. Note that this introduces an API dependency on String, although it’s not a big deal, since both String and Object.toString are ubiquitous in Java. It’s very unlikely we’d ever need to break client code by changing this API to take different parameters or return a different type.
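Roughly, the class might look like this at this point. This is only a sketch: the field types, the units and the output format are my assumptions, since all we've said so far is that there's a name, a height and a weight.

```java
// Sketch of Dog so far. Field types, units and the toString format are
// assumptions made for illustration.
class Dog {
    private final String name;
    private final double heightCm;
    private final double weightKg;

    Dog(String name, double heightCm, double weightKg) {
        this.name = name;
        this.heightCm = heightCm;
        this.weightKg = weightKg;
    }

    // Formats all properties for logging.
    @Override
    public String toString() {
        return "Dog(name=" + name
            + ", heightCm=" + heightCm
            + ", weightKg=" + weightKg + ")";
    }
}

class LogDemo {
    public static void main(String[] args) {
        // prints "Dog(name=Rex, heightCm=60.0, weightKg=30.5)"
        System.out.println(new Dog("Rex", 60.0, 30.5));
    }
}
```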
Now let’s say I’m running a doggy daycare, and I want to print out a list containing the names of all the dogs scheduled for the day. Just the names, though, meaning our Dog.toString isn’t what we want. So we need another method, Dog.getName. Oh, you assumed we already had that? Heh, cute. After implementing that, we can make a new Dogs utility class with a helper method, String getNames(List<Dog> dogs). Refactoring for the win.
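A sketch of what that helper might look like. The signature is the one named above; joining the names with a comma and a space is my assumption, and Dog is trimmed down to just the name to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Dog trimmed to the one property this example needs.
class Dog {
    private final String name;

    Dog(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Utility class holding helpers that operate on collections of dogs.
final class Dogs {
    private Dogs() {}

    // Collects the names of all dogs into one comma-separated string.
    // The separator is an assumption, not something the design requires.
    static String getNames(List<Dog> dogs) {
        List<String> names = new ArrayList<>();
        for (Dog dog : dogs) {
            names.add(dog.getName());
        }
        return String.join(", ", names);
    }
}

class NamesDemo {
    public static void main(String[] args) {
        List<Dog> today = new ArrayList<>();
        today.add(new Dog("Rex"));
        today.add(new Dog("Fido"));
        System.out.println(Dogs.getNames(today)); // prints "Rex, Fido"
    }
}
```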
The thing is, the doggy daycare application I’m building actually needs to be a REST web service, and I need to serialize my Dog objects to JSON. I pull in jackson-core as a dependency. Now, I don’t want to expose this to other clients of Dog, so instead I provide a method void toJson(OutputStream out) on the assumption that I’ll be writing these things to something like an
The next thing I need to deal with is, I’d like my list of Dog objects to be alphabetically sorted by name. So I make Dog implement Comparable<Dog>. This should be consistent with equals, so I implement value identity based on name as well.
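Here's a sketch of that step, again with assumed field types. The key detail is that compareTo, equals and hashCode all look at the name and nothing else, which is what keeps the ordering consistent with equals.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: natural ordering and value identity both based on name only.
class Dog implements Comparable<Dog> {
    private final String name;
    private final double weightKg; // assumed type and unit

    Dog(String name, double weightKg) {
        this.name = name;
        this.weightKg = weightKg;
    }

    String getName() {
        return name;
    }

    // Alphabetical by name.
    @Override
    public int compareTo(Dog other) {
        return name.compareTo(other.name);
    }

    // Consistent with compareTo: equal exactly when the names are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Dog)) return false;
        return name.equals(((Dog) other).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

class SortDemo {
    public static void main(String[] args) {
        List<Dog> dogs = new ArrayList<>();
        dogs.add(new Dog("Rex", 30.5));
        dogs.add(new Dog("Bella", 12.0));
        Collections.sort(dogs); // uses Dog.compareTo
        System.out.println(dogs.get(0).getName()); // prints "Bella"
    }
}
```

Notice that two dogs with the same name and different weights now count as the same dog. That decision is baked into the class for every client.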
Everything finally works, and I decide it’d be nice to provide my code as a library for others to use in their doggy daycare businesses. I publish a binary JAR and the JavaDoc for my API, and suddenly I’m getting emails from users who say they need XML serialization, they need to write through NIO rather than to an OutputStream, and they need to sort dogs by their weight. They also don’t want the unnecessary runtime dependency on jackson-core, they want toString to format things differently for debugging purposes, and they want value identity to be based on all properties, not just name.
One user has also created a Puppy subclass, and has a SortedSet<Puppy> using a custom Comparator<Puppy> which sorts puppies by age. She can’t pass that to my getNames method; she tries to convert it to a List<Puppy>, which doesn’t work either, since a List<Puppy> is not a List<Dog>. I get email telling me I should have used Iterable<? extends Dog>.
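For the record, here's a sketch of the signature my correspondent was asking for. The Puppy class, its age field and the comma separator are all hypothetical; the point is only that Iterable<? extends Dog> accepts a SortedSet<Puppy> where List<Dog> would not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class Dog {
    private final String name;

    Dog(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical subclass written by a user of the library.
class Puppy extends Dog {
    private final int ageWeeks;

    Puppy(String name, int ageWeeks) {
        super(name);
        this.ageWeeks = ageWeeks;
    }

    int getAgeWeeks() {
        return ageWeeks;
    }
}

final class Dogs {
    private Dogs() {}

    // Accepts any iterable of Dog or a subtype of Dog, in iteration order.
    static String getNames(Iterable<? extends Dog> dogs) {
        List<String> names = new ArrayList<>();
        for (Dog dog : dogs) {
            names.add(dog.getName());
        }
        return String.join(", ", names);
    }
}

class PuppyDemo {
    public static void main(String[] args) {
        Comparator<Puppy> byAge = Comparator.comparingInt(Puppy::getAgeWeeks);
        SortedSet<Puppy> puppies = new TreeSet<>(byAge);
        puppies.add(new Puppy("Bella", 12));
        puppies.add(new Puppy("Max", 8));
        // Would not compile against getNames(List<Dog>).
        System.out.println(Dogs.getNames(puppies)); // prints "Max, Bella"
    }
}
```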
Now I’m in trouble: my simple Dog and Dogs classes are getting bigger and bigger, more tightly coupled to other libraries (including the perhaps innocuous Java standard library), and have to be safe for subclassing. I’m tempted to blame this bloat on my users, since I have to anticipate all of their needs, but it’s really not their fault. The fundamental problem is that every method I write hardcodes some assumption about how the Dog class should be used, and every one of these assumptions induces a dependency of some sort.
So, if an object or a class is a kind of data type that has not only properties, but also polymorphic behavior and identity as well, then avoiding coupling in an object-oriented codebase must be damn near impossible. Sure, we expect many data types to have necessary dependencies on other constituent types (perhaps defined in other libraries), for example: if my Dog type has a name property, I’m not going to waste time attempting to somehow abstract away the dependency on String. But behavior and identity are another story, for example: even if you believe there’s only one sensible way to convert a Dog to JSON, you certainly wouldn’t expect Dog to depend on JSON.
Data is just data. Contrary to object-oriented idiom, data isn’t sentient and needn’t be anthropomorphized. I think it’s perfectly legit to have a tiny library that just defines data types—what Martin Fowler calls an “anemic domain model”—with no logic operating on those types at all! If you want to do stuff with that library in your application, that’s awesome, knock yourself out. And if you think I want to do the same kind of stuff with that library, that’s even more awesome, extract a library encapsulating that functionality. I might use it, I might not, or maybe I’ll use it only in certain scenarios (test vs. production, for example). The point is that the clear separation between data and behavior is what permits that choice.
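Here's one way that separation might look in Java, as a sketch rather than a prescription. I'm assuming Java 16 or later so I can use a record for the data type; DogOrderings and DogReports are hypothetical names for the optional behavior that would live in a separate library. One caveat: a record still generates equals, hashCode and toString from all of its components, so it isn't entirely free of baked-in identity, but at least nothing here is hand-written into the data type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// The "data library": just a product type, no hand-written logic.
// Field types and units are assumptions. Requires Java 16+.
record Dog(String name, double heightCm, double weightKg) {}

// Optional behavior, which could ship separately: orderings clients can pick from.
final class DogOrderings {
    private DogOrderings() {}

    static final Comparator<Dog> BY_NAME = Comparator.comparing(Dog::name);
    static final Comparator<Dog> BY_WEIGHT = Comparator.comparingDouble(Dog::weightKg);
}

// More optional behavior: one of many possible ways to report on dogs.
final class DogReports {
    private DogReports() {}

    static String names(Iterable<Dog> dogs) {
        List<String> names = new ArrayList<>();
        for (Dog dog : dogs) {
            names.add(dog.name());
        }
        return String.join(", ", names);
    }
}

class DaycareDemo {
    public static void main(String[] args) {
        List<Dog> dogs = new ArrayList<>();
        dogs.add(new Dog("Rex", 60.0, 10.0));
        dogs.add(new Dog("Fido", 40.0, 20.0));

        dogs.sort(DogOrderings.BY_NAME);
        System.out.println(DogReports.names(dogs)); // prints "Fido, Rex"

        dogs.sort(DogOrderings.BY_WEIGHT);
        System.out.println(DogReports.names(dogs)); // prints "Rex, Fido"
    }
}
```

The client who wants dogs sorted by weight and the client who wants them sorted by name can both use the same Dog type, and neither ordering is privileged by the type itself.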