How We Code: ORMs and Anemic Domain Models
Many esteemed books on code architecture were written by developers with years of experience in strong OOP languages, such as Smalltalk and Java. These languages grew up without a lot of the framework tooling we have today. In fact, these same authors and developers created most of the development practices we use today.
Many frameworks in our more "modern" languages were born from the lessons taught by these authors. However, these frameworks were often complex. While they provided many useful tools, they left it up to the developer to know how to implement them in an organized fashion. There was a need for something easier! Luckily (for all!) the enterprising people of 37signals filled that need.
Rails revolutionized web development. It focused on Rapid Application Development (RAD) by introducing opinions on how to solve common application problems. These opinions restricted our implementation choices, allowing us to focus on functionality instead of implementation.
Seeing the touted success of Rails, web frameworks in other languages quickly followed suit. Developers with deadlines and things to do (all of us) clung to these frameworks for dear life.
However, in this jump to RAD-oriented frameworks, we lost sight of (or willingly ignored) many lessons that the old guard had painstakingly learned!
We code very much within our tools, as they are excellent tools. However, our RAD frameworks tend to become our applications, instead of simply tools to implement our applications. For example, we often preach "skinny controller, fat model", but then cram business logic into our service (and persistence) layers instead of into models.
In my study of code architecture, I've come to notice that one thing in particular drives how (and where!) we code our business logic: our Active Record ORMs. These powerful, time-saving models are also the cause of much friction within our applications.
In this article, I'm going to cover an example of implementing some business logic in our usual Active Record patterns, and then I'll show how the same logic can be applied using the Data Mapper pattern. In both cases, we'll see how we can move our business logic into our business entities, thus avoiding an anemic business domain. We'll also cover the pros and cons of the two styles of ORMs and how they affect the way we write our code.
The Rules
What we'll be making is a (surprise!) pretend Customer Support system. In this system, Tickets are created when a new Message is sent by a customer looking for support.
Some rules:
- All Tickets are assigned a Category.
- A Ticket is assigned a Staffer at some point (not on creation).
- A Staffer has Categories of Tickets they are responsible for.
- A Staffer can only be assigned to a Ticket if they are allowed to handle the Category under which the Ticket is assigned.
- If a Ticket has a Status of "closed", any newly received Message will re-open the ticket.
Two Ways
As mentioned, we'll look at this problem on two fronts: First, in the context of our "usual" way of using an Active Record style ORM. Then we'll look at moving our business logic into domain entities, and see how using a Data Mapper style ORM can aid in separating business logic from our data persistence.
We'll follow the patterns that we commonly see when coding between the two styles.
Active Record
First, let's make some models. Since we're using some sort of Active Record ORM, we usually don't have to define our fields and data; the Active Record implementation takes care of that for us by reading the schema of our database tables. We can just tell it what tables to use and the table relationships.
A Ticket:
<?php namespace Help;
use Some\Package\SomeOrm;
class Ticket extends SomeOrm {
protected $table = 'tickets';
protected $hasMany = 'Message';
protected $hasOne = array('Staffer', 'Category');
}
A Message:
<?php namespace Help;
use Some\Package\SomeOrm;
class Message extends SomeOrm {
protected $table = 'messages';
}
A Staffer:
<?php namespace Help;
use Some\Package\SomeOrm;
class Staffer extends SomeOrm {
protected $table = 'staffers';
protected $hasMany = 'Category';
}
(I've omitted a Category class for brevity. It's just a String for our use case).
So we have our models here, and we have some business logic to figure out. Let's review our rules as defined above. First, a ticket is assigned a category:
$ticket = new Ticket;
$ticket->subject = 'Some Subject';
$ticket->dateOpened = new \DateTime;
// A facade, or whatever :P
$category = Category::find('name', 'Default');
$ticket->category = $category->id;
$ticketId = $ticket->save();
Later, we need to assign a Staffer to that ticket.
$ticket = Ticket::find($ticketId);
$staffer = Staffer::find($stafferId);
// Assign staffer, but only if staffer has the
// Ticket's category available
foreach( $staffer->categories as $category )
{
// The Ticket stores the Category id, so compare ids
if( $ticket->category == $category->id )
{
$ticket->staffer = $staffer;
break;
}
}
if( is_null($ticket->staffer) )
{
// Validation error handling
// Staffer not assigned
}
So we can see we have some models, and we've implemented some logic. Let's examine what's going on here.
Where is the logic?
I've written the above business logic out of context; we can see that it's not directly in our models. So where do we put this logic? Often, we start by putting this directly in a controller. This leads to the long, terrible controllers we've all read about. It's hardly maintainable in the long run, as we will likely duplicate some of these operations throughout our code base if we keep business logic in our controllers. Any change to our logic then has to be repeated in many locations.
It's also worth noting that any Controller code run is the result of routing an HTTP request to code. It's really still part of the HTTP request cycle, which isn't necessarily something our application cares about. Our application just wants to know there was a request to do something. Whether it's an HTTP request or not shouldn't matter.
So after we decide to ditch a controller, we have to wonder where this logic can go. There are a lot of conflicting things said about this (I've written some of it myself). Some of us create a class for "forms", which are essentially "commands" for our application. These Form (Command) classes can take input and run business logic and other operations before persisting the results to a database.
Others might (also) create Repository classes and choose to put some of this logic in there. A Repository, as described originally by the Smalltalk/Java crew, is used to retrieve and save existing business "entities" to a database. If we use a Repository with an Active Record ORM, we create a useful abstraction, but we still blur separation of concerns and the single responsibility principle by adding business logic to the Repository, whose responsibility should simply be data persistence.
When using a Repository with an Active Record ORM, we've been using Repositories in a different way than originally conceived. With Active Record, the Repository becomes an abstraction layer between our Active Record ORMs and the rest of our application. While useful in terms of a Strategy Pattern for ensuring behavior is present consistently throughout an application, the ORM is still ultimately responsible for persistence, rather than the Repository.
If a repository's purpose is data persistence, should we also put domain logic in there? Should a repository know that adding a new Message to a closed Ticket should also re-open the Ticket? That seems at odds with the notion of "Single Responsibility"!
So, we're still on the hunt for the "perfect" place for this business logic.
Let's add our business logic to the Model itself. This seems to make the most sense, as the Ticket should probably know about its own business logic - its "single responsibility" is to itself! Perhaps we can add an "addMessage()" method which updates the status if needed! We can also create an "assignStaffer()" method which only lets a Staffer be assigned if the Staffer has the right Category available.
This sounds great. Our business logic is encapsulated in one place, and we are making our model less anemic - it has business logic in it instead of simply being a wrapper for Active Record persistence logic.
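As a rough sketch of what that might look like, here are the two methods on an Active Record style model. The SomeOrm class below is a hypothetical stand-in I've stubbed out with magic methods so the example runs on its own; a real ORM would read and write database rows instead of an in-memory array, and its API will differ.

```php
<?php
// Hypothetical stand-in for an Active Record base class. Attributes
// live in an array here instead of a database row.
abstract class SomeOrm
{
    protected $attributes = array();

    public function __get($name)
    {
        return isset($this->attributes[$name]) ? $this->attributes[$name] : null;
    }

    public function __set($name, $value)
    {
        $this->attributes[$name] = $value;
    }

    public function save()
    {
        // Persistence is omitted in this sketch.
        return true;
    }
}

class Staffer extends SomeOrm {}

class Ticket extends SomeOrm
{
    // Rule: a new Message re-opens a closed Ticket.
    public function addMessage($message)
    {
        if ($this->status === 'closed') {
            $this->status = 'open';
        }

        // Copy, append, reassign: magic attributes can't be appended to directly.
        $messages = $this->messages ?: array();
        $messages[] = $message;
        $this->messages = $messages;
    }

    // Rule: a Staffer may only take Tickets in one of their Categories.
    public function assignStaffer(Staffer $staffer)
    {
        if (!in_array($this->category, $staffer->categories ?: array(), true)) {
            throw new DomainException('Staffer cannot handle category ' . $this->category);
        }

        $this->staffer = $staffer;
    }
}

$staffer = new Staffer;
$staffer->categories = array('billing');

$ticket = new Ticket;
$ticket->category = 'billing';
$ticket->status = 'closed';

$ticket->addMessage('It broke again'); // re-opens the Ticket
$ticket->assignStaffer($staffer);      // allowed: categories match
$ticket->save();
```

For simplicity, categories are plain strings here, and a Staffer from the wrong Category triggers a DomainException rather than a silent failure.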
With that in mind, let's take a critical look at the situation.
The Active Record pattern doesn't have a single responsibility. It's still ultimately responsible for its own persistence, and now on top of that, business logic.
Additionally, we are still using our Models in many places in our application, places that shouldn't all have access to objects with the ability to change our databases. Effectively, we "leak" persistence logic for use by any class whenever our Repository classes return an Active Record ORM (even down into our Views!).
What this means is that while we have a great setup for when we're using Active Record ORMs (this article is certainly not advocating against them), we are left trusting other code (other developers) to not mess this all up.
How might it get messed up?
The business logic inside of our model can (in most cases) easily be circumvented. For example, while we have an assignStaffer() method in our Ticket model, we still leave open the possibility that you (when on a deadline) or another programmer (who may not know the code base) could ignore our business logic constraints regarding Categories and Staffers by coding something like:
$ticket->staffer = Staffer::find($someId);
$ticket->save();
So, what have we ended up with? Well, in the worst case, we have terrible, fat controllers. Our code is harder to maintain and likely not DRY. In the best case, we have better usage of business logic, in fewer places in our code. However, we're still relying on conventions. Our own code can circumvent required business logic! It's better, and depending on your needs, may be just fine! Many of us can stop there and be happy.
However, let's see how we can take this a step further by choosing the Data Mapper pattern over Active Record.
Data Mapper
In the world of Data Mapping, we can start with "plain old objects" to model our business logic. Then we can create an "Entity Mapping" repository, which maps database data into our Entity objects. The Entities themselves don't have any persistence logic. Instead, the Repository is responsible for parsing out the data from these entities and performing the needed CRUD on the database.
Looking in code, this means we go from something like this:
$ticket = Ticket::find($id);
// Or, if we have a repository:
$ticket = $ticketRepo->findById($id);
// Some Operations on the Active Record model
$ticket->save();
To something like this:
$ticket = $ticketRepo->find($id);
// Some operations on the Entity
$ticketRepo->save($ticket);
Note that in the Active Record example, we didn't use a Repository to persist the data. Technically, we could have, although at best the Repository would hide this operation (calling the save() method on the Active Record class) rather than do it itself. (Not that there's anything wrong with that.)
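To make the contrast concrete, here's a minimal sketch of a Repository that does the persisting itself. Everything in it is an assumption for illustration: the InMemoryTicketRepository name, the array standing in for a database table, and the cut-down Ticket with its id and close() method. A real Data Mapper ORM handles this mapping very differently under the hood.

```php
<?php
// A plain entity: no base class and no persistence methods.
class Ticket
{
    protected $id;
    protected $subject;
    protected $status;

    public function __construct($id, $subject, $status = 'open')
    {
        $this->id = $id;
        $this->subject = $subject;
        $this->status = $status;
    }

    public function getId() { return $this->id; }
    public function getSubject() { return $this->subject; }
    public function getStatus() { return $this->status; }
    public function close() { $this->status = 'closed'; }
}

// Hypothetical repository. The $rows array stands in for a database table.
class InMemoryTicketRepository
{
    protected $rows = array();

    // Map a stored row to an entity, or return null if no row exists.
    public function find($id)
    {
        if (!isset($this->rows[$id])) {
            return null;
        }

        $row = $this->rows[$id];

        return new Ticket($row['id'], $row['subject'], $row['status']);
    }

    // Map an entity back to a row and store it.
    public function save(Ticket $ticket)
    {
        $this->rows[$ticket->getId()] = array(
            'id'      => $ticket->getId(),
            'subject' => $ticket->getSubject(),
            'status'  => $ticket->getStatus(),
        );
    }
}

$ticketRepo = new InMemoryTicketRepository;
$ticketRepo->save(new Ticket(1, 'Cannot log in'));

$ticket = $ticketRepo->find(1);
$ticket->close();           // changes the entity only
$ticketRepo->save($ticket); // the repository does the writing
```

The Ticket has no save() method at all. Nothing reaches storage until the entity is handed back to the Repository.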
If, however, our Repositories are truly solely responsible for handling persistence, that leaves our entities solely responsible for their business logic. Let's see what our Entities could then look like.
Here's our Ticket Class now:
<?php namespace Help;
class Ticket {
/**
* Subject
* @var string
*/
protected $subject;
/**
* Messages in Ticket
* Collection of Messages
* @var Array
*/
protected $messages = array();
/**
* Ticket Status
* @var string
*/
protected $status;
/**
* Staffer assigned to Ticket
* @var Staffer
*/
protected $staffer;
/**
* Category Assigned to Ticket
* @var String
*/
protected $category;
/**
* Create a new Ticket
* @param string $subject
* @param string $category
* @param string $status
* @param Array $messages
*/
public function __construct($subject, $category, $status = 'open', array $messages = array())
{
$this->subject = $subject;
$this->updateStatus($status); // Use your setters!
$this->category = $category;
foreach( $messages as $message )
{
$this->addMessage($message);
}
}
/**
* Update the Ticket Status
* @param string $status
* @throws \InvalidArgumentException
* @return void
*/
public function updateStatus($status)
{
$status = mb_strtolower($status);
if( $status !== 'open' && $status !== 'closed' )
{
throw new \InvalidArgumentException('Illegal status given: '.$status);
}
$this->status = $status;
}
/**
* Add a Message to the Ticket
* @param Message $message
*/
public function addMessage(Message $message)
{
// Re-open on a new Message
// if Ticket is closed
if( $this->status === 'closed' )
{
$this->status = 'open';
}
$this->messages[] = $message;
}
/**
* Assign Ticket to Staffer
* @param Staffer $staffer
* @throws \DomainException
* @return void
*/
public function assignStaffer(Staffer $staffer)
{
if( ! $staffer->isAvailableFor($this->category) )
{
throw new \DomainException('Staffer cannot be assigned to a Ticket of category '.$this->category);
}
$this->staffer = $staffer;
}
/**
* Get Messages
* @return Array
*/
public function getMessages()
{
return $this->messages;
}
}
That's much more code in our models! You might also call it "less anemic" or "more rich".
Then we have our (simpler) Message and Staffer classes:
# File Message.php
<?php namespace Help;
class Message {
/**
* Date Message Received
* @var \DateTime
*/
protected $dateReceived;
/**
* Message
* @var String
*/
protected $message;
/**
* Create a new Message
* @param DateTime $dateReceived
* @param String $message
*/
public function __construct(\DateTime $dateReceived, $message)
{
$this->dateReceived = $dateReceived;
$this->message = $message;
}
}
# File Staffer.php
<?php namespace Help;
class Staffer {
/**
* Staffer email
* @var string
*/
protected $email;
/**
* Categories Staffer
* is assigned To
* @var Array
*/
protected $availableCategories;
public function __construct($email, Array $availableCategories)
{
$this->email = $email;
$this->availableCategories = $availableCategories;
}
/**
* Test if Staffer has the given Category
* available to them.
*
* @param String
* @return Boolean
*/
public function isAvailableFor($category)
{
return in_array($category, $this->availableCategories);
}
}
As we can see, our Ticket class is much richer. It has its attributes and methods clearly defined. The only way we can interact with these entities is through the entities themselves. Unlike our Active Record classes, we can't go to the persistence layer directly and make changes to our data. Sloppy development is not able to circumvent our business logic so easily.
Sidenote: I often keep attributes protected. I'll use a Trait which implements the __get() magic method, so attributes can still be read from outside the entity. This leaves it up to the entity's own code to set the attributes as appropriate to the business logic, thus fulfilling most use cases in my experience.
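A minimal sketch of such a Trait might look like the following. The ReadableAttributes name is made up for this example, and I'm assuming PHP 7 or later, where writing to a protected property from outside throws a catchable Error.

```php
<?php
// Hypothetical trait: exposes protected attributes for reading only.
trait ReadableAttributes
{
    public function __get($name)
    {
        if (property_exists($this, $name)) {
            return $this->$name;
        }

        throw new OutOfBoundsException('Unknown attribute: ' . $name);
    }
}

class Message
{
    use ReadableAttributes;

    protected $dateReceived;
    protected $message;

    public function __construct(DateTime $dateReceived, $message)
    {
        $this->dateReceived = $dateReceived;
        $this->message = $message;
    }
}

$message = new Message(new DateTime('2014-01-01'), 'Help, please!');

// Reading goes through __get().
echo $message->message, PHP_EOL;

// Writing from outside is rejected, since there is no __set().
$writeRejected = false;
try {
    $message->message = 'Tampered';
} catch (Error $e) {
    $writeRejected = true;
}
```

Since there's no matching __set(), the only way to change an attribute is through a method the entity chooses to expose.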
What we also gain here is clarity in our code. Similar to how an opinionated framework can make our lives easier by giving us fewer choices, domain entities make interactions much clearer by letting us know the few ways we can interact with our domain.
Also note that our business logic sits squarely within our domain entities. It does not leak out into a service layer (into validation, repositories, or command/form classes) nor an HTTP layer (controllers).
This gives us a very clear separation of concerns. Our business entities make sure our data is correct and follows the rules. Our Repositories can persist data, and convert that data to business entities. Our service/application layer can be responsible for orchestrating the interaction between a request and the entities, without having to know the business logic behind the scenes (other than to be coded to react to responses such as thrown exceptions).
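Here's one way that split could be sketched out. The TicketAssignmentService and InMemoryRepository classes are hypothetical, the entities are trimmed down and given ids just for this example, and the service assumes both ids exist (a real one would handle missing records).

```php
<?php
class Staffer
{
    protected $id;
    protected $availableCategories;

    public function __construct($id, array $availableCategories)
    {
        $this->id = $id;
        $this->availableCategories = $availableCategories;
    }

    public function getId() { return $this->id; }

    public function isAvailableFor($category)
    {
        return in_array($category, $this->availableCategories, true);
    }
}

class Ticket
{
    protected $id;
    protected $category;
    protected $staffer;

    public function __construct($id, $category)
    {
        $this->id = $id;
        $this->category = $category;
    }

    public function getId() { return $this->id; }
    public function getStaffer() { return $this->staffer; }

    // The business rule lives in the entity, not in the service.
    public function assignStaffer(Staffer $staffer)
    {
        if (!$staffer->isAvailableFor($this->category)) {
            throw new DomainException('Staffer cannot be assigned to a Ticket of category ' . $this->category);
        }
        $this->staffer = $staffer;
    }
}

// Hypothetical repository; an array stands in for the database.
class InMemoryRepository
{
    protected $items = array();

    public function find($id)
    {
        return isset($this->items[$id]) ? $this->items[$id] : null;
    }

    public function save($entity)
    {
        $this->items[$entity->getId()] = $entity;
    }
}

// Hypothetical application service: it orchestrates, the entity decides.
class TicketAssignmentService
{
    protected $tickets;
    protected $staffers;

    public function __construct(InMemoryRepository $tickets, InMemoryRepository $staffers)
    {
        $this->tickets = $tickets;
        $this->staffers = $staffers;
    }

    // Returns true on success, false if the domain rejects the assignment.
    public function assign($ticketId, $stafferId)
    {
        $ticket = $this->tickets->find($ticketId);
        $staffer = $this->staffers->find($stafferId);

        try {
            $ticket->assignStaffer($staffer);
        } catch (DomainException $e) {
            return false;
        }

        $this->tickets->save($ticket);
        return true;
    }
}

$tickets = new InMemoryRepository;
$tickets->save(new Ticket(1, 'billing'));

$staffers = new InMemoryRepository;
$staffers->save(new Staffer(10, array('billing')));
$staffers->save(new Staffer(11, array('sales')));

$service = new TicketAssignmentService($tickets, $staffers);
$accepted = $service->assign(1, 10); // staffer 10 handles billing
$rejected = $service->assign(1, 11); // staffer 11 does not
```

The service never inspects Categories itself. It only reacts to the DomainException, so a change to the rule touches the Ticket and nothing else.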
Think about testing as well. Since these are plain old (PHP) objects, we can create tests more simply. We don't even need to mock the dependencies, since the dependencies are themselves likely other domain entities (plain old objects).
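As a small illustration, here's a hand-rolled test of the "re-open on new Message" rule. It's only a sketch: the entities are trimmed down, getStatus() is a getter I've added for the example, and a real suite would more likely use a framework such as PHPUnit.

```php
<?php
class Message
{
    protected $message;

    public function __construct($message)
    {
        $this->message = $message;
    }
}

class Ticket
{
    protected $status;
    protected $messages = array();

    public function __construct($status = 'open')
    {
        $this->status = $status;
    }

    public function addMessage(Message $message)
    {
        // Re-open on a new Message if the Ticket is closed
        if ($this->status === 'closed') {
            $this->status = 'open';
        }
        $this->messages[] = $message;
    }

    public function getStatus() { return $this->status; }
    public function getMessages() { return $this->messages; }
}

// No database, no framework bootstrapping, no mocks: just objects.
function testNewMessageReopensClosedTicket()
{
    $ticket = new Ticket('closed');
    $ticket->addMessage(new Message('Still broken'));

    return $ticket->getStatus() === 'open'
        && count($ticket->getMessages()) === 1;
}

var_dump(testNewMessageReopensClosedTicket());
```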
Note what we don't have here. We don't inject validation classes or other items into our domain entities. While some services such as validation may be necessary in our application layer, the domain layer can be free and clear of those - unless certain validation rules are absolutely core to the business domain.
Determining what goes into your domain entities can be a bit subjective. It's the subject of "Domain-Driven Design", which we are touching on, but not directly speaking to in this article.
There are some more subtle differences here as well. For example, we are treating the Ticket entity as an "aggregate", meaning it contains other entities (a Staffer and Messages). More interestingly, a Ticket in this case can also be an "aggregate root". It is the entity through which you access and modify the Staffer or the Messages associated with the Ticket.
Rather than allow direct access to a Ticket's Staffer, you instead go through a Ticket. To clarify - there's no code to do something like:
$staffer->assignedToTicket($ticket);
Instead, assigning a Staffer to a Ticket is done only within the Ticket itself. Similar to how Rails increased productivity by reducing our choices, we've made our business domain clearer and more useful to ourselves and other developers.
Conclusion
This is not a plea to stop the use of Active Record ORMs. Active Record's success is shown by its use in many of our most beloved frameworks (Rails, Django, Laravel, SailsJS, many others). It's easy and convenient. A good ORM and its ability to handle relationships can cover most of our needs in a clear fashion.
However, when our applications grow larger (or have a long shelf-life), it's good to know that there are other patterns available.
Keeping business logic in a less anemic Active Record model has potential pitfalls. However, those downsides are mostly "soft" issues, such as ensuring developers know not to accidentally circumvent established business logic.
On the other hand, using a Data Mapper can give you a clear path to developing a rich, complex domain model while keeping clarity in code.
If you're a PHP developer looking for a Data Mapper, I suggest looking at (really the only game in town, for now) Doctrine2.