Allow me to weave you a tale of progressively stronger decoupling in Elixir.
I am working on a library that involves access to graph data, informed by a schema or model. Just to get things rolling, I started out with a convention of holding graph nodes in a map:
map_graph = %{
  "node1" => %{
    "class_id" => "class1",
    "data" => %{"testProp" => "value"}
  },
  "node2" => %{
    "class_id" => "class1",
    "data" => %{"testProp" => "value2"}
  }
}
node = map_graph["node1"]
value = node["data"]["testProp"]
It’s quick, dirty, and gets the job done for proof-of-concept purposes. Clearly it’s inadequate for other data access patterns we’d like to be able to support. We can write transformation routines against this structure, but they won’t be able to work with anything else.
It’s time for some decoupling.
Behaviors
A Behavior defines an interface or contract that implementing modules must follow. It’s like saying “if you want to be a graph node accessor, you must implement these functions.”
defmodule Graph do
  # The callback takes the graph data and a node id, matching get_node/2 below.
  @callback get_node(data :: any(), node_id :: any()) :: any()
end

defmodule Graph.MapImpl do
  @behaviour Graph
  defstruct [:data]

  @impl true
  def get_node(data, node_id) do
    data[node_id]
  end
end
Now we can call:
Graph.MapImpl.get_node(map_graph, "node1")
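To make the behaviour concrete, here is a minimal, self-contained sketch that can be run as a script. It assumes the two-argument callback shape `get_node(data, node_id)` used by the implementation, and the `"no-such-node"` id is made up for illustration. It also shows that the implementation module can be held in a variable, which is what the wrapper struct later in the post relies on.

```elixir
# Contract: anything that serves graph nodes implements get_node/2.
defmodule Graph do
  @callback get_node(data :: any(), node_id :: any()) :: any()
end

defmodule Graph.MapImpl do
  @behaviour Graph

  @impl true
  def get_node(data, node_id), do: data[node_id]
end

map_graph = %{
  "node1" => %{"class_id" => "class1", "data" => %{"testProp" => "value"}}
}

# A module name is just an atom, so the implementation can be picked at runtime.
impl = Graph.MapImpl

found = impl.get_node(map_graph, "node1")
missing = impl.get_node(map_graph, "no-such-node")

IO.inspect(found["data"]["testProp"])
IO.inspect(missing)
```

The behaviour itself does no dispatching. It only lets the compiler warn when a module that declares `@behaviour Graph` forgets a callback; the caller still chooses the module.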
Now we have some added flexibility to define a completely different implementation that dishes up nodes from the filesystem based on a root path:
defmodule Graph.FilesystemImpl do
  @behaviour Graph

  # resolve_path/2, file_data/1 and folder_data/1 are private helpers, omitted here.
  @impl true
  def get_node(%{root: root_path}, node_id) do
    full_path = resolve_path(root_path, node_id)

    case File.stat(full_path) do
      {:ok, %File.Stat{type: :regular}} ->
        %{
          class: "file",
          data: file_data(full_path)
        }

      {:ok, %File.Stat{type: :directory}} ->
        %{
          class: "folder",
          data: folder_data(full_path)
        }

      {:error, _} ->
        nil
    end
  end
end
Notice that here the data parameter of get_node/2 serves a completely different purpose: rather than holding the nodes, it points to where the data lives. This is the first strange smell about this approach.
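The filesystem implementation leaves its helpers undefined, so here is one possible way to fill them in and try it out. The bodies of `resolve_path/2`, `file_data/1` and `folder_data/1` are my assumptions, not the library's actual behaviour: a node id is treated as a path relative to the root, a file's data is its contents, and a folder's data is its sorted entry names. The temporary directory and file names are likewise invented for the demo.

```elixir
defmodule Graph do
  @callback get_node(data :: any(), node_id :: any()) :: any()
end

defmodule Graph.FilesystemImpl do
  @behaviour Graph

  @impl true
  def get_node(%{root: root_path}, node_id) do
    full_path = resolve_path(root_path, node_id)

    case File.stat(full_path) do
      {:ok, %File.Stat{type: :regular}} ->
        %{class: "file", data: file_data(full_path)}

      {:ok, %File.Stat{type: :directory}} ->
        %{class: "folder", data: folder_data(full_path)}

      # Missing paths and other file types (devices, etc.) are not nodes.
      _ ->
        nil
    end
  end

  # Assumption: a node id is a path relative to the root.
  defp resolve_path(root_path, node_id), do: Path.join(root_path, node_id)

  # Assumption: a file node's data is its raw contents.
  defp file_data(path), do: File.read!(path)

  # Assumption: a folder node's data is its sorted list of entry names.
  defp folder_data(path), do: path |> File.ls!() |> Enum.sort()
end

# Build a throwaway directory to query.
root = Path.join(System.tmp_dir!(), "graph_fs_demo_#{System.unique_integer([:positive])}")
File.mkdir_p!(Path.join(root, "docs"))
File.write!(Path.join(root, "docs/readme.txt"), "hello")

# Here the "data" argument is only a pointer to where the nodes live.
fs_graph = %{root: root}

file_node = Graph.FilesystemImpl.get_node(fs_graph, "docs/readme.txt")
folder_node = Graph.FilesystemImpl.get_node(fs_graph, "docs")
missing_node = Graph.FilesystemImpl.get_node(fs_graph, "nope")

File.rm_rf!(root)
```

Note that the nodes come back with atom keys (`class`, `data`) while the map-based graph uses string keys (`"class_id"`, `"data"`), so the behaviour alone does not guarantee a common node shape.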
Now it’s possible to write functions that work with either the map or the filesystem implementation. But you still need to know which one you’re using, and data needs to be formed correctly for it:
Graph.MapImpl.get_node(%{"node" => ...}, "node")
Graph.FilesystemImpl.get_node(%{root: "."}, "node")
You can make things more convenient with a wrapper struct that carries the implementation and the data, and delegates the functions:
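defmodule Graph.Instance do
  # Carries an implementation module together with the data it operates on.
  defstruct [:impl, :data]

  @type t :: %__MODULE__{
          impl: module(),
          data: any()
        }

  def get_node(%__MODULE__{impl: impl, data: data}, node_id) do
    impl.get_node(data, node_id)
  end
end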
Then you can write transformation routines that work on instances, and are really independent of the underlying representation.
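As a rough sketch of what such a routine might look like: `Graph.Instance.new/2`, `Graph.Transform` and `class_ids/2` are hypothetical names introduced only for this example, and the node data is made up. The point is that the transform talks to `Graph.Instance` alone and never mentions a concrete implementation.

```elixir
defmodule Graph do
  @callback get_node(data :: any(), node_id :: any()) :: any()
end

defmodule Graph.MapImpl do
  @behaviour Graph

  @impl true
  def get_node(data, node_id), do: data[node_id]
end

defmodule Graph.Instance do
  defstruct [:impl, :data]

  # Hypothetical constructor, added for this sketch.
  def new(impl, data), do: %__MODULE__{impl: impl, data: data}

  def get_node(%__MODULE__{impl: impl, data: data}, node_id) do
    impl.get_node(data, node_id)
  end
end

defmodule Graph.Transform do
  # Hypothetical routine: pair each node id that exists with its class id.
  # Ids that resolve to nil are dropped by the comprehension filter.
  def class_ids(instance, node_ids) do
    for id <- node_ids,
        found = Graph.Instance.get_node(instance, id),
        do: {id, found["class_id"]}
  end
end

instance =
  Graph.Instance.new(Graph.MapImpl, %{
    "node1" => %{"class_id" => "class1"},
    "node2" => %{"class_id" => "class2"}
  })

result = Graph.Transform.class_ids(instance, ["node1", "missing", "node2"])
IO.inspect(result)
```

Swapping in another implementation only changes the arguments to `new/2`; `class_ids/2` stays as it is, provided the other implementation returns nodes with the same keys.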
Protocols
Protocols provide a different form of decoupling. Here’s the equivalent to the Behavior definition from above:
defprotocol Graph do
  def get_node(data, node_id)
end
Then the implementation comes in 2 parts:
defmodule Graph.MapGraph do
  defstruct [:nodes]

  def new() do
    %__MODULE__{
      nodes: %{}
    }
  end
end

defimpl Graph, for: Graph.MapGraph do
  @impl true
  def get_node(%{nodes: graph}, node_id) do
    graph[node_id]
  end
end
Now we don’t need to pass around the Instance wrapper with its impl. Our transforms are written for “anything that can do X” rather than “modules that promise to do X”. The data itself carries the behavior.
graph = Graph.MapGraph.new()
Graph.get_node(graph, "node")
The extra cool part is that since the defimpl is separate from the defmodule, you can use protocols to extend existing types. So imagine we have SomeLibrary.DataStructure that could be interpreted as a graph. With Behaviors we would need to write an entire wrapper module. But with protocols we write a defimpl Graph, for: SomeLibrary.DataStructure. Then when we pass that structure to some graph routine, the function calls will look exactly the same:
graph = SomeLibrary.DataStructure.new()
Graph.get_node(graph, "node")
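Since SomeLibrary.DataStructure is imaginary, here is a small sketch that uses the built-in `Map` as a stand-in for "a type we didn't write". The `new/1` constructor with an optional argument and the sample node are my additions for the demo; everything else follows the protocol definition above.

```elixir
defprotocol Graph do
  def get_node(data, node_id)
end

defmodule Graph.MapGraph do
  defstruct nodes: %{}

  # Hypothetical convenience: allow seeding the graph with nodes.
  def new(nodes \\ %{}), do: %__MODULE__{nodes: nodes}
end

defimpl Graph, for: Graph.MapGraph do
  def get_node(%{nodes: nodes}, node_id), do: nodes[node_id]
end

# Map is a type we did not define, yet we can still teach it the protocol.
defimpl Graph, for: Map do
  def get_node(map, node_id), do: Map.get(map, node_id)
end

nodes = %{"node1" => %{"class_id" => "class1"}}

# Same call, two different underlying types.
from_struct = Graph.get_node(Graph.MapGraph.new(nodes), "node1")
from_plain_map = Graph.get_node(nodes, "node1")

IO.inspect(from_struct)
IO.inspect(from_plain_map)
```

Structs dispatch to their own implementation rather than to the one for `Map`, so the two `defimpl` blocks do not interfere with each other.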
This is a level of decoupling and flexibility that’s hard to get in many class-based OOP languages, where a type normally has to declare the interfaces it implements when it is defined.
Summary
With behaviors, you call GraphImpl.get_node(graph, id). With protocols, you call Graph.get_node(graph, id) regardless of the graph’s actual type.
Behaviors are about modules promising to implement certain functions. The module author decides to follow the behavior contract. Complexity is pushed into the infrastructure.
Protocols are about data types being extended with new capabilities. You can implement a protocol for any existing type, even ones you didn’t create. Complexity is pushed into the type system.
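One caveat worth spelling out: "regardless of the graph's actual type" only holds for types that have an implementation. The sketch below, which implements the protocol for `Map` only as an arbitrary choice for the demo, shows how to ask a protocol whether it knows a type and what happens when it doesn't.

```elixir
defprotocol Graph do
  def get_node(data, node_id)
end

defimpl Graph, for: Map do
  def get_node(map, node_id), do: Map.get(map, node_id)
end

# impl_for/1 is generated for every protocol.
# It returns the implementing module, or nil if there is none.
map_impl = Graph.impl_for(%{})
list_impl = Graph.impl_for([])

# Calling the protocol on a type without an implementation raises at runtime.
outcome =
  try do
    Graph.get_node([], "node1")
  rescue
    Protocol.UndefinedError -> :no_implementation
  end

IO.inspect({map_impl, list_impl, outcome})
```

So the dispatch decision is made per value at runtime, which is the flip side of being able to add implementations for types after the fact.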
I think there’s a nice categorical perspective as well: behaviors as functors in a slice category, and protocols as natural transformations, with the “legs” provided by the implementation. The naturality condition looks something like
graph |> Graph.get_node("node") == graph.nodes |> Map.get("node")
But I’d need to spend some time with pen and paper to convince myself on this one.