Judging by the activity of marketing hypemen and growth hackers, it seems “Agents” are a Big Important Thing. I can’t scroll 2 posts on social media without being offered a course or a low-code service. As usual, the grifters are trailing the edge by a few years. If you try to pin down a definition, it seems to boil down to “a distributed, fault-tolerant software system… with LLM calls”. So that means we just need to throw Claude on the BEAM. Easy enough! For those willing to read some Elixir code I direct you to a pair of excellent blog posts from 2023:
Why Elixir/OTP doesn’t need an Agent framework: Part 1
Why Elixir/OTP doesn’t need an Agent framework: Part 2
Sadly these are his only 2 articles. He makes a few short arguments and then backs them up with a razor-sharp tutorial demonstrating scalable LLM pipelines with minimal dependencies and highly readable code, ready to extend with pattern-matching. Where else but Elixir are you going to find such ergonomic code compiling to such a battle-hardened VM?
I came to a similar conclusion implementing an “agentic” feature for a client app in Elixir, though my code admittedly wasn’t quite as clean. Others in the ecosystem are reaching similar conclusions: in an Elixir dev discussion on the topic, authors of the leading libraries admit struggling with whether their abstractions were worth it at all! When you’re working in a message-passing, actor-based execution model, you really get straight to the meat of what you want your “agent” to do.
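To make that concrete, here is a minimal sketch of what an "agent" can look like on the BEAM: a plain GenServer whose state is just the message history, with the actual LLM call hidden behind a hypothetical `MyApp.LLM.chat/1` client. Supervision, restarts, and concurrency then come from OTP rather than from a framework.

```elixir
defmodule MyApp.Agent do
  @moduledoc """
  A conversation "agent" as a plain GenServer: state is the message
  history, each request is one LLM round-trip.
  `MyApp.LLM.chat/1` is a hypothetical client returning {:ok, reply}.
  """
  use GenServer

  # Client API
  def start_link(system_prompt), do: GenServer.start_link(__MODULE__, system_prompt)
  def ask(pid, question), do: GenServer.call(pid, {:ask, question}, 60_000)

  # Server callbacks
  @impl true
  def init(system_prompt) do
    {:ok, [%{role: "system", content: system_prompt}]}
  end

  @impl true
  def handle_call({:ask, question}, _from, history) do
    history = history ++ [%{role: "user", content: question}]

    case MyApp.LLM.chat(history) do
      {:ok, reply} ->
        {:reply, {:ok, reply}, history ++ [%{role: "assistant", content: reply}]}

      {:error, reason} ->
        # Crash-and-restart semantics come for free under a supervisor.
        {:reply, {:error, reason}, history}
    end
  end
end
```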
In my experience there’s one key to interacting with LLM-based systems, whether in a chatbot or integrated in your app. That is:
Vigilant management of context.
I’ve done some informal over-the-shoulder mentoring of AI usage for several knowledge workers now. So far 100% of them fall into the same anti-patterns. Put up a sticky note: “Yes, really, you need to start a new chat already, this one is ruined.”
With that in mind I want to briefly survey the LLM dev tooling that I find useful, and suggest some ideas of what the next steps might look like.
Instructor -> Runtime Schemas
The only dependency used in the above tutorials is a port of the Instructor library. The goal of this library is to coerce the LLM response into a structured output. It’s essential for making tasks composable. Why? Because if the output of Task A has a known data model, then it can be used as input to a templated prompt for Task B.
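A rough sketch of that composability, loosely following the instructor_ex README; the schema, model name, and prompt text are placeholders of my own invention:

```elixir
defmodule TaskA.Summary do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :title, :string
    field :key_points, {:array, :string}
  end
end

article_text = "…paste the article here…"

# Task A: coerce the LLM reply into a known struct.
{:ok, summary} =
  Instructor.chat_completion(
    model: "gpt-4o-mini",
    response_model: TaskA.Summary,
    messages: [%{role: "user", content: "Summarize this article:\n\n#{article_text}"}]
  )

# Task B: because the output of Task A has a known shape, it slots
# straight into a templated prompt, no glue code required.
task_b_prompt = """
Write a short changelog entry titled "#{summary.title}" that covers:
#{Enum.map_join(summary.key_points, "\n", &("- " <> &1))}
"""
```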
There isn’t really any magic here. Instructor just runs a validation-and-retry loop under the hood. Imagine a short back-and-forth happening at the end of every prompt:
> Here's a JSON Schema. Make sure your response conforms.
> OK, here's my response: ___
> I couldn't parse that into JSON, please try again.
> Oops, here you go: ____
> This doesn't conform to the schema: The "name" attribute should be a string!
> Ooops, here you go: ___
and so on. Of course the initial prompt and the schema itself are the biggest factors in the quality of your outcome, but I think this abstraction really belongs in the core toolkit. Nobody wants to re-implement this in every task.
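For the curious, the loop amounts to something like this (my own sketch, not Instructor’s actual implementation; `llm_complete/1` stands in for whatever client sends the messages and returns the raw reply):

```elixir
defmodule CoerceJSON do
  @max_attempts 3

  # Keep appending error feedback until the reply decodes (via Jason) and
  # validates, or we run out of attempts. `validate_fn` should return
  # :ok or {:error, message}.
  def run(messages, validate_fn, attempt \\ 1)

  def run(_messages, _validate_fn, attempt) when attempt > @max_attempts,
    do: {:error, :max_attempts}

  def run(messages, validate_fn, attempt) do
    reply = llm_complete(messages)

    with {:ok, data} <- Jason.decode(reply),
         :ok <- validate_fn.(data) do
      {:ok, data}
    else
      {:error, %Jason.DecodeError{}} ->
        retry(messages, reply, "I couldn't parse that into JSON, please try again.",
          validate_fn, attempt)

      {:error, message} ->
        retry(messages, reply, "This doesn't conform to the schema: #{message}",
          validate_fn, attempt)
    end
  end

  defp retry(messages, reply, feedback, validate_fn, attempt) do
    messages =
      messages ++
        [%{role: "assistant", content: reply}, %{role: "user", content: feedback}]

    run(messages, validate_fn, attempt + 1)
  end

  # Stand-in: plug in your own LLM client here.
  defp llm_complete(_messages), do: raise("no LLM client configured")
end
```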
The shortcoming I’ve found working with this library is that your schemas (effectively) need to be defined at compile time. So imagine you’re doing data pipeline generation and want a workflow where one task defines a schema and another adheres to it. That’s a challenge with off-the-shelf tooling (to me, at least - YMMV, and please let me know if existing tooling handles this).
I’ve done a proof-of-concept with my Paradigm library, which is designed for runtime schema manipulation. But I haven’t gotten around to a full Paradigm_Instructor implementation that handles the iterative coercion steps.
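If you want to experiment without either library, plain Ecto already gives you the validation half of the problem via schemaless changesets: the “schema” is just a types map built at runtime, so one task can emit it and another can validate against it. A minimal sketch (not Paradigm’s API, and the data is made up):

```elixir
# A schema that only exists at runtime: a types map produced by an
# earlier task (hard-coded here for illustration).
types = %{name: :string, year: :integer, tags: {:array, :string}}

# Validate decoded LLM output against it with a schemaless changeset.
validate = fn decoded ->
  changeset =
    {%{}, types}
    |> Ecto.Changeset.cast(decoded, Map.keys(types))
    |> Ecto.Changeset.validate_required([:name])

  if changeset.valid? do
    {:ok, Ecto.Changeset.apply_changes(changeset)}
  else
    {:error, inspect(changeset.errors)}
  end
end

validate.(%{"name" => "Paradigm", "year" => 2024, "tags" => ["elixir"]})
#=> {:ok, %{name: "Paradigm", year: 2024, tags: ["elixir"]}}
```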
Zed -> Editor agents that don’t munch tokens
I’ve been watching the Zed text editor for a while, and recently made the switch from Sublime as my daily driver. Besides being a nice, modern, performant editor, it has LLM features baked in from the ground up. The two features I really love are:
- The “Text Thread” chat panel. There are shortcuts to add files and selections, and the entire thing just acts as a normal text buffer, so you can rewrite history or delete tangents that would otherwise poison your context in a browser-based tool. This alone makes it worth using because (as I mentioned above) hawk-eyed management of context is key to successful use of LLMs.
- The inline editor in your main text buffer. Basically, your cursor (or selection) gives it the scope that it’s allowed to rewrite. Then your Text Thread contents can be pulled in to provide extra context. To me, this is the perfect LLM workflow. It’s very efficient on tokens, too: I can use it all day long and spend pennies on API calls.
Their recent update leans heavily into agentic editing. I’ve had mixed results. Sometimes it knocks out something viable, but sometimes it goes down terrible rabbit holes where it gets nothing done. Either way it burns through tokens like crazy, so I tend not to reach for it. Even with minimal MCP handles, the baseline system prompt puts it at 5-6k tokens just to play ball. And that’s not something you can customize.
I’m looking for a middle ground between these two modes.
- Completely manual context management with scoped edit permissions (my current preferred way to work).
- Token-munching agentic mode with MCP handles for full codebase and terminal access (largely deranged and ineffective).
I’m not going to work on it, but I’ll know it when I see it! One sign I’m watching for is transparency in the MCP bootstrap process. Can we dispense with huge system prompts? Can we give the model handles to expand tools, so that some MCP categories never even enter the context? It’s not just a matter of frugality with API calls: the workflow is critical to the quality of the outcome, and the full-scope editor apps are just not there yet.
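To sketch what I mean by handles: advertise only category names at bootstrap, and let a single expand call pull a category’s tool descriptions into context on demand. This is speculative design, not how Zed or any existing MCP client works; every name below is made up.

```elixir
defmodule ToolRegistry do
  @moduledoc """
  Speculative sketch: tools are grouped by category, and only category
  names are advertised in the system prompt. A category's full tool
  descriptions enter the context only when the model asks to expand it.
  """

  @tools %{
    "filesystem" => [%{name: "read_file", description: "Read a file by path"}],
    "terminal" => [%{name: "run_command", description: "Run a shell command"}],
    "web" => [%{name: "fetch_url", description: "Fetch a URL"}]
  }

  # Cheap bootstrap: a one-line handle per category instead of full schemas.
  def bootstrap_prompt do
    "Available tool categories (call expand_tools(category) to load one): " <>
      Enum.join(Map.keys(@tools), ", ")
  end

  # Called only when the model decides it needs a category.
  def expand_tools(category), do: Map.fetch(@tools, category)
end
```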
Honorable Mention
For a really nice take on sandboxing + transparency (fish-tanking?) I have to shout out this claudebox project.