
Stop Writing Agents as while true. A Typed State Machine Approach

Why you shouldn't trust LangChain or other agent orchestration frameworks with irreversible operations

I built a travel agent last week. Two tools: get_weather (read-only, no side effects) and book_flight (irreversible, charges $800). I changed one line in the system prompt and the agent booked a flight to Tokyo without asking anyone.

[STATE: ToolCalling] book_flight
  [EXEC] book_flight({"city":"Tokyo","date":"2023-10-07"}) — call #1, total cost: $0.01
  [DANGER] Booking flight... this is IRREVERSIBLE
[STATE: Observation] {"confirmation":"BOOK-7WTU7B","status":"confirmed"}

[AGENT] Your flight to Tokyo has been successfully booked!

These are simulated tools with fake data. No real flight was booked. But nothing in the code would have prevented it with real APIs. The while true loop treats get_weather and book_flight the same way: a function that takes a string and returns a string. The only thing between the user’s money and the LLM’s decision was a sentence in the system prompt.

LangChain’s AgentExecutor doesn’t work in production. Not because the code is bad. Because the abstraction is wrong. When your entire safety model is “the LLM will follow the system prompt,” you don’t have a safety model.

I rebuilt the solution as a typed state machine in Rust where invalid transitions don’t compile. This is what I found.

The Architecture Every Framework Shares

LangChain, CrewAI, and AutoGen are all built around the same core loop. Simplified from LangChain’s actual source:

while True:
    output = llm.plan(input, intermediate_steps)
    if isinstance(output, AgentFinish):
        return output.return_values
    observation = tools[output.tool].run(output.tool_input)
    intermediate_steps.append((output, observation))

LLM decides. Code executes. Result goes back. Repeat. The “state” is a growing list of tuples. There’s no Retrying. No AwaitApproval. No Failed. The loop doesn’t know what state it’s in.
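To put the same loop in the language used for the rest of this article, here is a minimal Rust sketch. Everything in it is a hypothetical stand-in (Step, Tool, run_loop, the scripted planner closure); none of it is LangChain's API. The point is only that both tools share one signature and one code path.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the framework's types; not LangChain's API.
enum Step {
    Finish(String),
    Call { tool: String, input: String },
}

// Every tool has the same shape: string in, string out.
type Tool = fn(&str) -> String;

fn get_weather(city: &str) -> String {
    format!("sunny in {city}")
}

fn book_flight(city: &str) -> String {
    // Irreversible in a real system, but the loop cannot tell.
    format!("BOOKED flight to {city}")
}

fn run_loop(
    plan: impl Fn(&[(String, String)]) -> Step,
    tools: &HashMap<&str, Tool>,
    max_iterations: usize,
) -> (String, Vec<(String, String)>) {
    // The only "state" is this growing list of (tool, observation) pairs.
    let mut steps: Vec<(String, String)> = Vec::new();
    for _ in 0..max_iterations {
        match plan(&steps) {
            Step::Finish(answer) => return (answer, steps),
            Step::Call { tool, input } => {
                // Same code path for read-only and irreversible tools.
                let observation = match tools.get(tool.as_str()) {
                    Some(run) => run(&input),
                    None => format!("Error: unknown tool {tool}"),
                };
                steps.push((tool, observation));
            }
        }
    }
    ("stopped: max_iterations reached".to_string(), steps)
}

fn example() -> (String, Vec<(String, String)>) {
    let mut tools: HashMap<&str, Tool> = HashMap::new();
    tools.insert("get_weather", get_weather);
    tools.insert("book_flight", book_flight);
    // A scripted "LLM" that books immediately, like the proactive prompt.
    let plan = |steps: &[(String, String)]| {
        if steps.is_empty() {
            Step::Call { tool: "book_flight".into(), input: "Tokyo".into() }
        } else {
            Step::Finish("Your flight has been booked".into())
        }
    };
    run_loop(plan, &tools, 5)
}
```

Nothing in run_loop can distinguish the booking from the weather lookup. Whatever the planner returns gets executed.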

Breaking It

I built three scenarios and broke each one.

The prompt flip

Same agent. Same tools. Same code. System prompt says “be proactive”: flight booked instantly. System prompt says “always check weather first”: flight not booked. The entire safety model is a natural language instruction that the LLM can ignore at any time.

$1,400 on incomplete data

I built a multi-agent pipeline. A Researcher checks weather in three cities, passes findings to a Booker that books the trip. Two of three API calls returned 503. The Researcher recommended São Paulo, the city it had no weather data for, because it was the cheapest. The Booker received this as text and executed:

  [BOOKER] Calling: book_flight
    [EXEC] book_flight({"city":"São Paulo","date":"2023-10-30"})
    [DANGER] IRREVERSIBLE: booking flight...
  [BOOKER] Calling: book_hotel
    [EXEC] book_hotel({"city":"São Paulo","checkin":"2023-10-30","nights":5})
    [DANGER] IRREVERSIBLE: booking hotel...

Flight: $800 | Hotel: $600 | Total: $1400

No validation between agents. No schema for the handoff. The Researcher even wrote “please confirm if you’d like to proceed” in its output. Nobody read it. The orchestrator passed the text to the Booker as an instruction.
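A typed handoff would have caught this. The sketch below is one possible shape, not code from my pipeline: ResearchFindings, WeatherReport, ValidatedRecommendation, and validate are all hypothetical names. The idea is that the Booker accepts only a validated value, and validation fails when the recommended city has no weather data.

```rust
use std::collections::HashMap;

// Hypothetical handoff types; the names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum WeatherReport {
    Score(u8),           // data was fetched
    Unavailable(String), // e.g. "503 Service Unavailable"
}

#[derive(Debug)]
struct ResearchFindings {
    reports: HashMap<String, WeatherReport>,
    recommended_city: String,
}

// What the Booker would accept. In a real crate this would live in its own
// module with private fields, so `validate` is the only way to build one.
#[derive(Debug, PartialEq)]
struct ValidatedRecommendation {
    city: String,
    score: u8,
}

#[derive(Debug, PartialEq)]
enum HandoffError {
    NoDataFor(String),
    NotResearched(String),
}

fn validate(f: &ResearchFindings) -> Result<ValidatedRecommendation, HandoffError> {
    match f.reports.get(&f.recommended_city) {
        Some(WeatherReport::Score(score)) => Ok(ValidatedRecommendation {
            city: f.recommended_city.clone(),
            score: *score,
        }),
        // The recommended city was queried, but the call failed.
        Some(WeatherReport::Unavailable(_)) => {
            Err(HandoffError::NoDataFor(f.recommended_city.clone()))
        }
        // The recommended city was never queried at all.
        None => Err(HandoffError::NotResearched(f.recommended_city.clone())),
    }
}
```

With this in place, a São Paulo recommendation backed by a 503 is an Err at the boundary instead of an instruction to the Booker.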

Retry on vibes

get_weather fails with a 503. The error becomes a string, "Error: 503 Service Unavailable", and goes back to the LLM. On this run, the LLM decided to stop and ask the user what to do. On a different run, it might retry 15 times without waiting. No backoff. No limit per tool. No circuit breaker. Retry policy is whatever the LLM decides on that particular inference.

Five Problems, One Root Cause

All five come from the same place: state is implicit.

  1. Read and write tools are identical. get_weather and book_flight pass through the same code path.
  2. Retry is decided by the LLM. No backoff, no limit, no circuit breaker.
  3. Partial data propagates. Nothing validates completeness between pipeline steps.
  4. Budget is max_iterations. Each iteration can trigger multiple parallel tool calls. Actual cost is unpredictable.
  5. State is invisible. To know what the agent is doing, parse the entire message history and guess.

Typed State Machines in Rust

Each state is a type. Transitions are functions that consume one type and produce another. Invalid transitions don’t compile.

struct Idle;
struct Planning;
struct ToolCalling { retry_context: Option<RetryContext> }
struct AwaitApproval { tool: ToolCall }
struct Observing { result: ToolResult }
struct Retrying { attempts: u32, max_attempts: u32, last_error: String }
struct Responding { response: String }
struct Failed { reason: String }
struct Done { response: String }

struct Agent<S> {
    messages: Vec<Message>,
    total_cost: f64,
    state: S,
}

Transitions exist only on specific states:

impl Agent<Idle> {
    fn receive(self, input: String) -> Agent<Planning> { /* ... */ }
}

impl Agent<Planning> {
    fn call_tool(self, tool: ToolCall) -> Agent<ToolCalling> { /* ... */ }
    fn request_approval(self, tool: ToolCall) -> Agent<AwaitApproval> { /* ... */ }
    fn respond_directly(self, response: String) -> Agent<Responding> { /* ... */ }
}

Agent<Idle> has no .call_tool(), so calling it doesn’t compile. Not a runtime check, not a test, not a convention.
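For readers who want something that compiles on its own, here is a trimmed-down sketch of the pattern. The bodies are placeholders, the field set is reduced, messages is a Vec<String> instead of Vec<Message>, and the generic transition helper is my assumption about the simplest way to move shared fields between states.

```rust
// Minimal compilable version of the pattern. Bodies are placeholders
// and the field set is trimmed for brevity.
struct Idle;
struct Planning;
struct Responding { response: String }

struct Agent<S> {
    messages: Vec<String>,
    state: S,
}

impl<S> Agent<S> {
    // Consumes the agent in state S and rebuilds it in state T,
    // moving the shared fields across.
    fn transition<T>(self, next: T) -> Agent<T> {
        Agent { messages: self.messages, state: next }
    }
}

impl Agent<Idle> {
    fn new() -> Self {
        Agent { messages: Vec::new(), state: Idle }
    }

    fn receive(mut self, input: String) -> Agent<Planning> {
        self.messages.push(input);
        self.transition(Planning)
    }
}

impl Agent<Planning> {
    fn respond_directly(self, response: String) -> Agent<Responding> {
        self.transition(Responding { response })
    }
}

fn example() -> String {
    let agent = Agent::<Idle>::new();
    let planning = agent.receive("What's the weather in Tokyo?".to_string());
    // `agent.respond_directly(...)` would not compile here: `agent` was
    // moved by `receive`, and Agent<Idle> has no such method anyway.
    let responding = planning.respond_directly("Sunny.".to_string());
    responding.state.response
}
```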

Why Rust specifically

Rust’s ownership system is an affine type system: a value can be used at most once. When a transition takes self by value, the previous state is moved into the call and consumed. After calling .receive(), the Agent<Idle> no longer exists as far as the compiler is concerned. You can’t hold a reference to it, clone it accidentally, or reuse it.

let agent = Agent::new(config);
let planning = agent.receive(input);
// agent.receive(other_input);  // ERROR: use of moved value

In Python, Java, TypeScript, Go (any garbage-collected language), the old value stays alive as long as something references it. Nothing prevents you from holding a reference after the “transition” and calling methods on it. Without move semantics, the compiler can’t enforce the guarantee.

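The type parameter itself costs nothing at runtime. Unit structs like Idle and Planning are zero-sized and disappear in the binary; states that carry data cost only the size of that data. The transition guarantees exist during compilation and nowhere else.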
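The size claim is easy to check with std::mem::size_of. This is a reduced sketch (the Agent here has only two fields), and the exact sizes for structs with data depend on the compiler's layout choices, so treat the numbers for anything other than the unit struct as what current compilers happen to produce.

```rust
use std::mem::size_of;

struct Idle; // unit struct: zero-sized
struct Retrying { attempts: u32, max_attempts: u32 } // carries data

// Reduced Agent with a single shared field, for measuring only.
struct Agent<S> {
    total_cost: f64,
    state: S,
}

// Returns (size of Idle, size of Agent<Idle>, size of Agent<Retrying>).
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<Idle>(),
        size_of::<Agent<Idle>>(),
        size_of::<Agent<Retrying>>(),
    )
}
```

Agent<Idle> is no bigger than its f64, while Agent<Retrying> grows by the two counters it carries.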

Solving the five problems

Read vs. write. Two separate transitions from Planning. Read tools go to ToolCalling directly. Write tools must pass through AwaitApproval:

impl Agent<AwaitApproval> {
    async fn wait_for_approval(self, gate: &dyn ApprovalGate)
        -> Result<Agent<ToolCalling>, Agent<Planning>>
    { /* ... */ }
}

Denied? Back to Planning. The booking never executes. Cost: $0.00.
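Here is a synchronous sketch of the gate so it runs without an async runtime. The approve method and its signature are assumptions (the signature above only shows that a &dyn ApprovalGate is passed in), and the tool is reduced to a String.

```rust
// Synchronous sketch; the version in the article is async. The
// `approve` method name and signature are assumptions.
struct Planning;
struct ToolCalling { tool: String }
struct AwaitApproval { tool: String }

struct Agent<S> {
    total_cost: f64,
    state: S,
}

trait ApprovalGate {
    fn approve(&self, tool: &str) -> bool;
}

// Two trivial gates for demonstration; a real one would ask a human.
struct DenyAll;
impl ApprovalGate for DenyAll {
    fn approve(&self, _tool: &str) -> bool { false }
}

struct AllowAll;
impl ApprovalGate for AllowAll {
    fn approve(&self, _tool: &str) -> bool { true }
}

impl Agent<AwaitApproval> {
    fn wait_for_approval(self, gate: &dyn ApprovalGate)
        -> Result<Agent<ToolCalling>, Agent<Planning>>
    {
        if gate.approve(&self.state.tool) {
            // Approved: the only way a write tool reaches ToolCalling.
            Ok(Agent {
                total_cost: self.total_cost,
                state: ToolCalling { tool: self.state.tool },
            })
        } else {
            // Denied: back to Planning, nothing executed, cost unchanged.
            Err(Agent { total_cost: self.total_cost, state: Planning })
        }
    }
}
```

Because the method consumes self, the pending Agent<AwaitApproval> is gone after the call either way. There is no leftover value that could be executed later.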

System-controlled retry. This is where the loop-based approach fails hardest. In a while true loop, the LLM sees the error as text and decides what to do. Maybe retry. Maybe give up. Maybe retry 47 times. With a typed state, retry is a state with a counter. The LLM doesn’t get to decide:

impl Agent<Retrying> {
    async fn retry(self) -> Result<Agent<ToolCalling>, Agent<Failed>> {
        if self.state.attempts >= self.state.max_attempts {
            return Err(self.transition(Failed { /* ... */ }));
        }
        tokio::time::sleep(backoff_with_jitter).await;
        Ok(self.transition(ToolCalling { /* ... */ }))
    }
}

Returns Result. When retries are exhausted, the only possible result is Agent<Failed>. Failed has no transition methods. Terminal. There’s no path out.
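The backoff_with_jitter value in the snippet above has to come from somewhere. One common way to compute it is exponential backoff with full jitter, sketched below. The function signature, the base and cap parameters, and passing the random fraction in as an argument are all my assumptions for the sake of a deterministic example; in practice the fraction would come from a random number generator.

```rust
use std::time::Duration;

// Exponential backoff with "full jitter": the delay is a random fraction
// of min(cap, base * 2^attempt). `jitter` is that fraction, expected in
// [0.0, 1.0] and supplied by the caller, which keeps this function
// deterministic and easy to test.
fn backoff_with_jitter(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    // saturating_* avoids overflow panics for large attempt numbers.
    let factor = 2u32.saturating_pow(attempt);
    let ceiling = base.saturating_mul(factor).min(cap);
    ceiling.mul_f64(jitter.clamp(0.0, 1.0))
}
```

With a 100ms base and a 5s cap, attempt 3 has a ceiling of 800ms and attempt 10 is clamped to 5s. The jitter spreads retries out so that many agents hitting the same failing API don't retry in lockstep.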

Budget and timeout. Both are checked before tool execution. Budget is simple: if the estimated cost exceeds the cap, the agent goes to Failed without executing the tool. To get there, I construct the retry state with max_attempts: 0, which means .retry() returns Failed on the first call. Ugly? Yes. But it means the budget check doesn’t need its own state or transition. It reuses the existing failure path.
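A synchronous sketch of that budget path, under some assumptions: the budget_cap field, the check_budget method, and the run helper are hypothetical names, and ToolCalling is reduced to a unit struct. What it shows is the max_attempts: 0 trick, where an over-budget call enters Retrying with no attempts left and so falls straight through to Failed.

```rust
struct ToolCalling;
struct Retrying { attempts: u32, max_attempts: u32, last_error: String }
struct Failed { reason: String }

// `budget_cap` is a hypothetical field added for this sketch.
struct Agent<S> {
    total_cost: f64,
    budget_cap: f64,
    state: S,
}

impl<S> Agent<S> {
    fn transition<T>(self, next: T) -> Agent<T> {
        Agent { total_cost: self.total_cost, budget_cap: self.budget_cap, state: next }
    }
}

impl Agent<ToolCalling> {
    // Runs before the tool does. Over budget produces a Retrying state
    // that has no attempts left.
    fn check_budget(self, estimated_cost: f64) -> Result<Agent<ToolCalling>, Agent<Retrying>> {
        if self.total_cost + estimated_cost > self.budget_cap {
            let last_error = format!(
                "budget exceeded: {:.2} + {:.2} > {:.2}",
                self.total_cost, estimated_cost, self.budget_cap
            );
            Err(self.transition(Retrying { attempts: 0, max_attempts: 0, last_error }))
        } else {
            Ok(self)
        }
    }
}

impl Agent<Retrying> {
    fn retry(self) -> Result<Agent<ToolCalling>, Agent<Failed>> {
        // With max_attempts == 0 this is true on the very first call.
        if self.state.attempts >= self.state.max_attempts {
            let reason = self.state.last_error.clone();
            return Err(self.transition(Failed { reason }));
        }
        Ok(self.transition(ToolCalling))
    }
}

// Ok(()) if the tool may run, Err(reason) if the agent ended in Failed.
fn run(total_cost: f64, estimated_cost: f64, cap: f64) -> Result<(), String> {
    let agent = Agent { total_cost, budget_cap: cap, state: ToolCalling };
    match agent.check_budget(estimated_cost) {
        Ok(_ready) => Ok(()),
        Err(retrying) => match retrying.retry() {
            Ok(_) => Ok(()),
            Err(failed) => Err(failed.state.reason),
        },
    }
}
```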

Timeout wraps the tool execution with tokio::time::timeout. Timeout and tool error both produce the same result: Agent<Retrying>. The retry state doesn’t care why it’s retrying.

match tokio::time::timeout(duration, executor.execute(&tool)).await {
    Ok(Ok(result)) => Ok(self.transition(Observing { result })),
    Ok(Err(_))     => Err(self.transition(Retrying { /* ... */ })),
    Err(_timeout)  => Err(self.transition(Retrying { /* ... */ })),
}

Observability. Each transition emits a tracing span. The trace of a request is the sequence of states. “70% of requests spend 500ms in Retrying” is one query, not a log parsing exercise.
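To show the idea without pulling in the tracing crate, here is a standard-library stand-in: each transition records the name of the state being left and how long the agent spent in it. The State trait, its NAME constant, and the trace field are hypothetical. The real implementation emits tracing spans instead of pushing to a Vec.

```rust
use std::time::{Duration, Instant};

// Hypothetical trait giving each state a printable name.
trait State {
    const NAME: &'static str;
}

struct Idle;
struct Planning;
struct Done;

impl State for Idle { const NAME: &'static str = "Idle"; }
impl State for Planning { const NAME: &'static str = "Planning"; }
impl State for Done { const NAME: &'static str = "Done"; }

struct Agent<S: State> {
    entered: Instant,                     // when the current state began
    trace: Vec<(&'static str, Duration)>, // (state left, time spent in it)
    state: S,
}

impl<S: State> Agent<S> {
    // Every transition goes through here, so every state change is recorded.
    fn transition<T: State>(mut self, next: T) -> Agent<T> {
        self.trace.push((S::NAME, self.entered.elapsed()));
        Agent { entered: Instant::now(), trace: self.trace, state: next }
    }
}

// Walks Idle -> Planning -> Done and returns the names of the states left.
fn example() -> Vec<&'static str> {
    let agent = Agent { entered: Instant::now(), trace: Vec::new(), state: Idle };
    let done = agent.transition(Planning).transition(Done);
    done.trace.iter().map(|(name, _)| *name).collect()
}
```

Since there is exactly one place where states change, there is exactly one place to instrument.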

In Practice

The approval gate killed the booking. No code change needed. The transition from Planning to a write tool goes through AwaitApproval by definition.

SCENARIO: Write tool — DENIED

Approval denied by human
Booking denied. No irreversible action was taken.
Cost: $0.00 (no tool was executed)

The retry path handled an unreliable API without intervention. First call failed, backoff kicked in, second call succeeded. The LLM never saw the error. The state machine handled it before the LLM was consulted again.

SCENARIO: Multi-step — weather then book

Weather score: 9/10
book_flight failed (503) → Retrying (attempt 1, backoff 226ms)
Weather is great (score 9). Flight booked: "FL-I3LG69"
Total cost: $0.03

What I Got Wrong Building This

First mistake: I put PhantomData<S> as the state field but wrote code that accessed self.state.attempts. PhantomData is zero-sized. It carries type information only. There’s no data there. It took me two iterations to realize that if my states carry data (retry counters, tool results, error messages), the field needs to be state: S, not _state: PhantomData<S>. Unit structs like Idle and Planning still cost zero bytes. PhantomData was the wrong tool.

Second mistake: I put the retry counter in ToolCalling. My first instinct was that ToolCalling should know how many times it’s been tried. Wrong. ToolCalling is stateless. It executes. The counter belongs in Retrying. The flow is ToolCalling fails, creates Retrying { attempts: 1 }, Retrying decides to try again, creates a new ToolCalling. The counter lives in the retry state and gets passed through RetryContext when a new ToolCalling is created. Separating “execution” from “retry policy” took a refactor.
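The flow is easier to see with the Agent wrapper stripped away. In this sketch the fields of RetryContext are my assumption (the struct's contents aren't shown above), and fail and retry are hypothetical method names on the bare state types.

```rust
// Assumed contents of RetryContext: just enough to carry the counter.
#[derive(Debug, Clone, PartialEq)]
struct RetryContext { attempts: u32, max_attempts: u32 }

struct ToolCalling { retry_context: Option<RetryContext> }
struct Retrying { attempts: u32, max_attempts: u32, last_error: String }

impl ToolCalling {
    // Execution failed. The first failure starts the counter at 1;
    // later failures continue from the context that was passed in.
    fn fail(self, error: String, default_max: u32) -> Retrying {
        match self.retry_context {
            None => Retrying { attempts: 1, max_attempts: default_max, last_error: error },
            Some(ctx) => Retrying {
                attempts: ctx.attempts + 1,
                max_attempts: ctx.max_attempts,
                last_error: error,
            },
        }
    }
}

impl Retrying {
    // Retry policy lives here: either hand the counter to a fresh
    // ToolCalling, or give up (None stands in for Failed).
    fn retry(self) -> Option<ToolCalling> {
        if self.attempts >= self.max_attempts {
            return None;
        }
        Some(ToolCalling {
            retry_context: Some(RetryContext {
                attempts: self.attempts,
                max_attempts: self.max_attempts,
            }),
        })
    }
}
```

ToolCalling never inspects the counter. It only carries it, so execution and retry policy stay separate.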

Third mistake: Retrying had the same interface as Planning. I gave it call_tool, request_approval, respond_directly. Retrying does one thing: try the same tool again or give up. It doesn’t replan. It doesn’t pick a new tool. Collapsing this into a Result<Agent<ToolCalling>, Agent<Failed>> removed the ambiguity entirely. The caller matches on the result. The state machine decides, not the caller.

Side-by-Side

                     | while True      | Typed State Machine
State                | Implicit list   | Type parameter
Invalid transitions  | Runtime bug     | Doesn’t compile
Read vs. write       | Same path       | Write needs approval
Retry                | LLM decides     | System with backoff
Budget               | max_iterations  | Cost per call
Timeout              | None            | tokio::time::timeout
Observability        | Parse logs      | Tracing spans

When Not to Use This

This is overkill for a 3-state chatbot. If your agent has two tools and no side effects, a loop is fine. The cost of this approach is real: more types, more boilerplate, slower iteration when you’re still figuring out what states you need.

Where it paid off for me: the moment I added a second write tool (book_hotel), every safety guarantee extended automatically. No new code. The type system forced the approval gate on the new tool. In a while true loop, I’d have had to remember to add the check manually.
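One way to get that property is to make read and write tools different types, so the signature of each transition decides which path a tool can take. The sketch below is an assumption about how such a split could look, not a copy of the repository: the signatures shown earlier take a single ToolCall type, and ReadTool, WriteTool, and the constructor functions here are hypothetical.

```rust
// Hypothetical split of tools into two types. A tool's type, not a
// runtime flag, determines which transition accepts it.
struct ReadTool { name: &'static str, input: String }
struct WriteTool { name: &'static str, input: String }

struct Planning;
struct ToolCalling { tool: ReadTool }
struct AwaitApproval { tool: WriteTool }

struct Agent<S> { state: S }

impl Agent<Planning> {
    // Read-only tools go straight to execution.
    fn call_tool(self, tool: ReadTool) -> Agent<ToolCalling> {
        Agent { state: ToolCalling { tool } }
    }

    // Write tools can only enter the approval path.
    fn request_approval(self, tool: WriteTool) -> Agent<AwaitApproval> {
        Agent { state: AwaitApproval { tool } }
    }
}

fn get_weather(city: &str) -> ReadTool {
    ReadTool { name: "get_weather", input: city.to_string() }
}

fn book_flight(city: &str) -> WriteTool {
    WriteTool { name: "book_flight", input: city.to_string() }
}

// The second write tool: declaring its return type as WriteTool is the
// only decision needed. `call_tool(book_hotel(..))` is a type error.
fn book_hotel(city: &str) -> WriteTool {
    WriteTool { name: "book_hotel", input: city.to_string() }
}
```

The safety decision moves to the place where the tool is defined, which is also the place where you know whether it has side effects.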

I’d also look into session types for stronger guarantees, but the complexity jump is steep. Probably not worth it unless you’re building a framework, not an application. I haven’t tested this approach with more than 9 states. At some point the number of impl blocks might become its own maintenance problem.

Code

Full implementation: github.com/alk0x1/typed-state-machine

git clone https://github.com/alk0x1/typed-state-machine
cd typed-state-machine
cargo run

Nine states. Typed transitions. Async with timeout. Retry with backoff. Budget control. Approval gates. Tracing on every transition.

Uncomment _compile_time_demo() to see the compiler reject invalid paths.