Making the Agent Visible

Building a page that shows an AI agent's reasoning and tool calls in real time, and the polish round that made it actually pleasant to watch.

The chat widget I built a few weeks ago is already agentic. Ask it what I'm working on and it calls getProjects. Ask it to recommend a post and it calls searchPosts, then navigateToPost. Multiple tools, multiple steps, one answer.

But none of that is visible. You see the final response. The reasoning, the tool calls, the intermediate results — all of it happens silently behind the scenes.

I wanted to change that. Not to expose internals for their own sake, but because watching an agent work is one of the best ways to understand what it's actually doing. And I've been spending a lot of time trying to understand agents.

The Page

The /agent page lets you give the model a task and watch it execute. Every tool call appears as it happens. Every result streams in. The final answer builds up after.

Three preset tasks to start with:

  • Deep dive into ZJ's writing themes — calls searchPosts, reads several posts, synthesises themes
  • What's ZJ building? Give me the full picture — calls getProjects, cross-references with getTimeline
  • Trace ZJ's life journey from the beginning — walks the timeline, connects the dots

Or you can type anything.

The Streaming Protocol

The API route uses the same streamText setup as the chat widget, but the event types going down the wire are different:

app/api/agent/route.ts
if (part.type === "reasoning-delta") {
  emit({ type: "thinking", text: part.text });
} else if (part.type === "text-delta") {
  emit({ type: "text", text: part.text });
} else if (part.type === "tool-call") {
  emit({ type: "tool_call", name: part.toolName, input: part.input });
} else if (part.type === "tool-result") {
  emit({ type: "tool_result", name: part.toolName, output: part.output });
}

Each is a newline-delimited JSON object. The frontend reads the stream and builds a list of steps — thinking blocks, tool calls paired with their results, the final text. Same approach as the chat widget but with the internal scaffolding exposed rather than hidden.
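A minimal sketch of the reading side, assuming the event shapes emitted by the route above. The `Step` type and the names `createLineParser` and `applyEvent` are hypothetical, not the page's actual code. The parser holds back any partial trailing line, since a network chunk can end mid-object.

```typescript
// Wire events, matching what the route emits.
type AgentEvent =
  | { type: "thinking"; text: string }
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; input: unknown }
  | { type: "tool_result"; name: string; output: unknown };

// Hypothetical UI-side step shape: a tool call carries its own result.
type Step =
  | { kind: "thinking"; text: string }
  | { kind: "tool"; name: string; input: unknown; output?: unknown }
  | { kind: "answer"; text: string };

// Returns a function that accepts decoded text chunks of any size and
// calls onEvent once per complete NDJSON line. The unfinished tail of a
// chunk stays in the buffer until the rest arrives.
// (JSON.parse throws on a malformed line; a real page may want a try/catch.)
function createLineParser(onEvent: (e: AgentEvent) => void) {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (line.trim() !== "") onEvent(JSON.parse(line) as AgentEvent);
    }
  };
}

// Folds one event into the step list without mutating the previous list.
function applyEvent(steps: Step[], e: AgentEvent): Step[] {
  const last = steps[steps.length - 1];
  switch (e.type) {
    case "thinking":
      // Consecutive thinking deltas extend the same block.
      if (last?.kind === "thinking") {
        return [...steps.slice(0, -1), { ...last, text: last.text + e.text }];
      }
      return [...steps, { kind: "thinking", text: e.text }];
    case "text":
      if (last?.kind === "answer") {
        return [...steps.slice(0, -1), { ...last, text: last.text + e.text }];
      }
      return [...steps, { kind: "answer", text: e.text }];
    case "tool_call":
      return [...steps, { kind: "tool", name: e.name, input: e.input }];
    case "tool_result": {
      // Pair the result with the earliest same-named call still waiting.
      const i = steps.findIndex(
        (s) => s.kind === "tool" && s.name === e.name && s.output === undefined,
      );
      if (i === -1) return steps;
      return steps.map((s, j) =>
        j === i && s.kind === "tool" ? { ...s, output: e.output } : s,
      );
    }
  }
}
```

In a component, each decoded chunk from the response body would go through the parser, and each event through something like `setSteps((prev) => applyEvent(prev, event))`.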

Three Ways to Show It

I wasn't sure which visual format would work best, so I built all three and added a toggle.

Terminal is the most literal. Tool calls look like function calls, results are indented below them, the final answer comes after a horizontal rule. Monospace font, a bit stark. Good for seeing exactly what's happening.

Timeline is the most spatial. A vertical line connects the steps. Each tool call is a node with a dot. The result is nested inside the same entry. It reads like a workflow trace — you can see the shape of what the agent did at a glance.

Cards is the closest to the rest of the site. Each step is a bordered card: thinking, tool, answer. The visual language is familiar. It's the most readable but the least raw.

I ended up preferring timeline most of the time. It gives you a clear sense of how many steps the agent took and in what order, without feeling like a debug log.

The Model

The chat widget runs on Claude Haiku via a LiteLLM proxy. For the agent page I wanted something that could go deeper — read more posts, make more tool calls, write longer answers — without the cost being a concern.

I switched to DeepSeek V4 Flash. It's cheap enough that I can set maxOutputTokens to 8000 and stopWhen: stepCountIs(10) without worrying. For a demo page that's meant to show the model working thoroughly, that matters.

One gotcha: the Vercel AI SDK v6 defaults to the OpenAI Responses API when you call the provider function directly. DeepSeek V4 Flash isn't a Responses API model. The fix was to use .chat() explicitly:

model: litellm.chat("deepseek-v4-flash"),

Not obvious from the types alone — createOpenAI() returns a provider with both chat() and responses() methods, and the default callable hits responses.

The Streaming Oddity

After the agent page worked, I tried swapping the chat widget to DeepSeek too. It looked broken. Responses would hang for ten seconds and then dump the entire answer in one burst. No progressive streaming at all.

The streaming code on both ends was unchanged from when it worked with Claude. So something about the model swap was the cause. I ran the two endpoints side-by-side with curl -w "%{time_starttransfer}":

request               time-to-first-byte   body duration
with tools declared   8.3s                 1.1s
without tools         1.3s                 1.1s

The body duration is identical: once bytes start, they stream at the same rate. But declaring tools in the request adds seven seconds of nothing at the front (8.3s versus 1.3s). The proxy or the model appears to be silently buffering the entire response to scan for function-call syntax before releasing any chunks.

The likely cause is that not every model emits OpenAI-format tool_calls natively. DeepSeek emits something closer to XML, and LiteLLM (or whatever sits between LiteLLM and the model) has to wait long enough to detect and translate it before it dares stream. With seven tools and a fat system prompt, the chat widget hit ~14s TTFB, which looked indistinguishable from "not streaming."

Two routes through this:

  1. Don't pass tools when you don't need them. The TLDR button for blog summaries has no tools, so DeepSeek streams in under a second there. Same model, completely different feel.
  2. Pick a model whose tool-call format the proxy can pass through. Claude Haiku doesn't have this problem on this proxy: tool calls in its SSE stream don't need to be translated. So the chat widget went back to Haiku, and the agent page (where TTFB matters less because of all the visible process) stayed on DeepSeek.

I wouldn't have noticed any of this without the side-by-side. Streaming "works" until it doesn't, and the failure mode is just a longer pause.
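The same two numbers can be captured in code rather than with curl. This is a sketch under assumptions: `timeStream` and `StreamTiming` are hypothetical names, and the clock starts when iteration begins, so it should be called right after the request is sent. It takes any async iterable of chunks and reports how long the first chunk took and how long the rest of the body took.

```typescript
interface StreamTiming {
  ttfbMs: number; // start of iteration to first chunk
  bodyMs: number; // first chunk to last chunk
  chunks: number;
}

// Measures a stream of chunks. The clock is injectable so the function
// can be exercised without real delays; it defaults to Date.now.
async function timeStream(
  source: AsyncIterable<unknown>,
  now: () => number = () => Date.now(),
): Promise<StreamTiming> {
  const start = now();
  let firstAt: number | null = null;
  let lastAt = start;
  let chunks = 0;
  for await (const _chunk of source) {
    lastAt = now();
    if (firstAt === null) firstAt = lastAt;
    chunks++;
  }
  // An empty stream has no first byte: report the full wait as TTFB.
  if (firstAt === null) {
    return { ttfbMs: now() - start, bodyMs: 0, chunks: 0 };
  }
  return { ttfbMs: firstAt - start, bodyMs: lastAt - firstAt, chunks };
}
```

A large `ttfbMs` next to a small `bodyMs` is the signature described above: the stream itself is fine, but something upstream held it back before the first chunk.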

The Polish Round

The first version of the agent page worked but wasn't very pleasant to watch. Four things needed fixing.

Sticky input. As the agent ran, the output grew taller than the viewport. The input bar scrolled away with it. By the time the answer was streaming in, you couldn't see where to type your next question without scrolling all the way back up.

<div style={{ position: "sticky", top: 0, zIndex: 10, background: "var(--color-bg)" }}>
  <form></form>
  <ViewToggle />
</div>

One CSS property, position: sticky, does the work; top, zIndex, and an opaque background just keep it tidy. The input and view toggle now pin to the top of the viewport while the output scrolls past behind them.

Auto-scroll, removed. Every streamed token was firing a scrollIntoView({ behavior: "smooth" }) on a sentinel at the bottom of the output. That meant every chunk yanked the page down. If you tried to scroll up to re-read an earlier tool call mid-stream, the next token would teleport you back to the bottom. Felt awful. Easier to just delete the effect and let the page sit still — the user can scroll wherever they want, and the new content arrives wherever it arrives.

Collapse the process when the answer arrives. This was the biggest one. By the time the model starts streaming its answer, the user has already seen the thinking and the tool calls. Leaving them inline above the answer makes the answer scroll-jump every time a new chunk arrives, because the page above it is itself still growing.

So once the first text event lands (the route's translation of a text-delta part), every preceding thinking / tool_call / tool_result step tucks behind a single chip:

▼ 2 thoughts · 3 tools

Click it to expand back into the full process view. Otherwise, the answer takes the foreground and the scaffolding stays out of the way. This is the kind of detail that's invisible if you only ever look at static screenshots — you don't notice it until you sit and actually use the page through several runs.
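The chip's label can be derived from the step list alone. This is a minimal sketch, assuming the page keeps an ordered list of step kinds in which consecutive thinking deltas have already been merged into one entry. The `StepKind` type and `processSummary` are hypothetical names.

```typescript
type StepKind = "thinking" | "tool_call" | "tool_result" | "text";

// Returns the chip label for everything before the first answer text,
// or null while the answer has not started (nothing to collapse yet)
// or when there was no process to summarise.
function processSummary(kinds: StepKind[]): string | null {
  const firstText = kinds.indexOf("text");
  if (firstText === -1) return null;

  const before = kinds.slice(0, firstText);
  const thoughts = before.filter((k) => k === "thinking").length;
  const tools = before.filter((k) => k === "tool_call").length;

  const plural = (n: number, word: string): string =>
    `${n} ${word}${n === 1 ? "" : "s"}`;

  const parts: string[] = [];
  if (thoughts > 0) parts.push(plural(thoughts, "thought"));
  if (tools > 0) parts.push(plural(tools, "tool"));
  return parts.length > 0 ? parts.join(" · ") : null;
}
```

Tool results are deliberately not counted: each one belongs to a call, so counting calls gives the number a reader expects. The disclosure arrow is left to the component that renders the chip.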

Markdown. The final answer was rendering as plain whiteSpace: pre-wrap text. The model writes in markdown — headers, bullets, bold for emphasis, occasional code blocks — and none of it was being parsed. So I wired up marked for the answer field:

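import { useMemo } from "react";
import { Marked } from "marked";

// One shared parser: GitHub-flavoured markdown, single newlines become line breaks.
const md = new Marked({ gfm: true, breaks: true });

function Markdown({ text }: { text: string }) {
  // Escape "<" before parsing so raw HTML from the model renders as text, not markup.
  const html = useMemo(() => md.parse(text.replace(/</g, "&lt;")) as string, [text]);
  return <div className="agent-md" dangerouslySetInnerHTML={{ __html: html }} />;
}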

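The replace(/</g, "&lt;") is a small safety belt: the model can write whatever HTML it wants, and a stray <script> shouldn't get a chance to execute. Markdown syntax itself never needs < (except inside autolinks, which I'm fine sacrificing), so escaping it before marked runs is the cheapest guard I can think of without pulling in DOMPurify. It is a guard against raw tags rather than a full sanitiser, though: as far as I know marked does not filter link URLs, so a markdown link with a javascript: target would still come through and would need DOMPurify or a link check to catch.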

These four changes took an hour combined and made the difference between "interesting demo" and "actually pleasant to watch."

What Watching It Teaches You

The most interesting thing about making an agent's steps visible isn't the individual tool calls. It's the order.

On "deep dive into ZJ's writing themes", the agent always starts with a broad searchPosts("") to get the full list, picks a few slugs, reads them with readPost, then synthesises. It doesn't summarise from memory — it actually reads. The final answer is noticeably better because of it.

On "what's ZJ building", it almost always combines getProjects with getTimeline. The two together give it enough context to answer "what and why" rather than just listing things.

And occasionally it does something I didn't expect. Asked to "trace ZJ's life journey," it sometimes follows the timeline call with a searchPosts on a year that looked busy, then reads the post for that year before continuing. Not a strategy I prompted for. The visible scaffolding makes those moments easy to spot — when you can only see the final answer, you'd assume the model "just knew." When you can see the calls, you can tell it didn't, and that the answer was earned.

This kind of structured curiosity is the thing I keep trying to build with agents. Not just calling a tool when asked, but knowing which tools to combine to get a richer answer. Seeing it work step by step, in real time, makes it easier to reason about why it succeeds or fails.

The demo is live at /agent if you want to try it.