Protos AI: Building Reliable & Trustworthy AI Agents

September 18, 2025

This is #2 of the Founder Chronicles series, where co-founders Joel and Simeon share the Protos Labs story from their perspective. It offers a peek into the inner workings of our year in stealth, building an AI agent for cyber threat intelligence.

1. AI Agents (In Their Truest Form) Are Hard to Build

It’s easy to string together a prompt and plug it into GPT and call it an “agent.” But building a real agent—one that can plan, execute, reflect, and adapt reliably—is far more complex than it seems.

Early on, hallucinations were a major roadblock. But what surprised us even more was how unpredictable large models can be. The same prompt, same tools, same inputs—yet occasionally, the agent would respond in wildly different ways. The issue wasn’t always the prompt. Often, it came down to tool latency, subtle context gaps, or a lack of grounding signals the model could reason with.

We experimented with a number of design principles:

Structured memory and contextual recall, to track entities and relationships across tasks.
Modular reasoning flows, separating planning, execution, and feedback into distinct layers.
Validated tool interfaces, to reduce cascading failures and ensure safe execution.
State-aware prompting, evolving with each step to maintain coherence and depth.
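
To make the third principle concrete, here is a minimal sketch of what a validated tool interface might look like. It is an illustration under stated assumptions, not the Protos AI implementation: the names ToolSpec, ToolResult, call_tool, and lookup_ip are hypothetical, and the validation is deliberately simple (required arguments and basic type checks).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpec:
    """Hypothetical tool description: a name, expected argument types, and a handler."""
    name: str
    arg_types: dict[str, type]
    handler: Callable[..., Any]


@dataclass
class ToolResult:
    """Uniform result envelope so the agent can reason about failures."""
    ok: bool
    value: Any = None
    error: str = ""


def call_tool(spec: ToolSpec, args: dict[str, Any]) -> ToolResult:
    """Validate arguments before running the tool, and never raise to the caller."""
    missing = [key for key in spec.arg_types if key not in args]
    if missing:
        return ToolResult(ok=False, error=f"missing arguments: {missing}")
    unexpected = [key for key in args if key not in spec.arg_types]
    if unexpected:
        return ToolResult(ok=False, error=f"unexpected arguments: {unexpected}")
    for key, expected in spec.arg_types.items():
        if not isinstance(args[key], expected):
            return ToolResult(ok=False, error=f"{key} must be {expected.__name__}")
    try:
        return ToolResult(ok=True, value=spec.handler(**args))
    except Exception as exc:
        # Contain tool failures so one bad call does not cascade through the run.
        return ToolResult(ok=False, error=f"{spec.name} failed: {exc}")


def lookup_ip(ip: str) -> dict[str, str]:
    # Placeholder handler; a real tool would query a threat-intelligence source.
    return {"ip": ip, "verdict": "unknown"}


LOOKUP = ToolSpec(name="lookup_ip", arg_types={"ip": str}, handler=lookup_ip)

good = call_tool(LOOKUP, {"ip": "203.0.113.7"})  # passes validation and runs
bad = call_tool(LOOKUP, {"ip": 42})              # rejected before the handler runs
```

Returning a structured error instead of raising gives the model something it can reason about on the next step, which is one plausible way to keep a malformed tool call from derailing the rest of the run.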

In the end, what mattered most wasn’t a clever prompt or a better model—though both helped. The breakthrough came when we embraced the model’s generative nature, while anchoring every step in structured, auditable logic. That’s what turned an unpredictable LLM into a reliable analyst.

2. UX Is as Critical as Intelligence

In the early days, our agents felt a bit like black boxes. They would run long, linear, “autonomous” investigations, sometimes taking up to 10 minutes per task, with no visibility, no control, and no way to interrupt. The outcomes? Overconfident guesses, occasional silence, and understandable user frustration.

That’s when it clicked: in cyber, transparency builds trust. Users need to stay informed because context matters, and assumptions can be costly.

We rethought the UX around a few key principles:

Multi-turn, interruptible workflows so users can interact mid-task
Graph state introspection to explore memory states
Full audit logs for every step, decision, and tool call
Structured outputs in Markdown, JSON, and chat-ready summaries
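
As a rough illustration of the last two principles, the sketch below records each step in an append-only audit log and renders it as both JSON and Markdown. The AuditEvent and AuditLog names are hypothetical, and a production log would likely also carry timestamps, user identity, and tool inputs and outputs; they are left out here to keep the example small.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AuditEvent:
    step: int
    kind: str    # e.g. "plan", "tool_call", "decision"
    detail: str


@dataclass
class AuditLog:
    events: list[AuditEvent] = field(default_factory=list)

    def record(self, kind: str, detail: str) -> AuditEvent:
        """Append an event; step numbers follow insertion order."""
        event = AuditEvent(step=len(self.events) + 1, kind=kind, detail=detail)
        self.events.append(event)
        return event

    def to_json(self) -> str:
        """Machine-readable view for downstream systems."""
        return json.dumps([asdict(event) for event in self.events], indent=2)

    def to_markdown(self) -> str:
        """Human-readable view for reports or chat."""
        return "\n".join(
            f"{event.step}. **{event.kind}**: {event.detail}" for event in self.events
        )


log = AuditLog()
log.record("plan", "Check the reputation of the reported domain")
log.record("tool_call", "lookup_domain(domain='example.com')")
log.record("decision", "No known malicious activity; ask the analyst before escalating")
print(log.to_markdown())
```

Because both views are generated from the same list of events, the chat summary and the structured export cannot drift apart.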

These changes made all the difference. In cybersecurity, explainability isn’t a luxury; it’s essential. And trust doesn’t come from intelligence alone; it comes from clarity.

3. Things Move Fast — and You Don’t Need to Build Everything from Scratch

When we began, technologies like Anthropic’s Model Context Protocol (MCP) had just emerged. Five months on, they’ve become vital integration standards. Similarly, LangGraph matured just in time, allowing us to build declarative, tool-aware DAG workflows with clear separation of memory and control.

From the outset, we made a conscious decision: we wouldn’t rebuild what others had already solved. Our focus would remain on the agent’s reasoning (how it plans, adapts, and decides), not on the plumbing behind it. That decision paid off. It meant we could move faster where it mattered most:

LangGraph for modular and traceable execution
MCP to standardise tool interactions
Azure OpenAI with logic for retries, rate-limiting, and failover handling
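
The retry and failover logic mentioned in the last item can be sketched generically. The example below is an assumption-laden illustration, not the Azure OpenAI SDK and not our production code: TransientError, call_with_failover, and the two fake endpoints are hypothetical stand-ins, and the "rate-limiting" here is limited to exponential backoff between attempts.

```python
import time
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or timeout error from a model endpoint."""


def call_with_failover(
    endpoints: Sequence[Callable[[str], str]],
    prompt: str,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each endpoint in order, retrying transient errors with exponential backoff."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        for attempt in range(max_retries):
            try:
                return endpoint(prompt)
            except TransientError as exc:
                last_error = exc
                # Back off 0.5s, 1s, 2s, ... before retrying this endpoint.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all endpoints failed") from last_error


def flaky_primary(prompt: str) -> str:
    # Simulates a deployment that is currently rate limited.
    raise TransientError("rate limited")


def healthy_secondary(prompt: str) -> str:
    # Simulates a fallback deployment that answers normally.
    return f"echo: {prompt}"


delays: list[float] = []  # capture the backoff delays instead of really sleeping
answer = call_with_failover(
    [flaky_primary, healthy_secondary], "summarise alert", sleep=delays.append
)
```

Injecting the sleep function keeps the example fast to run and easy to test; in real use the default time.sleep (or an async equivalent) would apply the delays.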

By standing on the shoulders of best-in-class tools, we focused on what makes Protos AI smart — not just functional.

4. This Is Just the Beginning

What we’ve launched today is our starting point, not the finish line.

Protos AI already supports:

Full planning–execution–reflection loops
Structured memory and context
Multi-format outputs (chat, reports, structured data)
Asynchronous tool orchestration
Secure, agent-aware task execution
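
For readers wondering what asynchronous tool orchestration can look like in practice, here is a small sketch using Python's asyncio. It is illustrative only: run_tool and orchestrate are hypothetical names, the tools are simulated with sleeps, and a single shared timeout is assumed for simplicity.

```python
import asyncio


async def run_tool(name: str, delay: float) -> dict[str, str]:
    # Placeholder for an I/O-bound tool call, such as an HTTP lookup.
    await asyncio.sleep(delay)
    return {"tool": name, "status": "done"}


async def orchestrate(
    calls: list[tuple[str, float]], timeout: float
) -> list[dict[str, str]]:
    """Run all tool calls concurrently; a slow or failing tool does not block the rest."""
    tasks = [asyncio.wait_for(run_tool(name, delay), timeout) for name, delay in calls]
    # return_exceptions=True keeps one failure from cancelling the other calls.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    report = []
    for (name, _), result in zip(calls, results):
        if isinstance(result, asyncio.TimeoutError):
            report.append({"tool": name, "status": "timed_out"})
        elif isinstance(result, Exception):
            report.append({"tool": name, "status": f"error: {result}"})
        else:
            report.append(result)
    return report


report = asyncio.run(
    orchestrate([("whois", 0.01), ("passive_dns", 0.02), ("sandbox", 1.0)], timeout=0.2)
)
```

Since asyncio.gather returns results in the order the tasks were passed in, each entry in the report can be matched back to the tool that produced it, including the ones that timed out.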

And what’s ahead?

Document-linked task grounding (agentic RAG)
Human-in-the-loop escalation, validation, and overrides
Agent-to-agent task passing (e.g. from threat analyst to underwriter)
QA and benchmarking workflows for performance visibility

Our belief is simple: AI shouldn’t replace humans, but rather empower them, especially in environments where context is vast, signals are noisy, and time is tight. In cyber, that’s not the exception; it’s the rule.

Final Thoughts

We didn’t build Protos AI just to impress with flashy demos. We built it because modern cyber defence demands reasoning, not just summarisation. Agentic AI isn’t about wrapping GPT in a UI. It’s a system of memory, planning, tools, reflection, and outputs woven into an experience that’s built to earn trust. We hope what we’ve built sparks ideas, collaboration, and maybe even a few breakthroughs of your own. If you’re exploring similar paths or are curious where Protos AI fits into your workflows, we’d love to chat.