In the age of AI, our applications are becoming more dynamic and intelligent. We're moving beyond static code to build agentic workflows that can reason, adapt, and execute complex tasks. But this newfound power brings a new challenge: How do we know if our agent's approach is the best one? How do we optimize business logic that's executed by an AI?
Traditional A/B testing is a cornerstone of optimizing user interfaces. We test button colors, headlines, and user flows to see what drives clicks and conversions. It's time to bring that same rigor to the backend. We need a way to A/B test our business logic, our AI prompts, and our agent's behavior. This is the next frontier of Business-as-Code: creating systems that not only execute but also learn and improve.
Experimenting with backend processes and AI agents isn't as simple as changing a button color. The challenges are unique: variables are hard to isolate, executions must be reliable, and outcomes are difficult to measure precisely.
To experiment effectively, we need a way to isolate variables, execute them reliably, and measure the outcomes precisely. The solution lies in breaking down our complex workflows into their smallest, most fundamental units.
At action.do, we believe every complex process is just a sequence of simple, well-defined tasks. We call these atomic actions: self-contained, single-responsibility units of work that either succeed completely or fail gracefully, leaving no partial state behind.
Think of actions like send-welcome-email, create-user-in-db, or charge-credit-card.
This atomic-first approach is the perfect foundation for experimentation because it provides isolation of variables, reliable execution, and precise measurement of outcomes.
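As an illustration, an atomic action can be modeled as a typed unit of work with all-or-nothing semantics. The `AtomicAction` interface and `sendWelcomeEmail` stub below are hypothetical sketches, not part of the action.do SDK:

```typescript
// Hypothetical shape of an atomic action: one responsibility, typed
// input/output, and a result that is either a full success or a clean
// failure -- never a partial state.
interface ActionResult<T> {
  success: boolean;
  output?: T;
  error?: string;
}

interface AtomicAction<I, O> {
  name: string;
  execute(input: I): Promise<ActionResult<O>>;
}

// Example: a self-contained "send-welcome-email" action (stubbed).
const sendWelcomeEmail: AtomicAction<{ to: string }, { messageId: string }> = {
  name: 'send-welcome-email',
  async execute(input) {
    if (!input.to.includes('@')) {
      // Fail gracefully: nothing was committed, no cleanup needed.
      return { success: false, error: `invalid address: ${input.to}` };
    }
    // A real action would call an email provider here.
    return { success: true, output: { messageId: `msg-${Date.now()}` } };
  },
};
```

Because every action reports a single success-or-failure outcome, an experiment can compare variants without worrying about half-finished side effects.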
Imagine you could define an A/B test directly within your workflow automation code. This is the core idea behind experiment.branch—a powerful way to direct traffic between different business logic paths and measure the results.
Instead of executing a single action, you can define an experiment with multiple variants, each triggering a different atomic action or a different set of inputs.
Let's see this in action. Suppose we want to A/B test two different AI prompts for summarizing a customer support ticket. One prompt is designed for conciseness, the other for detail. Which one leads to faster resolution times?
With a framework built on atomic principles, your experiment code could look like this:
```typescript
import { Dō } from '@do-sdk/core';

const dō = new Dō({ apiKey: 'YOUR_API_KEY' });

// A/B test two different agent prompts for summarizing a support ticket
const summaryResult = await dō.experiment.branch({
  name: 'ticket-summary-agent-test-v2',
  // Use a stable identifier to ensure the same user gets the same experience
  branchOn: ticket.id,
  variants: {
    'concise-prompt': {
      weight: 50, // 50% of traffic
      action: {
        name: 'summarize-text-with-ai',
        input: {
          text: ticket.body,
          prompt: 'Summarize this support ticket in one sentence for an agent to review.',
        },
      },
    },
    'detailed-prompt': {
      weight: 50, // 50% of traffic
      action: {
        name: 'summarize-text-with-ai',
        input: {
          text: ticket.body,
          prompt: 'Summarize this ticket, identifying user sentiment, the core problem, and suggest a next action.',
        },
      },
    },
  },
});

// The result contains both the chosen variant and its execution outcome
console.log(summaryResult);
// {
//   variant: 'detailed-prompt',
//   result: {
//     success: true,
//     summary: 'User is frustrated about a billing error. Suggest issuing a refund.'
//   }
// }
```
In this example, experiment.branch handles routing 50% of requests to the concise-prompt variant and 50% to the detailed-prompt variant. Each execution is logged, so you can correlate the variant used for a ticket with its eventual resolution time, enabling true, data-driven optimization of your AI agent's behavior.
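The key to a stable experience is that the variant choice is derived deterministically from the branchOn key. Below is a minimal sketch of how weighted bucketing on a stable identifier could work; the `pickVariant` function and its hash are illustrative assumptions, not the actual SDK internals:

```typescript
// Deterministic weighted bucketing: hash a stable key into a bucket,
// then walk the weights. The same key always yields the same variant.
function pickVariant(
  branchKey: string,
  variants: Record<string, number>, // variant name -> traffic weight
): string {
  // FNV-1a hash for a stable, roughly uniform 32-bit value.
  let hash = 2166136261;
  for (let i = 0; i < branchKey.length; i++) {
    hash ^= branchKey.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  const total = Object.values(variants).reduce((a, b) => a + b, 0);
  let bucket = (hash >>> 0) % total;
  for (const [name, weight] of Object.entries(variants)) {
    if (bucket < weight) return name;
    bucket -= weight;
  }
  // Unreachable when all weights are positive.
  return Object.keys(variants)[0];
}
```

Because the bucket is a pure function of the key, retries and repeat requests for the same ticket always land on the same variant, keeping the measured outcomes clean.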
The possibilities are vast: you can test competing AI prompts, alternative inputs, or entirely different business logic paths, turning guesswork into data-backed decisions.
By defining experiments as code, you gain reliability and maintainability. Your tests are version-controlled, auditable, and co-located with the business logic they are meant to improve.
The era of fire-and-forget automations is over. Modern, agentic systems must be designed for continuous improvement. The principle of atomic actions provides the reliable foundation, and concepts like experiment.branch provide the tools to build truly intelligent, self-optimizing workflows.
Stop guessing and start measuring. Apply the proven power of A/B testing to your backend logic and build the most effective, efficient, and robust agentic systems possible.
What is an 'atomic action' in this context?
An atomic action is a self-contained, single-responsibility task that either completes successfully or fails entirely, without partial states. Think of it as the smallest indivisible unit of work in your workflow, like send-email, create-user, or charge-card.
How is this different from using a serverless function?
While similar, a platform like action.do is designed specifically for agentic workflows. It provides built-in orchestration, state management, logging, and security context for your actions, which you would otherwise need to build and manage yourself on top of a generic serverless platform. It's a higher-level abstraction for executing and composing business logic.
Can actions be chained together after an experiment?
Absolutely. The output of an experiment.branch can be fed directly into the next step of a larger workflow.do. This allows you to compose complex processes where experiments are just one part of a larger, robust automation.
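To make the chaining concrete, here is a small sketch of feeding a branch outcome into a follow-up step. The `BranchOutcome` shape mirrors the result printed earlier; `routeTicket` is a hypothetical downstream step, not a real SDK call:

```typescript
// Shape of the experiment result shown earlier in the post.
interface BranchOutcome {
  variant: string;
  result: { success: boolean; summary: string };
}

// Hypothetical next step: route the ticket based on the AI summary.
// The variant name is carried along only for later metric correlation.
function routeTicket(outcome: BranchOutcome): string {
  return outcome.result.summary.toLowerCase().includes('refund')
    ? 'billing-queue'
    : 'general-queue';
}

const outcome: BranchOutcome = {
  variant: 'detailed-prompt',
  result: {
    success: true,
    summary: 'User is frustrated about a billing error. Suggest issuing a refund.',
  },
};
console.log(routeTicket(outcome)); // 'billing-queue'
```

The experiment stays a single, swappable step: downstream logic consumes only the result, while the recorded variant lets you attribute outcomes back to each branch.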