A/B testing isn't just for button colors anymore. In a world driven by automation and AI, the most impactful optimizations happen behind the scenes—in the core business logic and the behavior of our autonomous agents. But how do you safely test a new pricing algorithm, a different fraud detection model, or a more sophisticated LLM prompt in a live production environment?
Traditionally, this has been a messy affair involving complex if/else statements, cumbersome feature flag systems, and manual data analysis. It's slow, risky, and hard to clean up.
Today, we're introducing a new primitive on the .do platform designed to solve this problem: experiment.branch. It's a clean, declarative, and powerful way to A/B test, canary release, and gradually roll out changes to your core business logic and agentic workflows.
Imagine you have a workflow that processes loan applications. A critical step is the credit scoring action.do. Your data science team has developed a new machine-learning model that promises higher accuracy. How do you roll it out?
With experiment.branch, you can measure the performance of both variants in real-time, under real-world conditions, and make data-driven decisions without compromising the stability of your system.
experiment.branch is a new construct within your .workflow.do agents that allows you to define two or more paths (variants) for your workflow to take. The .do platform handles the traffic distribution, execution, and—most importantly—the observability.
Each variant in an experiment branch executes a different action.do. This aligns perfectly with our core philosophy: encapsulate logic into atomic, reusable units. You're not testing random snippets of code; you're testing entire, version-controlled actions against each other.
Let's see it in action with our credit scoring example.
```typescript
import { Do } from '@do-platform/sdk';

// Initialize the .do client ("do" is a reserved word in JavaScript,
// so we bind the client to a different name)
const client = new Do(process.env.DO_API_KEY);

// Define the two actions we want to test
const legacyScoringAction = await client.action.get('score-with-legacy-model');
const newMlScoringAction = await client.action.get('score-with-ml-model');

// Define the workflow that uses the experiment
const processLoanWorkflow = await client.workflow.create({
  name: 'process-loan-application',
  handler: async (inputs) => {
    // ... other steps like initial data validation ...

    // A/B test the scoring models
    const scoringResult = await client.experiment.branch({
      name: 'credit-scoring-model-v2-test',
      // The rest of the workflow is agnostic to the input source;
      // it only cares about the common 'applicantId'
      inputs: {
        applicantId: inputs.applicantId
      },
      variants: [
        {
          name: 'legacy-model', // Control group
          actionId: legacyScoringAction.id,
          weight: 90, // 90% of traffic
        },
        {
          name: 'ml-model', // Challenger
          actionId: newMlScoringAction.id,
          weight: 10, // 10% of traffic
        }
      ]
    });

    // The workflow continues, using the output from whichever action was chosen
    if (scoringResult.output.score > 700) {
      await client.action.run('approve-loan', { id: inputs.applicationId });
    } else {
      await client.action.run('reject-loan', { id: inputs.applicationId });
    }

    return { success: true, score: scoringResult.output.score, variant: scoringResult.variantName };
  }
});
```
The beauty of this approach is its simplicity and power. The workflow logic remains clean. It simply asks the experiment to provide a score, and the .do platform manages the complex task of routing, executing, and tracking the chosen variant.
Integrating experimentation directly into the workflow layer unlocks several key benefits:
Your business logic doesn't need to be polluted with if/else checks for experiments. The experimentation rules are defined declaratively and managed by the platform. This makes your workflows easier to read, reason about, and maintain.
Built-in observability is crucial here. Every workflow execution that passes through an experiment.branch is automatically tagged with the experiment name and the variant that was run (e.g., experiment:credit-scoring-model-v2-test, variant:ml-model). You can instantly filter your logs, metrics, and traces in the .do dashboard to compare the performance, cost, and error rates of each variant side-by-side.
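To make the routing-and-tagging idea concrete, here is a minimal sketch of how deterministic, weighted variant selection can work. This is purely illustrative: `Variant`, `stableHash`, and `pickVariant` are names I've made up for the sketch, not part of the .do SDK, and the platform's actual routing may differ.

```typescript
// Illustrative only: a deterministic, weighted traffic splitter.
// The same key (e.g., an applicantId) always maps to the same variant,
// and each execution can then be tagged with the chosen variant name.

interface Variant {
  name: string;
  weight: number; // percentage of traffic; weights should sum to 100
}

// A tiny stable string hash so routing is repeatable across executions
function stableHash(input: string): number {
  let h = 0;
  for (let i = 0; i < input.length; i++) {
    h = (h * 31 + input.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Map the key into a bucket 0..99, then walk the cumulative weights
function pickVariant(key: string, variants: Variant[]): Variant {
  const bucket = stableHash(key) % 100;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v;
  }
  return variants[variants.length - 1];
}

const variants: Variant[] = [
  { name: 'legacy-model', weight: 90 },
  { name: 'ml-model', weight: 10 },
];

const chosen = pickVariant('applicant-42', variants);
// Each execution would then carry tags like:
// { experiment: 'credit-scoring-model-v2-test', variant: chosen.name }
```

Deterministic bucketing matters for analysis: a given applicant is always scored by the same variant, so you can compare cohorts cleanly rather than mixing models within one applicant's history.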
experiment.branch is your ultimate safety net for deployments. Before rolling out a new version of a critical action.do, you can configure an experiment to send just 1% of traffic to it. Monitor its performance and error rates. If everything looks good, you can gradually increase the weight—from 1% to 10% to 50% to 100%—all without a single new deployment.
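That 1% → 10% → 50% → 100% progression can be modeled as a simple weight schedule. The sketch below is hypothetical (`rampStages` and `weightsForStage` are not a .do API); it just shows how each stage shifts more traffic to the challenger while the control absorbs the remainder.

```typescript
// Illustrative canary ramp: each stage sends a larger share of traffic
// to the challenger variant, with no new deployment required.
const rampStages = [1, 10, 50, 100]; // challenger traffic % per stage

function weightsForStage(stage: number): { control: number; challenger: number } {
  // Clamp so any stage past the end holds at full rollout
  const challenger = rampStages[Math.min(stage, rampStages.length - 1)];
  return { control: 100 - challenger, challenger };
}

// Stage 0 sends just 1% of traffic to the new model;
// by the final stage the control receives none.
```

Advancing a stage is then just an update to the experiment's variant weights, which is why no redeploy is needed.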
Once you've picked a winner, the cleanup is trivial. Simply update your workflow to call the winning action.do directly and delete the experiment.branch block. No code archaeology is needed to hunt down and remove old feature flags.
experiment.branch isn't just for financial models. You can apply it to nearly any part of your agentic workflows and business automation, from pricing algorithms and fraud detection models to the LLM prompts that drive your autonomous agents.
By treating experimentation as a first-class citizen, experiment.branch fundamentally changes how we build and evolve reliable, intelligent systems. It allows you to de-risk change, innovate faster, and make decisions based on quantitative data, not just intuition.
This is the next step in Business-as-Code: building systems that are not only automated but also self-optimizing.
Ready to stop guessing and start measuring? Dive into the .do platform and start your first backend experiment today.