alpha · local
Six clones live: Slack, Stripe, Notion, HubSpot, GitHub, Linear

Break your agent's integrations
before your customers do.

Full, running clones of the SaaS tools your agent acts in. Reproduce any bug as a clone id, reset in seconds, and ship without the release-day fire drill.

~10s · cold spin to a live API
1 command · reset to a byte-for-byte identical state
6 · SaaS clones live today
100% · of the agent's work lands as rows you own

The loop

Five beats. Each one command.

Every asymmetric session is the same shape — spin a clone, seed it to the exact state your test needs, point your agent at it, read the result straight from the database, then reset for the next run.

1 · Spin: asymmetric spin slack

Creates a clone, allocates ports, runs migrations, and waits until healthy. You get a live API endpoint back.
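The "waits until healthy" step is a poll-until-ready loop. Below is a minimal TypeScript sketch of that pattern, not asymmetric's implementation; `probe`, `waitUntilHealthy`, and the default timings are hypothetical.

```typescript
// Hypothetical sketch of a "wait until healthy" loop; not asymmetric's actual code.
// `probe` is an assumed stand-in for the real check (for example an HTTP GET on a health route).
type Probe = () => Promise<boolean>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls `probe` every `intervalMs` until it returns true or `timeoutMs` elapses.
// Resolves with the number of attempts it took; rejects on timeout.
async function waitUntilHealthy(
  probe: Probe,
  timeoutMs = 30_000,
  intervalMs = 500,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (Date.now() < deadline) {
    attempts += 1;
    try {
      if (await probe()) return attempts;
    } catch {
      // A refused connection during startup just means "not up yet": keep polling.
    }
    await sleep(intervalMs);
  }
  throw new Error(`clone not healthy after ${timeoutMs} ms (${attempts} attempts)`);
}
```

With a real clone, the probe could be something like `async () => (await fetch(healthUrl)).ok`, where `healthUrl` is whatever health route the clone exposes (an assumption; no route is documented here). Swallowing probe errors matters because a container that is still booting usually refuses connections rather than answering "unhealthy".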

2 · Seed: asymmetric seed

Load a deterministic fixture for repeatable evals, or generate realistic data with an LLM. Either leaves the clone in a known starting state.
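One way to make generated data land in a known starting state is to derive it from a recorded seed, so the same seed always yields the same rows. The sketch below uses mulberry32, a small, widely used 32-bit PRNG, as a stand-in for whatever generator asymmetric actually uses; `generateFixture` and the fixture shape are hypothetical, with default counts mirroring the acme-corp fixture (6 channels, 142 messages, 18 users).

```typescript
// Hypothetical sketch: expand a recorded seed into fixture rows deterministically.
// mulberry32 is a small 32-bit PRNG; any seedable generator gives the same guarantee.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform float in [0, 1)
  };
}

interface Message { channel: string; user: string; text: string }
interface Fixture { channels: string[]; users: string[]; messages: Message[] }

function generateFixture(
  seed: number,
  counts = { channels: 6, users: 18, messages: 142 },
): Fixture {
  const rand = mulberry32(seed);
  // Picks one element with the seeded generator, never Math.random(), so output is repeatable.
  function pick<T>(items: T[]): T {
    return items[Math.floor(rand() * items.length)];
  }
  const channels = Array.from({ length: counts.channels }, (_, i) => `channel-${i + 1}`);
  const users = Array.from({ length: counts.users }, (_, i) => `user-${i + 1}`);
  const messages = Array.from({ length: counts.messages }, (_, i) => ({
    channel: pick(channels),
    user: pick(users),
    text: `message ${i + 1}`,
  }));
  return { channels, users, messages };
}
```

Recording the seed alongside the clone is what makes "known starting state" checkable: rerun the generator with the same seed and you get the same rows.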

3 · Run your agent → HTTP API / MCP

Point your agent at the clone's HTTP API or MCP server. Every action lands in the clone's own database.
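From the agent's side, the clone is just an HTTP endpoint. A minimal sketch of one agent tool, assuming a Slack-style `chat.postMessage` route and bearer-token auth (both hypothetical for the clone); `makeCloneClient` and `FetchLike` are illustrative names, and `fetch` is injected so the tool can be tested without a live clone.

```typescript
// Hypothetical sketch of an agent tool that calls the clone's HTTP API.
// The route and the bearer-token header are assumptions modeled on Slack's Web API
// (chat.postMessage); check the clone's own routes before relying on them.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function makeCloneClient(baseUrl: string, token: string, fetchImpl: FetchLike) {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    // Posts a message; the clone writes it to its own database, where a later query sees it.
    async postMessage(channel: string, text: string): Promise<unknown> {
      const res = await fetchImpl(`${root}/api/chat.postMessage`, {
        method: "POST",
        headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
        body: JSON.stringify({ channel, text }),
      });
      if (!res.ok) throw new Error(`clone returned HTTP ${res.status}`);
      return res.json();
    },
  };
}
```

Against a clone on http://127.0.0.1:3001, Node 18+'s global `fetch` should satisfy `FetchLike`: `makeCloneClient("http://127.0.0.1:3001", token, fetch)`.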

4 · Inspect: asymmetric query · logs

Read the database directly with query to score exactly what the agent did. Stream logs to debug.
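Because the result is rows, a grader can be an ordinary function over query output. A minimal sketch, assuming the rows the agent wrote have already been selected; the SQL in the comment, its column names, and `gradeRun` are hypothetical, not the clone's documented schema.

```typescript
// Hypothetical grader over rows returned by a query such as:
//   select channel, text from messages where user_id = '<agent user id>'
// (table and column names are assumptions, not the clone's documented schema).
interface Row { channel: string; text: string }
interface Expectation { channel: string; contains: string }
interface Score { met: number; total: number; score: number; missing: Expectation[] }

// An expectation is met if any agent-written row is in the right channel with the right text.
function gradeRun(rows: Row[], expectations: Expectation[]): Score {
  const missing = expectations.filter(
    (e) => !rows.some((r) => r.channel === e.channel && r.text.includes(e.contains)),
  );
  const met = expectations.length - missing.length;
  return {
    met,
    total: expectations.length,
    score: expectations.length === 0 ? 1 : met / expectations.length,
    missing,
  };
}
```

Returning `missing` alongside the score keeps grading objective and debuggable: a failed run points at the exact row the agent never wrote.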

5 · Reset / destroy: asymmetric reset · destroy

reset drops and re-seeds to the identical starting state for the next trial. destroy tears it down and frees the ports.
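Put together, reset gives every trial a clean world and destroy releases the ports at the end. A minimal sketch of that trial loop, assuming a hypothetical `Lifecycle` interface whose two methods mirror the reset and destroy commands; `runTrials` is illustrative.

```typescript
// Hypothetical sketch of a multi-trial eval loop; the provider shape is an assumption.
interface Lifecycle {
  reset(cloneId: string): Promise<void>;   // back to the identical seeded state
  destroy(cloneId: string): Promise<void>; // tear down and free the ports
}

// Runs `trials` independent trials. Each trial starts from a fresh reset, so no run sees
// state left behind by the previous one; the clone is destroyed even if a trial throws.
async function runTrials(
  provider: Lifecycle,
  cloneId: string,
  trials: number,
  runOnce: (trial: number) => Promise<number>,
): Promise<number[]> {
  const scores: number[] = [];
  try {
    for (let t = 0; t < trials; t++) {
      await provider.reset(cloneId);
      scores.push(await runOnce(t));
    }
  } finally {
    await provider.destroy(cloneId);
  }
  return scores;
}
```

Resetting before every trial, including the first, costs one extra reset but means no trial depends on what ran before it.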

eval-run.sh
```shell
# 1 · spin a fresh clone
$ asymmetric spin slack
slack-a1b2 · http://127.0.0.1:3001

# 2 · seed deterministic fixture
$ asymmetric seed slack-a1b2 --fixture acme-corp
6 channels · 142 messages · 18 users

# 3 · point your agent at http://127.0.0.1:3001

# 4 · score straight from the db
$ asymmetric query slack-a1b2 \
    "select count(*) from messages"
 count
───────
   147

# 5 · reset for the next trial
$ asymmetric reset slack-a1b2
```

The thesis

Why a clone, not a dummy workspace

A dummy workspace you maintain by hand can't be reset, shared, or reproduced. A pile of mocks is never faithful enough to trust. asymmetric is the middle ground: real enough that your agent can't tell the difference, disposable enough to throw away.

High fidelity, not stubs

Real NestJS backends on real Postgres with real JWT auth. Stateful, referentially consistent, returning authentic errors — not canned responses.

Reproducible to the row

Every clone has a stable id, its own database, and a recorded seed. Hand a teammate the id to reproduce a bug exactly — no "which channels, which users, what config." reset rebuilds it byte-for-byte.

Scorable

The agent's work is just rows in a database you own. Read them with query to grade a run objectively — not by scraping output.

Local-first

Everything runs through Docker on your machine. No cloud account, no external dependencies. A connected control-plane mode is planned.

The mental model

Two nouns, two altitudes

A clone is the unit you build and compose. An environment is the unit you point an agent at and score.

The unit you build

Clone

One running instance of a SaaS product: a real backend, its own database, real auth. The unit you build and compose.

The unit you score

Environment

A named composition of clones on shared infrastructure, defined declaratively so you can version-control your agent's target. Environments are the primitive: as their entropy rises, agent behavior shifts, and data on that shift is something almost no one has yet.

acme.env.yaml
```yaml
# acme.env.yaml
name: acme
clones:
  - template: slack
    mode: api
    seed: acme-corp
```

```shell
# spin the whole environment
$ asymmetric env spin acme.env.yaml
```
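Because the environment is declarative, it can be validated before anything spins. A hypothetical TypeScript view of the acme.env.yaml example: the field names (`name`, `clones`, `template`, `mode`, `seed`) come from that file, but the validation rules and `validateEnvSpec` are assumptions. It checks an already-parsed document, so any YAML parser (for example the `parse` function of the `yaml` npm package) can sit in front of it.

```typescript
// Hypothetical shape of an environment file; validation rules are assumptions.
interface CloneSpec { template: string; mode: string; seed?: string }
interface EnvSpec { name: string; clones: CloneSpec[] }

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Checks the parsed document's shape and reports the first problem with its path.
function validateEnvSpec(doc: unknown): EnvSpec {
  if (!isRecord(doc)) throw new Error("env file must be a mapping");
  const { name, clones: rawClones } = doc;
  if (typeof name !== "string" || name === "") throw new Error("name: required string");
  if (!Array.isArray(rawClones) || rawClones.length === 0) {
    throw new Error("clones: required non-empty list");
  }
  const clones = rawClones.map((c: unknown, i: number): CloneSpec => {
    if (!isRecord(c)) throw new Error(`clones[${i}]: must be a mapping`);
    const { template, mode, seed } = c;
    if (typeof template !== "string") throw new Error(`clones[${i}].template: required string`);
    if (typeof mode !== "string") throw new Error(`clones[${i}].mode: required string`);
    if (seed !== undefined && typeof seed !== "string") {
      throw new Error(`clones[${i}].seed: must be a string`);
    }
    return { template, mode, seed: typeof seed === "string" ? seed : undefined };
  });
  return { name, clones };
}
```

Error messages carry a path like `clones[0].template`, which is what makes a version-controlled environment file easy to fix in review.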
Under the hood

How it works

Same commands drive local Docker today and a remote control plane later. Swap the engine, keep the loop.

asymmetric

The CLI

The spin → seed → inspect → reset toolchain you drive. One binary, the whole loop.

@repo/clone-contract

The contract

The shared CloneProvider interface, types, and named errors. The CLI depends on the contract — not on any clone.
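The point of the contract is that the CLI codes against an interface, so a test fake or a future CloudProvider can slot in. A hypothetical sketch of what such a contract could look like: the methods here simply mirror the CLI verbs, and `CloneHandle`, `CloneNotFoundError`, `resetCommand`, and `InMemoryProvider` are illustrative names, not exports of @repo/clone-contract.

```typescript
// Hypothetical provider contract in the spirit of @repo/clone-contract; names are illustrative.
interface CloneHandle { id: string; template: string; url: string }

interface CloneProvider {
  spin(template: string): Promise<CloneHandle>;
  seed(id: string, fixture: string): Promise<void>;
  query(id: string, sql: string): Promise<Record<string, unknown>[]>;
  reset(id: string): Promise<void>;
  destroy(id: string): Promise<void>;
}

// Named errors let the CLI map failures to messages without knowing which provider threw.
class CloneNotFoundError extends Error {
  readonly cloneId: string;
  constructor(cloneId: string) {
    super(`no clone with id ${cloneId}`);
    this.name = "CloneNotFoundError";
    this.cloneId = cloneId;
  }
}

// CLI-side command: depends only on the interface, never on a concrete provider.
async function resetCommand(provider: CloneProvider, id: string): Promise<string> {
  try {
    await provider.reset(id);
    return `reset ${id}`;
  } catch (err) {
    if (err instanceof CloneNotFoundError) return `error: ${err.message}`;
    throw err;
  }
}

// A toy in-memory provider: the kind of fake the contract makes possible in tests.
class InMemoryProvider implements CloneProvider {
  private clones = new Map<string, CloneHandle>();
  private next = 1;
  async spin(template: string): Promise<CloneHandle> {
    const id = `${template}-${this.next++}`;
    const handle = { id, template, url: `memory://${id}` };
    this.clones.set(id, handle);
    return handle;
  }
  async seed(id: string): Promise<void> { this.require(id); }
  async query(id: string): Promise<Record<string, unknown>[]> { this.require(id); return []; }
  async reset(id: string): Promise<void> { this.require(id); }
  async destroy(id: string): Promise<void> { this.require(id); this.clones.delete(id); }
  private require(id: string): CloneHandle {
    const c = this.clones.get(id);
    if (!c) throw new CloneNotFoundError(id);
    return c;
  }
}
```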

LocalDockerProvider

Providers

LocalDockerProvider today (shells to docker compose). CloudProvider planned. Swap the engine, keep the commands.
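A Docker-backed provider mostly reduces to building the right `docker compose` argument lists. A hypothetical sketch: the `asym-<id>` project-name scheme, the compose file path, and the `CLONE_ID` / `API_PORT` variables are assumptions, while `-p`, `-f`, `up -d --wait`, and `down` are standard Docker Compose v2 options.

```typescript
// Hypothetical invocation builder for a Docker-backed provider; naming scheme is assumed.
interface ComposeInvocation { cmd: string; args: string[]; env: Record<string, string> }

function composeBase(cloneId: string, composeFile: string): string[] {
  // One Compose project per clone keeps container and network names from colliding.
  return ["compose", "-p", `asym-${cloneId}`, "-f", composeFile];
}

function upInvocation(cloneId: string, composeFile: string, apiPort: number): ComposeInvocation {
  return {
    cmd: "docker",
    // --wait blocks until services are running/healthy (and implies detached mode).
    args: [...composeBase(cloneId, composeFile), "up", "-d", "--wait"],
    // Compose interpolates ${VAR} references in the file from the process environment.
    env: { CLONE_ID: cloneId, API_PORT: String(apiPort) },
  };
}

function downInvocation(cloneId: string, composeFile: string): ComposeInvocation {
  return { cmd: "docker", args: [...composeBase(cloneId, composeFile), "down"], env: {} };
}
```

Keeping argument construction pure makes it testable; actually running it would go through something like Node's `child_process.execFile(inv.cmd, inv.args, { env: { ...process.env, ...inv.env } })`, which avoids shell quoting entirely.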

asym-shared

Shared infra

One shared Postgres and one shared Redis run on an asym-shared network; each clone is isolated by its own database (clone_<id>) and its own Redis key prefix. A whole fleet stays cheap to run while every clone stays fully separate.
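Isolation on shared infrastructure comes down to naming. A hypothetical sketch: the clone_<id> database name is from the text above, while the id format check, the Redis prefix format, and the function names are assumptions.

```typescript
// Hypothetical per-clone naming on shared Postgres and Redis; formats beyond clone_<id> are assumed.
const CLONE_ID = /^[a-z0-9][a-z0-9-]*$/;

function assertCloneId(id: string): void {
  if (!CLONE_ID.test(id)) throw new Error(`invalid clone id: ${id}`);
}

// Database name on the shared Postgres server.
function databaseName(id: string): string {
  assertCloneId(id);
  return `clone_${id}`;
}

// Ids contain "-", which is not legal in an unquoted Postgres identifier, so quote it
// (doubling any embedded quotes, defensively, even though the id check already forbids them).
function createDatabaseSql(id: string): string {
  return `CREATE DATABASE "${databaseName(id).replace(/"/g, '""')}"`;
}

// Every Redis key a clone touches gets its prefix, so clones never read each other's keys.
function redisKey(id: string, key: string): string {
  assertCloneId(id);
  return `clone_${id}:${key}`;
}
```

Postgres truncates identifiers longer than 63 bytes, so a real implementation would also cap the id length.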

Where we're going

The sneak peek

We're honest about what's still on the roadmap, but this is the direction: a compounding flywheel in which every run makes the next agent better.

Multi-clone environments (Roadmap)

Several real products sharing one identity graph — the same user, org, and account consistent across Slack, Jira, GitHub… so an agent operates a whole connected workspace, not one app. This shared cross-clone identity is the moat.
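At its core, a shared identity graph is a two-way mapping between one canonical person and that person's account in each clone. A toy sketch of that data structure, assuming string ids throughout; since this is roadmap, `IdentityGraph` and its methods are purely illustrative.

```typescript
// Hypothetical cross-clone identity graph (a roadmap idea, not a shipped API).
type CloneName = string; // e.g. "slack", "github"

class IdentityGraph {
  private accounts = new Map<string, Map<CloneName, string>>(); // personId -> clone -> accountId
  private owners = new Map<string, string>(); // "clone:accountId" -> personId

  // Links a person to their account in one clone; an account can belong to only one person.
  link(personId: string, clone: CloneName, accountId: string): void {
    const ownerKey = `${clone}:${accountId}`;
    const existing = this.owners.get(ownerKey);
    if (existing !== undefined && existing !== personId) {
      throw new Error(`${ownerKey} already belongs to ${existing}`);
    }
    const perClone = this.accounts.get(personId) ?? new Map<CloneName, string>();
    const previous = perClone.get(clone);
    if (previous !== undefined) this.owners.delete(`${clone}:${previous}`); // drop stale link
    perClone.set(clone, accountId);
    this.accounts.set(personId, perClone);
    this.owners.set(ownerKey, personId);
  }

  // Translates an account in one clone into the same person's account in another.
  translate(fromClone: CloneName, accountId: string, toClone: CloneName): string | undefined {
    const personId = this.owners.get(`${fromClone}:${accountId}`);
    return personId === undefined ? undefined : this.accounts.get(personId)?.get(toClone);
  }
}
```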

Snapshot / restore the whole environment (Roadmap)

Treating the whole environment as a transaction gives the real eval and RL primitive: snapshot → rollout → restore. Branch a world, run an agent, roll it back instantly.
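The snapshot → rollout → restore cycle maps naturally onto a try/finally. A hypothetical sketch: `Snapshottable`, `snapshot`, `restore`, and `withSnapshot` are assumed names, since this is a roadmap feature with no published API.

```typescript
// Hypothetical snapshot -> rollout -> restore helper (roadmap feature; names are assumptions).
interface Snapshottable {
  snapshot(): Promise<string>;                // returns an opaque snapshot id
  restore(snapshotId: string): Promise<void>; // puts the world back exactly as captured
}

// Runs one rollout against the current world and always rolls the world back afterwards,
// even if the rollout throws, so the next rollout branches from the same state.
async function withSnapshot<T>(env: Snapshottable, rollout: () => Promise<T>): Promise<T> {
  const snap = await env.snapshot();
  try {
    return await rollout();
  } finally {
    await env.restore(snap);
  }
}
```

The `finally` is the transactional part: a crashed or misbehaving agent cannot leak its writes into the next rollout.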

The trace / rollout store (Roadmap)

Record every agent trajectory against an environment. It becomes your eval dataset, and later your RL training data. Every run makes the next agent better.
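One simple shape for such a store is JSON Lines, one trajectory per line. A hypothetical sketch of the record and its round trip: `Trajectory`, `Step`, `toJsonl`, and `fromJsonl` are illustrative, and the fields are chosen so a record carries enough (environment, clone ids, seed) to replay its starting state.

```typescript
// Hypothetical trace-store record (roadmap feature): one JSON object per line.
interface Step { action: string; input: unknown; output: unknown }
interface Trajectory {
  environment: string; // e.g. "acme"
  cloneIds: string[];  // which clones the run touched
  seed: string;        // starting state, so the run can be replayed
  steps: Step[];
  score?: number;      // filled in after grading
}

// JSON.stringify escapes newlines inside strings, so each record always fits on one line.
function toJsonl(trajectories: Trajectory[]): string {
  return trajectories.map((t) => JSON.stringify(t)).join("\n") + "\n";
}

function fromJsonl(text: string): Trajectory[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Trajectory);
}
```

An append-only, line-per-record format lets the same file feed an eval dashboard today and a training job later without a schema migration.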

Slack · Stripe · Notion · HubSpot · GitHub · Linear · Jira (soon) · Salesforce (soon)

Spin your first clone.

A real environment for your agent in one command. Reproduce bugs as a clone id, reset to a known state, and ship without the release-day fire drill.