Ship agents like software. Run them like experiments.

Agentix turns any importable Python function into a sandboxed rollout primitive. Your trainer, evaluator, or orchestration script passes a function object. Running agentix build packages that function's code and its dependencies into a bundle image; at run time, Agentix overlays that bundle onto a task image, runs the function inside the sandbox, and returns the typed result.
cd examples/hello-agentix
uv sync
uv run agentix build . --name hello-agentix  # builds hello-agentix:0.1.0
uv run python run.py
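import asyncio

from agentix import RuntimeClient, SandboxConfig, session
from agentix.bash import run
from agentix.deployment.docker import DockerDeployment

config = SandboxConfig(
    image="python:3.13-slim",      # task image the bundle is overlaid onto
    bundle="hello-agentix:0.1.0",  # bundle image built by `agentix build`
)

async def main() -> None:
    # Start the sandbox, connect to its runtime, and call run() inside it.
    async with session(DockerDeployment(), config) as sandbox:
        async with RuntimeClient(sandbox.runtime_url) as client:
            result = await client.remote(run, command="echo hello from $(uname -a)")
            print(result.stdout)

# `async with` is only valid inside a coroutine, so drive it with asyncio.run.
asyncio.run(main())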
The unit of composition is not a custom runner. It is a function.

The Core Idea

Agentix keeps the execution layer deliberately small:
  • Remote calls run an installed function inside a sandbox worker. client.remote(fn, ...) derives the target from fn.__module__ and fn.__qualname__, ships args/kwargs as one pickle blob, and imports the function inside the worker.
  • Bundles package one Python project and its declared dependencies into a deploy-ready image. Agents, tools, benchmark scorers, and user code are just packages in the same runtime venv.
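
The "importable by name" rule above can be illustrated with the standard library alone. The following is a minimal sketch, not Agentix's implementation: the `module:qualname` target format and the helper names `target_for`, `encode_call`, `resolve`, and `is_importable` are assumptions made for illustration. It shows why a module-level function round-trips while a lambda does not.

```python
import importlib
import pickle
from typing import Any, Callable


def target_for(fn: Callable[..., Any]) -> str:
    """Name a function by module and qualname (the ':' separator is an assumption)."""
    return f"{fn.__module__}:{fn.__qualname__}"


def encode_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[str, bytes]:
    """Host side: pair the target string with one pickle blob holding args and kwargs."""
    return target_for(fn), pickle.dumps((args, kwargs))


def resolve(target: str) -> Callable[..., Any]:
    """Worker side: import the module, then walk the dotted qualname."""
    module_name, _, qualname = target.partition(":")
    obj: Any = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


def is_importable(fn: Callable[..., Any]) -> bool:
    """True if the worker could find this exact function by name."""
    try:
        return resolve(target_for(fn)) is fn
    except (ImportError, AttributeError):
        return False


if __name__ == "__main__":
    import json

    target, blob = encode_call(json.dumps, {"b": 2, "a": 1}, sort_keys=True)
    args, kwargs = pickle.loads(blob)
    print(target)                            # json:dumps
    print(resolve(target)(*args, **kwargs))  # {"a": 1, "b": 2}
    print(is_importable(lambda x: x))        # False: "<lambda>" is not a module attribute
```

Under these assumptions, anything you want to call remotely should be a named function or method reachable from its module, which is also what makes it installable in a bundle.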

Remote Calls

Call sandbox code with ordinary Python functions instead of bespoke RPC clients.

Bundles

Build a reproducible bundle from pyproject.toml dependencies.

Deployments

Run the same bundle through local Docker or a deployment backend plugin.

Integrations

Wrap an agent CLI, benchmark harness, or internal tool as a narrow Python API.

Why This Matters

Agent work tends to sprawl: each agent CLI, benchmark harness, sandbox provider, and training loop grows its own adapter. Agentix collapses that matrix into one rule: if it is installed in the bundle and importable by Python, the host can call it.
| You have | You expose | You call |
| --- | --- | --- |
| Claude Code, Codex, Aider, OpenHands, or an internal agent | async def run(...) -> RunResult | await client.remote(run, ...) |
| Bash, file operations, repo setup, or local tools | async def run(command: str) -> BashResult | await client.remote(bash_run, ...) |
| SWE-bench, MLE-Bench, or an internal evaluator | async def score(...) -> Score | await client.remote(score, ...) |
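
To make the "You expose" column concrete, here is a hedged sketch of a narrow shell wrapper with the `async def run(command: str) -> BashResult` shape. It is not the source of agentix.bash: the `BashResult` fields other than `stdout`, and the `timeout` parameter, are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class BashResult:
    """Hypothetical result type; the real agentix.bash result may differ."""
    exit_code: int
    stdout: str
    stderr: str


async def run(command: str, timeout: float = 60.0) -> BashResult:
    """Run a shell command and return its output as plain, picklable data."""
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Do not leave the child running if the caller gave up on it.
        proc.kill()
        await proc.wait()
        raise
    return BashResult(proc.returncode, out.decode(), err.decode())


if __name__ == "__main__":
    print(asyncio.run(run("echo hello")))
```

Returning a small dataclass rather than a process handle keeps the result easy to serialize back to the caller, which is the point of exposing a narrow API.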

Where It Sits

Trainer / evaluator / script
  imports a function
  calls RuntimeClient.remote(fn, *args, **kwargs)
        |
        v
Sandbox bundle
  imports the same module
  runs the function in a worker process
  returns the result
The same pattern works for a toy function, a CLI-driven coding agent, a benchmark scorer, or a full RL rollout step.
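
The round trip in the diagram can be imitated locally with a throwaway Python subprocess standing in for the sandbox worker. This is a sketch under stated assumptions, not Agentix's transport: the stdin/stdout pickle protocol and the name `call_in_worker` are invented for illustration, and the function must not print to stdout or it would corrupt the reply.

```python
import pickle
import subprocess
import sys
from typing import Any, Callable

# Code the worker process runs: read one pickled request from stdin,
# import the named function, call it, and write the pickled result to stdout.
WORKER = """
import importlib, pickle, sys
module_name, qualname, args, kwargs = pickle.load(sys.stdin.buffer)
obj = importlib.import_module(module_name)
for part in qualname.split("."):
    obj = getattr(obj, part)
sys.stdout.buffer.write(pickle.dumps(obj(*args, **kwargs)))
"""


def call_in_worker(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run fn in a fresh Python process and return its unpickled result."""
    request = pickle.dumps((fn.__module__, fn.__qualname__, args, kwargs))
    done = subprocess.run(
        [sys.executable, "-c", WORKER],
        input=request,
        capture_output=True,
        check=True,  # raise CalledProcessError if the worker crashed
    )
    return pickle.loads(done.stdout)


if __name__ == "__main__":
    import statistics

    # The host only names the function; the worker imports and runs it.
    print(call_in_worker(statistics.mean, [1, 2, 3]))
```

The host and the worker share nothing but the module name, the qualified name, and the pickled arguments, so the function has to be installed on both sides.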

Example Rollout

Start with the complete runnable demo in agentix-cookbook/examples/hello-agentix:
uv run agentix build . --name hello-agentix
uv run python run.py
The SWE-bench-shaped flow below goes one step further and composes three independent packages in one sandbox: a shell primitive prepares the repo, an agent wrapper edits it, and a scorer wrapper grades the patch.
from datasets import load_dataset
from agentix import RuntimeClient
from agentix.bash import run as bash_run
from agentix.claude_code import run as run_claude
from agentix.swebench import score

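# First SWE-bench Verified instance as a plain dict.
inst = dict(load_dataset("princeton-nlp/SWE-bench_Verified", split="test")[0])

# Assumes `sandbox` comes from an enclosing `async with session(...)` block
# (as in the hello-agentix example) and `api_key` holds your Anthropic API key.
async with RuntimeClient(sandbox.runtime_url) as client:
    # 1. Shell primitive: clone the repo at the task's base commit.
    await client.remote(
        bash_run,
        command=(
            f"git clone https://github.com/{inst['repo']}.git /testbed && "
            f"cd /testbed && git checkout {inst['base_commit']}"
        ),
    )

    # 2. Agent wrapper: let Claude Code edit the checkout.
    await client.remote(
        run_claude,
        instruction=inst["problem_statement"],
        workdir="/testbed",
        env={"ANTHROPIC_API_KEY": api_key},
    )

    # 3. Shell primitive again: capture the agent's changes as a patch.
    diff = await client.remote(
        bash_run,
        command="cd /testbed && git add -A && git diff --cached --no-color",
    )
    # 4. Scorer wrapper: grade the patch against the instance.
    report = await client.remote(score, instance=inst, patch=diff.stdout)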
Agentix does not special-case Claude Code, SWE-bench, or bash. They are installed modules with functions the worker can import.
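
The clone step in the rollout interpolates dataset fields straight into a shell string. If you prefer to quote those values first, a small helper along these lines would do it. This is a sketch, and `checkout_command` is a hypothetical name, not part of Agentix.

```python
import shlex


def checkout_command(repo: str, base_commit: str, dest: str = "/testbed") -> str:
    """Build the clone-and-checkout command with every dynamic value shell-quoted."""
    url = f"https://github.com/{repo}.git"
    return (
        f"git clone {shlex.quote(url)} {shlex.quote(dest)} && "
        f"cd {shlex.quote(dest)} && git checkout {shlex.quote(base_commit)}"
    )


if __name__ == "__main__":
    # Ordinary values pass through unchanged; unusual ones get single-quoted.
    print(checkout_command("django/django", "abc123"))
```

The resulting string can be passed as the `command` argument of the first `client.remote(bash_run, ...)` call in place of the inline f-string.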

Quickstart

Build a tiny bundle, launch it in Docker, and call a remote function.

Remote Calls

Learn the target string, call shapes, and typing model.

Bundles

See how dependency declarations become bundles.

Architecture

Follow the request from client to server to worker and back.