An Agentix agent integration is a Python package with a function that runs inside the sandbox. For CLI agents, that function usually starts a subprocess, waits for it to finish, and returns a typed result. There is no base class, registry, or framework-specific adapter layer. The integration is useful because callers can import it and write:
from agentix.claude_code import run

result = await client.remote(run, instruction="fix the bug", workdir="/testbed")

Integration Contract

Keep the public surface narrow. A good first integration exposes one run function and one result type.
async def run(
    instruction: str,
    *,
    workdir: str = "/testbed",
    timeout: float = 600,
    env: dict[str, str] | None = None,
) -> RunResult:
    ...
The body runs in the sandbox, so it can call binaries, edit files, read the repository under /testbed, or invoke a local Python framework.
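The contract is only an async function plus a dataclass, so a pure-Python integration that calls an in-process framework has the same shape as a CLI wrapper. The sketch below assumes `_agent_step` is a hypothetical stand-in for a real framework call. The `NOTES.md` file name is also chosen only for illustration.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunResult:
    exit_code: int
    stdout: str
    stderr: str


async def _agent_step(instruction: str, workdir: Path) -> str:
    # Hypothetical stand-in for an in-process agent framework:
    # it "edits" the repository by appending the instruction to NOTES.md.
    notes = workdir / "NOTES.md"
    with notes.open("a", encoding="utf-8") as fh:
        fh.write(instruction + "\n")
    return f"wrote {notes.name}"


async def run(
    instruction: str,
    *,
    workdir: str = "/testbed",
    timeout: float = 600,
    env: dict[str, str] | None = None,
) -> RunResult:
    # env is accepted to honor the contract; this in-process sketch does not use it.
    try:
        summary = await asyncio.wait_for(
            _agent_step(instruction, Path(workdir)), timeout=timeout
        )
    except asyncio.TimeoutError:
        return RunResult(exit_code=-1, stdout="", stderr=f"timed out after {timeout}s")
    return RunResult(exit_code=0, stdout=summary, stderr="")
```

Callers would still invoke this through `client.remote(run, ...)`. Nothing in the call site reveals whether the body shells out or stays in Python.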

1. Package Layout

agentix-claude-code/
├── pyproject.toml
├── default.nix
└── src/agentix/
    └── claude_code/
        └── __init__.py
Use default.nix when the agent needs a system binary that should be available on the worker PATH. Pure Python integrations can omit it.

2. Function Surface

src/agentix/claude_code/__init__.py
from __future__ import annotations

import asyncio
import os
from dataclasses import dataclass


@dataclass
class RunResult:
    exit_code: int
    stdout: str
    stderr: str


async def run(
    instruction: str,
    *,
    workdir: str = "/testbed",
    timeout: float = 600,
    model: str | None = None,
    max_turns: int | None = None,
    env: dict[str, str] | None = None,
) -> RunResult:
    # -p (short for --print) runs non-interactively; bypassPermissions skips approval prompts in the sandbox.
    cmd = ["claude", "-p", instruction, "--permission-mode", "bypassPermissions"]
    if model:
        cmd += ["--model", model]
    if max_turns is not None:
        cmd += ["--max-turns", str(max_turns)]

    proc = await asyncio.create_subprocess_exec(
        *cmd,
        cwd=workdir,
        env={**os.environ, **(env or {})},
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )

    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except TimeoutError:
        proc.kill()
        await proc.communicate()
        return RunResult(exit_code=-1, stdout="", stderr=f"timed out after {timeout}s")

    return RunResult(
        exit_code=proc.returncode or 0,
        stdout=stdout.decode(errors="replace"),
        stderr=stderr.decode(errors="replace"),
    )
The result type should describe the sandbox-side command, not the whole rollout. Patch extraction, scoring, and trace assembly usually belong in the caller that composes several remote calls.

3. Packaging

pyproject.toml
[project]
name = "agentix-claude-code"
version = "0.1.0"
requires-python = ">=3.11"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/agentix"]
After installation, callers can import the function normally:
from agentix.claude_code import run
When a bundle includes this package, the sandbox worker can import the same module and execute run.
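You can confirm that an installed package resolves the way the worker will import it by looking the module up and checking for its entry point. The `has_entrypoint` function below is a hypothetical helper, not part of Agentix. It uses only `importlib`.

```python
import importlib
import importlib.util


def has_entrypoint(module_name: str, attr: str = "run") -> bool:
    """Return True if module_name imports and exposes a callable attr."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return False  # a parent package (for example agentix) is not installed
    if spec is None:
        return False  # the parent exists but the submodule does not
    module = importlib.import_module(module_name)
    return callable(getattr(module, attr, None))
```

For example, `has_entrypoint("agentix.claude_code")` should be True in an environment where the wheel above is installed.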

4. System Binary

Many agent CLIs are easiest to ship with Nix. The derivation should copy the executable into its output's bin/ directory ($out/bin). During agentix build, Agentix links those binaries under /nix/runtime/bin in the final image. See the agentix-cookbook/claude-code recipe for a complete working package.
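If the Nix package is missing from a bundle, the subprocess call in run fails with a bare FileNotFoundError. The sketch below shows a hypothetical preflight check, `require_binary`, that turns that failure into an actionable message. It uses `shutil.which` against the worker's PATH.

```python
import shutil


def require_binary(name: str) -> str:
    """Return the absolute path of `name` on PATH, or raise with a hint."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name!r} is not on PATH; check that the bundle includes "
            "the Nix package that provides it"
        )
    return path
```

Calling `require_binary("claude")` at the top of run surfaces a packaging mistake before any subprocess is started.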

5. Caller Composition

Keep generic rollout operations outside the agent wrapper. For example, extracting a patch is a shell operation that works for any agent that edited files:
from agentix.bash import run as bash_run

diff = await client.remote(
    bash_run,
    command="cd /testbed && git add -A && git diff --cached --no-color",
)
patch = diff.stdout
The caller decides how to compose repo setup, the agent run, patch extraction, scoring, and logging.
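The following sketch shows one way to write that caller. `rollout` and `Rollout` are hypothetical names. The agent and bash functions are passed in as parameters only so the composition can be exercised without a sandbox. In real code you would pass the imported `run` and `bash_run` together with an Agentix client. Repo setup and scoring are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rollout:
    agent_exit_code: int
    patch: str


async def rollout(
    client: Any,
    agent_run: Callable[..., Any],
    bash_run: Callable[..., Any],
    instruction: str,
    *,
    env: dict[str, str] | None = None,
    timeout: float = 600,
) -> Rollout:
    # 1. Run the agent in the sandbox; secrets travel per call via env.
    result = await client.remote(
        agent_run, instruction=instruction, workdir="/testbed", timeout=timeout, env=env
    )
    # 2. Extract the patch with a generic shell primitive, independent of the agent.
    diff = await client.remote(
        bash_run, command="cd /testbed && git add -A && git diff --cached --no-color"
    )
    # 3. Scoring and logging would consume this record in the caller.
    return Rollout(agent_exit_code=result.exit_code, patch=diff.stdout)
```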

Streaming Output

For long-running agents, expose a streaming function next to run. Annotate it as AsyncIterator[T]; Agentix automatically uses the stream transport.
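import asyncio
import json
import os
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Literal


@dataclass
class TokenChunk:
    text: str
    type: Literal["token"] = "token"


@dataclass
class RunDone:
    result: RunResult  # the RunResult dataclass defined next to run
    type: Literal["done"] = "done"


async def run_stream(
    instruction: str,
    *,
    workdir: str = "/testbed",
    env: dict[str, str] | None = None,
) -> AsyncIterator[TokenChunk | RunDone]:
    # In print mode, stream-json emits one JSON event per line and requires --verbose.
    proc = await asyncio.create_subprocess_exec(
        "claude", "-p", instruction, "--output-format", "stream-json", "--verbose",
        cwd=workdir,
        env={**os.environ, **(env or {})},
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,  # stderr is not drained here, so don't pipe it
        limit=16 * 1024 * 1024,  # one event line can exceed the 64 KiB default
    )

    final_text = ""
    assert proc.stdout is not None
    async for raw in proc.stdout:
        if not raw.strip():
            continue
        event = json.loads(raw)
        if event["type"] == "assistant":
            # Yield each text block of an assistant message as it arrives.
            for block in event["message"]["content"]:
                if block.get("type") == "text":
                    yield TokenChunk(text=block["text"])
        elif event["type"] == "result":
            final_text = event.get("result", "")  # the final answer text

    await proc.wait()
    yield RunDone(result=RunResult(exit_code=proc.returncode or 0, stdout=final_text, stderr=""))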
Callers iterate over the remote result:
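from agentix.claude_code import RunDone, TokenChunk, run_stream


async def fix_bug(client):
    # Consume the stream with async for; return needs an enclosing function.
    async for event in client.remote(run_stream, instruction="fix the bug"):
        match event:
            case TokenChunk(text=text):
                print(text, end="", flush=True)
            case RunDone(result=result):
                return result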
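Because the return annotation is what selects the stream transport, it needs to resolve at runtime to `AsyncIterator[...]`. The sketch below shows the kind of check a dispatcher could run. `returns_stream` is hypothetical and is not Agentix's actual dispatch code. It uses only `typing.get_type_hints` and `typing.get_origin`.

```python
import collections.abc
import typing


def returns_stream(fn: typing.Callable[..., typing.Any]) -> bool:
    """True if fn's return annotation is AsyncIterator[...] (hypothetical rule)."""
    # get_type_hints also resolves string annotations created by
    # `from __future__ import annotations`, using the function's module globals.
    ret = typing.get_type_hints(fn).get("return")
    # Both collections.abc.AsyncIterator[T] and typing.AsyncIterator[T]
    # report collections.abc.AsyncIterator as their origin.
    return typing.get_origin(ret) is collections.abc.AsyncIterator
```

Under a rule like this, an async generator with no return annotation, or one annotated with a bare `AsyncIterator`, would not be detected as a stream.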

Boundaries

  • Pass secrets per call with env, for example env={"ANTHROPIC_API_KEY": "..."}.
  • Use a stable default workdir such as /testbed for benchmark repos.
  • Bound the timeout aggressively; agent CLIs can hang.
  • Keep scoring in a scorer module. Keep repo setup in a primitive such as agentix.bash.
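run enforces its own timeout, but run_stream as written does not. A caller consuming a stream may therefore want an overall deadline. The sketch below is a hypothetical `collect_with_deadline` helper built on `asyncio.wait_for`. It keeps whatever events arrived before the deadline and reports whether the deadline fired.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import TypeVar

T = TypeVar("T")


async def collect_with_deadline(
    stream: AsyncIterator[T], seconds: float
) -> tuple[list[T], bool]:
    """Drain stream until it ends or `seconds` elapse; return (events, timed_out)."""
    events: list[T] = []

    async def drain() -> None:
        async for event in stream:
            events.append(event)

    try:
        await asyncio.wait_for(drain(), timeout=seconds)
    except asyncio.TimeoutError:
        return events, True  # events received before the deadline are kept
    return events, False
```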