Agent-Harness

Open-source test harness for AI agents that take real-world actions.

Why it exists

Publicly reported incidents include a long-running multi-agent loop with on the order of tens of thousands of dollars in API spend (Dev.to post-mortem, TowardsAI, Hacker News), a Replit AI workflow that deleted production records and added fabricated data with vendor acknowledgment (Business Insider, The Register, Fast Company, Replit), and an Amazon-internal coding-agent change that removed and recreated production cloud configuration and caused a multi-hour outage (Financial Times, AI Incident Database, Reddit r/aws). Those map to tool-use failures (unbounded loops, destructive writes, inherited operator access), not only bad text output. Agent-Harness is for automated checks over which tools ran, in what order, and with what arguments in test runs.

What it does

You write tests against a trace of tool calls. For example, you can check that:

One tool ran before another (assert_called_before).
A tool appears an exact number of times in order (assert_call_count).
Tool spans finish without error status where that applies (assert_completion).
Two conflicting tools are not both used in the same run (assert_mutual_exclusion).
A numeric argument stays at or below a bound (assert_arg_lte).
Calls to a sensitive tool include an approval-style field (assert_approval_gate).

Assertions can attach citation strings for documentation and reporting; they do not by themselves establish legal compliance.

Quickstart (Python)

from agentharness import (
    assert_approval_gate,
    assert_arg_lte,
    assert_called_before,
    scenario,
)


@scenario("examples/01_customer_support_langgraph/scenarios/happy_path.yaml")
def test_happy_path(run):
    assert_called_before(run.trace, "lookup_order", "issue_refund")
    assert_arg_lte(run.trace, tool="issue_refund", arg="amount", value=100)
    assert_approval_gate(run.trace, tool="issue_refund")

Install

From a clone of the repository (run at repo root):

pip install -e ".[langgraph,dev]"

An alpha is published on PyPI as pytest-agentharness; version pins and extras are listed in the repository README. Treat the project as early software: APIs and packaging can change between alphas.

Looking for collaborators

This is an alpha-stage, mostly solo-maintained project with a broad roadmap (more adapters, harder CI stories, optional compliance-oriented reporting). If you have shipped Python test tooling, agents, or safety-sensitive systems and want to contribute, start on GitHub (issues, docs, or code). There is no company behind this page and no user-count to show; interest is welcome, not promised.

github.com/Suirotciv/Agent-Harness

Why it exists

What it does

Quickstart (Python)

Install

Looking for collaborators

Links