Open-source test harness for AI agents that take real-world actions.
Publicly reported incidents include a long-running multi-agent loop that ran up API spend on the order of tens of thousands of dollars (Dev.to post-mortem, TowardsAI, Hacker News); a Replit AI workflow that deleted production records and added fabricated data, which the vendor acknowledged (Business Insider, The Register, Fast Company, Replit); and an Amazon-internal coding-agent change that removed and recreated production cloud configuration, causing a multi-hour outage (Financial Times, AI Incident Database, Reddit r/aws). These map to tool-use failures (unbounded loops, destructive writes, inherited operator access) rather than just bad text output. Agent-Harness runs automated checks in your test runs on which tools were called, in what order, and with what arguments.
You write tests against a trace of tool calls. For example, you can check ordering, call counts, completion, mutual exclusion, argument limits, and approval gates with these assertions:
- assert_called_before
- assert_call_count
- assert_completion
- assert_mutual_exclusion
- assert_arg_lte
- assert_approval_gate
Assertions can attach citation strings for documentation and reporting; they do not by themselves establish legal compliance.
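The assertion names above describe checks over an ordered trace. As a rough model of three of them, here is a minimal sketch that assumes a trace is a list of hypothetical `ToolCall` records (a tool name plus an argument dict). `ToolCall`, `check_called_before`, `check_call_count` and `check_mutual_exclusion` are illustrative stand-ins written for this page, not the agentharness API; the real assertions' signatures and edge-case behavior may differ.

```python
# Hypothetical sketch: NOT the agentharness API, just a model of trace checks.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One recorded tool invocation in an assumed, simplified trace format."""

    tool: str
    args: dict[str, Any] = field(default_factory=dict)


def check_called_before(trace: list[ToolCall], first: str, then: str) -> None:
    """Fail if `then` ran with no earlier call to `first`."""
    names = [call.tool for call in trace]
    if then not in names:
        return  # `then` never ran, so there is nothing to order
    if first not in names or names.index(first) > names.index(then):
        raise AssertionError(f"{then!r} ran before any call to {first!r}")


def check_call_count(trace: list[ToolCall], tool: str, max_calls: int) -> None:
    """Fail if `tool` ran more than `max_calls` times (e.g. a runaway loop)."""
    count = sum(1 for call in trace if call.tool == tool)
    if count > max_calls:
        raise AssertionError(f"{tool!r} ran {count} times (limit {max_calls})")


def check_mutual_exclusion(trace: list[ToolCall], tool_a: str, tool_b: str) -> None:
    """Fail if both tools appear in the same trace."""
    names = {call.tool for call in trace}
    if tool_a in names and tool_b in names:
        raise AssertionError(f"{tool_a!r} and {tool_b!r} both ran")


# Usage on a hand-built trace; every check below passes.
trace = [
    ToolCall("lookup_order", {"order_id": "A1"}),
    ToolCall("issue_refund", {"amount": 40}),
]
check_called_before(trace, "lookup_order", "issue_refund")
check_call_count(trace, "issue_refund", max_calls=1)
check_mutual_exclusion(trace, "issue_refund", "close_account")
```

One design choice worth noticing: this `check_called_before` treats a trace in which the later tool never ran as passing; a stricter variant could require both tools to appear.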
from agentharness import (
    assert_approval_gate,
    assert_arg_lte,
    assert_called_before,
    scenario,
)


@scenario("examples/01_customer_support_langgraph/scenarios/happy_path.yaml")
def test_happy_path(run):
    # The refund must come after the order lookup.
    assert_called_before(run.trace, "lookup_order", "issue_refund")
    # No refund call may pass an amount above 100.
    assert_arg_lte(run.trace, tool="issue_refund", arg="amount", value=100)
    # Refunds must go through an approval step.
    assert_approval_gate(run.trace, tool="issue_refund")
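For intuition about the two checks on `issue_refund` in the test above, here is a standalone sketch under a similar assumption: a trace is a list of hypothetical `ToolCall` records, and an approval shows up as a separate call to a hypothetical `request_approval` tool whose `tool` argument names the gated tool. How Agent-Harness actually records approvals is not described on this page, so treat `check_arg_lte` and `check_approval_gate` as a model of the idea, not of the library.

```python
# Hypothetical sketch: NOT the agentharness API, just a model of the idea.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

APPROVAL_TOOL = "request_approval"  # assumed name of an approval step in the trace


@dataclass
class ToolCall:
    """One recorded tool invocation in an assumed, simplified trace format."""

    tool: str
    args: dict[str, Any] = field(default_factory=dict)


def check_arg_lte(trace: list[ToolCall], tool: str, arg: str, value: float) -> None:
    """Fail if any call to `tool` passed `arg` greater than `value`.

    Calls that omit `arg` are skipped in this sketch.
    """
    for i, call in enumerate(trace):
        if call.tool == tool and arg in call.args and call.args[arg] > value:
            raise AssertionError(f"call {i}: {tool}.{arg}={call.args[arg]!r} > {value}")


def check_approval_gate(trace: list[ToolCall], tool: str) -> None:
    """Fail if any call to `tool` lacks an unused earlier approval naming it."""
    pending = 0  # approvals seen so far that no call has consumed yet
    for i, call in enumerate(trace):
        if call.tool == APPROVAL_TOOL and call.args.get("tool") == tool:
            pending += 1
        elif call.tool == tool:
            if pending == 0:
                raise AssertionError(f"call {i}: {tool!r} ran without approval")
            pending -= 1  # in this sketch, one approval covers exactly one call


# Usage mirroring the refund test: an approved refund of 40 against a limit of 100.
trace = [
    ToolCall("lookup_order", {"order_id": "A1"}),
    ToolCall(APPROVAL_TOOL, {"tool": "issue_refund"}),
    ToolCall("issue_refund", {"amount": 40}),
]
check_arg_lte(trace, tool="issue_refund", arg="amount", value=100)
check_approval_gate(trace, tool="issue_refund")
```

Counting approvals one-to-one is a policy choice made for this sketch; a session-wide approval would instead be a single flag set once.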
To install from a clone of the repository, run this at the repo root:
pip install -e ".[langgraph,dev]"
An alpha is published on PyPI as pytest-agentharness; version pins and extras are listed in the repository README.
Treat the project as early software: APIs and packaging can change between alphas.
This is an alpha-stage, mostly solo-maintained project with a broad roadmap (more adapters, harder CI stories, optional compliance-oriented reporting). If you have shipped Python test tooling, agents, or safety-sensitive systems and want to contribute, start on GitHub (issues, docs, or code). There is no company behind this page and no user count to show; interest is welcome, not promised.
github.com/Suirotciv/Agent-Harness