First Eval Walkthrough

This walkthrough uses the seeded local path that the repo supports today: bring up the local stack, seed a fixture, create a run, stream its events, and inspect the ranking output. It does not rely on any setup that is not already in the repo.

1. Bring up the local stack

From the repo root:

./scripts/dev/start-local-stack.sh

If you want the browser UI too:

cd web
pnpm install
pnpm dev

2. Seed a runnable fixture

Back in the repo root:

./scripts/dev/seed-local-run-fixture.sh

That script seeds enough data to create a local run through the API.
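
If you repeat steps 1 and 2 often, they can be chained so that a missing or failing script stops the sequence early. The following is a minimal sketch, not something shipped in the repo: `run_step` and `setup_local_run` are hypothetical helper names, and the sketch assumes `start-local-stack.sh` returns once the stack is up rather than staying in the foreground. If it does stay in the foreground, run the seed script from a second terminal instead.

```shell
#!/usr/bin/env bash
# Sketch only: run_step and setup_local_run are hypothetical helpers.
# The script paths are the ones used in this walkthrough.

# Run one script, refusing to continue if it is missing or not executable.
run_step() {
  local script="${1:-}"
  if [ ! -x "$script" ]; then
    echo "missing or not executable: $script" >&2
    return 1
  fi
  echo "==> $script"
  "$script"
}

# Assumption: start-local-stack.sh exits once the stack is up.
# The seed script only runs if the stack script succeeded.
setup_local_run() {
  run_step ./scripts/dev/start-local-stack.sh &&
    run_step ./scripts/dev/seed-local-run-fixture.sh
}
```

Source the file from the repo root and call `setup_local_run`; a non-zero exit status means one of the two scripts was missing or failed.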

3. Create the run

You can hit the API directly:

./scripts/dev/curl-create-run.sh

Or, if you are using the CLI against a prepared workspace, create and follow the run there:

agentclash run create --follow

4. Inspect the result

Once you have a run ID, substitute it for <RUN_ID> to inspect the run's status and ranking:

agentclash run get <RUN_ID>
agentclash run ranking <RUN_ID>
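
The two inspection commands can be wrapped in a small shell function so the run ID is typed once. This is a sketch under assumptions: `inspect_run` is a hypothetical helper name, and only the `agentclash run get` and `agentclash run ranking` subcommands shown in this step are used.

```shell
# Sketch only: inspect_run is a hypothetical helper, not part of the CLI.
# It calls the two subcommands from this step with the same run ID.
inspect_run() {
  local run_id="${1:-}"
  if [ -z "$run_id" ]; then
    echo "usage: inspect_run <RUN_ID>" >&2
    return 2
  fi
  # Show the ranking only if the status lookup succeeded.
  agentclash run get "$run_id" &&
    agentclash run ranking "$run_id"
}
```

Call it as `inspect_run <RUN_ID>`. Calling it without an argument prints the usage line and returns a non-zero status instead of invoking the CLI.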

If the web app is running, open the workspace run detail view in the browser and inspect the replay and scorecard surfaces from there.

What you should see

a run record created in the workspace
event streaming during execution when you follow the run
a ranking view once the backend has enough completed run-agent results to score
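
Because the ranking only appears once enough run-agent results have completed, you may end up re-running the ranking command by hand. A generic retry loop is one way to automate that. The sketch below is plain shell and makes no claims about the CLI: `retry` is a hypothetical helper, and using it with `agentclash run ranking` assumes that command exits with a non-zero status while the ranking is not yet available, which this walkthrough does not confirm.

```shell
# Sketch only: retry is a hypothetical helper.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local attempts="$1"
  local delay="$2"
  local i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Wait between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumes a non-zero exit while the ranking is not ready):
#   retry 10 5 agentclash run ranking "$RUN_ID"
```

If the command turns out to exit with status 0 even when no ranking exists yet, the loop returns after the first attempt and you would need to check the output instead.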

Warning