Challenge packs
Input sets & cases
How cases bind challenges, structured inputs, expectations, assets, and legacy payloads—grounded in challengepack.CaseDefinition.
Input sets are the unit AgentClash schedules per deployment/candidate. Each input_sets[] entry contains cases[] (CaseDefinition in backend/internal/challengepack/bundle.go).
Case identity
- challenge_key: must reference an existing challenges[].key
- case_key / legacy item_key: both accepted; normalization fills the missing side from the other
EffectiveKey() chooses case_key when present for stored rows.
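The key handling above can be sketched in a few lines. This is a minimal Python model, not the Go implementation: the Case class and the normalize_keys, effective_key, and check_challenge_ref helpers are hypothetical names, and the fallback to item_key when case_key is empty is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical stand-in for the key fields of CaseDefinition."""
    challenge_key: str
    case_key: str = ""
    item_key: str = ""  # legacy alias for case_key


def normalize_keys(case: Case) -> Case:
    """Copy whichever of case_key / item_key is set onto the empty one."""
    if not case.case_key and case.item_key:
        case.case_key = case.item_key
    elif not case.item_key and case.case_key:
        case.item_key = case.case_key
    return case


def effective_key(case: Case) -> str:
    """Prefer case_key; falling back to item_key is an assumption."""
    return case.case_key or case.item_key


def check_challenge_ref(case: Case, challenge_keys: set) -> None:
    """Reject a case whose challenge_key is not among challenges[].key."""
    if case.challenge_key not in challenge_keys:
        raise ValueError(f"unknown challenge_key: {case.challenge_key!r}")


legacy = normalize_keys(Case(challenge_key="fix-bug", item_key="case-001"))
check_challenge_ref(legacy, {"fix-bug"})
print(legacy.case_key, effective_key(legacy))  # case-001 case-001
```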
Three authoring styles (coexist)
- Legacy payload-only: fill the payload map; omit structured inputs/expectations
- Structured eval: inputs[] + expectations[] with explicit kind fields
- Artifact heavy: assets[] + artifacts[] referencing declared version/challenge assets
IsLegacyPayloadOnly detects style (1) for storage compatibility.
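One plausible reading of that check, sketched in Python. The exact rule in IsLegacyPayloadOnly is not shown here, so the condition below (no structured field is populated) is an assumption, and is_legacy_payload_only is a hypothetical helper name.

```python
STRUCTURED_FIELDS = ("inputs", "expectations", "artifacts", "assets")


def is_legacy_payload_only(case: dict) -> bool:
    """Assumed rule: the case is legacy when no structured field is populated."""
    return not any(case.get(field) for field in STRUCTURED_FIELDS)


legacy_case = {"case_key": "c1", "payload": {"prompt": "Say hello"}}
structured_case = {
    "case_key": "c2",
    "inputs": [{"key": "prompt", "kind": "text", "value": "Say hello"}],
    "expectations": [{"key": "answer", "kind": "text", "value": "hello"}],
}

print(is_legacy_payload_only(legacy_case))      # True
print(is_legacy_payload_only(structured_case))  # False
```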
Stored document shape
When modern fields exist, StoredPayload() marshals StoredCaseDocument JSON with schema_version: 1, preserving:
- payload
- inputs
- expectations
- artifacts
- assets
This is what scoring + replay pull back—not the raw YAML fragment.
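To make the stored shape concrete, here is a small Python sketch of the serialization described above. Only schema_version: 1 and the five preserved field names come from the text; the stored_payload helper, the omission of empty fields, and the bare-payload output for legacy cases are assumptions.

```python
import json

STRUCTURED_FIELDS = ("inputs", "expectations", "artifacts", "assets")


def stored_payload(case: dict) -> str:
    """Serialize a case into the JSON that scoring and replay would read back."""
    if not any(case.get(field) for field in STRUCTURED_FIELDS):
        # Assumption: legacy payload-only cases are stored as the bare payload.
        return json.dumps(case.get("payload", {}), sort_keys=True)
    doc = {"schema_version": 1, "payload": case.get("payload", {})}
    for field in STRUCTURED_FIELDS:
        # Assumption: empty structured fields are left out of the document.
        if case.get(field):
            doc[field] = case[field]
    return json.dumps(doc, sort_keys=True)


case = {
    "case_key": "c2",
    "inputs": [{"key": "prompt", "kind": "text", "value": "Say hello"}],
    "expectations": [{"key": "answer", "kind": "text", "value": "hello"}],
}
print(stored_payload(case))
```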
Case inputs (inputs[])
CaseInput fields:
| Field | Role |
| --- | --- |
| key | Stable id for templates / UI |
| kind | Drives rendering + validator binding (text, artifact, etc.—product-specific kinds should match worker expectations) |
| value | Inline scalar/object |
| artifact_key | Pull bytes from declared asset map |
| path | Optional relative path inside asset bundle |
Validators can address values through case.inputs.<key> evidence paths.
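As an illustration of the case.inputs.<key> addressing, the Python sketch below looks up an inline input value from an evidence path. The resolve_evidence_path helper is hypothetical, and it assumes only inline value fields are resolved; artifact-backed inputs are out of scope here.

```python
def resolve_evidence_path(case: dict, path: str):
    """Resolve 'case.inputs.<key>' to the matching input's inline value."""
    prefix = "case.inputs."
    if not path.startswith(prefix):
        raise ValueError(f"unsupported evidence path: {path!r}")
    key = path[len(prefix):]
    for item in case.get("inputs", []):
        if item.get("key") == key:
            # Assumption: only the inline value is returned; artifact_key
            # inputs would need a separate asset lookup.
            return item.get("value")
    raise KeyError(f"no input with key {key!r}")


case = {
    "inputs": [
        {"key": "prompt", "kind": "text", "value": "Summarize the report"},
        {"key": "limits", "kind": "text", "value": {"max_words": 50}},
    ]
}
print(resolve_evidence_path(case, "case.inputs.prompt"))  # Summarize the report
```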
Expectations (expectations[])
CaseExpectation parallels inputs:
- key, kind, value, artifact_key, plus source, telling graders where dynamic gold values originate (the input:prompt pattern seen in CLI template packs)
Use expectations for:
- deterministic string compares
- supplying LLM judge reference_from bindings
- filesystem validators comparing outputs to expected files
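The first use case, a deterministic string compare, might look like the following Python sketch. It assumes that a source of the form input:<key> means the gold value is copied from the input with that key; that reading of the input:prompt pattern is an interpretation, and resolve_expected and exact_match are hypothetical helpers.

```python
def resolve_expected(expectation: dict, inputs: list):
    """Return the gold value for an expectation."""
    source = expectation.get("source", "")
    if source.startswith("input:"):
        # Assumption: 'input:<key>' points at the input holding the gold value.
        wanted = source[len("input:"):]
        for item in inputs:
            if item.get("key") == wanted:
                return item.get("value")
        raise KeyError(f"source refers to unknown input {wanted!r}")
    return expectation.get("value")


def exact_match(actual: str, expectation: dict, inputs: list) -> bool:
    """Deterministic string compare against the resolved gold value."""
    return actual == resolve_expected(expectation, inputs)


inputs = [{"key": "prompt", "kind": "text", "value": "ping"}]
echo = {"key": "echo", "kind": "text", "source": "input:prompt"}
fixed = {"key": "answer", "kind": "text", "value": "pong"}

print(exact_match("ping", echo, inputs))   # True
print(exact_match("ping", fixed, inputs))  # False
```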
Assets on cases
Case-level assets[] references use the same AssetReference structure as version-level entries (key, path, optional artifact_id). Validation ensures cross-references exist before publish succeeds.
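A cross-reference check of this kind can be sketched as below. The precise rules enforced at publish time are not spelled out here, so this Python example assumes that every artifact_key on an input or expectation must match a key declared in either the version-level or the case-level assets; missing_asset_refs is a hypothetical helper.

```python
def missing_asset_refs(case: dict, version_assets: list) -> list:
    """Return sorted artifact_key values that no asset declaration provides."""
    # Assumption: version-level and case-level assets both count as declared.
    declared = {asset["key"] for asset in version_assets}
    declared |= {asset["key"] for asset in case.get("assets", [])}
    referenced = {
        item["artifact_key"]
        for section in ("inputs", "expectations")
        for item in case.get(section, [])
        if item.get("artifact_key")
    }
    return sorted(referenced - declared)


version_assets = [{"key": "repo", "path": "assets/repo.tar.gz"}]
case = {
    "assets": [{"key": "golden", "path": "assets/golden.txt"}],
    "inputs": [{"key": "workspace", "kind": "artifact", "artifact_key": "repo"}],
    "expectations": [
        {"key": "out", "kind": "artifact", "artifact_key": "golden"},
        {"key": "log", "kind": "artifact", "artifact_key": "logs"},
    ],
}
print(missing_asset_refs(case, version_assets))  # ['logs']
```

An empty result would let publishing proceed under this model; any returned key marks a dangling reference.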
Input set metadata
Optional description on an input set is preserved for UI/discovery; there is no behavioral magic—selection happens by id/key at run creation time.
Choosing input set at run time
When a pack has multiple input sets, the CLI eval start command accepts --input-set; without the flag, interactive (TTY) flows prompt for a choice. API consumers pass the chosen input_set_id when creating runs (see the OpenAPI CreateRun family).
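The selection logic can be modeled roughly as follows. This Python sketch is illustrative only: choose_input_set is a hypothetical helper, matching on either id or key follows the note on input set metadata, and auto-selecting a lone input set is an assumption about the non-interactive path.

```python
def choose_input_set(input_sets: list, requested: str = "") -> dict:
    """Pick an input set by explicit id/key, or fall back to the only one."""
    if requested:
        for candidate in input_sets:
            if requested in (candidate.get("id"), candidate.get("key")):
                return candidate
        raise LookupError(f"no input set matches {requested!r}")
    if len(input_sets) == 1:
        # Assumption: a single input set needs no explicit selection.
        return input_sets[0]
    # With several sets and no explicit choice, a caller would prompt the user.
    raise LookupError("multiple input sets: pass one explicitly or prompt")


input_sets = [
    {"id": "is_1", "key": "smoke", "description": "Fast sanity cases"},
    {"id": "is_2", "key": "full", "description": "Complete regression set"},
]
print(choose_input_set(input_sets, "full")["id"])  # is_2
```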