AgentClash

AgentClash challenge packs ship as one YAML file decoded into challengepack.Bundle (backend/internal/challengepack/bundle.go). Publication stores a JSON manifest combining the pack body with metadata for execution and scoring (ManifestJSON in the same file).

Top-level keys

All keys listed here are persisted or validated somewhere in publish/validate—not decorative.

`pack` (required)

Human metadata surfaced in API and CLI lists:

slug — required, workspace-unique branding string
name — required display name
family — required grouping/category string
description — optional

`version` (required)

The executable spine of the bundle:

| Field | Notes | | --- | --- | | number | Required positive integer (int32). Bumps when you materially change validators, tooling, sandbox, scores, etc. | | execution_mode | native or prompt_eval. Empty accepts as legacy but you should always set explicitly. See execution rules below. | | tool_policy | Arbitrary-shaped map mirrored into manifest tool_policy. Must be omitted when execution_mode is prompt_eval (validated in ValidateBundle). | | filesystem | Optional filesystem constraints blob (same manifest path as today). | | sandbox | Optional SandboxConfig. Forbidden when execution_mode is prompt_eval. | | evaluation_spec | Required scoring contract unmarshalled through scoring package strict decode (scoring.StrictDecodeEvaluationSpec). Typos surface at parse time rather than silently defaulting. | | assets | Version-scoped AssetReference[] keyed for cases and uploads. |

SandboxConfig (bundle.go) currently supports:

network_access (bool)
network_allowlist (CIDR strings; invalid CIDR rejects publish)
env_vars (map of string literals—see Sandbox & E2B)
additional_packages (APT-style names constrained by regexp in validation)
sandbox_template_id (optional provider template override)

Manifest JSON merges sandbox_template_id from version.sandbox into the serialized version block for backends that historically keyed off that field separately.

`tools` (optional)

Optional map keyed by integration style; authoring today uses tools.custom as an array of composed tools (validation.go). Must be empty when mode is prompt_eval.

See Tools, primitives & policy.

`challenges` (required, non-empty)

Each challenge includes:

| Field | Required | Purpose | | --- | --- | --- | | key | yes | Stable id referenced by cases | | title | yes | Display | | category | yes | Stored metadata | | difficulty | yes | Stored metadata (easy/medium/… as free text unless your org standardizes it) | | instructions | often | Prompt body; mirrored into definition.instructions if missing there | | definition | optional | Extra JSON-compatible bag for product-specific authoring | | assets | optional | Challenge-scoped files | | artifact_refs | optional | Artifact key references validated against declared artifacts |

`input_sets` (required)

Defines runnable cases:

key and name required on each set.
Prefer modern cases array. Legacy items aliases are normalized into cases during normalizeBundle; do not rely on items in new authoring.

Cases are modeled by CaseDefinition with challenge_key, case_key, optional rich inputs/expectations, artifacts, assets, plus legacy payload for blob-only authoring.

Deep dive: Input sets & cases.

Execution mode compatibility

Validated in ValidateBundle:

When execution_mode is prompt_eval:

tools block must not be present
version.sandbox must not be present
version.tool_policy must be empty

When native you may populate sandbox, allowed tool kinds, and composed tools—the worker will hydrate the richer runtime path (native_executor flow).

Choose prompt_eval for pure model-output workloads; promote to native when you rely on sandbox files, primitives, validators that read captured files (file: evidence), etc.

After publish — IDs returned

Publishing returns stable identifiers referenced by runs (see authoring guide):

challenge_pack_id
challenge_pack_version_id
evaluation_spec_id
input_set_ids
Optional bundle_artifact_id

Run creation binds challenge_pack_version_id specifically—YAML filenames stop mattering immediately after publish.

Bundle YAML reference

Top-level keys

`pack` (required)

`version` (required)

`tools` (optional)

`challenges` (required, non-empty)

`input_sets` (required)

Execution mode compatibility

After publish — IDs returned

See also

Bundle YAML reference

Top-level keys

pack (required)

version (required)

tools (optional)

challenges (required, non-empty)

input_sets (required)

Execution mode compatibility

After publish — IDs returned

See also

`pack` (required)

`version` (required)

`tools` (optional)

`challenges` (required, non-empty)

`input_sets` (required)