Architecture¶
Hawk provides infrastructure for running Inspect AI evaluations in a Kubernetes environment using YAML configuration files.
Overview¶
flowchart TD
User["<b>hawk eval-set config.yaml</b>"]
API["Hawk API Server<br/><i>FastAPI on ECS Fargate</i>"]
EKS["Kubernetes (EKS)<br/><i>Scaled by Karpenter</i>"]
Runner["Runner Pod<br/><i>Creates virtualenv, runs inspect_ai.eval_set()</i>"]
Sandbox["Sandbox Pod(s)<br/><i>Isolated execution · Cilium network policies</i>"]
S3[("S3<br/><i>Eval logs</i>")]
EB["EventBridge → Lambda<br/><i>Tag, import to warehouse</i>"]
DB[("Aurora PostgreSQL<br/><i>Results warehouse</i>")]
Viewer["Web Viewer<br/><i>CloudFront · Browse, filter, export</i>"]
Middleman["Middleman<br/><i>LLM Proxy</i>"]
LLMs["LLM Providers<br/><i>OpenAI · Anthropic · Google · etc.</i>"]
User --> API
API -- "Validates config & auth<br/>Creates Helm release" --> EKS
EKS --> Runner
EKS --> Sandbox
Runner -- "Writes logs" --> S3
Runner <-- "API calls" --> Middleman
Middleman --> LLMs
S3 -- "S3 event" --> EB
EB --> DB
Viewer --> DB
Detailed Architecture¶
graph TB
subgraph "User's Computer"
CLI[hawk eval-set]
end
subgraph "Hawk API Service"
API[FastAPI Server]
AUTH[JWT Auth]
end
subgraph "Kubernetes Control Plane"
HELM1[Helm Release 1<br/>Hawk]
CHART1[Helm Chart 1<br/>Inspect Runner Job Template]
HELM2[Helm Release 2<br/>inspect-k8s-sandbox]
CHART2[Helm Chart 2<br/>Sandbox Environment Template]
end
subgraph "Inspect Runner Pod"
HAWKLOCAL[hawk local]
VENV[Virtual Environment]
RUNNER[hawk.runner]
INSPECT[inspect_ai.eval_set]
K8SSANDBOX[inspect_k8s_sandbox]
end
subgraph "Sandbox Environment"
POD2[Sandbox Environment Pod]
end
subgraph "AWS Infrastructure"
S3[S3 Bucket<br/>Log Storage]
EB[EventBridge]
L1[eval_updated<br/>Lambda]
L2[eval_log_reader<br/>Lambda]
L3[token_broker<br/>Lambda]
OL[S3 Object Lambda<br/>Access Point]
end
CLI -->|HTTP Request| API
API -->|Authenticate| AUTH
API -->|Create Release| HELM1
HELM1 -->|Deploy| CHART1
CHART1 -->|Run| HAWKLOCAL
HAWKLOCAL -->|Create venv| VENV
VENV -->|Execute| RUNNER
RUNNER -->|Call| INSPECT
INSPECT -->|Invoke| K8SSANDBOX
K8SSANDBOX -->|Create Release| HELM2
HELM2 -->|Deploy| CHART2
CHART2 -->|Create Pod| POD2
INSPECT -->|Run Commands| POD2
RUNNER -->|Get Scoped Credentials| L3
L3 -->|Validate Permissions| S3
INSPECT -->|Write Logs| S3
S3 -->|Object Created Event| EB
EB -->|Trigger| L1
CLI -->|Read Logs| OL
OL -->|Check Permissions| L2
L2 -->|Authorized Access| S3
Components¶
CLI (hawk)¶
The CLI is the primary interface for users. It handles configuration file parsing, API communication, and credential storage using keyring.
API Server¶
FastAPI-based REST API running on ECS Fargate. Handles JWT authentication, job orchestration via Helm, and configuration validation using Pydantic models.
Helm Chart: Inspect Runner Job Template¶
Each job gets its own isolated namespace. Resources created per job:
- Namespace — runner namespace, plus a separate sandbox namespace for eval sets
- Job — the Kubernetes job that runs the evaluation
- ConfigMap — stores the eval set configuration and per-job kubeconfig
- Secret — API keys, common env vars, and user-provided secrets
- ServiceAccount — with AWS IAM role annotation and RoleBinding
- CiliumNetworkPolicy — network isolation allowing egress only to sandbox namespace, kube-dns, API server, and external services
Namespace Naming¶
- Runner namespace:
{prefix}-{job_id}(e.g.,inspect-my-eval-123) - Sandbox namespace:
{runner_namespace}-s(e.g.,inspect-my-eval-123-s) - Maximum job ID: 43 characters (enforced to stay within Kubernetes' 63-char namespace limit)
Runner (hawk.runner.entrypoint)¶
The runner pod's entrypoint creates an isolated Python virtualenv, installs dependencies (inspect_k8s_sandbox, task/solver/model packages), then executes the evaluation. This isolation prevents dependency conflicts.
Sandbox Environments¶
Sandbox pods run within a StatefulSet (auto-recreated on crash) with resource constraints and limited network access, provided by inspect_k8s_sandbox.
Log Storage¶
Evaluation logs are written directly to S3 by Inspect AI:
*.eval— individual evaluation result files (zip archives)logs.json— metadata mapping eval file paths to headers.buffer/— in-progress state (SQLite databases)
Lambda Functions¶
| Lambda | Purpose |
|---|---|
| eval_updated | Triggered by S3 events on .eval and logs.json creation. Tags files by model, publishes completion events to EventBridge. |
| eval_log_reader | S3 Object Lambda for authenticated log access. Checks user permissions via model group membership. |
| token_refresh | Refreshes OAuth access tokens used by other Lambda functions. |
| token_broker | Exchanges user JWTs for scoped AWS credentials, enforcing multi-tenant isolation at the credential level. |
Log Access Flow¶
- User requests logs via
hawk weborhawk transcript - Request routes through S3 Object Lambda Access Point
eval_log_readerLambda validates user permissions against model groups- Authorized users receive the requested log data
Users should never access the underlying S3 bucket directly — always through the Object Lambda Access Point.
Repository Structure¶
infra/ Pulumi infrastructure (Python)
__main__.py Entrypoint
core/ VPC, EKS, ALB, ECS, RDS, Route53, S3
k8s/ Karpenter, Cilium, Datadog agent, RBAC
hawk/ Hawk API (ECS), Lambdas, EventBridge, CloudFront
datadog/ Monitors, dashboards (optional)
lib/ Shared config, naming, tagging helpers
hawk/ Hawk application (Python + React)
cli/ CLI (Click-based)
api/ API server (FastAPI)
runner/ Kubernetes job runner
core/ Shared types, DB models, log importer
www/ Web viewer (React + TypeScript + Vite)
terraform/ Lambda function modules (OpenTofu)
examples/ Example eval and scan configs
tests/ Unit, E2E, and smoke tests
middleman/ LLM proxy (OpenAI, Anthropic, Google Vertex)
Pulumi.example.yaml Documented config reference