Quick Start
This guide takes you from zero to a working Hawk deployment on AWS. You'll need an AWS account and a domain name. For authentication, Hawk creates a Cognito user pool by default; you can use your existing OIDC identity provider instead.
1. Install prerequisites
On Linux, install Pulumi, uv, the AWS CLI, Python 3.13+, and jq.
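Before moving on, it can help to confirm that every prerequisite is actually on your `PATH`. The following is a minimal sketch, not part of Hawk: the function names `check_prereqs` and `python_is_313_plus` are hypothetical, and the binary names (`pulumi`, `uv`, `aws`, `python3`, `jq`) are the usual ones and may differ on your system.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_prereqs() {
  _missing=0
  for _tool in "$@"; do
    if ! command -v "$_tool" >/dev/null 2>&1; then
      echo "missing: $_tool" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Succeeds only if python3 reports version 3.13 or newer.
python_is_313_plus() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 13) else 1)'
}

# Usage (assumed binary names; adjust if yours differ):
#   check_prereqs pulumi uv aws python3 jq && python_is_313_plus
```

Any tool reported as missing goes to standard error, so the check can be chained with `&&` in a setup script.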
2. Clone the repo
3. Set up Pulumi state backend
Create an S3 bucket and KMS key for Pulumi state:
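# Create the bucket that holds Pulumi state (S3 bucket names are globally unique, so pick your own)
aws s3 mb s3://my-pulumi-state
# Create a KMS key and point the alias/pulumi-secrets alias at it
aws kms create-alias --alias-name alias/pulumi-secrets \
  --target-key-id "$(aws kms create-key --query KeyMetadata.KeyId --output text)"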
Log in to the S3 backend:
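The login command itself is not shown here. As a minimal sketch, assuming the bucket name `my-pulumi-state` from the step above and Pulumi's `s3://<bucket>` backend URL form, it might look like this. The variable `STATE_BUCKET` and both helper functions are hypothetical names used only for illustration.

```shell
# Bucket name from the step above; replace with your own.
STATE_BUCKET="${STATE_BUCKET:-my-pulumi-state}"

# Build the backend URL Pulumi expects for an S3 bucket.
pulumi_backend_url() {
  printf 's3://%s\n' "$1"
}

# Point the Pulumi CLI at the S3 state backend.
login_state_backend() {
  pulumi login "$(pulumi_backend_url "$STATE_BUCKET")"
}

# Usage:
#   login_state_backend
```

Keeping the bucket name in one variable avoids a mismatch between the bucket you created and the backend you log in to.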
4. Create and configure your stack
cd infra
pulumi stack init my-org --secrets-provider="awskms://alias/pulumi-secrets"
cp ../Pulumi.example.yaml ../Pulumi.my-org.yaml
Edit Pulumi.my-org.yaml with your values. At minimum, you need:
config:
aws:region: us-west-2
hawk:domain: hawk.example.com
hawk:publicDomain: example.com
hawk:primarySubnetCidr: "10.0.0.0/16"
hawk:createPublicZone: "true"
That's enough to get started. The environment name defaults to your stack name. Hawk will create a Cognito user pool for authentication automatically.
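A quick way to catch a forgotten setting before deploying is to check that each of the five minimum keys listed above appears in the stack file. This is a rough sketch under that assumption: `missing_config_keys` is a hypothetical helper, and it does a plain-text search for `key:` lines rather than parsing YAML.

```shell
# Print each required key that does not appear in the given stack config file.
# Returns 0 when every key is present, 1 otherwise.
# This is a plain-text check for "key:" lines, not a YAML parser.
missing_config_keys() {
  _file=$1
  _rc=0
  for _key in aws:region hawk:domain hawk:publicDomain \
              hawk:primarySubnetCidr hawk:createPublicZone; do
    if ! grep -Eq "^[[:space:]]*${_key}:" "$_file"; then
      echo "$_key"
      _rc=1
    fi
  done
  return "$_rc"
}

# Usage, from the infra directory:
#   missing_config_keys ../Pulumi.my-org.yaml
```

The check only confirms that the keys exist; it does not validate their values.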
If you already have an OIDC provider (Okta, Auth0, etc.), you can use it instead:
# Optional: use your own OIDC provider instead of Cognito
hawk:oidcClientId: "your-client-id"
hawk:oidcAudience: "your-audience"
hawk:oidcIssuer: "https://login.example.com/oauth2/default"
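If you go the OIDC route, you may want to confirm that the issuer URL is reachable before deploying. The sketch below relies on the standard OpenID Connect Discovery path (`/.well-known/openid-configuration`) and on `curl` and `jq` being installed. Whether Hawk itself uses discovery is an assumption, and both function names are hypothetical.

```shell
# Build the OIDC discovery URL for an issuer, tolerating a trailing slash.
oidc_discovery_url() {
  printf '%s/.well-known/openid-configuration\n' "${1%/}"
}

# Fetch the discovery document and print the issuer it advertises.
# Per OpenID Connect Discovery, this should equal the issuer URL you configured.
oidc_advertised_issuer() {
  curl -fsS "$(oidc_discovery_url "$1")" | jq -r '.issuer'
}

# Usage:
#   oidc_advertised_issuer "https://login.example.com/oauth2/default"
```

If the printed issuer differs from the value in `hawk:oidcIssuer`, even by a trailing slash, token validation commonly fails, so it is worth copying the advertised value verbatim.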
5. Deploy
This creates more than 200 AWS resources, including a VPC, EKS cluster, ALB, ECS services, Aurora PostgreSQL, S3 buckets, and Lambda functions. The first deploy takes about 15 to 20 minutes.
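The deploy command is not shown in this section. With a standard Pulumi workflow it would presumably be `pulumi up`, run from the `infra` directory used in step 4. The sketch below previews first and applies only if the preview succeeds. `deploy_stack` and the `PULUMI` override variable are hypothetical names, not part of Hawk.

```shell
# Command used to invoke Pulumi; override (e.g. PULUMI=echo) for a dry run.
PULUMI="${PULUMI:-pulumi}"

# Preview the planned changes, then apply them to the named stack.
deploy_stack() {
  "$PULUMI" preview --stack "$1" && "$PULUMI" up --stack "$1"
}

# Usage, from the infra directory:
#   deploy_stack my-org
```

Running the preview first gives you a chance to review the resource list before anything is created.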
6. Set up LLM API keys
Hawk routes model API calls through its built-in LLM proxy (Middleman). You need to provide at least one provider's API key:
This stores the key in Secrets Manager and restarts Middleman. You can set multiple keys at once:
Replace <env> with your hawk:env value (e.g., production). Supported providers: OpenAI, Anthropic, Gemini, DeepInfra, DeepSeek, Fireworks, Mistral, OpenRouter, Together, xAI.
7. Create a user (Cognito only)
If you're using the default Cognito authentication, create a user:
The script reads the Cognito user pool from your Pulumi stack outputs, creates the user, and prints the login credentials. Skip this step if you're using your own OIDC provider.
8. Install the Hawk CLI and run your first eval
uv pip install "hawk[cli] @ git+https://github.com/METR/hawk-preview#subdirectory=hawk"
# Configure the CLI to point to your deployment
mkdir -p ~/.config/hawk-cli  # the redirect below fails if this directory does not exist
uv run python scripts/dev/generate-env.py <stack> > ~/.config/hawk-cli/env
hawk login
hawk eval-set hawk/examples/simple.eval-set.yaml
hawk logs -f # watch it run
hawk web # open results in browser