# Configuration

Current `gpu.jsonc` reference for GPU CLI.
GPU CLI uses a `gpu.jsonc` file in your project root. `gpu init` creates a starting config, and the checked-in JSON schema powers autocomplete and validation.
## JSON Schema

```jsonc
{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json"
}
```

## Core Project Settings
### `project_id`, `provider`, `profile`
```jsonc
{
  "project_id": "my-ml-project",
  "provider": "runpod",
  "profile": "default"
}
```

- `project_id` stabilizes the project identity across machines.
- `provider` chooses the provider implementation.
- `profile` selects credential isolation for auth and SSH keys.
### `gpu_types`
Specify preferred GPU types with optional counts:
```jsonc
{
  "gpu_types": [
    { "type": "RTX 4090" },
    { "type": "A100", "count": 4 }
  ]
}
```

If omitted, GPU CLI falls back to intelligent selection.
### `min_vram`, `max_price`, `regions`, `cloud_type`
```jsonc
{
  "min_vram": 24,
  "max_price": 1.5,
  "regions": ["US-TX-1", "US-CA-1"],
  "cloud_type": "secure"
}
```

Use these to constrain fallback GPU selection.
### `encryption`
`encryption` controls volume encryption-at-rest behavior for the selected storage strategy.
```jsonc
{
  "encryption": true
}
```

- On RunPod built-in storage, this uses the provider-native encrypted volume behavior.
- On providers that support GPU CLI-managed LUKS, this flag enables that path.
- This setting does not mean every provider uses the same zero-trust storage model, so keep the provider-specific security model in mind.
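As a sketch of how this interacts with storage selection (the exact behavior depends on the provider, as noted above), the flag can sit alongside a storage strategy:

```jsonc
{
  // Illustrative combination, not a provider guarantee:
  // on LUKS-capable providers this encrypts the attached volume,
  // on RunPod built-in storage it maps to the provider-native behavior.
  "encryption": true,
  "storage_mode": "network"
}
```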
### `storage_mode`
`storage_mode` chooses the primary storage strategy for the workspace.
```jsonc
{
  "storage_mode": "built-in"
}
```

Valid values:

- `built-in` — use the provider's default workspace storage
- `network` — prefer an attached network volume
- `managed` — use provider-managed persistent storage when available
Use this with `network_volume_id` or `volume_mode` when you want explicit network-volume behavior.
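For instance, to make the network-volume intent explicit, the two settings can be combined (the volume ID below is a placeholder):

```jsonc
{
  "storage_mode": "network",
  // Placeholder; use your own volume's ID or unique friendly name.
  "network_volume_id": "vol_abc123xyz"
}
```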
## Sync and Output Control
### `include`
`include` force-includes gitignored files when syncing to the pod.
```jsonc
{
  "include": ["data/weights.bin", "models/", "*.onnx"]
}
```

### `outputs` and `exclude_outputs`
`outputs` controls what syncs back from the pod. `exclude_outputs` removes paths from that set.
```jsonc
{
  "outputs": ["outputs/", "checkpoints/", "*.safetensors"],
  "exclude_outputs": ["*.tmp", "*.log"]
}
```

### `outputs_enabled`, `outputs_root`, `outputs_backfill_seconds`
```jsonc
{
  "outputs_enabled": true,
  "outputs_root": "outputs",
  "outputs_backfill_seconds": 3600
}
```

- `outputs_enabled` set to `false` disables daemon-managed output watching entirely.
- `outputs_root` scopes the watch subtree.
- `outputs_backfill_seconds` limits how far back the daemon rescans after a restart.
### `extra_outputs`
`extra_outputs` lets you sync files from absolute paths outside the workspace on the pod.
```jsonc
{
  "extra_outputs": [
    {
      "remote": "/gpu-cli-workspaces/cache/checkpoints",
      "local": "checkpoints/"
    }
  ]
}
```

### `vault`
Use `vault` for outputs that should stay encrypted at rest on your local machine instead of syncing into the workspace.
```jsonc
{
  "outputs": ["logs/**"],
  "vault": {
    "patterns": ["checkpoints/**", "generated/**"]
  }
}
```

## Downloads
GPU CLI can pre-stage models, repos, and assets before your command runs.
### HuggingFace
```jsonc
{
  "download": [
    {
      "strategy": "hf",
      "source": "black-forest-labs/FLUX.1-dev"
    }
  ]
}
```

### HTTP
```jsonc
{
  "download": [
    {
      "strategy": "http",
      "source": "https://example.com/model.bin",
      "target": "models/model.bin"
    }
  ]
}
```

### Git
```jsonc
{
  "download": [
    {
      "strategy": "git",
      "source": "https://github.com/comfyanonymous/ComfyUI",
      "target": "ComfyUI",
      "tag": "v0.3.7"
    }
  ]
}
```

### Civitai
```jsonc
{
  "download": [
    {
      "strategy": "civitai",
      "source": "4384"
    }
  ]
}
```

For persistent model storage, prefer `${workspace_base}` in targets so the path stays aligned with the provider's mounted workspace.
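For example, a download entry that lands a model under the mounted workspace might look like this (the path under `${workspace_base}` is just an illustration):

```jsonc
{
  "download": [
    {
      "strategy": "http",
      "source": "https://example.com/model.bin",
      // ${workspace_base} resolves to the provider's mounted workspace root,
      // so the downloaded model persists with the workspace storage.
      "target": "${workspace_base}/models/model.bin"
    }
  ]
}
```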
## Pod Runtime Settings
### `keep_alive_minutes`
Cooldown, in minutes, after the last meaningful activity before the pod stops.
```jsonc
{
  "keep_alive_minutes": 10
}
```

### `persistent_proxy`, `resume_timeout_secs`, `max_queued_requests`, `health_check_paths`
These settings control wake-on-request behavior for long-lived services.
```jsonc
{
  "persistent_proxy": true,
  "resume_timeout_secs": 180,
  "max_queued_requests": 100,
  "health_check_paths": ["/health", "/ready"]
}
```

- `persistent_proxy`: keep the local proxy listening after the pod stops so incoming requests can resume it.
- `resume_timeout_secs`: how long to wait for the pod to come back.
- `max_queued_requests`: cap queued requests while the pod is resuming.
- `health_check_paths`: paths that should never count as meaningful activity.
### `docker_image`, `dockerfile`, `workspace_size_gb`, `target_architecture`
```jsonc
{
  "docker_image": "ghcr.io/gpu-cli/base:latest",
  "dockerfile": "Dockerfile",
  "workspace_size_gb": 50,
  "target_architecture": "linux/amd64"
}
```

Use `target_architecture` to override the provider default when you need ARM builds.
### `startup`
`startup` sets the default command for `gpu run` when you do not pass an explicit command. It is also the main entrypoint for many template workflows.
```jsonc
{
  "startup": "python app.py --host 0.0.0.0 --port 8080"
}
```

### `inputs`
`inputs` defines the prompts shown when someone runs `gpu use <template>`.
```jsonc
{
  "inputs": [
    {
      "type": "select",
      "key": "model_variant",
      "label": "Model Variant",
      "options": [
        { "value": "small", "label": "Small" },
        { "value": "large", "label": "Large" }
      ],
      "default": "small",
      "required": true
    },
    {
      "type": "text",
      "key": "system_prompt",
      "label": "System Prompt",
      "placeholder": "You are a helpful assistant",
      "required": false
    },
    {
      "type": "boolean",
      "key": "enable_auth",
      "label": "Enable Auth",
      "default": true
    }
  ]
}
```

- Use `key` names that match the values your template expects to substitute.
- `type` controls the prompt UI. Common choices are `select`, `text`, `number`, `boolean`, `model`, and `secret`.
- Callers can answer interactively or prefill values with `gpu use --input key=value`.
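As an illustration of the key-matching point above, an input key is typically consumed by the template's own commands. Assuming the template engine substitutes values by key (the exact substitution syntax depends on the template system and is hypothetical here), a minimal pairing might look like:

```jsonc
{
  "inputs": [
    {
      "type": "select",
      "key": "model_variant",
      "label": "Model Variant",
      "options": [
        { "value": "small", "label": "Small" },
        { "value": "large", "label": "Large" }
      ],
      "default": "small"
    }
  ],
  // Hypothetical substitution syntax: assumes the template engine
  // replaces {{model_variant}} with the collected input value.
  "startup": "python app.py --variant {{model_variant}}"
}
```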
## Ports and Activity Routing
`ports` supports both simple port numbers and rich port objects.
### Simple ports
```jsonc
{
  "ports": [8000, 8080]
}
```

### Rich ports
```jsonc
{
  "ports": [
    {
      "port": 8080,
      "description": "ui",
      "http": {
        "activity_paths": ["/api/chat", "/api/generate"],
        "ignore_paths": ["/health", "/queue"],
        "ignore_methods": ["OPTIONS", "HEAD"]
      },
      "websocket": {
        "data_frames_are_activity": true,
        "ping_pong_is_activity": false
      }
    }
  ]
}
```

### How activity routing works
- If a port has no HTTP rules, all HTTP requests count as activity.
- `ignore_methods` and `ignore_paths` block known background traffic.
- If `activity_paths` is set, only those paths reset the cooldown timer.
- WebSocket data frames keep a connection warm, but the primary cooldown signal is still HTTP activity.
Use this for apps that poll frequently, such as queues, health checks, chat UIs, or dashboard frontends.
## Hooks
Hooks run scripts during pod lifecycle events.
```jsonc
{
  "hooks": {
    "readiness": {
      "type": "command",
      "name": "service-ready",
      "run": ["curl", "-sf", "http://localhost:8000/health"],
      "retry_count": 30,
      "retry_delay_secs": 2,
      "timeout_secs": 10
    }
  }
}
```

The most common use is a readiness hook that waits for a service to become available before GPU CLI marks the pod ready.
## Network Volumes
### `network_volume_id`
Attach a specific network volume by ID or unique friendly name.
```jsonc
{
  "network_volume_id": "vol_abc123xyz"
}
```

### `volume_mode`
```jsonc
{
  "volume_mode": "global"
}
```

Options:

- `"global"` — use the shared global volume
- `"dedicated"` — use a project-specific volume
- `"none"` — use ephemeral storage only
### `dedicated_volume_id` and `dedicated_volume_name`
```jsonc
{
  "volume_mode": "dedicated",
  "dedicated_volume_name": "my-project-models"
}
```

`dedicated_volume_id` wins if both are set.
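For example, with both identifiers present, the ID takes precedence and the name is ignored (the ID below is a placeholder):

```jsonc
{
  "volume_mode": "dedicated",
  "dedicated_volume_id": "vol_abc123xyz",      // used: the ID wins when both are set
  "dedicated_volume_name": "my-project-models" // ignored in this case
}
```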
## Environment
```jsonc
{
  "environment": {
    "python": {
      "requirements": "requirements.txt"
    },
    "system": {
      "apt": [{ "name": "ffmpeg" }]
    },
    "shell": {
      "steps": [{ "run": "pip install -e ." }]
    }
  }
}
```

Use this to declare Python packages, system packages, and shell setup steps.
## Serverless
The `serverless` block configures RunPod Serverless deployment. For the workflow and request examples, see Serverless Endpoints.
### Main fields
```jsonc
{
  "serverless": {
    "template": "vllm",
    "gpu_type": "NVIDIA A100 80GB PCIe",
    "gpu_types": ["NVIDIA L4"],
    "scaling": {
      "min_workers": 0,
      "max_workers": 3,
      "idle_timeout": 5
    },
    "volume": {
      "name": "my-project-vol",
      "size_gb": 200,
      "mount_path": "/runpod-volume"
    },
    "runpod": {
      "flashboot": true,
      "cached_model": "meta-llama/Llama-3.1-8B-Instruct",
      "env": {
        "MODEL_NAME": "meta-llama/Llama-3.1-8B-Instruct"
      }
    }
  }
}
```

### Current caveats
- `serverless.prewarm` exists in the schema but is not wired in the current runtime.
- Deploy-time `--warm` and `--write-ids` are accepted by the CLI but not wired through today.
- `serverless.runpod.ids` can exist in config, but GPU CLI does not currently auto-populate it during deploy.
## Full Schema
See the complete checked-in schema at `gpu-cli.sh/schema/v1/gpu.json`.