GPU CLI

Configuration

Current gpu.jsonc reference for GPU CLI

GPU CLI uses a gpu.jsonc file in your project root. gpu init creates a starting config, and the checked-in JSON schema powers autocomplete and validation.
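A minimal gpu.jsonc combines the schema reference with the core settings covered below (the values here are illustrative):

```jsonc
{
  // Enables editor autocomplete and validation
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "project_id": "my-ml-project",
  "provider": "runpod",
  "gpu_types": [{ "type": "RTX 4090" }],
  "keep_alive_minutes": 10,
  "ports": [8080]
}
```

Each key is documented in its own section below.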

JSON Schema

{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json"
}

Core Project Settings

project_id, provider, profile

{
  "project_id": "my-ml-project",
  "provider": "runpod",
  "profile": "default"
}
  • project_id stabilizes the project identity across machines.
  • provider chooses the provider implementation.
  • profile selects credential isolation for auth and SSH keys.

gpu_types

Specify preferred GPU types with optional counts:

{
  "gpu_types": [
    { "type": "RTX 4090" },
    { "type": "A100", "count": 4 }
  ]
}

If omitted, GPU CLI falls back to automatic selection, constrained by the settings below.

min_vram, max_price, regions, cloud_type

{
  "min_vram": 24,
  "max_price": 1.5,
  "regions": ["US-TX-1", "US-CA-1"],
  "cloud_type": "secure"
}

Use these to constrain fallback GPU selection: min_vram sets a VRAM floor in GB, max_price caps the hourly cost, regions limits candidate datacenters, and cloud_type picks the provider cloud tier.

encryption

encryption controls volume encryption-at-rest behavior for the selected storage strategy.

{
  "encryption": true
}
  • On RunPod built-in storage, this uses the provider-native encrypted volume behavior.
  • On providers that support GPU CLI-managed LUKS, this flag enables that path.
  • This flag does not imply a uniform zero-trust storage model across providers; check the security model of your specific provider.

storage_mode

storage_mode chooses the primary storage strategy for the workspace.

{
  "storage_mode": "built-in"
}

Valid values:

  • built-in — use the provider's default workspace storage
  • network — prefer an attached network volume
  • managed — use provider-managed persistent storage when available

Use this with network_volume_id or volume_mode when you want explicit network-volume behavior.
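For example, to pin the workspace to a specific network volume (the volume ID here is illustrative):

```jsonc
{
  "storage_mode": "network",
  // Attach this existing network volume instead of the provider default
  "network_volume_id": "vol_abc123xyz"
}
```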

Sync and Output Control

include

include force-includes gitignored files when syncing to the pod.

{
  "include": ["data/weights.bin", "models/", "*.onnx"]
}

outputs and exclude_outputs

outputs controls what syncs back from the pod. exclude_outputs removes paths from that set.

{
  "outputs": ["outputs/", "checkpoints/", "*.safetensors"],
  "exclude_outputs": ["*.tmp", "*.log"]
}

outputs_enabled, outputs_root, outputs_backfill_seconds

{
  "outputs_enabled": true,
  "outputs_root": "outputs",
  "outputs_backfill_seconds": 3600
}
  • outputs_enabled set to false disables daemon-managed output watching entirely.
  • outputs_root scopes the watch subtree.
  • outputs_backfill_seconds limits how far back the daemon rescans after a restart.

extra_outputs

extra_outputs lets you sync files from absolute paths outside the workspace on the pod.

{
  "extra_outputs": [
    {
      "remote": "/gpu-cli-workspaces/cache/checkpoints",
      "local": "checkpoints/"
    }
  ]
}

vault

Use vault for outputs that should stay encrypted at rest on your local machine instead of syncing into the workspace.

{
  "outputs": ["logs/**"],
  "vault": {
    "patterns": ["checkpoints/**", "generated/**"]
  }
}

Downloads

GPU CLI can pre-stage models, repos, and assets before your command runs.

HuggingFace

{
  "download": [
    {
      "strategy": "hf",
      "source": "black-forest-labs/FLUX.1-dev"
    }
  ]
}

HTTP

{
  "download": [
    {
      "strategy": "http",
      "source": "https://example.com/model.bin",
      "target": "models/model.bin"
    }
  ]
}

Git

{
  "download": [
    {
      "strategy": "git",
      "source": "https://github.com/comfyanonymous/ComfyUI",
      "target": "ComfyUI",
      "tag": "v0.3.7"
    }
  ]
}

Civitai

{
  "download": [
    {
      "strategy": "civitai",
      "source": "4384"
    }
  ]
}

For persistent model storage, prefer ${workspace_base} in targets so the path stays aligned with the provider's mounted workspace.
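For example, the HTTP download above can be rewritten to land on the mounted workspace:

```jsonc
{
  "download": [
    {
      "strategy": "http",
      "source": "https://example.com/model.bin",
      // ${workspace_base} resolves to the provider's mounted workspace root
      "target": "${workspace_base}/models/model.bin"
    }
  ]
}
```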

Pod Runtime Settings

keep_alive_minutes

Idle cooldown in minutes: the pod keeps running this long after the last meaningful activity before it stops.

{
  "keep_alive_minutes": 10
}

persistent_proxy, resume_timeout_secs, max_queued_requests, health_check_paths

These settings control wake-on-request behavior for long-lived services.

{
  "persistent_proxy": true,
  "resume_timeout_secs": 180,
  "max_queued_requests": 100,
  "health_check_paths": ["/health", "/ready"]
}
  • persistent_proxy: keep the local proxy listening after the pod stops so incoming requests can resume it.
  • resume_timeout_secs: how long to wait for the pod to come back.
  • max_queued_requests: cap queued requests while the pod is resuming.
  • health_check_paths: paths that should never count as meaningful activity.

docker_image, dockerfile, workspace_size_gb, target_architecture

{
  "docker_image": "ghcr.io/gpu-cli/base:latest",
  "dockerfile": "Dockerfile",
  "workspace_size_gb": 50,
  "target_architecture": "linux/amd64"
}

Use target_architecture to override the provider default when you need ARM builds.

startup

startup sets the default command for gpu run when you do not pass an explicit command. It is also the main entrypoint for many template workflows.

{
  "startup": "python app.py --host 0.0.0.0 --port 8080"
}

inputs

inputs defines the prompts shown when someone runs gpu use <template>.

{
  "inputs": [
    {
      "type": "select",
      "key": "model_variant",
      "label": "Model Variant",
      "options": [
        { "value": "small", "label": "Small" },
        { "value": "large", "label": "Large" }
      ],
      "default": "small",
      "required": true
    },
    {
      "type": "text",
      "key": "system_prompt",
      "label": "System Prompt",
      "placeholder": "You are a helpful assistant",
      "required": false
    },
    {
      "type": "boolean",
      "key": "enable_auth",
      "label": "Enable Auth",
      "default": true
    }
  ]
}
  • Use key names that match the values your template expects to substitute.
  • type controls the prompt UI. Common choices are select, text, number, boolean, model, and secret.
  • Callers can answer interactively or prefill values with gpu use --input key=value.

Ports and Activity Routing

ports supports both simple port numbers and rich port objects.

Simple ports

{
  "ports": [8000, 8080]
}

Rich ports

{
  "ports": [
    {
      "port": 8080,
      "description": "ui",
      "http": {
        "activity_paths": ["/api/chat", "/api/generate"],
        "ignore_paths": ["/health", "/queue"],
        "ignore_methods": ["OPTIONS", "HEAD"]
      },
      "websocket": {
        "data_frames_are_activity": true,
        "ping_pong_is_activity": false
      }
    }
  ]
}

How activity routing works

  • If a port has no HTTP rules, all HTTP requests count as activity.
  • ignore_methods and ignore_paths block known background traffic.
  • If activity_paths is set, only those paths reset the cooldown timer.
  • WebSocket data frames keep a connection warm, but the primary cooldown signal is still HTTP activity.

Use these rules for apps with frequent background traffic, such as queue polling, health checks, chat UIs, or dashboard frontends.

Hooks

Hooks run scripts during pod lifecycle events.

{
  "hooks": {
    "readiness": {
      "type": "command",
      "name": "service-ready",
      "run": ["curl", "-sf", "http://localhost:8000/health"],
      "retry_count": 30,
      "retry_delay_secs": 2,
      "timeout_secs": 10
    }
  }
}

The most common use is a readiness hook that waits for a service to become available before GPU CLI marks the pod ready.

Network Volumes

network_volume_id

Attach a specific network volume by ID or unique friendly name.

{
  "network_volume_id": "vol_abc123xyz"
}

volume_mode

{
  "volume_mode": "global"
}

Options:

  • "global" — use the shared global volume
  • "dedicated" — use a project-specific volume
  • "none" — use ephemeral storage only

dedicated_volume_id and dedicated_volume_name

{
  "volume_mode": "dedicated",
  "dedicated_volume_name": "my-project-models"
}

If both are set, dedicated_volume_id takes precedence.
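A sketch showing both fields together (the IDs are illustrative):

```jsonc
{
  "volume_mode": "dedicated",
  // dedicated_volume_id takes precedence over dedicated_volume_name
  "dedicated_volume_id": "vol_abc123xyz",
  "dedicated_volume_name": "my-project-models"
}
```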

Environment

{
  "environment": {
    "python": {
      "requirements": "requirements.txt"
    },
    "system": {
      "apt": [{ "name": "ffmpeg" }]
    },
    "shell": {
      "steps": [{ "run": "pip install -e ." }]
    }
  }
}

Use this to declare Python packages, system packages, and shell setup steps.

Serverless

The serverless block configures RunPod Serverless deployment. For the workflow and request examples, see Serverless Endpoints.

Main fields

{
  "serverless": {
    "template": "vllm",
    "gpu_type": "NVIDIA A100 80GB PCIe",
    "gpu_types": ["NVIDIA L4"],
    "scaling": {
      "min_workers": 0,
      "max_workers": 3,
      "idle_timeout": 5
    },
    "volume": {
      "name": "my-project-vol",
      "size_gb": 200,
      "mount_path": "/runpod-volume"
    },
    "runpod": {
      "flashboot": true,
      "cached_model": "meta-llama/Llama-3.1-8B-Instruct",
      "env": {
        "MODEL_NAME": "meta-llama/Llama-3.1-8B-Instruct"
      }
    }
  }
}

Current caveats

  • serverless.prewarm exists in the schema but is not wired in the current runtime.
  • Deploy-time --warm and --write-ids are accepted by the CLI but not wired through today.
  • serverless.runpod.ids can exist in config, but GPU CLI does not currently auto-populate it during deploy.

Full Schema

See the complete checked-in schema at gpu-cli.sh/schema/v1/gpu.json.
