
GPU CLI Pro is free now, and we added three more clouds

James Lal

I train LoRAs almost every day, and I haven't started one of those jobs by hand in weeks. Agents drive them. Today GPU CLI Pro is free for everyone, commercial use included, and we added three new GPU providers.

I train LoRAs nearly every day for the models behind our other products. The jobs run for hours, but I haven't started one by hand in weeks, because I drive them through Claude, Codex, and Opencode using GPU CLI. We built it with agents in mind, so an agent can pick a GPU, launch the run, watch it, and tear it down when it's done. That leaves you managing results instead of keystrokes.

Built to be driven agentically

A coding agent is good at exactly what a long training run needs: launch a job, watch for it to finish, then shut it down. The trouble is that most tools assume a person is sitting there, with interactive prompts, output meant for eyes instead of parsers, and state you can only see by looking.

GPU CLI doesn't assume that. Every command runs without a TTY. Your workspace syncs up to the GPU automatically, and anything a run needs along the way, an env var or an API key, comes with it when you pass --env.

Take a LoRA I'm using right now as an example. It's a fine-tuned Gemma model trained with Unsloth on a Vast.ai RTX PRO 6000 with 96 GB of VRAM.

# set up the training env on the pod (one time)
gpu run --no-output python unsloth_env.py install

# launch the run, syncing your workspace up to the GPU
gpu run --no-output --force-sync \
  -- python unsloth_env.py run -- train_lora_unsloth.py \
  --model-id unsloth/gemma-4-E4B-it \
  --max-steps 480 --lora-r 16 --learning-rate 2.1e-4

# wait on it for a few hours, then react when it lands
until gpu logs --tail 40 | grep -qE "train_runtime|OutOfMemoryError"; do
  sleep 90
done
gpu logs --tail 40 | grep -iE "train_runtime|train_loss|eval_loss"

# tear the instance down when the run is done
gpu instances terminate <instance-id>

What did creating this involve? Honestly, not much. I just asked the agent to train the model, and with GPU CLI the agent could handle the rest. It fired off the run, monitored the logs for a few hours, caught the run landing (or an out-of-memory error if the GPU was too small), and shut the instance down afterward.
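The `until` loop above waits forever if neither pattern ever shows up, for example when a run dies with some other error. One way an agent might guard against that is a polling helper with a deadline. This is a minimal sketch, not part of GPU CLI: `wait_for_pattern` and its arguments are hypothetical names, and the only `gpu` command it assumes is the `gpu logs --tail 40` call already shown.

```shell
# Poll a command's output until a pattern appears or a deadline passes.
# usage: wait_for_pattern <timeout_s> <interval_s> <regex> <command...>
# Exit status: 0 if the pattern appeared, 1 if the deadline passed first.
# Hypothetical helper for illustration; not part of GPU CLI.
wait_for_pattern() {
  wfp_timeout=$1
  wfp_interval=$2
  wfp_pattern=$3
  shift 3
  wfp_elapsed=0
  while :; do
    # Run the command; grep -q succeeds as soon as the pattern is seen.
    if "$@" 2>&1 | grep -qE "$wfp_pattern"; then
      return 0
    fi
    # Approximate deadline: only sleep time is counted, not command time.
    if [ "$wfp_elapsed" -ge "$wfp_timeout" ]; then
      return 1
    fi
    sleep "$wfp_interval"
    wfp_elapsed=$((wfp_elapsed + wfp_interval))
  done
}

# Possible usage with the log command from the example above (8-hour cap):
#   wait_for_pattern 28800 90 "train_runtime|OutOfMemoryError" gpu logs --tail 40
```

A nonzero exit tells the agent the run neither finished nor reported an out-of-memory error within the cap, so it can inspect the logs instead of sleeping indefinitely.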

A real run on a Vast.ai A5000: launch it, watch it land, tear it down.

This approach has drastically reduced the tedium of training models: a six-hour run costs the same amount of attention as a six-minute one. None.

GPU availability is a real problem

I didn't pick that GPU because it was ideal. I picked it because it was the one with capacity that morning. If you've tried to get a GPU lately you know the rest. The card you want is out of stock in the region you want, at the price you want to pay.

The fix isn't a better single provider. It's more of them. We added three:

  • Vast.ai — a marketplace with 70+ GPU types and spot pricing across 20+ regions
  • Thunder Compute — A6000, A100, and H100 capacity
  • io.net — 100+ GPU types on a pre-paid model built for long-running jobs

Together with RunPod, that's four clouds and hundreds of GPU types to pull from. Just pass GPU CLI a list of GPUs you'll accept and it works down the list, so when your first choice is out of stock it grabs a compatible one instead of failing. More places to ask means your job gets a machine instead of a queue.
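The idea of working down a list can be sketched in a few lines of shell. This only illustrates the fallback logic and is not how GPU CLI implements it: `first_available` and `fake_probe` are hypothetical names, and the probe is a stand-in that pretends only one card is in stock.

```shell
# Try each acceptable GPU in order and print the first one the probe accepts.
# usage: first_available <probe_command> <gpu> [<gpu>...]
# <probe_command> is any command that exits 0 when the named GPU has capacity.
first_available() {
  fa_probe=$1
  shift
  for fa_gpu in "$@"; do
    if "$fa_probe" "$fa_gpu"; then
      printf '%s\n' "$fa_gpu"
      return 0
    fi
  done
  return 1  # nothing on the list had capacity
}

# Stand-in probe for illustration only: pretends just the A100 is in stock.
# A real probe would ask a provider about capacity; that call is not shown.
fake_probe() { [ "$1" = "A100" ]; }

# The first choice is "out of stock", so the second entry is picked.
first_available fake_probe "RTX PRO 6000" "A100" "H100"
```

The order of the arguments is the order of preference, so the cheapest or best-fitting card goes first and the fallbacks follow.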

Live GPU stock and pricing across RunPod and Vast.ai.

GPU CLI Pro is now free!

The commercial license, unlimited concurrent sessions, detached jobs without upgrade nags: all of it is free for everyone now, commercial use included.

So what are you waiting for? Try it and let us know what you think!

curl -fsSL https://gpu-cli.sh/install.sh | sh