Getting started#
This guide walks you through setting up Navarch for local development and testing.
Prerequisites#
- Go 1.21 or later.
- Git for cloning the repository.
Installation#
Clone and build Navarch:
This builds the following binaries in bin/:
- bin/control-plane - The central management server.
- bin/navarch - The command-line interface.
- bin/node - The node agent (runs on GPU instances).
- bin/simulator - A testing tool for simulating GPU fleets.
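A quick check that all four binaries exist can catch a partial build early. The following is a minimal sketch: the helper name `check_binaries` is hypothetical and not part of Navarch, and it assumes only the binary names listed above.

```shell
# check_binaries is a hypothetical helper, not part of Navarch.
# It reports which of the expected binaries are present and executable
# in the given directory (default: bin) and fails if any is missing.
check_binaries() {
  local dir="${1:-bin}"
  local missing=0
  local name
  for name in control-plane navarch node simulator; do
    if [ -x "$dir/$name" ]; then
      echo "ok       $dir/$name"
    else
      echo "missing  $dir/$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_binaries bin` from the repository root; a non-zero exit status means at least one binary was not built.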
Quick start with fake provider#
The fake provider simulates GPU instances without cloud costs. Use it for local development and testing.
Step 1: Create a configuration file#
Create config.yaml:
server:
  address: ":50051"
  autoscale_interval: 10s
providers:
  fake:
    type: fake
    gpu_count: 8
pools:
  dev:
    provider: fake
    instance_type: gpu_8x_h100
    region: local
    min_nodes: 2
    max_nodes: 5
    cooldown: 10s
    autoscaling:
      type: reactive
      scale_up_at: 80
      scale_down_at: 20
    health:
      unhealthy_after: 2
      auto_replace: true
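To build intuition for how the reactive thresholds and node bounds in the dev pool might interact, here is a rough sketch. It is not Navarch's implementation. It assumes that scale_up_at and scale_down_at are utilization percentages and that the pool changes by one node per evaluation. The function name `reactive_target` is hypothetical.

```shell
# reactive_target is a hypothetical illustration, not Navarch's autoscaler.
# Arguments: current node count, utilization percent (integer).
# Thresholds and bounds mirror the "dev" pool in config.yaml.
# Assumption: one node is added or removed per evaluation.
reactive_target() {
  local current="$1" util="$2"
  local min_nodes=2 max_nodes=5 scale_up_at=80 scale_down_at=20
  local target="$current"
  if [ "$util" -ge "$scale_up_at" ]; then
    target=$((current + 1))
  elif [ "$util" -le "$scale_down_at" ]; then
    target=$((current - 1))
  fi
  # Clamp the result to the pool bounds.
  if [ "$target" -lt "$min_nodes" ]; then
    target="$min_nodes"
  fi
  if [ "$target" -gt "$max_nodes" ]; then
    target="$max_nodes"
  fi
  echo "$target"
}
```

Under these assumptions, `reactive_target 2 85` prints 3, `reactive_target 2 10` prints 2 (the pool is already at min_nodes), and `reactive_target 5 95` prints 5 (the pool is already at max_nodes).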
Step 2: Start the control plane#
The control plane starts and provisions two fake nodes (the min_nodes value).
Step 3: List nodes#
In a new terminal:
Output:
┌───────────┬──────────┬────────┬──────┬───────────────┬────────┬─────────┬────────────────┬──────┐
│ Node ID   │ Provider │ Region │ Zone │ Instance Type │ Status │ Health  │ Last Heartbeat │ GPUs │
├───────────┼──────────┼────────┼──────┼───────────────┼────────┼─────────┼────────────────┼──────┤
│ fake-1    │ fake     │ local  │      │ gpu_8x_h100   │ Active │ Healthy │ 5s ago         │ 8    │
│ fake-2    │ fake     │ local  │      │ gpu_8x_h100   │ Active │ Healthy │ 5s ago         │ 8    │
└───────────┴──────────┴────────┴──────┴───────────────┴────────┴─────────┴────────────────┴──────┘
Step 4: Manage nodes#
To cordon a node (prevent new workloads):
To drain a node (evict workloads and cordon):
To view node details:
Next steps#
To connect real cloud providers, see the configuration reference.
To learn about autoscaling strategies, see pool management.
To deploy in production, see the deployment guide.
Using the simulator#
The simulator tests Navarch behavior without running the full system. It uses scenario files to define fleet configurations and events.
Run a scenario#
Interactive mode#
Run the simulator in interactive mode to test CLI commands:
Then use the CLI in another terminal:
For more information, see the simulator documentation.
Troubleshooting#
Connection refused#
If navarch list returns "connection refused":
- Verify the control plane is running.
- Check the address matches (default is http://localhost:50051).
- Use the -s flag to specify a different address.
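To tell a stopped control plane from an address mismatch, it can help to test whether anything is listening on the port at all. The sketch below uses bash's /dev/tcp redirection, so it needs bash rather than a plain POSIX shell. The helper name `port_open` is hypothetical, and the host and port come from the default address above.

```shell
# port_open is a hypothetical helper, not part of Navarch.
# It relies on bash's built-in /dev/tcp pseudo-device.
# Returns 0 if a TCP connection to host:port succeeds, non-zero otherwise.
port_open() {
  local host="$1" port="$2"
  # Opening the pseudo-file attempts a TCP connection.
  # The subshell closes the descriptor again when it exits.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

For example, `port_open localhost 50051 || echo "nothing is listening on :50051"` prints the message when the connection attempt fails, which suggests the control plane is not running or is bound to a different address.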
No nodes appear#
If nodes do not appear after starting the control plane:
- Check the control plane logs for errors.
- Verify the pool configuration has min_nodes greater than zero.
- Confirm the provider is configured correctly.
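One way to check the min_nodes values without opening the file is to print them. This is a minimal sketch that assumes the `key: value` layout of the config.yaml shown earlier; the helper name `list_min_nodes` is hypothetical.

```shell
# list_min_nodes is a hypothetical helper, not part of Navarch.
# It prints every min_nodes value found in the given config file,
# one per line, so that zero or missing values stand out.
list_min_nodes() {
  # awk splits on whitespace, so YAML indentation is ignored.
  awk '$1 == "min_nodes:" { print $2 }' "$1"
}
```

Running `list_min_nodes config.yaml` against the example configuration prints 2. No output means no pool sets min_nodes.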
Build errors#
If the build fails:
- Verify Go 1.21 or later is installed: go version
- Run go mod download to fetch dependencies.
- Check for missing dependencies if building with GPU support.
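The version check can be scripted as well. The sketch below parses the output of `go version` (for example `go version go1.22.3 linux/amd64`) and succeeds only for Go 1.21 or later. The helper name `go_version_ok` is hypothetical, and development builds whose output lacks a plain version number are treated as failing.

```shell
# go_version_ok is a hypothetical helper, not part of Navarch.
# Argument: the output of `go version`.
# Returns 0 if the reported toolchain is Go 1.21 or later.
go_version_ok() {
  local output="$1"
  local version major minor
  # The third field looks like "go1.22.3"; strip the "go" prefix.
  version=$(echo "$output" | awk '{print $3}')
  version=${version#go}
  # Reject anything that does not start with "<digits>.<digits>".
  case "$version" in
    [0-9]*.[0-9]*) ;;
    *) return 1 ;;
  esac
  major=${version%%.*}
  minor=${version#*.}
  minor=${minor%%.*}
  # Drop a non-numeric suffix such as "rc1" in "1.21rc1".
  minor=${minor%%[!0-9]*}
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 21 ]; }
}
```

A typical call is `go_version_ok "$(go version)" || echo "Go 1.21 or later is required"`.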