Concepts
Navarch manages GPU fleets through a few core abstractions. This section explains how they work together.
- Components: Control plane and node agent architecture.
- Pools & Providers: Organizing nodes by workload and cloud provider.
- Health Monitoring: Health checks, status types, and failure detection.
- Node Lifecycle: Instance provisioning, node states, and transitions.
- Autoscaling: Scaling strategies, limits, and cooldown behavior.