Architecture
                  ┌──────────────────────────────┐
                  │         Orchestrator         │
                  └──────────────┬───────────────┘
                                 │ gRPC
                 ┌───────────────▼────────────────┐
                 │           Controller           │
                 │  ┌──────────────────────────┐  │
                 │  │  Orchestrator Service    │  │
                 │  │  - QueryPhysicalTopology │  │
                 │  │  - GetValidPlacementsMGD │  │
                 │  └──────────────────────────┘  │
                 │  ┌──────────────────────────┐  │
                 │  │  Topology Mapper (CSP)   │  │
                 │  └──────────────────────────┘  │
                 │  ┌──────────────────────────┐  │
                 │  │  Daemon Service          │  │
                 │  │  - RegisterDaemon        │  │
                 │  │  - HeartbeatStream       │  │
                 │  └────────────┬─────────────┘  │
                 └───────────────┼────────────────┘
                                 │ gRPC
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
┌──────────▼─────────┐  ┌────────▼───────────┐  ┌──────▼─────────────┐
│  Agent (Host 1)    │  │  Agent (Host 2)    │  │  Agent (Host N)    │
│                    │  │                    │  │                    │
│  Device Discovery  │  │  Device Discovery  │  │  Device Discovery  │
│  (UMD)             │  │  (UMD)             │  │  (UMD)             │
└────────────────────┘  └────────────────────┘  └────────────────────┘
Components
Agent (tt-fabric-manager-agent): Runs on each host. Uses UMD to discover local Tenstorrent ASICs (unique IDs, board type, arch, memory, PCI address), intra-host ethernet connections, and cross-host exit nodes. Registers topology with the controller and maintains a bidirectional heartbeat stream.
Controller (tt-fabric-manager-controller): Centralized coordinator. Aggregates physical topology from all registered agents, tracks host health via heartbeats, and exposes an orchestrator-facing gRPC API. Uses tt-metalium’s CSP solver to map logical mesh descriptors onto physical ASICs.
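To make the controller's bookkeeping concrete, here is a minimal in-memory sketch of how per-host topology and heartbeat timestamps could be tracked and aggregated. Every name in it (`Asic`, `HostTopology`, `TopologyRegistry`, `heartbeat_timeout_s`) is hypothetical and chosen for illustration; the real controller's data model and timeout policy are not described here and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Asic:
    """One discovered ASIC. Field names are illustrative, not the real schema."""
    unique_id: int
    board_type: str
    arch: str
    pci_address: str


@dataclass
class HostTopology:
    """What a single agent might report for its host (hypothetical shape)."""
    hostname: str
    asics: List[Asic]
    # Intra-host Ethernet links as pairs of ASIC unique IDs.
    links: List[Tuple[int, int]] = field(default_factory=list)


class TopologyRegistry:
    """In-memory store of per-host topology plus last-heartbeat timestamps."""

    def __init__(self, heartbeat_timeout_s: float = 10.0) -> None:
        self.heartbeat_timeout_s = heartbeat_timeout_s
        self._hosts: Dict[str, HostTopology] = {}
        self._last_seen: Dict[str, float] = {}

    def register(self, topo: HostTopology, now: float) -> None:
        # Re-registration replaces the previous entry for that host.
        self._hosts[topo.hostname] = topo
        self._last_seen[topo.hostname] = now

    def heartbeat(self, hostname: str, now: float) -> bool:
        # Heartbeats from unknown hosts are rejected, not auto-registered.
        if hostname not in self._hosts:
            return False
        self._last_seen[hostname] = now
        return True

    def healthy_hosts(self, now: float) -> List[str]:
        # A host is healthy if its last heartbeat is within the timeout.
        return sorted(
            host for host, seen in self._last_seen.items()
            if now - seen <= self.heartbeat_timeout_s
        )

    def aggregate(self, now: float) -> List[Asic]:
        """Flatten the ASICs of every host that is still considered healthy."""
        return [a for h in self.healthy_hosts(now) for a in self._hosts[h].asics]
```

Passing `now` explicitly instead of reading the clock inside the class keeps the sketch deterministic and easy to test; a real service would more likely use a monotonic clock.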
Data Flow — Startup & Registration
Agent (Host N)                             Controller
──────────────                             ──────────
    │                                          │
    │  1. Discover local ASICs (UMD)           │
    │◄─────────────────────┐                   │
    │                      │                   │
    │  2. RegisterDaemon(HostPhysicalTopology) │
    │─────────────────────────────────────────►│
    │                                          │ 3. Store topology
    │  RegisterResponse                        │    in memory
    │◄─────────────────────────────────────────│
    │                                          │
    │  4. HeartbeatStream (bidirectional)      │
    │◄────────────────────────────────────────►│
    │     - Periodic keepalive                 │
    │     - Health monitoring                  │
    │                                          │
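The four steps above can be walked through with an in-process stand-in for the controller. This is only a sketch under stated assumptions: `FakeController`, `run_agent`, `Heartbeat`, and `HeartbeatAck` are hypothetical names, the message fields are invented, and no real gRPC calls are made. The heartbeat handler is written as a generator that consumes one iterator and yields replies, which is the general shape of a bidirectional streaming handler in Python gRPC.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Heartbeat:
    hostname: str
    seq: int


@dataclass
class HeartbeatAck:
    seq: int


class FakeController:
    """In-process stand-in for the controller's daemon-facing service."""

    def __init__(self) -> None:
        self.registered: Dict[str, List[int]] = {}
        self.log: List[str] = []

    def register_daemon(self, hostname: str, asic_ids: List[int]) -> bool:
        # Step 3: keep the reported topology in memory.
        self.registered[hostname] = list(asic_ids)
        self.log.append(f"register:{hostname}")
        return True

    def heartbeat_stream(self, beats: Iterable[Heartbeat]) -> Iterator[HeartbeatAck]:
        # Step 4: consume the agent's stream and answer each message in turn.
        for beat in beats:
            if beat.hostname not in self.registered:
                raise KeyError(f"unregistered host: {beat.hostname}")
            self.log.append(f"beat:{beat.hostname}:{beat.seq}")
            yield HeartbeatAck(seq=beat.seq)


def run_agent(
    hostname: str,
    discover: Callable[[], List[int]],
    controller: FakeController,
    num_beats: int,
) -> List[int]:
    """Run discovery, registration, and a bounded heartbeat exchange."""
    asic_ids = discover()  # Step 1: local discovery (UMD in the real agent).
    if not controller.register_daemon(hostname, asic_ids):  # Step 2
        raise RuntimeError("registration rejected")
    beats = (Heartbeat(hostname, seq) for seq in range(num_beats))
    # Collect the sequence numbers the controller acknowledged.
    return [ack.seq for ack in controller.heartbeat_stream(beats)]
```

For example, `run_agent("host1", lambda: [10, 11], FakeController(), 3)` registers two ASIC IDs and returns the acknowledged sequence numbers `[0, 1, 2]`. A real agent would keep the stream open indefinitely and send keepalives on a timer rather than a fixed count.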
Data Flow — Mesh Placement Query
Orchestrator             Controller                  Agent(s)
────────────             ──────────                  ────────
    │                         │                          │
    │  GetValidPlacementsMGD  │                          │
    │  (MGD textproto)        │                          │
    │────────────────────────►│                          │
    │                         │                          │
    │                         │  Aggregate topology      │
    │                         │  from registered agents  │
    │                         │                          │
    │                         │  Run CSP mapper          │
    │                         │  (map_mesh_to_physical)  │
    │                         │                          │
    │  PlacementResponse      │                          │
    │  (host→ASIC assignments)│                          │
    │◄────────────────────────│                          │
    │                         │                          │
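To illustrate what mapping a logical mesh onto physical ASICs involves as a constraint-satisfaction problem, the toy backtracking search below places an R×C mesh onto a set of ASICs so that mesh neighbours land on physically linked ASICs. It is not the tt-metalium solver and makes no claim about how `map_mesh_to_physical` works; `place_mesh` and its parameters are hypothetical, and the only constraints modelled are uniqueness and adjacency.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Coord = Tuple[int, int]


def place_mesh(
    rows: int,
    cols: int,
    asics: Iterable[int],
    links: Iterable[Tuple[int, int]],
) -> Optional[Dict[Coord, int]]:
    """Return one mesh-coordinate -> ASIC-ID placement, or None if none exists."""
    # Undirected adjacency over physical ASICs.
    adj: Dict[int, Set[int]] = {a: set() for a in asics}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Coordinates are filled in row-major order, so when a cell is visited
    # only its upper and left mesh neighbours can already be assigned.
    coords: List[Coord] = [(r, c) for r in range(rows) for c in range(cols)]
    assignment: Dict[Coord, int] = {}

    def consistent(coord: Coord, asic: int) -> bool:
        r, c = coord
        for nb in ((r - 1, c), (r, c - 1)):
            if nb in assignment and assignment[nb] not in adj[asic]:
                return False
        return True

    def solve(i: int) -> bool:
        if i == len(coords):
            return True
        used = set(assignment.values())
        for asic in sorted(adj):
            if asic in used or not consistent(coords[i], asic):
                continue
            assignment[coords[i]] = asic
            if solve(i + 1):
                return True
            del assignment[coords[i]]  # Backtrack and try the next ASIC.
        return False

    return dict(assignment) if solve(0) else None
```

With four ASICs linked in a ring, a 2×2 mesh can be placed; with the same ASICs linked in a line, a 2×2 mesh cannot (the function returns `None`) but a 1×4 mesh can. Plain backtracking is exponential in the worst case, so this form is only suitable for small illustrative inputs.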