# Architecture

```
                 ┌──────────────────────────────┐
                 │         Orchestrator         │
                 └──────────────┬───────────────┘
                                │ gRPC
                 ┌──────────────▼─────────────────┐
                 │           Controller           │
                 │  ┌──────────────────────────┐  │
                 │  │  Orchestrator Service    │  │
                 │  │  - QueryPhysicalTopology │  │
                 │  │  - GetValidPlacementsMGD │  │
                 │  └──────────────────────────┘  │
                 │  ┌────────────────────────┐    │
                 │  │ Topology Mapper (CSP)  │    │
                 │  └────────────────────────┘    │
                 │  ┌─────────────────────────┐   │
                 │  │ Daemon Service          │   │
                 │  │ - RegisterDaemon        │   │
                 │  │ - HeartbeatStream       │   │
                 │  └────────────┬────────────┘   │
                 └───────────────┼────────────────┘
                                 │ gRPC
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
┌──────────▼─────────┐  ┌────────▼───────────┐ ┌───────▼───────────┐
│ Agent (Host 1)     │  │ Agent (Host 2)     │ │ Agent (Host N)    │
│                    │  │                    │ │                   │
│ Device Discovery   │  │ Device Discovery   │ │ Device Discovery  │
│ (UMD)              │  │ (UMD)              │ │ (UMD)             │
└────────────────────┘  └────────────────────┘ └───────────────────┘
```

## Components

- **Agent** (`tt-fabric-manager-agent`): Runs on each host. Uses UMD to discover local Tenstorrent ASICs (unique IDs, board type, arch, memory, PCI address), intra-host ethernet connections, and cross-host exit nodes. Registers topology with the controller and maintains a bidirectional heartbeat stream.
- **Controller** (`tt-fabric-manager-controller`): Centralized coordinator. Aggregates physical topology from all registered agents, tracks host health via heartbeats, and exposes an orchestrator-facing gRPC API. Uses tt-metalium's CSP solver to map logical mesh descriptors onto physical ASICs.

## Data Flow — Startup & Registration

```
Agent (Host N)                          Controller
─────────────                           ──────────
      │                                      │
      │  1. Discover local ASICs (UMD)       │
      │◄─────────────────────┐               │
      │                      │               │
      │  2. RegisterDaemon(HostPhysicalTopology)
      │─────────────────────────────────────►│
      │                                      │  3. Store topology
      │           RegisterResponse           │     in memory
      │◄─────────────────────────────────────│
      │                                      │
      │  4. HeartbeatStream (bidirectional)  │
      │◄────────────────────────────────────►│
      │     - Periodic keepalive             │
      │     - Health monitoring              │
      │                                      │
```

## Data Flow — Mesh Placement Query

```
Orchestrator               Controller                  Agent(s)
────────────               ──────────                  ────────
      │                         │                          │
      │  GetValidPlacementsMGD  │                          │
      │  (MGD textproto)        │                          │
      │────────────────────────►│                          │
      │                         │                          │
      │                         │  Aggregate topology      │
      │                         │  from registered agents  │
      │                         │                          │
      │                         │  Run CSP mapper          │
      │                         │  (map_mesh_to_physical)  │
      │                         │                          │
      │  PlacementResponse      │                          │
      │  (host→ASIC assignments)│                          │
      │◄────────────────────────│                          │
      │                         │                          │
```
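
The controller-side bookkeeping implied by the two flows above (store topology on registration, refresh liveness on each heartbeat, aggregate topology for a placement query) can be sketched in a few lines of Python. This is an illustration only, not the controller's actual data model: `TopologyRegistry`, `HostRecord`, the 10-second timeout, and the choice to leave hosts with stale heartbeats out of the aggregated view are all assumptions made for the example, and the gRPC transport and CSP mapping step are omitted entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HostRecord:
    """Topology reported by one agent plus the time of its last heartbeat."""
    asic_ids: list[str]
    last_heartbeat: float


@dataclass
class TopologyRegistry:
    """Hypothetical in-memory store; names and fields are illustrative."""
    heartbeat_timeout_s: float = 10.0  # assumed value, not from the source
    hosts: dict[str, HostRecord] = field(default_factory=dict)

    def register(self, host: str, asic_ids: list[str], now: float) -> None:
        # Loosely mirrors RegisterDaemon: store (or overwrite) a host's topology.
        self.hosts[host] = HostRecord(list(asic_ids), now)

    def heartbeat(self, host: str, now: float) -> bool:
        # Loosely mirrors one keepalive on HeartbeatStream.
        # Returns False for hosts that never registered.
        record = self.hosts.get(host)
        if record is None:
            return False
        record.last_heartbeat = now
        return True

    def healthy_hosts(self, now: float) -> list[str]:
        # A host is healthy if its last heartbeat is within the timeout.
        return sorted(
            name for name, rec in self.hosts.items()
            if now - rec.last_heartbeat <= self.heartbeat_timeout_s
        )

    def aggregate(self, now: float) -> dict[str, list[str]]:
        # Assumed policy: only healthy hosts feed a placement query.
        return {name: self.hosts[name].asic_ids for name in self.healthy_hosts(now)}
```

Passing `now` explicitly rather than reading the clock inside the class keeps the sketch deterministic and easy to test; a real service would more likely take timestamps from a monotonic clock.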