Researchers from MIT want to speed up the process of monitoring, diagnosing, and fixing problems with multimillion-dollar supercomputers by visualizing the hardware in Unity 3D, the game engine behind many video game titles. Supercomputers are extremely complex collections of hardware made up of thousands of interconnected systems. Bottlenecks can form anywhere in such a system, and tracking them down costs supercomputing facilities both time and performance.
A typical supercomputer is built from a large number of components. The machine is divided into units called nodes, and each node contains a specific set of hardware components. To oversimplify, some nodes are designed for storing data while others are for computing. The compute nodes typically contain processors and main system memory.
Engineers continuously test the machine during installation and encounter problems along the way. Storage, processor, and even networking problems can all appear, and diagnosing the root cause is difficult at such a large scale. For example, the upcoming Frontier supercomputer should have around 100 racks containing tens of nodes each, resulting in thousands of nodes to diagnose and monitor.
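To get a feel for that scale, here is a minimal Python sketch of a machine laid out as racks of nodes, with a per-rack count of faulty nodes. It is an illustration only, not the researchers' tool. The figure of 64 nodes per rack, the storage-to-compute split, the `Node` fields, and the fault positions are all assumptions made up for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    rack: int             # index of the rack holding this node
    slot: int             # position of the node inside its rack
    kind: str             # "compute" or "storage"
    healthy: bool = True  # hypothetical health flag


def build_machine(racks=100, nodes_per_rack=64):
    """Build a flat list of nodes.

    Assumption: every 8th slot is a storage node, the rest are compute nodes.
    """
    return [
        Node(rack=r, slot=s, kind="storage" if s % 8 == 0 else "compute")
        for r in range(racks)
        for s in range(nodes_per_rack)
    ]


def unhealthy_per_rack(nodes):
    """Count faulty nodes per rack so problem racks stand out."""
    return Counter(n.rack for n in nodes if not n.healthy)


machine = build_machine()

# Mark a few arbitrary nodes as faulty to simulate problems found in testing.
for idx in (5, 70, 71, 6399):
    machine[idx].healthy = False

total = len(machine)
report = unhealthy_per_rack(machine)

print(f"{total} nodes to monitor")
for rack, count in report.most_common():
    print(f"rack {rack}: {count} faulty node(s)")
```

Even in this toy layout, 100 racks of 64 nodes yield 6,400 nodes, which is why a per-rack summary or a visual map is easier to work with than a flat list of machines.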